Thursday, July 19, 2012

Ordinals vs. Column Names on DataReader

There is some contention on whether you should use the name of a column versus hard coding the ordinal when using a DataReader to retrieve data from a procedure call. While using the name is more easily maintained and agnostic on the order of columns, there is an inherent overhead with using the column name. While I understand the performance hit of using the named column, I find the value in code readability and column ordering independence worth it.  While, the hit is negligible on a single row result set, the compounding nature may result in a performance change worth avoiding. I came up with a neat little trick that can be used prior to looping through the result set of a list return.  I created a helper class that will give you the ordinals back locally and allow them to be used on each loop.

public Dictionary<String, Int32> GetOrdinals(IDataReader reader)
{
   Dictionary<String, Int32> retVal = new Dictionary<String, Int32>();
   for (int i = 0; i < reader.FieldCount; i++)
   {
     retVal.Add(reader.GetName(i), i);
   }
   return retVal;
}
 
This enables us to hit the reader row one time and bring back a local variable containing all of the column names as the key of a dictionary with ordinal as a value.  At this point, you can simply use the keyed reference as follows and get the best of both worlds when retrieving large result set.

Boolean hasData = reader.Read();
List<Contract> listItems = new  List<Contract>();
Dictionary<String,Int32> ordinals = null;
if (hasData)
{
   ordinals = GetOrdinals(reader);
}
 
while (hasData)
{
   Contract item = new Contract();            
   item.ID =  Convert.ToInt64(reader[ordinals["ID"]]);
   item.Name = reader[ordinals["Name"]].ToString();
   listItems.Add(item);
   hasData = reader.Read();
}

This method would be a performance hit when using against a single row return value. And the break-even point of a large result set may take a few rows to gain back the hit of the first loop. However, it is a nice little trick to get the best of both worlds when hitting a large query.

7 comments:

  1. I think I would rather choose the ordinal by property name.

    Add a ColumnName attribute to each property.
    In GetOrdinals, match the column name to your property's columnname attribute and add the property name to the ordinals list. Then reference by property name.

    You could even completely automate this. And if the column does not match a property it would just ignore it or throw a nice exception.

    ReplyDelete
  2. We still need to match our column names with what comes back in each Get procedure though.

    ReplyDelete
  3. Yeah, you would still have to know the column names from the procedures. You could completely automate this as you have stated. At some point, you will have to know the columns coming back from your query, either by setting the property on your contract, or by knowing the name when you ask with your reader. The main idea was to become agnostic to the ordering of the columns in the return so that you could use one contract and load from one reader method with multiple queries to populate it. If you wanted to get fancy with it, you could use the reader, the ordinals loop and the column names from it to create a dynamic type. Not sure about the real world use case for that, but it would be kind of cool, in an academic way.

    ReplyDelete
  4. It just seems cleaner to me to use an attribute for the column name in the datacontract definition and the property name in the GetFromReader method.

    ReplyDelete
  5. I'm not sure I agree with this Dictionary method of storing the ordinals. The idea of resolving the ordinal of a column name before entering the loop is to increase the performance of filling your data contract. Putting these ordinals into a Dictionary causes you to have to resolve the location of the ordinal within the Dictionary each iteration of the loop that fills your data contract. This would lead back to degraded performance. That would seem like we're adding complexity to it while not gaining much (if any) performance vs doing the reader["ColumnName"] method. While declaring an Int32 variable to store each columns ordinal does decrease code readability due to more lines, it will give the top performance of any method.

    ReplyDelete
  6. This is fun discussion. I did some timings and research to see the the reality of what we are talking about here. Came away with a new view of the entire thing.

    ReplyDelete
  7. I agree with Todd, the string search for the column name is what causes the slowness. String searches are just slllllooooowww. I have profiling results that show the benefits of using ordinals very clearly. I'll send those on to Jim.

    ReplyDelete