Friday, July 27, 2012

Ordinals vs Column Names - Part Deux

Since the first post turned into a discussion of readability versus performance, maintainability versus speed and other very worthwhile topics, I decided to put this entire process to a road test.  I created unit tests that called a DAL services basically retrieving an entire table in each of the three presented methodologies.  To get the output, I enabled our handy-dandy unit test timer to trap the performance of the DAL list functions. In order for this to be good science, I restarted the database 3 times over the course of the testing and ran the tests in different orders each time. For retrieving just under 17000 rows consisting of 38 columns, here are the results and averages for the test run (values in Milliseconds).


Run  String  Dictionary Ordinal String Order Dictionary Order Ordinal Order Difference
1 2705 2469 2505 1 3 2 236
2 2648 2575 2506 2 3 1 142
3 2669 2520 2501 2 1 3 168
4 2570 2648 2445 3 1 2 203
5 2577 2566 2397 3 1 2 180
6 2450 2646 2575 3 1 2 196
7 2555 2606 2479 2 1 3 127
8 2843 2621 2446 1 2 3 397
9 2628 2485 2388 1 2 3 240
10 2580 2525 2400 3 1 2 180
Total 2622.50 2566.10 2464.20 0.21

So, at the end of the day, here is what I learned.  It really doesn't matter enough to worry about it.  Sure, the ordinals are consistently faster than the dictionary and the dictionary is faster than string, but the variance is hardly worth worrying about.  When retrieving 17000 rows, the average difference from the best performance to the worst performance was a matter of 2/10 of one second.  I think I got wrapped up in a common distraction know as micro-optimization, where we spend a ton of mental energy and rewrite code to gain a millisecond on a routine. Yes, you should care about performance, and yes, faster code should always be your goal, but there comes a point in time where it stops being worth the development time.  Yes, we should avoid the blatantly poor performing code mistakes that everyone knows about. But after that, we should be equally worried about the scalability, portability, maintainability and readability of our code. We should ask if saving 2/10 second while retrieving 17000 rows is the matter to discuss, or should we be discussing why we would ever be retrieving 17000 often enough in our application to have 2/10 second be an issue.  Yes, I started the discussion, thinking it was a creative way to gain performance.  Honestly, I enjoy trying to work through things like this too, which fed the distraction.  I believe now that these mental exercises should always be framed with realistic improvement potential weighed against the time and effort of the pursuit. At that point, only pursue until it starts becoming a net loss of productivity.  On this particular subject, after doing this research, count me firmly in the "It really doesn't matter, aim at code readability and object design" camp.

No comments:

Post a Comment