Julie Lerman's DevLife

DevLife Part I [May 2005 - March 2007]

My Links

Blog Stats

News

A blog for DevSource.com.

This blog was originally part of the blogs.ziffdavis.com site from May 2005 through June 2007 when the blog was moved to the Movable Type blog engine and hosted at blog.devsource.com/devlife.
The original blog was eventually shut down and I was given the posts so that I could host them on my own site.


Archives

Investigating DataTable indexing in ADO.NET 2.0 and a surprise

I was fiddling with some [unofficial] benchmarking of DataTable.Select and DataView to look at the differences in performance. And because I have heard Pablo Castro (ADO.NET Team Technical Lead) talk many times about using MILLIONS of rows, I duplicated my database table and made sure it was populated with over 1 million records.

I was very surprised by the results of my experiments because they demonstrate something different than what I was expecting.

The DataTable.Select is pretty flexible because it can take a column expression with a variety of operators as it's select expression (=, >, <, etc.) whereas a DataView is for straight comparison with =.  You can use the rowfilter of the DataView to get more flexibility with  your operators, but the DataView will perform two indexes on the table that way. So I was curious about the impact on the performance of the DataTable.Select.

Note that select  indexes on the fly and doesn't retain it. This is obvious when I repeatedly do the same selection “projectid>2000“. The time does not decrease. After I created the DataView, with a sort on the "projectid" field which I am using in the Select, I expected the Select to be much faster, based on the fact that the DataView creates a permanent index. But it wasn't faster at all. There are probably specific conditions that I am not meeting to get this performance gain. I will have to find out what they are.

What DID make a huge impact on my Select, though, was doing an extra Select using an = operator.

In the screenshot below you can see my first test with the DataView created in the middle. Then in my second test, you can see that after I do a select with "equals", my greater than select is 90% faster! And the straight equals select screams after that: only 17 milliseconds to find 10,000 rows out of over one million! Holy Canoli! This is before I do the DataView. The times do not decrease after the DataView.

I'm going to continue to dig into this and compare it to using primary keys. But it would be pretty interesting if it turns out that doing an extra select might be a good prescription for enhancing performance on repeated in-cache querying.

Peter Bromberg has a good article on In-Memory Data Caching for Performance for some further reading.

Maybe when the snow returns to Vermont I'll find a better way to while away the hours on a Sunday afternoon.

posted on Sunday, February 19, 2006 2:11 PM