Monday, August 30, 2004

In Defence Of DataSets

The DataSet vs business objects debate has flared up on the project I am working on, with the pro-business-object lobby pushing to remove all traces of DataSets from the system. Up front I should declare my preference: I am pretty sold on the benefits of DataSets. I wouldn't go as far as Adam Cogan ("There are only two types of programmers - those that use DataSets, and those that wish they did"), but I'd want to be sure the motivations for ditching DataSets were solid before they got the chop.

The argument against DataSets in this case comes down to two main factors:

  • They are Microsoft-specific, and don't play well with other technologies and platforms. This criticism is entirely valid, and I don't disagree with it, as far as it goes. The counter-argument is that wrapping DataSet-centric systems in business object facades, so that they play well in the SOA world, is certainly possible. It isn't easy to accommodate all the semantics of DataSets with business objects, but it is a bridge that can be crossed. For a service that provides data from a database, such a facade is essentially what the anti-DataSet crowd are suggesting we build from the start. If it can be done ahead of time, there is no reason it can't be done just in time, when a consumer outside the Microsoft world actually needs it.
  • DataSets perform worse than business objects. A number of benchmarks show that business objects can deliver greater throughput than DataSets. That is no surprise, because any hand-crafted type or algorithm is going to outperform a general-purpose equivalent. The comparison is still worth making, because you want to understand the cost of the general-purpose solution. The raw results need to be interpreted intelligently, though. Is the performance cost of the general-purpose solution offset by the other features it offers, and are those features (which include having the code already built) worth the cost?

    Before putting the raw performance figures to bed, it is worth addressing two of the DataSet's performance issues. The first is the serialization/deserialization cost of persisting schema as well as data. The second is the extra bulk the schema adds to the bits sent across the wire. Changing the custom tool associated with DataSets from MSDataSetGenerator to XsdCodeGen makes it a pretty simple task to get rid of the schema. Sharing schema information out-of-band with WSDL or project references is fine in many situations, so losing the schema from every persisted DataSet instance is not a drama.

    Small DataSets are typical for the project in question. The test case was therefore a single Order from Northwind with three OrderDetails children, used to measure the performance improvement that could be won by removing the schema from the persisted format. Benchmarking showed that the schema-free DataSets were five times quicker to serialize and three times quicker to deserialize, with about half as much data transmitted over the wire.
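The schema weighs so heavily on small DataSets because every persisted instance pays for it, however few rows that instance carries. The rough Python illustration below shows the shape of that trade-off. It assumes a JSON stand-in for the DataSet's XML format and a made-up schema description loosely modelled on Northwind's Orders and Order Details tables. The row values are illustrative, and the `serialize`/`deserialize` helpers are hypothetical. The schema-free payload is smaller, but it only round-trips because the receiver is assumed to know the schema out-of-band. The byte counts it prints are not the DataSet benchmark figures quoted above.

```python
import json

# Hypothetical schema description (NOT the XSD a DataSet actually emits),
# loosely modelled on Northwind's Orders and Order Details tables.
SCHEMA = {
    "Orders": {"OrderID": "int", "CustomerID": "string", "OrderDate": "dateTime"},
    "OrderDetails": {
        "OrderID": "int", "ProductID": "int",
        "UnitPrice": "decimal", "Quantity": "short", "Discount": "float",
    },
    "Relations": [{"parent": "Orders.OrderID", "child": "OrderDetails.OrderID"}],
}

# One order with three detail rows (illustrative values).
DATA = {
    "Orders": [{"OrderID": 10248, "CustomerID": "VINET", "OrderDate": "1996-07-04"}],
    "OrderDetails": [
        {"OrderID": 10248, "ProductID": 11, "UnitPrice": 14.0, "Quantity": 12, "Discount": 0.0},
        {"OrderID": 10248, "ProductID": 42, "UnitPrice": 9.8, "Quantity": 10, "Discount": 0.0},
        {"OrderID": 10248, "ProductID": 72, "UnitPrice": 34.8, "Quantity": 5, "Discount": 0.0},
    ],
}


def serialize(data, schema=None):
    """Serialize the data, optionally embedding the schema in the payload."""
    payload = {"data": data}
    if schema is not None:
        payload["schema"] = schema  # fixed cost paid by every instance
    return json.dumps(payload).encode("utf-8")


def deserialize(blob):
    """Return the data; a schema-free payload relies on schema known out-of-band."""
    return json.loads(blob.decode("utf-8"))["data"]


with_schema = serialize(DATA, SCHEMA)
without_schema = serialize(DATA)
print(len(with_schema), len(without_schema))
```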

    Given the potential performance issues and Microsoft-specific nature of DataSets, why bother with them? To me, DataSets have the following benefits or features that are either impossible, difficult or tedious to achieve with a business object framework:

  • Developer familiarity. While this isn't a show-stopper for business object frameworks, it should not be underestimated.
  • No need to cross the object-relational impedance mismatch bridge. This is a huge one. Look at how successful ObjectSpaces has been at crossing this bridge.
  • Good designer support in VS.NET.
  • In-built support for concurrency management, and the ability to retrieve only the data that has changed.
  • The ability to merge two sets of data that share the same schema. Important for data-binding when data is being updated from external sources, as it means that you don't have to do a full re-bind every time this occurs.
  • In-built filtering support with DataView
  • Data query capabilities ("give me all the employees who joined before 1 Mar 2000")
  • Support for storing error information inline with the data (SetColumnError)
  • Excellent binding capabilities, both at runtime and design time.
  • Rich (if slightly imperfect) eventing infrastructure.
  • Support for any-relation navigation. Object graphs typically only offer parent-child (or child only) navigation.
  • Lossless persistence with the XML Serializer.
  • Rich in-built XML support (with the help of XmlDataDocument)
  • Ability to extract type information (in the form of an XSD) without needing to use the reflection API.
  • Ability to merge and split an arbitrary number of "instance graphs" together for storage or transport.
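Several of the items above are bookkeeping that a business object framework would otherwise have to hand-roll. They include change tracking, retrieving only changes, merging by key, filtering and inline column errors. The sketch that follows is a deliberately minimal Python model of that bookkeeping, meant only to show what it involves. The names `RowState`, `Row`, `Table`, `get_changes`, `accept_changes`, `merge`, `select` and `set_column_error` are hypothetical. They only loosely echo ADO.NET's `DataRowState`, `DataSet.GetChanges`, `DataSet.AcceptChanges`, `DataSet.Merge`, `DataTable.Select` and `DataRow.SetColumnError`, and this is not how ADO.NET implements them. The employee rows are illustrative sample data.

```python
from datetime import date
from enum import Enum


class RowState(Enum):
    # Names echo ADO.NET's DataRowState; this is a toy model, not that API.
    UNCHANGED = "Unchanged"
    ADDED = "Added"
    MODIFIED = "Modified"
    DELETED = "Deleted"


class Row:
    def __init__(self, values, state=RowState.UNCHANGED):
        self.current = dict(values)
        # Original values are what an optimistic-concurrency UPDATE would compare.
        self.original = None if state is RowState.ADDED else dict(values)
        self.state = state
        self.column_errors = {}  # column -> message, stored inline with the data

    def set(self, column, value):
        self.current[column] = value
        if self.state is RowState.UNCHANGED:
            self.state = RowState.MODIFIED

    def set_column_error(self, column, message):
        self.column_errors[column] = message


class Table:
    """Hypothetical keyed table with per-row change tracking."""

    def __init__(self, key):
        self.key = key
        self.rows = {}

    def load(self, values):  # rows read from the store start out Unchanged
        self.rows[values[self.key]] = Row(values)

    def add(self, values):
        self.rows[values[self.key]] = Row(values, RowState.ADDED)

    def delete(self, key_value):
        if self.rows[key_value].state is RowState.ADDED:
            del self.rows[key_value]  # never persisted, so simply drop it
        else:
            self.rows[key_value].state = RowState.DELETED

    def get_changes(self):
        # Only the rows that need to travel back to the database.
        return [r for r in self.rows.values() if r.state is not RowState.UNCHANGED]

    def accept_changes(self):
        for k, r in list(self.rows.items()):
            if r.state is RowState.DELETED:
                del self.rows[k]
            else:
                r.original, r.state = dict(r.current), RowState.UNCHANGED

    def merge(self, other):
        # Match on primary key and update existing Row objects in place, so
        # anything bound to them sees the new values without a full re-bind.
        for k, incoming in other.rows.items():
            if k not in self.rows:
                self.rows[k] = Row(incoming.current)
                continue
            row = self.rows[k]
            row.current.update(incoming.current)
            if row.state is RowState.UNCHANGED:
                row.original = dict(row.current)

    def select(self, predicate):
        # Rough analogue of DataTable.Select or a filtered DataView.
        return [r.current for r in self.rows.values()
                if r.state is not RowState.DELETED and predicate(r.current)]


employees = Table(key="EmployeeID")
employees.load({"EmployeeID": 1, "Name": "Davolio", "HireDate": date(1992, 5, 1)})
employees.load({"EmployeeID": 2, "Name": "Fuller", "HireDate": date(1992, 8, 14)})
employees.rows[2].set("Name", "Fuller-Smith")
employees.add({"EmployeeID": 10, "Name": "Newcomb", "HireDate": date(2004, 8, 30)})
changed = employees.get_changes()  # employee 2 (Modified) and 10 (Added)
early = employees.select(lambda r: r["HireDate"] < date(2000, 3, 1))

incoming = Table(key="EmployeeID")
incoming.load({"EmployeeID": 1, "Name": "Davolio-Jones", "HireDate": date(1992, 5, 1)})
incoming.load({"EmployeeID": 3, "Name": "Leverling", "HireDate": date(1992, 4, 1)})
employees.merge(incoming)
employees.rows[1].set_column_error("Name", "Name change not yet confirmed")
```

Even this toy leaves out relations, row versions, events and XML support, which is the point of the list above: with DataSets, that plumbing comes already built.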

