Reading Data Reader Secrets
Dino Esposito
April 12, 2001
When talking about ADO.NET, you can hardly ignore ADO. If not for other reasons, you end up mentioning ADO just because its name is hard-coded in some of the .NET exclusive classes for data access.
In its simplest form, ADO.NET is just what the name suggests: ADO adapted to .NET. While this definition is true, it does require some further explanation.
ADO and ADO.NET have different design centers, but both attempt to provide the same logical set of functionality. It's only that in doing so that each looks after its own design pattern. As a result, you have two rather disjointed APIs with consequent issues in terms of backward compatibility.
However, looking beyond the immediate differences of syntax and semantics, you cannot help but recognize a common substrate that is the foundation of the most recent Microsoft data access strategy.
In my previous column, I explained that .NET Managed providers are the next step in the Darwinistic evolution of OLE DB according to UDA natural laws. Likewise, ADO.NET is the next step in the natural evolution of ADO—evolution always intended in the Darwinistic sense of the word.
Evolution is about adaptation in a new environment. The new environment is .NET, so ADO requires modified behavior and attitude to fit in the new computing scenarios. ADO is not the only software technology that is undergoing such an evolutionary change. It is in good company with OLE DB, COM(+), MSMQ, and ASP.
Reading Records with ADO
ADO was introduced and thrived in a time-frame when connected and (mostly) two-tier applications were moving to a more layered and Web-based architecture. The overall ADO object model design reflects this transition.
The Web forced the adoption of a disconnected model where more and more information is downloaded and cached on the client. Within ADO, you find two souls hidden just under the skin of the Recordset object.
The Recordset can leverage server cursors and do what it is supposed to do being connected to the data source all the time. At the same time, another facet of the same object can work disconnected and build up a correct representation of records in two ways. Either it can fetch rows from the data source and then drop the connection, or it can read all the needed information from a client disk file, including an XML file with a fixed schema.
The disconnected facilities of ADO have been bolted on the Recordset object interface, paying careful attention to make connected and disconnected functionalities available through equivalent APIs. As a result, the Recordset became hard to wield, rather overwhelming and with significant housekeeping.
Anyway, in ADO you can get data in three ways:
- Through full-time connection using server cursors.
- By the means of a fast, read-only, forward-only mechanism specifically thought to read through and process a set of records.
- Getting a static snapshot of data, processing them disconnected from the data source and submitting changes at a later time.
Whatever way you work, you always use the Recordset object and choose the operating mode by selecting the proper cursor type and location. (See the ADO documentation for more details.)
An obvious drawback is that no matter how optimized and well-designed the Recordset object is, you load in memory more stuff than you actually need. In addition, it adds an extra layer of code to process input parameters and possibly fixes their values. Given the overall programming interface of ADO, this is an absolute necessity as well as a performance hit.
ADO.NET simplifies the data reading infrastructure by applying the old "divide et impera" motto. It introduces two new objects that don't have particularly familiar names but expose certainly well-known functionalities. They are the DataReader and the DataSet.
The DataReader object is the ADO.NET counterpart of the read-only, forward-only default ADO cursor. The DataSet is a container that can be programmatically filled with static snapshot of data. In this sense, it can be seen as a repository of disconnected recordsets. There's no Recordset component in the ADO.NET object model, but the DataTable object is the .NET double of a disconnected recordset.
In ADO.NET there's no explicit support for server cursors even though, in a future version, you can expect to see ADO.NET to support server cursors when the data provider is SQL Server 7.0 or later.
If you need server cursors tody, just import the ADO type library in your .NET application and code through it. To learn more on this good, bad and ugly, see the previous column, ADO Rocks and Rolls in .NET Applications.
ADO.NET is designed around XML to work seamlessly in highly interoperable and disconnected scenarios. From this point of view, it is certainly more appropriate than ADO for the breed of applications that the majority of people are writing and planning today. It simplifies the coding and optimizes the working of data access components that have Web-based clients.
It behaves well also when you have a Web-enabled database server such as SQL Server 2000 with the XMLhttp support.
Despite the actual syntax differences, the programming philosophy behind ADO and ADO.NET is aligned as much as possible to reduce the learning curve for data access developers and the quantity of new concepts they have to familiarize with.
Signs of an ADO Presence
More often than not, you can see signs of a supernatural ADO presence in several ADO.NET code snippets. Want an example? Let's consider a piece of ADO code that, although in slightly different flavors, runs buried in the body of thousands of ASP pages and middle-tier components.
Set oCN = Server.CreateObject("ADODB.Connection")
oCN.Open strConn
Set oCMD = Server.CreateObject("ADODB.Command")
Set oCMD.ActiveConnection = oCN
Set oRS = oCMD.Execute(strCmd)
In this fragment, you explicitly create a Connection and a Command object, and when this executes, you are returned a new Recordset object. The Connection object contains information about the desired cursor type and location. If you choose the adOpenStatic type of cursor, and necessarily the client-side location, you can safely close the connection and walk through the records. Otherwise, you keep the connection open until you finish navigating through the fetched rows.
Once you have the recordset, you scroll it using a loop as follows:
While Not oRS.EOF
Response.Write(oRS("lastname") & "<BR>")
oRS.MoveNext
Wend
This type of code won't work as-is in ADO.NET. However, the ADO.NET DataReader object lets you write code that, at least functionally speaking, is nearly identical.
First off, you create a special object to govern the execution of the command. The base .NET class is DBCommand. You normally don't use this class; you'd use one of the more specialized .NET classes like ADOCommand and SQLCommand or any user-defined derived class. DBCommand represents a command that the data source can understand.
The SQLCommand class embeds the functionality of SQL Server commands while ADOCommand wraps any access made the COM way against any OLE DB provider. These objects have the same programming interface and a number of constructors. One of the most frequently used is the following:
Dim oCMD As New SQLCommand(strCmd)
You also need to associate a connection object to a command. Like with ADO, this can be done both explicitly and implicitly. In the former case, you usually create an ADOConnection or a SQLConnection object depending on the data provider you want to target. ADOConnection addresses OLE DB providers through the COM Interop bridge, whereas SQLConnection connects to a SQL Server registration.
Dim oCN As New SQLConnection(strConn)
By using a different constructor for the DBCommand-based object you can force the ADO.NET classes to create an implicit connection object that will be exposed through the DBCommand's ActiveConnection property.
Dim oCMD As New SQLCommand(strCmd, strConn)
oCMD.ActiveConnection.Open()
It goes almost without saying that using an explicit connection object allows you to reuse the same object instance repeatedly in the code. You can use the same object variable to connect to different data sources, thus slightly reducing the memory footprint of the application.
Whichever way you create the connection object, bear in mind that you always must open it explicitly and then close it.
oCMD.ActiveConnection.Open()
If you're going to launch a stored procedures, let the ADO.NET runtime know about that by setting the CommandType property.
oCMD.CommandType = CommandType.StoredProcedure
The ADO.NET code snippet that matches the ADO code discussed earlier looks like this:
Dim oCN As New SQLConnection(strConn)
Dim oCMD As New SQLCommand(strCmd, oCN)
Dim oDR As SQLDataReader = Nothing
oCMD.ActiveConnection.Open()
oCMD.Execute(oDR)
While(oDR.Read)
Response.Write(oDR["lastname"].ToString() + "<br>")
End While
Despite the presence of objects with different names and a different set of methods and properties, the logical schema matches closely to what happens in ADO.
In this ADO.NET fragment, the SQLDataReader object plays the role of the ADO Recordset although it lacks a lot of the Recordset functionalities. SQLDataReader is simply good at scrolling data in a forward-only way. However, for such a little function you now have a little object.
Furthermore, the SQLDataReader object is more programmer-friendly because it automatically moves the record pointer one position forward upon reading. In other words, the Read method incorporates a call to the equivalent of the Recordset's MoveNext method. Consider that the default position is on the beginning of the stream and prior to the first record. You always need to call Read before accessing any data field.
So the long and short of it is that all the existing ADO code should be rewritten unless you want to build a COM bridge to it. However, the new concepts to learn are minimal and ADO developers are, as such, already more than halfway to ADO.NET.
ADO.NET Data Readers
In ADO.NET you have two flavors of data reader objects: SQLDataReader and ADODataReader. Both derive from DBDataReader—an abstract class defined in the System.Data.Internal namespace.
Such a class defines the very basic behavior for components that bind to a data source over a connection and let you scroll forward to the set of fetched rows. The picture below shows how ILDASM renders the class interface.
Figure 1. The DBDataReader class
As mentioned earlier, the .NET Framework creates two classes from DBDataReader: ADODataReader and SQLDataReader. The former manages data access that is routed through OLE DB. The latter, instead, is aimed to the SQL Server managed provider.
Here is how the two classes are declared in C#:
public sealed class ADODataReader : DBDataReader, IDataReader,
IDataRecord
public sealed class SQLDataReader : DBDataReader, IDataReader,
IDataRecord, ISQLDataRecord
Notice the use of the new sealed attribute that seals the class off with respect to inheritance.
For completeness, let's review how the same declaration looks in Visual Basic .NET:
NotInheritable Public Class ADODataReader
Inherits DBDataReader
Implements IDataReader, IDataRecord
NotInheritable Public Class SQLDataReader
Inherits DBDataReader
Implements IDataReader, IDataRecord, ISQLDataRecord
The Visual Basic .NET declaration is a bit more informative because it makes use of different keywords for logically different tasks: Implements and Inherits. It clearly shows that the data reader classes inherit from the rather minimal DBDataReader class but implement two or more specific interfaces.
Both ADODataReader and SQLDataReader cannot be inherited from. Is there a reason why?
Data readers are merely objects that provide a fast mechanism of reading a forward-only stream of data records out of a certain data source. Regardless of the data source, a data reader object is expected to expose the same, immutable programming interface. Enabling further inheritance may lead to specialized data readers that may have hidden some functions in the programming interface.
This design is particularly helpful if you need to write a data reader for a custom data source. You just inherit from DBDataReader and implement a couple of interfaces.
If you need to modify the way in which SQL Server tables are scrolled, then you can always resort to aggregation. Basically, you create a data reader class that internally hosts an instance of the standard SQLDataReader object and utilizes such methods to outfit its own data reader interface.
Figure 2. Using aggregation to "inherit" from a sealed class
What makes a data reader class unique and very special is the set of interfaces it exposes. Such interfaces are IDataReader and IDataRecord. In addition, the SQL Server data reader implements ISQLDataRecord which exposes SQL Server specific data types.
It should be clear by now that data readers are very special objects. Not because they have more or less features than other .NET classes, but because they obey particular design rules. For example, data readers can be created only through the Execute method of a DBCommand-derived object. However, they cannot be created through the direct use of the constructor. Once again, the reason must be found in the role that they play in the ADO.NET architecture. Data readers must be the immutable interface between connected data consumers and a specific data provider.
Why do you need such an immutable interface (like in OLE DB—sound familiar?—see Medidating on OLE DB and .NETdata03222001) to fetch records? Because data readers work connected! In fact, data sets—which are disconnected objects—aren't sealed classes and can be fabricated by constructors.
While the data reader is in use, the associated connection object turns out to be busy. While the connection object is serving the data reader, the only operation you can perform is dropping the connection using the data reader's Close method.
OLE DB and SQL Server data readers (ADODataReader and SQLDataReader classes) are extremely slim objects that avoid creating volatile internal objects.
Basically, if your client makes repeated calls for reading data (that is, it calls the Item property or the GetString method in the loop), then it always gets passed data through the same object. For this reason, data accessed through a data reader must be considered read-only or needs to be manipulated with extreme care and through intermediate objects.
The IDataReader Interface
The IDataReader interface is quite simple and provides two levels of navigation: by result sets and by rows.
IDataReader lets you navigate through the various results originated by a stored procedure that executes and returns more than one SELECT. The data reader's default positioning is on the first result. Use NextResult to move forward and HasMoreResults to check for any result left.
Once the data reader internal pointer is on one specific result, you can navigate its rows. Use Read to jump to the next row and Close to terminate the reading. HasMoreRows is the property that informs you of any row left between the current position and the end of the stream.
As you can see, the methods of this interface cover only the navigation model. To drill down into the single record you take advantage of another interface—IDataRecord.
The IDataRecord Interface
This interface is designed to let you access the single pieces of information available for the row. The property that you'll be using more often is certainly Item—the indexer for the data reader object. You invoke this property whenever you access a specific field by name:
Dim oDR As SQLDataReader = Nothing
oCMD.Execute(oDR)
While(oDR.Read)
Response.Write(oDR["lastname"].ToString() + "<br>")
End While
FieldCount is the property that gets the number of fields at the current record.
The rest of the interface is made of a long list of GetXXX methods where XXX stands for a type. So you have methods like GetString, GetInt32, and so on. GetInt32, for example, returns the 32-bit signed integer value of the specified field. The field can be identified by position or by name.
oDR[0].ToInt32()
oDR["lastname"].ToInt32()
You can also read the data through a generic value of type Object. In this case, use GetValue.
A particular method that deserves mention is the GetData method. Basically, if the field recursively points to a .NET object that implements IDataReader, GetData returns such an object.
The SQLDataReader object also implements ISQLDataReader—an interface that adds a bunch of other GetXXX methods specific of SQL Server data types. For example, you have a GetSQLGuid to process a SQLGuid type.
Summary
I started this column by comparing the data reading features in ADO and ADO.NET. Of the three methods you have in ADO to read records, only two are "natively" available in ADO.NET and through different objects than in ADO.
ADO.NET doesn't implement (not yet anyway) server cursors, but it lets you read a forward-only stream of data records from a data source in a rather efficient way. In addition, the support for disconnected sets of data has been significantly improved.
In this article I mostly focused on the data reader objects available for OLE DB and .NET Managed providers. In upcoming columns, I'll have more to say about DataSets and server cursors.
Dialog Box: What Have I Been Doing for the Last 2 Years?
What? OLE DB is not usable in Visual Basic or ASP? What have I been doing for the last 2 years? Have you ever heard of a connection string? Only C++ers can use OLE DB? Don't tell that to all my VB DLLs!
If "OLE DB cannot be used from VB or ASP, what have I been doing for the last two years?" Uhm, looks like another existential question.
For one thing, the incriminating sentence in the previous column contains an extra adverb that is key here: directly.
I guess that for the last two years you've just been using OLE DB consumers through ADO. In particular, you've been using connection strings to identify your target OLE DB provider. Great. This can be easily done in Visual Basic® and ASP. Better yet, ADO has been invented just for this reason!
ADO, in fact, is an automation wrapper built on top of OLE DB raw calls with the clear purpose of allowing the use of OLE DB from script-based environments, including Visual Basic.
OLE DB is a COM-based protocol that allows data-driven communications between a consumer (client) and a provider (server). By COM-based, I mean that in order to use it directly, without intermediation, you should be:
- Creating an instance of the COM component representing the provider.
- Querying for the interface that renders the data source.
- Calling a method on that interface capable to create a session and return an appropriate interface.
- Using the session interface to create a command and get another, even more appropriate, interface.
- Using the command interface to issue the command and grab the IRowset interface, which is the closest OLE DB allows you to go towards ADO recordsets.
At this point, holding an IRowset interface pointer, you're still quite far from the content of a record field. You must define an accessor element—basically, a buffer with a well-known layout—and figure out a way to fill it up with actual data.
This is real OLE DB. And we all agree it can be a nightmare.
It's rather complex and annoying even in C++, where querying for interfaces and managing pointers and types is not subject to restrictions.
In principle, you could think of porting all this direct call mechanism to raw Visual Basic code. In practice, you wouldn't. Indeed, I don't think someone ever really tried to do that. And you can't just do that from within an ASP page.
When the programming interface is so complex, wrappers are the only reasonable way to go. So wrapping layers proliferate with features strictly depending on the language they've primarily been designed for is the way to go about accopmplishing this task.
So ADO is a COM automation wrapper optimized for Visual Basic and ASP and ATL consumer classes in Visual Studio® 6.0 are a good pain-killer for C++ developers.
Using C++ to call into the raw interfaces exposed by the OLE DB provider is the fastest way to get and set your data. I don't actually know whether reliable statistics exist to illustrate the impact of the ADO layer with respect to OLE DB data access. Based on my personal experience, C++ data components natively accessing OLE DB providers (with or without ATL wrappers) usually go about twice as fast as Visual Basic components using ADO.
On the other hand, writing C++ code is certainly more error-prone and you can always try to minimize the performance hit by employing more powerful hardware. (Countless real world projects solved the issue in this way.)
Summarizing, C++ is the language of OLE DB, but ADO makes it available to Visaul Basic and ASP applications in an easy and powerful way.
With .NET managed providers, you can finally do your data access using the language with which you feel most comfortable.
Dino Esposito is Wintellect's ADO.NET expert and a trainer and consultant based in Rome, Italy. Dino is a contributing editor to MSDN Magazine and writes the Cutting Edge column. He also regularly contributes to Developer Network Journal and MSDN News. Dino is the author of the upcoming Building Web Solutions with ASP.NET and ADO.NET from Microsoft Press, and the cofounder of http://www.vb2themax.com/. You can reach Dino at dinoe@wintellect.com.