Object Persistence in MFC
By Fritz Onion
Published in C++ Report, Nov/Dec 1995 issue.
In the last column, we looked at how MFC uses its own implementation of runtime-type information to implement a document/view framework supporting multiple document types. In this column, we will look at object persistence in MFC, where the runtime class structures are again used to create new class instances. Using a simple class hierarchy and data structure, we will walk through the steps of adding persistence to an application, and gain an understanding of exactly where and how MFC will ultimately write our objects to disk. Finally, we will look at how to add persistence to classes which are not derived from MFC's CObject class (like an STL container class).
MFC provides support for object persistence through a process called serialization. Each class supporting persistence provides its own Serialize() function, which is responsible for writing the class' data members to a file and reading them back. Typically, a class supporting serialization is derived from the CObject class, where the Serialize() function is defined as a member function. Deriving from the CObject class allows the class to take advantage of several MFC serialization features (like version checking, automatic object construction, and redundant storage checking), but it is also possible to add serialization to classes not derived from CObject, as we shall see later in this column.
Implementing the Serialize() function for a class involves determining what data associated with the class needs to be stored so that its entire state can be restored from disk. Sometimes this is as simple as storing a few integer values, or perhaps a small array. It becomes more difficult if a class contains instances of other classes as data members or, even worse, pointers to instances of other classes. Fortunately, MFC makes this task relatively easy in almost all cases. We'll start by looking at the CArchive class, which is basically a type-safe wrapper for file I/O.
The way in which a class writes its data to a file is through an object called an archive. A reference to an instance of the CArchive class is passed into each class' Serialize() function:
void Serialize(CArchive& ar);
Each archive that is created contains a pointer to a file, and as objects are passed into the archive, it writes them to the file. In a document/view application, the document class contains the top-level Serialize() function which is called whenever files need to be saved or restored (in response to the File menu's save or open commands, for example). It is up to the programmer to implement this function to properly store or retrieve all of the objects contained in the application.
As an example of how to implement object persistence, consider a class used to represent employees at a company. To keep things simple, we will only give employees names and ages, and we will leave out the interface functions:
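class CEmployee : public CObject {
public:
    // Interface
    // Serialize() is public (as in CObject) so that other classes,
    // such as the document, can call it directly
    virtual void Serialize(CArchive& ar);
private:
    CString m_name;
    WORD m_age;
};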
The implementation of the Serialize() function for this class might look like:
void CEmployee::Serialize(CArchive& ar)
{
    if (ar.IsStoring())
        ar << m_name << m_age;
    else
        ar >> m_name >> m_age;
}
An archive is created either as reading or storing, so in a Serialize() function, one typically checks to see which type of archive has been passed in by calling the IsStoring() function. The stream operators have been overloaded to read data from and write data to the archive. These stream operators are overloaded to work with most primitive types [1], and for many MFC utility classes (like CString).
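To make the two-way nature of a Serialize() function concrete outside of MFC, here is a minimal portable sketch. ToyArchive and the Employee struct are hypothetical stand-ins invented for illustration; they are not MFC classes, and the byte layout (native byte order, 16-bit string length) is an assumption of this sketch rather than the format CArchive uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// ToyArchive is a hypothetical stand-in for CArchive: it is opened in one
// direction (store or load) and moves raw bytes to or from a memory buffer.
class ToyArchive {
public:
    enum Mode { store, load };

    ToyArchive(std::vector<unsigned char>& buffer, Mode mode)
        : m_buffer(buffer), m_mode(mode), m_pos(0) {}

    bool IsStoring() const { return m_mode == store; }

    ToyArchive& operator<<(std::uint16_t value) { write(&value, sizeof value); return *this; }
    ToyArchive& operator>>(std::uint16_t& value) { read(&value, sizeof value); return *this; }

    // Strings are written as a 16-bit length followed by the characters
    // (so this sketch assumes strings shorter than 65536 bytes).
    ToyArchive& operator<<(const std::string& text) {
        *this << static_cast<std::uint16_t>(text.size());
        write(text.data(), text.size());
        return *this;
    }
    ToyArchive& operator>>(std::string& text) {
        std::uint16_t length = 0;
        *this >> length;
        text.resize(length);
        if (length > 0) read(&text[0], length);
        return *this;
    }

private:
    void write(const void* data, std::size_t size) {
        assert(IsStoring());
        const unsigned char* bytes = static_cast<const unsigned char*>(data);
        m_buffer.insert(m_buffer.end(), bytes, bytes + size);
    }
    void read(void* data, std::size_t size) {
        assert(!IsStoring() && m_pos + size <= m_buffer.size());
        std::memcpy(data, m_buffer.data() + m_pos, size);
        m_pos += size;
    }

    std::vector<unsigned char>& m_buffer;
    Mode m_mode;
    std::size_t m_pos;
};

// Same shape as CEmployee::Serialize(): one function handles both directions.
struct Employee {
    std::string name;
    std::uint16_t age = 0;

    void Serialize(ToyArchive& ar) {
        if (ar.IsStoring())
            ar << name << age;
        else
            ar >> name >> age;
    }
};
```

Keeping both directions side by side in one function makes it easy to see that the fields are read back in the same order in which they were written.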
Now, any time we need to store an instance of the CEmployee class to disk, or read one in from disk, we simply call its Serialize() function with an archive that has been initialized to write to (or read from) a file. Suppose, for example, we had an array of CEmployee objects in our document class:
class CMyDocument : public CDocument
{
    ...
    CEmployee m_employees[100];
    ...
};
We could write this array to (or read it from) disk with the following code in our document's Serialize() function:
void CMyDocument::Serialize(CArchive& ar)
{
    for (int i = 0; i < 100; i++)
        m_employees[i].Serialize(ar);
}
As we have seen in the previous example, adding persistence to a class is relatively easy using the archive class and the Serialize() function. In our example, however, we only considered storing and retrieving instances of our class. If we look instead at storing pointers to instances of our class, the picture becomes more complex. For example, consider a class to represent a manager in our hypothetical company. Since a manager is herself an employee, we would derive the class from our CEmployee class, and since a manager usually has several employees reporting to her, we might add a list of subordinates (CEmployees) as a data member:
class CManager : public CEmployee
{
public:
    // Interface
private:
    CList<CEmployee*, CEmployee*> m_subordinates;
};
Suppose further that we change the fixed array of CEmployee objects in our document class to a dynamically allocated linked list:
class CMyDocument : public CDocument
{
    ...
    CList<CEmployee*, CEmployee*> m_employees;
    ...
};
After our program has been running for a while, our document's data structure might look something like the structure shown in Figure 1.
Figure 1
We no longer have the simple serialization process of writing data to and reading data from a fixed-size array, but are faced with the challenge of saving and restoring dynamically allocated objects of a recursive data structure. There are two primary issues to face when adding persistence to a pointer-based data structure: 1) how and where the elements of the data structure should be re-allocated, and 2) how to ensure that only one copy of each object is written to disk.
In an application, dynamic data structures often grow based on input to the application. When the data structure is being read back in from disk, each element of the data structure has to be re-allocated and filled in with data stored on disk. In our example data structure, we are now storing dynamically allocated instances of our CEmployee and CManager classes in two linked lists. In order to properly store and retrieve this data structure, we are going to have to decide how to re-allocate these objects.
MFC hides this object construction in the stream operators of the archive class. When a pointer to an object is passed into an archive, MFC stores information about what type of class that object is an instance of. Then when it reads it back in from disk, it allocates a new instance of the appropriate class, based on the type association stored with the object itself.
In order to provide this ability of dynamically constructing instances of classes from type information, MFC adds its own version of runtime-type information (actually, a superset of the C++ language RTTI) through the DECLARE_SERIAL and IMPLEMENT_SERIAL macros. These macros fill in the same CRuntimeClass structure we looked at in the last column, but they also overload the stream operators of the archive class for this class type, so that when an instance of this class is passed into (or extracted from) an archive, its class information is stored (extracted) as well. Thus, by adding the DECLARE_SERIAL macro to a class definition and the IMPLEMENT_SERIAL macro to the class implementation file, a pointer to an instance of that class can be stored to and re-allocated from an archive.
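The idea of storing type information with each object and constructing the right class on the way back in can be sketched without MFC. The code below is only an illustration of the general technique: the Registry() map, the Object base class, and the whitespace-delimited text format are all assumptions made for this sketch, and they do not reproduce how CRuntimeClass or the serialization macros are actually implemented.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical base class playing the role of CObject in this sketch.
struct Object {
    virtual ~Object() {}
    virtual std::string ClassName() const = 0;
    virtual void Save(std::ostream& out) const = 0;
    virtual void Load(std::istream& in) = 0;
};

// Hypothetical registry: maps a class name to a function that creates
// a default-constructed instance of that class.
using Factory = std::function<std::unique_ptr<Object>()>;

std::map<std::string, Factory>& Registry() {
    static std::map<std::string, Factory> registry;
    return registry;
}

struct Employee : Object {
    std::string name;  // assumed to contain no whitespace in this toy format
    int age = 0;
    std::string ClassName() const override { return "Employee"; }
    void Save(std::ostream& out) const override { out << name << ' ' << age << ' '; }
    void Load(std::istream& in) override { in >> name >> age; }
};

struct Manager : Employee {
    int reports = 0;
    std::string ClassName() const override { return "Manager"; }
    void Save(std::ostream& out) const override { Employee::Save(out); out << reports << ' '; }
    void Load(std::istream& in) override { Employee::Load(in); in >> reports; }
};

void RegisterClasses() {
    Registry()["Employee"] = [] { return std::unique_ptr<Object>(new Employee); };
    Registry()["Manager"] = [] { return std::unique_ptr<Object>(new Manager); };
}

// Writes the class name first, then lets the object write its own data.
void WriteObject(std::ostream& out, const Object& object) {
    assert(Registry().count(object.ClassName()) == 1);  // class must be registered
    out << object.ClassName() << ' ';
    object.Save(out);
}

// Reads the class name, creates an instance of that class through the
// registry, and lets the new object read its own data.
std::unique_ptr<Object> ReadObject(std::istream& in) {
    std::string className;
    in >> className;
    auto entry = Registry().find(className);
    if (entry == Registry().end()) return nullptr;  // unknown class
    std::unique_ptr<Object> object = entry->second();
    object->Load(in);
    return object;
}
```

The caller of ReadObject() never names the concrete class; the stored name alone decides whether an Employee or a Manager is allocated.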
The second issue is that of objects being pointed to by more than one pointer. It would be incorrect to write an object twice if it existed only once in memory when the application saved the data. So the question is how to make sure that an object reached through several pointers is written to disk only once, and that every pointer that referred to it is re-assigned to the new object once it is read in from disk. This issue arises in our example data structure because, in addition to being pointed to by our document's list, a CEmployee object may also be pointed to by a CManager's list.
MFC deals with this in its archive class by maintaining a hash table of all objects being written to disk. If the same object is passed to the archive more than once, the archive does not store it again; instead it stores a unique number associated with that object, so that the pointer can be re-assigned correctly when the object is read back in.
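The hash-table approach can be illustrated with a small portable sketch. PointerWriter and PointerReader are hypothetical classes written for this example, and the "new"/"ref" text tags are an invented format; the sketch shows the general technique of writing each object once and a number thereafter, not the actual bookkeeping or file layout inside CArchive.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <memory>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

struct Employee {
    std::string name;  // assumed to contain no whitespace in this toy format
};

// Hypothetical writer: remembers every pointer it has stored. The first time
// a pointer is seen the object is written; later it writes only the index.
class PointerWriter {
public:
    explicit PointerWriter(std::ostream& out) : m_out(out) {}

    void Write(const Employee* employee) {
        assert(employee != nullptr);
        auto found = m_ids.find(employee);
        if (found != m_ids.end()) {
            m_out << "ref " << found->second << ' ';
            return;
        }
        std::size_t id = m_ids.size();  // indices are handed out in write order
        m_ids[employee] = id;
        m_out << "new " << employee->name << ' ';
    }

private:
    std::ostream& m_out;
    std::unordered_map<const Employee*, std::size_t> m_ids;
};

// Hypothetical reader: keeps the objects it has created in load order, so an
// index read from the stream can be turned back into the same object.
class PointerReader {
public:
    explicit PointerReader(std::istream& in) : m_in(in) {}

    std::shared_ptr<Employee> Read() {
        std::string tag;
        m_in >> tag;
        if (tag == "ref") {
            std::size_t id = 0;
            m_in >> id;
            return m_loaded.at(id);  // throws std::out_of_range on a bad index
        }
        if (tag != "new") return nullptr;  // malformed or exhausted stream
        auto employee = std::make_shared<Employee>();
        m_in >> employee->name;
        m_loaded.push_back(employee);
        return employee;
    }

private:
    std::istream& m_in;
    std::vector<std::shared_ptr<Employee>> m_loaded;
};
```

After loading, two pointers that referred to one object before saving again compare equal, which is exactly the property a document list and a manager's list of subordinates need to share.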
So, through runtime class structures and intelligent redundancy prevention implemented with a hash table, MFC successfully deals with these two issues of writing and reading pointer-based data structures to and from disk. Our employee and manager classes, implemented with complete serialization support, would look like:
class CEmployee : public CObject
{
public:
    // Interface
private:
    CString m_name;
    WORD m_age;
protected:
    virtual void Serialize(CArchive& ar);
    DECLARE_SERIAL(CEmployee)
};
class CManager : public CEmployee
{
public:
    // Interface
private:
    CList<CEmployee*, CEmployee*> m_subordinates;
protected:
    void Serialize(CArchive& ar);
    DECLARE_SERIAL(CManager)
};
IMPLEMENT_SERIAL(CEmployee, CObject, 1)
IMPLEMENT_SERIAL(CManager, CEmployee, 1)
// This function is a helper function for the template
// collection class CList, telling it how to store objects
// of type CEmployee*
void SerializeElements(CArchive& ar, CEmployee** pElements, int nCount)
{
    for (int i = 0; i < nCount; i++) {
        if (ar.IsStoring())
            ar << pElements[i];
        else
            ar >> pElements[i];
    }
}
void CEmployee::Serialize(CArchive& ar)
{
    if (ar.IsStoring())
        ar << m_name << m_age;
    else
        ar >> m_name >> m_age;
}
void CManager::Serialize(CArchive& ar)
{
    CEmployee::Serialize(ar);
    m_subordinates.Serialize(ar);
}
and the document's Serialize() function would look like:
void CMyDocument::Serialize(CArchive& ar)
{
    m_employees.Serialize(ar);
}
One other serialization topic deserves mention: schema numbers. The schema number is another field of the CRuntimeClass structure, and it is used to version your class definition. It is specified for each class as the third parameter to the IMPLEMENT_SERIAL macro. Whenever a class changes the way it stores its data to disk, its schema number should be incremented to indicate the change. Every time an instance of a class with an associated schema number is passed into an archive, its schema number is stored as well (only once per type of class), and MFC uses this information to verify that the data being read in was written by the same version of the class. If the schema numbers for an object do not match when the object is being read in, MFC will throw an exception and notify the user that the file cannot be read. For backward compatibility with older object formats, there is also a way to create versionable schema numbers by setting the VERSIONABLE_SCHEMA bit in a schema number.
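The schema check itself is a simple comparison, which the following portable sketch illustrates. The constant kEmployeeSchema, the two free functions, and the text format are hypothetical; for brevity the sketch writes the schema number with every object, whereas MFC stores it only once per class, and it throws std::runtime_error as a stand-in for the exception MFC raises.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical current schema number for the Employee storage format.
const unsigned kEmployeeSchema = 2;

struct Employee {
    std::string name;
    unsigned age = 0;
};

// Writes the schema number ahead of the object's data.
void WriteEmployee(std::ostream& out, const Employee& e) {
    // The toy format is whitespace-delimited, so names may not contain spaces.
    assert(e.name.find(' ') == std::string::npos);
    out << kEmployeeSchema << ' ' << e.name << ' ' << e.age << ' ';
}

// Reads the stored schema number and refuses to continue if it differs
// from the schema this version of the program understands.
Employee ReadEmployee(std::istream& in) {
    unsigned storedSchema = 0;
    in >> storedSchema;
    if (!in || storedSchema != kEmployeeSchema)
        throw std::runtime_error("schema mismatch");
    Employee e;
    in >> e.name >> e.age;
    return e;
}
```

A reader that wanted to accept older files would branch on the stored number instead of throwing, which is roughly the role the versionable schema mechanism plays.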
Most of the persistence features of MFC that we have looked at have assumed that a class was derived from CObject, but it is often useful to add persistence to classes which cannot be derived from CObject. For example, if you are using another class hierarchy that is not MFC-aware, you may want to store instances of classes from that hierarchy within the MFC serialization framework. Unfortunately, it is not possible to add the runtime class information to a class that is not derived from CObject, so the features of pointer storing/reconstructing, redundancy checking, and object versions are not available. If you are only working with instances of your classes, and not pointers to instances, then these features are not needed anyway, so there is no loss in support.
As an example of adding serialization to a class not derived from CObject, we will replace the MFC CList class with the Standard Template Library (STL) list class. Our approach will be to add a template helper function called SerializeList() which takes a reference to an archive and a list, and stores the list into (or reads it from) the archive. Using this technique, our classes would now look like:
// No change to the CEmployee class
class CManager : public CEmployee
{
public:
    // Interface
private:
    list<CEmployee*> m_subordinates;
protected:
    void Serialize(CArchive& ar);
    DECLARE_SERIAL(CManager)
};
IMPLEMENT_SERIAL(CEmployee, CObject, 1)
IMPLEMENT_SERIAL(CManager, CEmployee, 1)
void CEmployee::Serialize(CArchive& ar)
{
    if (ar.IsStoring())
        ar << m_name << m_age;
    else
        ar >> m_name >> m_age;
}
// Template helper function used to serialize instances of
// the list class
template <class T>
void SerializeList(CArchive& ar, list<T>& l)
{
    if (ar.IsStoring()) {
        // Must cast to a DWORD because archives don't
        // support integers
        ar << (DWORD)l.size();
        // typename is required because iterator is a dependent type
        for (typename list<T>::iterator i = l.begin();
             i != l.end(); ++i)
            ar << *i;
    }
    else {
        DWORD size;
        ar >> size;
        T newObject;
        while (size--) {
            ar >> newObject;
            l.push_back(newObject);
        }
    }
}
void CManager::Serialize(CArchive& ar)
{
    CEmployee::Serialize(ar);
    SerializeList(ar, m_subordinates);
}
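The count-then-elements pattern used by SerializeList() can be tried without MFC by substituting standard streams for the archive. In the sketch below, StoreList() and LoadList() are hypothetical names, and the whitespace-delimited text format is an assumption of the example; it works for any element type that supports the standard stream operators.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <list>
#include <sstream>

// Writes the element count first so that the reader knows how many
// elements to expect, then writes each element in order.
template <class T>
void StoreList(std::ostream& out, const std::list<T>& items) {
    out << items.size() << ' ';
    for (typename std::list<T>::const_iterator i = items.begin();
         i != items.end(); ++i)
        out << *i << ' ';
}

// Reads the count, then reads exactly that many elements. Any previous
// contents of the list are discarded first.
template <class T>
void LoadList(std::istream& in, std::list<T>& items) {
    std::size_t count = 0;
    in >> count;
    items.clear();
    while (count-- > 0) {
        T item;
        in >> item;
        assert(!in.fail());  // the stream ended early or held bad data
        items.push_back(item);
    }
}
```

Unlike the MFC version, this sketch splits storing and loading into two functions, since a standard stream has no equivalent of IsStoring().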
MFC provides support for object persistence through its serialization process and its archive objects. By adding a pair of macros to a CObject-derived class, and implementing that class' Serialize() function, instances (and even pointers to instances) of that class can easily be stored to or read from disk. Classes that are not derived from CObject can still be integrated into the serialization process, but cannot take advantage of the pointer reconstruction, redundancy checking, and versioning that CObject-derived classes enjoy.
[1] The one notable exception is int, which needs to be cast to either a WORD or a DWORD for cross-platform compatibility.
from: http://www.pluralsight.com/articlecontent/cpprep1295.htm