C++/CLI is a self-contained, component-based dynamic programming language that, like C# or Java, is derived from C++. Unlike those languages, however, we have worked hard to integrate C++/CLI into ISO-C++, using the historical model of evolving the C/C++ programming language to support modern programming paradigms. You can say that C++/CLI is to C++ as C++ is to C. More generally, you can view the evolution leading to C++/CLI in the following historical context:
- BCPL (Basic Computer Programming Language)
- B (Ken Thompson, original UNIX work)
- C (Dennis Ritchie, adding type and control structure to B)
- C with Classes (~1979)
- C84 (~1984)
- Cfront, release E (~1984-to universities)
- Cfront, release 1.0 (1985-to the world )—20th birthday
- Multiple/Virtual Inheritance (MI) programming (~1988)
- Generic Programming (~1991) (templates)
- ANSI C++/ ISO-C++ (~1996)
- Dynamic Component programming (~2005) (C++/CLI)
What is C++/CLI?
C++/CLI represents a tuple. C++ refers, of course, to the C++ programming language invented by Bjarne Stroustrup at Bell Laboratories. It supports a static object model that is optimized for the speed and size of its executables. However, it doesn't support run-time modification of the program other than heap allocation. It allows unlimited access to the underlying machine, but very little access to the types active in the running program and no real access to the associated infrastructure of that program. Herb Sutter, a former colleague of mine at Microsoft and the chief architect of C++/CLI, refers to C++ as a concrete language.
CLI refers to the Common Language Infrastructure, a multitiered architecture supporting a dynamic component programming model. In many ways, this represents a complete reversal of the C++ object model. A runtime software layer, the virtual execution system, runs between the program and the underlying operating system. Access to the underlying machine is fairly constrained. Access to the types active in the executing program and the associated program infrastructure—both as discovery and construction—is supported. The slash (/) represents a binding between C++ and the CLI. The details surrounding this binding make up the general topic of this column.
So, a first approximation of an answer to what is C++/CLI is that it is a binding of the static C++ object model to the dynamic component object model of the CLI. In short, it is how you do .NET programming using C++ rather than C# or Visual Basic®. Like C# and the CLI itself, C++/CLI is undergoing standardization under the European Computer Manufacturers Association (ECMA) and eventually under ISO.
The common language runtime (CLR) is the Microsoft version of the CLI that is specific to the Windows® operating system. Similarly, Visual C++® 2005 is the implementation of C++/CLI.
As a second approximation of an answer, I would say that C++/CLI integrates the .NET programming model within C++ in the same way as, back at Bell Laboratories, we integrated generic programming using templates within the then existing C++. In both of these cases your investment in an existing C++ codebase and in your existing C++ expertise are preserved. This was an essential baseline requirement of the design of C++/CLI.
Learning C++/CLI
There are three aspects in the design of a CLI language that hold true across all languages: a mapping of language-level syntax to the underlying Common Type System (CTS), the choice of a level of detail to expose the underlying CLI infrastructure to the manipulation of the programmer, and the choice of additional functionality to provide, beyond that supported directly by the CLI.
The first item is largely the same across all CLI languages. The second and third items are where one CLI language distinguishes itself from another. Depending on the kinds of problems you need to solve, you'll choose one or another language, or possibly combine multiple CLI languages. Learning C++/CLI involves understanding each of these aspects of its design.
Mapping C++/CLI to the CTS?
It is important when programming C++/CLI to learn the underlying CTS, which includes these three general class types:
- The polymorphic reference type, which is what you use for all class inheritance.
- The non-polymorphic value type, which is used for implementing concrete types requiring runtime efficiency, such as the numeric types.
- The abstract interface type, which is used for defining a set of operations common to a set of either reference or value types that implement the interface.
This design aspect, the mapping of the CTS to a set of built-in language types, is common across all CLI languages although, of course, the syntax varies in each CLI language. So, for example, in C#, you would write
abstract class Shape { ... } // C#
to define an abstract Shape base class from which specific geometric objects are to be derived, while in C++/CLI you write
ref class Shape abstract { ... }; // C++/CLI
to indicate the exact same underlying CLI reference type. The two declarations are represented exactly the same way in the underlying IL. Similarly, in C#, you write
struct Point2D { ... } // C#
to define a concrete Point2D class, while in C++/CLI you write:
value class Point2D { ... }; // C++/CLI
The family of class types supported with C++/CLI represents an integration of the CTS with the native facilities, and that determines your choice of syntax. For example:
class native {};
value class V {};
ref class R {};
interface class I {};
The CTS also supports an enumeration class type that behaves somewhat differently from the native enumeration, and support is provided for both of those as well:
enum native { fail, pass };
enum class CLIEnum : char { fail, pass};
Similarly, the CTS supports its own array type that again behaves differently from the native array. And again Microsoft provides support for both:
int native[] = { 1,1,2,3,5,8 };
array<int>^ managed = { 1,1,2,3,5,8 };
No CLI language is closer to or more nearly a mapping to the underlying CTS than another. Rather, each CLI language represents a view into the underlying CTS object model.
CLI Level of Detail
The second design aspect that must be considered when designing a CLI language is the level of detail of the underlying CLI implementation model to incorporate into the language. What kind of problems will the language be tasked to solve? Does the language have the tools necessary to do this? Also, what sort of programmers is the language likely to attract?
Take, for example, the issue of value types occurring on the managed heap. Value types can find themselves on the managed heap in a number of circumstances:
- Through implicit boxing, when an instance of a value type is assigned to an Object or when a virtual method is invoked through a value type that is not overridden.
- When that value type is serving as a member of a reference class type.
- When that value type is being stored as the element type of a CLI array.
Whether the programmer should be allowed to manipulate the address of a value type of this sort is a CLI language design consideration that must be addressed.
What Are the Issues?
Any object located on the managed heap is subject to relocation during the compaction phase of a sweep of the garbage collector. Any pointers to that object must be tracked and updated by the runtime; the programmer cannot manually track it herself. Therefore, if you were allowed to take the address of a value type that might be on the managed heap, there would need to be a tracking form of pointer in addition to the existing native pointer.
What are the trade-offs to consider? On the one hand, there's simplicity and safety. Directly introducing support in the language for either one or a family of tracking pointers makes it a more complicated language. By not supporting this, the available pool of programmers is expanded because less sophistication is required. In addition, allowing the programmer access to these ephemeral value types increases the possibility of programmer error—she may purposely or inadvertently do dangerous things to the memory. By not supporting tracking pointers, a potentially safer runtime environment is created.
On the other hand, efficiency and flexibility must be considered. Each time you assign the same Object with a value type, a new boxing of the value occurs. Allowing access to the boxed value type allows in-memory update, which may provide significant performance improvements. Without a form of tracking pointer, you cannot iterate over a CLI array using pointer arithmetic. This means that the CLI array cannot participate in the Standard Template Library (STL) iterator pattern and work with the generic algorithms. Allowing access to the boxed value type allows significant design flexibility.
Microsoft chose to provide a collection of addressing modes that handle value types on the managed heap in C++/CLI:
int ival = 1024;
int^ boxedi = ival;
array<int>^ ia = gcnew array<int>{1,1,2,3,5,8};
interior_ptr<int> begin = &ia[0];
value struct smallInt { int m_ival; ... } si;
pin_ptr<int> ppi = &si.m_ival;
The typical C++/CLI developer is a sophisticated system programmer tasked with providing infrastructure and organizationally critical applications that serve as the foundation over which a business builds its future. She must address both scalability and performance concerns and must therefore have a system-level view into the underlying CLI. The level of detail of a CLI language reflects the face of its programmer.
Complexity is not in itself a negative quality. Human beings are more complicated than single-cell bacteria, and that is certainly not a bad thing. However, when the expression of a simple concept is made complicated, that is usually considered to be a bad thing. In C++/CLI, the CLI team has tried to provide an elegant way to express complex subject matter.
Additional Functionality
A third design aspect is a language-specific layer of functionality above and beyond what is directly supported by the CLI. This may require a mapping between the language-level support and the underlying implementation model of the CLI. In some cases, this just isn't possible because the language cannot intercede with the behavior of the CLI. One example of this is the virtual function resolution in the constructor and destructor of a base class. To reflect ISO-C++ semantics in this case would require a resetting of the virtual table within each base class constructor and destructor. This is not possible because virtual table handling is managed by the runtime and not by the individual language.
So this design aspect is a compromise between what would be preferable to do, and what is feasible. The three primary areas of additional functionality that are provided by C++/CLI are the following:
- A form of Resource Acquisition is Initialization (RAII) for reference types, in particular, to provide an automated facility for what is referred to as deterministic finalization of garbage- collected types that hold scarce resources.
- A form of deep-copy semantics associated with the C++ copy constructor and copy assignment operator; however, these semantics could not be extended to value types.
- Direct support of C++ templates for CTS types in addition to the CLI generic mechanism. In addition, a verifiable version of the STL for CLI types is provided.
Let's look at a brief example: the issue of deterministic finalization. Before the memory associated with an object is reclaimed by the garbage collector, an associated Finalize method, if present, is invoked. You can think of this method as a kind of super-destructor since it is not tied to the program lifetime of the object. This is called finalization. The timing of just when or even whether a Finalize method is invoked is undefined. This is what is meant by nondeterministic finalization of the garbage collector.
Nondeterministic finalization works well with dynamic memory management. When available memory gets sufficiently scarce, the garbage collector kicks in and solves the problem. Nondeterministic finalization does not work well, however, when an object maintains a critical resource such as a database connection, a lock of some sort, or perhaps native heap memory. In this case, it would be great to release the resource as soon as it is no longer needed. The solution that is currently supported by the CLI is for a class to free the resources in its implementation of the Dispose method of the IDisposable interface. The problem here is that Dispose requires an explicit invocation, and therefore is not likely to be invoked.
A fundamental design pattern in C++ is the aforementioned Resource Acquisition is Initialization, which means that a class acquires resources within its constructor. Conversely, a class frees its resources within its destructor. This is managed automatically within the lifetime of the class object.
Here's what reference types should do in terms of the freeing of scarce resources:
- Use the destructor to encapsulate the necessary code for the freeing of any resources associated with the class.
- Have the destructor invoked automatically, tied with the lifetime of the class object.
The CLI has no notion of the class destructor for a reference type. So the destructor has to be mapped to something else in the underlying implementation. Internally, then, the compiler performs the following transformations:
- The class has its base class list extended to inherit from the IDisposable interface.
- The destructor is transformed into the Dispose method of IDisposable.
That represents half of the goal. A way to automate the invocation of the destructor is still needed. A special stack-based notation for a reference type is supported; that is, one in which its lifetime is associated within the scope of its declaration. Internally, the compiler transforms the notation to allocate the reference object on the managed heap. With the termination of the scope, the compiler inserts an invocation of the Dispose method—the user-defined destructor. Reclamation of the actual memory associated with the object remains under the control of the garbage collector. Figure 1 shows an example.
C++/CLI is not just an extension of C++ into the managed world. Rather, it represents a fully integrated programming paradigm similar in extent to the earlier integration of the multiple inheritance and generic programming paradigms into the language. I think the team has done an outstanding job.
So, What Did You Say About C++/CLI?
C++/CLI represents an integration of native and managed programming. In this iteration, that has been done through a kind of separate but equal community of source-level and binary elements, including Mixed mode (source-level mix of native and CTS types, plus a binary mix of native and CIL object files), Pure mode (source-level mix of native and CTS types, all compiled to CIL object files), Native classes (can hold CTS types through a special wrapper class only), and CTS classes (can hold native types only as pointers). Of course, the C++/CLI programmer can also choose to program in the CLI types alone, and in this way provide verifiable code that can be hosted, for example, as a stored procedure in SQL Server™ 2005.
So, returning to the question, what is C++/CLI? It is a first-class entry visa into the .NET programming model. With C++/CLI, there is a C++ migration path not just for the C++ source base but for C++ expertise as well. I find great satisfaction in that.
Send your questions and comments to purecpp@microsoft.com.