FAQs in section [28]:
[28.1] What are value and reference semantics, and which
is best in C++?
With reference semantics, assignment is a pointer-copy (i.e., a
reference). Value (or "copy") semantics mean assignment copies the
value, not just the pointer. C++ gives you the choice: use the assignment
operator to copy the value (copy/value semantics), or use a pointer-copy to
copy a pointer (reference semantics). C++ allows you to overload the
assignment operator to do anything your heart desires; however, the default
(and most common) choice is to copy the value.
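A minimal sketch of the two kinds of assignment follows. The Point type and the two demo functions are hypothetical names invented for this illustration.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
struct Point {
    int x;
    int y;
};

// Value semantics: assignment copies the object, so the two
// variables are independent afterwards.
inline int valueSemanticsDemo() {
    Point a{1, 2};
    Point b{0, 0};
    b = a;             // copies the value
    b.x = 99;          // changes b only
    assert(b.x == 99);
    return a.x;        // a is unaffected
}

// Reference semantics: assignment copies the pointer, so both
// pointers refer to the same object afterwards.
inline int referenceSemanticsDemo() {
    Point a{1, 2};
    Point* p = &a;
    Point* q = nullptr;
    q = p;             // copies the pointer, not the Point
    q->x = 99;         // changes the one shared object
    assert(p->x == 99);
    return a.x;        // a sees the change
}
```

After the value copy, a and b can change independently; after the pointer copy, there is still only one Point.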
Pros of reference semantics: flexibility and dynamic binding (you get dynamic
binding in C++ only when you pass by pointer or pass by reference, not when you
pass by value).
Pros of value semantics: speed. "Speed" seems like an odd benefit for a
feature that requires an object (vs. a pointer) to be copied, but the fact of
the matter is that one usually accesses an object more often than one copies
it, so the cost of the occasional copies is (usually) more than offset by
the benefit of having an actual object rather than a pointer to an object.
There are three cases when you have an actual object as opposed to a pointer to
an object: local objects, global/static objects, and fully contained member
objects in a class. The most important of these is the last ("composition").
More info about copy-vs-reference semantics is given in the next FAQs. Please
read them all to get a balanced perspective. The first few have intentionally
been slanted toward value semantics, so if you only read the first few of the
following FAQs, you'll get a warped perspective.
Assignment has other issues (e.g., shallow vs. deep copy) which are not covered
here.
[28.2] What is "virtual data," and how-can / why-would I use it in
C++?
virtual data allows a derived class to change the exact class of a base
class's member object. virtual data isn't strictly "supported" by C++;
however, it can be simulated in C++. It ain't pretty, but it works.
To simulate virtual data in C++, the base class must have a pointer to the
member object, and the derived class must provide a new object to be
pointed to by the base class's pointer. The base class would also have one
or more normal constructors that provide their own referent (again via new),
and the base class's destructor would delete the referent.
For example, class Stack might have an Array member object (using a
pointer), and derived class StretchableStack might override the base
class member data from Array to StretchableArray. For this to work,
StretchableArray would have to inherit from Array, so Stack would have an
Array*. Stack's normal constructors would initialize this Array* with a
new Array, but Stack would also have a (possibly protected:)
constructor that would accept an Array* from a derived class.
StretchableStack's constructor would provide a new StretchableArray
to this special constructor.
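The arrangement just described might look something like the sketch below. The class names come from the text, but every member (put(), get(), push(), pop(), size(), kInitialCapacity) and the use of std::vector for storage are assumptions made for this illustration, not part of any real library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Array {
public:
    explicit Array(std::size_t capacity) : data_(capacity) {}
    virtual ~Array() {}
    // Fixed-size version: refuses an index outside the current capacity.
    virtual bool put(std::size_t index, int value) {
        if (index >= data_.size()) return false;
        data_[index] = value;
        return true;
    }
    int get(std::size_t index) const { return data_[index]; }
protected:
    std::vector<int> data_;
};

class StretchableArray : public Array {
public:
    explicit StretchableArray(std::size_t capacity) : Array(capacity) {}
    // Stretchable version: grows instead of refusing.
    bool put(std::size_t index, int value) override {
        if (index >= data_.size()) data_.resize(index + 1);
        data_[index] = value;
        return true;
    }
};

class Stack {
public:
    // Normal constructor: provides its own referent via new.
    Stack() : array_(new Array(kInitialCapacity)), size_(0) {}
    virtual ~Stack() { delete array_; }   // the base class deletes the referent
    Stack(const Stack&) = delete;         // copying is out of scope for this sketch
    Stack& operator=(const Stack&) = delete;

    bool push(int value) {
        if (!array_->put(size_, value)) return false;
        ++size_;
        return true;
    }
    int pop() {
        assert(size_ > 0);                // precondition: stack is not empty
        return array_->get(--size_);
    }
    std::size_t size() const { return size_; }
protected:
    enum { kInitialCapacity = 2 };
    // Special constructor: a derived class supplies the referent.
    explicit Stack(Array* array) : array_(array), size_(0) {}
private:
    Array* array_;                        // the "virtual data"
    std::size_t size_;
};

class StretchableStack : public Stack {
public:
    StretchableStack() : Stack(new StretchableArray(kInitialCapacity)) {}
};
```

Note that StretchableStack contains almost no code of its own: push() and pop() are inherited, and the only thing that changes is the exact class of the object behind the Array*.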
Pros:
- Easier implementation of StretchableStack
(most of the code is inherited)
- Users can pass a StretchableStack as a kind-of Stack
Cons:
- Adds an extra layer of indirection to access the Array
- Adds some extra freestore allocation overhead
(both new and delete)
- Adds some extra dynamic binding overhead
(reason given in next FAQ)
In other words, we succeeded at making our job easier as the
implementer of StretchableStack, but all our users pay
for it. Unfortunately the extra overhead was imposed on both users of
StretchableStack and on users of Stack.
Please read the rest of this section. (You will not get a balanced
perspective without the others.)
[28.3] What's the difference between virtual data and
dynamic data? 
The easiest way to see the distinction is by an analogy with virtual
functions: A virtual member function means the
declaration (signature) must stay the same in derived classes, but the definition
(body) can be overridden. The overriddenness of an inherited member function
is a static property of the derived class; it doesn't change dynamically throughout
the life of any particular object, nor is it possible for distinct objects of
the derived class to have distinct definitions of the member function.
Now go back and re-read the previous paragraph, but make these substitutions:
- "member function" > "member object"
- "signature" > "type"
- "body" > "exact class"
After this, you'll have a working definition of virtual data.
Another way to look at this is to distinguish "per-object" member functions
from "dynamic" member functions. A "per-object" member function is a member
function that is potentially different in any given instance of an object, and
could be implemented by burying a function pointer in the object; this pointer
could be const, since the pointer will never be changed throughout the
object's life. A "dynamic" member function is a member function that will
change dynamically over time; this could also be implemented by a function
pointer, but the function pointer would not be const.
Extending the analogy, this gives us three distinct concepts for data members:
- virtual data: the definition (class) of the member object is
overridable in derived classes provided its declaration ("type") remains the same,
and this overriddenness is a static property of the derived class
- per-object-data: any given object of a class can instantiate a
different conformal (same type) member object upon initialization (usually a
"wrapper" object), and the exact class of the member object is a static
property of the object that wraps it
- dynamic-data: the member object's exact class can change
dynamically over time
The reason they all look so much the same is that none of this is "supported"
in C++. It's all merely "allowed," and in this case, the mechanism for faking
each of these is the same: a pointer to a (probably abstract) base class. In
a language that made these "first class" abstraction mechanisms, the difference
would be more striking, since they'd each have a different syntactic variant.
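To make the last two concepts concrete, here is a rough sketch of per-object data and dynamic data, both faked with a pointer to an abstract base class. Engine, SmallEngine, BigEngine, PerObjectCar and DynamicCar are hypothetical names chosen for this illustration, and std::unique_ptr is used only to keep the ownership bookkeeping short.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical abstract base class for the member object.
class Engine {
public:
    virtual ~Engine() {}
    virtual int horsepower() const = 0;
};

class SmallEngine : public Engine {
public:
    int horsepower() const override { return 90; }
};

class BigEngine : public Engine {
public:
    int horsepower() const override { return 300; }
};

// Per-object data: each object chooses the exact class of its member
// object when it is initialized, and never changes it afterwards.
class PerObjectCar {
public:
    explicit PerObjectCar(std::unique_ptr<Engine> engine)
        : engine_(std::move(engine)) {
        assert(engine_ != nullptr);
    }
    int horsepower() const { return engine_->horsepower(); }
private:
    const std::unique_ptr<Engine> engine_;   // const: fixed for the object's life
};

// Dynamic data: the exact class of the member object can change over time.
class DynamicCar {
public:
    explicit DynamicCar(std::unique_ptr<Engine> engine)
        : engine_(std::move(engine)) {
        assert(engine_ != nullptr);
    }
    void replaceEngine(std::unique_ptr<Engine> engine) {
        assert(engine != nullptr);
        engine_ = std::move(engine);         // the old Engine is destroyed here
    }
    int horsepower() const { return engine_->horsepower(); }
private:
    std::unique_ptr<Engine> engine_;         // not const: may be re-seated
};
```

The only difference between the two classes is whether the pointer is const, which is exactly why the three concepts look so similar in C++.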
[28.4] Should I normally use pointers to freestore allocated
objects for my data members, or should I use "composition"?
Composition.
Your member objects should normally be "contained" in the composite object (but
not always; "wrapper" objects are a good example of where you want a
pointer/reference; also the N-to-1-uses-a relationship needs something like a
pointer/reference).
There are three reasons why fully contained member objects ("composition") have
better performance than pointers to freestore-allocated member objects:
- Extra layer of indirection every time you need to access the member
object
- Extra freestore allocations (new in constructor, delete in
destructor)
- Extra dynamic binding (reason given below)
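The first two costs can be seen in a small sketch. Engine, ComposedCar, PointerCar and allocationDemo() are hypothetical names for this illustration, and the class-specific operator new exists only to count freestore allocations. Engine has no virtual functions here, so the third cost (dynamic binding) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Engine {
public:
    Engine() : rpm_(800) {}
    int rpm() const { return rpm_; }

    // Class-specific operator new/delete, present only so the sketch
    // can count how many Engines were allocated from the freestore.
    static void* operator new(std::size_t size) {
        ++allocations;
        return ::operator new(size);
    }
    static void operator delete(void* p) { ::operator delete(p); }
    static int allocations;
private:
    int rpm_;
};
int Engine::allocations = 0;

// Composition: the Engine lives inside the car object itself.
class ComposedCar {
public:
    int rpm() const { return engine_.rpm(); }    // direct access
private:
    Engine engine_;
};

// Pointer member: the Engine lives on the freestore.
class PointerCar {
public:
    PointerCar() : engine_(new Engine) {}        // extra allocation
    ~PointerCar() { delete engine_; }            // extra deallocation
    PointerCar(const PointerCar&) = delete;
    PointerCar& operator=(const PointerCar&) = delete;
    int rpm() const { return engine_->rpm(); }   // extra indirection
private:
    Engine* engine_;
};

// Returns how many freestore allocations each kind of car needed.
inline void allocationDemo(int& composedAllocs, int& pointerAllocs) {
    int before = Engine::allocations;
    { ComposedCar c; assert(c.rpm() == 800); }
    composedAllocs = Engine::allocations - before;

    before = Engine::allocations;
    { PointerCar p; assert(p.rpm() == 800); }
    pointerAllocs = Engine::allocations - before;
}
```

Both cars behave identically from the outside; the difference is in what each one costs to create, access and destroy.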
[28.5] What are the relative costs of the 3 performance hits associated
with allocating member objects from the freestore?
The three performance hits are enumerated in the previous FAQ:
- By itself, an extra layer of indirection is small
potatoes
- Freestore allocations can be a performance issue (the performance of
the typical implementation of malloc() degrades when there are many
allocations; OO software can easily become "freestore bound" unless you're
careful)
- The extra dynamic binding comes from having a pointer rather than an
object. Whenever the C++ compiler can know an object's exact class,
virtual function calls can be
statically bound, which allows inlining. Inlining allows zillions
(would you believe half a dozen :-) optimization opportunities such as
procedural integration, register lifetime issues, etc. The C++ compiler can
know an object's exact class in three circumstances: local variables,
global/static variables, and fully-contained member objects.
Thus fully-contained member objects allow significant optimizations that
wouldn't be possible under the "member objects-by-pointer" approach. This is
the main reason that languages which enforce reference-semantics have
"inherent" performance challenges.
Note: Please read the next three FAQs to get a balanced perspective!
[28.6] Are "inline virtual" member functions ever actually
"inlined"?
Occasionally...
When the object is referenced via a pointer or a reference, a call to a
virtual function cannot be inlined, since the
call must be resolved dynamically. Reason: the compiler can't know which
actual code to call until run-time (i.e., dynamically), since the code may be
from a derived class that was created after the caller was compiled.
Therefore the only time an inline virtual call can be inlined is when the
compiler knows the "exact class" of the object which is the target of the
virtual function call. This can happen only when the compiler has an actual
object rather than a pointer or reference to an object. I.e., either with a
local object, a global/static object, or a fully contained object inside a
composite.
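The two situations can be put side by side in a short sketch. Shape, Square and the function names are hypothetical, and whether a particular compiler actually inlines either call is up to that compiler and its optimization settings; the sketch only shows where the exact class is and isn't visible at the call site.

```cpp
#include <cassert>

class Shape {
public:
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
};

class Square : public Shape {
public:
    int sides() const override { return 4; }
};

// Exact class known: s is an actual Square object, so the call can be
// bound statically and is therefore a candidate for inlining.
inline int sidesOfLocal() {
    Square s;
    return s.sides();
}

// Exact class unknown: shape may refer to an object of any class derived
// from Shape, so in general the call must be resolved at run-time.
inline int sidesOf(const Shape& shape) {
    return shape.sides();
}

// Either way the result is the same; only the binding differs.
inline void sidesCheck() {
    Square sq;
    assert(sidesOfLocal() == 4);
    assert(sidesOf(sq) == 4);
}
```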
Note that the difference between inlining and non-inlining is normally
much more significant than the difference between a regular function
call and a virtual function call. For example, the difference between a
regular function call and a virtual function call is often just two extra
memory references, but the difference between an inline function and a
non-inline function can be as much as an order of magnitude (for zillions of
calls to insignificant member functions, loss of inlining virtual functions
can result in 25X speed degradation! [Doug Lea, "Customization in C++," proc
Usenix C++ 1990]).
A practical consequence of this insight: don't get bogged down in the endless
debates (or sales tactics!) of compiler/language vendors who compare the cost
of a virtual function call on their language/compiler with the same on
another language/compiler. Such comparisons are largely meaningless when
compared with the ability of the language/compiler to "inline expand" member
function calls. I.e., many language implementation vendors make a big stink
about how good their dispatch strategy is, but if these implementations don't
inline member function calls, the overall system performance would be
poor, since it is inlining not dispatching that has the greatest
performance impact.
Note: Please read the next two FAQs to see the other side of this coin!
[28.7] Sounds like I should never use reference
semantics, right?
Wrong.
Reference semantics are A Good Thing. We can't live without pointers. We just
don't want our software to be One Gigantic Rat's Nest Of Pointers. In C++, you can
pick and choose where you want reference semantics (pointers/references) and
where you'd like value semantics (where objects physically contain other
objects etc). In a large system, there should be a balance. However, if you
implement absolutely everything as a pointer, you'll get enormous speed
hits.
Objects near the problem skin are larger than lower-level objects. The
identity of these "problem space" abstractions is usually more
important than their "value." Thus reference semantics should be used for
problem-space objects.
Note that these problem space objects are normally at a higher level of
abstraction than the solution space objects, so the problem space objects
normally have a relatively lower frequency of interaction. Therefore C++ gives
us an ideal situation: we choose reference semantics for objects that
need unique identity or that are too large to copy, and we can choose value
semantics for the others. Thus the highest frequency objects will end up with
value semantics, since we install flexibility only where it doesn't hurt us,
and we install performance where we need it most!
These are some of the many issues that come into play with real OO design.
OO/C++ mastery takes time and high quality training. If you want a powerful
tool, you've got to invest.
Don't stop now! Read the next FAQ too!!
[28.8] Does the poor performance of reference semantics mean I
should pass-by-value? 
Nope.
The previous FAQs were talking about member objects, not parameters.
Generally, objects that are part of an inheritance hierarchy should be passed
by reference or by pointer, not by value, since only then do you get
the (desired) dynamic binding (pass-by-value doesn't mix with inheritance,
since larger derived class objects get "sliced" when passed by value as a base
class object).
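A small sketch shows the slicing problem. Animal, Dog and the three functions are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>

class Animal {
public:
    virtual ~Animal() {}
    virtual std::string sound() const { return "..."; }
};

class Dog : public Animal {
public:
    std::string sound() const override { return "Woof"; }
};

// Pass-by-value: the parameter is a brand new Animal copied from the
// Animal part of the argument, so any Dog-ness is sliced away.
inline std::string byValue(Animal a) { return a.sound(); }

// Pass-by-reference and pass-by-pointer: the parameter refers to the
// caller's actual object, so dynamic binding finds Dog::sound().
inline std::string byReference(const Animal& a) { return a.sound(); }
inline std::string byPointer(const Animal* a) { return a->sound(); }

inline void slicingCheck() {
    Dog d;
    assert(byValue(d) == "...");        // sliced: Animal::sound() runs
    assert(byReference(d) == "Woof");   // dynamic binding works
    assert(byPointer(&d) == "Woof");    // dynamic binding works
}
```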
Unless compelling reasons are given to the contrary, member objects should be
by value and parameters should be by reference. The discussion in the previous
few FAQs indicates some of the "compelling reasons" for when member objects
should be by reference.
Revised Jul 10, 2000