- •Thinking in C++ 2nd edition Volume 2: Standard Libraries & Advanced Topics
- •Preface
- •What’s new in the second edition
- •What’s in Volume 2 of this book
- •How to get Volume 2
- •Prerequisites
- •Learning C++
- •Goals
- •Chapters
- •Exercises
- •Exercise solutions
- •Source code
- •Language standards
- •Language support
- •The book’s CD ROM
- •Seminars, CD Roms & consulting
- •Errors
- •Acknowledgements
- •Library overview
- •1: Strings
- •What’s in a string
- •Creating and initializing C++ strings
- •Initialization limitations
- •Operating on strings
- •Appending, inserting and concatenating strings
- •Replacing string characters
- •Concatenation using non-member overloaded operators
- •Searching in strings
- •Finding in reverse
- •Finding first/last of a set
- •Removing characters from strings
- •Stripping HTML tags
- •Comparing strings
- •Using iterators
- •Iterating in reverse
- •Strings and character traits
- •A string application
- •Summary
- •Exercises
- •2: Iostreams
- •Why iostreams?
- •True wrapping
- •Iostreams to the rescue
- •Sneak preview of operator overloading
- •Inserters and extractors
- •Manipulators
- •Common usage
- •Line-oriented input
- •Overloaded versions of get( )
- •Reading raw bytes
- •Error handling
- •File iostreams
- •Open modes
- •Iostream buffering
- •Seeking in iostreams
- •Creating read/write files
- •User-allocated storage
- •Output strstreams
- •Automatic storage allocation
- •Proving movement
- •A better way
- •Output stream formatting
- •Internal formatting data
- •Format fields
- •Width, fill and precision
- •An exhaustive example
- •Formatting manipulators
- •Manipulators with arguments
- •Creating manipulators
- •Effectors
- •Iostream examples
- •Code generation
- •Maintaining class library source
- •Detecting compiler errors
- •A simple datalogger
- •Generating test data
- •Verifying & viewing the data
- •Counting editor
- •Breaking up big files
- •Summary
- •Exercises
- •3: Templates in depth
- •Nontype template arguments
- •Typedefing a typename
- •Using typename instead of class
- •Function templates
- •A string conversion system
- •A memory allocation system
- •Type induction in function templates
- •Taking the address of a generated function template
- •Local classes in templates
- •Applying a function to an STL sequence
- •Template-templates
- •Member function templates
- •Why virtual member template functions are disallowed
- •Nested template classes
- •Template specializations
- •A practical example
- •Pointer specialization
- •Partial ordering of function templates
- •Design & efficiency
- •Preventing template bloat
- •Explicit instantiation
- •Explicit specification of template functions
- •Controlling template instantiation
- •Template programming idioms
- •Summary
- •Containers and iterators
- •STL reference documentation
- •The Standard Template Library
- •The basic concepts
- •Containers of strings
- •Inheriting from STL containers
- •A plethora of iterators
- •Iterators in reversible containers
- •Iterator categories
- •Input: read-only, one pass
- •Output: write-only, one pass
- •Forward: multiple read/write
- •Bidirectional: operator--
- •Random-access: like a pointer
- •Is this really important?
- •Predefined iterators
- •IO stream iterators
- •Manipulating raw storage
- •Basic sequences: vector, list & deque
- •Basic sequence operations
- •vector
- •Cost of overflowing allocated storage
- •Inserting and erasing elements
- •deque
- •Converting between sequences
- •Cost of overflowing allocated storage
- •Checked random-access
- •list
- •Special list operations
- •list vs. set
- •Swapping all basic sequences
- •Robustness of lists
- •Performance comparison
- •A completely reusable tokenizer
- •stack
- •queue
- •Priority queues
- •Holding bits
- •bitset<n>
- •vector<bool>
- •Associative containers
- •Generators and fillers for associative containers
- •The magic of maps
- •A command-line argument tool
- •Multimaps and duplicate keys
- •Multisets
- •Combining STL containers
- •Creating your own containers
- •Summary
- •Exercises
- •5: STL Algorithms
- •Function objects
- •Classification of function objects
- •Automatic creation of function objects
- •Binders
- •Function pointer adapters
- •SGI extensions
- •A catalog of STL algorithms
- •Support tools for example creation
- •Filling & generating
- •Example
- •Counting
- •Example
- •Manipulating sequences
- •Example
- •Searching & replacing
- •Example
- •Comparing ranges
- •Example
- •Removing elements
- •Example
- •Sorting and operations on sorted ranges
- •Sorting
- •Example
- •Locating elements in sorted ranges
- •Example
- •Merging sorted ranges
- •Example
- •Set operations on sorted ranges
- •Example
- •Heap operations
- •Applying an operation to each element in a range
- •Examples
- •Numeric algorithms
- •Example
- •General utilities
- •Creating your own STL-style algorithms
- •Summary
- •Exercises
- •Perspective
- •Duplicate subobjects
- •Ambiguous upcasting
- •virtual base classes
- •The "most derived" class and virtual base initialization
- •"Tying off" virtual bases with a default constructor
- •Overhead
- •Upcasting
- •Persistence
- •MI-based persistence
- •Improved persistence
- •Avoiding MI
- •Mixin types
- •Repairing an interface
- •Summary
- •Exercises
- •7: Exception handling
- •Error handling in C
- •Throwing an exception
- •Catching an exception
- •The try block
- •Exception handlers
- •Termination vs. resumption
- •The exception specification
- •Better exception specifications?
- •Catching any exception
- •Rethrowing an exception
- •Uncaught exceptions
- •Function-level try blocks
- •Cleaning up
- •Constructors
- •Making everything an object
- •Exception matching
- •Standard exceptions
- •Programming with exceptions
- •When to avoid exceptions
- •Not for asynchronous events
- •Not for ordinary error conditions
- •Not for flow-of-control
- •You’re not forced to use exceptions
- •New exceptions, old code
- •Typical uses of exceptions
- •Always use exception specifications
- •Start with standard exceptions
- •Nest your own exceptions
- •Use exception hierarchies
- •Multiple inheritance
- •Catch by reference, not by value
- •Throw exceptions in constructors
- •Don’t cause exceptions in destructors
- •Avoid naked pointers
- •Overhead
- •Summary
- •Exercises
- •8: Run-time type identification
- •The “Shape” example
- •What is RTTI?
- •Two syntaxes for RTTI
- •Syntax specifics
- •Producing the proper type name
- •Nonpolymorphic types
- •Casting to intermediate levels
- •void pointers
- •Using RTTI with templates
- •References
- •Exceptions
- •Multiple inheritance
- •Sensible uses for RTTI
- •Revisiting the trash recycler
- •Mechanism & overhead of RTTI
- •Creating your own RTTI
- •Explicit cast syntax
- •Summary
- •Exercises
- •9: Building stable systems
- •Shared objects & reference counting
- •Reference-counted class hierarchies
- •Finding memory leaks
- •An extended canonical form
- •Exercises
- •10: Design patterns
- •The pattern concept
- •The singleton
- •Variations on singleton
- •Classifying patterns
- •Features, idioms, patterns
- •Basic complexity hiding
- •Factories: encapsulating object creation
- •Polymorphic factories
- •Abstract factories
- •Virtual constructors
- •Destructor operation
- •Callbacks
- •Observer
- •The “interface” idiom
- •The “inner class” idiom
- •The observer example
- •Multiple dispatching
- •Visitor, a type of multiple dispatching
- •Efficiency
- •Flyweight
- •The composite
- •Evolving a design: the trash recycler
- •Improving the design
- •“Make more objects”
- •A pattern for prototyping creation
- •Trash subclasses
- •Parsing Trash from an external file
- •Recycling with prototyping
- •Abstracting usage
- •Applying double dispatching
- •Implementing the double dispatch
- •Applying the visitor pattern
- •More coupling?
- •RTTI considered harmful?
- •Summary
- •Exercises
- •11: Tools & topics
- •The code extractor
- •Debugging
- •Trace macros
- •Trace file
- •Abstract base class for debugging
- •Tracking new/delete & malloc/free
- •CGI programming in C++
- •Encoding data for CGI
- •The CGI parser
- •Testing the CGI parser
- •Using POST
- •Handling mailing lists
- •Maintaining your list
- •Mailing to your list
- •A general information-extraction CGI program
- •Parsing the data files
- •Summary
- •Exercises
- •General C++
- •My own list of books
- •Depth & dark corners
- •Design Patterns
- •Index
(Without the line break, of course.) Here you see a little bit of the way that data is encoded to send to CGI. For one thing, spaces are not allowed (since spaces typically separate command-line arguments), so they are replaced by ‘+’ signs. In addition, each field contains the field name (which is determined by the form on the HTML page) followed by an ‘=’ and the field data, and is terminated by a ‘&’.
At this point, you might wonder about the ‘+’, ‘=’, and ‘&’. What if those are used in the field, as in “John & Marsha Smith”? This is encoded to:
John+%26+Marsha+Smith
That is, the special character is turned into a ‘%’ followed by its ASCII value in hex. Fortunately, the web browser automatically performs all encoding for you.
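To make the encoding rules concrete, here is a minimal sketch of the transformation the browser applies to a field value. The function name encodeURLString is hypothetical (it is not part of the book's code), and the set of characters that pass through unchanged is an assumption chosen for illustration:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of CGImap.h: encodes one field
// value the way a browser does before submitting a form.
// Assumption: letters, digits and - _ . * pass through unchanged;
// a space becomes '+'; everything else becomes %XX (hex).
std::string encodeURLString(const std::string& s) {
  static const char hexDigits[] = "0123456789ABCDEF";
  std::string result;
  for(std::string::size_type i = 0; i < s.size(); i++) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    if(c == ' ')
      result += '+';
    else if(std::isalnum(c) || c == '-' || c == '_' ||
            c == '.' || c == '*')
      result += static_cast<char>(c);
    else {
      result += '%';
      result += hexDigits[c >> 4];   // High four bits
      result += hexDigits[c & 0x0f]; // Low four bits
    }
  }
  return result;
}
```

Calling encodeURLString("John & Marsha Smith") yields the string shown above, because ‘&’ has the ASCII value 26 in hex.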
The CGI parser
There are many examples of CGI programs written using Standard C. One argument for doing this is that Standard C can be found virtually everywhere. However, C++ has become quite ubiquitous, especially in the form of the GNU C++ Compiler (g++, see footnote 29) that can be downloaded free from the Internet for virtually any platform (and often comes pre-installed with operating systems such as Linux). As you will see, this means that you can get the benefit of object-oriented programming in a CGI program.
Since what we’re concerned with when parsing the CGI information is the field name-value pairs, one class (CGIpair) will represent a single name-value pair. A second class (CGImap) will use CGIpair to parse each name-value pair submitted from the HTML form into keys and values, which it holds in a map-like container of string pairs so you can easily fetch the value for each field at your leisure.
One of the reasons for using C++ here is the convenience of the STL. The map class has operator[ ], which gives you a nice syntax for extracting the data for each field, and CGImap provides the same syntax on top of an STL container. As you’ll see, CGImap is a fairly short definition considering how powerful it is.
The project will start with a reusable portion, which consists of CGIpair and CGImap in a header file. Normally you should avoid cramming this much code into a header file, but for these examples it’s convenient and it doesn’t hurt anything:
//: C10:CGImap.h
// Tools for extracting and decoding data
// from CGI GETs and POSTs.
29 GNU stands for “Gnu’s Not Unix.” The project, created by the Free Software Foundation, was originally intended to replace the Unix operating system with a free version of that OS. Linux appears to have replaced this initiative, but the GNU tools have played an integral part in the development of Linux, which comes packaged with many GNU components.
Appendix B: Programming Guidelines
#include <cstdlib>  // getenv(), atoi()
#include <iostream>
#include <string>
#include <utility>  // pair
#include <vector>
using namespace std;

class CGIpair : public pair<string, string> {
public:
  CGIpair() {}
  CGIpair(string name, string value) {
    first = decodeURLString(name);
    second = decodeURLString(value);
  }
  // Automatic type conversion for boolean test:
  operator bool() const {
    return (first.length() != 0);
  }
private:
  static string decodeURLString(string URLstr) {
    const int len = URLstr.length();
    string result;
    for(int i = 0; i < len; i++) {
      if(URLstr[i] == '+')
        result += ' ';
      // Only decode when both hex digits are present:
      else if(URLstr[i] == '%' && i + 2 < len) {
        result += static_cast<char>(
          translateHex(URLstr[i + 1]) * 16 +
          translateHex(URLstr[i + 2]));
        i += 2; // Move past hex code
      } else // An ordinary character
        result += URLstr[i];
    }
    return result;
  }
  // Translate a single hex character; used by
  // decodeURLString():
  static char translateHex(char hex) {
    if(hex >= 'A')
      return (hex & 0xdf) - 'A' + 10;
    else
      return hex - '0';
  }
};
// Parses any CGI query and turns it into an
// STL vector of CGIpair which has an associative
// lookup operator[] like a map. A vector is used
// instead of a map because it keeps the original
// ordering of the fields in the Web page form.
class CGImap : public vector<CGIpair> {
  string gq;
  int index;
  // Prevent assignment and copy-construction:
  void operator=(CGImap&);
  CGImap(CGImap&);
public:
  CGImap(string query): gq(query), index(0) {
    CGIpair p;
    while((p = nextPair()) != 0)
      push_back(p);
  }
  // Look something up, as if it were a map:
  string operator[](const string& key) {
    iterator i = begin();
    while(i != end()) {
      if((*i).first == key)
        return (*i).second;
      i++;
    }
    return string(); // Empty string == not found
  }
  void dump(ostream& o, string nl = "<br>") {
    for(iterator i = begin(); i != end(); i++) {
      o << (*i).first << " = "
        << (*i).second << nl;
    }
  }
private:
  // Produces name-value pairs from the query
  // string. Returns an empty Pair when there's
  // no more query string left:
  CGIpair nextPair() {
    if(gq.length() == 0)
      return CGIpair(); // Error, return empty
    if(gq.find('=') == string::npos)
      return CGIpair(); // Error, return empty
    string name = gq.substr(0, gq.find('='));
    gq = gq.substr(gq.find('=') + 1);
    string value = gq.substr(0, gq.find('&'));
    if(gq.find('&') == string::npos)
      gq.erase(); // Last pair: nothing left to parse
    else
      gq = gq.substr(gq.find('&') + 1);
    return CGIpair(name, value);
  }
};
// Helper class for getting POST data:
class Post : public string {
public:
  Post() {
    // For a CGI "POST," the server puts the
    // length of the content string in the
    // environment variable CONTENT_LENGTH:
    char* clen = getenv("CONTENT_LENGTH");
    if(clen == 0) {
      cout << "Zero CONTENT_LENGTH, Make sure "
        "this is a POST and not a GET" << endl;
      return;
    }
    int len = atoi(clen);
    if(len <= 0)
      return; // Nothing to read
    char* s = new char[len];
    cin.read(s, len); // Get the data
    append(s, len); // Add it to this string
    delete []s;
  }
}; ///:~
The CGIpair class starts out quite simply: it inherits from the standard library pair template to create a pair of strings, one for the name and one for the value. The second constructor calls the member function decodeURLString( ) which produces a string after stripping away all the extra characters added by the browser as it submitted the CGI request. There is no need to provide functions to select each individual element – because pair is inherited publicly, you can just select the first and second elements of the CGIpair.
The operator bool provides automatic type conversion to bool. If you have a CGIpair object called p and you use it in an expression where a Boolean result is expected, such as
if(p) { //...
then the compiler will recognize that it has a CGIpair and it needs a Boolean, so it will automatically call operator bool to perform the necessary conversion.
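The conversion can be seen in isolation in the following sketch. Field is a hypothetical stand-in for CGIpair, reduced to just the conversion operator, and countValid( ) is an illustrative helper that does not appear in CGImap.h:

```cpp
#include <string>

// Hypothetical stand-in for CGIpair, reduced to the conversion.
struct Field {
  std::string name, value;
  Field() {}
  Field(const std::string& n, const std::string& v)
    : name(n), value(v) {}
  // Tests true when the field has a name:
  operator bool() const { return !name.empty(); }
};

// Counts the non-empty fields in an array, relying on the
// implicit conversion in the if-test.
int countValid(const Field* fields, int n) {
  int count = 0;
  for(int i = 0; i < n; i++)
    if(fields[i]) // Compiler calls operator bool() here
      count++;
  return count;
}
```

A default-constructed Field tests as false, which is the same convention CGImap relies on to detect the end of the query string.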
Because the string objects take care of themselves, you don’t need to explicitly define the copy-constructor, operator= or destructor – the default versions synthesized by the compiler do the right thing.
The remainder of the CGIpair class consists of two member functions: decodeURLString( ) and its helper translateHex( ). (Note that translateHex( ) does not guard against bad input such as “%1H.”) decodeURLString( ) moves through the string and replaces each ‘+’ with a space, and each hex code (beginning with a ‘%’) with the appropriate character. It’s worth noting here and in CGImap the power of the string class: you can index into a string object using operator[ ], and you can use member functions like find( ) and substr( ).
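If you want to guard against bad input such as “%1H,” one possible approach is sketched below. The names hexValue( ) and decodeChecked( ) are hypothetical, and the policy of copying a malformed or truncated escape through unchanged is an assumption; a real program might prefer to reject the request instead:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: value of one hex digit, or -1 if the
// character is not a hex digit.
int hexValue(char c) {
  unsigned char uc = static_cast<unsigned char>(c);
  if(std::isdigit(uc)) return c - '0';
  if(std::isxdigit(uc)) return std::toupper(uc) - 'A' + 10;
  return -1;
}

// Variant of decodeURLString() that leaves a malformed or
// truncated escape sequence in the output unchanged.
std::string decodeChecked(const std::string& s) {
  std::string result;
  for(std::string::size_type i = 0; i < s.size(); i++) {
    if(s[i] == '+')
      result += ' ';
    else if(s[i] == '%' && i + 2 < s.size() &&
            hexValue(s[i + 1]) >= 0 &&
            hexValue(s[i + 2]) >= 0) {
      result += static_cast<char>(
        hexValue(s[i + 1]) * 16 + hexValue(s[i + 2]));
      i += 2; // Move past hex code
    } else // Ordinary character or bad escape
      result += s[i];
  }
  return result;
}
```

With this version, decodeChecked("%1H") returns the three characters unchanged rather than producing a meaningless character.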
CGImap parses and holds all the name-value pairs submitted from the form as part of a CGI request. You might think that anything that has the word “map” in its name should be inherited from the STL map, but map has its own way of ordering the elements it stores whereas here it’s useful to keep the elements in the order that they appear on the Web page. So CGImap is inherited from vector<CGIpair>, and operator[ ] is overloaded so you get the associative-array lookup of a map.
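The difference in ordering is easy to demonstrate. The following sketch uses hypothetical names (Field, keyOrder( ), lookup( )) and plain pairs of strings rather than CGIpair, so that it stands alone:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for a raw name-value pair:
typedef std::pair<std::string, std::string> Field;

// Joins the keys of any container of pairs, in the order
// the container itself presents them.
template<class Container>
std::string keyOrder(const Container& c) {
  std::string result;
  for(typename Container::const_iterator i = c.begin();
      i != c.end(); ++i) {
    if(!result.empty()) result += ',';
    result += i->first;
  }
  return result;
}

// Linear lookup in the style of CGImap::operator[];
// an empty string means "not found".
std::string lookup(const std::vector<Field>& v,
                   const std::string& key) {
  for(std::vector<Field>::const_iterator i = v.begin();
      i != v.end(); ++i)
    if(i->first == key) return i->second;
  return std::string();
}
```

If you push_back the fields “name,” “email” and “city” into a vector<Field>, keyOrder( ) reports them in that order, whereas a map<string, string> built from the same pairs reports them sorted by key.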
You can also see that CGImap has a copy-constructor and an operator=, but they’re both declared as private. This is to prevent the compiler from synthesizing the two functions (which it will do if you don’t declare them yourself), but it also prevents the client programmer from passing a CGImap by value or from using assignment.
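The idiom is shown in isolation in this sketch. QueryHolder and queryLength( ) are hypothetical names used only for illustration; in C++11 and later the same intent is usually expressed by declaring the two functions with = delete instead of making them private:

```cpp
#include <string>

// Hypothetical class showing the idiom used by CGImap.
class QueryHolder {
  std::string query;
  // Declared private and never defined, so client code
  // that tries to copy or assign fails to compile:
  QueryHolder(const QueryHolder&);
  void operator=(const QueryHolder&);
public:
  explicit QueryHolder(const std::string& q) : query(q) {}
  const std::string& get() const { return query; }
};

// Passing by reference is still fine; a parameter of type
// QueryHolder (by value) would be rejected by the compiler.
std::string::size_type queryLength(const QueryHolder& h) {
  return h.get().size();
}
```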
CGImap’s job is to take the input data and parse it into name-value pairs, which it will do with the aid of CGIpair (effectively, CGIpair is only a helper class, but it also seems to make it easier to understand the code). After copying the query string (you’ll see where the query string comes from later) into the member string object gq, the nextPair( ) member function is used to parse the string into raw name-value pairs, delimited by ‘=’ and ‘&’ signs. Each resulting CGIpair object is added to the vector using the standard vector::push_back( ). When nextPair( ) runs out of input from the query string, it returns an empty CGIpair, which converts to false (zero) and ends the loop.
The CGImap::operator[ ] takes the brute-force approach of a linear search through the elements. Since a CGImap is intentionally not sorted and tends to be small, this is not too terrible. The dump( ) function is used for testing, typically by sending information to the resulting Web page, as you might guess from the default value of nl, which is an HTML “break line” token.
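The splitting step can be sketched on its own, without the decoding. The function splitQuery( ) and the alias RawPair are hypothetical; unlike nextPair( ), this version scans with an index instead of repeatedly shortening the string, and it skips a segment that has no ‘=’ rather than stopping (an assumption made for illustration):

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for a raw (still encoded) pair:
typedef std::pair<std::string, std::string> RawPair;

// Splits "name=value&name=value" into raw pairs.
// Assumption: a segment without '=' is skipped.
std::vector<RawPair> splitQuery(const std::string& query) {
  std::vector<RawPair> result;
  std::string::size_type start = 0;
  while(start < query.size()) {
    // End of this segment: next '&' or end of string
    std::string::size_type amp = query.find('&', start);
    if(amp == std::string::npos) amp = query.size();
    std::string::size_type eq = query.find('=', start);
    if(eq != std::string::npos && eq < amp)
      result.push_back(RawPair(
        query.substr(start, eq - start),
        query.substr(eq + 1, amp - eq - 1)));
    start = amp + 1; // Move past the '&'
  }
  return result;
}
```

Each resulting pair would still need to be passed through a decoding step such as CGIpair’s constructor before use.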
Using GET can be fine for many applications. However, GET passes its data to the CGI program through an environment variable (called QUERY_STRING), and operating systems typically run out of environment space with long GET strings (you should start worrying at about 200 characters). CGI provides a solution for this: POST. With POST, the data is encoded and concatenated the same way as with GET, but POST uses standard input to pass the encoded query string to the CGI program and has no length limitation on the input. All you have to do in your CGI program is determine the length of the query string. This length is stored in the environment variable CONTENT_LENGTH. Once you know the length, you can allocate storage and read the precise number of bytes from standard input. Because POST is the less-fragile solution, you should probably prefer it over GET, unless you know for sure that your input will be short. In fact, one might surmise that the only reason for GET is that it is slightly easier to code a CGI program in C using GET. However, the last class in
