Thinking in C++, 2nd Edition, Volume 2: Standard Libraries & Advanced Topics, by Bruce Eckel

(Without the line break, of course.) Here you see a little of how data is encoded for sending to CGI. For one thing, spaces are not allowed (since spaces typically separate command-line arguments), so each space is replaced by a ‘+’ sign. In addition, each field consists of the field name (which is determined by the form on the HTML page), followed by an ‘=’ and the field data, and is terminated by a ‘&’.

At this point, you might wonder about the ‘+’, ‘=’, and ‘&’. What if those characters appear in the field data, as in “John & Marsha Smith”? That value is encoded as:

John+%26+Marsha+Smith

That is, the special character is turned into a ‘%’ followed by its ASCII value in hex. Fortunately, the web browser automatically performs all encoding for you.
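
To make the encoding rules concrete, here is a minimal sketch of the work the browser does for you. The function name encodeFormValue is hypothetical, and the set of characters left unescaped (letters, digits, ‘-’, ‘_’ and ‘.’) is an assumption based on common form-encoding practice rather than anything stated in this chapter:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: encodes one form field value roughly the way a
// browser does before submitting it (space -> '+', special -> %XX).
std::string encodeFormValue(const std::string& raw) {
  static const char hexDigits[] = "0123456789ABCDEF";
  std::string result;
  for (std::string::size_type i = 0; i < raw.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(raw[i]);
    if (c == ' ') {
      result += '+';
    } else if (std::isalnum(c) || c == '-' || c == '_' || c == '.') {
      result += static_cast<char>(c);  // Assumed safe: copied unchanged
    } else {
      result += '%';                   // Everything else becomes %XX
      result += hexDigits[c >> 4];
      result += hexDigits[c & 0x0F];
    }
  }
  return result;
}
```

Calling encodeFormValue("John & Marsha Smith") reproduces the encoded string shown above, because ‘&’ has the ASCII value 26 in hex.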

The CGI parser

There are many examples of CGI programs written using Standard C. One argument for doing this is that Standard C can be found virtually everywhere. However, C++ has become quite ubiquitous, especially in the form of the GNU C++ Compiler (g++; see footnote 29), which can be downloaded free from the Internet for virtually any platform (and often comes pre-installed with operating systems such as Linux). As you will see, this means that you can get the benefit of object-oriented programming in a CGI program.

When parsing the CGI information, what we care about is the field name-value pairs. One class (CGIpair) represents a single name-value pair, and a second class (CGImap) uses CGIpair to parse each pair submitted from the HTML form into a key and a value. CGImap holds these in a map-like container of strings so you can easily fetch the value for each field at your leisure.

One of the reasons for using C++ here is the convenience of the STL. The STL map class has operator[ ], which gives a nice syntax for extracting the data for each field, and CGImap provides the same syntax. As you’ll see, CGImap is actually built on the vector template rather than on map, and its definition is fairly short considering how powerful it is.

The project will start with a reusable portion, which consists of CGIpair and CGImap in a header file. Normally you should avoid cramming this much code into a header file, but for these examples it’s convenient and it doesn’t hurt anything:

//: C10:CGImap.h
// Tools for extracting and decoding data
// from CGI GETs and POSTs.

29 GNU stands for “Gnu’s Not Unix.” The project, created by the Free Software Foundation, was originally intended to replace the Unix operating system with a free version of that OS. Linux appears to have replaced this initiative, but the GNU tools have played an integral part in the development of Linux, which comes packaged with many GNU components.

Appendix B: Programming Guidelines


#include <cstdlib>   // getenv(), atoi()
#include <iostream>
#include <string>
#include <utility>   // pair
#include <vector>
using namespace std;

class CGIpair : public pair<string, string> {
public:
  CGIpair() {}
  CGIpair(string name, string value) {
    first = decodeURLString(name);
    second = decodeURLString(value);
  }
  // Automatic type conversion for boolean test:
  operator bool() const {
    return (first.length() != 0);
  }
private:
  static string decodeURLString(string URLstr) {
    const int len = URLstr.length();
    string result;
    for(int i = 0; i < len; i++) {
      if(URLstr[i] == '+')
        result += ' ';
      else if(URLstr[i] == '%' && i + 2 < len) {
        // Combine the two hex digits into one character.
        // The length check keeps a truncated escape at the
        // end of the string from reading out of bounds:
        result += static_cast<char>(
          translateHex(URLstr[i + 1]) * 16 +
          translateHex(URLstr[i + 2]));
        i += 2; // Move past hex code
      } else // An ordinary character
        result += URLstr[i];
    }
    return result;
  }
  // Translate a single hex character; used by
  // decodeURLString(). Does not validate its input:
  static char translateHex(char hex) {
    if(hex >= 'A')
      return (hex & 0xdf) - 'A' + 10;
    else
      return hex - '0';
  }
};


// Parses any CGI query and turns it into an
// STL vector of CGIpair which has an associative
// lookup operator[] like a map. A vector is used
// instead of a map because it keeps the original
// ordering of the fields in the Web page form.
class CGImap : public vector<CGIpair> {
  string gq;
  int index;
  // Prevent assignment and copy-construction:
  void operator=(CGImap&);
  CGImap(CGImap&);
public:
  CGImap(string query): gq(query), index(0) {
    CGIpair p;
    // An empty CGIpair converts to false and ends the loop:
    while((p = nextPair()))
      push_back(p);
  }
  // Look something up, as if it were a map:
  string operator[](const string& key) {
    iterator i = begin();
    while(i != end()) {
      if((*i).first == key)
        return (*i).second;
      i++;
    }
    return string(); // Empty string == not found
  }
  void dump(ostream& o, string nl = "<br>") {
    for(iterator i = begin(); i != end(); i++) {
      o << (*i).first << " = " << (*i).second << nl;
    }
  }
private:
  // Produces name-value pairs from the query
  // string. Returns an empty CGIpair when there's
  // no more query string left:
  CGIpair nextPair() {
    if(gq.length() == 0)
      return CGIpair(); // Error, return empty
    if(gq.find('=') == string::npos)
      return CGIpair(); // Error, return empty
    string name = gq.substr(0, gq.find('='));
    gq = gq.substr(gq.find('=') + 1);
    string value = gq.substr(0, gq.find('&'));
    if(gq.find('&') == string::npos)
      gq.clear(); // Last pair: nothing left to parse
    else
      gq = gq.substr(gq.find('&') + 1);
    return CGIpair(name, value);
  }
};

// Helper class for getting POST data:
class Post : public string {
public:
  Post() {
    // For a CGI "POST," the server puts the
    // length of the content string in the
    // environment variable CONTENT_LENGTH:
    char* clen = getenv("CONTENT_LENGTH");
    if(clen == 0) {
      cout << "Zero CONTENT_LENGTH, Make sure "
        "this is a POST and not a GET" << endl;
      return;
    }
    int len = atoi(clen);
    if(len <= 0)
      return; // Nothing to read
    char* s = new char[len];
    cin.read(s, len); // Get the data
    append(s, cin.gcount()); // Add what was actually read
    delete []s;
  }
}; ///:~

The CGIpair class starts out quite simply: it inherits from the standard library pair template to create a pair of strings, one for the name and one for the value. The second constructor calls the member function decodeURLString( ) which produces a string after stripping away all the extra characters added by the browser as it submitted the CGI request. There is no need to provide functions to select each individual element – because pair is inherited publicly, you can just select the first and second elements of the CGIpair.

The operator bool provides automatic type conversion to bool. If you have a CGIpair object called p and you use it in an expression where a Boolean result is expected, such as

if(p) { //...

then the compiler will recognize that it has a CGIpair and it needs a Boolean, so it will automatically call operator bool to perform the necessary conversion.
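
As a small self-contained illustration of this conversion, the sketch below uses a stand-in class (the names NamedValue and describe are hypothetical, not part of CGImap.h). It also marks the operator explicit, a C++11 feature that the listing above does not use: the conversion still happens in an if test, but accidental conversions such as int x = p; are rejected.

```cpp
#include <string>
#include <utility>

// Stand-in for CGIpair (hypothetical name): a pair of strings that
// tests as "true" only when its name is non-empty.
struct NamedValue : std::pair<std::string, std::string> {
  NamedValue() {}
  NamedValue(const std::string& n, const std::string& v)
    : std::pair<std::string, std::string>(n, v) {}
  // C++11 'explicit' still permits if(p) but blocks other
  // implicit uses of the conversion:
  explicit operator bool() const { return !first.empty(); }
};

// Returns a description so the effect of the conversion is visible.
std::string describe(const NamedValue& p) {
  if (p)  // The compiler calls operator bool here
    return p.first + " is set";
  return "empty pair";
}
```

A default-constructed NamedValue has an empty name, so describe( ) reports it as an empty pair; any object built with a non-empty name takes the first branch.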

Because the string objects take care of themselves, you don’t need to explicitly define the copy-constructor, operator= or destructor – the default versions synthesized by the compiler do the right thing.


The remainder of the CGIpair class consists of two member functions: decodeURLString( ) and its helper translateHex( ). (Note that translateHex( ) does not guard against bad input such as “%1H”.) decodeURLString( ) moves through the string and replaces each ‘+’ with a space and each hex code (beginning with a ‘%’) with the appropriate character. It’s worth noting, here and in CGImap, the power of the string class: you can index into a string object using operator[ ], and you can use member functions like find( ) and substr( ).

CGImap parses and holds all the name-value pairs submitted from the form as part of a CGI request. You might think that anything that has the word “map” in its name should be inherited from the STL map, but map has its own way of ordering the elements it stores, whereas here it’s useful to keep the elements in the order that they appear on the Web page. So CGImap is inherited from vector<CGIpair>, and operator[ ] is overloaded so you get the associative-array lookup of a map.
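
If you do want protection against input like “%1H”, one possible approach is sketched below. The names hexValue and decodeChecked are hypothetical, and the policy of copying a malformed escape through unchanged is an assumption made for this sketch; rejecting the whole string would be equally defensible.

```cpp
#include <string>

// Returns 0-15 for a valid hex digit, or -1 for anything else.
int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decodes '+' and %XX sequences. A malformed escape such as "%1H",
// or a '%' too close to the end of the string, is copied unchanged.
std::string decodeChecked(const std::string& in) {
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i) {
    if (in[i] == '+') {
      out += ' ';
    } else if (in[i] == '%' && i + 2 < in.size()
               && hexValue(in[i + 1]) >= 0
               && hexValue(in[i + 2]) >= 0) {
      out += static_cast<char>(
        hexValue(in[i + 1]) * 16 + hexValue(in[i + 2]));
      i += 2;  // Move past the two hex digits
    } else {
      out += in[i];  // Ordinary character or malformed escape
    }
  }
  return out;
}
```

Unlike translateHex( ), hexValue( ) checks each range explicitly, so a character such as ‘H’ is reported as invalid instead of being turned into an out-of-range value.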
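
The ordering difference is easy to demonstrate. In this sketch (the typedef Field and both function names are hypothetical) the same fields are stored in a map and in a vector, and the names are read back from each:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> Field;

// Field names in the order a map yields them: sorted by key.
std::vector<std::string> namesViaMap(const std::vector<Field>& fields) {
  std::map<std::string, std::string> m(fields.begin(), fields.end());
  std::vector<std::string> names;
  std::map<std::string, std::string>::const_iterator i;
  for (i = m.begin(); i != m.end(); ++i)
    names.push_back(i->first);
  return names;
}

// Field names in the order a vector yields them: insertion order.
std::vector<std::string> namesViaVector(const std::vector<Field>& fields) {
  std::vector<std::string> names;
  std::vector<Field>::const_iterator i;
  for (i = fields.begin(); i != fields.end(); ++i)
    names.push_back(i->first);
  return names;
}
```

If a form submits the fields name, email and age in that order, namesViaVector( ) returns them unchanged, while namesViaMap( ) returns age, email, name, because map keeps its keys sorted.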

You can also see that CGImap has a copy-constructor and an operator=, but they’re both declared as private. This is to prevent the compiler from synthesizing the two functions (which it will do if you don’t declare them yourself), but it also prevents the client programmer from passing a CGImap by value or from using assignment.
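
With a C++11 or later compiler, the same intent can be stated more directly with deleted functions. The sketch below is a variation under that assumption, not the technique used in the listing above, and the class name QueryFields is hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for CGImap showing the C++11 spelling of
// "no copying": deleted functions instead of private declarations.
class QueryFields {
  std::vector<std::string> fields;
public:
  explicit QueryFields(const std::string& first) {
    fields.push_back(first);
  }
  QueryFields(const QueryFields&) = delete;
  QueryFields& operator=(const QueryFields&) = delete;
  std::size_t size() const { return fields.size(); }
};

// The restriction can be confirmed at compile time:
static_assert(!std::is_copy_constructible<QueryFields>::value,
              "copy-construction is disabled");
static_assert(!std::is_copy_assignable<QueryFields>::value,
              "assignment is disabled");
```

One practical difference is in the diagnostics: an attempt to copy a class with deleted functions is reported at the point of use as a call to a deleted function, whereas the private-declaration idiom reports an access violation, or a link error if the copy is attempted from inside the class itself.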

CGImap’s job is to take the input data and parse it into name-value pairs, which it does with the aid of CGIpair (effectively, CGIpair is only a helper class, but it also seems to make the code easier to understand). After copying the query string (you’ll see where the query string comes from later) into the member string object gq, the constructor calls nextPair( ) repeatedly to parse the string into raw name-value pairs, delimited by ‘=’ and ‘&’ signs. Each resulting CGIpair object is added to the vector using the standard vector::push_back( ). When nextPair( ) runs out of input from the query string, it returns an empty CGIpair, which converts to false and ends the loop.

CGImap::operator[ ] takes the brute-force approach of a linear search through the elements. Since a CGImap is intentionally not sorted and tends to be small, this is not too terrible. The dump( ) function is used for testing, typically by sending information to the resulting Web page, as you might guess from the default value of nl, which is an HTML “break line” token.
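
The splitting step can also be written without consuming the query string as it goes. The following sketch is an alternative to nextPair( ), not a copy of it: the names RawPair and splitQuery are hypothetical, and the values it returns are still URL-encoded.

```cpp
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> RawPair;

// Splits "a=1&b=2" into raw (still URL-encoded) name-value pairs.
// Parsing stops at the first segment that has no '='.
std::vector<RawPair> splitQuery(const std::string& query) {
  std::vector<RawPair> pairs;
  std::string::size_type pos = 0;
  while (pos < query.size()) {
    // The current segment runs from pos up to the next '&',
    // or to the end of the string for the last pair:
    std::string::size_type amp = query.find('&', pos);
    if (amp == std::string::npos) amp = query.size();
    std::string::size_type eq = query.find('=', pos);
    if (eq == std::string::npos || eq >= amp)
      break;  // No '=' inside this segment
    pairs.push_back(RawPair(query.substr(pos, eq - pos),
                            query.substr(eq + 1, amp - eq - 1)));
    pos = amp + 1;
  }
  return pairs;
}
```

Given "name=Bruce+Eckel&email=x%40y.z", splitQuery( ) produces two pairs in form order. Comparing find( ) results against string::npos, rather than against -1, states the “not found” test in the terms the string class itself defines.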

Using GET can be fine for many applications. However, GET passes its data to the CGI program through an environment variable (called QUERY_STRING), and operating systems typically run out of environment space with long GET strings (you should start worrying at about 200 characters). CGI provides a solution for this: POST. With POST, the data is encoded and concatenated the same way as with GET, but POST uses standard input to pass the encoded query string to the CGI program and has no length limitation on the input. All you have to do in your CGI program is determine the length of the query string. This length is stored in the environment variable CONTENT_LENGTH. Once you know the length, you can allocate storage and read the precise number of bytes from standard input. Because POST is the less-fragile solution, you should probably prefer it over GET, unless you know for sure that your input will be short. In fact, one might surmise that the only reason for GET is that it is slightly easier to code a CGI program in C using GET. However, the last class in
