
Advanced PHP Programming
318 Chapter 12 Interacting with Databases
CREATE TABLE forum_entries (
  id int not null auto_increment,
  author varchar(60) not null,
  posted_at timestamp not null default now(),
  data text,
  primary key (id)
);
The posts are ordered by timestamp, and entries can be deleted, so a simple range search based on the posting ID won’t work. A common way I’ve seen the range extraction implemented is as follows:
function returnEntries($start, $numrows)
{
    $entries = array();
    $dbh = new DB_Mysql_Test;
    $query = "SELECT * FROM forum_entries ORDER BY posted_at";
    $res = $dbh->execute($query);
    $i = 0; // position of the current row in the full result set
    while($data = $res->fetch_assoc()) {
        // keep only rows $start through $start + $numrows - 1 (0-based)
        if($i++ < $start || $i > $start + $numrows) {
            continue;
        }
        array_push($entries, new Entry($data));
    }
    return $entries;
}
The major problem with this methodology is that you end up pulling over every single row in forum_entries. Even if you break out of the loop as soon as $i exceeds $start + $numrows, you have still pulled over every row up to that point. When you have 10,000 forum entry postings and are trying to display records 9,980 to 10,000, this will be very, very slow. If your average forum entry is 1KB, running through 10,000 of them will result in 10MB of data being transferred across the network to you. That’s quite a bit of data for the 20 entries that you want.
A better approach is to limit the SELECT statement inside the query itself. In MySQL this is extremely easy; you can simply use a LIMIT clause in the SELECT, as follows:
function returnEntries($start, $numrows)
{
    $entries = array();
    $dbh = new DB_Mysql_Test;
    // LIMIT offset, count: MySQL returns only the rows you will display
    $query = "SELECT * FROM forum_entries ORDER BY posted_at LIMIT :1, :2";
    $res = $dbh->prepare($query)->execute($start, $numrows);
    while($data = $res->fetch_assoc()) {
        array_push($entries, new Entry($data));
    }
    return $entries;
}
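One caveat with binding LIMIT arguments: if your wrapper’s placeholder emulation wraps every value in quotes (an assumption about your wrapper, whose internals are not shown here), MySQL will reject the query, because it does not accept quoted values such as LIMIT '40', '20'. A minimal workaround, sketched below with a hypothetical limitClause() helper and 1-based page numbers, is to force both values to integers and splice them in directly:

```php
<?php
// Hypothetical helper: turn a 1-based page number into a MySQL LIMIT clause.
// Both values are forced to positive integers, so nothing taken from the
// request can inject SQL even though the clause is spliced in directly.
function limitClause($page, $perPage)
{
    $page    = max(1, (int) $page);
    $perPage = max(1, (int) $perPage);
    $offset  = ($page - 1) * $perPage; // MySQL offsets are 0-based
    return sprintf('LIMIT %d, %d', $offset, $perPage);
}

$query = 'SELECT * FROM forum_entries ORDER BY posted_at ' . limitClause(3, 20);
// $query is "SELECT * FROM forum_entries ORDER BY posted_at LIMIT 40, 20"
```

The (int) cast is what makes the direct splice safe: a non-numeric value collapses to 0 and is then clamped to 1.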

Tuning Database Access |
319 |
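The LIMIT syntax is not part of the SQL92 language syntax definition for SQL, so it might not be available on your platform. For example, on Oracle you need to emulate it with ROWNUM. Because Oracle assigns ROWNUM only to rows that pass the WHERE clause, a condition such as rownum BETWEEN 21 AND 40 never matches any row, so the row number has to be captured in a subquery first. The bind values are the first and last row numbers (counting from 1), not an offset and a count:
$query = "SELECT * FROM
            (SELECT a.*, rownum rnum FROM
               (SELECT * FROM forum_entries ORDER BY posted_at) a
             WHERE rownum <= :2)
          WHERE rnum >= :1";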
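The two dialects describe the range differently: MySQL’s LIMIT takes a zero-based offset and a row count, while the ROWNUM query takes the first and last row numbers, counting from 1. A small conversion helper (hypothetical, not part of any library) lets callers such as returnEntries() keep passing an offset and a count regardless of the database:

```php
<?php
// Hypothetical helper: convert a MySQL-style (offset, count) pair into the
// 1-based, inclusive first/last row numbers used by a ROWNUM-based query.
function rownumBounds($offset, $count)
{
    $first = (int) $offset + 1;            // ROWNUM starts at 1, offsets at 0
    $last  = (int) $offset + (int) $count; // inclusive upper bound
    return array($first, $last);
}

list($first, $last) = rownumBounds(40, 20);
// bind $first as :1 and $last as :2
```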
This same argument applies to the fields you select as well. In the case of forum_entries, you most likely need all the fields. In other cases, especially where a table is wide (meaning that it contains a number of large varchar or LOB columns), you should be careful not to request fields you don’t need.
SELECT * is also evil because it encourages writing code that depends on the position of fields in a result row. Field positions are subject to change when a table is altered (for example, when you add or remove a column). Fetching result rows into associative arrays mitigates this problem.
Remember: Any data on which you use SELECT will need to be pulled across the network and processed by PHP. Also, memory for the result set is tied up on both the server and the client. The network and memory costs can be extremely high, so be pragmatic in what you select.
Lazy Initialization
Lazy initialization is a classic tuning strategy that involves not fetching data until you actually need it. This is particularly useful where the data to be fetched is expensive and the fetching is performed only occasionally. A typical example of lazy initialization is lookup tables. If you wanted a complete two-way mapping of ISO country codes to country names, you might create a Countries library that looks like this:
class Countries {
    public static $codeFromName = array();
    public static $nameFromCode = array();

    public static function populate()
    {
        $dbh = new DB_Mysql_Test;
        $query = "SELECT name, countrycode FROM countries";
        $res = $dbh->execute($query)->fetchall_assoc();
        foreach($res as $data) {
            self::$codeFromName[$data['name']] = $data['countrycode'];
            self::$nameFromCode[$data['countrycode']] = $data['name'];
        }
    }
}
Countries::populate();

Here, populate() is called when the library is first loaded, to initialize the table, so every script that includes the library pays for the full query even if it never looks up a country. With lazy initialization, you do not perform the country lookup until you actually need it. Here is an implementation that uses accessor functions that handle the population and caching of results:
class Countries {
    private static $nameFromCodeMap = array();
    private static $codeFromNameMap = array();

    public static function nameFromCode($code)
    {
        // array_key_exists() checks keys (in_array() would search values)
        // and also remembers codes that were not found (stored as null)
        if(!array_key_exists($code, self::$nameFromCodeMap)) {
            $query = "SELECT name FROM countries WHERE countrycode = :1";
            $dbh = new DB_Mysql_Test;
            list($name) = $dbh->prepare($query)->execute($code)->fetch_row();
            self::$nameFromCodeMap[$code] = $name;
            if($name) {
                // the reverse mapping comes for free
                self::$codeFromNameMap[$name] = $code;
            }
        }
        return self::$nameFromCodeMap[$code];
    }

    public static function codeFromName($name)
    {
        if(!array_key_exists($name, self::$codeFromNameMap)) {
            $query = "SELECT countrycode FROM countries WHERE name = :1";
            $dbh = new DB_Mysql_Test;
            list($code) = $dbh->prepare($query)->execute($name)->fetch_row();
            self::$codeFromNameMap[$name] = $code;
            if($code) {
                self::$nameFromCodeMap[$code] = $name;
            }
        }
        return self::$codeFromNameMap[$name];
    }
}
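Because the only expensive part of this pattern is the query, the caching logic can be exercised without a database. The following sketch uses a hypothetical LazyLookup class that replaces the SELECT with a loader callable and counts how many times that loader runs:

```php
<?php
// Hypothetical, database-free version of the lazy lookup above.
// The loader callable stands in for the SELECT against the countries table.
class LazyLookup
{
    private $loader;
    private $valueFromKey = array();
    private $keyFromValue = array();

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function get($key)
    {
        // array_key_exists() looks at keys and also remembers misses (null)
        if (!array_key_exists($key, $this->valueFromKey)) {
            $value = call_user_func($this->loader, $key);
            $this->valueFromKey[$key] = $value;
            if ($value !== null) {
                $this->keyFromValue[$value] = $key; // reverse map for free
            }
        }
        return $this->valueFromKey[$key];
    }

    // Reverse lookup limited to values that get() has already loaded
    public function reverse($value)
    {
        return array_key_exists($value, $this->keyFromValue)
            ? $this->keyFromValue[$value]
            : null;
    }
}

$calls = 0;
$names = new LazyLookup(function ($code) use (&$calls) {
    $calls++; // count simulated queries
    $table = array('US' => 'United States', 'FR' => 'France');
    return isset($table[$code]) ? $table[$code] : null;
});
$first  = $names->get('FR'); // runs the loader
$second = $names->get('FR'); // answered from the cache
```

reverse() only consults what has already been loaded, in the same way that nameFromCode() fills codeFromNameMap as a side effect; a fully lazy reverse lookup needs its own loader, as codeFromName() has.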
Another application of lazy initialization is in tables that contain large fields. For example, my Web logging software uses a table to store entries that looks like this:
CREATE TABLE entries (
  id int(10) unsigned NOT NULL auto_increment,
  title varchar(200) default NULL,
  timestamp int(10) unsigned default NULL,
  body text,
  PRIMARY KEY (id)
);

I have an Active Record pattern class Entry that encapsulates individual rows in this table. There are a number of contexts in which I use the timestamp and title fields of an Entry object but do not need its body. For example, when generating an index of entries on my Web log, I only need their titles and time of posting. Because the body field can be very large, it is silly to pull this data if I do not think I will use it. This is especially true when generating an index, as I may pull tens or hundreds of Entry records at one time.
To avoid this type of wasteful behavior, you can use lazy initialization for body. Here is an example that uses the overloaded attribute accessors __get() and __set() to make the lazy initialization of body completely transparent to the user:
class Entry {
    public $id;
    public $title;
    public $timestamp;
    private $_body;

    public function __construct($id = false)
    {
        if(!$id) {
            return;
        }
        $dbh = new DB_Mysql_Test;
        // fetch everything except the potentially large body column
        $query = "SELECT id, title, timestamp FROM entries WHERE id = :1";
        $data = $dbh->prepare($query)->execute($id)->fetch_assoc();
        $this->id = $data['id'];
        $this->title = $data['title'];
        $this->timestamp = $data['timestamp'];
    }

    public function __get($name)
    {
        if($name == 'body') {
            // first access: fetch the body and cache it in $_body
            // (compare with null so an empty body is not refetched)
            if($this->id && $this->_body === null) {
                $dbh = new DB_Mysql_Test;
                $query = "SELECT body FROM entries WHERE id = :1";
                list($this->_body) =
                    $dbh->prepare($query)->execute($this->id)->fetch_row();
            }
            return $this->_body;
        }
    }

    public function __set($name, $value)
    {
        if($name == 'body') {
            $this->_body = $value;
        }
    }

    /** Active Record update() delete() and insert() omitted below **/
}
When you instantiate an Entry object by id, you get all the fields except for body. As soon as you request body, though, the overloaded accessors fetch it and stash it in the private variable $_body. Using overloaded accessors for lazy initialization is an extremely powerful technique because it can be entirely transparent to the end user, making refactoring simple.
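The same transparency can be demonstrated without a database. In this sketch, the class name LazyEntry and the $bodyLoader callable are hypothetical stand-ins for Entry and its SELECT body query; the counter shows that constructing the object fetches nothing and that the body is loaded exactly once:

```php
<?php
// Hypothetical stand-in for Entry: $bodyLoader replaces the
// "SELECT body FROM entries WHERE id = :1" query.
class LazyEntry
{
    public $id;
    public $title;
    private $_body = null;
    private $bodyLoader;

    public function __construct($id, $title, callable $bodyLoader)
    {
        $this->id = $id;
        $this->title = $title;
        $this->bodyLoader = $bodyLoader;
    }

    public function __get($name)
    {
        if ($name === 'body') {
            if ($this->_body === null) { // first access only
                $this->_body = call_user_func($this->bodyLoader, $this->id);
            }
            return $this->_body;
        }
        trigger_error("Undefined property: $name", E_USER_NOTICE);
        return null;
    }

    public function __set($name, $value)
    {
        if ($name === 'body') {
            $this->_body = $value; // an assigned body is never fetched
        }
    }
}

$fetches = 0;
$entry = new LazyEntry(7, 'Hello', function ($id) use (&$fetches) {
    $fetches++; // count simulated queries
    return "Body of entry $id";
});
$before = $fetches;     // constructing the object fetched nothing
$body   = $entry->body; // triggers the loader
$again  = $entry->body; // served from $_body
```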
Further Reading
The Active Record and Mapper patterns are both taken from Martin Fowler’s excellent Patterns of Enterprise Application Architecture. This is one of my favorite books, and I cannot recommend it enough. It provides whip-smart coverage of design patterns, especially data-to-object mapping patterns.
Database and even SQL tuning are very different from one RDBMS to another. Consult the documentation for your database system, and look for books that get high marks for covering that particular platform.
For MySQL, Jeremy Zawodny and Derek J. Balling’s upcoming High Performance MySQL is set to be the authoritative guide on high-end MySQL tuning. The online MySQL documentation available from http://www.mysql.com is also excellent.
For Oracle, Guy Harrison’s Oracle SQL High-Performance Tuning and Jonathan Lewis’s Practical Oracle 8i: Building Efficient Databases are incredibly insightful texts that no Oracle user should be without.
A good general SQL text is SQL Performance Tuning by Peter Gulutzan and Trudy Pelzer. It focuses on tuning tips that generally coax at least 10% greater performance out of the eight major RDBMSs they cover, including DB2, Oracle, MSSQL, and MySQL.

13
User Authentication and Session Security
We all know that HTTP is the Web protocol, the protocol by which browsers and Web servers communicate. You’ve also almost certainly heard that HTTP is a stateless protocol. The rumors are true: HTTP maintains no state from request to request. HTTP is a simple request/response protocol. The client browser makes a request, the Web server responds to it, and the exchange is over. This means that if I issue an HTTP GET to a Web server and then issue another HTTP GET immediately after that, the HTTP protocol has no way of associating those two events together.
Many people think that so-called persistent connections overcome this and allow state to be maintained. Not true. Although the connection remains established, the requests themselves are handled completely independently.
The lack of state in HTTP poses a number of problems:
- Authentication: Because the protocol does not associate requests, if you authorize a person’s access in Request A, how do you determine whether a subsequent Request B is made by that person or someone else?
- Persistence: Most people use the Web to accomplish tasks. A task by its very nature requires something to change state (otherwise, you did nothing). How do you effect change, in particular multistep change, if you have no state?
An example of a typical Web application that encounters these issues is an online store. The application needs to authenticate the user so that it can know who the user is (since it has personal data such as the user’s address and credit card info). It also needs certain data, such as the contents of a shopping cart, to persist across requests.
The solution to both these problems is to implement the necessary statefulness yourself. This is not as daunting a challenge as it may seem. Networking protocols often consist of stateful layers built on stateless layers and vice versa. For example, HTTP is an application-level protocol (that is, a protocol in which two applications, the browser and the Web server, talk) that is built on TCP.

TCP is a system-level protocol (meaning the endpoints are operating systems) that is stateful. When a TCP session is established between two machines, it is like a conversation. The communication goes back and forth until one party quits. TCP is built on top of IP, which is in turn a stateless protocol. TCP implements its state by passing sequence numbers in its packets. These sequence numbers (plus the network addresses of the endpoints) allow both sides to know if they have missed any parts of the conversation. They also provide a means of authentication, so that each side knows that it is still talking with the same individual. It turns out that if the sequence numbers are easy to guess, it is possible to hijack a TCP session by interjecting yourself into the conversation with the correct sequence numbers. This is a lesson you should keep in mind for later.
Simple Authentication Schemes
The system you will construct in this chapter is essentially a ticket-based system. Think of it as a ski lift ticket. When you arrive at the mountain, you purchase a lift ticket and attach it to your jacket. Wherever you go, the ticket is visible. If you try to get on the lift without a ticket or with a ticket that is expired or invalid, you get sent back to the entrance to purchase a valid ticket. The lift operators take measures to ensure that the lift tickets are not compromised by integrating difficult-to-counterfeit signatures into the passes.
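The difficult-to-counterfeit signature maps naturally onto a keyed hash. Here is a minimal sketch, assuming a hypothetical ticket layout of userid|expiry|signature and a server-side secret key; the function names, the key, and the layout are illustrative and not necessarily the scheme this chapter builds. It also assumes user IDs never contain the | character:

```php
<?php
// Hypothetical secret; a real one must be long, random, and kept server-side.
const SECRET_KEY = 'change-me-to-a-long-random-string';

// Issue a "lift ticket": userid and expiry time, signed with an HMAC.
function issue_ticket($userid, $ttl, $now)
{
    $payload = $userid . '|' . ($now + $ttl);
    return $payload . '|' . hash_hmac('sha256', $payload, SECRET_KEY);
}

// Returns the userid for a valid, unexpired ticket, or false otherwise.
function check_ticket($ticket, $now)
{
    $parts = explode('|', $ticket);
    if (count($parts) !== 3) {
        return false;                     // not a ticket at all
    }
    list($userid, $expires, $sig) = $parts;
    $expected = hash_hmac('sha256', $userid . '|' . $expires, SECRET_KEY);
    if (!hash_equals($expected, $sig)) {  // constant-time comparison
        return false;                     // counterfeit or altered ticket
    }
    if ((int) $expires < $now) {
        return false;                     // expired: back to the ticket window
    }
    return $userid;
}

$now = 1000000;
$ticket = issue_ticket('42', 3600, $now);
```

Because the key never leaves the server, a client that edits the userid or the expiry time cannot produce a matching signature.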
First, you need to be able to examine the credentials of the users. In most cases, this means being passed a username and a password. You can then check this information against the database (or against an LDAP server or just about anything you want). Here is an example of a function that uses a MySQL database to check a user’s credentials:
function check_credentials($name, $password)
{
    $dbh = new DB_Mysql_Prod();
    // bind the user-supplied values instead of interpolating them;
    // interpolation would let a crafted username rewrite the query
    $query = "SELECT userid FROM users WHERE username = :1 AND password = :2";
    $cur = $dbh->prepare($query)->execute($name, $password);
    $row = $cur->fetch_assoc();
    if($row) {
        $userid = $row['userid'];
    }
    else {
        throw new AuthException("user is not authorized");
    }
    return $userid;
}

You can define AuthException to be a transparent wrapper around the base exception class and use it to handle authentication-related errors:
class AuthException extends Exception {}
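As written, check_credentials() compares the password column directly, which means passwords are stored in the clear. If you can store hashes instead, the comparison moves into PHP. This sketch assumes a hypothetical $users array in place of the users table and a hypothetical function name, and it uses password_hash() and password_verify(), which require PHP 5.5 or later:

```php
<?php
class AuthException extends Exception {}

// Hypothetical variant of check_credentials() that verifies a stored hash.
// $users stands in for rows of the users table, keyed by username.
function check_credentials_hashed($name, $password, array $users)
{
    if (!isset($users[$name])) {
        throw new AuthException('user is not authorized');
    }
    $row = $users[$name];
    // password_verify() re-hashes $password with the salt stored in the hash
    if (!password_verify($password, $row['password_hash'])) {
        throw new AuthException('user is not authorized');
    }
    return $row['userid'];
}

$users = array(
    'george' => array(
        'userid' => 1,
        'password_hash' => password_hash('s3cret', PASSWORD_DEFAULT),
    ),
);
```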
Checking credentials is only half the battle. You need a scheme for managing authentication as well. You have three major candidates for authentication methods: HTTP Basic Authentication, query string munging, and cookies.
HTTP Basic Authentication
Basic Authentication is an authentication scheme that is integrated into HTTP. When a server receives an unauthorized request for a page, it responds with this header:
WWW-Authenticate: Basic realm="RealmFoo"
In this header, RealmFoo is an arbitrary name assigned to the namespace that is being protected. The client then responds with a Base64-encoded username/password to be authenticated. Basic Authentication is what pops up the username/password window on a browser for many sites. Basic Authentication has largely fallen by the wayside with the wide adoption of cookies by browsers. The major benefit of Basic Authentication is that because it is an HTTP-level scheme, it can be used to protect all the files on a site, not just PHP scripts. This is of particular interest to sites that serve video/audio/images to members only because it allows access to the media files to be authenticated as well. In PHP, the Basic Authentication username and password are passed into the script as $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'], respectively.
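To see how little protection the encoding offers, here is what the client actually sends: an Authorization header whose value is Basic followed by the Base64 encoding of username:password. Base64 is an encoding, not encryption, so anyone who can read the header can recover the password. The helper names below are hypothetical; when PHP runs as an Apache module, it performs the equivalent decoding when it fills in PHP_AUTH_USER and PHP_AUTH_PW:

```php
<?php
// Build the header a browser sends after the user fills in the login window.
function basic_auth_header($user, $password)
{
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

// Split a header value back into array(user, password); false if malformed.
function parse_basic_auth($headerValue)
{
    if (strncasecmp($headerValue, 'Basic ', 6) !== 0) {
        return false;
    }
    $decoded = base64_decode(substr($headerValue, 6), true); // strict mode
    if ($decoded === false || strpos($decoded, ':') === false) {
        return false;
    }
    // Only the first colon separates the fields; passwords may contain colons
    return explode(':', $decoded, 2);
}

$header = basic_auth_header('Aladdin', 'open sesame');
```

The sample credentials are the classic example used in the HTTP authentication specifications.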
The following is an example of an authentication function that uses Basic Authentication:
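function check_auth()
{
    // PHP_AUTH_USER and PHP_AUTH_PW are unset until the browser sends credentials
    $user = isset($_SERVER['PHP_AUTH_USER']) ? $_SERVER['PHP_AUTH_USER'] : '';
    $pass = isset($_SERVER['PHP_AUTH_PW']) ? $_SERVER['PHP_AUTH_PW'] : '';
    try {
        check_credentials($user, $pass);
    }
    catch (AuthException $e) {
        // challenge the browser, which then prompts for a username and password
        header('WWW-Authenticate: Basic realm="RealmFoo"');
        header('HTTP/1.0 401 Unauthorized');
        exit;
    }
}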
Query String Munging
In query string munging, your credentials are added to the query string for every request. This is the way a number of Java-based session wrappers work, and it is supported by PHP’s session module as well.
I intensely dislike query string munging. First, it produces horribly long and ugly URLs. Session information can get quite long, and appending another 100 bytes of data to an otherwise elegant URL is just plain ugly. This is more than a simple issue of aesthetics. Many search engines do not cache dynamic URLs (that is, URLs with query string parameters), and long URLs are difficult to cut and paste: they often get line-broken by whatever tool you may happen to be using, making them inconvenient for conveyance over IM and email.
Second, query string munging is a security problem because it allows a user’s session parameters to be easily leaked to other users. A simple cut and paste of a URL that contains a session ID allows other users to hijack (sometimes unintentionally) another user’s session.
I don’t discuss this technique in greater depth except to say that there is almost always a more secure and more elegant solution.
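The mechanism itself is simple, which is part of its appeal. The sketch below uses a hypothetical append_session_id() helper; PHP’s session module can rewrite links in a similar way when session.use_trans_sid is enabled. The session ID becomes part of the URL itself, so whoever receives the link receives the session:

```php
<?php
// Hypothetical helper: append a session ID to a link, as query string
// munging does for every URL on a page. URL fragments (#...) are ignored.
function append_session_id($url, $name, $sid)
{
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . rawurlencode($name) . '=' . rawurlencode($sid);
}

$link = append_session_id('/cart.php?item=17', 'PHPSESSID', 'a1b2c3d4e5');
// Pasting $link into an email hands the session to the recipient.
```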
Cookies
Starting with Netscape Navigator in 1994, browsers began to offer support for cookies. The following is a quote from the Netscape cookie specification:
A server, when returning an HTTP object to a client, may also send a piece of state information which the client will store. Included in that state object is a description of the range of URLs for which that state is valid. Any future HTTP requests made by the client which fall in that range will include a transmittal of the current value of the state object from the client back to the server. The state object is called a cookie, for no compelling reason.
Cookies provide an invaluable tool for maintaining state between requests. More than just a way of conveying credentials and authorizations, cookies can be effectively used to pass large and arbitrary state information between requests—even after the browser has been shut down and restarted.
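The range of URLs mentioned in the specification is expressed through a cookie’s domain and path attributes. As a simplified sketch of the matching rule, here is a hypothetical cookie_applies() function that ignores expiry, the secure flag, and the additional rules modern browsers apply. A cookie is sent when the request host tail-matches the cookie’s domain and the request path begins with the cookie’s path:

```php
<?php
// Simplified, hypothetical check of whether a browser would send a cookie
// with a request for $host and $path.
function cookie_applies($cookieDomain, $cookiePath, $host, $path)
{
    $cookieDomain = strtolower(ltrim($cookieDomain, '.'));
    $host = strtolower($host);
    $suffix = '.' . $cookieDomain;

    // Tail match: "www.example.com" matches "example.com",
    // but "badexample.com" does not.
    $domainOk = ($host === $cookieDomain)
        || substr($host, -strlen($suffix)) === $suffix;

    // Path match: the request path must start with the cookie's path.
    $pathOk = strncmp($path, $cookiePath, strlen($cookiePath)) === 0;

    return $domainOk && $pathOk;
}
```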
In this chapter you will implement an authentication scheme by using cookies. Cookies are the de facto standard for transparently passing information with HTTP requests. These are the major benefits of cookies over Basic Authentication:
- Versatility: Cookies provide an excellent means for passing around arbitrary information between requests. Basic Authentication is, as its name says, basic.
- Persistence: Cookies can be set to remain resident in a user’s browser between sessions. Many sites take advantage of this to enable transparent, or automatic, login based on the cookied information. Clearly this setup has security ramifications, but many sites make the security sacrifice to take advantage of the enhanced usability. Of course users can set their cookie preferences to refuse cookies from your site. It’s up to you how much effort you want to apply to people who use extremely paranoid cookie policies.
- Aesthetic: Basic Authentication is the method that causes a browser to pop up that little username/password window. That window is unbranded and unstyled, and this is unacceptable in many designs. When you use a homegrown method, you have greater flexibility.

The major drawback of cookie-based authentication is that it does not easily extend to non-PHP pages. To allow Apache to read and understand the information in cookies, you need to have an Apache module that can parse and read the cookies. If a Basic Authentication implementation in PHP employs any complex logic at all, you are stuck in a similar situation. So cookies aren’t so limiting after all.
Authentication Handlers Written in PHP
In PHP 5 there is an experimental SAPI called apache_hooks that allows you to author entire Apache modules in PHP. This means that you can implement an Apache-level authentication handler that can apply your authentication logic to all requests, not just PHP pages. When this is stable, it provides an easy way to seamlessly implement arbitrarily complex authentication logic consistently across all objects on a site.
Registering Users
Before you can go about authenticating users, you need to know who the users are. Minimally, you need a username and a password for a user, although it is often useful to collect more information than that. Many people concentrate on the nuances of good password generation (which, as we discuss in the next section, is difficult but necessary) without ever considering the selection of unique identifiers.
I’ve personally had very good success using email addresses as unique identifiers for users in Web applications. The vast majority of users (computer geeks aside) use a single address. That address is also usually used exclusively by that user. This makes it a perfect unique identifier for a user. If you use a closed-loop confirmation process for registration (meaning that you send the user an email message that he or she must act on to complete registration), you can ensure that the email address is valid and belongs to the registering user.
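The closed-loop step boils down to an unguessable token that travels only by email. Here is a minimal sketch, assuming hypothetical start_registration() and confirm_registration() functions and an in-memory array where a real site would use a table; random_bytes() requires PHP 7 or later:

```php
<?php
// Create a random token for a pending registration; the caller mails it,
// e.g. as part of a confirmation link.
function start_registration(array &$pending, $email)
{
    $token = bin2hex(random_bytes(16)); // 128 random bits as 32 hex chars
    $pending[$email] = $token;
    return $token;
}

// Activate the account only if the same token comes back.
function confirm_registration(array &$pending, $email, $token)
{
    if (!isset($pending[$email]) || !hash_equals($pending[$email], $token)) {
        return false;
    }
    unset($pending[$email]); // each token can be used only once
    return true;
}

$pending = array();
$token = start_registration($pending, 'user@example.com');
```

Only someone who can read mail sent to the address can complete the loop, which is what ties the account to that address.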
Collecting email addresses also allows you to communicate more effectively with your users. If they opt in to receive mail from you, you can send them periodic updates on what is happening with your sites, and being able to send a freshly generated password to a user is critical for password recovery. All these tasks are cleanest if there is a one-to-one correspondence of users and email addresses.
Protecting Passwords
Users choose bad passwords. It’s part of human nature. Numerous studies have confirmed that if they are allowed to, most users will create a password that can be guessed in short order.
A dictionary attack is an automated attack against an authentication system. The cracker commonly uses a large file of potential passwords (say all two-word combinations of words in the English language) and tries to log in to a given user account with each in succession. This sort of attack does not work against random passwords, but it is incredibly effective against accounts where users can choose their own passwords.
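One countermeasure is to refuse, at registration time, exactly the passwords such an attack tries first. The sketch below uses a hypothetical is_dictionary_password() function with a toy word list (a real check would load a large dictionary) and rejects any single word and any two words run together:

```php
<?php
// Hypothetical check: is $password a dictionary word, or two words joined?
function is_dictionary_password($password, array $words)
{
    $password = strtolower($password);
    $lookup = array_flip(array_map('strtolower', $words)); // word => index
    if (isset($lookup[$password])) {
        return true;
    }
    // Try every split point: "bluesky" -> "b"+"luesky", "bl"+"uesky", ...
    for ($i = 1; $i < strlen($password); $i++) {
        if (isset($lookup[substr($password, 0, $i)], $lookup[substr($password, $i)])) {
            return true;
        }
    }
    return false;
}

$words = array('blue', 'sky', 'skies', 'dragon', 'monkey');
```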