Advanced PHP Programming


358 Chapter 14 Session Handling

Here is a simple script that uses sessions to track the number of times the visitor has seen this page:

<?php
session_start();
// array keys must be quoted strings; a bare viewnum is treated as a constant
if (isset($_SESSION['viewnum'])) {
    $_SESSION['viewnum']++;
} else {
    $_SESSION['viewnum'] = 1;
}
?>
<html>
<body>
Hello There.<br>
This is <?= $_SESSION['viewnum'] ?> times you have seen a page on this site.<br>
</body>
</html>

session_start() initializes the session, reading in the session ID from either the specified cookie or through a query parameter. When session_start() is called, the data store for the specified session ID is accessed, and any $_SESSION variables set in previous requests are reinstated. When you assign to $_SESSION, the variable is marked to be serialized and stored via the session storage method at request shutdown.

If you want to flush all your session data before the request terminates, you can force a write by using session_write_close(). One reason to do this is that the built-in session handlers provide locking (for integrity) around access to the session store. If you are using sessions in multiple frames on a single page, the user’s browser will attempt to fetch them in parallel; but the locks will force this to occur serially, meaning that the frames with session calls in them will be loaded and rendered one at a time.
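As a minimal sketch of releasing the lock early, a frame can finish its session work first and call session_write_close() before any slower rendering. The helper name recordViewAndReleaseLock() is hypothetical; it is not part of PHP or of this chapter's code.

```php
<?php
// Hypothetical helper: update the view counter, then flush the session
// and release its lock so parallel requests are not serialized behind us.
function recordViewAndReleaseLock(): int
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();
    }
    $_SESSION['viewnum'] = ($_SESSION['viewnum'] ?? 0) + 1;
    $views = $_SESSION['viewnum'];

    // Writes the session data now instead of at request shutdown.
    // Later changes to $_SESSION in this request are not saved.
    session_write_close();

    return $views;
}
```

Anything the page does after this call (database queries, template rendering) no longer holds the session lock, so other frames that use the same session can proceed.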

Sometimes you might want to permanently end a session. For example, with a shopping cart application that uses a collection of session variables to track items in the cart, when the user has checked out, you might want to empty the cart and destroy the session. Implementing this with the default handlers is a two-step process:

...
// clear the $_SESSION globals
$_SESSION = array();
// now destroy the session backing
session_destroy();
...

While the order in which you perform these two steps does not matter, it is necessary to perform both. session_destroy() clears the backing store to the session, but if you do not unset $_SESSION, the session information will be stored again at request shutdown.

You might have noticed that we have not discussed how this session data is managed internally in PHP. You have seen in Chapters 9, “External Performance Tunings,” 10, “Data Component Caching,” and 11, “Computational Reuse,” that it is easy to quickly amass a large cache in a busy application. Sessions are not immune to this problem and require cleanup as well. The session extension takes a probabilistic approach to garbage collection. On every request, it has a certain probability of invoking its internal garbage-collection routines to maintain the session cache. The probability that the garbage collector is invoked is set with this php.ini setting:

; sets the probability of garbage collection on a given request to 1%
session.gc_probability=1

The garbage collector also needs to know how old a session must be before it is eligible for removal. This is also set with a php.ini setting (and it defaults to 1,440 seconds, that is, 24 minutes):

; sessions can be collected after 15 minutes (900 seconds)
session.gc_maxlifetime=900

Figure 14.1 shows the actions taken by the session extension during normal operation. The session handler starts up, initializes its data, performs garbage collection, and reads the user’s session data. Then the page logic after session_start() is processed. The script may use or modify the $_SESSION array as it sees fit. When the session is shut down, the information is written back to disk and the session extension’s internals are cleaned up.

[Figure 14.1 (diagram): startup and garbage collection; initialize $_SESSION array based on user's SID; user code logic manipulates $_SESSION; session data is stored back to non-volatile storage; shutdown and internal cleanup.]

Figure 14.1 Handler callouts for a session handler.


Custom Session Handler Methods

It seems a shame to invest so much effort in developing an authentication system and not tie it into your session data propagation. Fortunately, the session extension provides the session_id function, which allows for setting custom session IDs, meaning that you can integrate it directly into your authentication system.

If you want to tie each user to a unique session, you can simply use each user’s user ID as the session ID. Normally this would be a bad idea from a security standpoint because it would provide a trivially guessable session ID that is easy to exploit; however, in this case you will never transmit or read the session ID from a plaintext cookie; you will grab it from your authentication cookie.

To extend the authentication example from Chapter 13, you can change the page visit counter to this:

<?php
try {
    $cookie = new Cookie();
    $cookie->validate();
    // the session ID must be set before session_start() is called
    session_id($cookie->userid);
    session_start();
}
catch (AuthException $e) {
    header("Location: /login.php?originating_uri=".$_SERVER['REQUEST_URI']);
    exit;
}
if (isset($_SESSION['viewnum'])) {
    $_SESSION['viewnum']++;
} else {
    $_SESSION['viewnum'] = 1;
}
?>
<html>
<body>
Hello There.<br>
This is <?= $_SESSION['viewnum'] ?> times you have seen a page on this site.<br>
</body>
</html>

Note that you set the session ID before you call session_start(). This is necessary for the session extension to behave correctly. As the example stands, the user’s user ID will be sent in a cookie (or in the query string) on the response. To prevent this, you need to disable both cookies and query munging in the php.ini file:

session.use_cookies=0

session.use_trans_sid=0


And for good measure (even though you are manually setting the session ID), you need to use this:

session.use_only_cookies=1

These settings disable all the session extension’s methods for propagating the session ID to the client’s browser. Instead, you can rely entirely on the authentication cookies to carry the session ID.

If you want to allow multiple sessions per user, you can simply augment the authentication cookie to contain an additional property, which you can set whenever you want to start a new session (on login, for example). Allowing multiple sessions per user is convenient for accounts that may be shared; otherwise, the two users’ experiences may become merged in strange ways.
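One way this could look, as a sketch only: combine the user ID with the extra cookie property to form the session ID. The function name buildSessionId() and the idea of an "instance" token are assumptions for illustration; the Cookie class from Chapter 13 does not necessarily expose such a property. The character check reflects the fact that the files handler only accepts session IDs made of letters, digits, commas, and minus signs.

```php
<?php
// Hypothetical helper: derive a per-login session ID from the user ID plus
// an extra token that is regenerated whenever a new session should begin.
function buildSessionId($userid, $instance)
{
    $id = $userid . '-' . $instance;

    // The files save handler allows only a-z, A-Z, 0-9, ',' and '-' in IDs.
    if (!preg_match('/^[A-Za-z0-9,-]+$/', $id)) {
        throw new InvalidArgumentException('Invalid characters in session ID');
    }
    return $id;
}
```

The result would be passed to session_id() before session_start(), in the same place the plain user ID is used in the example above.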

Note

We discussed this at length in Chapter 13, but it bears repeating: Unless you are absolutely unconcerned about sessions being hijacked or compromised, you should always encrypt session data by using strong cryptography. Using ROT13 on your cookie data is a waste of time. You should use a proven symmetric cipher such as Triple DES, AES, or Blowfish. This is not paranoia—just simple common sense.

Now that you know how to use sessions, let’s examine the handlers by which they are implemented. The session extension is basically a set of wrapper functions around multiple storage back ends. The method you choose does not affect how you write your code, but it does affect the applicability of the code to different architectures. The session handler to be used is set with this php.ini setting:

session.save_handler=files

PHP has two prefabricated session handlers:

- files: The default, files uses an individual file for storing each session.

- mm: This is an implementation that uses BSD shared memory, available only if you have libmm installed and build PHP by using the --with-mm configure flag.

We’ve looked at methods similar to these in Chapters 9, 10, and 11. They work fine if you are running on a single machine, but they don’t scale well with clusters. Of course, unless you are running an extremely simple setup, you probably don’t want to be using the built-in handlers anyway. Fortunately, there are hooks for userspace session handlers, which allow you to implement your own session storage functions in PHP. You can set them by using session_set_save_handler(). If you want to have distributed sessions that don’t rely on sticky connections, you need to implement them yourself.

The user session handlers work by calling out for six basic storage operations:

- open

- close

- read

- write

- destroy

- gc

For example, you can implement a MySQL-backed session handler. This will give you the ability to access consistent session data from multiple machines.

The table schema is simple, as illustrated in Figure 14.2. The session data is keyed by session_id. The serialized contents of $_SESSION will be stored in session_data. You use the CLOB (character large object) column type text so that you can store arbitrarily large amounts of session data. modtime allows you to track the modification time for session data for use in garbage collection.
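A table matching that description might be declared roughly as follows. This is a sketch: the column names come from the text, but the VARCHAR length and the DATETIME type for modtime are assumptions. The primary key on session_id matters because REPLACE INTO relies on a primary or unique key to decide which row to overwrite.

```php
<?php
// Assumed DDL for the sessions table; only the column names and the use of
// a TEXT column for session_data are taken from the chapter.
$createSessionsTable = <<<SQL
CREATE TABLE sessions (
    session_id   VARCHAR(64) NOT NULL PRIMARY KEY,
    session_data TEXT,
    modtime      DATETIME NOT NULL
)
SQL;
```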

[Figure 14.2 (diagram): the stages of Figure 14.1 annotated with the callouts session_open, session_read, session_gc, session_write, and session_close. One group of callouts is labeled "called together by session_start()" and the other "called automatically at session end".]

Figure 14.2 An updated copy of Figure 14.1 that shows how the callouts fit into the session life cycle.


For clean organization, you can put the custom session handlers in the MySession class:

class MySession {
    static $dbh;

MySession::open is the session opener. This function must be prototyped to accept two arguments: $save_path and $session_name. $save_path is the value of the php.ini parameter session.save_path. For the files handler, this is the root of the session data caching directory. In a custom handler, you can set this parameter to pass in location-specific data as an initializer to the handler. $session_name is the name of the session (as specified by the php.ini parameter session.name). If you maintain multiple named sessions in distinct hierarchies, this might prove useful. For this example, you do not care about either of these, so you can simply ignore both passed parameters and open a handle to the database, which you can store for later use. Note that because open is called in session_start() before cookies are sent, you are not allowed to generate any output to the browser here unless output buffering is enabled. You can return true at the end to indicate to the session extension that the open() function completed correctly:

    function open($save_path, $session_name) {
        MySession::$dbh = new DB_MySQL_Test();
        return(true);
    }

MySession::close is called to clean up the session handler when a request is complete and data is written. Because you are using persistent database connections, you do not need to perform any cleanup here. If you were implementing your own file-based solution or any other nonpersistent resource, you would want to make sure to close any resources you may have opened. You return true to indicate to the session extension that close() completed correctly:

    function close() {
        return(true);
    }

MySession::read is the first handler that does real work. You look up the session by using $id and return the resulting data. If you look at the data that you are reading from, you see session_data, like this:

count|i:5;

This should look extremely familiar to anyone who has used the functions serialize() and unserialize(). It looks a great deal like the output of the following:

<?php
$count = 5;
print serialize($count);
?>

> php ser.php
i:5;

This isn’t a coincidence: The session extension uses the same internal serialization routines as serialize() and unserialize().
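The relationship can be sketched in a few lines. Assuming the default session.serialize_handler (php), each top-level $_SESSION entry is stored as the key name, a | separator, and the serialize() output of its value. The function name encodeSessionLike() is hypothetical and the sketch ignores edge cases such as keys that contain a | character.

```php
<?php
// Hypothetical illustration of the default session storage format:
// "<key>|<serialize(value)>" repeated for each top-level key.
function encodeSessionLike(array $vars): string
{
    $out = '';
    foreach ($vars as $key => $value) {
        $out .= $key . '|' . serialize($value);
    }
    return $out;
}

echo encodeSessionLike(['count' => 5]), "\n"; // prints count|i:5;
```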

After you have selected your session data, you can return it in serialized form. The session extension itself handles unserializing the data and reinstantiating $_SESSION:

    function read($id) {
        $result = MySession::$dbh->prepare("SELECT session_data FROM sessions
                                            WHERE session_id = :1")->execute($id);
        $row = $result->fetch_assoc();
        // return an empty string when no row exists yet for this session ID
        return $row ? $row['session_data'] : '';
    }

MySession::write is the companion function to MySession::read. It takes the session ID $id and the session data $sess_data and handles writing it to the backing store. Much as you had to hand back serialized data from the read function, you receive preserialized data as a string here. You also make sure to update your modification time so that you are able to accurately dispose of idle sessions:

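    function write($id, $sess_data) {
        // mysql_escape_string() was removed in PHP 7; on newer versions,
        // escape through your database layer or bind the values instead
        $clean_data = mysql_escape_string($sess_data);
        MySession::$dbh->execute("REPLACE INTO
                                  sessions
                                  (session_id, session_data, modtime)
                                  VALUES('$id', '$clean_data', now())");
        // tell the session extension that the write succeeded
        return true;
    }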

MySession::destroy is the function called when you use session_destroy(). You use this function to expire an individual session by removing its data from the backing store. Although it is inconsistent with the built-in handlers, you also need to destroy the contents of $_SESSION. Whether done inside the destroy function or after it, it is critical that you destroy $_SESSION to prevent the session from being re-registered automatically.

Here is a simple destructor function:

    function destroy($id) {
        MySession::$dbh->execute("DELETE FROM sessions
                                  WHERE session_id = '$id'");
        $_SESSION = array();
        // tell the session extension that the destroy succeeded
        return true;
    }

Finally, you have the garbage-collection function, MySession::gc. The garbage-collection function is passed in the maximum lifetime of a session in seconds, which is the value of the php.ini setting session.gc_maxlifetime. As you’ve seen in previous chapters, intelligent and efficient garbage collection is not trivial. We will take a closer look at the efficiency of various garbage-collection methods in the following sections. Here is a simple garbage-collection function that removes any sessions older than the specified $maxlifetime:

    function gc($maxlifetime) {
        $ts = time() - $maxlifetime;
        // FROM_UNIXTIME() converts the Unix timestamp to a MySQL datetime
        MySession::$dbh->execute("DELETE FROM sessions
                                  WHERE modtime < from_unixtime($ts)");
        return true;
    }
}
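To see how the six callouts fit together without a database, here is a minimal in-memory sketch. It is not the MySession class: ArraySessionHandler and installSessionHandler() are hypothetical names, the store is a plain PHP array that only lives for one request, and it uses the object form of session_set_save_handler() (available since PHP 5.4) with the built-in SessionHandlerInterface rather than individual callbacks.

```php
<?php
// Hypothetical stand-in for a real backing store: keeps sessions in an array.
class ArraySessionHandler implements SessionHandlerInterface
{
    private $store = []; // session ID => ['data' => string, 'modtime' => int]

    public function open($path, $name): bool
    {
        return true; // nothing to connect to
    }

    public function close(): bool
    {
        return true; // nothing to release
    }

    public function read($id): string
    {
        // Must return a string; empty means "no data for this session yet".
        return isset($this->store[$id]) ? $this->store[$id]['data'] : '';
    }

    public function write($id, $data): bool
    {
        // $data arrives already serialized by the session extension.
        $this->store[$id] = ['data' => $data, 'modtime' => time()];
        return true;
    }

    public function destroy($id): bool
    {
        unset($this->store[$id]);
        return true;
    }

    public function gc($max_lifetime): int
    {
        // Remove sessions not modified within $max_lifetime seconds.
        $cutoff = time() - $max_lifetime;
        $removed = 0;
        foreach ($this->store as $id => $row) {
            if ($row['modtime'] < $cutoff) {
                unset($this->store[$id]);
                $removed++;
            }
        }
        return $removed;
    }
}

// Hypothetical wrapper: must run before session_start().
function installSessionHandler(SessionHandlerInterface $handler): bool
{
    return session_set_save_handler($handler, true);
}
```

A page would call installSessionHandler(new ArraySessionHandler()) and then session_start(); a database-backed handler would follow the same shape with queries in place of the array operations.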

Garbage Collection

Garbage collection is tough. Overaggressive garbage-collection efforts can consume large amounts of resources. Underaggressive garbage-collection methods can quickly overflow your cache. As you saw in the preceding section, the session extension handles garbage collection by calling the save handler’s gc function every so often. A simple probabilistic algorithm helps ensure that sessions get collected, even if children are short-lived.

In the php.ini file, you set session.gc_probability. When session_start() is called, a random number between 0 and session.gc_divisor (default 100) is generated, and if it is less than session.gc_probability, the garbage-collection function for the installed save handler is called. Thus, if session.gc_probability is set to 1, the garbage collector will be called on 1% of requests, that is, once every 100 requests on average.
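The arithmetic can be sketched as follows. This illustrates the idea only and is not the extension's actual C code; the function names gcChance() and shouldCollect() are hypothetical, and mt_rand() stands in for the extension's internal random number generator.

```php
<?php
// Chance that any single request triggers garbage collection.
function gcChance(int $probability, int $divisor): float
{
    if ($divisor <= 0) {
        throw new InvalidArgumentException('divisor must be positive');
    }
    return $probability / $divisor;
}

// Simulates the per-request decision: draw a number in [0, divisor)
// and collect when it falls below the configured probability.
function shouldCollect(int $probability, int $divisor): bool
{
    return mt_rand(0, $divisor - 1) < $probability;
}

// With gc_probability=1 and gc_divisor=100 the chance is 1/100,
// so a collection is expected about once per 100 requests.
$chance = gcChance(1, 100);
```

Setting the probability to 0 means shouldCollect() never returns true, which corresponds to switching off the built-in trigger entirely.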

Garbage Collection in the files Handler

In a high-volume application, garbage collection in the files session handler is an extreme bottleneck. The garbage-collection function, which is implemented in C, basically looks like this:

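function files_gc_collection($cachedir, $maxlifetime)
{
    $now = time();
    $dir = opendir($cachedir);
    while (($file = readdir($dir)) !== false) {
        // skip anything that is not a session file
        if (strncmp("sess_", $file, 5)) {
            continue;
        }
        // remove session files that have not been modified recently enough
        if ($now - filemtime($cachedir."/".$file) > $maxlifetime) {
            unlink($cachedir."/".$file);
        }
    }
    closedir($dir);
}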

The issue with this cleanup function is that extensive input/output (I/O) must be performed on the cache directory. Constantly scanning that directory can cause serious contention.


One solution for this is to turn off garbage collection in the session extension completely (by setting session.gc_probability = 0) and then implement a scheduled job such as the preceding function, which performs the cleanup completely asynchronously.
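A scheduled job along these lines might look like the sketch below, which could be run from cron with the PHP command-line interpreter. The function name purgeStaleSessionFiles() is hypothetical, and the sketch assumes the default sess_ file prefix and a flat session directory (no subdirectory levels configured in session.save_path).

```php
<?php
// Hypothetical cleanup routine for the files handler's session directory.
// Returns the number of session files removed.
function purgeStaleSessionFiles(string $dir, int $maxlifetime): int
{
    $removed = 0;
    $now = time();

    // glob() returns false on error, so fall back to an empty list.
    foreach (glob($dir . '/sess_*') ?: [] as $path) {
        if (is_file($path) && $now - filemtime($path) > $maxlifetime) {
            // Another process may have removed the file already; ignore that.
            if (@unlink($path)) {
                $removed++;
            }
        }
    }
    return $removed;
}

// Example invocation from a cron-run script (path and lifetime are
// assumptions; adjust them to match session.save_path and
// session.gc_maxlifetime):
// purgeStaleSessionFiles('/var/lib/php/sessions', 1440);
```

Because the job runs outside the request cycle, its directory scan no longer competes with page requests for time, and its schedule can be tuned independently of traffic.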

Garbage Collection in the mm Handler

In contrast to garbage collection in the files handler, garbage collection in the mm handler is quite fast. Because the data is all stored in shared memory, the process simply needs to take a lock on the memory segment and then walk the session hash in memory and expunge stale session data.

Garbage Collection in the MySession Handler

So how does the garbage collection in the MySession handler stack up against garbage collection in the files and mm handlers? It suffers from the same problems as the files handler. In fact, the problems are even worse for the MySession handler.

With the MyISAM storage engine, MySQL requires an exclusive table lock to perform deletes. With high-volume traffic, this can cause serious contention as multiple processes attempt to maintain the session store simultaneously while everyone else is attempting to read and update their session information. Fortunately, the solution from the files handler works equally well here: You can simply disable the built-in garbage-collection trigger and implement cleanup as a scheduled job.

Choosing Between Client-Side and Server-Side Sessions

In general, I prefer client-side managed sessions for systems where the amount of session data is relatively small. The magic number I use as “relatively small” is 1KB of session data. Below 1KB of data, it is still likely that the client’s request will fit into a single network packet. (It is likely below the path maximum transmission unit [MTU] for all intervening links.) Keeping the HTTP request inside a single packet means that the request will not have to be fragmented (on the network level), and this reduces latency.

When choosing a server-side session-management strategy, be very conscious of your data read/update volumes. It is easy to overload a database-backed session system on a high-traffic site. If you do decide to go with such a system, use it judiciously—only update session data where it needs to be updated.

Implementing Native Session Handlers

If you would like to take advantage of the session infrastructure but are concerned about the performance impact of having to run user code, writing your own native session handler in C is surprisingly easy. Chapter 22, “Detailed Examples and Applications,” demonstrates how to implement a custom session extension in C.

15

Building a Distributed Environment

Until now we have largely danced around the issue of Web clusters. Most of the solutions so far in this book have worked under the implicit assumption that we were running a single Web server for the content. Many of those coding methods and techniques work perfectly well as you scale past one machine. A few techniques were designed with clusters in mind, but the issues of how and why to build a Web cluster were largely ignored. In this chapter we’ll address these issues.

What Is a Cluster?

A group of machines all serving an identical purpose is called a cluster. Similarly, an application or a service is clustered if any component of the application or service is served by more than one server.

Figure 15.1 does not meet this definition of a clustered service, even though there are multiple machines, because each machine has a unique role that is not filled by any of the other machines.

Figure 15.2 shows a simple clustered service. This example has two front-end machines that are load-balanced via round-robin DNS. Both Web servers actively serve identical content.

There are two major reasons to move a site past a single Web server:

- Redundancy: If your Web site serves a critical purpose and you cannot afford even a brief outage, you need to use multiple Web servers for redundancy. No matter how expensive your hardware is, it will eventually fail, need to be replaced, or need physical maintenance. Murphy’s Law applies to IT at least as much as to any industry, so you can be assured that any unexpected failures will occur at the least convenient time. If your service has particularly high uptime requirements,
