
Advanced PHP Programming
268 Chapter 10 Data Component Caching
            $this->author = $row['author'];
            $this->long_description = $row['long_description'];
            $this->file_url = $row['file_url'];
        }
        else {
            throw new Exception;
        }
    }
}
You can use a store() method for saving any changes to a project back to the database:
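public function store() {
    $dbh = new DB_Mysql_Test();
    // Placeholders require a prepared statement. REPLACE does not accept a
    // WHERE clause, so the key column (name) is set like any other column;
    // this assumes name is the table's primary or unique key.
    $cur = $dbh->prepare("
        REPLACE INTO projects
        SET
            short_description = :1,
            author = :2,
            long_description = :3,
            file_url = :4,
            name = :5");
    $cur->execute($this->short_description,
                  $this->author, $this->long_description, $this->file_url, $this->name);
}
}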
Because you are writing out cache files, you need to know where to put them. You can create a place for them by using the global configuration variable $CACHEBASE, which specifies the top-level directory into which you will place all your cache files. Alternatively, you could create a global singleton Config class that will contain all your configuration parameters. In Project, you add a static class method, get_cachefile(), to generate the path to the cache file for a specific project:
public static function get_cachefile($name) {
    global $CACHEBASE;
    return "$CACHEBASE/projects/$name.cache";
}
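Because $name arrives straight from the query string, a path built this way would also accept a value such as ../../tmp/x and write outside the cache directory. A minimal sketch of a guarded variant follows; project_cachefile() is a hypothetical helper, the cache base is passed in explicitly rather than read from the global, and it assumes (the chapter does not say so) that valid project names contain only letters, digits, underscores, and hyphens.

```php
<?php
// Hypothetical helper: builds the cache path for a project and rejects any
// name that could escape the cache directory (for example "../secret").
// Assumption: valid project names match [A-Za-z0-9_-]+.
function project_cachefile(string $cachebase, string $name): string
{
    if (!preg_match('/^[A-Za-z0-9_-]+$/', $name)) {
        throw new InvalidArgumentException("Invalid project name: $name");
    }
    return rtrim($cachebase, '/') . "/projects/$name.cache";
}
```

For example, project_cachefile('/var/cache/site', 'ProjectFoo') returns /var/cache/site/projects/ProjectFoo.cache, while a name containing a slash or a dot throws, so the page can fall through to the same error redirect it already uses.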
The project page itself is a template into which you fit the project details. This way you have a consistent look and feel across the site. You pass the project name into the page as a GET parameter (the URL will look like http://www.example.com/project.php?name=ProjectFoo) and then assemble the page:

Integrating Caching into Application Code |
269 |
<?php
require 'Project.inc';
try {
    $name = $_GET['name'] ?? null;
    if (!$name) {
        throw new Exception();
    }
    $project = new Project($name);
}
catch (Exception $e) {
    // If I fail for any reason, I will send people here
    header("Location: /index.php");
    return;
}
?>
<html>
<title><?= $project->name ?></title>
<body>
<!-- boilerplate text -->
<table>
<tr>
<td>Author:</td><td><?= $project->author ?></td>
</tr>
<tr>
<td>Summary:</td><td><?= $project->short_description ?></td>
</tr>
<tr>
<td>Availability:</td>
<td><a href="<?= $project->file_url ?>">click here</a></td>
</tr>
<tr>
<td><?= $project->long_description ?></td>
</tr>
</table>
</body>
</html>
You also need a page where authors can edit their pages:
<?php
require_once 'Project.inc';
$name = $_REQUEST['name'];
$project = new Project($name);
if (array_key_exists("posted", $_POST)) {
    $project->author = $_POST['author'];
    $project->short_description = $_POST['short_description'];
    $project->file_url = $_POST['file_url'];
    $project->long_description = $_POST['long_description'];
    $project->store();
}
?>
<html>
<title>Project Page Editor for <?= $project->name ?></title>
<body>
<form name="editproject" method="POST">
<input type="hidden" name="name" value="<?= $name ?>">
<table>
<tr>
<td>Author:</td>
<td><input type="text" name="author" value="<?= $project->author ?>"></td>
</tr>
<tr>
<td>Summary:</td>
<td>
<input type="text"
name="short_description"
value="<?= $project->short_description ?>">
</td>
</tr>
<tr>
<td>Availability:</td>
<td><input type="text" name="file_url" value="<?= $project->file_url ?>"></td>
</tr>
<tr>
<td colspan=2>
<TEXTAREA name="long_description" rows="20" cols="80"><?= $project->long_description ?></TEXTAREA>
</td>
</tr>
</table>
<input type="submit" name="posted" value="Edit content">
</form>
</body>
</html>
The first caching implementation is a direct application of the class Cache_File you developed earlier:
<?php
require_once 'Cache/File.inc';
require_once 'Project.inc';
try {
    $name = $_GET['name'] ?? null;
    if (!$name) {
        throw new Exception();
    }
    $cache = new Cache_File(Project::get_cachefile($name));
    if ($text = $cache->get()) {
        print $text;
        return;
    }
    $project = new Project($name);
}
catch (Exception $e) {
    // if I fail, I should go here
    header("Location: /index.php");
    return;
}
$cache->begin();
?>
<html>
<title><?= $project->name ?></title>
<body>
<!-- boilerplate text -->
<table>
<tr>
<td>Author:</td><td><?= $project->author ?></td>
</tr>
<tr>
<td>Summary:</td><td><?= $project->short_description ?></td>
</tr>
<tr>
<td>Availability:</td><td><a href="<?= $project->file_url ?>">click here</a></td>
</tr>
<tr>
<td><?= $project->long_description ?></td> </tr>
</table>
</body>
</html>
<?php $cache->end();
?>
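The Cache_File class itself was developed earlier in the chapter and is not reproduced here. As a reminder of the mechanism this page relies on, here is a minimal, hypothetical stand-in called SimpleFileCache; it is an illustrative sketch, not the chapter's class, and its method names simply mirror the calls used in project.php. It assumes the cache directory already exists and that writing to a temporary file and then calling rename() on the same filesystem is acceptable for atomic replacement.

```php
<?php
// Hypothetical stand-in for Cache_File: captures page output with output
// buffering and saves it atomically (temporary file, then rename()).
class SimpleFileCache
{
    private string $filename;

    public function __construct(string $filename)
    {
        $this->filename = $filename;
    }

    // Returns the cached text, or false on a cache miss.
    public function get()
    {
        return is_file($this->filename) ? file_get_contents($this->filename) : false;
    }

    // Start capturing everything the script prints.
    public function begin(): void
    {
        ob_start();
    }

    // Stop capturing, send the output to the client, and save a copy.
    public function end(): void
    {
        $buffer = ob_get_contents();
        ob_end_flush();
        if ($buffer !== false && $buffer !== '') {
            $tmp = $this->filename . '.' . getmypid() . '.tmp';
            file_put_contents($tmp, $buffer);
            rename($tmp, $this->filename); // replaces the old copy in one step
        }
    }

    // Delete the cached copy so the next request regenerates it.
    public function remove(): void
    {
        if (is_file($this->filename)) {
            unlink($this->filename);
        }
    }
}
```

On a hit the page prints whatever get() returns and stops; on a miss it calls begin() before the template and end() after it, so the same output both reaches the client and lands in the cache file.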
To this point, you’ve provided no expiration logic, so the cached copy will never get updated, which is not really what you want. You could add an expiration time to the page, causing it to auto-renew after a certain period of time, but that is not an optimal solution because it does not directly address your needs. The cached data for a project will in fact remain valid until someone changes it. What you would like is for it to remain valid until one of two things happens:
• The page template needs to be changed
• An author updates the project data
The first case can be handled manually. If you need to update the templates, you can change the template code in project.php and remove all the cache files. Then, when a new request comes in, the page will be recached with the correct template.
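Removing every project cache file after a template change is easy to script. A minimal sketch, assuming all project cache files sit in a single directory and carry the .cache suffix that get_cachefile() produces; purge_cache_dir() is a hypothetical helper, not part of the chapter's code.

```php
<?php
// Hypothetical helper: deletes every *.cache file in $dir and returns how
// many were removed. Each page is recached on its next request.
function purge_cache_dir(string $dir): int
{
    $removed = 0;
    // glob() returns false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/*.cache') ?: [] as $file) {
        if (is_file($file) && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```

Only files matching the suffix are touched, so other files that happen to live in the same directory survive the purge.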
The second case you can handle by implementing cache-on-write in the editing page. An author can change the page text only by going through the edit page. When the changes are submitted, you can simply unlink the cache file. Then the next request for that project will cause the cache to be regenerated. The changes to the edit page are extremely minimal: three lines added to the head of the page:
<?php
require_once 'Cache/File.inc';
require_once 'Project.inc';
$name = $_REQUEST['name'];
$project = new Project($name);
if (array_key_exists("posted", $_POST)) {
    $project->author = $_POST['author'];
    $project->short_description = $_POST['short_description'];
    $project->file_url = $_POST['file_url'];
    $project->long_description = $_POST['long_description'];
    $project->store();
    // remove our cache file
    $cache = new Cache_File(Project::get_cachefile($name));
    $cache->remove();
}
?>
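The order of the two steps in the edit page matters: the data is stored first and the cache file is removed afterward. If the cache were removed first, a request arriving between the two steps could regenerate the page from the old data and cache it again. A minimal sketch of that ordering, using a hypothetical helper save_and_invalidate() that takes the save step as a callable:

```php
<?php
// Hypothetical helper: persist the change first, then delete the cached
// page. Deleting first would let a concurrent request re-cache stale data.
function save_and_invalidate(callable $save, string $cachefile): void
{
    $save();                    // e.g. function () use ($project) { $project->store(); }
    if (is_file($cachefile)) {
        unlink($cachefile);     // the next request regenerates the page
    }
}
```

A missing cache file is treated as already invalidated, so calling the helper twice, or before the page was ever cached, is harmless.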
When you remove the cache file, the next user request to the page will miss the cache in project.php and cause a recache. This can result in a momentary peak in resource utilization as the cache files are regenerated. In fact, as discussed earlier in this section, concurrent requests for the page will all generate dynamic copies in parallel until one finishes and caches a copy.
If the project pages are heavily accessed, you might prefer to proactively cache the page. You would do this by recaching it instead of unlinking it on the edit page. Then there is no worry of contention. One drawback of the proactive method is that it works poorly if you have to regenerate a large number of cache files. Proactively recaching 100,000 cache files may take minutes or hours, whereas simply unlinking the cache files is much faster. The proactive caching method is effective for pages that have a high cache hit rate. It is often not worthwhile if the cache hit rate is low, if there is limited storage for cache files, or if a large number of cache files need to be invalidated simultaneously.
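Proactive recaching means rendering the page on the edit request and overwriting the cache file in place. A minimal sketch, assuming the template can be wrapped in a callable that prints the page; recache_page() is a hypothetical helper. Writing to a temporary file and then calling rename() means readers see either the old copy or the complete new one, never a half-written file.

```php
<?php
// Hypothetical helper for proactive recaching: render the page into a
// string and atomically replace the cache file with the result.
function recache_page(string $cachefile, callable $render): void
{
    ob_start();
    try {
        $render();                  // prints the page template
        $html = ob_get_contents();
    } finally {
        ob_end_clean();             // discard the buffer; nothing reaches the editor's browser
    }
    $tmp = $cachefile . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $cachefile);       // swap in the new copy in one step
}
```

If rendering throws, the exception propagates before any file is written, so the existing cached copy stays in place.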
Recaching all your pages can be expensive, so you could alternatively take a pessimistic approach to regeneration and simply remove the cache file. The next time the page is requested, the cache request will fail, and the cache will be regenerated with current data. For applications where you have thousands or hundreds of thousands of cached pages, the pessimistic approach allows cache generation to be spread over a longer period of time and allows for “fast” invalidation of elements of the cache.
There are two drawbacks to the general approach so far—one mainly cosmetic and the other mainly technical:
• The URL http://example.com/project.php?name=myproject is less appealing than http://example.com/projects/myproject.html. This is not entirely a cosmetic issue.
• You still have to run the PHP interpreter to display the cached page. In fact, not only do you need to start the interpreter to parse and execute project.php, you also must then open and read the cache file. When the page is cached, it is entirely static, so hopefully you can avoid that overhead as well.
You could simply write the cache file out like this:
/www/htdocs/projects/myproject.html
This way, it could be accessed directly by name from the Web; but if you do this, you lose the ability to have transparent regeneration. Indeed, if you remove the cache file, any requests for it will return a “404 Object Not Found” response. This is not a problem if the page is only changed from the user edit page (because that now does cache-on-write); but if you ever need to update all the pages at once, you will be in deep trouble.
Using Apache’s mod_rewrite for Smarter Caching
If you are running PHP with Apache, you can use the very versatile mod_rewrite so that you can cache completely static HTML files while still maintaining transparent regeneration.
If you run Apache and have not looked at mod_rewrite before, put down this book and go read about it. Links are provided at the end of the chapter. mod_rewrite is very, very cool.
mod_rewrite is a URL-rewriting engine that hooks into Apache and allows rule-based rewriting of URLs. It supports a large range of features, including the following:
• Internal redirects, which change the URL being served entirely within Apache (and completely transparently to the client)
• External redirects
• Proxy requests (in conjunction with mod_proxy)

It would be easy to write an entire book on the ways mod_rewrite can be used. Unfortunately, we have little time for it here, so this section explores its configuration only enough to address your specific problem.
You want to be able to write the project.php cache files as full HTML files inside the document root, to the path /www/htdocs/projects/ProjectFoo.html. Then people can access the ProjectFoo home page simply by going to the URL http://www.example.com/projects/ProjectFoo.html. Writing the cache file to that location is easy; you simply need to modify Project::get_cachefile() as follows:
public static function get_cachefile($name) {
    $cachedir = "/www/htdocs/projects";
    return "$cachedir/$name.html";
}
The problem, as noted earlier, is what to do if this file is not there. mod_rewrite provides the answer. You can set up a mod_rewrite rule that says “if the cache file does not exist, redirect me to a page that will generate the cache and return the contents.” Sound simple? It is.
First you write the mod_rewrite rule:
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteRule ^/projects/([^/]+)\.html$ /generate_project.php?name=$1 [L]
These directives belong in the main server (or <VirtualHost>) section of httpd.conf rather than inside a <Directory> block: in per-directory context, mod_rewrite strips the directory prefix from the path before matching, so a pattern that begins with /projects/ would never match there. RewriteEngine On turns on the rewriting engine. Then you use the RewriteCond directive to set the condition for the rewrite:
%{DOCUMENT_ROOT}%{REQUEST_URI} !-f
This means that if %{DOCUMENT_ROOT}%{REQUEST_URI} is not a regular file, the condition is met. So if /www/htdocs/projects/ProjectFoo.html does not exist, you move on to the rewrite:
RewriteRule ^/projects/([^/]+)\.html$ /generate_project.php?name=$1 [L]
This tries to match the request URI (/projects/ProjectFoo.html) against the following regular expression (the escaped dot matches a literal period, and [L] makes this the last rule applied):
^/projects/([^/]+)\.html$
This stores the match in the parentheses as $1 (in this case, ProjectFoo). If this match succeeds, an internal redirect (which is completely transparent to the end client) is created, transforming the URI to be served into /generate_project.php?name=$1 (in this case, /generate_project.php?name=ProjectFoo).
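To make the decision the rule pair is meant to express concrete, here is a hypothetical PHP function, rewrite_project_uri(), that models it for a given document root and request URI. It is purely illustrative (Apache does not run it), and the document root is passed in as an assumption rather than read from the server.

```php
<?php
// Hypothetical model of the rewrite rule: leave the URI alone unless it is
// a project page whose static file is missing; then map it to the generator.
function rewrite_project_uri(string $docroot, string $uri): string
{
    // RewriteRule pattern: ^/projects/([^/]+)\.html$
    if (!preg_match('#^/projects/([^/]+)\.html$#', $uri, $m)) {
        return $uri;                              // pattern does not match
    }
    // RewriteCond: the file under the document root must not exist
    if (is_file($docroot . $uri)) {
        return $uri;                              // cached copy exists: serve it
    }
    return '/generate_project.php?name=' . $m[1]; // $1 is the project name
}
```

With no cached file, /projects/ProjectFoo.html maps to /generate_project.php?name=ProjectFoo; once generate_project.php has written the file, the same URI is served unchanged as a static page.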

All that is left now is generate_project.php. Fortunately, this is almost identical to the original project.php page, but it should unconditionally cache the output of the page. Here’s how it looks:
<?php
require 'Cache/File.inc';
require 'Project.inc';
try {
    $name = $_GET['name'] ?? null;
    if (!$name) {
        throw new Exception;
    }
    $project = new Project($name);
}
catch (Exception $e) {
    // if I fail, I should go here
    header("Location: /index.php");
    return;
}
$cache = new Cache_File(Project::get_cachefile($name));
$cache->begin();
?>
<html>
<title><?= $project->name ?></title>
<body>
<!-- boilerplate text -->
<table>
<tr>
<td>Author:</td><td><?= $project->author ?></td>
</tr>
<tr>
<td>Summary:</td><td><?= $project->short_description ?></td>
</tr>
<tr>
<td>Availability:</td>
<td><a href="<?= $project->file_url ?>">click here</a></td>
</tr>
<tr>
<td><?= $project->long_description ?></td>
</tr>
</table>
</body>
</html>
<?php
$cache->end();
?>

An alternative to using mod_rewrite is to use Apache’s built-in support for custom error pages via the ErrorDocument directive. To set this up, you replace the rewrite rules in your httpd.conf with this directive:
ErrorDocument 404 /generate_project.php
This tells Apache that whenever a 404 error is generated (for example, when a requested document does not exist), it should internally redirect the user to /generate_project.php. This is designed to allow a Web master to return custom error pages when a document isn’t found. An alternative use, though, is to replace the functionality that the rewrite rules provided.
After you add the ErrorDocument directive to your httpd.conf file, the top block of generate_project.php needs to be changed to derive $name from $_SERVER['REQUEST_URI'] (which still holds the originally requested URI, such as /projects/ProjectFoo.html) instead of having it passed in as a $_GET[] parameter. Your generate_project.php now looks like this:
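<?php
require 'Cache/File.inc';
require 'Project.inc';
try {
    // REQUEST_URI is the original request, e.g. /projects/ProjectFoo.html;
    // drop any query string, the directory, and the .html suffix.
    $name = basename((string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '.html');
    if (!$name) {
        throw new Exception;
    }
    $project = new Project($name);
}
catch (Exception $e) {
    // if I fail, I should go here
    header("Location: /index.php");
    return;
}
$cache = new Cache_File(Project::get_cachefile($name));
$cache->begin();
?>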
Otherwise, the behavior is just as it would be with the mod_rewrite rule.
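Because the ErrorDocument handler receives every 404 on the server, it helps to confirm that the request really was for a project page before treating part of the URI as a project name. A minimal sketch, written as a hypothetical helper project_name_from_uri(); it assumes the /projects/<name>.html layout and the same letters, digits, underscore, and hyphen naming rule, neither of which the chapter mandates.

```php
<?php
// Hypothetical helper: "/projects/ProjectFoo.html?ref=x" -> "ProjectFoo".
// Returns null when the URI is not a project page.
function project_name_from_uri(string $uri): ?string
{
    $path = parse_url($uri, PHP_URL_PATH);   // strips any query string
    if (!is_string($path)
        || !preg_match('#^/projects/([A-Za-z0-9_-]+)\.html$#', $path, $m)) {
        return null;
    }
    return $m[1];
}
```

A null result can be handled like the empty-name case in generate_project.php, by throwing and letting the catch block redirect to /index.php.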
Using ErrorDocument handlers for generating static content on the fly is very useful if you do not have control over your server and cannot ensure that it has mod_rewrite available. Assuming that I control my own server, I prefer to use mod_rewrite. mod_rewrite is an extremely flexible tool, which means it is easy to apply more complex logic for cache regeneration if needed.
In addition, because the ErrorDocument handler is called, the page it generates is returned with a 404 status code. Normally a “valid” page is returned with a 200 status code, meaning the page is okay. Most browsers handle this discrepancy without any problem, but some tools do not like getting a 404 status code back for content that is valid. You can overcome this by manually setting the status code with a header() call near the top of generate_project.php, before any output is sent, like this:
header($_SERVER['SERVER_PROTOCOL'] . ' 200 OK');
Caching Part of a Page
Often you cannot cache an entire page but would like to be able to cache components of it. An example is the personalized navigation bar discussed earlier in this chapter, in the section “Cookie-Based Caching.” In that case, you used a cookie to store the user’s navigation preferences and then rendered them as follows:
<?php
$userid = $_COOKIE['MEMBERID'];
$user = new User($userid);
if (!$user->name) {
    header("Location: /login.php");
    exit; // stop here; otherwise the page would keep rendering
}
$navigation = $user->get_interests();
?>
<table>
<tr>
<td>
<table>
<tr><td>
<?= $user->name ?>'s Home </td></tr>
<?php for($i=1; $i<=3; $i++) { ?> <tr><td>
<!-- navigation row position <?= $i ?> -->
<?= generate_navigation_element($navigation[$i]) ?> </td></tr>
<?php } ?> </table>
</td>
<td>
<!-- page body (static content identical for all users) --> </td>
</tr>
</table>
You would like to cache the output of generate_navigation_element(). Caching the results of small page components is simple. First, you need to write generate_navigation_element(). Recall the values of $navigation, which has topic/subtopic pairs such as sports-football, weather-21046, project-Foobar, and news-global. You can implement generate_navigation_element() as a dispatcher that calls out to an appropriate content-generation function based on the topic passed, as follows: