
Interestingly, by using this process, you see that in fact the harness overhead did bias the results for the test case:
size     my_max      max         my_max/max
10       0.000115    0.000007    16.41
100      0.001015    0.000031    33.27
1000     0.011421    0.000264    43.31
The benefit of the built-in linear search over a userspace search is even greater than originally estimated, even for small arrays.
Timing Fast Functions
If you are timing very fast functions—for example, functions that perform only a few basic operations—the overhead might appear to be greater than the function call time itself (that is, it may show a negative mean time). Increasing the iteration count should improve the statistics by minimizing the effect of outliers.
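For example, a trivially fast function like the following benefits from a much larger iteration count (a minimal sketch; the function fast_increment() and the counts here are hypothetical, chosen just to illustrate the point):

require_once 'Benchmark/Iterate.php';

// A trivially fast function; its cost is comparable to the harness overhead.
function fast_increment($x) {
    return $x + 1;
}

$bm = new Benchmark_Iterate;
// 100,000 iterations instead of 1,000 lets the mean stabilize and keeps
// timer overhead and scheduling noise from dominating the result.
$bm->run(100000, 'fast_increment', 1);
$result = $bm->get();
printf("fast_increment: %0.8f seconds/execution\n", $result['mean']);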
Adding Custom Timer Information
Sometimes you would like to know more about a function's resource usage than just wall-clock time. On systems that support the getrusage() call (most modern Unix systems, and Windows via Cygwin), you can get detailed process-accounting information via the getrusage() PHP function, which returns an associative array containing the values described in Table 19.1.
Table 19.1 getrusage() Resource Values
Key                   Description
[ru_oublock]          The number of output block operations
[ru_inblock]          The number of input block operations
[ru_msgsnd]           The number of SysV IPC messages sent
[ru_msgrcv]           The number of SysV IPC messages received
[ru_maxrss]           The maximum resident memory size
[ru_ixrss]            The shared memory size
[ru_idrss]            The data size
[ru_minflt]           The number of (memory page) reclamations
[ru_majflt]           The number of (memory) page faults
[ru_nsignals]         The number of signals received by the process
[ru_nvcsw]            The number of voluntary context switches
[ru_nivcsw]           The number of involuntary context switches
[ru_utime.tv_sec]     The number of seconds of user time used
[ru_utime.tv_usec]    The number of microseconds of user time used
[ru_stime.tv_sec]     The number of seconds of system time used
[ru_stime.tv_usec]    The number of microseconds of system time used
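For example, you can inspect these values directly (a minimal sketch; which keys are populated varies by platform, as discussed next):

$ru = getrusage();
// user and system CPU time are each split into seconds and microseconds
printf("user time:   %d.%06d s\n", $ru['ru_utime.tv_sec'], $ru['ru_utime.tv_usec']);
printf("system time: %d.%06d s\n", $ru['ru_stime.tv_sec'], $ru['ru_stime.tv_usec']);
printf("page reclaims: %d\n", $ru['ru_minflt']);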
Different systems implement these timers differently. On BSD systems the full set of statistics is available, while on Linux 2.4 kernels only ru_stime, ru_utime, ru_minflt, and ru_majflt are available. This information is still enough to make the exercise worthwhile, though.

When you use the standard microtime() timers, what you get is wall-clock time, so called because it is the actual total "real" amount of time spent executing a function. If a system executed only a single task at a time, this measure would be fine; in practice, however, the system is almost certainly handling multiple tasks concurrently. Because your benchmarks are all relative anyway, your results with the microtime() timers should still be useful as long as the total amount of free processor time is the same between benchmarks; but peaks or lulls in system activity can introduce significant skew into the results.

The system and user time statistics in getrusage() track the actual amount of time the process spends executing kernel-level system calls and userspace calls, respectively. This gives you a much better idea of the "true" CPU resources used by the function. Of course, 10ms of uninterrupted processor time is very different from two 5ms blocks of processor time, and the getrusage() statistics do not compensate for the effects of processor cache or register reuse, which vary under system load and can have a very beneficial impact on performance.
To incorporate these statistics into your benchmarking suite, you simply need to overload the setMarker() method (inherited from Benchmark_Timer), which handles statistics collection. You also need to overload the get() method to handle organizing the statistics at the end of the run. Here's how you do this:
require_once 'Benchmark/Iterate.php';

class RusageBench extends Benchmark_Iterate {
    public function setMarker($name) {
        $this->markers[$name] = getrusage();
        // Flatten the seconds/microseconds pairs into single values
        $this->markers[$name]['ru_utime'] = sprintf("%6d.%06d",
            $this->markers[$name]['ru_utime.tv_sec'],
            $this->markers[$name]['ru_utime.tv_usec']);
        $this->markers[$name]['ru_stime'] = sprintf("%6d.%06d",
            $this->markers[$name]['ru_stime.tv_sec'],
            $this->markers[$name]['ru_stime.tv_usec']);
    }
    public function get() {
        $result = array('mean' => array());
        // Initialize every counter so the += accumulation starts from zero
        foreach (array_merge(array_keys(getrusage()),
                             array('ru_stime', 'ru_utime')) as $key) {
            $result['mean'][$key] = 0;
        }
        $iterations = count($this->markers) / 2;
        for ($i = 1; $i <= $iterations; $i++) {
            $temp = array();
            // Per-iteration deltas for the raw rusage counters
            foreach (array_keys(getrusage()) as $key) {
                $temp[$key] =
                    $this->markers['end_'.$i][$key] - $this->markers['start_'.$i][$key];
                $result['mean'][$key] += $temp[$key];
            }
            // The flattened timers added in setMarker() get the same treatment
            foreach (array('ru_stime', 'ru_utime') as $key) {
                $result['mean'][$key] +=
                    $this->markers['end_'.$i][$key] - $this->markers['start_'.$i][$key];
            }
            $result[$i] = $temp;
        }
        foreach (array_keys($result['mean']) as $key) {
            $result['mean'][$key] /= $iterations;
        }
        $result['iterations'] = $iterations;
        return $result;
    }
}
With all the additional resource information added, the API is slightly broken: the format of the get() method's return value has changed. Instead of the mean array key containing the mean execution time of the function, it is now an associative array of average resource-utilization values.
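For example, a caller now reads the averaged counters out of the mean array (a quick sketch against the class above; the URL argument is arbitrary):

$b = new RusageBench;
$b->run(1000, 'parse_url', 'http://www.example.com/');
$result = $b->get();
// 'mean' is an associative array of per-iteration averages, not a single time
printf("avg user CPU:   %0.6f s\n", $result['mean']['ru_utime']);
printf("avg system CPU: %0.6f s\n", $result['mean']['ru_stime']);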
You can put your new suite to use by looking at what happened with parse_url between PHP 4.2.3 and 4.3.0. parse_url is a built-in function that takes a URL and breaks it into its primitive components: service type, URI, query string, and so on. Prior to PHP 4.3.0, a number of bug reports complained that the parse_url function's performance was abysmal. For perspective, you can roll back the clock to PHP 4.2.3 and benchmark parse_url against a userspace reimplementation:
require 'RusageBench.inc';

$fullurl = "http://george:george@www.example.com:8080/foo/bar.php?example=yes#here";

function preg_parse_url($url) {
    $regex = '!^(([^:/?#]+):)?(//(([^/:?#@]+):([^/:?#@]+)@)?([^/:?#]*)'.
             '(:(\d+))?)?([^?#]*)(\\?([^#]*))?(#(.*))?!';
    preg_match($regex, $url, $matches);
    list(,,$url['scheme'],,,$url['user'],$url['pass'],$url['host'],,
        $url['port'],$url['path'],,$url['query']) = $matches;
    return $url;
}

foreach (array('preg_parse_url', 'parse_url') as $func) {
    $b = new RusageBench;
    $b->run(1000, $func, $fullurl);
    $result = $b->get();
    print "$func\t";
    printf("System + User Time: %1.6f\n",
        $result['mean']['ru_utime'] + $result['mean']['ru_stime']);
}
When I run this under PHP version 4.2.3, my laptop returns the following:
PHP 4.2.3
preg_parse_url System + User Time: 0.000280
parse_url       System + User Time: 0.002110
So much for built-in functions always being faster! The preg_match solution is a full order of magnitude faster than parse_url. What might be causing this problem? If you delve into the 4.2.3 source code for the parse_url function, you see that it uses the system (POSIX-compatible) regular expression library, and on every iteration does the following:
/* pseudo-C code */
regex_t re;          /* locally scoped regular expression variable */
regmatch_t subs[11]; /* the equivalent of $matches in our userspace parser */
/* compile the pattern */
regcomp(&re, pattern, REG_EXTENDED);
/* execute the regex on the input string and stick the matches in subs */
regexec(&re, string, 11, subs, 0);
So on each iteration, you are recompiling your regular expression before executing it. The userspace reimplementation uses preg_match, which is smart enough to cache the compiled regular expression in case it is used again later.
In PHP 4.3.0, the parse_url function was fixed, not by adding caching to the regular expression, but by hand-coding a URL parser. Here is the same code as before, executed under PHP 4.3.0:
PHP 4.3.0
preg_parse_url System + User Time: 0.000210
parse_url       System + User Time: 0.000150
The built-in function is now faster, as well it should be. It is worth noting that the performance edge of the built-in function over your reimplementation is only about 30%. This goes to show that it is hard to beat the Perl-Compatible Regular Expression (PCRE) functions (the preg functions) for speed when you’re parsing complex strings.

Writing Inline Benchmarks
Tracking benchmark results over time is a good way to keep an eye on the general health of an application as a whole. To make long-term tracking useful, you need to standardize your tests. You could do this by creating a separate test case, or you could take a cue from your unit-testing experiences and include the benchmarks inline in the same file as the library they test.
For include files, which are never executed directly during normal operation, you can write the benchmark so that it runs only when the file is invoked directly:
// url.inc
function preg_parse_url($url) {
    // ...
}

// add a check to see if we are being executed directly
if ($_SERVER['PHP_SELF'] == __FILE__) {
    // if so, run our benchmark
    require 'RusageBench.inc';
    $testurl = "http://george:george@www.example.com:8080/foo/bar.php?example=yes#here";
    $b = new RusageBench;
    $b->run(1000, 'preg_parse_url', $testurl);
    $result = $b->get();
    printf("preg_parse_url(): %1.6f seconds/execution\n",
        $result['mean']['ru_utime'] + $result['mean']['ru_stime']);
}
Now if you include url.inc, the benchmarking loop is bypassed and the code behaves normally. If you call the library directly, however, you get these benchmark results back:
$ php /home/george/devel/Utils/Uri.inc
preg_parse_url(): 0.000215 seconds/execution
Benchmarking Examples
Now that you are familiar with PEAR’s Benchmark suite and have looked at ways you can extend it to address specific needs, let’s apply those skills to some examples. Mastering any technique requires practice, and this is especially true for benchmarking. Improving code performance through small changes takes time and discipline.
The hardest part of productive tuning is not comparing two implementations; the toolset you have built in this chapter is sufficient for that. The difficulty is often in choosing good alternatives to test. Unfortunately, there is no Rosetta stone that will always guide you to the optimal solution; if there were, benchmarking would be a pointless exercise. Recognizing potential solutions comes from experience and intuition, both of which come only from practice.

In the following sections I cover a few examples, but to gain the best understanding possible, I recommend that you create your own. Start with a relatively simple function from your own code library and tinker with it. Don’t be discouraged if your first attempts yield slower functions; learning what patterns do not work is in many ways as important in developing good intuition as learning which do.
Matching Characters at the Beginning of a String
A common task in text processing is examining the leading characters of a string. A typical idiom uses substr in a non-assigning context to test strings. For example, to extract all the HTTP variables from $_SERVER, you might use this:
foreach ($_SERVER as $key => $val) {
    if (substr($key, 0, 5) == 'HTTP_') {
        $HTTP_VARS[$key] = $val;
    }
}
Although substr is a very fast call, repeated executions add up (for example, if it’s used to pick elements out of a large array). Surprising as it may seem, I have seen large applications spend a significant portion of their time in substr due to poorly implemented string parsing. A natural choice for a substr replacement in this context is strncmp, which compares the first n characters of two strings.
For example, you can use the following to compare substr to strncmp for picking out the SCRIPT_ variables from $_SERVER:
function substr_match($arr) {
    foreach ($arr as $key => $val) {
        // note: 'SCRIPT_' is 7 characters long
        if (substr($key, 0, 7) == 'SCRIPT_') {
            $retval[$key] = $val;
        }
    }
}
function strncmp_match($arr) {
    foreach ($arr as $key => $val) {
        if (!strncmp($key, "SCRIPT_", 7)) {
            $retval[$key] = $val;
        }
    }
}
require "MyBench.inc";
foreach (array('substr_match', 'strncmp_match') as $func) {
    $bm = new MyBench;
    $bm->run(1000, $func, $_SERVER);
    $result = $bm->get();

    printf("$func\t%0.6f\n", $result['mean']);
}
This returns the following:
substr_match    0.000482
strncmp_match   0.000406
A 20% speedup is not insignificant, especially on frequently executed code.
Why is substr so much slower than strncmp? substr has to allocate and write its return value and then perform a comparison; strncmp, on the other hand, simply performs a character-by-character comparison of the strings. Although PHP hides all the details of memory management, the cost of allocation is still there. Over many iterations, the cost of allocating the handful of bytes for the substr result adds up.
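A third idiom you will often see for the same test (not from the original text; included here only as another candidate worth benchmarking) is strpos with a strict comparison against position 0. It also avoids allocating a substring, though it scans the whole string when there is no match at the start:

function strpos_match($arr) {
    $retval = array();
    foreach ($arr as $key => $val) {
        // === 0 means 'SCRIPT_' occurs at the very start of the key;
        // a loose == would also accept false (no match at all)
        if (strpos($key, 'SCRIPT_') === 0) {
            $retval[$key] = $val;
        }
    }
    return $retval;
}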
Macro Expansions
In this example you will use benchmarking to optimize a custom macro expander. Implementing your own macro language can be useful in a number of contexts, such as supplying limited scripting facilities in a content-management system or an email template system. You might want to be able to template some text like this:
Hello {NAME}. Welcome to {SITENAME}.
Your password for managing your account is '{PASSWORD}'.
And have it expanded to this:
Hello George. Welcome to example.com.
Your password for managing your account is 'foobar'.
You can implement your macros as an associative array of matches and replacements. First, you pull the recipient user's relevant information from the database:
$result = mysql_query("SELECT * FROM user_profile WHERE userid = $id");
$userinfo = mysql_fetch_assoc($result);
Then you can merge it with an array of “stock” replacements:
$standard_elements = array('SITENAME' => 'example.com',
                           'FOOTER'   => "Copyright 2004 Example.com"
);
$macros = array_merge($userinfo, $standard_elements);
Now that you have your macro set defined, you need a macro substitution routine. As a first implementation, you can take the naive approach and iterate over the macro set, substituting as you go:
function expand_macros_v1(&$text, $macroset) {
    if ($text) {
        foreach ($macroset as $tag => $sub) {
            if (preg_match("/\{$tag\}/", $text)) {
                $text = preg_replace("/\{$tag\}/", $sub, $text);
            }
        }
    }
}
At the core of the routine is this line, which performs the substitution for each tag on the supplied text:
$text = preg_replace("/\{$tag\}/", $sub, $text);
You can implement a simple test to guarantee that all your variations behave the same:
require "PHPUnit.php";
require "macro_sub.inc";

class MacroTest extends PHPUnit_TestCase {
    public function MacroTest($name) {
        $this->PHPUnit_TestCase($name);
    }
    // Check that macros are correctly substituted
    public function testSuccessfulSub() {
        $macro_set = array('/\{NAME\}/' => 'george');
        $sample_text = "Hello {NAME}";
        $expected_text = "Hello george";
        $this->assertEquals($expected_text,
            expand_macros($sample_text, $macro_set));
    }
    // Check that things which look like macros but are not are ignored
    function testUnmatchedMacro() {
        $macro_set = array('/\{NAME\}/' => 'george');
        $sample_text = "Hello {FOO}";
        $expected_text = "Hello {FOO}";
        $this->assertEquals($expected_text,
            expand_macros($sample_text, $macro_set));
    }
}

$suite = new PHPUnit_TestSuite('MacroTest');
$result = PHPUnit::run($suite);
echo $result->toString();
Next, you construct your benchmark. In this case, you can try to use data that represents realistic inputs to this function. For this example, you can say that you expect on average a 2KB text message as input, with a macro set of 20 elements, 5 of which are used on average. For test data you can create a macro set of 20 key-value pairs:

$macros = array(
    'FOO1'  => 'george@omniti.com',
    'FOO2'  => 'george@omniti.com',
    'FOO3'  => 'george@omniti.com',
    'FOO4'  => 'george@omniti.com',
    'FOO5'  => 'george@omniti.com',
    'FOO6'  => 'george@omniti.com',
    'FOO7'  => 'george@omniti.com',
    'FOO8'  => 'george@omniti.com',
    'FOO9'  => 'george@omniti.com',
    'FOO10' => 'george@omniti.com',
    'FOO11' => 'george@omniti.com',
    'FOO12' => 'george@omniti.com',
    'FOO13' => 'george@omniti.com',
    'FOO14' => 'george@omniti.com',
    'FOO15' => 'george@omniti.com',
    'NAME'  => 'George Schlossnagle',
    'NICK'  => 'muntoh',
    'EMAIL' => 'george@omniti.com',
    'SITENAME' => 'www.foo.com',
    'BIRTHDAY' => '10-10-73');
For the template text, you can create a 2,048-byte document of random words, with the macros {NAME}, {NICK}, {EMAIL}, {SITENAME}, and {BIRTHDAY} interspersed throughout; one way to generate such a document is sketched below.
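This generator is a minimal sketch, not from the original text; the filler words and the macro frequency are arbitrary assumptions:

// Build roughly 2KB of filler text with the five macros scattered through it.
$words = array('alpha', 'bravo', 'charlie', 'delta', 'echo');
$tags  = array('{NAME}', '{NICK}', '{EMAIL}', '{SITENAME}', '{BIRTHDAY}');
$text  = '';
while (strlen($text) < 2048) {
    if ($tags && rand(0, 50) == 0) {
        // occasionally drop in the next unused macro
        $text .= array_shift($tags) . ' ';
    } else {
        $text .= $words[array_rand($words)] . ' ';
    }
}

The benchmark code itself is the same you have used throughout the chapter: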
$bm = new Benchmark_Iterate;
$bm->run(1000, 'expand_macros_v1', $text, $macros);
$result = $bm->get();
printf("expand_macros_v1 %0.6f seconds/execution\n", $result['mean']);
The code yields this:
expand_macros_v1 0.001037 seconds/execution
This seems fast, but roughly 1,000 markups per second is not terribly quick for a busy system, and you can make some improvements on this routine.
First, the preg_match call is largely superfluous: you can simply perform the replacement and ignore any failures. Also, all the PCRE functions accept arrays for their pattern and replacement arguments, and you can take advantage of that as well. You can make your routine look like this:
function expand_macros_v2(&$text, &$macroset) {
    if ($text) {
        // preg_replace returns the substituted string; assign it back
        $text = preg_replace(array_keys($macroset), array_values($macroset), $text);
    }
}

This will work, although you will need to preprocess your macros to turn them into pure regular expressions:
function pre_process_macros(&$macroset) {
    $newarray = array();
    foreach ($macroset as $k => $v) {
        // turn a bare tag like NAME into the pattern /\{NAME\}/
        $newarray['/\{'.$k.'\}/'] = $v;
    }
    return $newarray;
}
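For example, a quick check of the preprocessor above:

$m = array('NAME' => 'george');
print_r(pre_process_macros($m));
// prints: Array ( [/\{NAME\}/] => george )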
Note
If you are feeling especially clever, you can change your SELECT to this:
SELECT NAME '/\{NAME\}/', EMAIL '/\{EMAIL\}/'
FROM userinfo
WHERE userid = $userid
The major disadvantage of this is that you are forced to recode the SELECT whenever columns are added to the table. With the SELECT * query, macros magically appear as the table definition is updated.
This gives you a significant (roughly 18%) performance benefit, as shown here:
$bm = new Benchmark_Iterate;
$bm->run(1000, 'expand_macros_v2', $text, pre_process_macros($macros));
$result = $bm->get();
printf("expand_macros_v2 %0.6f seconds/execution\n", $result['mean']);
expand_macros_v2 0.000850 seconds/execution
You can squeeze a little more improvement out of your code by taking advantage of the structure of your macros. Your macros are not random strings; in fact, they are all quite similar to one another. Instead of matching a regular expression for every macro, you can match them all with a single expression, look them up by key, and use an evaluated replacement expression to perform the replacement:
function expand_macros_v3(&$text, &$macroset) {
    if ($text) {
        // the /e modifier evaluates the replacement as PHP code for each match
        $text = preg_replace("/\{([^}]+)\}/e",
            "(array_key_exists('\\1', \$macroset) ? \$macroset['\\1'] : '{'.'\\1'.'}')",
            $text);
    }
}
At the core of this routine is the following replacement:
$text = preg_replace("/\{([^}]+)\}/e",
    "(array_key_exists('\\1', \$macroset) ? \$macroset['\\1'] : '{'.'\\1'.'}')",
    $text);