
288 Chapter 11 Computational Reuse

        $fibonacciValues[$n] = Fib($n - 2) + Fib($n - 1);
    }
    return $fibonacciValues[$n];
}

You can also use static class variables as accumulators. In this case, the Fib() function is moved to Fibonacci::number(), which uses the static class variable $values:

class Fibonacci {
    static $values = array(0 => 1, 1 => 1);

    public static function number($n) {
        if(!is_int($n) || $n < 0) {
            return 0;
        }
        // compute and remember the value only if it has not been seen before
        if(!isset(self::$values[$n])) {
            self::$values[$n] = self::number($n - 2) + self::number($n - 1);
        }
        return self::$values[$n];
    }
}

In this example, moving to a class static variable does not provide any additional functionality. Class accumulators are very useful, though, if you have more than one function that can benefit from access to the same accumulator.

Figure 11.2 illustrates the new calculation tree for Fib(5). If you view the Fibonacci calculation as a slightly misshapen triangle, you have now restricted the necessary calculations to its left edge and then directed cache reads to the nodes adjacent to the left edge. This is (n+1) + n = 2n + 1 steps, so the new calculation method is O(n). Contrast this with Figure 11.3, which shows all nodes that must be calculated in the native recursive implementation.

[Figure 11.2: tree diagram of the Fib(5) calculation. Top levels: Fib(5); Fib(4), Fib(3); Fib(3), Fib(2), Fib(2), Fib(1).]

Figure 11.2 The number of operations necessary to compute Fib(5) if you cache the previously seen values.


Figure 11.3 Calculations necessary for Fib(5) with the native implementation.

We will look at fine-grained benchmarking techniques in Chapter 19, “Synthetic Benchmarks: Evaluating Code Blocks and Functions,” but comparing these routines side by side for even medium-size n’s (even just two-digit n’s) is an excellent demonstration of the difference between a linear-complexity function and an exponential-complexity function. On my system, Fib(50) with the caching algorithm returns in subsecond time. A back-of-the-envelope calculation suggests that the noncaching tree-recursive algorithm would take seven days to compute the same thing.
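To make the contrast concrete without a benchmarking harness, the following minimal sketch counts function calls instead of measuring time. The names naiveFib() and cachedFib() and the two global counters are illustrative assumptions, not code from this chapter; both functions follow the chapter's convention that Fib(0) = Fib(1) = 1.

```php
<?php
// Illustrative sketch: count the calls made by each version.
$naiveCalls = 0;
$cachedCalls = 0;

// Tree-recursive version: recomputes every subtree from scratch.
function naiveFib($n) {
    global $naiveCalls;
    $naiveCalls++;
    if ($n < 2) {
        return 1;
    }
    return naiveFib($n - 2) + naiveFib($n - 1);
}

// Cached version: a static array acts as the accumulator.
function cachedFib($n) {
    global $cachedCalls;
    static $values = array(0 => 1, 1 => 1);
    $cachedCalls++;
    if (!isset($values[$n])) {
        $values[$n] = cachedFib($n - 2) + cachedFib($n - 1);
    }
    return $values[$n];
}

$n = 20;
$naiveResult = naiveFib($n);
$cachedResult = cachedFib($n);
printf("naive:  Fib(%d) = %d in %d calls\n", $n, $naiveResult, $naiveCalls);
printf("cached: Fib(%d) = %d in %d calls\n", $n, $cachedResult, $cachedCalls);
```

Even at n = 20 the naive version makes tens of thousands of calls, while the cached version stays within the 2n + 1 bound derived earlier.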

Caching Reused Data Inside a Request

I’m sure you’re saying, “Great! As long as I have a Web site dedicated to Fibonacci numbers, I’m set.” This technique is useful beyond mathematical computations, though. In fact, it is easy to extend this concept to more practical matters.

Let’s consider the Text_Statistics class implemented in Chapter 6, “Unit Testing,” to calculate Flesch readability scores. For every word in the document, you created a Word object to find its number of syllables. In a document of any reasonable size, you expect to see some repeated words. Caching the Word object for a given word, as well as the number of syllables for the word, should greatly reduce the amount of per-document parsing that needs to be performed.

Caching the number of syllables looks almost the same as caching the Fibonacci sequence; you just add a class attribute, $_numSyllables, to store the syllable count as soon as you calculate it:

class Text_Word {
    public $word;
    protected $_numSyllables = 0;

    // unmodified methods

    public function numSyllables() {
        // if we have calculated the number of syllables for this
        // Word before, simply return it
        if($this->_numSyllables) {
            return $this->_numSyllables;
        }
        $scratch = $this->mungeWord($this->word);
        // Split the word on the vowels. a e i o u, and for us always y
        $fragments = preg_split("/[^aeiouy]+/", $scratch);
        if(!$fragments[0]) {
            array_shift($fragments);
        }
        if(!$fragments[count($fragments) - 1]) {
            array_pop($fragments);
        }
        // make sure we track the number of syllables in our attribute
        $this->_numSyllables += $this->countSpecialSyllables($scratch);
        if(count($fragments)) {
            $this->_numSyllables += count($fragments);
        }
        else {
            $this->_numSyllables = 1;
        }
        return $this->_numSyllables;
    }
}

Now you create a caching layer for the Text_Word objects themselves. You can use a factory class to generate the Text_Word objects. The class can have in it a static associative array that indexes Text_Word objects by name:

require_once "Text/Word.inc";

class CachingFactory {
    static $objects = array();

    // static, because callers invoke it as CachingFactory::Word($name)
    public static function Word($name) {
        if(!isset(self::$objects['Word'][$name])) {
            self::$objects['Word'][$name] = new Text_Word($name);
        }
        return self::$objects['Word'][$name];
    }
}

This implementation, although clean, is not transparent. You need to change the calls from this:

$obj = new Text_Word($name);


to this:

$obj = CachingFactory::Word($name);
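The effect of routing construction through a factory can be seen in a small self-contained sketch. Demo_Word and Demo_CachingFactory are hypothetical stand-ins for Text_Word and CachingFactory, and the vowel-run count is only a crude placeholder for the real syllable logic; the point is that repeated requests for the same word yield the same instance, so the expensive work runs once.

```php
<?php
// Hypothetical stand-in for Text_Word; it counts how often the expensive work runs.
class Demo_Word {
    public $word;
    public static $computations = 0;
    protected $_numSyllables = 0;

    public function __construct($word) {
        $this->word = $word;
    }

    public function numSyllables() {
        if ($this->_numSyllables) {
            return $this->_numSyllables;
        }
        self::$computations++;
        // Crude placeholder: count runs of vowels instead of applying real syllable rules.
        $runs = preg_match_all('/[aeiouy]+/', strtolower($this->word), $matches);
        $this->_numSyllables = max(1, (int) $runs);
        return $this->_numSyllables;
    }
}

// Hypothetical stand-in for CachingFactory.
class Demo_CachingFactory {
    private static $objects = array();

    public static function Word($name) {
        if (!isset(self::$objects['Word'][$name])) {
            self::$objects['Word'][$name] = new Demo_Word($name);
        }
        return self::$objects['Word'][$name];
    }
}

$first = Demo_CachingFactory::Word('caching');
$second = Demo_CachingFactory::Word('caching');
$first->numSyllables();
$second->numSyllables();   // served from the attribute cached on the shared instance
var_dump($first === $second);
```

Because both variables refer to one object, the second numSyllables() call finds the count already stored and does no recomputation.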

Sometimes, though, real-world refactoring does not allow you to easily convert to a new pattern. In this situation, you can opt for the less elegant solution of building the caching into the Word class itself:

class Text_Word {
    public $word;
    private $_numSyllables = 0;
    static $syllableCache = array();

    function __construct($name) {
        $this->word = $name;
        if(!isset(self::$syllableCache[$name])) {
            self::$syllableCache[$name] = $this->numSyllables();
        }
        $this->_numSyllables = self::$syllableCache[$name];
    }

    // unmodified methods
}

This method is a hack, though. The more complicated the Text_Word class becomes, the more difficult this type of arrangement becomes. In fact, because this method results in a copy of the desired Text_Word object, to get the benefit of computing the syllable count only once, you must do this in the object constructor. The more statistics you would like to be able to cache for a word, the more expensive this operation becomes. Imagine if you decided to integrate dictionary definitions and thesaurus searches into the Text_Word class. To have those be search-once operations, you would need to perform them proactively in the Text_Word constructor. The expense (both in resource usage and complexity) quickly mounts.

In contrast, because the factory method returns a reference to the object, you get the benefit of having to perform the calculations only once, but you do not have to take the hit of precalculating all that might interest you. In PHP 4 there are ways to hack your factory directly into the class constructor:

// php4 syntax - not forward-compatible to php5
$wordcache = array();

function Word($name) {
    global $wordcache;
    if(array_key_exists($name, $wordcache)) {
        $this = $wordcache[$name];
    }
    else {
        $this->word = $name;
        $wordcache[$name] = $this;
    }
}


Reassignment of $this is not supported in PHP 5, so you are much better off using a factory class. A factory class is a classic design pattern and gives you the added benefit of separating your caching logic from the Text_Word class.

Caching Reused Data Between Requests

People often ask how to achieve object persistence over requests. The idea is to be able to create an object in a request, have that request complete, and then reference that object in the next request. Many Java systems use this sort of object persistence to implement shopping carts, user sessions, database connection persistence, or any sort of functionality for the life of a Web server process or the length of a user’s session on a Web site. This is a popular strategy for Java programmers and (to a lesser extent) mod_perl developers.

Both Java and mod_perl embed a persistent runtime into Apache. In this runtime, scripts and pages are parsed and compiled the first time they are encountered, and after that they are simply executed repeatedly. You can think of it as starting up the runtime once and then executing a page the way you might execute a function call in a loop (just calling the compiled copy). As we will discuss in Chapter 20, “PHP and Zend Engine Internals,” PHP does not implement this sort of strategy. PHP keeps a persistent interpreter, but it completely tears down the context at request shutdown.

This means that if you create any sort of variable in a page, as in the following line, that variable (in fact, the entire symbol table) is destroyed at the end of the request:

<?php $string = 'hello world'; ?>

So how do you get around this? How do you carry an object over from one request to another? Chapter 10, “Data Component Caching,” addresses this question for large pieces of data. In this section we are focused on smaller pieces: intermediate data or individual objects. How do you cache those between requests? The short answer is that you generally don’t want to.

Actually, that’s not completely true; you can use the serialize() function to package up an arbitrary data structure (object, array, what have you), store it, and then retrieve and unserialize it later. There are a few hurdles, however, that in general make this undesirable on a small scale:

- For objects that are relatively low cost to build, instantiation is cheaper than unserialization.

- If there are numerous instances of an object (as happens with the Word objects or an object describing an individual Web site user), the cache can quickly fill up, and you need to implement a mechanism for aging out serialized objects.

- As noted in previous chapters, cache synchronization and poisoning across distributed systems are difficult to handle.
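The serialize-store-retrieve cycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: Demo_Profile is a hypothetical class, and a temporary file stands in for whatever cache backend would actually be used.

```php
<?php
// Hypothetical value object used only for this illustration.
class Demo_Profile {
    public $name;
    public $visits = 0;

    public function __construct($name) {
        $this->name = $name;
    }
}

// Store the serialized object in a temporary file (a stand-in for any cache backend).
$cacheFile = tempnam(sys_get_temp_dir(), 'objcache');
$original = new Demo_Profile('george');
$original->visits = 3;
file_put_contents($cacheFile, serialize($original));

// Later (conceptually, in another request): read the bytes back and rebuild the object.
$restored = unserialize(file_get_contents($cacheFile));
unlink($cacheFile);

var_dump($restored == $original);    // compares class and property values
var_dump($restored === $original);   // compares object identity
```

Note that unserialize() builds a new instance from the stored bytes, which is why, for cheap objects, it rarely beats simply constructing the object again.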


As always, you are brought back to a tradeoff: You can avoid the cost of instantiating certain high-cost objects at the expense of maintaining a caching system. If you are careless, it is very easy to cache too aggressively and thus hurt the cacheability of more significant data structures or to cache too passively and not recoup the manageability costs of maintaining the cache infrastructure.

So, how could you cache an individual object between requests? Well, you can use the serialize() function to convert it to a storable format and then store it in a shared memory segment, database, or file cache. To implement this in the Word class, you can add store and retrieve methods to it. In this example, you can back the cache with MySQL, interfaced with the connection abstraction layer you built in Chapter 2, “Object-Oriented Programming Through Design Patterns”:

require_once 'DB.inc';

class Text_Word {
    // Previous class definitions
    // ...

    function store() {
        $data = serialize($this);
        $db = new DB_Mysql_TestDB;
        $query = "REPLACE INTO ObjectCache (objecttype, keyname, data, modified)
                  VALUES('Word', :1, :2, now())";
        $db->prepare($query)->execute($this->word, $data);
    }

    // static, because the factory calls it as Text_Word::retrieve($name)
    static function retrieve($name) {
        $db = new DB_Mysql_TestDB;
        $query = "SELECT data from ObjectCache where objecttype = 'Word' and keyname = :1";
        $row = $db->prepare($query)->execute($name)->fetch_assoc();
        if($row) {
            return unserialize($row['data']);
        }
        else {
            return new Text_Word($name);
        }
    }
}

Escaping Query Data

The DB abstraction layer you developed in Chapter 2 handles escaping data for you. If you are not using an abstraction layer here, you need to run mysql_real_escape_string() on the output of serialize().

To use the new Text_Word caching implementation, you need to decide when to store the object. Because the goal is to save computational effort, you can update ObjectCache in the numSyllables method after you perform all your calculations there:


function numSyllables() {
    if($this->_numSyllables) {
        return $this->_numSyllables;
    }
    $scratch = $this->mungeWord($this->word);
    $fragments = preg_split("/[^aeiouy]+/", $scratch);
    if(!$fragments[0]) {
        array_shift($fragments);
    }
    if(!$fragments[count($fragments) - 1]) {
        array_pop($fragments);
    }
    $this->_numSyllables += $this->countSpecialSyllables($scratch);
    if(count($fragments)) {
        $this->_numSyllables += count($fragments);
    }
    else {
        $this->_numSyllables = 1;
    }
    // store the object before returning it
    $this->store();
    return $this->_numSyllables;
}

To retrieve elements from the cache, you can modify the factory to search the MySQL cache when a lookup misses its internal cache:

class CachingFactory {
    static $objects = array();

    static function Word($name) {
        if(!isset(self::$objects['Word'][$name])) {
            self::$objects['Word'][$name] = Text_Word::retrieve($name);
        }
        return self::$objects['Word'][$name];
    }
}
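The lookup order used here (in-process cache first, persistent cache second, rebuild last) can be sketched independently of MySQL. Everything in this sketch is a hypothetical illustration: Demo_BackingStore is an in-memory stand-in for a persistent cache, and Demo_TwoLevelCache and demoBuildLength() are invented names, not part of the chapter's code.

```php
<?php
// Hypothetical persistent store; a real one would wrap a database, file, or shared memory.
class Demo_BackingStore {
    public static $rows = array();
    public static $lookups = 0;

    public static function fetch($key) {
        self::$lookups++;
        return isset(self::$rows[$key]) ? unserialize(self::$rows[$key]) : null;
    }

    public static function save($key, $value) {
        self::$rows[$key] = serialize($value);
    }
}

class Demo_TwoLevelCache {
    private static $local = array();

    // $build is a callable used only when both cache levels miss.
    public static function get($key, $build) {
        if (!isset(self::$local[$key])) {
            $value = Demo_BackingStore::fetch($key);
            if ($value === null) {
                $value = call_user_func($build, $key);
                Demo_BackingStore::save($key, $value);
            }
            self::$local[$key] = $value;
        }
        return self::$local[$key];
    }
}

// Stand-in for an expensive computation.
function demoBuildLength($key) {
    return strlen($key);
}

$a = Demo_TwoLevelCache::get('the', 'demoBuildLength');
$b = Demo_TwoLevelCache::get('the', 'demoBuildLength');   // answered by the in-process array
```

The second call never touches the backing store, which is the behavior you want from the in-request layer; the backing store only matters on the first lookup in each request.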

Again, the amount of machinery that goes into maintaining this caching process is quite large. In addition to the modifications you’ve made so far, you also need a cache maintenance infrastructure to purge entries from the cache when it gets full. And it will get full relatively quickly. If you look at a sample row in the cache, you see that the serialization for a Word object is rather large:

mysql> select data from ObjectCache where keyname = 'the';
+---+
data
+---+
O:4:"word":2:{s:4:"word";s:3:"the";s:13:"_numSyllables";i:0;}
+---+
1 row in set (0.01 sec)

That amounts to 61 bytes of data, much of which is class structure. In PHP 4 this is even worse because static class variables are not supported, and each serialization can include the syllable exception arrays as well. Serializations by their very nature tend to be wordy, often making them overkill.

It is difficult to achieve any substantial performance benefit by using this sort of interprocess caching. For example, in regard to the Text_Word class, all this caching infrastructure has brought you no discernible speedup. In contrast, the object-caching factory technique gave me (on my test system) roughly a factor-of-eight speedup on Text_Word object re-declarations within a request.

In general, I would avoid the strategy of trying to cache intermediate data between requests. Instead, if you determine a bottleneck in a specific function, search first for a more global solution. Only in the case of particularly complex objects and data structures that involve significant resources is doing interprocess sharing of small data worthwhile. It is difficult to overcome the cost of interprocess communication on such a small scale.

Computational Reuse Inside PHP

PHP itself employs computational reuse in a number of places.

PCREs

The Perl Compatible Regular Expression (PCRE) functions consist of preg_match(), preg_replace(), preg_split(), preg_grep(), and others. The PCRE functions get their name because their syntax is designed to largely mimic that of Perl’s regular expressions. PCRE is not actually part of Perl at all; it is a completely independent compatibility library written by Philip Hazel and now bundled with PHP.

Although they are hidden from the end user, there are actually two steps to using preg_match() or preg_replace(). The first step is to call pcre_compile() (a function in the PCRE C library). This compiles the regular expression text into a form understood internally by the PCRE library. In the second step, after the expression has been compiled, the pcre_exec() function (also in the PCRE C library) is called to actually make the matches.

PHP hides this effort from you. The preg_match() function internally performs pcre_compile() and caches the result to avoid recompiling it on subsequent executions. PCREs are implemented inside an extension and thus have greater control of their own memory than does user-space PHP code. This allows PCREs to cache compiled regular expressions not only within a request but between requests as well. Over time, this eliminates the overhead of regular expression compilation entirely. This implementation strategy is very close to the PHP 4 method we looked at earlier in this chapter for caching Text_Word objects without a factory class.


Array Counts and Lengths

When you do something like this, PHP does not actually iterate through $array and count the number of elements it has:

$array = array('a', 'b', 'c', 1, 2, 3);

$size = count($array);

Instead, as elements are inserted into $array, an internal counter is incremented. If elements are removed from $array, the counter is decremented. The count() function simply looks into the array’s internal structure and returns the counter value. This is an O(1) operation. Compare this to calculating count() manually, which would require a full search of the array, an O(n) operation.
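As a small illustration of the difference, the sketch below sets the built-in count() next to a hand-written loop. The manualCount() helper is a hypothetical function written for this comparison; it returns the same number but has to visit every element to get it.

```php
<?php
// Hypothetical helper: counts elements the slow way, visiting each one.
function manualCount(array $items) {
    $total = 0;
    foreach ($items as $unused) {
        $total++;
    }
    return $total;
}

$array = array('a', 'b', 'c', 1, 2, 3);
$size = count($array);           // reads the stored element count
$walked = manualCount($array);   // iterates over all six elements

unset($array[0]);                // removing an element updates the stored count
$afterRemoval = count($array);

printf("count: %d, manual: %d, after unset: %d\n", $size, $walked, $afterRemoval);
```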

Similarly, when a variable is assigned to a string (or cast to a string), PHP also calculates and stores the length of that string in an internal register in that variable. If strlen() is called on that variable, its precalculated length value is returned. This caching is actually also critical to handling binary data because the underlying C library function strlen() (which PHP’s strlen() is designed to mimic) is not binary safe.

Binary Data

In C there are no complex data types such as string. A string in C is really just an array of ASCII characters, with the end being terminated by a null character, or 0 (not the character 0, but the ASCII character for the decimal value 0). The C built-in string functions (strlen, strcmp, and so on, many of which have direct correspondents in PHP) know that a string ends when they encounter a null character.

Binary data, on the other hand, can consist of completely arbitrary characters, including nulls. PHP does not have a separate type for binary data, so strings in PHP must know their own length so that the PHP versions of strlen and strcmp can skip past null characters embedded in binary data.
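A short sketch shows what binary safety means in practice. The variable names are illustrative; the strings are double-quoted so that "\0" is interpreted as a null byte.

```php
<?php
// A string with a null byte in the middle.
$binary = "ab\0cd";
$length = strlen($binary);            // counts all five bytes, including the null

// Comparison also continues past the embedded null byte.
$order = strcmp("ab\0c", "ab\0d");    // negative: the strings differ only after the null
$sameBytes = ($binary === "ab\0cd");

printf("length: %d, order is negative: %s\n", $length, $order < 0 ? 'yes' : 'no');
```

A C routine that stops at the first null byte would see both strings in the strcmp() call as "ab" and report them equal.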