
Advanced PHP Programming
478 Chapter 20 PHP and Zend Engine Internals
• opcode 1: Here the ZEND_ASSIGN handler assigns to Register 0 (the pointer to $hi) the value hello. Register 1 is also assigned to, but it is never used. Register 1 would be utilized if the assignment were being used in an expression like this:
if($hi = 'hello'){}
• opcode 2: Here you re-fetch the value of $hi, now into Register 2. You use the op ZEND_FETCH_R because the variable is used in a read-only context.
• opcode 3: ZEND_ECHO prints the value of Register 2 (or, more accurately, sends it to the output buffering system). echo and print are language constructs that are built in to PHP itself, as opposed to functions that need to be called.
• opcode 4: ZEND_RETURN is called, setting the return value of the script to 1. Even though return is not explicitly called in the script, every script contains an implicit return 1, which is executed if the script completes without return being explicitly called.
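The walk-through above can be sketched as a toy interpreter in C. This is only an illustration of the idea of an op array executed one op at a time, not the Zend Engine's code: the names toy_op, toy_execute, and the TOY_ opcodes are all hypothetical, and the fetch ops are folded into the assign and echo ops for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes, loosely modeled on the ops described above. */
enum toy_opcode { TOY_ASSIGN, TOY_ECHO, TOY_RETURN };

/* One instruction: an opcode, a literal operand, and a register number. */
struct toy_op {
    enum toy_opcode opcode;
    const char *literal;   /* constant operand, if any */
    int reg;               /* register read or written by the op */
};

#define TOY_REGISTERS 4

/* Toy op array for: $hi = 'hello'; echo $hi; (plus the implicit return). */
const struct toy_op toy_hello_script[] = {
    { TOY_ASSIGN, "hello", 0 },
    { TOY_ECHO,   NULL,    0 },
    { TOY_RETURN, NULL,    0 },
};

/*
 * Walk the op array in order, dispatching on each opcode.
 * Text produced by TOY_ECHO is appended to `out` (out_size must be >= 1).
 * The return value models the script's implicit "return 1".
 */
int toy_execute(const struct toy_op *ops, char *out, size_t out_size)
{
    const char *registers[TOY_REGISTERS] = { NULL };

    out[0] = '\0';
    for (const struct toy_op *op = ops; ; op++) {
        switch (op->opcode) {
        case TOY_ASSIGN:
            registers[op->reg] = op->literal;
            break;
        case TOY_ECHO:
            if (registers[op->reg] != NULL) {
                strncat(out, registers[op->reg], out_size - strlen(out) - 1);
            }
            break;
        case TOY_RETURN:
            return 1;
        }
    }
}
```

Calling toy_execute(toy_hello_script, buffer, sizeof buffer) fills the buffer with the echoed text and returns 1, mirroring the implicit return described in opcode 4.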
Here is a more complex example:
<?php
$hi = 'hello';
echo strtoupper($hi);
?>
The intermediate code dump looks similar:
opnum | line | opcode        | op1          | op2     | result
0     | 2    | ZEND_FETCH_W  | "hi"         |         | '0
1     | 2    | ZEND_ASSIGN   | '0           | "hello" | '0
2     | 3    | ZEND_FETCH_R  | "hi"         |         | '2
3     | 3    | ZEND_SEND_VAR | '2           |         |
4     | 3    | ZEND_DO_FCALL | "strtoupper" |         | '3
5     | 3    | ZEND_ECHO     | '3           |         |
6     | 5    | ZEND_RETURN   | 1            |         |
Notice the differences between these two scripts.
• opcode 3: The ZEND_SEND_VAR op pushes a pointer to Register 2 (the variable $hi) onto the argument stack. This argument stack is how the called function receives its arguments. Because the function called here is an internal function (implemented in C and not in PHP), its operation is completely hidden from PHP. Later you will see how a userspace function receives arguments.
• opcode 4: The ZEND_DO_FCALL op calls the function strtoupper and indicates that Register 3 is where its return value should be set.
Here is an example of a trivial PHP script that implements conditional flow control:
<?php
$i = 0;


<?php
function hello($name) {
    echo "hello\n";
}
hello("George");
?>
opnum | line | opcode        | op1      | op2 | result
0     | 2    | ZEND_NOP      |          |     |
1     | 5    | ZEND_SEND_VAL | "George" |     |
2     | 5    | ZEND_DO_FCALL | "hello"  |     | '0
3     | 7    | ZEND_RETURN   | 1        |     |
But where is the function code? This code simply sets the argument stack (via ZEND_SEND_VAL) and calls hello, but you don’t see the code for hello anywhere. This is because functions in PHP are op arrays as well, as if they were miniature scripts. For example, here is the op array for the function hello:
FUNCTION: hello
opnum | line | opcode       | op1        | op2 | result
0     | 2    | ZEND_FETCH_W | "name"     |     | '0
1     | 2    | ZEND_RECV    | 1          |     | '0
2     | 3    | ZEND_ECHO    | "hello%0A" |     |
3     | 4    | ZEND_RETURN  | NULL       |     |
This looks pretty similar to the inline code you’ve seen before. The only difference is ZEND_RECV, which reads off the argument stack. As with standalone scripts, even though you don’t explicitly return at the end, a ZEND_RETURN op is implicitly added, and it returns null.
Includes work similarly to function calls:
<?php
include("file.inc");
?>
opnum | line | opcode               | op1        | op2 | result
0     | 2    | ZEND_INCLUDE_OR_EVAL | "file.inc" |     | '0
1     | 4    | ZEND_RETURN          | 1          |     |
This illustrates an important aspect of the PHP language: All includes and requires happen at runtime. So when a script is initially parsed, the op array for that script is generated, and any functions and classes defined in its top-level file (the one that is actually run) are inserted into the symbol table; but no potentially included scripts are parsed yet. When the script is executed, if an include statement is encountered, the include is then parsed and executed on the spot. Figure 20.1 illustrates the flow of a normal PHP script.

How the Zend Engine Works: Opcodes and Op Arrays
Figure 20.1 The execution path of a PHP script.
This design choice has a number of repercussions:
• Flexibility: It is an oft-vaunted fact that PHP is a runtime language. One of the important things that being a runtime language means for PHP is that it supports conditional inclusion of files and conditional declaration of functions and classes. Here’s an example:

if($condition) {
    include("file1.inc");
}
else {
    include("file2.inc");
}
In this example, the runtime parsing and execution of included files makes this operation more efficient (because files are included only when needed), and it eliminates the potential hassles of symbol conflicts if two files contain different implementations of the same function or class.
• Speed: Having to actually compile includes on the fly means that a significant portion of a script’s execution time is spent simply compiling its dependent includes. If a file is included twice, it must be parsed and executed twice. include_once and require_once partially solve that problem, but it is further exacerbated by the fact that PHP resets its compiler state completely between script executions. (We’ll talk about that more in a minute, as well as some ways to minimize that effect.)
Variables
Programming languages come in two basic flavors when it comes to how variables are declared:
• Statically typed: Statically typed languages include languages such as C++ or Java, where a variable is assigned a type (for example, int or String) and that type is fixed at compile time.
• Dynamically typed: Dynamically typed languages include languages such as PHP, Perl, Python, and VBScript, where types are automatically inferred at runtime. If you use this:
$variable = 0;
PHP will automatically create it as an integer type.
Furthermore, there are two additional criteria for how types are enforced or converted:
• Strongly typed: In a strongly typed language, if an expression receives an argument of the wrong type, an error is generated. Statically typed languages are, as a rule, strongly typed (although many allow one type to be cast, or forced to be interpreted, as another type). Some dynamically typed languages, such as Python and Ruby, have strong typing; in them, exceptions are thrown if variables are used in an incorrect context.

• Weakly typed: A weakly typed language does not necessarily enforce types. This is usually accompanied by autoconversion of variables to appropriate types. For instance, in this:
$string = "The value of \$variable is $variable.";
$variable (which was autocast into an integer when it was first set) is now autoconverted into a string type so that it can be used to create $string.
All these typing strategies have their relative benefits and drawbacks. Static typing allows you to enforce a certain level of data validation at compile time. Dynamically typed languages must do that checking at runtime instead, which is one reason they tend to be slower than statically typed languages. Dynamic typing is, of course, more flexible. Most interpreted languages choose dynamic typing because it suits their flexible nature.
Strong typing similarly allows you a good amount of built-in data validation, in this case at runtime. Weak typing provides additional flexibility by allowing variables to autoconvert between types as necessary. The interpreted languages are pretty well split on strong typing versus weak typing. Python and Ruby (both of which bill themselves as general-purpose “enterprise” languages) implement strong typing, whereas Perl, PHP, and JavaScript implement weak typing.
PHP is both dynamically typed and weakly typed. One slight exception is the optional type checking for argument types in functions. For example, this:
function foo(User $array) { }
and this:
function bar(Exception $array) {}
require that the argument passed be a User or an Exception object (or one of its descendants or implementers), respectively.
To fully understand types in PHP, you need to look under the hood at the data structures used in the engine. In PHP, all variables are zvals, represented by the following C structure:
struct _zval_struct {
    /* Variable information */
    zvalue_value value;    /* value */
    zend_uint refcount;
    zend_uchar type;       /* active type */
    zend_uchar is_ref;
};
and its complementary data container:
typedef union _zvalue_value {
    long lval;      /* long value */
    double dval;    /* double value */
    struct {


currently is), zvalue_value is converted into the required format. This is why you get behavior like this:
$a = "00";
$a += 0;
echo $a;
which prints 0 and not 00 because the extra characters are silently discarded when $a is converted to an integer on the second line.
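A minimal C sketch of that conversion, assuming a much-simplified value container: the names toy_zval, toy_convert_to_long, and toy_add_assign are hypothetical stand-ins rather than engine functions, and string parsing is delegated to strtol, which may differ from PHP's own numeric-string rules in edge cases.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type tags; the real engine defines its own constants. */
enum toy_type { TOY_LONG, TOY_DOUBLE, TOY_STRING };

/* A value container plus an "active type" tag, echoing the zval layout. */
struct toy_zval {
    union {
        long lval;
        double dval;
        const char *str;
    } value;
    enum toy_type type;
};

/* Convert the value in place to a long, whatever its active type is. */
void toy_convert_to_long(struct toy_zval *z)
{
    switch (z->type) {
    case TOY_LONG:
        break;
    case TOY_DOUBLE:
        z->value.lval = (long)z->value.dval;
        break;
    case TOY_STRING:
        /* strtol parses a leading integer, so "00" becomes 0. */
        z->value.lval = strtol(z->value.str, NULL, 10);
        break;
    }
    z->type = TOY_LONG;
}

/* Model `$a += n`: force a numeric context first, then add. */
void toy_add_assign(struct toy_zval *z, long n)
{
    toy_convert_to_long(z);
    z->value.lval += n;
}
```

Starting from a string-typed value holding "00", toy_add_assign(&a, 0) leaves a long-typed value of 0; the original two-character string is gone, which is the effect the PHP example shows.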
Variable types are also important in comparisons. When you compare two variables with the identical operator (===), the active types for the zvals are compared, and if they are different, the comparison fails outright, as in this example:
$a = 0;
$b = '0';
echo ($a === $b) ? "Match" : "Doesn't Match";
Because $a is an integer and $b is a string, this example prints Doesn’t Match.
With the is equal operator (==), the comparison that is performed is based on the active types of the operands. If the operands are strings or nulls, they are compared as strings; if either is a Boolean, they are converted to Boolean values and compared; otherwise, they are converted to numbers and compared. Although this results in the == operator being symmetrical (that is, $a == $b is the same as $b == $a), it is not transitive. The following example of this was kindly provided by Dan Cowgill:
$a = "0";
$b = 0;
$c = "";
echo ($a == $b) ? "True" : "False"; // True
echo ($b == $c) ? "True" : "False"; // True
echo ($a == $c) ? "True" : "False"; // False
Although transitivity may seem like a basic feature of an operator algebra, understanding how == works makes it clear why transitivity does not hold. Here are some examples:
• $a == $b because both variables end up being converted to integers and compared.
• $b == $c because both $b and $c are converted to integers and compared.
• However, $a != $c because both $a and $c are strings, and when they are compared as strings, they are decidedly different.
In his commentary on this example, Dan compared this to the == and eq operators in Perl, which are both transitive. They are both transitive, though, because they are both typed comparisons. == in Perl coerces both operands into numbers before performing the comparison, whereas eq coerces both operands into strings. The PHP == is not a typed comparator, though, and it coerces variables only if they are not of the same active type. Thus the lack of transitivity.
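The comparison rules as stated in this section can be modeled in a few lines of C to show where transitivity breaks. This is a sketch of the rules described above only, not of the engine's comparison code; toy_value and toy_loose_equals are hypothetical names, nulls are left out, and numeric conversion uses strtol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum toy_type { TOY_BOOL, TOY_LONG, TOY_STRING };

struct toy_value {
    enum toy_type type;
    long lval;          /* used by TOY_BOOL and TOY_LONG */
    const char *str;    /* used by TOY_STRING */
};

static long toy_to_long(const struct toy_value *v)
{
    return v->type == TOY_STRING ? strtol(v->str, NULL, 10) : v->lval;
}

static bool toy_to_bool(const struct toy_value *v)
{
    if (v->type == TOY_STRING) {
        /* The empty string and "0" are false; other strings are true. */
        return !(v->str[0] == '\0' || strcmp(v->str, "0") == 0);
    }
    return v->lval != 0;
}

/* Pick the comparison based on the active types of both operands. */
bool toy_loose_equals(const struct toy_value *a, const struct toy_value *b)
{
    if (a->type == TOY_STRING && b->type == TOY_STRING) {
        return strcmp(a->str, b->str) == 0;        /* string comparison */
    }
    if (a->type == TOY_BOOL || b->type == TOY_BOOL) {
        return toy_to_bool(a) == toy_to_bool(b);   /* Boolean comparison */
    }
    return toy_to_long(a) == toy_to_long(b);       /* numeric comparison */
}
```

With a as the string "0", b as the integer 0, and c as the empty string, the first two pairs take the numeric branch and compare equal, while a and c take the string branch and do not. The kind of comparison depends on the pair being compared, which is why equality does not carry across.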

Functions
You’ve seen that when a piece of code calls a function, it populates the argument stack via ZEND_SEND_VAL and uses a ZEND_DO_FCALL op to execute the function. But what does that really do? To really understand how these things work, you need to go back to even before compilation. When PHP starts up, it looks through all its registered extensions (both the ones that were compiled statically and any that were registered in the php.ini file) and registers all the functions that they define. These functions look like this:
typedef struct _zend_internal_function {
    /* Common elements */
    zend_uchar type;
    zend_uchar *arg_types;
    char *function_name;
    zend_class_entry *scope;
    zend_uint fn_flags;
    union _zend_function *prototype;
    /* END of common elements */
    void (*handler)(INTERNAL_FUNCTION_PARAMETERS);
} zend_internal_function;
The important things to note here are the type (which is always ZEND_INTERNAL_FUNCTION, meaning that it is an extension function written in C), the function name, and the handler, which is a C function pointer to the function itself and is part of the extension code.
Registering one of these functions basically amounts to its being inserted into the global function table (a hashtable in which functions are stored).
User-defined functions are, of course, inserted by the compiler. When the compiler (by which I still mean the lexer, parser, and code generator all together) encounters a piece of code like this:
function say_hello($name)
{
    echo "Hello $name\n";
}
it compiles the code inside the function’s block as a new op array, creates a zend_function with that op array, and inserts that zend_function into the global function table with its type set to ZEND_USER_FUNCTION. A zend_function looks like this:
typedef union _zend_function {
    zend_uchar type;
    struct {
        zend_uchar type; /* never used */
        zend_uchar *arg_types;
        char *function_name;
        zend_class_entry *scope;
        zend_uint fn_flags;
        union _zend_function *prototype;
    } common;
    zend_op_array op_array;
    zend_internal_function internal_function;
} zend_function;
This definition can be rather confusing if you don’t recognize one of the design goals: For the most part, zend_functions, zend_internal_functions, and op arrays are interchangeable. They are not identical structs, but all the elements that are in “common” they hold in common, at the same positions. Thus they can safely be cast to each other.
In practice, this means that when a ZEND_DO_FCALL op is executed, it stashes away the current scope, populates the argument stack, and looks up the requested function by name (actually by the lowercase version of the name because PHP implements case-insensitive function names), returning a pointer to a zend_function. If the function’s type is ZEND_INTERNAL_FUNCTION, it can be recast to a zend_internal_function and executed via zend_execute_internal, which executes internal functions. Otherwise, it will be executed via zend_execute, the same function that is called to execute scripts and includes. This works because user functions are completely identical to op arrays.
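The shared-header design and the type-based dispatch can be sketched in C as follows. Everything here is a simplified, hypothetical stand-in (toy_function, toy_lookup, toy_call, and the two sample functions are invented for illustration), and a user function's "op array" is reduced to a single constant so the example stays short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum toy_fn_type { TOY_INTERNAL_FUNCTION, TOY_USER_FUNCTION };

/* Internal function: common header followed by a C handler. */
struct toy_internal_function {
    unsigned char type;
    const char *function_name;
    int (*handler)(int arg);
};

/* User function: the same common header followed by toy "compiled code". */
struct toy_op_array {
    unsigned char type;
    const char *function_name;
    int add_constant;   /* stand-in for a real list of ops */
};

/* Every member begins with the same fields, so `common` can read any of them. */
union toy_function {
    unsigned char type;
    struct {
        unsigned char type;
        const char *function_name;
    } common;
    struct toy_op_array op_array;
    struct toy_internal_function internal_function;
};

static int toy_double(int arg) { return arg * 2; }

/* A tiny global function table; names are stored in lowercase. */
static const union toy_function toy_function_table[] = {
    { .internal_function = { TOY_INTERNAL_FUNCTION, "double_it", toy_double } },
    { .op_array = { TOY_USER_FUNCTION, "add_ten", 10 } },
};

/* Look a function up by the lowercase version of its name. */
const union toy_function *toy_lookup(const char *name)
{
    char lowered[64];
    size_t i;

    for (i = 0; name[i] != '\0' && i < sizeof lowered - 1; i++) {
        lowered[i] = (char)tolower((unsigned char)name[i]);
    }
    lowered[i] = '\0';

    for (i = 0; i < sizeof toy_function_table / sizeof toy_function_table[0]; i++) {
        if (strcmp(toy_function_table[i].common.function_name, lowered) == 0) {
            return &toy_function_table[i];
        }
    }
    return NULL;
}

/* Dispatch on the shared type field: C handler or toy op array. */
int toy_call(const char *name, int arg)
{
    const union toy_function *fn = toy_lookup(name);

    if (fn == NULL) {
        return -1;   /* unknown function in this sketch */
    }
    if (fn->type == TOY_INTERNAL_FUNCTION) {
        return fn->internal_function.handler(arg);
    }
    return arg + fn->op_array.add_constant;
}
```

Because the type and name sit at the same offsets in every member, toy_call can inspect fn->type before it knows which kind of function it holds, and toy_call("DOUBLE_IT", 21) resolves to the same entry as the lowercase spelling.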
As you can likely infer from the way that PHP functions work, ZEND_SEND_VAL does not push an argument’s zval onto the argument stack; instead, it copies it and pushes the copy onto the stack. This has the consequence that unless a variable is passed by reference (with the exception of objects), changing its value in a function does not change the argument passed; it changes only the copy. To change a passed argument in a function, pass it by reference.
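The copy-on-send behavior can be illustrated with a small C model of an argument stack. The types and functions below (toy_arg_stack, toy_send_val, toy_recv, and so on) are hypothetical, and treating the argument number as 1-based is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct toy_value { long lval; };

#define TOY_STACK_MAX 8

struct toy_arg_stack {
    struct toy_value slots[TOY_STACK_MAX];
    int top;
};

/* Model of the send step: push a copy of the argument, not the argument. */
int toy_send_val(struct toy_arg_stack *stack, const struct toy_value *arg)
{
    if (stack->top >= TOY_STACK_MAX) {
        return 0;
    }
    stack->slots[stack->top++] = *arg;   /* struct assignment copies */
    return 1;
}

/* Model of the receive step: fetch argument number n (1-based, assumed). */
struct toy_value *toy_recv(struct toy_arg_stack *stack, int n)
{
    if (n < 1 || n > stack->top) {
        return NULL;
    }
    return &stack->slots[n - 1];
}

/* A callee that modifies its first argument. */
void toy_increment(struct toy_arg_stack *stack)
{
    struct toy_value *arg = toy_recv(stack, 1);

    if (arg != NULL) {
        arg->lval++;
    }
}

/* Call toy_increment with *v as the argument; return the callee's final value. */
long toy_call_increment(const struct toy_value *v)
{
    struct toy_arg_stack stack = { .top = 0 };

    if (!toy_send_val(&stack, v)) {
        return v->lval;
    }
    toy_increment(&stack);
    return stack.slots[0].lval;
}
```

If v holds 41, toy_call_increment(&v) reports 42 for the callee's copy while v itself still holds 41. Passing by reference would correspond to pushing a pointer to v instead of a copy.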
Classes
Classes are similar to functions in that, like functions, they are stashed in their own global symbol table; but they are more complex than functions. Whereas functions are similar to scripts (possessing the same instruction set), classes are like a miniature version of the entire execution scope.
A class is represented by a zend_class_entry, like this:
struct _zend_class_entry {
    char type;
    char *name;
    zend_uint name_length;
    struct _zend_class_entry *parent;
    int refcount;
    zend_bool constants_updated;
    zend_uint ce_flags;