
Advanced PHP Programming


478 Chapter 20 PHP and Zend Engine Internals

• opcode 1: Here the ZEND_ASSIGN handler assigns to Register 0 (the pointer to $hi) the value hello. Register 1 is also assigned to, but it is never used. Register 1 would be utilized if the assignment were being used in an expression like this:

    if($hi = 'hello'){}

• opcode 2: Here you re-fetch the value of $hi, now into Register 2. You use the op ZEND_FETCH_R because the variable is used in a read-only context.

• opcode 3: ZEND_ECHO prints the value of Register 2 (or, more accurately, sends it to the output buffering system). echo (and print, its alias) are operations that are built in to PHP itself, as opposed to functions that need to be called.

• opcode 4: ZEND_RETURN is called, setting the return value of the script to 1. Even though return is not explicitly called in the script, every script contains an implicit return 1, which is executed if the script completes without return being explicitly called.

Here is a more complex example:

    <?php
    $hi = 'hello';
    echo strtoupper($hi);
    ?>

The intermediate code dump looks similar:

    opnum  line  opcode         op1         op2    result
    0      2     ZEND_FETCH_W   hi                 0
    1      2     ZEND_ASSIGN    0           hello  0
    2      3     ZEND_FETCH_R   hi                 2
    3      3     ZEND_SEND_VAR  2
    4      3     ZEND_DO_FCALL  strtoupper         3
    5      3     ZEND_ECHO      3
    6      5     ZEND_RETURN    1

Notice the differences between these two scripts.

• opcode 3: The ZEND_SEND_VAR op pushes a pointer to Register 2 (the variable $hi) onto the argument stack. This argument stack is how the called function receives its arguments. Because the function called here is an internal function (implemented in C and not in PHP), its operation is completely hidden from PHP. Later you will see how a userspace function receives arguments.

• opcode 4: The ZEND_DO_FCALL op calls the function strtoupper and indicates that Register 3 is where its return value should be set.

Here is an example of a trivial PHP script that implements conditional flow control:

    <?php
    $i = 0;
    while($i < 5) {
        $i++;
    }
    ?>

    opnum  line  opcode           op1  op2  result
    0      2     ZEND_FETCH_W     i         0
    1      2     ZEND_ASSIGN      0    0    0
    2      3     ZEND_FETCH_R     i         2
    3      3     ZEND_IS_SMALLER  2    5    2
    4      3     ZEND_JMPZ        $3
    5      4     ZEND_FETCH_RW    i         4
    6      4     ZEND_POST_INC    4         4
    7      4     ZEND_FREE        $5
    8      5     ZEND_JMP
    9      7     ZEND_RETURN      1

Note here that you have a ZEND_JMPZ op to set a conditional branch point (to evaluate whether you should jump to the end of the loop if $i is greater than or equal to 5) and a ZEND_JMP op to bring you back to the top of the loop to reevaluate the condition at the end of each iteration.
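To make the branch-and-jump pattern concrete, here is a minimal sketch in C of a register-based executor running the same loop. It is not Zend Engine code: the toy_op struct, the TOY_ opcode names, and toy_run are hypothetical, and the jump targets (which the dump above does not show) are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes, loosely modeled on the dump above. */
enum toy_opcode {
    TOY_ASSIGN, TOY_IS_SMALLER, TOY_JMPZ, TOY_POST_INC, TOY_JMP, TOY_RETURN
};

struct toy_op {
    enum toy_opcode opcode;
    long op1;     /* register index, or the target for TOY_JMP */
    long op2;     /* literal value, or the target for TOY_JMPZ */
    long result;  /* register index that receives the result */
};

/* Executes the op array and returns how many ops were run. */
long toy_run(const struct toy_op *ops, size_t n_ops, long *regs)
{
    size_t pc = 0;
    long executed = 0;

    while (pc < n_ops) {
        const struct toy_op *op = &ops[pc];
        executed++;
        switch (op->opcode) {
        case TOY_ASSIGN:
            regs[op->op1] = op->op2;
            pc++;
            break;
        case TOY_IS_SMALLER:
            regs[op->result] = regs[op->op1] < op->op2;
            pc++;
            break;
        case TOY_JMPZ:  /* conditional branch: taken when the register is 0 */
            pc = regs[op->op1] ? pc + 1 : (size_t)op->op2;
            break;
        case TOY_POST_INC: {
            long old = regs[op->op1];
            regs[op->op1] = old + 1;
            regs[op->result] = old;  /* the stashed former value */
            pc++;
            break;
        }
        case TOY_JMP:   /* unconditional jump back to the condition */
            assert(op->op1 >= 0 && (size_t)op->op1 < n_ops);
            pc = (size_t)op->op1;
            break;
        case TOY_RETURN:
            return executed;
        }
    }
    return executed;
}

/* while ($i < 5) { $i++; } with $i kept in register 0. */
const struct toy_op toy_loop[] = {
    { TOY_ASSIGN,     0, 0, 0 },  /* 0: $i = 0 */
    { TOY_IS_SMALLER, 0, 5, 1 },  /* 1: r1 = ($i < 5) */
    { TOY_JMPZ,       1, 5, 0 },  /* 2: leave the loop when r1 is 0 */
    { TOY_POST_INC,   0, 0, 2 },  /* 3: r2 = $i++ (old value, never read) */
    { TOY_JMP,        1, 0, 0 },  /* 4: back to op 1 */
    { TOY_RETURN,     0, 0, 0 },  /* 5: end of script */
};
```

Calling toy_run(toy_loop, 6, regs) on a zeroed three-element regs array leaves regs[0] at 5. Register 2 holds the stashed pre-increment value that nothing ever reads, which is the waste the postincrement remark in this section refers to.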

Observe the following in these examples:

• Six registers are allocated and used in this code, even though only two registers are ever used at any one time. Register reuse is not implemented in PHP. For large scripts, thousands of registers may be allocated.

• No real optimization is performed on the code. This postincrement:

    $i++;

could be optimized to a pre-increment:

    ++$i;

because it is used in a void context (that is, it is not used in an expression where the former value of $i needs to be stored). This would save you having to stash its value in a register.

• The jump oplines are not displayed in the debugger. This is really the fault of the assembly dumper. The Zend Engine leaves ops used for some internal purposes marked as unused.

Before we move on, there is one last important example to look at. The example showing function calls earlier in this chapter uses strtoupper, which is a built-in function. Calling a function written in PHP looks similar to calling a built-in function:


    <?php
    function hello($name) {
        echo "hello\n";
    }
    hello("George");
    ?>

    opnum  line  opcode         op1     op2  result
    0      2     ZEND_NOP
    1      5     ZEND_SEND_VAL  George
    2      5     ZEND_DO_FCALL  hello        0
    3      7     ZEND_RETURN    1

But where is the function code? This code simply sets the argument stack (via ZEND_SEND_VAL) and calls hello, but you don’t see the code for hello anywhere. This is because functions in PHP are op arrays as well, as if they were miniature scripts. For example, here is the op array for the function hello:

    FUNCTION: hello
    opnum  line  opcode        op1       op2  result
    0      2     ZEND_FETCH_W  name           0
    1      2     ZEND_RECV     1              0
    2      3     ZEND_ECHO     hello%0A
    3      4     ZEND_RETURN   NULL

This looks pretty similar to the inline code you’ve seen before. The only difference is ZEND_RECV, which reads off the argument stack. As with standalone scripts, even though you don’t explicitly return at the end, a ZEND_RETURN op is implicitly added, and it returns null.
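The send/receive handshake between caller and callee can be sketched in C roughly as follows. This is an illustration only: the arg_stack struct and the arg_send and arg_recv helpers are hypothetical names, and the engine's real argument stack is laid out differently.

```c
#include <assert.h>
#include <stddef.h>

#define ARG_STACK_MAX 16

/* Hypothetical argument stack shared by caller and callee. */
struct arg_stack {
    const char *slots[ARG_STACK_MAX];
    size_t top;  /* number of values pushed so far */
};

/* Caller side, in the spirit of ZEND_SEND_VAL: push one argument. */
void arg_send(struct arg_stack *s, const char *value)
{
    assert(s->top < ARG_STACK_MAX);
    s->slots[s->top++] = value;
}

/*
 * Callee side, in the spirit of ZEND_RECV: fetch argument number pos
 * (1-based, like the op1 of ZEND_RECV in the dump) out of the argc
 * most recently pushed values. Returns NULL if no such argument exists.
 */
const char *arg_recv(const struct arg_stack *s, size_t argc, size_t pos)
{
    if (pos == 0 || pos > argc || argc > s->top)
        return NULL;
    return s->slots[s->top - argc + (pos - 1)];
}
```

After the caller runs arg_send(&stack, "George"), the callee's arg_recv(&stack, 1, 1) yields that same pointer, while asking for a second argument yields NULL.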

Calling includes works similarly to function calls:

    <?php include("file.inc"); ?>

    opnum  line  opcode                op1       op2  result
    0      2     ZEND_INCLUDE_OR_EVAL  file.inc       0
    1      4     ZEND_RETURN           1

This illustrates an important aspect of the PHP language: All includes and requires happen at runtime. So when a script is initially parsed, the op array for that script is generated, and any functions and classes defined in its top-level file (the one that is actually run) are inserted into the symbol table; but no potentially included scripts are parsed yet. When the script is executed, if an include statement is encountered, the include is then parsed and executed on the spot. Figure 20.1 illustrates the flow of a normal PHP script.

How the Zend Engine Works: Opcodes and Op Arrays


Figure 20.1 The execution path of a PHP script.

This design choice has a number of repercussions:

• Flexibility: It is an oft-vaunted fact that PHP is a runtime language. One of the important things that being a runtime language means for PHP is that it supports conditional inclusion of files and conditional declaration of functions and classes. Here’s an example:

    if($condition) {
        include("file1.inc");
    }
    else {
        include("file2.inc");
    }

In this example, the runtime parsing and execution of included files makes this operation more efficient (because files are included only when needed), and it eliminates the potential hassles of symbol conflicts if two files contain different implementations of the same function or class.

• Speed: Having to actually compile includes on the fly means that a significant portion of a script’s execution time is spent simply compiling its dependent includes. If a file is included twice, it must be parsed and executed twice. include_once and require_once partially solve that problem, but it is further exacerbated by the fact that PHP resets its compiler state completely between script executions. (We’ll talk about that more in a minute, as well as some ways to minimize that effect.)

Variables

Programming languages come in two basic flavors when it comes to how variables are declared:

• Statically typed: Statically typed languages include languages such as C++ or Java, where a variable is assigned a type (for example, int or String) and that type is fixed at compile time.

• Dynamically typed: Dynamically typed languages include languages such as PHP, Perl, Python, and VBScript, where types are automatically inferred at runtime. If you use this:

    $variable = 0;

PHP will automatically create it as an integer type.

Furthermore, there are two additional criteria for how types are enforced or converted between:

• Strongly typed: In a strongly typed language, if an expression receives an argument of the wrong type, an error is generated. Without exception, statically typed languages are strongly typed (although many allow one type to be cast, or forced to be interpreted, as another type). Some dynamically typed languages, such as Python and Ruby, have strong typing; in them, exceptions are thrown if variables are used in an incorrect context.

• Weakly typed: A weakly typed language does not necessarily enforce types. This is usually accompanied by autoconversion of variables to appropriate types. For instance, in this:

    $string = "The value of \$variable is $variable.";

$variable (which was autocast into an integer when it was first set) is now autoconverted into a string type so that it can be used to create $string.

All these typing strategies have their relative benefits and drawbacks. Static typing allows you to enforce a certain level of data validation at compile time. Because dynamically typed languages must instead determine and check types at runtime, they tend to be slower than statically typed languages. Dynamic typing is, of course, more flexible. Most interpreted languages choose to go with dynamic typing because it fits their flexibility.

Strong typing similarly allows you a good amount of built-in data validation, in this case at runtime. Weak typing provides additional flexibility by allowing variables to autoconvert between types as necessary. The interpreted languages are pretty well split on strong typing versus weak typing. Python and Ruby (both of which bill themselves as general-purpose “enterprise” languages) implement strong typing, whereas Perl, PHP, and JavaScript implement weak typing.

PHP is both dynamically typed and weakly typed. One slight exception is the optional type checking for argument types in functions. For example, this:

function foo(User $array) { }

and this:

function bar( Exception $array) {}

enforce being passed a User or an Exception object (or one of its descendants or implementers), respectively.

To fully understand types in PHP, you need to look under the hood at the data structures used in the engine. In PHP, all variables are zvals, represented by the following C structure:

    struct _zval_struct {
        /* Variable information */
        zvalue_value value;     /* value */
        zend_uint refcount;
        zend_uchar type;        /* active type */
        zend_uchar is_ref;
    };

and its complementary data container:

    typedef union _zvalue_value {
        long lval;                  /* long value */
        double dval;                /* double value */
        struct {
            char *val;
            int len;
        } str;                      /* string value */
        HashTable *ht;              /* hashtable value */
        zend_object_value obj;      /* handle to an object */
    } zvalue_value;

The zval consists of its own value (which we’ll get to in a moment), a refcount, a type, and the flag is_ref.

A zval’s refcount is the reference counter for the value associated with that variable. When you instantiate a new variable, like this, it is created with a reference count of 1:

    $variable = 'foo';

If you create a copy of $variable, the zval for its value has its reference count incremented. So after you perform the following, the zval for 'foo' has a reference count of 2:

    $variable_copy = $variable;

If you then change $variable, it will be associated with a new zval with a reference count of 1, and the original string 'foo' will have its reference count decremented to 1, as follows:

    $variable = 'bar';

When a variable falls out of scope (say it’s defined in a function and that function is returned from), or when the variable is destroyed, its zval’s reference count is decremented by one. When a zval’s refcount reaches 0, it is picked up by the garbage-collection system and its contents will be freed.
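The counting scheme described above can be sketched in a few lines of C. This is a simplified model, not the engine's implementation: rc_value and the rc_ helpers are hypothetical names, and only string values are handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string zval; not the engine's real layout. */
struct rc_value {
    char *str;
    unsigned refcount;
};

/* $variable = 'foo': a fresh value starts with a reference count of 1. */
struct rc_value *rc_new(const char *s)
{
    struct rc_value *v = malloc(sizeof *v);
    if (!v)
        return NULL;
    v->str = malloc(strlen(s) + 1);
    if (!v->str) {
        free(v);
        return NULL;
    }
    strcpy(v->str, s);
    v->refcount = 1;
    return v;
}

/* $variable_copy = $variable: share the value and bump the counter. */
struct rc_value *rc_share(struct rc_value *v)
{
    v->refcount++;
    return v;
}

/* Drops one reference and frees the value at zero.
 * Returns the number of references that remain. */
unsigned rc_release(struct rc_value *v)
{
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        free(v->str);
        free(v);
        return 0;
    }
    return v->refcount;
}

/* $variable = 'bar': detach the slot from its old value, bind a new one. */
void rc_assign(struct rc_value **slot, const char *s)
{
    struct rc_value *fresh = rc_new(s);
    if (!fresh)
        return;  /* allocation failed: keep the old binding */
    rc_release(*slot);
    *slot = fresh;
}
```

Replaying the three PHP statements with rc_new, rc_share, and rc_assign ends with two separate values, 'bar' and 'foo', each with a reference count of 1.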

The zval type is especially interesting. The fact that PHP is a weakly typed language does not mean that variables do not have types. The type attribute of the zval specifies what the current type of the zval is; this indicates which part of the zvalue_value union should be looked at for its value.

Finally, is_ref indicates whether this zval actually holds data or is simply a reference to another zval that holds data.

The zvalue_value value is where the data for a zval is actually stored. This is a union of all the possible base types for a variable in PHP: long integers, doubles, strings, hashtables (arrays), and object handles. A union in C is a composite data type that uses a minimal amount of space to store at different times different possible types. Practically, this means that the data stored for a zval is either a numeric representation, a string representation, an array representation, or an object representation, but never more than one at a time. This is in contrast to a language such as Perl, where all these potential representations can coexist (this is how in Perl you can have a variable that has entirely different representations when accessed as a string than when accessed as a number).
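A cut-down sketch of the same tag-plus-union arrangement, written as standalone C, may help. The names tagged_value, tv_payload, and the tv_ helpers are hypothetical, and only three of the base types are modeled.

```c
#include <assert.h>
#include <stddef.h>

enum tv_type { TV_LONG, TV_DOUBLE, TV_STRING };

/* All members share the same storage; only one is valid at a time. */
union tv_payload {
    long lval;
    double dval;
    struct { const char *val; int len; } str;
};

struct tagged_value {
    union tv_payload value;
    enum tv_type type;  /* says which union member is currently valid */
};

void tv_set_long(struct tagged_value *v, long n)
{
    v->value.lval = n;
    v->type = TV_LONG;
}

void tv_set_double(struct tagged_value *v, double d)
{
    v->value.dval = d;
    v->type = TV_DOUBLE;
}

/* Reads the value as a double. Only the member named by the tag is
 * consulted; strings are outside this sketch and read as 0. */
double tv_as_double(const struct tagged_value *v)
{
    assert(v != NULL);
    switch (v->type) {
    case TV_LONG:   return (double)v->value.lval;
    case TV_DOUBLE: return v->value.dval;
    default:        return 0.0;
    }
}
```

Because the members overlap, sizeof(union tv_payload) is governed by its largest member rather than by the sum of all of them, and storing a double overwrites whatever long was there before.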

When you switch types in PHP (which is almost never done explicitly; it almost always happens implicitly, when a usage demands a zval be in a different representation than it currently is), zvalue_value is converted into the required format. This is why you get behavior like this:

    $a = "00";
    $a += 0;
    echo $a;

which prints 0 and not 00 because the extra characters are silently discarded when $a is converted to an integer on the second line.
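The in-place conversion can be pictured with the following C sketch. It assumes a two-type value (conv_value, a hypothetical name) and uses strtol as a rough approximation of PHP's string-to-integer rules, which are more involved than this.

```c
#include <assert.h>
#include <stdlib.h>

enum cv_type { CV_LONG, CV_STRING };

/* Hypothetical two-type value: a tag plus a union, as in a zval. */
struct conv_value {
    union { long lval; const char *sval; } value;
    enum cv_type type;
};

/* Converts the value to a long in place, as an arithmetic operator
 * would demand. The string representation is gone afterwards. */
void cv_convert_to_long(struct conv_value *v)
{
    assert(v != NULL);
    if (v->type == CV_STRING) {
        long n = strtol(v->value.sval, NULL, 10);  /* approximation only */
        v->value.lval = n;
        v->type = CV_LONG;
    }
}

/* Models $a += n: force a numeric representation, then add. */
void cv_add_assign(struct conv_value *v, long n)
{
    cv_convert_to_long(v);
    v->value.lval += n;
}
```

Starting from the string "00", cv_add_assign(&a, 0) leaves a long with the value 0; the original two-character string is no longer stored anywhere in the value.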

Variable types are also important in comparison. When you compare two variables with the identical operator (===), like this, the active types for the zvals are compared, and if they are different, the comparison fails outright:

    $a = 0;
    $b = "0";
    echo ($a === $b)?"Match":"Doesn't Match";

For that reason, the comparison in this example fails.

With the is equal operator (==), the comparison that is performed is based on the active types of the operands. If the operands are strings or nulls, they are compared as strings; if either is a Boolean, they are converted to Boolean values and compared; otherwise they are converted to numbers and compared. Although this results in the == operator being symmetrical (for example, $a == $b is the same as $b == $a), it actually is not transitive. The following example of this was kindly provided by Dan Cowgill:

    $a = "0";
    $b = 0;
    $c = "";
    echo ($a == $b)?"True":"False"; // True
    echo ($b == $c)?"True":"False"; // True
    echo ($a == $c)?"True":"False"; // False

Although transitivity may seem like a basic feature of an operator algebra, understanding how == works makes it clear why transitivity does not hold. Here are some examples:

• "0" == 0 because both variables end up being converted to integers and compared.

• $b == $c because both $b and $c are converted to integers and compared.

• However, $a != $c because both $a and $c are strings, and when they are compared as strings, they are decidedly different.

In his commentary on this example, Dan compared this to the == and eq operators in Perl, which are both transitive. They are both transitive, though, because they are both typed comparisons. == in Perl coerces both operands into numbers before performing the comparison, whereas eq coerces both operands into strings. The PHP == is not a typed comparator, though, and it coerces variables only if they are not of the same active type. Thus the lack of transitivity.
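The difference between an untyped and a typed comparator can be modeled in C. The sketch below follows only the rules as stated in this section (two strings compare as strings, any other pairing is coerced to numbers); it ignores nulls, Booleans, and the other cases real PHP handles, and PHP's exact coercions have changed between versions. The loose_value struct and the lv_ functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum lv_type { LV_LONG, LV_STRING };

struct loose_value {
    enum lv_type type;
    long lval;         /* valid when type == LV_LONG */
    const char *sval;  /* valid when type == LV_STRING */
};

/* Numeric reading of a value; strtol only approximates PHP's rules. */
static long lv_to_long(const struct loose_value *v)
{
    return v->type == LV_LONG ? v->lval : strtol(v->sval, NULL, 10);
}

/* Untyped comparison: coerce only when the active types differ. */
bool lv_loose_equal(const struct loose_value *a, const struct loose_value *b)
{
    assert(a != NULL && b != NULL);
    if (a->type == LV_STRING && b->type == LV_STRING)
        return strcmp(a->sval, b->sval) == 0;
    return lv_to_long(a) == lv_to_long(b);
}

/* Typed comparison in the style of Perl's ==: always coerce to numbers. */
bool lv_numeric_equal(const struct loose_value *a, const struct loose_value *b)
{
    assert(a != NULL && b != NULL);
    return lv_to_long(a) == lv_to_long(b);
}
```

With a as the string "0", b as the integer 0, and c as the empty string, lv_loose_equal accepts the pairs (a, b) and (b, c) but rejects (a, c). lv_numeric_equal accepts all three pairs, because it applies the same coercion every time and so stays transitive.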


Functions

You’ve seen that when a piece of code calls a function, it populates the argument stack via ZEND_SEND_VAL and uses a ZEND_DO_FCALL op to execute the function. But what does that really do? To really understand how these things work, you need to go back to even before compilation. When PHP starts up, it looks through all its registered extensions (both the ones that were compiled statically and any that were registered in the php.ini file) and registers all the functions that they define. These functions look like this:

    typedef struct _zend_internal_function {
        /* Common elements */
        zend_uchar type;
        zend_uchar *arg_types;
        char *function_name;
        zend_class_entry *scope;
        zend_uint fn_flags;
        union _zend_function *prototype;
        /* END of common elements */
        void (*handler)(INTERNAL_FUNCTION_PARAMETERS);
    } zend_internal_function;

The important things to note here are the type (which is always ZEND_INTERNAL_FUNCTION, meaning that it is an extension function written in C), the function name, and the handler, which is a C function pointer to the function itself and is part of the extension code.

Registering one of these functions basically amounts to its being inserted into the global function table (a hashtable in which functions are stored).

User-defined functions are, of course, inserted by the compiler. When the compiler (by which I still mean the lexer, parser, and code generator all together) encounters a piece of code like this:

    function say_hello($name)
    {
        echo "Hello $name\n";
    }

it compiles the code inside the function’s block as a new op array, creates a zend_function with that op array, and inserts that zend_function into the global function table with its type set to ZEND_USER_FUNCTION. A zend_function looks like this:

    typedef union _zend_function {
        zend_uchar type;
        struct {
            zend_uchar type;  /* never used */
            zend_uchar *arg_types;
            char *function_name;
            zend_class_entry *scope;
            zend_uint fn_flags;
            union _zend_function *prototype;
        } common;
        zend_op_array op_array;
        zend_internal_function internal_function;
    } zend_function;

This definition can be rather confusing if you don’t recognize one of the design goals: For the most part, zend_functions, zend_internal_functions, and op arrays are interchangeable. They are not identical structs, but all the elements that are in “common” they hold in common. Thus they can safely be cast to each other.

In practice, this means that when a ZEND_DO_FCALL op is executed, it stashes away the current scope, populates the argument stack, and looks up the requested function by name (actually by the lowercase version of the name because PHP implements case-insensitive function names), returning a pointer to a zend_function. If the function’s type is ZEND_INTERNAL_FUNCTION, it can be recast to a zend_internal_function and executed via zend_execute_internal, which executes internal functions. Otherwise, it will be executed via zend_execute, the same function that is called to execute scripts and includes. This works because user functions are completely identical to op arrays.
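The lookup-then-dispatch step can be sketched in standalone C as follows. Everything here is a simplified assumption rather than engine code: fn_entry, fn_table, fn_lookup, and fn_call are hypothetical, the function table is a linear array instead of a hashtable, and a user function's op array is reduced to an op count.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum { FN_INTERNAL = 1, FN_USER = 2 };  /* hypothetical type tags */

struct fn_internal {
    unsigned char type;
    const char *name;
    int (*handler)(int);  /* C function pointer into "extension" code */
};

struct fn_user {
    unsigned char type;
    const char *name;
    int opcount;          /* stand-in for a compiled op array */
};

/* Every member begins with the same fields, so type and common can be
 * read no matter which member was stored. */
union fn_entry {
    unsigned char type;
    struct { unsigned char type; const char *name; } common;
    struct fn_internal internal;
    struct fn_user user;
};

static int c_twice(int x) { return 2 * x; }  /* pretend extension function */

/* The "global function table": names are stored in lowercase. */
static const union fn_entry fn_table[] = {
    { .internal = { FN_INTERNAL, "twice", c_twice } },
    { .user     = { FN_USER, "say_hello", 3 } },
};

/* Looks a function up by the lowercase version of its name. */
const union fn_entry *fn_lookup(const char *name)
{
    char lower[64];
    size_t i, len;

    assert(name != NULL);
    len = strlen(name);
    if (len >= sizeof lower)
        return NULL;
    for (i = 0; i <= len; i++)  /* also copies the terminating NUL */
        lower[i] = (char)tolower((unsigned char)name[i]);
    for (i = 0; i < sizeof fn_table / sizeof fn_table[0]; i++)
        if (strcmp(fn_table[i].common.name, lower) == 0)
            return &fn_table[i];
    return NULL;
}

/* Dispatches on the type tag. Internal functions run their handler;
 * user functions would be handed to the op-array executor, for which
 * this sketch returns the op count. Returns -1 for unknown names. */
int fn_call(const char *name, int arg)
{
    const union fn_entry *fn = fn_lookup(name);
    if (!fn)
        return -1;
    if (fn->type == FN_INTERNAL)
        return fn->internal.handler(arg);
    return fn->user.opcount;
}
```

Here fn_call("TWICE", 21) reaches c_twice through the lowercase lookup, and fn_call("Say_Hello", 0) takes the user-function branch.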

As you can likely infer from the way that PHP functions work, ZEND_SEND_VAL does not push an argument’s zval onto the argument stack; instead, it copies it and pushes the copy onto the stack. This has the consequence that unless a variable is passed by reference (with the exception of objects), changing its value in a function does not change the argument passed; it changes only the copy. To change a passed argument in a function, pass it by reference.
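C's own calling convention gives a close analogy for this copy-versus-reference distinction. In the sketch below, pv_value and both functions are hypothetical; passing the struct itself stands in for ZEND_SEND_VAL's copy, and passing its address stands in for a PHP by-reference parameter.

```c
#include <assert.h>
#include <stddef.h>

struct pv_value { long lval; };

/* Receives a copy: the increment is visible only inside the function.
 * Returns the copy's new value so the effect can be observed. */
long pv_increment_copy(struct pv_value v)
{
    v.lval++;
    return v.lval;
}

/* Receives the caller's value by address: the caller sees the change. */
void pv_increment_ref(struct pv_value *v)
{
    assert(v != NULL);
    v->lval++;
}
```

If x.lval starts at 1, pv_increment_copy(x) returns 2 but leaves x.lval at 1, whereas pv_increment_ref(&x) changes x.lval itself to 2.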

Classes

Classes are similar to functions in that, like functions, they are stashed in their own global symbol table; but they are more complex than functions. Whereas functions are similar to scripts (possessing the same instruction set), classes are like a miniature version of the entire execution scope.

A class is represented by a zend_class_entry, like this:

    struct _zend_class_entry {
        char type;
        char *name;
        zend_uint name_length;
        struct _zend_class_entry *parent;
        int refcount;
        zend_bool constants_updated;
        zend_uint ce_flags;
