
Advanced PHP Programming
Chapter 23 Writing SAPIs and Extending the Zend Engine
When raw_filter is called, it looks to see whether the appropriate $_RAW array exists, and if it does not, it creates it. It then assigns a copy of the original value of *val into that array. Next, it removes all the HTML tags from *val by using php_strip_tags() (the C underpinning of the PHP function strip_tags()) and sets the new (possibly shortened) length of *val.
treat_data and default_post_reader
Although the input_filter callback allows you to modify incoming variables, it does not give you complete control of the variable import process. For example, it does not allow you to avoid inserting certain variables or to change the way they are parsed from their raw form.
If you need more control, you can use two other hooks that the SAPI interface provides:
- sapi_module_struct.treat_data
- sapi_module_struct.default_post_reader
sapi_module_struct.treat_data is called by the engine when it parses the raw POST, COOKIE, and GET query string data. The default implementation breaks the raw data into key/value pairs, sanitizes the values with any registered input_filter callbacks, and inserts the values into the appropriate symbol tables.
sapi_module_struct.default_post_reader is called to parse any POST data that does not have a content type handler already associated with it. The default action is to simply swallow the entire POST contents into $HTTP_RAW_POST_DATA. If, for instance, you need to ban certain file types from ever being uploaded under any circumstances, defining a custom sapi_module_struct.default_post_reader callback might make sense.
Like input_filter, both of these callbacks can be registered at runtime in extensions by using the sapi_register_treat_data() and sapi_register_default_post_reader() functions. In general, though, these are both very special-purpose functions. In most cases, an input_filter callback can meet your needs.
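To make the division of labor concrete, here is a minimal standalone sketch of the kind of work a treat_data-style routine performs: splitting a raw query string into key/value pairs and letting a filter veto individual variables, which is exactly the control that input_filter alone does not give you. It does not use the PHP headers or the real callback signatures; the names filter_fn, drop_secret, and treat_query_string are hypothetical, and the output string stands in for the symbol table.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical filter signature: return 0 to skip the variable entirely. */
typedef int (*filter_fn)(const char *key, char *val);

/* Example filter: refuse any variable named "secret"; accept everything else. */
int drop_secret(const char *key, char *val)
{
    (void)val;
    return strcmp(key, "secret") != 0;
}

/*
 * Split "a=1&b=2" into key/value pairs, pass each pair through the filter,
 * and append the accepted pairs to out as "key=value;".
 * Returns the number of accepted pairs. raw is modified in place.
 */
int treat_query_string(char *raw, filter_fn filter, char *out, size_t out_len)
{
    int accepted = 0;
    char *pair = raw;

    out[0] = '\0';
    while (pair && *pair) {
        char *next = strchr(pair, '&');
        char *eq;

        if (next) {
            *next++ = '\0';              /* terminate this pair */
        }
        eq = strchr(pair, '=');
        if (eq) {
            *eq++ = '\0';                /* split key from value */
            if (filter == NULL || filter(pair, eq)) {
                size_t used = strlen(out);
                snprintf(out + used, out_len - used, "%s=%s;", pair, eq);
                accepted++;
            }
        }
        pair = next;
    }
    return accepted;
}
```

Calling treat_query_string() on "a=1&secret=x&b=2" with drop_secret as the filter keeps a and b and never inserts secret.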
Modifying and Introspecting the Zend Engine
One of the most exciting design aspects of the Zend Engine is that its behavior is open to extension and modification. As discussed in Chapter 20, there are two ways to modify Zend Engine behavior: by using alterable function pointers and by using the Zend extension API.
Ironically, modification of engine-internal function pointers is not only the most effective way of making many changes, but it can also be done in regular PHP extensions. As a reminder, these are the four major function pointers used inside the Zend Engine:
- zend_compile_file(): The wrapper for the lexer, parser, and code generator. It compiles a file and returns a zend_op_array.
- zend_execute(): After a file is compiled, its zend_op_array is executed by zend_execute(). There is also a companion zend_execute_internal() function, which executes internal functions.
- zend_error_cb: This function is called when any error is generated in PHP.
- zend_fopen: This function implements the open call that is used internally whenever a file needs to be opened.
The following sections present four different engine modifications that use function pointer reassignment. Then a brief section covers parts of the Zend Engine extension API.
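All four modifications share one pattern: save the current value of the function pointer, install a wrapper, and have the wrapper call the saved pointer. The following standalone sketch models that pattern without the PHP headers. The names engine_compile, counting_compile, hook_install, and hook_remove are hypothetical stand-ins, not Zend Engine symbols.

```c
/* Hypothetical stand-in for an engine hook such as zend_compile_file. */
typedef int (*compile_fn)(const char *filename);

int default_compile(const char *filename)
{
    (void)filename;
    return 1;                        /* pretend compilation succeeded */
}

/* The "engine" always calls through this pointer. */
compile_fn engine_compile = default_compile;

/* State owned by the "extension". */
static compile_fn saved_compile = 0;
int compile_calls = 0;

int counting_compile(const char *filename)
{
    compile_calls++;                 /* extra behavior added by the extension */
    return saved_compile(filename);  /* then defer to the original */
}

/* Plays the role of MINIT: save the old pointer and install the wrapper. */
void hook_install(void)
{
    saved_compile = engine_compile;
    engine_compile = counting_compile;
}

/* Plays the role of MSHUTDOWN: put the original pointer back. */
void hook_remove(void)
{
    engine_compile = saved_compile;
}
```

Because the wrapper calls whatever pointer was installed before it, several extensions can stack their wrappers on the same hook, provided each one saves and calls its predecessor.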
Warnings as Exceptions
A much-requested feature that is likely to never appear in a default PHP build is the ability to automatically throw exceptions on E_WARNING class errors. This feature allows object orientation fans to convert all their error checking into exception-based checking.
The reason this feature will never get implemented as an INI-toggleable value is that it makes it nearly impossible to write portable code. If E_WARNING is a nonfatal error on some systems and requires a try{}/catch{} block in other configurations, you have a nightmare on your hands if you distribute code.
It’s a neat feature, though, and by overloading zend_error_cb, you can easily implement it as an extension. The idea is to reset zend_error_cb to a function that throws exceptions instead.
First, you need an extension framework. Here is the base code:
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "zend.h"
#include "zend_default_classes.h"

ZEND_BEGIN_MODULE_GLOBALS(warn_as_except)
    ZEND_API void (*old_error_cb)(int type, const char *error_filename,
                                  const uint error_lineno, const char *format,
                                  va_list args);
ZEND_END_MODULE_GLOBALS(warn_as_except)

ZEND_DECLARE_MODULE_GLOBALS(warn_as_except)

#ifdef ZTS
#define EEG(v) TSRMG(warn_as_except_globals_id, zend_warn_as_except_globals *, v)
#else
#define EEG(v) (warn_as_except_globals.v)
#endif

void exception_error_cb(int type, const char *error_filename,
                        const uint error_lineno, const char *format,
                        va_list args);

PHP_MINIT_FUNCTION(warn_as_except)
{
    EEG(old_error_cb) = zend_error_cb;
    zend_error_cb = exception_error_cb;
    return SUCCESS;
}

PHP_MSHUTDOWN_FUNCTION(warn_as_except)
{
    return SUCCESS;
}

PHP_MINFO_FUNCTION(warn_as_except)
{
}

function_entry no_functions[] = {
    {NULL, NULL, NULL}
};

zend_module_entry warn_as_except_module_entry = {
    STANDARD_MODULE_HEADER,
    "warn_as_except",
    no_functions,
    PHP_MINIT(warn_as_except),
    PHP_MSHUTDOWN(warn_as_except),
    NULL,
    NULL,
    PHP_MINFO(warn_as_except),
    "1.0",
    STANDARD_MODULE_PROPERTIES
};

#ifdef COMPILE_DL_WARN_AS_EXCEPT
ZEND_GET_MODULE(warn_as_except)
#endif
All the work happens in PHP_MINIT_FUNCTION(warn_as_except). There the old error callback is stored in old_error_cb, and zend_error_cb is set to the new error function exception_error_cb. You learned how to throw exceptions in C code in Chapter 22, “Extending PHP: Part II,” so the code for exception_error_cb should look familiar. Here it is:
void exception_error_cb(int type, const char *error_filename,
                        const uint error_lineno, const char *format,
                        va_list args)
{
    char *buffer;
    int buffer_len;
    TSRMLS_FETCH();

    if(type == E_WARNING || type == E_USER_WARNING) {
        /* Format the message and throw it instead of reporting a warning. */
        buffer_len = vspprintf(&buffer, PG(log_errors_max_len), format, args);
        zend_throw_exception(zend_exception_get_default(), buffer, type);
        /* vspprintf() allocates with the engine allocator, so release with efree(). */
        efree(buffer);
    }
    else {
        /* Every other error type goes to the original handler. */
        EEG(old_error_cb)(type, error_filename, error_lineno, format, args);
    }
    return;
}
If you compile and load this extension, the following script:
<?php
try {
    trigger_error("Testing Exception", E_USER_WARNING);
}
catch(Exception $e) {
    print "Caught this error\n";
}
?>
yields the following output:
> php test.php
Caught this error
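The one detail that is easy to get wrong in a replacement error callback is the va_list handling: the callback receives an already started va_list, and it must either consume it once or pass it on untouched to the previous handler. The following standalone sketch models that control flow in plain C. It is not the Zend API: the level constants, raise_error(), and the pending_exception buffer (which stands in for throwing an exception) are assumptions made for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

enum { LEVEL_NOTICE = 1, LEVEL_WARNING = 2 };    /* hypothetical levels */

typedef void (*error_cb_t)(int type, const char *format, va_list args);

char last_logged[128];        /* what the default handler "reported" */
char pending_exception[128];  /* message of the "thrown" exception */

void default_error_cb(int type, const char *format, va_list args)
{
    (void)type;
    vsnprintf(last_logged, sizeof(last_logged), format, args);
}

error_cb_t error_cb = default_error_cb;
static error_cb_t old_error_cb;

void exception_cb(int type, const char *format, va_list args)
{
    if (type == LEVEL_WARNING) {
        /* Consume the arguments here: format the message and "throw" it. */
        vsnprintf(pending_exception, sizeof(pending_exception), format, args);
    } else {
        /* Pass the untouched va_list through to the saved handler. */
        old_error_cb(type, format, args);
    }
}

void install_exception_cb(void)
{
    old_error_cb = error_cb;
    error_cb = exception_cb;
}

/* Stand-in for the engine's error entry point. */
void raise_error(int type, const char *format, ...)
{
    va_list args;

    va_start(args, format);
    error_cb(type, format, args);
    va_end(args);
}
```

After install_exception_cb(), a warning ends up in pending_exception while a notice still reaches the default handler.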
An Opcode Dumper
Chapter 20 uses an opcode dumper to dump the Zend Engine intermediate code into human-readable assembly language. In this section you will see how to write it. The idea is to capture the zend_op_array returned from zend_compile_file() and format it. You could write an extension function to parse a file and dump the output, but it would be more clever to write a standalone application using the embed SAPI.
You learned in Chapter 20 that a zend_op_array contains an array of zend_ops in this form:

struct _zend_op {
    opcode_handler_t handler;
    znode result;
    znode op1;
    znode op2;
    ulong extended_value;
    uint lineno;
    zend_uchar opcode;
};
To break these down into assembly language, you need to identify the name of the operation associated with the opcode and then dump the contents of the znodes op1, op2, and result.
The mapping from opcode to operation name must be performed by hand. In zend_compile.h in the Zend source tree is a set of defines that lists all the operations. It is simple to write a script that parses them all into a function. Here’s an example of such a function:
char *opname(zend_uchar opcode)
{
    switch(opcode) {
        case ZEND_NOP: return "ZEND_NOP";
        case ZEND_ADD: return "ZEND_ADD";
        case ZEND_SUB: return "ZEND_SUB";
        case ZEND_MUL: return "ZEND_MUL";
        case ZEND_DIV: return "ZEND_DIV";
        case ZEND_MOD: return "ZEND_MOD";
        /* ... */
        default: return "UNKNOWN";
    }
}
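If you generate the mapping with a script anyway, a lookup table is a possible alternative to the switch statement. The sketch below is standalone and only covers the six opcodes shown above. The OP_ constants are local stand-ins whose values are assumed to match the corresponding defines in zend_compile.h, and opname_from_table() is a hypothetical name.

```c
#include <stddef.h>

/* Local stand-ins; assumed to mirror the first defines in zend_compile.h. */
enum { OP_NOP = 0, OP_ADD = 1, OP_SUB = 2, OP_MUL = 3, OP_DIV = 4, OP_MOD = 5 };

/* Designated initializers (C99) keep each name next to its opcode. */
static const char *const op_names[] = {
    [OP_NOP] = "ZEND_NOP",
    [OP_ADD] = "ZEND_ADD",
    [OP_SUB] = "ZEND_SUB",
    [OP_MUL] = "ZEND_MUL",
    [OP_DIV] = "ZEND_DIV",
    [OP_MOD] = "ZEND_MOD"
};

const char *opname_from_table(unsigned char opcode)
{
    size_t count = sizeof(op_names) / sizeof(op_names[0]);

    /* Gaps in the table are NULL, so check both the range and the entry. */
    if (opcode < count && op_names[opcode] != NULL) {
        return op_names[opcode];
    }
    return "UNKNOWN";
}
```

The table form makes a generated file easier to diff when opcodes are added, at the cost of relying on C99 designated initializers.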
Then you need functions to dump the znodes and their zvals. Here’s an example:
#define BUFFER_LEN 40

char *format_zval(zval *z)
{
    static char buffer[BUFFER_LEN];
    int len;

    switch(z->type) {
        case IS_NULL:
            return "NULL";
        case IS_LONG:
        case IS_BOOL:
            /* lval is a long, so it needs %ld rather than %d */
            snprintf(buffer, BUFFER_LEN, "%ld", z->value.lval);
            return buffer;
        case IS_DOUBLE:
            snprintf(buffer, BUFFER_LEN, "%f", z->value.dval);
            return buffer;
        case IS_STRING: {
            /* php_url_encode() returns an allocated string; free it after copying */
            char *encoded = php_url_encode(z->value.str.val, z->value.str.len, &len);
            snprintf(buffer, BUFFER_LEN, "\"%s\"", encoded);
            efree(encoded);
            return buffer;
        }
        case IS_ARRAY:
        case IS_OBJECT:
        case IS_RESOURCE:
        case IS_CONSTANT:
        case IS_CONSTANT_ARRAY:
            return "";
        default:
            return "unknown";
    }
}

char *format_znode(znode *n)
{
    static char buffer[BUFFER_LEN];

    switch (n->op_type) {
        case IS_CONST:
            return format_zval(&n->u.constant);
        case IS_VAR:
            /* u.var is a byte offset; divide to get the variable's slot number */
            snprintf(buffer, BUFFER_LEN, "$%d", (int)(n->u.var/sizeof(temp_variable)));
            return buffer;
        case IS_TMP_VAR:
            snprintf(buffer, BUFFER_LEN, "~%d", (int)(n->u.var/sizeof(temp_variable)));
            return buffer;
        default:
            return "";
    }
}
In format_zval(), you can safely ignore the array, object, and constant types because they do not appear in znodes. To wrap these helper functions all together, here is a function to dump the entire zend_op:

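void dump_op(zend_op *op, int num)
{
    /* format_znode() reuses one static buffer, so copy each result
       before making the next call. */
    char op1[BUFFER_LEN], op2[BUFFER_LEN], result[BUFFER_LEN];

    snprintf(op1, BUFFER_LEN, "%s", format_znode(&op->op1));
    snprintf(op2, BUFFER_LEN, "%s", format_znode(&op->op2));
    snprintf(result, BUFFER_LEN, "%s", format_znode(&op->result));
    printf("%5d %5u %30s %40s %40s %40s\n", num, op->lineno,
           opname(op->opcode), op1, op2, result);
}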
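A helper such as format_znode() returns a pointer into a single static buffer, so each call overwrites the result of the previous one. If several calls appear in one printf() argument list, all of the resulting pointers refer to the same memory by the time printf() runs. The standalone sketch below demonstrates the effect and the usual remedy of copying each result before the next call. The names format_slot() and format_pair() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define BUF_LEN 16

/* Like format_znode(): formats into one shared static buffer. */
const char *format_slot(int n)
{
    static char buffer[BUF_LEN];

    snprintf(buffer, BUF_LEN, "$%d", n);
    return buffer;
}

/* Formats two slots into out as "a b", copying each result
   into its own buffer before the next call overwrites it. */
void format_pair(int a, int b, char *out, size_t out_len)
{
    char first[BUF_LEN];
    char second[BUF_LEN];

    snprintf(first, sizeof(first), "%s", format_slot(a));
    snprintf(second, sizeof(second), "%s", format_slot(b));
    snprintf(out, out_len, "%s %s", first, second);
}
```

Two calls to format_slot() always return the same address, which is why the copies in format_pair() are needed.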
Then you need a function to iterate through a zend_op_array and dump the opcodes in order, as shown here:
void dump_op_array(zend_op_array *op_array)
{
    if(op_array) {
        int i;

        /* The 0 flag is not defined for %s, so the columns use plain widths. */
        printf("%5s %5s %30s %40s %40s %40s\n", "opnum", "line",
               "opcode", "op1", "op2", "result");
        for(i = 0; i < op_array->last; i++) {
            dump_op(&op_array->opcodes[i], i);
        }
    }
}
Finally, you tie them all together with a main() routine that compiles the script in question and dumps its contents. Here is a routine that does that:
int main(int argc, char **argv)
{
    zend_op_array *op_array;
    zend_file_handle file_handle;

    if(argc != 2) {
        printf("usage: op_dumper <script>\n");
        return 1;
    }
    PHP_EMBED_START_BLOCK(argc,argv);
    printf("Script: %s\n", argv[1]);
    file_handle.filename = argv[1];
    file_handle.free_filename = 0;
    file_handle.type = ZEND_HANDLE_FILENAME;
    file_handle.opened_path = NULL;
    op_array = zend_compile_file(&file_handle, ZEND_INCLUDE TSRMLS_CC);
    if(!op_array) {
        printf("Error parsing script: %s\n", file_handle.filename);
        return 1;
    }
    dump_op_array(op_array);
    PHP_EMBED_END_BLOCK();
    return 0;
}
When you compile this as you did psh earlier in this chapter, you can generate full opcode dumps for scripts.
APD
In Chapter 18, “Profiling,” you learned how to use APD for profiling PHP code. APD is a Zend extension that wraps zend_execute() to provide timings around function calls.
In its MINIT section, APD overrides both zend_execute() and zend_execute_internal() and replaces them with its own apd_execute() and apd_execute_internal(). Here is APD’s initialization function:
PHP_MINIT_FUNCTION(apd)
{
    ZEND_INIT_MODULE_GLOBALS(apd, php_apd_init_globals, php_apd_free_globals);
    old_execute = zend_execute;
    zend_execute = apd_execute;
    zend_execute_internal = apd_execute_internal;
    return SUCCESS;
}
apd_execute() and apd_execute_internal() both record the name, location, and time of the function being called. Then they use the saved execution functions to complete execution. Here is the code for both of these functions:
ZEND_API void apd_execute(zend_op_array *op_array TSRMLS_DC)
{
    char *fname = NULL;

    fname = apd_get_active_function_name(op_array TSRMLS_CC);
    trace_function_entry(fname, ZEND_USER_FUNCTION,
                         zend_get_executed_filename(TSRMLS_C),
                         zend_get_executed_lineno(TSRMLS_C));
    old_execute(op_array TSRMLS_CC);
    trace_function_exit(fname);
    efree(fname);
}

ZEND_API void apd_execute_internal(zend_execute_data *execute_data_ptr,
                                   int return_value_used TSRMLS_DC)
{
    char *fname = NULL;

    fname = apd_get_active_function_name(EG(current_execute_data)->op_array TSRMLS_CC);
    trace_function_entry(fname, ZEND_INTERNAL_FUNCTION,
                         zend_get_executed_filename(TSRMLS_C),
                         zend_get_executed_lineno(TSRMLS_C));
    execute_internal(execute_data_ptr, return_value_used TSRMLS_CC);
    trace_function_exit(fname);
    efree(fname);
}
Both of these functions perform the same core logic. First, they use the helper function apd_get_active_function_name() to identify the name of the executing function. Next, the APD function trace_function_entry() is called. This function calls APD’s logging mechanism to record entry into the function, including the file and line number the function call occurred on.
Next, APD uses PHP’s default execution function to call the passed function. After the function call completes and the execution call returns, APD calls trace_function_exit(). This uses APD’s logging mechanism to record the function call exit. In addition, this method records the elapsed time since the last function call, which is how APD compiles the information necessary for profiling.
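The entry/execute/exit sequence described above can be modeled in a few lines of standalone C. This is only a sketch of the wrapping idea, not APD's code: execute_hook, traced_execute(), and the counters are hypothetical, and clock() (CPU time) is used as the timer purely as an assumption, because the text does not show which time source APD uses.

```c
#include <time.h>

/* Hypothetical stand-in for zend_execute: "runs" some amount of work. */
typedef void (*execute_fn)(int work);

long work_done = 0;

void default_execute(int work)
{
    volatile long sink = 0;
    int i;

    for (i = 0; i < work; i++) {
        sink += i;                   /* burn a little CPU time */
    }
    work_done += work;
}

execute_fn execute_hook = default_execute;
static execute_fn old_execute;

int traced_calls = 0;
double traced_seconds = 0.0;

void traced_execute(int work)
{
    clock_t start = clock();         /* corresponds to recording function entry */

    old_execute(work);               /* let the saved function do the real work */

    /* corresponds to recording function exit and the elapsed time */
    traced_seconds += (double)(clock() - start) / CLOCKS_PER_SEC;
    traced_calls++;
}

void install_tracer(void)
{
    old_execute = execute_hook;
    execute_hook = traced_execute;
}
```

A real profiler would key the timings by function name and call site rather than summing them into one total, but the wrapper structure is the same.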
You now know the heart of the APD extension. As they say, everything else is just the details.
APC
APC follows the same pattern as APD but is a bit more complex. The core functionality in APC is overriding zend_compile_file() with an alternative that can remap, store, and retrieve the resulting zend_op_array in a shared memory cache.
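As a rough illustration of the caching idea only, the following standalone sketch wraps a compile function with a lookup keyed by filename. It is not APC's implementation: the cache here is a small process-local array rather than shared memory, an int stands in for the zend_op_array, and all of the names (compile_hook, cached_compile(), install_cache()) are hypothetical.

```c
#include <string.h>

#define CACHE_SLOTS 8
#define NAME_LEN 64

/* Hypothetical stand-in for zend_compile_file; the int models a compiled result. */
typedef int (*compile_fn)(const char *filename);

int real_compiles = 0;

int default_compile(const char *filename)
{
    real_compiles++;
    return (int)strlen(filename);
}

compile_fn compile_hook = default_compile;
static compile_fn old_compile;

static struct {
    char name[NAME_LEN];
    int result;
    int used;
} cache[CACHE_SLOTS];

int cached_compile(const char *filename)
{
    int i, free_slot = -1, result;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].name, filename) == 0) {
            return cache[i].result;          /* hit: skip compilation */
        }
        if (!cache[i].used && free_slot < 0) {
            free_slot = i;                   /* remember the first empty slot */
        }
    }
    result = old_compile(filename);          /* miss: compile for real */
    if (free_slot >= 0 && strlen(filename) < NAME_LEN) {
        strcpy(cache[free_slot].name, filename);
        cache[free_slot].result = result;
        cache[free_slot].used = 1;
    }
    return result;
}

void install_cache(void)
{
    old_compile = compile_hook;
    compile_hook = cached_compile;
}
```

The sketch ignores everything that makes a real opcode cache hard, such as invalidation when a file changes and copying compiled structures into and out of shared memory.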
Using Zend Extension Callbacks
A Zend extension is similar to a regular extension except that it implements the following defining struct:
struct _zend_extension {
    char *name;
    char *version;
    char *author;
    char *URL;
    char *copyright;
    startup_func_t startup;
    shutdown_func_t shutdown;
    activate_func_t activate;
    deactivate_func_t deactivate;
    message_handler_func_t message_handler;
    op_array_handler_func_t op_array_handler;
    statement_handler_func_t statement_handler;
    fcall_begin_handler_func_t fcall_begin_handler;
    fcall_end_handler_func_t fcall_end_handler;
    op_array_ctor_func_t op_array_ctor;
    op_array_dtor_func_t op_array_dtor;
    int (*api_no_check)(int api_no);
    void *reserved2;
    void *reserved3;
    void *reserved4;
    void *reserved5;
    void *reserved6;
    void *reserved7;
    void *reserved8;
    DL_HANDLE handle;
    int resource_number;
};
The startup, shutdown, activate, and deactivate functions behave identically to the MINIT, MSHUTDOWN, RINIT, and RSHUTDOWN functions. If a handler of a given type is registered at script compile time, the engine inserts extra opcodes at appropriate places and then calls out to the handler when those opcodes are reached during execution.
Of all the Zend Extension callbacks, the one that is by far the most useful is the statement handler. The statement handler callback inserts an additional opcode at the end of every statement in a script in which the callback is called. The primary uses for this sort of callback are per-line profiling, stepping debuggers, and code-coverage utilities. All these applications require information to be collected and acted on in every statement that PHP executes.
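To show why a per-statement callback is enough for a code-coverage tool, here is a small standalone model. It does not use the Zend extension API: the toy run_statements() loop stands in for the executor, and statement_handler_t, register_statement_handler(), and coverage_handler() are hypothetical names with a simplified signature.

```c
#define MAX_LINES 32

/* Simplified, hypothetical handler signature for this model. */
typedef void (*statement_handler_t)(const char *filename, int lineno);

static statement_handler_t stmt_handler = 0;

int line_hits[MAX_LINES];            /* executions recorded per line number */

/* Coverage collector: count each executed statement by its line. */
void coverage_handler(const char *filename, int lineno)
{
    (void)filename;
    if (lineno >= 0 && lineno < MAX_LINES) {
        line_hits[lineno]++;
    }
}

void register_statement_handler(statement_handler_t handler)
{
    stmt_handler = handler;
}

/* Toy executor: "runs" one statement per entry in lines and
   calls the registered handler after each of them. */
void run_statements(const char *filename, const int *lines, int count)
{
    int i;

    for (i = 0; i < count; i++) {
        /* ... the statement itself would execute here ... */
        if (stmt_handler) {
            stmt_handler(filename, lines[i]);
        }
    }
}
```

Lines whose counter is still zero after a run were never executed, which is the information a coverage report needs; a profiler or debugger would record timestamps or pause in the same callback instead.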
The following statement handler prints the filename and line number of every executed statement in a script to stderr:
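void statement_handler(zend_op_array *op_array)
{
    /* TSRMLS_C is used below, so fetch the thread context first
       (this expands to nothing in non-ZTS builds). */
    TSRMLS_FETCH();

    fprintf(stderr, "%s:%u\n", zend_get_executed_filename(TSRMLS_C),
            zend_get_executed_lineno(TSRMLS_C));
}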
To then register it, you wrap it in this framework:
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include "php.h"
#include "php_ini.h"
#include "ext/standard/info.h"
#include "zend.h"
#include "zend_extensions.h"