PHP tokens and opcodes : 3 useful extensions for understanding the working of Zend Engine

When a PHP script is executed, it goes through a number of processes before the final result is displayed: lexing, parsing, compiling and executing. In this blog post, I will walk you through all these processes with a sample script. At the end I will list some useful PHP extensions, which can be used to analyze the result of every intermediate process.

Let's take a sample PHP script, saved as debug.php, as an example:

<?php
	function increment($a) {
		return $a+1;
	}
	$a = 3;
	$b = increment($a);
	echo $b;
?>

Try running this script from the command line (note that php -r expects inline code, so the file is passed directly):

~ sabhinav$ php debug.php
4

This PHP script goes through the following processes before outputting the result:

  • Lexing: The PHP code inside debug.php is converted into tokens
  • Parsing: During this stage, tokens are processed to arrive at meaningful expressions
  • Compiling: The derived expressions are compiled into opcodes
  • Execution: Opcodes are executed to arrive at the final result

Let's see how a PHP script passes through all the above steps.

Lexing:
During this stage the human-readable PHP script is converted into tokens. For the first two lines of our PHP script:

<?php
	function increment($a) {

tokens will look like this (try to match the tokens below line by line with the above 2 lines of PHP code and you will get a feel):

~ sabhinav$ php -r 'print_r(token_get_all(file_get_contents("debug.php")));';
Array
(
    [0] => Array
        (
            [0] => 368             // 368 is the token number and its symbolic name is T_OPEN_TAG, see below
            [1] => <?php

            [2] => 1
        )

    [1] => Array
        (
            [0] => 371
            [1] =>
            [2] => 2
        )

    [2] => Array
        (
            [0] => 334
            [1] => function
            [2] => 2
        )

    [3] => Array
        (
            [0] => 371
            [1] =>
            [2] => 2
        )

    [4] => Array
        (
            [0] => 307
            [1] => increment
            [2] => 2
        )

    [5] => (
    [6] => Array
        (
            [0] => 309
            [1] => $a
            [2] => 2
        )

    [7] => )
    [8] => Array
        (
            [0] => 371
            [1] =>
            [2] => 2
        )

    [9] => {
    [10] => Array
        (
            [0] => 371
            [1] =>

            [2] => 2
        )

A list of parser tokens can be found here: http://www.php.net/manual/en/tokens.php

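Every token number has a symbolic name attached to it. Below is our PHP script with each piece of code annotated with the symbolic name of its token. (The one-liner also passes the line number stored in $token[2] to token_name(); a line number is not a token number, which is why UNKNOWN is printed after every token.)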

~ sabhinav$ php -r '$tokens = (token_get_all(file_get_contents("debug.php"))); foreach($tokens as $token) { if(count($token) == 3) { echo token_name($token[0]); echo $token[1]; echo token_name($token[2]);  }  }';
T_OPEN_TAG<?php
UNKNOWNT_WHITESPACE	UNKNOWNT_FUNCTIONfunctionUNKNOWNT_WHITESPACE UNKNOWNT_STRINGincrementUNKNOWNT_VARIABLE$aUNKNOWNT_WHITESPACE UNKNOWNT_WHITESPACE
		UNKNOWNT_RETURNreturnUNKNOWNT_WHITESPACE UNKNOWNT_VARIABLE$aUNKNOWNT_LNUMBER1UNKNOWNT_WHITESPACE
	UNKNOWNT_WHITESPACE
	UNKNOWNT_VARIABLE$aUNKNOWNT_WHITESPACE UNKNOWNT_WHITESPACE UNKNOWNT_LNUMBER3UNKNOWNT_WHITESPACE
	UNKNOWNT_VARIABLE$bUNKNOWNT_WHITESPACE UNKNOWNT_WHITESPACE UNKNOWNT_STRINGincrementUNKNOWNT_VARIABLE$aUNKNOWNT_WHITESPACE
	UNKNOWNT_ECHOechoUNKNOWNT_WHITESPACE UNKNOWNT_VARIABLE$bUNKNOWNT_WHITESPACE
UNKNOWN
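
To get a feel for what a lexer does, here is a minimal sketch in C of a toy tokenizer. It is not the Zend Engine's lexer: the names toy_lex, toy_token and the TOY_T_* kinds are made up for this illustration, and it only recognizes variables, integers, identifiers, whitespace and single characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for illustration; real PHP token numbers differ
 * and change between PHP versions. */
typedef enum {
    TOY_T_VARIABLE,   /* $name */
    TOY_T_LNUMBER,    /* integer literal */
    TOY_T_STRING,     /* bare identifier such as a function name */
    TOY_T_WHITESPACE, /* run of spaces, tabs and newlines */
    TOY_T_CHAR        /* any other single character: ( ) { } = ; + */
} toy_token_kind;

typedef struct {
    toy_token_kind kind;
    const char *start; /* points into the source buffer */
    size_t length;     /* number of characters in the token */
} toy_token;

/* Splits src into at most max tokens and returns how many were produced. */
size_t toy_lex(const char *src, toy_token *out, size_t max)
{
    size_t count = 0;
    const char *p = src;

    assert(src != NULL && out != NULL);

    while (*p != '\0' && count < max) {
        const char *begin = p;
        toy_token_kind kind;

        if (isspace((unsigned char)*p)) {
            kind = TOY_T_WHITESPACE;
            while (isspace((unsigned char)*p)) p++;
        } else if (*p == '$' && (isalpha((unsigned char)p[1]) || p[1] == '_')) {
            kind = TOY_T_VARIABLE;
            p++; /* skip the '$' */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else if (isdigit((unsigned char)*p)) {
            kind = TOY_T_LNUMBER;
            while (isdigit((unsigned char)*p)) p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            kind = TOY_T_STRING;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else {
            kind = TOY_T_CHAR;
            p++;
        }

        out[count].kind = kind;
        out[count].start = begin;
        out[count].length = (size_t)(p - begin);
        count++;
    }
    return count;
}
```

Feeding it the text $a = 3; yields six tokens: a variable, whitespace, the character =, whitespace, a number and the character ;. This mirrors how token_get_all() reports whitespace as tokens of its own and single characters as plain strings.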

Parsing and Compiling:
By generating the tokens in the above step, the Zend Engine is able to recognize every detail in the script: where the spaces are, where the newline characters are, where a user-defined function is, and so on. Over the next two stages, the generated tokens are parsed and then compiled into opcodes. Below are the compiled opcodes for our complete sample script:

~ sabhinav$ php -r '$op_codes = parsekit_compile_file("debug.php", $errors, PARSEKIT_SIMPLE); print_r($op_codes); print_r($errors);';
Array
(
    [0] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
    [1] => ZEND_NOP UNUSED UNUSED UNUSED
    [2] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
    [3] => ZEND_ASSIGN T(0) T(0) 3
    [4] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
    [5] => ZEND_EXT_FCALL_BEGIN UNUSED UNUSED UNUSED
    [6] => ZEND_SEND_VAR UNUSED T(0) 0x1
    [7] => ZEND_DO_FCALL T(1) 'increment' 0x83E710CA
    [8] => ZEND_EXT_FCALL_END UNUSED UNUSED UNUSED
    [9] => ZEND_ASSIGN T(2) T(0) T(1)
    [10] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
    [11] => ZEND_ECHO UNUSED T(0) UNUSED
    [12] => ZEND_RETURN UNUSED 1 UNUSED
    [function_table] => Array
        (
            [increment] => Array
                (
                    [0] => ZEND_EXT_NOP UNUSED UNUSED UNUSED
                    [1] => ZEND_RECV T(0) 1 UNUSED
                    [2] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
                    [3] => ZEND_ADD T(0) T(0) 1
                    [4] => ZEND_RETURN UNUSED T(0) UNUSED
                    [5] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
                    [6] => ZEND_RETURN UNUSED NULL UNUSED
                )

        )

    [class_table] =>
)

As we can see above, the Zend Engine is able to recognize the flow of our PHP script. For instance, [3] => ZEND_ASSIGN T(0) T(0) 3 is the replacement for $a = 3; in our PHP code. Read on to understand what T(0) in the opcodes means.

Executing the opcodes:
The generated opcodes are executed one by one. The vld dump below lists every opcode in the order in which it runs (with vld.execute=0, vld only dumps the opcodes and does not actually run the script):

~ sabhinav$ php -d vld.active=1 -d vld.execute=0 -f debug.php
Branch analysis from position: 0
Return found
filename:       /Users/sabhinav/Workspace/interview/facebook/peaktraffic/debug.php
function name:  (null)
number of ops:  13
compiled vars:  !0 = $a, !1 = $b
line     #  op                           fetch          ext  return  operands
-------------------------------------------------------------------------------
   2     0  EXT_STMT
         1  NOP
   5     2  EXT_STMT
         3  ASSIGN                                                   !0, 3
   6     4  EXT_STMT
         5  EXT_FCALL_BEGIN
         6  SEND_VAR                                                 !0
         7  DO_FCALL                                      1          'increment'
         8  EXT_FCALL_END
         9  ASSIGN                                                   !1, $1
   7    10  EXT_STMT
        11  ECHO                                                     !1
   8    12  RETURN                                                   1

Function increment:
Branch analysis from position: 0
Return found
filename:       /Users/sabhinav/Workspace/interview/facebook/peaktraffic/debug.php
function name:  increment
number of ops:  7
compiled vars:  !0 = $a
line     #  op                           fetch          ext  return  operands
-------------------------------------------------------------------------------
   2     0  EXT_NOP
         1  RECV                                                     1
   3     2  EXT_STMT
         3  ADD                                              ~0      !0, 1
         4  RETURN                                                   ~0
   4     5* EXT_STMT
         6* RETURN                                                   null

End of function increment.

The first table represents the main script, while the second table represents the user-defined function increment. The line compiled vars: !0 = $a tells us that internally, during script execution, !0 stands for $a, and hence we can now relate it to [3] => ZEND_ASSIGN T(0) T(0) 3 from the parsekit output.

The dump also reports the number of operations (number of ops: 13), which can serve as a rough measure when benchmarking and tuning the performance of your PHP script.
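
To make "executed one by one" concrete, here is a minimal sketch in C of a toy opcode interpreter. It is not how the Zend VM is implemented: the toy_* names and the instruction layout are assumptions made for this illustration, and the call to increment() is flattened into a single ADD instead of a real function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified instruction set. */
typedef enum { TOY_ASSIGN, TOY_ADD, TOY_ECHO, TOY_RETURN } toy_opcode;

typedef struct {
    int is_const; /* 1: value is a literal, 0: value is a slot index */
    long value;
} toy_operand;

typedef struct {
    toy_opcode op;
    int result;      /* slot written by ASSIGN and ADD, ignored otherwise */
    toy_operand op1;
    toy_operand op2; /* only used by ADD */
} toy_op;

#define TOY_SLOTS 8
#define TOY_SLOT(n)  { 0, (n) }
#define TOY_CONST(n) { 1, (n) }

/* Reads an operand: either a literal or the current content of a slot. */
static long toy_read(const long *slots, toy_operand o)
{
    if (o.is_const) return o.value;
    assert(o.value >= 0 && o.value < TOY_SLOTS);
    return slots[o.value];
}

/* Runs the ops in order until RETURN; returns the last echoed value. */
long toy_run(const toy_op *ops, size_t count)
{
    long slots[TOY_SLOTS] = {0};
    long echoed = 0;
    size_t i;

    assert(ops != NULL);
    for (i = 0; i < count; i++) {
        const toy_op *o = &ops[i];
        assert(o->result >= 0 && o->result < TOY_SLOTS);
        switch (o->op) {
        case TOY_ASSIGN:
            slots[o->result] = toy_read(slots, o->op1);
            break;
        case TOY_ADD:
            slots[o->result] = toy_read(slots, o->op1) + toy_read(slots, o->op2);
            break;
        case TOY_ECHO:
            echoed = toy_read(slots, o->op1);
            printf("%ld\n", echoed);
            break;
        case TOY_RETURN:
            return echoed;
        }
    }
    return echoed;
}

/* The sample script, with slot 0 as $a, slot 1 as $b, slot 2 as a temporary. */
long toy_demo(void)
{
    const toy_op program[] = {
        { TOY_ASSIGN, 0, TOY_CONST(3), TOY_CONST(0) }, /* $a = 3            */
        { TOY_ADD,    2, TOY_SLOT(0),  TOY_CONST(1) }, /* tmp = $a + 1      */
        { TOY_ASSIGN, 1, TOY_SLOT(2),  TOY_CONST(0) }, /* $b = tmp          */
        { TOY_ECHO,   0, TOY_SLOT(1),  TOY_CONST(0) }, /* echo $b           */
        { TOY_RETURN, 0, TOY_CONST(1), TOY_CONST(0) }  /* end of the script */
    };
    return toy_run(program, sizeof program / sizeof program[0]);
}
```

Calling toy_demo() walks the five instructions in order. The slots play the role of the compiled variables (!0, !1) and temporaries (~0) seen in the vld dump.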

If the APC cache is enabled, it caches the opcodes, thereby avoiding repeated lexing, parsing and compiling every time the same PHP script is called.
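
The idea behind an opcode cache can be sketched in a few lines of C. This is only an illustration under simplifying assumptions, not APC's implementation: toy_compile stands in for the whole lex/parse/compile pipeline, the cache is a small fixed array keyed by file path, and cache invalidation (for example when the file changes) is ignored.

```c
#include <assert.h>
#include <string.h>

#define TOY_CACHE_SIZE 16
#define TOY_PATH_MAX   128

typedef struct {
    char path[TOY_PATH_MAX];
    int compiled_id; /* stand-in for a compiled opcode array */
    int used;        /* 1 once this entry holds a result */
} toy_cache_entry;

static toy_cache_entry toy_cache[TOY_CACHE_SIZE];
static int toy_compile_count = 0;

/* Stand-in for lexing + parsing + compiling; counts how often it runs. */
static int toy_compile(const char *path)
{
    (void)path;
    return ++toy_compile_count;
}

/* Returns the cached result for path, compiling only on a cache miss. */
int toy_get_opcodes(const char *path)
{
    int i, id, free_slot = -1;

    assert(path != NULL && strlen(path) < TOY_PATH_MAX);

    for (i = 0; i < TOY_CACHE_SIZE; i++) {
        if (toy_cache[i].used && strcmp(toy_cache[i].path, path) == 0)
            return toy_cache[i].compiled_id; /* hit: skip compilation */
        if (!toy_cache[i].used && free_slot < 0)
            free_slot = i;
    }

    id = toy_compile(path); /* miss: do the expensive work once */
    if (free_slot >= 0) {
        strcpy(toy_cache[free_slot].path, path);
        toy_cache[free_slot].compiled_id = id;
        toy_cache[free_slot].used = 1;
    }
    return id;
}

/* Number of times the expensive compile step has actually run. */
int toy_compile_calls(void)
{
    return toy_compile_count;
}
```

Requesting the same path twice triggers only one call to toy_compile; the second request is served from the array.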

3 PHP extensions providing interface to Zend Engine:
Below are 3 very useful PHP extensions for geeky PHP developers. (Especially helpful for PHP extension developers.)

  • Tokenizer: The tokenizer functions provide an interface to the PHP tokenizer embedded in the Zend Engine. Using these functions you may write your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level.
  • Parsekit: These parsekit functions allow runtime analysis of opcodes compiled from PHP scripts.
  • Vulcan Logic Disassembler (vld): Provides functionality to dump the internal representation of PHP scripts. See the homepage of the VLD project for download instructions.

Hope this is of some help for PHP geeks out there.
Enjoy!

PHP Extensions – How and Why?

In this short post we will quickly see:

  1. How to write PHP extensions?
  2. Why to write PHP extensions?

However, before you can follow what we are going to discuss, I recommend that you read one of my previous posts, How does PHP echo’s a “Hello World”? – Behind the scene. In that post I briefly discussed the backend architecture of PHP.

Assuming you have read the previous post, let's discuss how to build our first PHP extension:

  1. Every PHP extension is built out of a minimum of 2 files.
  2. A configuration file (config.m4), which tells the build system which files to build and which external libraries are needed.
  3. One or more source files, which contain the actual functionality provided by the extension.

Building a sample extension skeleton
Let's start by building and understanding a sample extension skeleton. Then we will move ahead with building our first PHP extension:

config.m4

PHP_ARG_ENABLE(sample,
        [Whether to enable the "sample" extension],
        [--enable-sample  Enable "sample" extension support])
if test "$PHP_SAMPLE" != "no"; then
        PHP_SUBST(SAMPLE_SHARED_LIBADD)
        dnl 1st argument declares the module
        dnl 2nd argument lists the source files to compile
        dnl $ext_shared is the counterpart of PHP_SUBST()
        PHP_NEW_EXTENSION(sample, sample.c, $ext_shared)
fi

I found a number of articles on the internet which give you the code for your first PHP extension, but none of them go ahead and explain every word in that code. Let's make an attempt at understanding every bit of this strange config file.

  1. This is the minimal config file required for an extension
  2. The first parameter to PHP_ARG_ENABLE() sets up a ./configure option called --enable-sample
  3. The second parameter to PHP_ARG_ENABLE() will be displayed during the ./configure process as it reaches this configuration file
  4. The third parameter will be displayed as an option if the end user issues ./configure --help

For Newbies: Wondering what is this ./configure option? Kindly read PHP: Installation on Unix System for details.

Let's understand the remaining part of the config.m4 file:

  1. To compile an extension we follow 3 steps: (i) phpize (ii) ./configure --enable-sample (iii) make
  2. When we call ./configure --enable-sample in step (ii), a local environment variable $PHP_SAMPLE is set to yes. (PS: If our extension name was Hello, then $PHP_HELLO would have been set to yes)
  3. PHP_SUBST() is a macro similar to autoconf's AC_SUBST() and is necessary to build the extension as a shared module
  4. PHP_NEW_EXTENSION() declares the module and lists the source files that must be compiled as part of the extension. $ext_shared is the counterpart of PHP_SUBST() and is necessary for building an extension as a shared module

(PS: We only have a single source file i.e. sample.c for this extension. If in case we had more than a single source file then last line of config.m4 would have been something like this: PHP_NEW_EXTENSION(sample,sample1.c sample2.c sample3.c,$ext_shared) and so on)

Now let's build our source file skeleton. We will keep certain declarations in a header file, which we will then include in sample.c. This is generally better practice than maintaining a single source file.

php_sample.h

#ifndef PHP_SAMPLE_H
  #define PHP_SAMPLE_H
  #define PHP_SAMPLE_EXTNAME "sample"
  #define PHP_SAMPLE_EXTVER "1.0"

  #ifdef HAVE_CONFIG_H
    #include "config.h"
  #endif

  #include "php.h"
  extern zend_module_entry sample_module_entry;
  #define phpext_sample_ptr &sample_module_entry
#endif

Do not leave this page on seeing this code. It’s all very simple if you have ever written some code in C.
All that this file wants to do is:

  1. The config.h file is included when the extension is compiled using the phpize tool
  2. It also includes php.h from the PHP source tree. Including php.h pulls in many other .h files, making available a lot of PHP APIs which can be used by this extension
  3. The zend_module_entry struct is declared as extern so that it can be picked up by the Zend Engine using the dlopen() and dlsym() functions when the module loads.

sample.c

#include "php_sample.h"

zend_module_entry sample_module_entry = {
  /* Roughly means: if PHP version > 4.2.0 */
  #if ZEND_MODULE_API_NO >= 20010901
    STANDARD_MODULE_HEADER,
  #endif
    PHP_SAMPLE_EXTNAME,        /* PHP extension name */
    NULL,        /* Functions */
    NULL,        /* MINIT */
    NULL,        /* MSHUTDOWN */
    NULL,        /* RINIT */
    NULL,        /* RSHUTDOWN */
    NULL,        /* MINFO */
  /* Roughly means: if PHP version > 4.2.0 */
  #if ZEND_MODULE_API_NO >= 20010901
    PHP_SAMPLE_EXTVER,         /* PHP extension version */
  #endif
    STANDARD_MODULE_PROPERTIES
};
#ifdef COMPILE_DL_SAMPLE
  ZEND_GET_MODULE(sample)      /* Common to all PHP extensions which are built as shared modules */
#endif


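That's it! We have our first PHP extension ready. Compile this module as discussed above, i.e.
(i) phpize
(ii) ./configure --enable-sample
(iii) make
Then check your phpinfo() and see whether an extension called “sample” has loaded successfully. (The built sample.so has to be installed, for example with make install, and enabled with extension=sample.so in php.ini before PHP will load it.)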

Though this extension is not capable of doing anything, the skeleton here is the base for every PHP extension. Let's briefly recap what has happened so far:

RECAP

  1. The config.m4 file is the configuration file for the extension
  2. It declares the extension, lists the files required to build it, and adds a few ./configure --help options too
  3. On the other hand, sample.c and php_sample.h are the main source files.
  4. The header file includes the config.h and php.h header files from the PHP source tree, which in turn provide a number of PHP APIs which can be used
  5. As discussed in the last blog post, every extension has the following hooks: MINIT, RINIT, RSHUTDOWN, MSHUTDOWN. sample.c tells PHP which part of the code corresponds to each of these hooks
  6. For our extension “sample” we have defined NULL as MINIT, RINIT, RSHUTDOWN and MSHUTDOWN and hence this module isn’t capable of doing anything

Building a Hello World Extension
To build an extension which actually does something, we just need to tweak the above skeleton. Here we are trying to build an extension which will provide us with a function called sample_hello_world(), which we can use directly in our PHP code to output Hello World!

The quickest link between userspace and extension code is PHP_FUNCTION(). Start by adding the following code block near the top of the sample.c file, just after
#include "php_sample.h"

PHP_FUNCTION(sample_hello_world) {
  php_printf("Hello World!\n");
}

PHP_FUNCTION() is basically a macro which expands internally into a C function declaration. (I will skip this expansion for now to keep this post as simple as possible)

But simply declaring the function isn’t enough. The ZE needs to know the address of the function as well as how the function name should be exported to the userspace. Place the following block of code immediately after PHP_FUNCTION() block:

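/* zend_function_entry is the current name of this type; older PHP versions also accepted function_entry */
static zend_function_entry php_sample_functions[] = {
    PHP_FE(sample_hello_world, NULL)
    {NULL, NULL, NULL}    /* terminator */
};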

The php_sample_functions vector is a NULL terminated vector that will grow as we continue to add more functionality to sample extension. Every function we export will appear as an item in this vector.

PHP_FE(sample_hello_world,NULL) expands roughly to {"sample_hello_world", zif_sample_hello_world, NULL}, thereby providing a name for the function and the address of the code that implements it.
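
The mechanics of a NULL-terminated function table can be tried out without the PHP headers. The following is a minimal standalone sketch in C: all toy_* names and the TOY_FE macro are made up for this illustration, and the handler signature is simplified to take no arguments (real PHP handlers receive parameters via INTERNAL_FUNCTION_PARAMETERS).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef void (*toy_handler)(void);

/* Simplified stand-in for an entry in an extension's function table. */
typedef struct {
    const char *fname;    /* name visible to scripts */
    toy_handler handler;  /* C function implementing it */
    const void *arg_info; /* unused in this sketch */
} toy_function_entry;

static int toy_hello_calls = 0;

/* Plays the role of the function that PHP_FUNCTION(sample_hello_world) declares. */
static void toy_zif_sample_hello_world(void)
{
    toy_hello_calls++;
    printf("Hello World!\n");
}

/* Mimics the idea of PHP_FE(): stringify the name, prefix the handler. */
#define TOY_FE(name) { #name, toy_zif_##name, NULL }

static const toy_function_entry toy_sample_functions[] = {
    TOY_FE(sample_hello_world),
    { NULL, NULL, NULL } /* terminator: marks the end of the table */
};

/* Walks the table until the terminator; returns the entry or NULL. */
const toy_function_entry *toy_find_function(const toy_function_entry *table,
                                            const char *name)
{
    const toy_function_entry *fe;

    assert(table != NULL && name != NULL);
    for (fe = table; fe->fname != NULL; fe++) {
        if (strcmp(fe->fname, name) == 0)
            return fe;
    }
    return NULL;
}

/* Calls the named function; returns 1 if it exists, 0 otherwise. */
int toy_call(const char *name)
{
    const toy_function_entry *fe = toy_find_function(toy_sample_functions, name);
    if (fe == NULL)
        return 0;
    fe->handler();
    return 1;
}

/* Number of times the hello-world handler has run. */
int toy_hello_call_count(void)
{
    return toy_hello_calls;
}
```

Because the table ends with an all-NULL entry, the lookup needs no separate length field, and adding a function is just one more TOY_FE line above the terminator.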

Finally simply go to the sample_module_entry struct and replace:
NULL /* functions */ with
php_sample_functions /* functions */

Now simply rebuild the extension and then try this on the command line:
$ php -r 'sample_hello_world();'

If everything was done perfectly, you would see “Hello World!” output on the shell.

PS: In this tutorial I have tried to explain every line involved in making a hello world PHP extension. However, I am sure that many questions are still unanswered. Feel free to ask any question, or correct me in case I have made a blunder while penning this down.

How does PHP echo’s a “Hello World”? – Behind the scene

Have you ever wondered how PHP echoes a “Hello World” for you in the browser? I hadn't either, until I read about PHP internals and extensions. I thought maybe a few of you out there would be interested in exploring the other side of PHP, so here we go.

In my last post I briefly discussed “How your browser reaches to my server when you type http://abhinavsingh.com in address bar?”. Read through it if you have missed it. Here I will briefly discuss how PHP churns out the content requested for a web page.

An Overview
Here is what happens step-wise:

  1. We never start any PHP daemon ourselves. When we start Apache, it starts the PHP interpreter along with itself
  2. PHP is linked to Apache using the mod_php5.so module (the general term for this layer is SAPI, i.e. Server API)
  3. PHP as a whole consists of 3 modules (Core PHP, Zend Engine and Extension Layer)
  4. Core PHP is the module which handles requests, file streams, error handling and other such operations
  5. Zend Engine (ZE) is the one which converts human-readable code into machine-understandable tokens and opcodes. It then executes this generated code in a virtual machine.
  6. Extensions are bundles of functions, classes and streams made available to PHP scripts, which can be used to perform certain tasks. For example, we need the mysql extension to connect to a MySQL database using PHP.
  7. While the Zend Engine executes the generated code, the script might require access to a few extensions. ZE then passes control to the extension layer, which transfers control back to ZE after completing its tasks.
  8. Finally the Zend Engine returns the result to PHP Core, which hands it to the SAPI layer, which in turn delivers it to your browser.

A Step Deeper
But wait! This is not over yet. The above was just a high-level flow. Let's dig a step deeper and see what more is happening behind the scenes:

  1. When we start Apache, it also starts the PHP interpreter along with itself
  2. PHP startup happens in 2 steps
  3. The 1st step performs the initial setup of structures and values that persist for the life of the SAPI
  4. The 2nd step is for transient settings that only last for a single page request

Step 1 of PHP Startup
Confused over what steps 1 and 2 are? No worries, next we will discuss them in a bit more detail. Let's first see step 1, which is basically the main stuff.
Remember that step 1 happens even before any page request is made.

  1. As we start Apache, it starts the PHP interpreter
  2. PHP calls the MINIT method of each enabled extension. View your php.ini file to see the modules which are enabled by default
  3. MINIT refers to Module Initialization. Each Module Initialization method initializes and defines a set of functions and classes which will be used by future page requests

A typical MINIT method looks like:

      PHP_MINIT_FUNCTION(extension_name) {

          /* Initialize functions, classes etc */

          return SUCCESS;
      }

Step 2 of PHP Startup

  1. When a page request is made, the SAPI layer gives control to the PHP layer. PHP then sets up an environment to execute the requested PHP page. It also creates a symbol table which will store the various variables used while executing this page.
  2. PHP then calls the RINIT method of each module. RINIT refers to Request Initialization. A classic example of an RINIT implementation is the Sessions module. If enabled in php.ini, the RINIT method of the Sessions module will pre-populate the $_SESSION variable and save it in the symbol table.
  3. The RINIT method can be thought of as an auto_prepend_file directive, which is prepended to every PHP script before execution.

A typical RINIT method looks like:

      PHP_RINIT_FUNCTION(extension_name) {

          /* Initialize session variables, pre-populate variables, redefine global variables etc */

          return SUCCESS;
      }

Step 1 of PHP Shutdown
Just like PHP startup, shutdown also happens in 2 steps.

  1. After the page execution is complete, either by reaching the end of the script or by a call to exit() or die(), PHP starts the cleanup process. It calls the RSHUTDOWN method of every extension. RSHUTDOWN can be thought of as an auto_append_file directive for every PHP script, which is always executed no matter what happens.
  2. The RSHUTDOWN method destroys the symbol table (memory management) by calling unset() on all variables in the symbol table

A typical RSHUTDOWN method looks like:

      PHP_RSHUTDOWN_FUNCTION(extension_name) {

          /* Do memory management, unset all variables used in the last PHP call etc */

          return SUCCESS;
      }

Step 2 of PHP Shutdown
Finally, when all requests have been served and the SAPI is ready to shut down, PHP starts the 2nd step of its shutdown process.

  1. PHP calls the MSHUTDOWN method of every extension, which is basically the last chance for every extension to unregister handlers and free any persistent memory allocated during the MINIT cycle.

A typical MSHUTDOWN method looks like:

      PHP_MSHUTDOWN_FUNCTION(extension_name) {

          /* Free handlers and persistent memory etc */

          return SUCCESS;
      }

And that brings us to the end of what we can call the PHP lifecycle. An important point to note is that Step 1 of Startup and Step 2 of Shutdown happen when no request is being made to the web server.
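
The order of the four hooks can be simulated with a small standalone C program. This is only a sketch of the lifecycle described above, not PHP's actual startup code: the toy_module struct, toy_serve and the sample_* hooks are hypothetical names, and each hook merely records its name in a log.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_hook)(void);

/* Simplified stand-in for a module entry holding the four lifecycle hooks. */
typedef struct {
    const char *name;
    toy_hook minit, mshutdown, rinit, rshutdown;
} toy_module;

static char toy_log[256];

/* Appends an event name to the log, separated by single spaces. */
static void toy_note(const char *event)
{
    assert(strlen(toy_log) + strlen(event) + 2 < sizeof toy_log);
    if (toy_log[0] != '\0')
        strcat(toy_log, " ");
    strcat(toy_log, event);
}

static int sample_minit(void)     { toy_note("MINIT");     return 0; }
static int sample_mshutdown(void) { toy_note("MSHUTDOWN"); return 0; }
static int sample_rinit(void)     { toy_note("RINIT");     return 0; }
static int sample_rshutdown(void) { toy_note("RSHUTDOWN"); return 0; }

/* A NULL hook means the module has nothing to do at that stage. */
static void toy_call_hook(toy_hook hook)
{
    if (hook != NULL)
        hook();
}

/* Simulates server start, a number of page requests, and server stop. */
const char *toy_serve(const toy_module *mods, size_t mod_count, int requests)
{
    size_t i;
    int r;

    assert(mods != NULL);
    toy_log[0] = '\0';

    for (i = 0; i < mod_count; i++)          /* startup step 1 */
        toy_call_hook(mods[i].minit);

    for (r = 0; r < requests; r++) {
        for (i = 0; i < mod_count; i++)      /* startup step 2 */
            toy_call_hook(mods[i].rinit);
        /* the requested script would execute here */
        for (i = 0; i < mod_count; i++)      /* shutdown step 1 */
            toy_call_hook(mods[i].rshutdown);
    }

    for (i = 0; i < mod_count; i++)          /* shutdown step 2 */
        toy_call_hook(mods[i].mshutdown);

    return toy_log;
}

/* Runs the simulation with a single module named "sample". */
const char *toy_demo_lifecycle(int requests)
{
    const toy_module sample = {
        "sample", sample_minit, sample_mshutdown, sample_rinit, sample_rshutdown
    };
    return toy_serve(&sample, 1, requests);
}
```

For two requests, the log shows MINIT once at the start, an RINIT and RSHUTDOWN pair per request, and MSHUTDOWN once at the end. With zero requests, only MINIT and MSHUTDOWN appear.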

I hope this post clears up many doubts and unanswered questions which you might have.

Do leave your comments and feedback.