PHP tokens and opcodes

When a PHP script is executed, it goes through several stages before the final result is displayed: lexing, parsing, compiling and executing. In this blog post I will walk you through each stage using a sample script. At the end I list some useful PHP extensions that can be used to inspect the output of every intermediate stage.
Let's take a sample PHP script, saved as debug.php, as an example:

    <?php
    function increment($a) {
        return $a+1;
    }
    $a = 3;
    $b = increment($a);
    echo $b;
    ?>

Run this script from the command line:

    ~ sabhinav$ php debug.php
    4
This PHP script goes through the following processes before outputting the result:
- Lexing: The PHP code inside debug.php is converted into tokens
- Parsing: The tokens are processed to arrive at meaningful expressions
- Compiling: The derived expressions are compiled into opcodes
- Execution: The opcodes are executed to arrive at the final result
Let's see how a PHP script passes through each of these steps.
Lexing:
During this stage the human-readable PHP script is converted into tokens. For the first two lines of our PHP script:

    <?php
    function increment($a) {

the tokens look like this (try to match the tokens below, one by one, with the two lines of PHP code above and you will get a feel for it):
    ~ sabhinav$ php -r 'print_r(token_get_all(file_get_contents("debug.php")));';
    Array
    (
        [0] => Array
            (
                [0] => 368    // 368 is the token number and its symbolic name is T_OPEN_TAG, see below
                [1] => <?php
                [2] => 1
            )
        [1] => Array
            (
                [0] => 371
                [1] =>
                [2] => 2
            )
        [2] => Array
            (
                [0] => 334
                [1] => function
                [2] => 2
            )
        [3] => Array
            (
                [0] => 371
                [1] =>
                [2] => 2
            )
        [4] => Array
            (
                [0] => 307
                [1] => increment
                [2] => 2
            )
        [5] => (
        [6] => Array
            (
                [0] => 309
                [1] => $a
                [2] => 2
            )
        [7] => )
        [8] => Array
            (
                [0] => 371
                [1] =>
                [2] => 2
            )
        [9] => {
        [10] => Array
            (
                [0] => 371
                [1] =>
                [2] => 2
            )
A list of parser tokens can be found here: http://www.php.net/manual/en/tokens.php
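Every token number has a symbolic name attached to it. The numeric values differ between PHP versions, so use token_name() to look the name up rather than relying on the numbers. Each array token holds the token number at index 0, the matched text at index 1 and the line number at index 2. Below is our PHP script with the symbolic name printed in front of the text of each generated token: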
    ~ sabhinav$ php -r '$tokens = (token_get_all(file_get_contents("debug.php"))); foreach($tokens as $token) { if(count($token) == 3) { echo token_name($token[0]); echo $token[1]; } }';
    T_OPEN_TAG<?php T_WHITESPACE T_FUNCTIONfunctionT_WHITESPACE T_STRINGincrementT_VARIABLE$aT_WHITESPACE T_WHITESPACE T_RETURNreturnT_WHITESPACE T_VARIABLE$aT_LNUMBER1T_WHITESPACE T_WHITESPACE T_VARIABLE$aT_WHITESPACE T_WHITESPACE T_LNUMBER3T_WHITESPACE T_VARIABLE$bT_WHITESPACE T_WHITESPACE T_STRINGincrementT_VARIABLE$aT_WHITESPACE T_ECHOechoT_WHITESPACE T_VARIABLE$bT_WHITESPACE

Note that token_name() must only be given the token number ($token[0]). Passing the line number ($token[2]) as well makes it print UNKNOWN after every token. Single-character tokens such as ( ) { } = + and ; are returned as plain strings rather than arrays, so the count($token) == 3 check skips them and they do not appear in this output.
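For longer scripts, a small helper that prints one token per line is easier to read than a one-liner. The following is a minimal sketch, not part of the original walkthrough: the function name describe_tokens() and the CHAR label for single-character tokens are made up for illustration; only token_get_all() and token_name() come from the tokenizer extension.

```php
<?php
// Sketch: list every token as "NAME line N text", one per row.
// describe_tokens() and the "CHAR" label are hypothetical names for this example.
function describe_tokens($source) {
    $rows = array();
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Array tokens are (token number, matched text, line number).
            list($id, $text, $line) = $token;
            $rows[] = sprintf('%-14s line %d  %s', token_name($id), $line, json_encode($text));
        } else {
            // Single-character tokens such as ( ) { } ; arrive as plain strings.
            $rows[] = sprintf('%-14s %s', 'CHAR', json_encode($token));
        }
    }
    return $rows;
}

// json_encode() is used only to make whitespace and newlines visible.
echo implode("\n", describe_tokens('<?php function increment($a) { return $a+1; }')), "\n";
```

To inspect the sample script, pass file_get_contents("debug.php") instead of the inline string.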
Parsing and Compiling:
By generating the tokens in the above step, the Zend Engine is able to recognize every detail in the script: where the spaces are, where the newline characters are, where a user-defined function is declared, and so on. Over the next two stages, the generated tokens are parsed and then compiled into opcodes. Below are the compiled opcodes for our complete sample script:
    ~ sabhinav$ php -r '$op_codes = parsekit_compile_file("debug.php", $errors, PARSEKIT_SIMPLE); print_r($op_codes); print_r($errors);';
    Array
    (
        [0] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
        [1] => ZEND_NOP UNUSED UNUSED UNUSED
        [2] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
        [3] => ZEND_ASSIGN T(0) T(0) 3
        [4] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
        [5] => ZEND_EXT_FCALL_BEGIN UNUSED UNUSED UNUSED
        [6] => ZEND_SEND_VAR UNUSED T(0) 0x1
        [7] => ZEND_DO_FCALL T(1) 'increment' 0x83E710CA
        [8] => ZEND_EXT_FCALL_END UNUSED UNUSED UNUSED
        [9] => ZEND_ASSIGN T(2) T(0) T(1)
        [10] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
        [11] => ZEND_ECHO UNUSED T(0) UNUSED
        [12] => ZEND_RETURN UNUSED 1 UNUSED
        [function_table] => Array
            (
                [increment] => Array
                    (
                        [0] => ZEND_EXT_NOP UNUSED UNUSED UNUSED
                        [1] => ZEND_RECV T(0) 1 UNUSED
                        [2] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
                        [3] => ZEND_ADD T(0) T(0) 1
                        [4] => ZEND_RETURN UNUSED T(0) UNUSED
                        [5] => ZEND_EXT_STMT UNUSED UNUSED UNUSED
                        [6] => ZEND_RETURN UNUSED NULL UNUSED
                    )
            )
        [class_table] =>
    )
As we can see above, the Zend Engine is able to recognize the flow of our PHP script. For instance, [3] => ZEND_ASSIGN T(0) T(0) 3 is the compiled form of $a = 3; in our PHP code. Read on to understand what the T(0) operands in the opcode mean.
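To make the step from tokens to an instruction like ZEND_ASSIGN more concrete, here is a deliberately simplified toy. It is only an illustration and not how the Zend Engine works internally (the engine uses a generated parser and a real compiler); the function name toy_compile_assignments() and the ('ASSIGN', variable, value) tuples are invented for this sketch. It recognizes just one pattern: a variable, an equals sign, an integer literal and a semicolon.

```php
<?php
// Toy only: turn "$var = <integer>;" token sequences into pseudo-instructions.
// toy_compile_assignments() and the tuple layout are hypothetical, not Zend Engine APIs.
function toy_compile_assignments($source) {
    // Drop whitespace tokens, since they carry no meaning for this pattern.
    $tokens = array();
    foreach (token_get_all($source) as $t) {
        if (is_array($t) && $t[0] === T_WHITESPACE) {
            continue;
        }
        $tokens[] = $t;
    }

    $ops = array();
    $n = count($tokens);
    for ($i = 0; $i + 3 < $n; $i++) {
        $isVar = is_array($tokens[$i]) && $tokens[$i][0] === T_VARIABLE;
        $isNum = is_array($tokens[$i + 2]) && $tokens[$i + 2][0] === T_LNUMBER;
        if ($isVar && $tokens[$i + 1] === '=' && $isNum && $tokens[$i + 3] === ';') {
            $ops[] = array('ASSIGN', $tokens[$i][1], (int) $tokens[$i + 2][1]);
        }
    }
    return $ops;
}

print_r(toy_compile_assignments('<?php $a = 3; $b = 40;'));
```

Anything more complex than this single pattern, such as $b = increment($a);, is ignored by the toy, which is exactly why a real engine needs a full grammar.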
Executing the opcodes:
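The generated opcodes are executed one by one. The vld output below lists every opcode in order, together with its source line and operands. Note that with vld.execute=0, vld only dumps the opcodes; the script itself is not run.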
    ~ sabhinav$ php -d vld.active=1 -d vld.execute=0 -f debug.php
    Branch analysis from position: 0
    Return found
    filename:       /Users/sabhinav/Workspace/interview/facebook/peaktraffic/debug.php
    function name:  (null)
    number of ops:  13
    compiled vars:  !0 = $a, !1 = $b
    line     #  op                 fetch  ext  return  operands
    -------------------------------------------------------------------------------
       2     0  EXT_STMT
             1  NOP
       5     2  EXT_STMT
             3  ASSIGN                                 !0, 3
       6     4  EXT_STMT
             5  EXT_FCALL_BEGIN
             6  SEND_VAR                               !0
             7  DO_FCALL                  1            'increment'
             8  EXT_FCALL_END
             9  ASSIGN                                 !1, $1
       7    10  EXT_STMT
            11  ECHO                                   !1
       8    12  RETURN                                 1

    Function increment:
    Branch analysis from position: 0
    Return found
    filename:       /Users/sabhinav/Workspace/interview/facebook/peaktraffic/debug.php
    function name:  increment
    number of ops:  7
    compiled vars:  !0 = $a
    line     #  op                 fetch  ext  return  operands
    -------------------------------------------------------------------------------
       2     0  EXT_NOP
             1  RECV                                   1
       3     2  EXT_STMT
             3  ADD                            ~0      !0, 1
             4  RETURN                                 ~0
       4     5* EXT_STMT
             6* RETURN                                 null

    End of function increment.
The first table represents the main script, while the second represents the user-defined function increment. The line compiled vars: !0 = $a tells us that, internally, the engine refers to $a as !0 during execution, so we can now relate it to [3] => ZEND_ASSIGN T(0) T(0) 3 from the parsekit output: both describe assigning 3 to $a.

The output also reports the number of operations (number of ops: 13), which can serve as a rough measure when benchmarking and optimizing your PHP script.
If the APC cache is enabled, it caches the opcodes, thereby avoiding repeated lexing, parsing and compiling every time the same PHP script is called.
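On PHP 5.5 and later the bundled OPcache extension fills the role that APC plays here. The following sketch assumes OPcache rather than APC, which is a substitution on my part and not something the walkthrough above used. The helper name opcode_cache_active() is hypothetical; opcache_get_status() is the real OPcache function.

```php
<?php
// Assumption: OPcache (bundled since PHP 5.5) instead of the APC cache named in the text.
// opcode_cache_active() is a hypothetical helper name for this sketch.
function opcode_cache_active() {
    if (!function_exists('opcache_get_status')) {
        return false; // OPcache extension is not loaded at all
    }
    // Passing false skips the per-script details; the call yields false when the cache is off.
    $status = @opcache_get_status(false);
    return is_array($status) && !empty($status['opcache_enabled']);
}

echo opcode_cache_active() ? "opcode cache active\n" : "opcode cache not active\n";
```

On the command line the cache is typically off unless opcache.enable_cli is set, so a "not active" result there does not say much about the web server configuration.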
3 PHP extensions providing an interface to the Zend Engine:
Below are 3 very useful PHP extensions for geeky PHP developers (especially helpful for PHP extension developers):
- Tokenizer: The tokenizer functions provide an interface to the PHP tokenizer embedded in the Zend Engine. Using these functions you may write your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level.
- Parsekit: These parsekit functions allow runtime analysis of opcodes compiled from PHP scripts.
- Vulcan Logic Disassembler (vld): Provides functionality to dump the internal representation of PHP scripts. See the homepage of the VLD project for download instructions.
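Before trying the commands from this post, it helps to know which of these extensions your PHP build actually has. A minimal sketch using the standard extension_loaded() function follows; the helper name available_inspection_extensions() is made up for this example, and the extension names are assumed to be registered as tokenizer, parsekit and vld.

```php
<?php
// Sketch: report which of the three extensions discussed above are loaded.
// available_inspection_extensions() is a hypothetical helper name.
function available_inspection_extensions() {
    $report = array();
    foreach (array('tokenizer', 'parsekit', 'vld') as $name) {
        $report[$name] = extension_loaded($name);
    }
    return $report;
}

foreach (available_inspection_extensions() as $name => $loaded) {
    printf("%-10s %s\n", $name, $loaded ? 'loaded' : 'not loaded');
}
```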
Hope this is of some help for PHP geeks out there.
Enjoy!
You might also want to look at the Bytekit[1] Extension published by Stefan Esser.
[1] http://www.bytekit.org/
Really nice tutorial about Zend , for those who are new to PHP must read this. http://bit.ly/84e39e
awesome article, thanks
Useful. Thanks