The world needs a serious PHP to PHP transpiler … period!
So I started this one right here …
Why ???
Because it’s 2016 and we all know readable, testable code beats fast code now. A nice introduction to the reasoning behind this viewpoint can be found here.
Then again, if two pieces of code do exactly the same thing and the only reason one of them is slower is so that it can be tested and understood nicely, why would the end user of the code have to suffer because of it?
The end user surely doesn’t care about running the tests or reading the source code …
So if one were to find a way of transforming testable and easily readable code into faster, logically equivalent code, developers could continue doing what they’re doing … and end users of the code would get a boost in performance for free.
In compiled languages this kind of thing is done by the compiler to some degree. The readability aspect of things can easily be seen by anyone.
Just take your fun little hello world for example:
Put this into test.c:
Your compiler can turn that into assembly code first, giving you this mess:
You can generate the above via, for example:
Your assembler and linker then take this and turn it into something your CPU can understand:
Gives you your good old binary that you can execute.
The contents of which are optimized for machine readability and performance, but you would not want to be in the business of writing them from scratch.
You can get a sense of the complexity in the compiled hello world executable using readelf.
Showing you this information about the file:
Much of which you as the developer of the hello world program did not have to think about and which got optimized without your doing.
In PHP a very similar thing happens. The interpreter does not generate an executable that can be run directly, but it compiles your code into OP-Codes that are interpreted by PHP's virtual machine and thereby eventually end up as machine code instructions like those in the above output.
The important difference between the compiler and the interpreter in this case, though, is that the compiler can take its time and optimize as best as it can from whatever code it gets.
The PHP interpreter cannot do that to the same extent. An easy to understand example would be this code:
vs:
The latter is one of these generally known performance micro optimizations in PHP. It is somewhat easy to understand why the latter is faster than the former:
The former computes the length of the string and then checks if it is longer than 5 chars. The latter simply checks whether the string has a char at index 5 (position 6): if it does, the string must be longer than 5 chars, and if it does not, it cannot be.
Arguably the former is easier to read as it is clear to anyone what is going on, the latter needs some interpretation as to why we’re checking index 5 on a string, but is faster.
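The two variants discussed above might look roughly like this. This is only a minimal sketch: the function names are made up for illustration, and the equivalence holds for string inputs only.

```php
<?php
// Hypothetical helpers; the names are illustrative only.

// Readable variant: compute the full length, then compare.
function isLongerThanFiveReadable(string $str): bool
{
    return strlen($str) > 5;
}

// Micro-optimized variant: only check whether offset 5 (the sixth byte) exists.
function isLongerThanFiveFast(string $str): bool
{
    return isset($str[5]);
}

foreach (['', 'hello', 'hello!', 'hello world'] as $sample) {
    // Both variants agree for every string input.
    var_dump(isLongerThanFiveReadable($sample) === isLongerThanFiveFast($sample));
}
```

Whether the second variant is measurably faster depends on your PHP version, so treat the speed claim as something to benchmark yourself.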
Now you could argue that PHP shouldn’t discriminate here and should simply handle both code snippets the same way, given their logical equivalency. Surely this would make the former run as fast as the latter: the guy implementing this in the interpreter would surely know that the latter runs faster and have PHP create the OP-Code for it instead of for the strlen version.
But even though it’s 2016 now … reality is not a picnic, and establishing logical equivalency obviously costs CPU cycles too. Sure, we do have opcode caches now and all that, but to a certain degree the interpreter still has to decide whether it will be cheaper to optimize code snippet X before compiling it into OP-Code or whether it is eventually just cheaper to simply run the code.
If you think this example through, the interpreter would have to decide along these lines for example:
I compare the output of strlen with 5.
The output of strlen is only used for this comparison and not subject to a nested assignment.
Hence I will turn the strlen call into an isset.
In order for it to do this though, the interpreter would need to constantly keep looking ahead and/or behind when turning tokenized syntax into Op-Codes. Depending on the code in question this could be very expensive compared to simple execution of the code.
In essence this is the reason interpreted languages are generally slower than compiled ones in terms of their CPU cycle use.
Adding a strong transpiler into the mix can, depending on the code in question, do some of the optimizations the interpreter cannot do at runtime and lessen the negative performance impact of interpreter use and writing readable, testable code.
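To make the idea of such an ahead-of-time rewrite a bit more concrete, here is a toy sketch of the strlen-to-isset rule. It is not how the PHP-Transpiler is implemented: `rewriteStrlenComparison` is a hypothetical name, and the regular expression only recognizes the literal shape `strlen($var) > N` with an integer literal, without checking that the variable really holds a string or that the comparison is not part of a larger expression.

```php
<?php
// Toy rewrite rule, NOT the transpiler's real implementation.
// It only recognizes the literal shape `strlen($var) > N`.
function rewriteStrlenComparison(string $source): string
{
    return preg_replace(
        '/\bstrlen\(\s*(\$\w+)\s*\)\s*>\s*(\d+)/',
        'isset($1[$2])',
        $source
    );
}

echo rewriteStrlenComparison('if (strlen($name) > 5) { echo "long"; }'), PHP_EOL;
// prints: if (isset($name[5])) { echo "long"; }
```

A real tool would work on tokens or a syntax tree rather than raw text, precisely because it has to prove that the strlen result is not used anywhere else.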
Additionally, PHP has another angle a transpiler can work on: it’s just a way too trivial language. PHP probably is as easy as it gets in terms of just doing thing X and not having to worry about many details or conventions. Oftentimes, though, following a few rules you don’t have to follow will greatly improve the results you’re getting in terms of performance.
Hence the motivation behind the PHP-Transpiler is to remove testability and readability complications from code going into production, after they have served their purpose in development, as well as to correct design flaws that incur a performance penalty in any case.
Let’s look at two examples …
Example One: Including and Requiring Files
Once your PHP project has grown beyond entirely trivial size, you are likely going to use multiple files to hold your code. Oftentimes though these files will still all be loaded on every run of your script.
Their presence is merely a means of convenience for you while coding. Loading those additional files does come with some performance drawbacks and can be optimized away by the PHP-Transpiler.
An example of this in action would be the following code consisting of two files in the same folder:
parent.php:
child.php:
Let’s run this:
On my machine it’ll come out to 0.274s. Not too bad for 1M runs through that loop, but still, let’s see what the transpiler makes out of this.
I put the files for this in the folder ‘include’ and want my transpilation result to end up in ‘include-out’, so I run:
Now my parent.php looks like this:
Not very readable … but now the whole thing takes 0.018s all of a sudden. A more than 10x speedup in execution! We paid a lot for having that include in this admittedly academic example. Think about your huge projects though, these things do add up in some cases.
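If you want to get a feel for the cost of an include inside a hot loop yourself, a self-contained sketch could look like this. The loop body, the iteration count and the temporary stand-in for child.php are assumptions made for illustration, not the files from the example above, and the timings will differ from machine to machine.

```php
<?php
// Hypothetical stand-in for child.php, written to a temporary file.
$child = tempnam(sys_get_temp_dir(), 'child');
file_put_contents($child, '<?php $sum += 1;');

// Variant 1: pull the file in on every iteration.
$sum = 0;
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    include $child;
}
$withInclude = microtime(true) - $start;
$includeSum = $sum;

// Variant 2: the same statement inlined, as a transpiler might emit it.
$sum = 0;
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $sum += 1;
}
$inlined = microtime(true) - $start;
$inlinedSum = $sum;

unlink($child);

printf("include: %.3fs, inlined: %.3fs\n", $withInclude, $inlined);
var_dump($includeSum === $inlinedSum);
```

Both variants compute the same result, so any difference in the printed timings comes from the include itself.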
Example Two: Failing to Define Object Properties
An example of where sloppy coding can be optimized away automatically would be failing to define object properties that are eventually used at runtime.
This code will run, but your PHP bytecode compiler will not be able to optimize the hashtables it sets up for the sloppy class. Instead it will have to set up a dynamic hashtable for $a and $b.
So at any rate, running this takes 0.508s on my machine.
Let’s run it through the transpiler :)
The transpiler produces:
Not all that nice looking, but let’s run it … 0.415s … just because you couldn’t be bothered to declare $a and $b you wasted that tenth of a second … The transpiler caught the issue though and fixed it.
Running it, it also tells you about your mistake in the crude CLI output currently implemented.
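The difference between undeclared and declared properties described in this example can be sketched as follows. The class names `Sloppy` and `Tidy` are hypothetical and this is not the transpiler's actual input or output. Note that newer PHP versions (8.2 and up) deprecate dynamic properties unless the class opts in, which is what the attribute line is for.

```php
<?php
// Hypothetical classes for illustration; names are not from the original example.

// PHP 8.2+ deprecates dynamic properties unless the class opts in via this
// attribute (older PHP versions simply treat the line as a comment).
#[\AllowDynamicProperties]
class Sloppy
{
    public function __construct()
    {
        // $a and $b are never declared, so they are created dynamically at runtime.
        $this->a = 1;
        $this->b = 2;
    }
}

class Tidy
{
    // Declared up front, so the property layout is known when the class is compiled.
    public $a;
    public $b;

    public function __construct()
    {
        $this->a = 1;
        $this->b = 2;
    }
}

$sloppy = new Sloppy();
$tidy = new Tidy();
var_dump($sloppy->a + $sloppy->b === $tidy->a + $tidy->b);
```

Both classes behave the same from the outside; only the second one gives the engine the information it needs to lay out the properties ahead of time.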
That’s it for Now
Hope this was somewhat interesting for the time being, let’s see how far I can take this project :)