PHP's popularity and simplicity made it easy for the company's developers to quickly build new features. But PHP's (lack of) performance makes scaling Facebook's site to handle hundreds of billions of page views a month problematic, so Facebook has made big investments in making it leaner and faster. The latest product of those efforts is the HipHop VM (HHVM), a PHP virtual machine that significantly boosts the performance of dynamic pages. And Facebook is sharing it with the world as open source.
It's always great when large companies with experience release the tools they use to get stuff done. It's also a bit frightening to actually rely on those kinds of releases for an important website. You know it works for them, but you don't know all of the pitfalls and open issues that could come up if you use it in a manner slightly inconsistent with the sponsor's use.
There are also questions of maintainability and updates.
I usually prefer to wait for another large company to start using the product first, before coming to depend upon it in any critical role.
Also, did they have to throw a web server into HipHop? I know you can use HipHop without the web server, but that just seems like a lot of unrelated code with the potential for really bad exploits.
To sum up my thoughts… Quick, everyone except me: use it and let me know what the typical experience with it is, so I can adopt a revolutionary but stable product! I would do it myself, but I'm just far too lazy.
Isn't that one of the points of open source, to take care of those concerns you have?
Sadly, it doesn't address my lazy, lazy attitude.
For most of the sites I've worked with that use PHP, the bottleneck wasn't PHP but the database back-end. I wonder at what point PHP actually becomes the bottleneck.
Well, the front end PHP servers are usually (always?) stateless, which means they can be scaled trivially by running clusters of mirrored web servers in parallel. The same sort of scalability is not nearly as trivial for databases, and for that reason they tend to be much more problematic.
However, that said, PHP is extremely inefficient. It's worse than Java or .NET by a factor of roughly 100, according to the "average" row in the following benchmark:
http://www.csharp-architect.com/images/benchmarksJan2009Final.gif
So, with some hand-waving, we'd expect a VM version of PHP to significantly reduce the number of PHP servers required to service a given load, and the excess servers no longer needed could be redeployed as less heavily loaded database servers.
This makes a lot of sense for an entity like Facebook and for shared hosting providers that run servers at max capacity.
Just correcting myself: PHP session data isn't stateless, but if load balancing drives users to the same server on each request, it's not an issue. And it's easy to store on NFS otherwise.
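For anyone curious, pointing PHP at a shared session store is roughly a one-line change; something like this (the mount path is made up for illustration):

// Keep session files on an NFS mount that every front-end server sees,
// so the load balancer doesn't need to pin users to one web server.
ini_set('session.save_path', '/mnt/nfs/php-sessions'); // hypothetical shared mount
session_start();
$_SESSION['cart'][] = 'item-42'; // behaves the same whichever server handled the request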
OK, I know this is not Stack Overflow, but is this a problem across all databases? For instance NoSQL, Couchbase, SQL, etc.
Isn't there also a type of caching plugin that makes PHP faster, and methods to curb output buffer flooding?
There are things like memcache and other technologies … but Facebook is built on MySQL, which is alright for running a blog or a small web store … but once it gets serious you need a proper RDBMS.
http://gigaom.com/cloud/facebook-shares-some-secrets-on-making-mysq…
I'm too lazy at the moment to track down all of the reasons why using a "proper RDBMS" is a bad idea for Facebook, but that gives you some hints anyway: licensing costs, speed of support issue resolution, and lack of proof that it could actually handle the load.
proper RDBMS == No True Scotsman. By the time you got something that worked as well as what they use now, it would be almost exactly what they use now, but cost a boatload more and have an "Oracle" sticker on it.
Fair enough on the No True Scotsman reference; they have what they have now and they've done a lot of stuff to fix it.
Mark Zuckerberg even said he wished he had built it in something else to begin with.
But MySQL just sucks; Postgres is just as free and is miles better IMO. Postgres has features such as GIS support built in that MSSQL only got in the 2008 version.
http://www.pgcon.org/2010/schedule/attachments/141_PostgreSQL-and-N… (warning PDF).
MySQL and NoSQL databases aren't really much better than running, say, Postgres with certain options turned off, which IMHO is a decent ORDBMS and is in the same league as Oracle and MS-SQL and whatever IBM pushes (DB2?).
TBH I am not a hardcore database guy, but I know enough tricks to optimize queries when needed. But when I am using MySQL you just get that feeling it is a bit shitty.
MySQL does seem a bit "shitty" if you are coming from a traditional RDBMS background. It has traditionally favored performance over data integrity or durability, which is terrible if you need to rely on those things being correct within the database. Of course, the reason those features exist in Oracle, MSSQL, Postgres and others is that there are some really good reasons for having the database care about them. However, they come at a performance cost.
If you have an environment where the cost of that performance is negligible, then use them.
If you have an environment where the cost of them is astronomical, do not use them.
A good argument can be made that they could do whatever they are doing with MySQL with Postgres as well. And that's possibly true. The reason Postgres isn't used for these kinds of installs is the historically more difficult and non-standard way of setting up simple master-slave replication. Replication is how many companies work around the limitations imposed by not having the integrity and durability in a single database.
If Postgres had had easy replication built in from day 1, I don't think MySQL would be nearly as popular.
But MySQL performance isn’t that good if the queries are complicated.
MSSQL and other databases don't need the memcache stuff because they do it already. Run a 2000-line SPROC in SQL Server with the same parameters and it runs instantly.
But tbh my original point it was only good for simple stuff so I am contradicting myself.
There are quite good “free” as in cost databases that work fine such as MSSQL server express or Postgres that will do just as well for performance in most cases.
I didn’t know about that … but that is good to know.
True. Facebook's queries are mostly like
INSERT INTO statuses (status_message) VALUES ('is really happy today');
Simple. This is also forced upon it because the data is split across so many different servers; a single server can't be asked to do very complex joins.
Well… that's debatable. Memcache is outside of MySQL, which means a separate server is hit entirely for those queries, so you save the db server from having to even acknowledge the request. Plus, expanding the cache is simply adding another server. So it's not uncommon to see sites that have 10 memcached servers for every MySQL server. If you did the caching in the database, you'd either have duplicate data in 11 servers or split it up into 11 different servers and have some smart application-driven load balancing. Having the cache separate also allows for it to be swapped out when a better solution comes along.
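To make that concrete, the cache-aside pattern with the PECL Memcached extension looks roughly like this (the host, key and query are made-up placeholders, and $db is assumed to be a PDO connection):

$mc = new Memcached();
$mc->addServer('cache1.example.com', 11211); // hypothetical cache host

$key = 'user_profile_12345'; // hypothetical key
$profile = $mc->get($key);

if ($mc->getResultCode() === Memcached::RES_NOTFOUND) {
    // Only on a cache miss does the database server ever see the request.
    $profile = $db->query('SELECT * FROM profiles WHERE id = 12345')->fetch();
    $mc->set($key, $profile, 300); // cache for five minutes
}

Adding capacity is then just another addServer() call, which is part of why the memcached tier tends to grow so much faster than the database tier.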
I remember it vaguely from the PGCon presentation. As I said, I am not really a database guy.
With MSSQL, you normally have a SQL node which appears as one server … the sites I have worked on have only needed passive failover; we are talking about a million uniques a month.
Not sure about larger sites tbh.
I can’t find the original article describing the growing pains Myspace had with MSSQL. But, here’s an overview:
http://highscalability.com/myspace-architecture
The key thing to note is in the linked summary: there are different strategies that work best for different kinds of loads.
If I ever have to do another large db design again, I would do things differently with a plan on how to migrate to different solutions at different growth benchmarks.
(Disclaimer disclaimer: I have nothing to disclaim. I have never worked for Facebook or Myspace, but I have had conversations with those that did.)
Part of the problem, and no small part, is that most web developers couldn’t design a relational database if their life depended on it. Normalize? Wazzat? Just use incrementing serials always, everywhere. Relations? No, I’m single and do my database consistency in the code.
Not a particularly fair comment IMHO.
I wonder how many database admins could make a complex page layout that renders correctly in all major desktop browsers without resorting to tables … I don't think it would be many.
At most web agencies you have a small workforce of people who know just enough database stuff to make it work. That is more the fault of their employers than of them, tbh.
At my old place they have only just got someone with DBA skills. Before that they were relying on me, and I know enough to write SPROCS, FUNCTIONS and VIEWS and manage users.
I am sure I could ask you some questions about CSS that you wouldn't know … Writing good, terse CSS is very difficult, as is actually knowing HTML … most devs don't know how to mark up an address properly … let alone which element to use.
Which is exactly my point. Web developers are good at web development and thus should NOT design databases. DBAs are good at databases and should not design web sites. I'm sure there are those who are good at both, but from my experience that's a minority.
Like I mentioned in the other post. Normalization and referential integrity come with performance costs. Sometimes those are too much to bear.
And tbh they're sometimes overkill … when you have some very simple databases.
Most Java applications, in my experience, running in a web environment are horrible, simply horrible, memory hogs. I can think of many examples, commercial and internally developed that fit this description!
Maybe it's not Java's fault per se, but the toolkits, or the methodology behind them, I really don't know (or care), but I cannot hope to recall the number of times Tomcat has balked because it's run out of resources. Servers running this software nearly always need more RAM and more CPU than their PHP/Apache2 counterparts.
Maybe the VM powering PHP really is that much slower, but when considering the complete stack to deliver content to the web, PHP is a much better option if you actually care about reliability and, I think, user-perceived performance.
Just my 2 cents…
Dr Mabuse,
“Most Java applications, in my experience, running in a web environment are horrible, simply horrible, memory hogs. I can think of many examples, commercial and internally developed that fit this description!”
I can sure vouch for this indirectly. I needed to work on some Java code using a special version of the Eclipse IDE, which consumed no less than 500MB of RAM. We thought something was wrong, but the vendor said it was normal and within specs. I needed more RAM installed in my employer-provided computer.
Of course, as you say, it may not indicate a problem with Java per se, but yikes…
Anyways, some people have done the benchmarks for memory too so we can speak a little more intelligently about it:
http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=…
“Maybe the VM powering PHP really is that much slower, but when considering the complete stack to deliver content to the web, PHP is a much better option if you actually care about reliability and, I think, user-perceived performance.”
I think it’s largely developer preference and skill.
Java's forced (checked) exceptions caused a lot of friction with developers who simply wanted exceptions to bubble up until they were caught or the program aborted itself. Without a Java IDE to insert code templates, calling exception-throwing functions was uniquely painful. Java invented a new problem that no other language had. It would have been much better handled as a compiler warning. This was the deciding factor in my personal projects to avoid Java despite wanting to use it for its other qualities.
PHP is a global heap of inconsistent functions with a long history of semantic incompatibility between versions. PHP's designers were clearly not qualified to build the language that would become the standard web platform for the internet. They can be credited for the development of anti-features such as "magic quotes" and "=== I really mean it" equality.
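For anyone who hasn't been bitten by the loose comparisons, this is the kind of thing "===" exists to guard against (behaviour as of PHP 5.x; later versions tightened some of these rules):

var_dump(0 == 'abc');      // bool(true)  - the string is silently coerced to 0
var_dump('1e3' == '1000'); // bool(true)  - numeric-looking strings are compared as numbers
var_dump(0 === 'abc');     // bool(false) - === checks the type as well as the value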
PHP's strong point is its online documentation; they've done an excellent job making things very easy to learn. I think many languages are far behind in the documentation department.
I would like to try other modern languages, but PHP’s ubiquity at hosting providers keeps me coming back – it remains top dog because it’s top dog.
Funny, I found the same site right after I posted above…
With that said, the speed difference doesn’t really seem to manifest itself as an issue with the coding I’ve done for the web. I use both languages at work, so I have a fair grasp of them.
I still think the Java/Tomcat stack is a much more difficult environment to master and administer. It is certainly more challenging. There is a clunkiness to Java sites that is generally unmistakable.
IMHO of course…
(Apologies for going way offtopic)
I currently work on kernel-mode C code with exception handling. If a raised condition is not handled, it hits the unhandled exception handler, which bugchecks the entire machine with KMODE_EXCEPTION_NOT_HANDLED (see http://msdn.microsoft.com/en-us/library/ff557408(v=vs.85).aspx)
Personally, I would _love_ Java’s exception handling model. If a condition is not handled, I want the compile to fail and tell me to fix it, not wait for that condition to happen and crash on some poor unsuspecting customer site. If it’s a warning, I’ll immediately promote it to an error with /WX.
I know many people don’t mind just crashing their program on error. I don’t have that luxury (and if I did, I’d still struggle with tolerating such an ungraceful exit.)
malxau,
“Personally, I would _love_ Java’s exception handling model. If a condition is not handled, I want the compile to fail and tell me to fix it, not wait for that condition to happen and crash on some poor unsuspecting customer site. If it’s a warning, I’ll immediately promote it to an error with /WX.”
That was my point, if it was a warning then developers would have the choice of how to handle it.
The pacemaker devs could have it in their contract that all their classes must compile without warnings. It would give all the safety benefits to them without burdening prototype devs with handling exceptional behaviour before the basic application functionality is even working.
If we want to speak intelligently about memory use based on those benchmarks game programs we have to be really really careful 😉
1) Don’t take the memory use of the “Java 7 averaged” programs – they are being run in a different way to allow averages to be calculated, and we can see the reported memory use is higher than for the ordinary “Java 7 -server” measurements
http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=…
2) Notice that some of the programs are written for multicore and allocate additional buffers to accumulate results from multiple processes. If the Java program uses threads and the PHP program forks processes we might see memory use like this –
Java mandelbrot 68,140KB
PHP mandelbrot 117,152KB
3) Notice that differences in the default memory allocation don’t tell us anything about memory use when the programs need to allocate more than the default.
For example, n-body programs don’t need more than default:
Java n-body 16,784KB
PHP n-body 3,680KB
but these need more than default:
Java reverse-complement 313,080KB
PHP reverse-complement 444,572KB
Java k-nucleotide 458,476KB
PHP k-nucleotide 248,272KB
Java binary-trees 534,924KB
PHP binary-trees 2,364,064KB
igouy,
“If we want to speak intelligently about memory use based on those benchmarks game programs we have to be really really careful ;-)”
Of course, I cited the source to give a general idea.
There's probably a lot you could do to tune Java's garbage collector to use less memory. But unlike the CPU performance, the memory differences were already less than an order of magnitude.
Ideally we'd have benchmarks for the same language implemented both as an interpreter and as a JIT native compiler. This way we'd see how the same algorithms fared under both implementations of the language.
What are you on about? Painful? Yes, typing throws Exception is so painful </sarcasm>, and using a try/catch later, pure torture </sarcasm>.
If you don't want to handle the exception locally you just add throws Exception after the method declaration, and catch the exception in the calling method.
Is this stuff hard to understand? With C# you don't have the throws keyword. However, I actually wish that C# did, since it would make it clear which exceptions a method can throw.
lucas_maximus,
"If you don't want to handle the exception locally you just add throws Exception after the method declaration, and catch the exception in the calling method."
This is exactly the wrong way to handle it IMO. If every function needs to add "throws Exception", then there's absolutely no benefit at all. By casting to the base exception class, we actually lose the information about the specifically typed exceptions which are thrown, and we force calling functions to handle generic exceptions instead of specific ones.
I understand that checked exceptions are a highly opinionated topic, and I have no problem with devs who like them. My point was that forced exception handling in Java is awkward for rapid prototype workflows where it makes far more sense to focus on normal code paths first, and then add exceptional cases at later stages of development as needed.
I think lucas_maximus is saying that you can effectively cripple the feature, and in doing so get much closer to being a rapidly prototyped environment. Java is trying to strongly discourage you from ignoring errors, but you can tell it that you really want to (if you really want to), bubble everything up to main() and terminate your app (which is what would happen if Java didn’t do anything.)
If you do this, at least every function will be documented as propagating exceptions, so you have a permanent record of what to fix later (whereas without this it’s never clear where exceptions are propagating.)
malxau,
“If you do this, at least every function will be documented as propagating exceptions, so you have a permanent record of what to fix later (whereas without this it’s never clear where exceptions are propagating.)”
Well, if exceptions were merely warnings you’d still get that. And I wouldn’t even mind if Java marked class files as “tarnished” if they ignored exceptions, but that’s just not how things panned out.
There are other sticky issues too, like how checked exceptions are at odds with the OOP idea of encapsulating implementation details behind a stable interface. Suppose we've got two different implementations with the same interface, except that they use different underlying libraries which throw different internal exceptions. In all other languages these two implementations would be compatible, but in Java, the calling functions need to be re-written to throw/catch the new exception type.
Now you could wrap up the internal library’s exceptions inside a compatibility exception class every time they’re handled, but that’s considerably more work, and assumes you anticipated the implementation switch down the line.
Some devs are still in favor of it, and that’s ok, but as for me I’m thankful other OOP languages haven’t copied it.
Usually the fault lies with the programmers.
Most of the time when I see bad Java code, it was written by developers who learned Java on the job while putting into production the first thing they managed to compile.
But this is not Java-specific; I see this a lot when we need to rescue projects done by wannabe developers, regardless of the programming language being used.
Same thing happens in C#.
Anything that has an IDisposable interface gets abused … and you end up with hundreds of orphaned connections to files, databases, web servers, etc., when all one has to do is understand the "using" keyword.
I can’t see any figures for PHP in that gif.
I’d also be interested to see PHP compared against CGI, mod_perl and Python.
Laurence,
“I can’t see any figures for PHP in that gif.”
The table up at the top. I don't really understand why they were omitted from the graph. Anyway, I posted another source with an interactive language-by-language comparison.
.NET and Java were fairly similar, Perl was often better than PHP, but interpreted languages across the board were at least an order of magnitude slower than native ones.
Ideally, all languages would have native JIT compilers so that performance would no longer be such a crucial factor between them.
PHP, Perl and Python do all have JIT compilers. The issue is more around whether the compiled binaries are cached or not. This is why I'd have been interested to see a comparison between mod_perl and Perl CGI, as the former will compile once and run many times whereas the latter will JIT each page impression.
Laurence,
Just to be clear. I said “native JIT compiler”, which would be better than interpreted bytecode cached or otherwise.
Presumably a benchmark with a sufficient number of iterations will spend most of its time in the interpreter instead of in the source code parser, but I'd be interested to review stats anyone may have.
For my own small test, I created an empty noinline function and called it 100M times in PHP (PHP 5.3.6-13ubuntu3.2) and in C (with & without optimization). I confirmed that C did not optimise away the loop.
PHP (one loop) = 0.017s
PHP = 12.101s
C (no opt) = 0.384s
C (-O1) = 0.048s
So, the PHP loading/parsing time is negligible here, and the execution time of the PHP loop is 250X that of the optimised C version. I'm not suggesting this is a critical test, since the loops are empty, but it does give an idea of interpreter overhead compared to native – even after parsing time is removed.
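For reference, the PHP side was essentially just this (reconstructed from memory, so treat it as a sketch rather than the exact script):

function noop() {} // empty function, so we only measure call/loop overhead

$start = microtime(true);
for ($i = 0; $i < 100000000; $i++) {
    noop();
}
printf("100M calls: %.3fs\n", microtime(true) - $start);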
Sorry, yes you did
I don’t think anyone is in any doubt that C would out-perform PHP and Perl, it’s more the comparisons between different Perl and PHP engines I’m interested in (mod_perl vs CGI Perl, mod_php vs HipHop, etc).
That all said, it was definitely interesting seeing your figures
Those functions in the benchmark, how often is PHP required to do computationally complex operations? Most of the time it’s rendering HTML and pulling data from or pushing into a database. Very simple stuff. We’re not calculating Pi to the millionth digit or calculating jump coordinates with PHP. I can’t imagine you see any difference in a basic web page with a DB back-end using PHP or compiled C.
And Java has always seemed slower to me in implementation. Such a resource hog, and what’s worse as a sysadmin I have very little idea of what’s going on inside the Java engine. Neither do the developers, either it seems. So when stuff goes wrong, it goes way wrong.
tony,
“Those functions in the benchmark, how often is PHP required to do computationally complex operations?”
Consider things like dynamically generated graphics (like captcha). If PHP is the wrong language for that, what is the right language? Is that language available in your hosting package?
Most PHP pages do very little at a time, like maintaining shopping carts and constructing SQL strings, but in aggregate the inefficiencies do add up, especially when the level of inefficiency is great.
I’m a little surprised that I’m the only soul here who seems to care about language efficiency. Oh well, it’s a sign of the times.
Image generation typically uses a linked module compiled into PHP, written in C or C++, so there’s no slow down there.
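A captcha-style image, for instance, is only a few lines of PHP with the bundled GD extension, and all the actual pixel pushing happens inside the compiled module (the challenge string here is just a placeholder):

$code = 'X7Q4P'; // placeholder challenge text
$im = imagecreatetruecolor(120, 40);
$bg = imagecolorallocate($im, 240, 240, 240);
$fg = imagecolorallocate($im, 30, 30, 30);
imagefilledrectangle($im, 0, 0, 119, 39, $bg);
imagestring($im, 5, 30, 12, $code, $fg); // GD's built-in font #5

header('Content-Type: image/png');
imagepng($im);
imagedestroy($im);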
Again, all the heavy lifting is done in the database. Language efficiency isn't much of an issue in those cases. For instance, you can write a PHP-based web page that creates a self-signed certificate and keys. The actual SSL happens in a binary module, not in PHP itself.
If you move the heavy lifting (sorting, etc.) into the application layer, then Java would be a more appropriate language. Or some functions written as C modules attached to PHP or another language.
tony,
“Image generation typically uses a linked module compiled into PHP, written in C or C++, so there’s no slow down there.”
Correct me if I’m wrong, but this is what I’m hearing:
A) C/C++ should be used for CPU intensive tasks because PHP performs poorly on intensive tasks.
B) There’s no reason to make PHP efficient because CPU intensive tasks can be done in more efficient languages.
The reasoning is too circular for my taste.
The other issue I already alluded to is that not everyone has access to a web server where they can load their own Apache/PHP modules, because they're running in a shared environment where everyone has to make do with the same stock settings.
I think you’re looking a little too hard for some circular reasoning. A far more reasonable (and obvious) takeaway might be:
1: C/C++ is used in PHP where it makes sense to have something fast.
2: There’s no reason to do everything in C/C++, because much of what PHP does is fairly basic. For the CPU-intensive operations, use compiled code, somewhat the same way hardware uses ASICs and GPUs for specific operations (such as crypto).
That’s true, but by the time you get to where that becomes important, you’ve got a hosting plan that includes more than a stock environment (VPS, Amazon, etc.)
No large website uses stock PHP; they at least use a bytecode cache like APC/eAccelerator/XCache and a cluster of memcached servers so that read queries don't hit the database servers. Maybe a NoSQL solution like Redis or whatever.
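Beyond the opcode caching, APC also has a user cache that buys you a fair amount even on a single box; roughly like this (key and query are made up, and $db is assumed to be a PDO handle):

$key = 'frontpage_posts'; // hypothetical cache key
$posts = apc_fetch($key, $hit);

if (!$hit) {
    $posts = $db->query('SELECT * FROM posts ORDER BY id DESC LIMIT 20')->fetchAll();
    apc_store($key, $posts, 60); // keep for 60 seconds, local to this web server
}

The memcached cluster plays the same game, just shared across the whole pool instead of per server.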
Lennie,
"No large website uses stock PHP; they at least use a bytecode cache like APC/eAccelerator/XCache and a cluster of memcached servers so that read queries don't hit the database servers. Maybe a NoSQL solution like Redis or whatever."
You’re probably right, but so what? Even small shared-hosting web sites could benefit from the higher performance.
I’m sorry, I should have quoted you.
I mean that a 100-times impact isn't a very realistic number in real-world usage.
If they were so worried about performance, maybe they should stop using 380k of markup to deliver 8k of plaintext and a dozen or so content images?
You know, by moving all the static JavaScript and static CSS out of the markup? I don't even want to THINK about how long PHP is stuck with its thumb up its arse trying to build that markup, or how long their servers are stuck sitting there trying to compress it.
380k of markup generated by PHP, 500k of scripting, 220k or more CSS… and they’re blaming PHP for their speed issues? Whiskey Tango Foxtrot you morons! Here’s a tip, it’s called semantic markup, separation of presentation from content and leveraging caching models.
Though pissing all over your own site seems to be the order of the day, with people vomiting up HTML 3.2 any old way and slapping a Transitional doctype on it… or the new trend of slapping HTML 5 lip-service on it… completely ignoring all the benefits of STRICT and the inherent advantages that come with the simple mantra:
“The less code you use, the less there is to break”.
Which is why I still say there's NO excuse for the average user's wall to break 50k of markup… which would do a hell of a lot more for their PHP issues than blowing all this time on an optimized VM would.
But of course, that would involve fixing the actual site code — god forbid.
They don’t seem to be opposing goals. Isn’t it reasonable to do both? To advocate for separation of presentation from content, and also to improve the language performance?
It just seems to feed the trend of blaming the tools or throwing more hardware at bad code. Ideally, of course, optimizing both would be the answer; but really it's blaming the tools for their own developers' ineptitude.
As evidenced by the 50k of bandwidth per 1k of actual content. Optimizing PHP should be at the bottom of the list, not the top! Rather than fix their garbage code, they’re blaming the language… and that’s just not right.
I just re-read the article because something bothered me…
OK, so to make up for the ineptitude of their coders they weren't even using the regular PHP engine, and instead were/are using one with the parking brake on…
Oh yes, this bodes well…
Now, I’m no mathematical genius, but 1/2*1.6=0.8
So this new "faster" VM they're hooting and hollering about is 80% the speed of the normal PHP engine you'd get if you just left things alone and took the whip to your developers instead?
Oh yes, that’s made of win.
By the time they optimize it… wow, they'll be lucky to be the same speed as flat normal stock PHP. Way to go, guys. It probably means that if you put a caching accelerator like APC in place, regular PHP spanks their VM by 50-80% or more depending on available RAM, since most of the real bottleneck in PHP is the code-to-bytecode compile before running (which is why bytecode caches like eAccelerator and APC work in the first place!).
But again, the difference between a non-JIT VM and a bytecode interpreter is basically splitting hairs.
Over-reliance on debugging tools instead of forcing your developers to learn to do things properly… I'd NEVER have guessed from the quality of code sent client-side. Lemme guess, the same type of coders who never heard the phrase "the only thing about Dreamweaver that can be considered professional grade tools are the people promoting its use" or "the less code you use, the less there is to break"?
Just more sleazing code out any old way and selling slower products as faster to the suits — business as usual for web development.
deathshadow,
“But again, the difference between a non-JIT VM and a bytecode interpreter is basically splitting hairs.”
I don't really know where you are getting your information, but the performance difference between a native-code VM and an interpreter is huge. Please cite a benchmark showing otherwise.
Of course CPU may not be the bottleneck for everyone, however in my experience shared hosting packages do sometimes become CPU starved and the experience is terrible until they reallocate resources.
Maybe I skimmed over it too quickly, but I didn't read where anyone was trying to accelerate PHP to compensate for bad coding. Isn't making PHP more efficient a good thing whether you think Facebook needs it or not? I don't understand why it's controversial.
You should read it again a bit better, I think 😉
FB developed a PHP compiler a while ago, called HipHop/hphpc (as in, turn PHP code into C++ and then just compile it to native).
The code it produces is -already- faster than the bytecode interpreted by the PHP engine but, to the surprise of many people, not by that much. In part because the usual PHP web page does not do a lot of calculations, of course. Where the HipHop compiler really shines, though, is in reducing the CPU and memory used to render the same web pages – and that is very important to FB. It is a good thing for any website owner, but the costs involved in using the static compiler (loss of dev productivity and deployment ease) do not generally compensate for the gain.
They also developed an alternative PHP interpreter, called hphpi, which was closer to the HipHop compiler in its syntax support, but way too slow for real-life usage. That's why they use it for development only. The interpreter evolved over time, but it is probably still worse than plain PHP (http://en-gb.facebook.com/notes/facebook-engineering/making-hphpi-f… – I think some of those optimizations have been part of PHP for a while), without even taking APC into account…
The "yet another alternative PHP interpreter, called HHVM" they announce might be slower than the PHP engine today, but it still has an edge: JIT compilation to native code. And, by virtue of that, and possibly using profile-guided optimization, they hope for HHVM to produce the fastest-running code in the future.
While I'm no fan of fragmenting OSS projects, I have always thought that alternative PHP implementations were a good idea, if nothing else to give some competition to the PHP VM developers and to prove that the language derives from a real spec and can be reimplemented.
Now, figures for hphpc or HHVM vs PHP+APC would be interesting.
And of course figures from Parrot, if only its PHP support were in a decent state…
I think you are reading it wrong.
As I read it, this is more like making the already-cached bytecode run 2x faster and more efficiently because it was cached as native code instead.
A bunch of people are going around in circles, raving about how Facebook shouldn’t use such an inefficient language or should do this better or that better, etc. Based on the content of what many of you are saying, I infer that (a) you are very talented at developing efficient web solutions, but (b) you have no clue what it’s like to run a code monkey farm.
If you have people writing bad code, then don’t blame the tools, right? But what if you don’t have anyone who can code well? The fact is, anyone at a company like Facebook with TALENT will not stay in the coding farm for very long. They’ll quickly be promoted into positions where they can be more useful to the company, e.g. architecting, optimizing, leading, etc.
In an organization like this, to make things scalable, you find a handful of GOOD programmers (those that got promoted) to develop better tools so as to make the code monkeys more productive.
Why would you do something so insane? Because it’s more efficient and cheaper. Code monkeys are a dime-a-dozen, so you can hire lots of them and benefit from the parallelism. Yes, they’re slower (single-thread latency) than good programmers and make more mistakes, but when you consider dollars/hour, it’s more efficient to just hire more of them.
Data centers do something precisely analogous to this with servers. Rather than a small number of really fast servers for X megawatts, you can install a much larger number of lower-power servers and get higher aggregate throughput. And you only need a fraction of the number of fast servers for specialized functions.
The fact is, it’s easier to write PHP. Yes, it’s messier, and it’s a hell of a lot slower, but code monkeys can write usable code in PHP with a lot less training. (Some of this “easier” is an illusion. You’re probably more likely to make mistakes in PHP than some alternatives, but perceptions are what matter here.) Mark Zuckerberg actually probably started writing Facebook in PHP back when he was in college. When they turned it into a business, the traffic wasn’t heavy enough to warrant a change of infrastructure, so they kept using PHP. By the time they realized that they should have started with something better, it was too late. With millions of lines of code in PHP, it’s even more insane to throw it away when it’s just cheaper to throw more machinery at it. In this case some of the “machinery” is smarter compiler tools, written by a handful of experts, so that the code monkeys continue to scale well.
In theory, if they were to hire a smaller number of really talented programmers to port their code to a more efficient platform (say, loadable Apache modules written in C), they could massively improve the efficiency of their system, improving latency, slashing their ecological footprint, etc. But it would cost an enormous amount of money, because these programmers would cost way more, and the result would be a system that’s MUCH harder to extend, because it’s harder to tweak, requires more compile time for testing, and requires more expensive programmers.
In the end, it’s just best for Facebook to optimize for cost and for the efficiency of the code monkey (i.e. inferior engineers), even if that requires inferior tools.