H.264 is a video compression codec standard. It is ubiquitous – internet video, Blu-ray, phones, security cameras, drones, everything. Everything uses H.264 now.
H.264 is a remarkable piece of technology. It is the result of 30+ years of work with one single goal: To reduce the bandwidth required for transmission of full-motion video.
Technically, it is very interesting. This post will give insight into some of the details at a high level – I hope to not bore you too much with the intricacies. Also note that many of the concepts explained here apply to video compression in general, and not just H.264.


I had a fellow Ph.D. student working on a hardware codec that got rid of one of the “pay to play” portions of the H.264 codec. Google was highly interested in it (back in 2013). Odd that it is so ubiquitous, even though they offer free use to end users… the provider has to pay, as it is patented. From Wiki:
https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC
“H.264 is protected by patents owned by various parties. A license covering most (but not all) patents essential to H.264 is administered by patent pool MPEG LA.[2] Commercial use of patented H.264 technologies requires the payment of royalties to MPEG LA and other patent owners. MPEG LA has allowed the free use of H.264 technologies for streaming internet video that is free to end users, and Cisco Systems pays royalties to MPEG LA on behalf of the users of binaries for its open source H.264 encoder.”
People have to pay for the patented codec because nobody has managed to make a true rival. On2 made some codecs which were essentially H.264 while avoiding the H.264 patents, and the results when compared with x264 (considered the best H.264 encoder) for the same amount of compression time were ridiculous. Of course, Google rigged the tests and used a mode in vpxenc which will take hours to encode a video (cpu-used=0) and made some good-looking numbers, but they were just that: good-looking numbers.
On2 has also aggressively optimised their default encoder for PSNR while completely ignoring SSIM in order to help in the “good-looking numbers” department.
Back in the real world, users will just download encoders and decoders from countries without patent restrictions (hello Handbrake and VLC), for everything else the price of the encoder and/or the decoder is hidden in the product price (be it smartphones or proprietary video editing software or OSes), usage for streaming is free, so nobody will use an inferior codec and wait forever for his videos to be compressed or get an inferior quality/size ratio just because some FOSS advocates think it is so much of a burden to download from a French website or because they like to make a political statement out of everything.
Edited 2016-11-06 21:18 UTC
Downloading the encoder from another country doesn’t mean that you aren’t breaking the law in your own country or that you don’t need to license it. And even in France (or many other countries in Europe) they would need to license it, as many H.264 patents are filed there too.
It is not just about FOSS advocates at all; it is also about small businesses and users. You can see how crazy the patent situation can become with H.265, where even the big software, hardware and streaming companies would rather join Google’s efforts to create a patent-unencumbered codec than deal with the licensing mess of H.265.
Quikee,
Yea, the statement about FOSS guys was a bit ironic. Who is more free: the FOSS guys who want everyone to have & use legally unrestricted codecs, or the crowds who outright ignore the legal restrictions on software because they don’t believe in the restrictions and simply use whatever codec they want? I don’t know, that’s actually a tough call. On the one hand it’s important to be legally free, but on the other hand those who ignore laws might actually have more freedom in practice.
VP9 is patent unencumbered. American software companies like Mozilla use it without paying any royalties to anybody, while for other formats like H.264 they have to pay or arrange for someone else to pay.
So, just think for a moment, since both VP9 and H.265 break bitstream compatibility with H.264, and VP9 is royalty-free while H.265 isn’t, there has to be something in H.265 in order for for-profit companies to pay to use it instead of going for the royalty-free VP9.
In fact, from various samples I’ve seen, the best VP9 encoders can’t even beat x264 when it comes to subjective visual quality (aka blind tests). Sure, if you sacrifice encoding time you will get a nice PSNR value from most VP9 encoders that will make you all happy as a benchmarker, but corporations who have to deliver the best quality using the minimum bandwidth will pay for H.264 (where compatibility with existing H.264 decoding circuits is desired) or pay for H.265 (where compatibility with existing H.264 decoding circuits doesn’t apply, such as 4K video) and won’t choose VP9 over H.265.
Look, I like royalty-free as much as the next guy, because it would reduce the cost of hardware like smartphones by a dollar or two, but I am also aware of a little thing called “generation loss” and “lossy compression”, so I do not want to mindlessly move my video collection from VP8 to VP9 to VP10 (yes, Google is prepping a VP10) until they finally manage to beat the H.26x family of codecs. For me, it’s H.264 for anything 1080p and below, H.265 for 4K (unless the device records 4K in H.264).
Edited 2016-11-07 16:36 UTC
kurkosdr,
There are a lot of corporate power plays going on. Apple and Microsoft are significant patent stakeholders and actually stand to benefit from patent-encumbered technology by using it against others. If you want your content to work for iOS users, Apple doesn’t give you much of a say. It’s often said that Microsoft made a lot more money from Android than it did from Windows Phone because of patents.
That aside, back when it was H.264 versus VP8, my own tests concluded H.264 performed significantly better. It wasn’t even close, but it could have been that VP8 was too immature at that time. I don’t know if a gap still remains now. Does anyone here have experience with H.265 versus VP9?
Edited 2016-11-07 18:03 UTC
Netflix also uses H.265 for their 4K streams, despite the fact they hold no MPEG LA patents and have nothing to gain from using H.265 over VP9; instead they have to pay a license fee. Netflix uses H.265 (for their 4K streams) to deliver video to Android devices too, which have VP9 hardware decoding circuitry, so it is not a hardware decoding issue either. So, there has got to be something drawing Netflix (a for-profit company) to the patent-encumbered, fee-requiring H.265 instead of the royalty-free VP9, right? (Also note that Netflix doesn’t copy video bitstreams from Blu-rays but generates its own bitstreams, so they could have gone with VP9 instead of H.265 if they wanted.) Visit the Wikipedia entry for VP9 for the links where Netflix describes why.
PS: Allow me not to comment on the silly AppleInsider article. No patent number presented, no discussion about patent-encumberness. I don’t comment on fear campaigns and speculation.
Edited 2016-11-07 18:35 UTC
That’s why Netflix is part of AOMedia – because they are happy paying for H.265.
Regarding VP9 – http://www.streamingmedia.com/Articles/Editorial/Featured-Articles/…
“Streaming Media: Where are you distributing VP9-encoded files? Assume compatible browsers, what about Android or compatible OTT?
Ronca: The primary VP9 targets will be mobile/cellular and 4K.”
Edited 2016-11-07 20:00 UTC
kurkosdr,
I’m not judging you.
As far as I understand it, no one else would have dared to use VP8 as long as there was a clear possibility of patent infringement since they themselves might get sued. Google paid some money to make the problem go away quickly rather than have to go through a multi-year process.[1]
I remember the story being very different depending on where I was hearing it from. It was quite jarring. On one end you had those who claimed that Google willfully infringed on the patents for several years and also tried to outmanœuvre an already established standard in favour of its own.[2][3] On the other end you had sites like OSnews that described it as “Google called the MPEG-LA’s bluff, and won”.[4]
[1] http://xiphmont.livejournal.com/59893.html?thread=310261#t310261
[2] http://www.svt.se/nyheter/utrikes/google-brot-mot-upphovs-och-paten… (Swedish)
[3] https://marco.org/2013/03/09/google-webm-infringement
[4] http://www.osnews.com/story/26849/Google_called_the_MPEG-LA_s_bluff…
TasnuArakun,
Yep, you are right. There are lots of mixed signals.
I would have much preferred for google not to negotiate with the MPEG-LA. That way we could have seen the MPEG-LA case play out in court. I just hate that the agreement implicitly puts webm in a state of “encumbrance limbo” for parties who want to integrate webm into technologies that don’t fit under google’s agreements.
Edited 2016-11-07 21:06 UTC
I went through some old bookmarks and found this gem: http://arstechnica.com/tech-policy/2011/03/report-doj-looking-into-…
I wonder what came of it. Was it dropped after Google agreed to pay a licensing fee?
I also recall someone saying that MPEG-LA’s stance was basically that it was impossible to create a modern video codec that didn’t infringe on their patents – which is why they’d set up a patent pool before even looking.
TasnuArakun,
I don’t know, but I find it kind of ironic that the federal government would investigate companies for harm caused by its own system of granting patent monopolies.
I’m not having luck finding details of the VP8/9 agreement other than the generic press release indicating that google would pay for the royalties. I don’t suppose there’s anyone here who has a definitive answer to this but what’s the scope they agreed on? Have any details ever been made public? It’s really unclear to me what projects the VP8 patent sub-license applies to.
VP8 is open source, but if we were to use VP8/9 algorithms in a new codec or a different kind of software, how does that derived work fit into the google / MPEG-LA patent licensing agreement?
Edited 2016-11-07 22:56 UTC
AFAIK, VP8 and VP9 are covered. VP10 wouldn’t be but they shifted to AV1 anyway so it is not that important. They know the patents and with a larger patent portfolio it shouldn’t be a problem to find a solution.
see also:
http://www.webmproject.org/cross-license/vp8/faq/
Quikee,
Thanks for that link! It specifically answers my questions and is more informative than anything else I’ve read. And it’s from an authoritative source as well… Why didn’t I find this earlier? Bah.
http://www.webmproject.org/cross-license/vp8/faq/
And yet everybody is using VP9 and selling hardware and software implementations in the USA, and MPEG LA hasn’t sued anybody and hasn’t asserted a single patent number.
Google paid MPEG LA some money to drop their FUD campaign, to avoid the death of the format from lack of adoption by software and hardware vendors due to fear of some potential lawsuit.
I agree that Google negotiating with the MPEG LA gave MPEG LA’s claims a small amount of cred, but the fact everybody is using VP9 even without an official “cross licensing” speaks a lot. MPEG LA has no patents against VP8 or VP9 and it was all a FUD campaign to kill the format in its infancy. Good thing the DOJ took a look at this whole “we have patents but we won’t tell you which” that the MPEG LA was pulling when VP8 was announced.
Edited 2016-11-08 01:50 UTC
kurkosdr,
Obviously that’s what the agreement was: google will cover the royalties, and MPEG-LA won’t go after VP8/9. That agreement is already in effect. The patent cross licensing agreement being drafted is google’s agreement with us, not the MPEG-LA.
You know, this could be a case of the ends justifying the means. As long as WebM is royalty free for us, then who cares that google’s paying the patent royalties for it? It’s an interesting question.
But that’s true – isn’t it.
Google did the best thing possible – they settled and got a license which allowed them to sub-license the patents to everyone that used the codec. If they had dragged it to court, it would have taken a long time and cost a large amount of money to prove that VP8 doesn’t infringe those patents. The case would also have buried VP8 and VP9 in the meantime because of patent uncertainty.
1. VP10 won’t happen – I’m quite surprised you don’t know about AOMedia and the AV1 codec. That explains a lot.
2. Sure, re-encoding doesn’t make sense unless you have the original available.
3. For a backup x264 at a higher bitrate is probably the best choice. At lower bitrates it quickly falls apart (compared to x265 and VP9).
4. I’m not really interested and don’t look from a standpoint of a home user, but from a professional user or company and streaming companies viewpoint and potentially real-time video (teleconference) companies viewpoint.
Not that the rest of the article isn’t an interesting overview of compression, but the png is only 568kb, not 1015kb. Maybe Apple changed it since the article was written?
Also, the video is literally just 3 still images with a quick slide animation between them – there is very little motion in it to begin with, and it is also much lower resolution than the png. About 80% of it is just playing the same 3 frames over and over again…
Any old codec, even DiVX, would be able to compress it pretty well, maybe not quite as well – but close.
He would be better served calling this article “Video Compression is Magic”, because very little of what he describes is specific to h.264, and most of it predates it by decades…
Well written, and nice, but I would like to read more about quantization. I read several articles about it, and I still don’t fully understand it.
I do know it is the basis for ALL modern codecs, image, audio and video, from JPEG to MPEG to HEVC, VP9, MP3, Ogg Vorbis and more…
But a nice read. And yes, this article is generic about any video, so I agree with another guy saying to rename it to “Video codecs are magic”.
Edited 2016-11-06 03:39 UTC
Fourier transformations. Unfortunately I never learnt how to use them and I’m a bit too busy for it now.
I feel the article fails to properly explain the quantization step. Quantization is not about throwing away high frequencies. It’s where you take your sampled values, in this case the amplitudes of the frequency components, and round them off to fit within your limited range of values. For 8-bit that’s one of 256 possible values. By reducing the number of possible values further you get even more compression. I don’t know how it works in H.264 (is the “frequency domain mask” really a thing?) so I’ll describe how JPEG does it. During the quantization step it uses a quantization matrix. Each frequency component is divided by the corresponding value in the quantization matrix. The divisor is larger for higher frequencies. This will reduce the size of the values and many of them will be rounded to zero. The zeroes can then be efficiently compressed using run-length encoding (like in the heads and tails example). It’s the values in the quantization matrix you control when you slide the quality slider in your image editor.
Wikipedia has a very good description of the steps that are involved in generating a JPEG image. https://en.wikipedia.org/wiki/JPEG#Encoding
General concept:
First, we move from the spatial domain (aka brightness values) to the frequency domain. Then, during lossy compression, the higher-frequency values are not discarded, they are just recorded with lower accuracy. How do you “record with lower accuracy”? I’ll explain:
Method:
The move to the frequency domain happens in blocks (aka, the image is broken into blocks) and NOT for the whole image, as the dangerously misleading article Thom posted suggests. Let’s assume a block size of 8×8. Let’s assume we are processing a single block and we have already moved to the frequency domain. So… JPEG (and H.264) divide (with integer division) the resulting 8×8 block element-by-element by a “quantization matrix”. Here is what a “quantization matrix” looks like, btw[1]. By doing an integer division of the block by that matrix, you drop accuracy. For example, let’s say that the four elements (frequency components) at the bottom right corner of the 8×8 block are 113, 115, 118, 116. Now look at the bottom right corner of our quantization matrix[1]. Those four values will be divided by 56, 69, 69, 83 respectively. So, the result will be 2, 1, 1, 1. Of course, during decoding you have to multiply by the quantization matrix (element-by-element) to “restore” the values. The “restored” values will be 112, 69, 69, 83. Notice how accuracy was lost. This happened because of the integer division.
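To check the arithmetic in the example above, here is a minimal Python sketch. It only reuses the four coefficients and four divisors quoted in the comment; the function names `quantize` and `dequantize` are made up for illustration and are not taken from any codec library.

```python
# Values quoted in the comment above: four frequency components and the
# four matching entries of the quantization matrix.
coefficients = [113, 115, 118, 116]
divisors = [56, 69, 69, 83]


def quantize(values, table):
    """Encoder side: element-by-element integer division.

    Note: `//` floors, which equals truncation only for non-negative
    inputs; this sketch deliberately uses non-negative values.
    """
    return [v // q for v, q in zip(values, table)]


def dequantize(levels, table):
    """Decoder side: element-by-element multiplication."""
    return [level * q for level, q in zip(levels, table)]


levels = quantize(coefficients, divisors)
restored = dequantize(levels, divisors)
errors = [c - r for c, r in zip(coefficients, restored)]

print(levels)    # [2, 1, 1, 1]
print(restored)  # [112, 69, 69, 83]
print(errors)    # [1, 46, 49, 33]
```

The `errors` list is the information that is gone for good: the larger the divisor, the larger the possible error for that frequency component.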
Also, notice how, after quantization, we now have three identical values: 1, 1, 1. Hello run-length encoding! After quantization, we scan the table with the infamous zig-zag pattern[2] to put the elements of the (quantized) table into a series. Notice how our three identical values (1, 1, 1) will be grouped together by that pattern.
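A rough Python sketch of the zig-zag scan followed by run-length encoding. The block is an assumption for illustration: everything is zero except the quantized 2, 1, 1, 1 from the example, placed in the bottom right corner. The run-length coder is a plain (value, count) scheme, simpler than what JPEG actually stores.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            # Even diagonals run from bottom-left to top-right.
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]


# Hypothetical quantized block: all zeros except the bottom right corner.
block = [[0] * 8 for _ in range(8)]
block[6][6], block[6][7] = 2, 1
block[7][6], block[7][7] = 1, 1

scanned = [block[r][c] for r, c in zigzag_order(8)]
runs = run_length_encode(scanned)

print(zigzag_order(8)[-3:])  # [(6, 7), (7, 6), (7, 7)]
print(runs)                  # [(0, 59), (2, 1), (0, 1), (1, 3)]
```

The last three positions of the scan are exactly the three corner cells holding 1, so they end up as a single run, and the 64 values shrink to four pairs.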
(that moment when you realise the horrible article didn’t even mention the zig-zag pattern).
Also, after run-length encoding happens, all compression standards use a special scheme to encode those integers (value and length) where low integers (0,1,2,3) use the minimum number of bits. Variable-length coding, we call it.
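One concrete variable-length scheme is the unsigned Exp-Golomb code, which H.264 uses for many of its header fields. The sketch below is only meant to show the "small numbers get short codes" idea; it is not a claim about how any codec codes its coefficient runs, and the function names are invented for this example.

```python
def exp_golomb_encode(n):
    """Unsigned Exp-Golomb code of a non-negative integer, as a bit string."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    binary = bin(n + 1)[2:]                   # n + 1 written in binary
    return "0" * (len(binary) - 1) + binary   # prefixed by len - 1 zeros


def exp_golomb_decode(bits):
    """Decode a single unsigned Exp-Golomb codeword."""
    zeros = len(bits) - len(bits.lstrip("0"))  # count the leading zeros
    return int(bits[zeros:2 * zeros + 1], 2) - 1


codes = {n: exp_golomb_encode(n) for n in (0, 1, 2, 3, 100)}
for n, code in codes.items():
    print(n, code)
# 0 1
# 1 010
# 2 011
# 3 00100
# 100 0000001100101
```

A zero costs one bit and 100 costs thirteen, so a stream dominated by small values gets much shorter than it would with fixed-width integers.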
The savings quickly add up.
—
It is also worth noting that most pictures don’t have much in the high-frequency components anyways, so most high-frequency components (sometimes even half the block) will become zero after division.
—-
As another guy said, most encoders allow you to choose different quantization tables (which more often than not is a single quantization table which has all of its elements multiplied by a “QP” factor to generate many tables). This allows encoders to have a “quality setting”.
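A small sketch of that "one table times a factor" idea, in Python. The base table, the coefficients and the way `qp` is applied are all assumptions chosen for illustration; real encoders derive their tables from the quality setting in their own ways.

```python
# Hypothetical base table and frequency components, for illustration only.
base_table = [16, 11, 10, 16, 24, 40]
coefficients = [240, 55, -30, 12, 7, 3]


def scale_table(table, qp):
    """Derive a coarser table by multiplying every entry by the factor qp."""
    return [q * qp for q in table]


def quantize(values, table):
    """Divide element-by-element; int() truncates toward zero."""
    return [int(v / q) for v, q in zip(values, table)]


results = {}
for qp in (1, 2, 4):
    results[qp] = quantize(coefficients, scale_table(base_table, qp))
    print(qp, results[qp])
# 1 [15, 5, -3, 0, 0, 0]
# 2 [7, 2, -1, 0, 0, 0]
# 4 [3, 1, 0, 0, 0, 0]
```

A larger factor gives smaller numbers and more zeros, which is where the extra compression (and the extra loss) of a lower quality setting comes from.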
—
The article also doesn’t mention that after “motion estimation” has been done, an “error” is calculated. Since motion estimation is nothing more than a dumb copy-paste of a block (of the same size) from another position in another frame, some error will exist. That error is essentially a set of values that are compressed following the same principles as outlined above (but with a different quantization matrix).
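The copy-paste-plus-error idea can be sketched in a few lines of Python. The tiny 4×4 frames, the block position and the motion vector are invented for this example, and the search for the best motion vector is left out entirely; only the prediction and the error are shown.

```python
def predict_block(reference, top, left, size, motion_vector):
    """Copy a size x size block from the reference frame, displaced by the vector."""
    dy, dx = motion_vector
    rows = reference[top + dy: top + dy + size]
    return [row[left + dx: left + dx + size] for row in rows]


def subtract(block_a, block_b):
    """Element-by-element difference of two equally sized blocks."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(block_a, block_b)]


def add(block_a, block_b):
    """Element-by-element sum of two equally sized blocks."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(block_a, block_b)]


# Hypothetical frames: a bright 2x2 object moves one pixel to the right
# and changes brightness slightly.
reference = [
    [10, 10, 10, 10],
    [10, 50, 60, 10],
    [10, 70, 80, 10],
    [10, 10, 10, 10],
]
current = [
    [10, 10, 10, 10],
    [10, 10, 52, 61],
    [10, 10, 70, 83],
    [10, 10, 10, 10],
]

top, left, size = 1, 2, 2
motion_vector = (0, -1)  # (dy, dx): look one pixel to the left in the reference

current_block = [row[left:left + size] for row in current[top:top + size]]
predicted = predict_block(reference, top, left, size, motion_vector)
error = subtract(current_block, predicted)

print(predicted)  # [[50, 60], [70, 80]]
print(error)      # [[2, 1], [0, 3]]
assert add(predicted, error) == current_block  # decoder side: copy, then add error
```

The encoder then only has to store the motion vector and the small error values, and those error values go through the same transform and quantization steps described earlier.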
—
[1] http://images.slideplayer.com/25/8083681/slides/slide_6.jpg
[2] http://lh3.googleusercontent.com/-T_6oixAuBjs/Vd8ojQIbjLI/AAAAAAAB3…
Edited 2016-11-06 21:24 UTC
Sorry, but I couldn’t read this article, it is so full of bull. I somehow managed to get past the overly exaggerated intro up to the image comparison.
Seriously, a perfect-looking PNG versus a horrible video with hideous compression artifacts and poor image quality (protip: save the PNG as a JPEG with a 60-70 compression ratio; the file size is much smaller and the image quality much better compared with H.264). After this, I just can’t take the author seriously.
I agree.
His analogies are bad, he leaves important things out (intra prediction), there is no high-level encoding diagram or any other diagram, and most of what he describes is actually JPEG (the real magical format that is still relevant today – 25 years after it was released) with a bit of MPEG2 at the end. There is nothing about H.264 or what makes this format magical.
He should’ve described it in a better order: start with simple general-purpose coding, then still image coding and human perception, describing the techniques of JPEG in the order they are applied during encoding, then go to video and describe techniques that are unique to video (with examples from MPEG2), and lastly describe what H.264 does on top.
Now it just feels really sloppy and deceives about H.264.
I feel the same thing: I’m not terribly impressed by this article. It’s all quite basic stuff and it could just as well have been describing MPEG-1. I’m pretty sure H.264 has a lot more interesting things in its toolbox. I’d like to know about all the stuff that’s been added over the years that make H.264 more efficient than MPEG-1.
I also don’t like how it handwaves the more complicated stuff by calling it “mindfuck”. By the way, I didn’t think run-length encoding counted as entropy coding and I think he is mixing talk about the frequency domain and the sampling theorem in a confusing way.
For a better introduction to digital media I urge people to watch Christopher “Monty” Montgomery’s videos: https://www.xiph.org/video/ . I really wish he could do one on DCT and MDCT though. MDCT still seems like some kind of evil dark magic to me.
I agree with all the criticisms of the article, however that aside I’d just like to be on record saying that I do like when these kinds of articles are posted. It’s a refreshing change away from IOS/android/MS/etc that dominate the headlines. So, more articles from left field please
In JPEG, entropy coding is huffman + significant bits of 16 bit integers. Strictly speaking not RLE, but I would put both in the same family of compression techniques, with RLE being the inferior method.
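For anyone wondering what "Huffman + significant bits" means in practice, here is a hedged Python sketch of how I understand baseline JPEG to split a coefficient: a size category (the part that gets Huffman coded) plus that many raw bits. The function name is made up, and the Huffman table itself is not shown.

```python
def jpeg_magnitude_bits(value):
    """Split a coefficient into (category, extra_bits).

    category is the number of bits needed for abs(value); extra_bits are
    the raw bits that follow the Huffman code for that category.
    """
    category = abs(value).bit_length()
    if category == 0:
        return 0, ""  # a zero needs no extra bits
    # Positive values are stored as-is (leading bit 1). Negative values are
    # stored as value + 2**category - 1, which gives them a leading 0 bit.
    raw = value if value > 0 else value + (1 << category) - 1
    return category, format(raw, "0{}b".format(category))


for v in (0, 1, -1, 3, -3, 5, -5):
    print(v, jpeg_magnitude_bits(v))
# 0 (0, '')
# 1 (1, '1')
# -1 (1, '0')
# 3 (2, '11')
# -3 (2, '00')
# 5 (3, '101')
# -5 (3, '010')
```

Small coefficients fall into small categories, and small categories are the common ones, so they get the short Huffman codes. In that sense it belongs to the same family as the other variable-length tricks discussed in this thread.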
I do agree with the general critique of this article not having anything directly to do with H264 though. More like a primer on the basic ideas used in lossy image and video compression. It also leaves out half the important stuff, like predictors and other transforms.
Captain Disillusion had a bit of this in his “Reptilian Bieber-mosh” explanation of how video artifacts give rise to people looking like lizards. His imagery of how the p-frame “wears” the i-frame like a skin is sort of cute:
https://www.youtube.com/watch?v=flBfxNTUIns
https://www.reddit.com/r/programming/comments/5b31gt/h264_is_magic/