“Google’s efforts to improve Internet efficiency through the development of the SPDY (pronounced ‘speedy’) protocol got a major boost today when the chairman of the HTTP Working Group (HTTPbis), Mark Nottingham, called for it to be included in the HTTP 2.0 standard. SPDY is a protocol that’s already used to a certain degree online; formal incorporation into the next-generation standard would improve its chances of being generally adopted.”
I find HTTP interesting, so I went digging into this story, which fails to provide a source. As far as I can tell, it’s this mailing list post: http://lists.w3.org/Archives/Public/ietf-http-wg/2012JanMar/0098.ht…
Note that this is not about incorporating SPDY per se; it’s about working out, within the IETF, how to design the next HTTP to fix some of the same things that SPDY fixes.
The guys who initially developed this stuff had much, much less powerful technology than we do today, and they did just fine with text protocols.
FunkyElf,
“The guys who initially developed this stuff had much, much less powerful technology than we do today, and they did just fine with text protocols.”
I don’t completely agree. Obviously ordinary users wouldn’t care about a text protocol, so it’s just techies. Text was very nice when we lacked tools to decode and analyze packets, but these days anyone who needs to view raw traffic shouldn’t have trouble getting hold of the tools to decode it. The most likely scenario is that the tool which captures the traffic also transparently converts it to a readable format (e.g. Wireshark and tcpdump). So even techies shouldn’t care.
The big question is how much extra bandwidth and CPU overhead text protocols consume. The way HTTP is used today, I think the overhead is negligible compared to the large payloads. However, I can conceive of future scenarios where the HTTP overhead discourages the use of HTTP for managing traffic context.
Today’s HTTP has issues with asynchronous/bidirectional communications, but assuming that gets fixed, there are a lot of potential new applications for it. Consider future applications where the client can open simultaneous bidirectional data channels to the server in a single multiplexed HTTP pipe. Let’s say it’s a video conference/whiteboard/application sharing utility all running over one HTTP browser connection. The overhead of using a text protocol to manage the multiplexing should start to raise eyebrows.
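To make the multiplexing idea concrete, here is a minimal sketch in Python of how several logical channels could share one connection behind a small binary frame header. The frame layout is invented for illustration; it is not SPDY’s actual framing:

import struct

# Hypothetical frame layout: 2-byte channel id plus 4-byte payload length
# (network byte order), followed by the payload itself.
def pack_frame(channel_id, payload):
    return struct.pack("!HI", channel_id, len(payload)) + payload

def unpack_frames(buf):
    # Yield (channel_id, payload) pairs from a byte buffer.
    offset = 0
    while offset + 6 <= len(buf):
        channel_id, length = struct.unpack_from("!HI", buf, offset)
        offset += 6
        yield channel_id, buf[offset:offset + length]
        offset += length

# Video and whiteboard data interleaved on one connection:
stream = pack_frame(1, b"video chunk") + pack_frame(2, b"whiteboard stroke")
for cid, data in unpack_frames(stream):
    print(cid, data)

Demultiplexing each frame costs a fixed 6-byte header read, with no delimiter scanning at all.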
Besides, I don’t know of anyone who’s complained that SSH’s protocol isn’t human readable (or even its Telnet precursor, whose handshake was also binary).
Edit: Did anyone say anything about converting to a binary HTTP standard?
Text also has advantages beyond human readability, though, like the availability of many quality parsers and the fact that text-based protocols abstract away endianness issues in the underlying text transmission protocol.
Neolander,
“Text also has advantages beyond human readability, though, like the availability of many quality parsers”
My opinion is that binary protocols don’t need to be difficult to implement; sometimes they’re even easier to implement than text parsers. It could go either way: SMTP is very difficult due to the complexity of ASN.1 encoding, which is designed to serialize arbitrary hierarchical objects, while others like Modbus are borderline trivial.
One of the problems with pure text is that we have to scan strings of unknown length, which is far less efficient than knowing string lengths up front. It’s the difference between Pascal strings and C strings: with terminated strings, every single byte has to be checked, whereas when the length is specified, data can be copied/compared at least 8 bytes at a time on many architectures.
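As a toy illustration of that difference (invented framing, Python): a length prefix lets the parser jump straight over each field, while a terminator forces a byte-by-byte scan:

import struct

def parse_delimited(buf, delim=0x00):
    # NUL-terminated fields: every byte must be inspected.
    fields, start = [], 0
    for i, byte in enumerate(buf):
        if byte == delim:
            fields.append(buf[start:i])
            start = i + 1
    return fields

def parse_length_prefixed(buf):
    # Length-prefixed fields: skip directly over each field's bytes.
    fields, offset = [], 0
    while offset < len(buf):
        (length,) = struct.unpack_from("!H", buf, offset)  # 2-byte big-endian length
        offset += 2
        fields.append(buf[offset:offset + length])
        offset += length
    return fields

print(parse_delimited(b"GET\x00/index.html\x00"))
print(parse_length_prefixed(b"\x00\x03GET\x00\x0b/index.html"))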
Also, text-based protocols like HTTP have to accept superfluous whitespace and three different newline encodings, all of which makes parsing even less efficient.
All this says nothing of Unicode support which, if required, can make the delimiter scans even more difficult due to erroneous byte matches within multi-byte sequences (UTF-8 is designed to avoid this, but other multi-byte encodings are not).
“and the fact that text-based protocols abstract away endianness issues in the underlying text transmission protocol.”
Yes, endianness is a problem; however, let me point out that it would be equally problematic for text-represented numbers if humans had not standardized on a big-endian numeric representation for themselves. There’s nothing stopping binary protocols from standardizing on a byte order (network byte order) in the same way.
Reading/writing textual numbers requires multiplying/dividing by ten one decimal digit at a time. Binary numbers can be processed in their entirety, with at most a byte-swap opcode. Also, since every binary byte represents a valid number, no ASCII digit tests are required.
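A small sketch of the two approaches in Python (the "!" format character selects network byte order, i.e. big-endian):

import struct

# Text: each digit must be validated and folded in one at a time.
def parse_decimal(data):
    value = 0
    for byte in data:
        if not 0x30 <= byte <= 0x39:  # per-byte ASCII digit test
            raise ValueError("not a digit")
        value = value * 10 + (byte - 0x30)
    return value

# Binary: a fixed-size field in network byte order, decoded in one step
# (on a little-endian CPU this is just a load plus a byte swap).
def parse_u32(data):
    return struct.unpack("!I", data)[0]

print(parse_decimal(b"305419896"))     # 305419896
print(parse_u32(b"\x12\x34\x56\x78"))  # 305419896, i.e. 0x12345678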
All in all, binary protocols are more efficient, but the question goes back to how large the text overhead is compared to the payload itself. Probably not much. Only once HTTP is used for highly asynchronous/multiplexed communication will its overhead start to overtake the payload.
I think there is a (legitimate) fear that binary protocols would be extended in proprietary and undocumented ways that break compatibility, and that would be a damn shame. This is why it’d be crucial for the standard to mandate that clients/servers break the connection on non-standard requests; it would force developers to get their act together with regard to total standard compliance.
Did you really mean SMTP when you talked about ASN.1?
Because I think you are mistaken; I’ve never seen it in SMTP. SMTP is an example of a text protocol.
All I could find about ASN.1 is a bug in the GSSAPI (authentication) of Microsoft Exchange.
Maybe you meant SNMP?
Lennie,
“Did you really mean SMTP when you talked about ASN.1?”
“Maybe you meant SNMP?”
I’m baffled at how I sometimes write the wrong thing. Yes, I meant SNMP: I wrote an SNMP client for a legacy DOS system in Pascal, and I found ASN.1 difficult to use from such a statically structured language.
I haven’t read anything indicating HTTP would become binary, but if it were to, I think a simple binary name/value container would do fairly well without much trouble.
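As a minimal sketch of what such a container might look like (an invented layout, not anything from a spec): a count of entries, then for each entry a length-prefixed name and a length-prefixed value, all in network byte order:

import struct

def encode_headers(headers):
    # 2-byte entry count, then per entry: 2-byte name length + name,
    # 4-byte value length + value.
    out = [struct.pack("!H", len(headers))]
    for name, value in headers.items():
        out.append(struct.pack("!H", len(name)) + name)
        out.append(struct.pack("!I", len(value)) + value)
    return b"".join(out)

def decode_headers(buf):
    (count,) = struct.unpack_from("!H", buf, 0)
    offset, headers = 2, {}
    for _ in range(count):
        (nlen,) = struct.unpack_from("!H", buf, offset); offset += 2
        name = buf[offset:offset + nlen]; offset += nlen
        (vlen,) = struct.unpack_from("!I", buf, offset); offset += 4
        headers[name] = buf[offset:offset + vlen]; offset += vlen
    return headers

msg = encode_headers({b"method": b"GET", b"host": b"www.osnews.com"})
print(decode_headers(msg))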
Have you ever written parsers? It is vastly easier to parse binary protocols than text. Even JSON, with a relatively simple grammar, is much harder to parse than BSON, for example.
Endianness is not really an issue if the protocol simply defines it to be one way or the other. Also, some binary formats are byte-based, like UTF-8, so endianness never comes up.
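For reference, here is a toy parser in Python for just the string elements of a BSON document, following the public spec at bsonspec.org (lengths are little-endian int32s; string lengths include a trailing NUL). A sketch, not a full implementation:

import struct

def parse_bson_strings(buf):
    (total_len,) = struct.unpack_from("<i", buf, 0)  # whole-document length
    offset, result = 4, {}
    while buf[offset] != 0x00:            # 0x00 terminates the element list
        etype = buf[offset]; offset += 1
        end = buf.index(b"\x00", offset)  # element name is a NUL-terminated cstring
        name = buf[offset:end].decode("utf-8"); offset = end + 1
        if etype != 0x02:                 # 0x02 = UTF-8 string element
            raise NotImplementedError("only string elements in this sketch")
        (slen,) = struct.unpack_from("<i", buf, offset); offset += 4
        result[name] = buf[offset:offset + slen - 1].decode("utf-8")
        offset += slen                    # skip value plus its trailing NUL
    return result

# {"hello": "world"} encoded as BSON (22 bytes):
doc = (b"\x16\x00\x00\x00"            # total document length
       b"\x02hello\x00"               # string element named "hello"
       b"\x06\x00\x00\x00world\x00"   # value length 6 (incl. NUL), then value
       b"\x00")                       # end of document
print(parse_bson_strings(doc))        # {'hello': 'world'}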
Zifre,
This is the first time I’ve heard of BSON. I see the MongoDB project came up with it.
http://bsonspec.org/#/specification
I read the spec and, to be honest, I don’t like some of the design choices they made. Some of the datatypes are fixed size, some strings are zero-terminated, others have a length prefix, and yet another has both a length prefix and is zero-terminated (I haven’t the foggiest idea why).
Whereas JSON encapsulates only primary generic datatypes, BSON has started enumerating application-specific datatypes like MD5, RegEx, and UUID.
{"Name": "Alfman", "MD5": (string)"md5bytes"}
{"Name": "Alfman", "MD5": (Generic)"md5bytes"}
{"Name": "Alfman", "MD5": (User)"md5bytes"}
{"Name": "Alfman", "MD5": (MD5)"md5bytes"}
{"Name": "Alfman", "MD5": (RegEx)"md5bytes"}
{"Name": "Alfman", "MD5": (UUID)"md5bytes"}
In my opinion this is fundamentally flawed:
Problem 1: Application-specific datatypes don’t deserve special treatment in a generic transport protocol. Why is SHA not present? What about RSA keys? What about a JPEG datatype?
Problem 2: The *meaning* of a variable is implicitly known by the code which uses it. If that code is *expecting* the MD5 field to contain an MD5 hash, there’s no need for the transport protocol to tell it that the raw bytes are an MD5 datatype. JSON/XML work fine this way; I can’t think of many applications that would benefit from overloading the MD5 field into different datatypes, which is probably always going to be an error.
So in my opinion, BSON is not a good model for a binary HTTP protocol. But I do think a simple binary name/value collection could work nicely.
Edit: it has occurred to me that you didn’t actually propose BSON as suitable for HTTP, but merely used it as an example of a binary container. So maybe my reply is out of context with what you were thinking, but I’ll leave my comments anyway.
Just to make it clear: what I meant was not that it is easy to write a text parser, but that, thanks to the UNIX world, there are plenty of quality general-purpose text-parsing libraries and algorithms around the web, ready to be tuned for specific uses. I am not sure the same can be said of binary parsers, where it seems to me that one is more likely to find a separate parser for each specific protocol/file format.
It is arguable, though, that UTF-8, like ASCII, is more the minimal binary layer that any text-based protocol needs. This encoding is pretty much only good at transmitting text.
Neolander,
“It is arguable, though, that UTF-8, like ASCII, is more the minimal binary layer that any text-based protocol needs. This encoding is pretty much only good at transmitting text.”
Yes, I would argue that UTF-8 is merely a text variant, and not binary by any metric that matters here. You still have to create text structures/delimiters to separate fields, etc.
Neat, I know Mark. He sponsored a talk of mine at WWW 2007 about httperf.