“An old adage states that a frog will jump out of boiling water, but can be boiled alive if placed in cold water that is heated at a slow pace. Apparently, the process of making amphibian soup is not entirely unlike the process of cooking up a new web standard. Citing limited adoption of XHTML, Internet innovator and World Wide Web Consortium ringleader Tim Berners-Lee says HTML must be reinvented through a process of incremental change that will build on the existing standard.”
I think it’s sad that they’ve cracked under the pressure to give HTML more life support – let it die! XHTML itself is far from state-of-the-art, and the truly promising XML technologies are a much more difficult leap. A transition must be made eventually, and cobbling more stuff into HTML will only give lazy web designers more excuses to ignore the good stuff the W3C has been standardizing. They really should leave HTML alone and let it stagnate – web designers need to be forced to abandon it by its limitations, and that’ll never happen if they start hacking around them.
OK, then develop a browser that’ll do it. How many web designers are so “lazy” that they have to test several browsers for every change, trying to write decent HTML and sometimes having to make it ugly or break it just so it works right in IE?
Life support for HTML exists whether the W3C will consider it the way to go or not. The more radical a departure they make, the less useful they become. Then we’ll be in another round like the old NS v. IE days, where the W3C’s standard was only a loose base to really make a page with, and beyond that, you did it IE’s way or NS’ way, because there wasn’t a universal alternative. When it comes to truly replacing tables, for instance, there still isn’t a good universal standard…
Sure, it needs to change, but there’s an entrenched market that is not going to budge so easily. If the W3C wants to be an important part of that, they need to follow and clean up, not lead.
It seems like an unwelcome but logical continuation to me. It’s logical because there’s hardly any support for XHTML. IE of course doesn’t support it at all. Konqueror, Safari & co have only partial support, and Mozilla doesn’t support incremental rendering with XHTML.
Many devs pretend to be serving XHTML just by adding a proper doctype and using XML-like syntax. But that’s not enough: if you serve it as text/html it will be parsed as HTML. Only if you serve the data with the application/xhtml+xml MIME type will it be read as XHTML.
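To make that concrete (header values here are purely illustrative), the markup can be byte-for-byte identical; only the response header decides which parser sees it:

Content-Type: text/html
(the document goes through the forgiving tag-soup HTML parser)

Content-Type: application/xhtml+xml
(the document goes through a real XML parser; any well-formedness error is fatal)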
Another great advantage of HTML is that it is simpler and therefore more intuitive to use (no CDATA, no closing of tags that intuitively do not need to be closed).
It’s unwelcome, though, because XHTML is more strict. For example, it forces proper nesting of elements, and it forces you to close all elements.
If we want a page to look the same in every browser, we need a well-defined and strict standard. The current HTML is too loose. It’s open to interpretation, both semantically and syntactically.
For example, how could a browser know with a 100% certainty where a certain paragraph ends, if the p tag is never closed?
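Take a contrived example (markup made up for illustration):

<p>Some intro text
<div>Is this div meant to sit inside the paragraph, or after it?</div>
<p>Another paragraph

The parser has to decide where the first paragraph ends, and its decision may not match what the author had in mind.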
Maybe the best solution would be to continue with HTML, and adopt the strictness from XHTML. On the other hand, XHTML/XML is far more future-proof, so perhaps XHTML is the right path.
“For example, how could a browser know with a 100% certainty where a certain paragraph ends, if the p tag is never closed?”
Of course it can. It’s what the DTD specifies. A block tag is closed when another block tag is opened, and that’s perfectly valid SGML.
The problem is that when authors elide end tags (which is permitted and well-defined by the HTML standard), they usually didn’t *intend* the DTD interpretation.
Yes, syntactically it may be correct, but semantically it is still ambiguous. Who says I don’t want to nest a table into a div, which is in turn nested into the general layout table? That would be impossible to describe without the proper close tags.
Correct, but according to HTML both table and div require a close tag, so no ambiguity at all.
Yes, but my point is that XHTML enforces that each element is closed. If it’s not, the browser will NOT render the page; it will show a parse error instead.
What we really need is for browsers to have a “developer” mode that turns off quirks mode completely and provides some sort of debugging console to display rendering errors. It’s not just laziness that leads developers to write bad HTML, it’s that the bad HTML works just fine. It’s difficult to know what’s broken.
I like to mark up my sites in XHTML Strict, but only because I think it’s the right thing to do, not because it benefits either me or my clients. From a developer perspective, it’s just not clear what advantage, if any, we get by using the less forgiving standard.
You can put Firefox into your “developer mode” (strict XML parsing) by sending the appropriate Content-Type HTTP header.
Content-Type: application/xhtml+xml
http://keystonewebsites.com/articles/mime_type.php
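In practice (a rough sketch; the exact header values vary by browser) you would usually negotiate on the Accept header, since not every browser can handle the XHTML MIME type:

GET /page HTTP/1.1
Accept: application/xhtml+xml,text/html;q=0.9

HTTP/1.1 200 OK
Content-Type: application/xhtml+xml; charset=utf-8

Browsers that don’t advertise application/xhtml+xml (IE, notably) would be sent the same document as text/html instead.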
As long as browsers keep supporting html, people will keep writing html, no matter what W3C squawks about.
WTF? Why was this modded down?
I don’t get OSNews: honest opinions are modded down, and then they feature an article where the creator of Slackware is called an asshole.
Whatever can be done to remove the inconsistencies of IE compared to other browsers is welcome!
I say make something with a well-defined set of standards, then kill HTML. Hopefully people will follow the standards.
Surprisingly little talk by Tim Berners-Lee for an article titled “Berners-Lee Talks About W3C Reform and Reinventing HTML”. Just a few quotes sprinkled in and discussed by the author. Nevertheless, a good write-up, and anything that can attract implementors to more closely follow specs is welcome.
Aron
UPDATE: Whooops. Missed the link to Tim’s blog where his actual talking is. *reading*
HTML and any other static presentation format is doomed right from the start because there is no way to predict all possible needs for interactivity and presentation; custom solutions will inevitably become more popular than the standards and create a bottleneck in the development and adoption of those standards. The case of AJAX is a good example of this.
What is needed is a programming language that is concise, allows the expression of trees as literals, can interface with the host system, and more importantly can manipulate itself…and what a coincidence, this language exists! it is called LISP!
LISP has S-expressions, which are a very nice and more compact substitute for XML, and it allows code to be treated as data and vice versa… I am not actually proposing an existing implementation of LISP to take over HTML, but these core ideas of LISP are very important. The solution would be to take these ideas, add some modern ones, and voila: a programmable, extensible, distributed visual interface that replaces HTML and solves all the issues with ‘standards’…
> The solution would be to take these ideas, add some modern ones, and
> voila: a programmable extensible distributed visual interface that
> replaces HTML and solves all the issues with ‘standards’…
Wow. That’s really naive.
HTML took off because everyone and his neighbour was able to write an HTML document by hand, with nothing but a text editor. With LISP, even many programmers are unable to use it for several reasons. Demanding a “language that can manipulate itself” really shows that you don’t understand this.
HTML standards were constantly diluted by adding new features, backwards-compatibility, and attempted market domination by Microsoft. The LISP world had fallen into little segments (clinging to different dialects of the language) even without any embrace & extend strategy from MS. I predict that no LISP-based web standard will last any longer than HTML did.
Just what I was thinking, Morin!
There are two kinds of technologies:
The first are perfect, flawless, able to modify themselves and maybe even develop some sort of intelligence. But they are so mind-bogglingly complex that only a handful of people can understand and use them.
The second kind have some minor flaws and cannot do everything. On the other hand, they can be used by mere mortals and are mostly guaranteed to finish running in finite time.
Please, stop telling everyone and his grandma they should be doing everything in Lisp, just because it’s 1337 and 110% future proof.
It’s like amphibious vehicles:
There are times when you need them but usually it will suffice to just build a bridge and keep driving ordinary cars.
Regarding loose standards:
When I’m running and shout at somebody “Out of the way!”, I sure wouldn’t want him to reply “There was no verb in your sentence, I do not understand”.
I’d rather have a 99% correct solution in one minute than a 100% correct solution after a day of pain!
“Worse is better” and “Pareto principle” should come to mind…
“Wow. That’s really naive.”
Not really. Please read on.
“HTML took off because everyone and his neighbour was able to write an HTML document by hand, with nothing but a text editor.”
The tool that is used to write a language is irrelevant to the language content. LISP can happily be written in a text editor.
For example:
(page title='my page' (
  (label 'hello world')
  (button text='ok' click=submit)
))
“With LISP, even many programmers are unable to use it for several reasons.”
…which you don’t mention…which makes me suspect your argument is totally bogus.
“Demanding a ‘language that can manipulate itself’ really shows that you don’t understand this.”
Give this a little thought and you will see why.
“I predict that no LISP-based web standard will last any longer than HTML did.”
You don’t get it, do you?
With a LISP-like environment, there need not be a web standard!!! It would be like Swing: a default UI implementation will exist, but programmers will be free to make their own if they really need to!
The thing which ensures the longevity of HTML is its simplicity. Designers get it. Programmers get it. You don’t need a CS degree to create and use it. As a result, there’s a wide degree of variation in the correctness and style used in creating HTML pages. Browsers have learned to deal with this variation by tolerating incorrectness. We can certainly criticize this lack of formality but, ironically, it is the very thing that allowed the Web to grow so quickly.
One thing is certain: If you want to promote a new standard, you can’t just do it in a vacuum. Academics like Berners-Lee seem to forget this. There are plenty of pseudo-standards in this world that nobody really follows. You need to establish widespread tool and browser support. IE, FireFox, Opera, and other popular browsers need to support XHTML. FrontPage, DreamWeaver, Fusion, PageMill, HomeSite, GoLive, etc all need to support XHTML if you want it to have any chance for survival.
“HTML standards were constantly diluted by adding new features, backwards-compatibility, and attempted market domination by Microsoft.”
That was Netscape (until they imploded).
Netscape kept adding proprietary extensions attempting to create a browser monopoly. (After stealing Mosaic from the University of Illinois)
Ssh. Don’t sully the ears of Netscape/Mozilla-loving mental midgets with the truth.
I was just thinking a similar thing. Not conflicting, though; LISP may very well be the best thing for the browser.
What I was thinking is that there isn’t any reason to push for semantics support in the browser. All the browser needs to do is render the information for the user. Thus the browser doesn’t need to know anything about HTML as long as the server providing it can translate it into something for the browser, or teach the browser how to do it.
One way to teach browsers about strange document formats today is through a <format>-to-HTML XSLT. But there is no reason why it couldn’t be the other way around: an XHTML-to-<new browser standard> XSLT.
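Something along these lines, for instance (a rough sketch; the <article> and <headline> element names are invented for the example):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
  <!-- teach the renderer what an <article> is -->
  <xsl:template match="article">
    <div class="article"><xsl:apply-templates/></div>
  </xsl:template>
  <!-- and what a <headline> is -->
  <xsl:template match="headline">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
</xsl:stylesheet>

Point the source document at it with an <?xml-stylesheet type="text/xsl" href="render.xsl"?> processing instruction (the filename is just an example) and a browser with client-side XSLT support does the translation itself.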
And why stop there, the only thing you need is a way to teach the browser how to render your format.
Semantic formats can be independent of browser implementations, just as RSS found a way onto the web without browser support.
Here’s a link collecting some of the blog posts:
http://www.quirksmode.org/elsewhere/archives/standardsw3c/index.htm…
1. Techies are interested in moving “beyond” HTML because it’s difficult to extract the content (let alone the meaning of that content) from all of the surrounding presentation.
2. There’s A LOT of content out there that A LOT of people have worked VERY HARD to create, and it’s almost all in HTML (or in proprietary database schemas). Throwing it out would be a huge waste.
So, in order to build a content-centric web that moves beyond HTML, the new web will need to be backwards compatible and by extension will need to be able to extract content from existing HTML. If and when someone comes up with a technology that allows us to quickly and accurately extract meaningful content from HTML, that technology will obviate the need to get rid of HTML.
I predict (oh shocker) that HTML, much like an Endogenous Retrovirus ( http://en.wikipedia.org/wiki/Endogenous_retrovirus ), will be a permanent fixture on the web. Mash-ups are already showing the way with technologies like web services and screen scrapers that allow us to start working creatively with content without having to throw out HTML altogether.
Lastly, because the revenue model of many web businesses is based on advertising sales driven by content, the real challenges to the semantic web will not be technical but legal and economic. Who owns the content, how are others allowed to use the content, how do I make money off my content when other applications use it without showing my ads?
“Lastly, because the revenue model of many web businesses is based on advertising sales driven by content, the real challenges to the semantic web will not be technical but legal and economic. Who owns the content, how are others allowed to use the content, how do I make money off my content when other applications use it without showing my ads?”
These are important points. Because HTML mixes both content and presentation, it’s traditionally been difficult to extract the content without using hacky scraping algorithms. In a lot of ways, this has worked in favor of content owners who, while providing public access to their content via their own portals, don’t really want their IP (i.e. the content) repurposed by other people, since unauthorized republishing dilutes the value proposition for advertisers. If you move to a world where content is already refactored away from presentation, then it becomes that much easier for content owners to lose their grip on their IP. I don’t foresee content owners being overly enthusiastic about such changes, especially given the current weight of support for HTML.
“(After stealing Mosaic from the University of Illinois)”
please…chill on the shit flinging!
The W3C is basically in the business of writing standards that are simply not implementable.
They apparently don’t know this because they don’t even try to implement them themselves.
Look at SVG – Not a single fully compliant SVG renderer, anywhere.
Look at XHTML – Not a single fully compliant XHTML renderer anywhere.
CSS2 – Not a single fully compliant CSS2 renderer anywhere.
Really, is there a single W3C standard that anybody has a complete implementation of, and is there anybody taking a look at just why that is?
Why don’t they just write simple standards that are actually usable, and prove this to be the case with a free reference implementation others can learn from and/or build on, rather than overengineered and incredibly complex standards that only the world’s largest software companies have a hope (but no motivation to do so) of implementing?
I mean, isn’t this a complete no-brainer?
But what meaningless trivia would the Lunix/Mozilla crowd have to brag about then?
Dude, go away. Please. Pointless Linux bashing is getting on, I’m sure, EVERYONE’S nerves.
This is exactly what they are starting to do, and what the article is all about. The W3C is transforming from a pie-in-the-sky, Microsoft-hating organization into something more useful.
I’ll be taking a college Web design class next semester. Based on conversations with students who’ve taken the class, we’ll be taught HTML 4 Transitional (I learned it on my own in 7th grade), no CSS, along with some JavaScript (which left one student thinking she was learning Java) and a little ASP.
Whatever the W3C does doesn’t matter as long as old standards are taught, proprietary standards are used, and the new standards are not broadly supported.
That’s true of any language or standard, especially if it’s difficult to use to create the desired effect.
I always code to the spec most appropriate to the task. Frames don’t work well in XHTML so I use HTML quite a bit. Some HTML features have been dropped in XHTML and are difficult to work around so I may use the transitional doctype rather than strict. Regardless I always try to use valid code if possible.
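For reference, the two doctype declarations being weighed here:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">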
How like politicians the W3C is – they think somehow making a specification is going to suddenly transform everything – just as politicians think passing new laws will reduce crime. (All new laws do is turn law abiding citizens into criminals… Makes crime go up, not down!) It’s a matter of enforcement, and enforcement has been pretty lax.
If you look at the history of HTML, each release ADDED features and functionality – without breaking your existing code… Adoption was a given as each new version was a superset of the previous… at least until you get into ‘STRICT’.
The ‘strict’ rulesets for both HTML4 and XHTML 1.0 have one big problem – they are a SUBSET, with multiple tags removed from the specification, and in XHTML’s case a whole new set of rules for how tags are constructed. Telling people who have been doing something for >5 years to completely change how they work, and to STOP using tags that used to work just fine, is NOT going to endear you to them – and worse, big ‘monolith’ sites like Yahoo, Google, and MSN cannot be BOTHERED to rewrite their sites just because the W3C ‘says so’.
WORSE is that a lot of the ‘reduction’ in XHTML actually works against one of the ideals being promoted in web coding circles – the use of semantic markup instead of ‘class soup’. Semantic markup makes a HELL of a lot of sense, reducing the number of classes needed and the amount of code needed, and making the site friendly to non-CSS browsers. Instead of <div id="header1"> just use an H1 tag… instead of wrapping a class inside a class inside a class, just use one class for the top-level container, then format the tags INSIDE the class…
While this sounds good, there are things like using an unordered list for menus… Why an unordered list? Because they removed MENU from ‘STRICT’. A LOT of the tags they ‘deprecated/depreciated’ (depends who you ask; the W3C uses the former… since I don’t understand how one can be condescending to a tag, I prefer to think of it being reduced in value/importance, so I prefer the latter) would be ‘more’ semantic than the handful of tags we are left with… this results in people putting DIV inside DIV inside DIV, or resorting to classes when it shouldn’t REALLY be necessary.
At the same time, they introduced STRONG and EM to the HTML4/XHTML spec – and sparked a firestorm of misunderstanding. You’ll get people saying you should never use B or I anymore, and people on the other side of the fence refusing to use the new pair. Just because STRONG and EM APPEAR by default as bold and italic, that does NOT make them replacements. If you are putting the title of a book in italics, you use I… if you are marking a section as muttering under your breath or a character’s thought, you use EM… and NEITHER should be used if quoting a poem or other reference; that’s what BLOCKQUOTE is for (or a div in the case of ‘original’ content). Likewise, if you want something ‘louder’, like Austin Powers right after being thawed out, it goes in STRONG, while if you want something bold that would be read as normal text, then you use B… at least if you are planning on people who use screen readers visiting the site (which is the whole point!).
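To illustrate the distinction with some made-up examples:

<!-- typographic italics: a book title, read in a normal voice -->
<p>The first chapter of <i>Moby-Dick</i> is all about names.</p>
<!-- genuine emphasis a screen reader should stress -->
<p>Do <em>not</em> serve XHTML as text/html.</p>
<!-- visually bold, but still read as normal text -->
<p><b>Note:</b> comments are moderated.</p>
<!-- genuinely “louder”, stressed when read aloud -->
<p><strong>Stop!</strong> Read the manual first.</p>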
HTML 4 is a mess, with backwards compatibility nonexistent in ‘strict’, which is why so many people stick with Transitional… XHTML just takes that one step further by changing not only the available tag set, but the syntax itself. Combine this with the fact that we’ve only had ‘widespread’ browser support for less than five years – and that support being inconsistent between browsers – and it’s no shock.
A rule of thumb often heard among developers of all kinds is backwards support for five to six years – this means that until this year web coders still had to worry about people running IE 5.5 – and we likely will STILL have to worry about IE 5.2 for Mac for a while longer too, since for people who cannot/will not upgrade to OS X that’s their best (only) choice (since iCab sucks). Given that IE 5.x has spotty XHTML support requiring even more ‘hacks’ than IE6 does, it’s not too surprising people haven’t abandoned regular HTML in the droves the W3C seems to have expected.
NONE of the browsers fully support it properly (Opera being the closest, Safari a close second), so if the tools aren’t there, neither is adoption – it’s that simple.
They also seemed to assume that web coders would give a rat’s ass about cripples, people using screen readers, and mobile users – when the few who actually CARE about mobile use WAP, and the lion’s share of bigger sites really just don’t give a **** one way or the other… Add that a lot of people STILL use WYSIWYGs that don’t generate valid code (HotMetal, FrontPage, Visual Page) – or worse, ‘valid’ code that’s overthought (Dreamweaver, Mozilla Composer) – and it’s no shock there are adoption issues.
Bottom line, the W3C THINKS they are relevant when the majority of websites largely ignore them – and worse, they have unrealistic expectations of both web developers AND timelines for adoption… As I’ve said before, a published specification does not a standard make – adoption makes or breaks a standard… and given the number of sites that even VALIDATE, I’d say HTML4 AND XHTML both come up WAY short on adoption.
Go ahead, try and Validate Google, Yahoo, E-Bay, Amazon… Oh yeah, great standard.
Sorry for the long post, but this is a subject that I deal with daily.