The eXtensible Markup Language (XML) provides a flexible and efficient way to store, transmit, and express data. The open source community has produced an impressive lineup of XML editing utilities. In this article Ryan Paul takes a look at some of the most useful.
-Huge community (albeit quirky) http://www.emacswiki.org
-Elisp extensibility
-Python integration via PyMacs
-Broad usability across operating systems/window managers
-Wide spectrum of other applications available. Your training investment in Emacs will continue to pay dividends.
-Ridiculous feature spectrum. Rectangular edits!
-Ability to emulate lesser applications.
Yes, I agree. I think XML is used too often when it’s not the best solution. Too often XML adds overhead, and in some cases the XML representation is not powerful enough (XML is a tree structure).
People say, “Well, you can look at the DTD, schema, etc.”, but what’s wrong with good old-fashioned documentation of any other transfer format?
I’ve NEVER heard anyone call XML efficient.
There’s nothing revolutionary about XML. Lisp programmers have been doing it since the 60’s.
My opinion: Welcome to 45 years ago.
Efficiency and fast execution are often mixed up, but they are not the same. Neither are efficiency and low memory consumption.
The big advantage of XML is that, since it is quite popular, a damn lot of tools and libraries exist that simplify the processing of XML.
XML is efficient in terms of human resources needed to process it. But it is not efficient in terms of computing resources.
Well, which kind of resources is more expensive?
Carsten
XML is a child of the late 60s :-)
In the late 60s IBM developed GML (Generalized Markup Language) as a meta format for data. Some years later it was standardized and became SGML (Standard Generalized Markup Language). Then some additional years later James Clark et al. simplified SGML and created XML. But XML grew fat and is actually more complex than SGML was :-)
Carsten
I think XML has followed the way of C++. It wasn’t a bad idea in the beginning, but it sadly bloated…
You forgot one of the greatest XML editors, I mean mlview
http://www.mlview.org
I don’t understand why you don’t mention it, this sucks…
There’s one major problem with Emacs:
As far as I know Emacs supports only Relax NG schemas, while the official W3C-approved (and more and more used) standard is XML Schema instead. Relax NG itself may be excellent and avoids a lot of the complexity of W3C schemas, but, besides not being a W3C standard, it cannot do everything that W3C schemas can.
There would indeed be room for a good open source XML editor with integrated validators. Something like the proprietary editors Oxygen or XML Spy, which can do not only basic XML but also XSL(T) and understand both DTDs and schemas etc. I’ve been looking for such an integrated XML/XSL/schema editor for some time but there seems to be none available yet.
There are GUI/non-expert-oriented programs like Conglomerate and Mlview and a few other editors like Treebeard for XSL transformations, but frankly they are quite far from the ease of use and features of Oxygen and other proprietary editors.
As to the “XML (& SGML) sucks” messages… maybe sometimes, but you cannot deny that XML solves many problems better than other solutions, like exchanging/transforming data (of various original formats) between applications. XML/SGML is also the basis of many good open source data formats, and the basis of (X)HTML and thus of the whole WWW too. Also most things that W3C does in order to develop better Internet standards seem to stem from XML.
XML is neither good nor bad, it just depends on whether it’s used appropriately. Two examples from my work in chemical process scheduling:
– XML Good: Before I switched to XML for my research, I was using a custom text file data storage format. Switching to XML added a standardised basic syntax, formalised the data structure, made interoperation easier and allowed me to use a ready made and tested parser.
– XML Bad: A commercial scheduling package I worked on for a few months switched from a very simple text file data format to an XML format. However, the extra features of XML lured them into developing a massively over-designed and complex system, using inheritance and all sorts of weird ideas and to switch to a CORBA client/server design. Result: 50 KB of text in version 1 becomes 2 MB of XML in version 2. It takes longer to generate, transfer and parse the XML than it does to actually run the solver. Version 1 would run on a Sparcstation – version 2 required a top-end PC.
Good points.
It seems that whether one likes XML files or not, XML is here to stay… XML has many good points. But, of course, sometimes XML might just increase bloat too, and some other solution might be much better, simpler and leaner…
A few more good XML things: Do you like vector graphics like SVG? It is XML. Do you or the websites you visit use RSS? It is XML. So is XHTML. You can transform between various file formats via XML. Also new operating systems seem to use XML more and more. The list goes on and on.
I’m sorry, but XML is not efficient in human terms either. This is what Carl Sassenrath has to say about it:
“Was XML Flawed from the Start?”
http://www.rebol.net/article/0108.html
“XML, SGML, HTML, CSS, XSL, XHTML, DHTML, SHTML, …”
http://www.rebol.net/article/0110.html
This is the best XML editor I’ve found so far:
http://www.xmlmind.com/xmleditor/
Multiplatform (Java), heavily configurable, WYSIWYG through CSS2, etc. Not free though. However you get the source code with the Professional version, and there is a free Standard version.
Po
No.
I’m fed up with people saying “CPUs are fast, let’s do it in a more flexible way; we can afford the bandwidth required”.
Yes you can, if you want to destroy the advances made in CPU speed. I want my CPU power back!
Inter-application communication should use a format efficient for CPUs, not humans: binary.
IMHO.
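To put a rough number on the size side of this argument, here is a minimal Python sketch that encodes the same made-up record once as XML text and once in a fixed binary layout. The record, the element names and the binary layout are all assumptions chosen for illustration; none of them come from a real protocol or from the posts here.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record: a sensor id, a Unix timestamp and a reading.
record = {"sensor_id": 17, "timestamp": 1106784000, "value": 21.5}

# Assumed binary layout: little-endian uint16, uint32, float64 (14 bytes, no padding).
BINARY_LAYOUT = "<HId"


def encode_xml(rec):
    """Serialize the record as a small XML document and return its bytes."""
    root = ET.Element("reading")
    for key, val in rec.items():
        ET.SubElement(root, key).text = str(val)
    return ET.tostring(root, encoding="utf-8")


def encode_binary(rec):
    """Pack the three fields into the fixed binary layout."""
    return struct.pack(
        BINARY_LAYOUT, rec["sensor_id"], rec["timestamp"], rec["value"]
    )


def decode_binary(data):
    """Unpack the fixed binary layout back into a dict."""
    sensor_id, timestamp, value = struct.unpack(BINARY_LAYOUT, data)
    return {"sensor_id": sensor_id, "timestamp": timestamp, "value": value}


xml_bytes = encode_xml(record)
bin_bytes = encode_binary(record)
print(len(xml_bytes), "bytes as XML,", len(bin_bytes), "bytes as binary")
```

The binary form is smaller and needs no text parsing, but both sides must agree on the layout in advance, and nothing in the bytes says what the fields mean. That trade-off is what the rest of this thread argues about.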
…it depends on what you’re doing. And for some purposes, XML (whilst a bit clunky as Mr Sassenrath points out) is good, or at least, good enough. But making something XML just for the hell of it (as it sometimes seems) is rather silly…
Have you tried one of the binary versions of XML? There are at least two implementations of XML as binary… Maybe it could help your program, and it would still be easy to “export” as XML when/if you need to…
I liked the Rebol link, but I think the problems of XML are more insidious.
XML asserts data are hierarchical.
Maybe a lot of data are, but I think that XML is simply going to die, as attempts are made to make it query like relationally-arranged data.
XML is clearly a superstar in the Over-hyped Technology Hall of (F|Sh)ame.
If Berners-Lee had had the idea to use Lisp S-expressions instead of HTML to do his hypertext stuff, I think the whole internet would have gone far beyond its current state. Imagine every web page as a Lisp/Scheme program instead of that mix of inefficient syntax and ugly <script> tags…
so instead of <> it would be full of ()?
Free as in beer download available for non-commercial use, not great but better than all of the OSS ones I’ve tried. Even supports SVG. Unfortunately it doesn’t seem to support HTML.
I thought “extendable” was the correct diction. :-)
These are technologies forced on us by the biggest hardware vendors… Look, XML and (especially) Java were designed for other things than they’re used to solve… It’s not strange, in fact, that big hardware vendors (IBM, Sun, etc.) adopted those technologies as the mainstream of server systems development. Why? Because it gives them the chance to sell a few generations of new hardware… Java is a portable language? No, Christ, Java is no more portable than POSIX/C, but it eats hardware and gives money to vendors…
http://www.onelook.com/?w=extensible
And of course: http://en.wikipedia.org/wiki/XML
:-)
“Inter-application communication should use a format efficient for CPUs, not humans: binary.”
Some ideas:
– “Binary XML proponents stir the waters” (Michael S. Mimoso)
http://searchwebservices.techtarget.com/qna/0,289202,sid26_gci10275…
– W3C & binary XML:
http://www.w3.org/XML/Binary/
– “How do we make XML faster?” (Martin LaMonica)
http://uk.builder.com/architecture/web/0,39026570,39233479,00.htm
BigZaphod, of course there is no doubt that the word “extensible” is commonly used, and therefore appears in dictionaries. We know that.
The point is that it is just a legalized mistake. The correct word is “extendible” (not “extendable” as I said earlier, sorry!).
The rule seems to be that the “-ble” suffix be appended to the truncated verb form. Since the verb is “to extend”, the only correct word is “extendible”.
If you say “extensible”, you are attaching the suffix to the truncated word “extension”, which is a noun, not a verb. This cannot be done.
…also this new article might show some signs of hope…:
– “W3C outlines XML speed boosting plan”
Martin LaMonica, 27 January 2005
http://uk.builder.com/architecture/web/0,39026570,39234718,00.htm
The efficiency problem is much broader than would be solved by simply converting XML to binary. Besides, this nullifies the oft-touted advantage of XML of being human-readable (which even in text form is hardly true). As emacs said, one of the problems is that XML only handles hierarchical data. As Carl Sassenrath points out, this leads to several occurrences of completely different formats in related standards, even embedded in XML, wherever it was too obvious that it would be horrible to encode that format in XML itself. Even worse, a hierarchical format could have been encoded much more simply, because XML syntax is not even hierarchical, but stream-oriented.
This is really independent of the distinction between human-readability and machine-readability. If the syntax is hard to parse for a machine (needing several completely different languages, and the syntax not matching the structure), it’s also hard to read for a human.
Summarizing, the extensible property of XML leaves a lot to be desired, because the format is not generically suitable; the human readability is very suboptimal; and the efficiency both in terms of human and machine resources is a joke. It’s all the wrong solutions to the wrong problems. Its only saving grace is that it’s a standard. It should really be a niche standard, for things it’s not unsuitable for, like maybe OpenOffice documents.
There’s a similar thing that happens with defend, defensible and defendable. Both are acceptable, although checking which country you’re in before handing over a thesis might be worthwhile.
TBH I sometimes think English has more words that are exceptions to rules than words that conform.
It is usage that defines a language. If it is in the dictionary, it is certainly legitimate as that indicates it is in usage. I’m not really sure what you’re trying to say other than to try to claim that, perhaps in your view, English is not evolving according to “the rules.”
Everyone knows English has tons of exceptions to the oft-stated rules. I would tend to agree that there are possibly more exceptions than rules! This is simply a sign of an evolving language. English today has many rules that were not rules only a few hundred years ago. Does that mean all of our modern words aren’t proper English?
I don’t know the origins of the word extensible, but I do know that its common usage is in the software world. You would describe a TV antenna as being extendable because it can extend physically, whereas I generally hear “extensible” used to describe a framework or platform that can be extended metaphorically. So the two words actually do serve different purposes, because they clarify an inherent vagueness in the two meanings of the word extend: physical and metaphorical.
English is a living language.
Have a look at these two links:
http://machaut.uchicago.edu/cgi-bin/WEBSTER.sh?WORD=extensible
http://65.66.134.201/cgi-bin/webster/webster.exe?search_for_texts_w…
Both are from Webster’s dictionary. The first is from 1913 and the second is from 1828!
Your off-topic bout actually nicely illustrates how important and subtle language is for conveying meaning, which should make it easy to see that XML is like a crowbar used to comb your hair! (Yes, it hurts. :-)
“There’s nothing revolutionary about XML. Lisp programmers have been doing it since the 60’s. ”
[XML is not S-Expressions]
http://www.prescod.net/xml/sexprs.html
[ Kaj (IP: —.bbned.dsl.internl.net)]
“The efficiency problem is much broader than would be solved by simply converting XML to binary. Besides, this nullifies the oft-touted advantage of XML of being human-readable (which even in text form is hardly true).”
Depends on your tools.
[ mik (IP: —.devs.futuro.pl]
It’s about programmer efficiency, not machine efficiency. There’s no Moore’s Law that applies to programmers.
[ Treza (IP: 194.250.226.—) ]
History illustrates it would have gone farther if it hadn’t been divided into two competing camps.
“Depends on your tools.”
You’re not getting what I’m trying to tell you. No tool can undo the fundamental unsuitability of XML for most tasks it’s used for.
I find it very ridiculous how many people in here are bashing XML without even giving an alternative!
I would say most of the time the speed and the processor overload of parsing XML is negligible.
For me, XML solves most of the problems I had when I invented a new data format to parse every time!
> The eXtensible Markup Language (XML) provides a flexible and
> efficient way to store, transmit, and express data.
*Cough*
XML efficient? That’s a joke, right?
“I find it very ridiculous how many people in here are bashing XML without even giving an alternative!”
I didn’t want to say it, but if you want an alternative (besides the well known Lisp and Scheme people have mentioned here), I have to say REBOL. I pointed to it earlier in the two articles I mentioned:
http://www.rebol.net
If you want an open-source solution, there are plenty of other formats, from classic to modern. A popular one is YAML:
http://www.yaml.org
“I would say most of the time the speed and the processor overload of parsing XML is negligible.”
That must be a joke. We’re talking overhead of one or two orders of magnitude. Ben gave an example here earlier where the size of data needed by a commercial package blew up 40-fold. These cases are real, but they can’t be explained by a direct comparison between encoding formats alone. Everybody here who tries to defend XML says that tools should be used to circumvent the shortcomings. This is a big problem in itself, because on top of the inefficiency of XML itself, the need to use ever more layers of tools to solve problems in XML and in other tools leads to a chain reaction of bloat.
“For me, XML solves most of the problems I had when I invented a new data format to parse every time!”
That’s true; that’s the biggest reason why XML has become so popular, unfortunately especially for the wrong problem domains. It’s convenient for programmers. At least they think so, because they can avoid writing a parser, which is admittedly a fairly deep technical skill that is taught in academic computer science classes. However, with this XML format and parser comes the need to learn a whole stack of standards and tools. It’s not at all a time saving, it’s just the circumvention of something that most programmers think of as difficult.
Actually, I think that REBOL and even YAML are still too complex for many formats. Someone here defended XML by saying that modern operating systems are using it more and more. Well, I happen to know that in the entire Syllable operating system, probably the only place where a bit of XML is used is in the separate Chat application, which needs to read and send simple XML messages via the Jabber protocol. This is a useful place for XML, because Jabber simply demands it, and it’s communication with the outside world, which is one of the few things that XML is fairly good at. For the rest, our file formats are mostly classic text formats. Nevertheless, for the build system I needed a more flexible format. I created a format that is hierarchical, like XML, but not at all bloated. It’s much simpler than even REBOL and YAML (let alone XML) and it’s parsed by about fifty lines of Ruby code. It could just as well be written as a simple C or C++ parser, so this is magnitudes lighter and faster and easier to learn than XML. Here is an example file that specifies the complete compilation and installation process for GNU BinUtils. It’s an interesting example because it contains a lot of different constructs in a simple, generalized structure:
http://cvs.sourceforge.net/viewcvs.py/syllable/syllable/system/apps…
“You’re not getting what I’m trying to tell you. No tool can undo the fundamental unsuitability of XML for most tasks it’s used for.”
Well I’m addressing your “readability” aspect. That’s a function of your tool (unless everyone is reading binary directly that is). That also happens to be what the original article is about. Not “1001 ways to dislike XML”.
“That must be a joke. We’re talking overhead of one or two orders of magnitude. Ben gave an example here earlier where the size of data needed by a commercial package blew up 40-fold.”
I can give another example of that. A company I worked for used XML to store information about some hardware that their program needed. As they wanted to keep this information secret, they made an ‘XML-compiler’ that I assumed converted it to binary, and encrypted it. In fact, it just removed all the spaces, and encrypted it!
This led to massive files, and made the program really unresponsive as it parsed multi-megabyte XML files.
Another problem I think there is with XML is reading it into the program. Current XML parser interfaces are almost all rubbish. Especially MSXML4.
I’d be a happy man.
All I need is an editor that you input XML on one side
then XSLT on the other.
Then click a button.
And voila – XHTML!!
Is it possible in Linux?
I know you can do that with XMLSpy or Visual Studio .NET
(programmatically)
I guess I will have to check every single link in the article.
But I’ve been searching for that for ages and have never found anything decent so far.
“Well I’m addressing your “readability” aspect. That’s a function of your tool (unless everyone is reading binary directly that is). That also happens to be what the original article is about. Not “1001 ways to dislike XML”.”
Like I said a few posts up: that’s what XML users think, that you need tools for everything. And that’s the biggest explanation for all the bloat that comes with XML. Did you read the two Carl Sassenrath blog articles on rebol.net that I linked to in the first posts in this thread, or anything on yaml.org, or my example of a Syllable Builder recipe? There you can see that it’s a mistake to think that you have either binary or XML, or maybe hard to read Lisp. It’s very well possible to design a text format that’s flexible, powerful, efficient for a machine and still easy to read and write for a human. Which means that specialized tools become an option besides using classic tools like text editors. If you have a specialized tool, it should add value instead of being a necessity without which you can’t work.
That’s the outstanding feature of XML: it is hard to parse and process for both computers and humans.
:-)
There’s a huge flaw in the XML design, and that is, how end-tags are implemented. This is valid XML:
<doc><person><name>Killroy</name></person&g t;</doc>
XML is strictly hierarchical, and that combined with end-tags, as they are, can lead to serious problems:
<doc><person><name>Killroy</person></name&g t;</doc>
I’ve only switched 2 of the end-tags, and now it’s NOT valid XML. Those end-tags also make the format bloated.
An alternative:
[doc [person [name "Killroy"]]]
By using blocks (like []), you can’t switch the ‘end-tags’ (it gives the same result), so you can’t make the data invalid that way, and it takes up less space. This format is called RebXML (designed by me) and is a REBOL representation of the XML format. There are scripts to convert between this format and XML. You can find the specification of this format here:
http://home.tiscali.dk/john.niclasen/rebxml/rebxml-spec.html
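For readers who want to see the block notation side by side with the XML, here is a small Python sketch that prints an element tree in the bracketed style of the one-line example above. It is only an illustration of the idea and not an implementation of the RebXML specification: the function name `to_blocks` is made up, and attributes, escaping and mixed content are deliberately ignored.

```python
import xml.etree.ElementTree as ET


def to_blocks(elem):
    """Render an element as [tag "text" child ...], recursing into children.

    Assumptions for this sketch: attributes, tail text and quote escaping
    are ignored; only the tag, the leading text and child elements are kept.
    """
    parts = [elem.tag]
    text = (elem.text or "").strip()
    if text:
        parts.append('"' + text + '"')
    parts.extend(to_blocks(child) for child in elem)
    return "[" + " ".join(parts) + "]"


xml_text = "<doc><person><name>Killroy</name></person></doc>"
blocks = to_blocks(ET.fromstring(xml_text))
print(blocks)
print(len(xml_text), "characters as XML,", len(blocks), "characters as blocks")
```

Because every block closes with the same `]`, there is no tag name to repeat or to get wrong at the closing end, which is the point being made about end-tags.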
That went wrong. Let me try to repeat those 2 XML-lines:
Valid XML:
<doc><person><name>Killroy</name></person&g t;</doc>
Invalid XML:
<doc><person><name>Killroy</person></name&g t;</doc>
The 2 “&g t;” in the post above should be greater than characters. *sigh* ;-)
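With the greater-than characters restored, the two lines can be checked against a stock parser. The sketch below uses Python’s standard `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for this example. Strictly speaking, the XML specification calls this property well-formedness and reserves “valid” for conformance to a DTD or schema, but the outcome is the same: the parser rejects the second line outright.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


nested = "<doc><person><name>Killroy</name></person></doc>"
crossed = "<doc><person><name>Killroy</person></name></doc>"

for label, text in (("nested", nested), ("crossed", crossed)):
    print(label, "->", "accepted" if is_well_formed(text) else "rejected")
```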
“Which means that specialized tools become an option besides using classic tools like text editors. If you have a specialized tool, it should add value instead of being a necessity without which you can’t work.”
They’re still an option. But the point is that “readability” is a function of the tool. As long as there’s a “go-between” standing between a bunch of bytes and you, that will always be true. It’s the “go-between” that gives those bytes meaning as far as the user is concerned.
Plus it’s a bit of a moot point anyway, considering most programmer editors already understand XML (the subject of the article, remember?). The question which all the “I hate XML” posts are deviating from is “What is the best editor for the job?”. So at best, you all are off-topic.
I still say you’re not getting it. Readability may be a function of the tool in the XML world, but it’s not in REBOL, YAML and other sane formats. Well, did you read any of the information I provided about them?
I also beg to differ that this is moot and off-topic. We’re still talking about XML tools, just not specific tools, but their fundamental nature. And I don’t understand how someone can consider performance differences on the order of 10 times and 100 times a moot point – both machine performance and programmer performance. By the way, I never said that I hate XML, although I agree with you that it gives me 1001 reasons to. :-)
“The 2 “&g t;” in the post above should be greater than characters. *sigh* ;-)”
The very fact you’re having these problems is because this site is encoded in HTML, a form of XML.
I agree that XML has its merits. I think the reason XML always stirs the pot when brought up in places like this is that it’s become so widespread so fast, and many people use it where it’s really not appropriate from a technical point of view. It’s a matter of priority; do you want to do it The Right Way ™, or are you lazy and say “let’s go XML” because everyone will understand it when they see it?
I’ve recently come across bad usage of XML in the form of a data provider here in Sweden that delivers its data as XML, when the structure of the data really calls for something else. The data is just a long list of articles, where the document model of XML makes no real sense. I’m pretty sure I know how the reasoning for using XML went at the data provider. XML was the easy way out, but sometimes laziness is what makes and breaks a project.
On a personal level, I’m not that dismayed with this data provider going with XML. I keep getting these e-mails saying “can you help us get this format into our database in this way”, and I know I can say “well, yes I can” and then write something up with libxml2’s Reader API and get food on my table. That would not necessarily be the case if they had gone with a more efficient (or should I say technically correct?) binary format.
XML is a mixed blessing. Just my two cents.
Now back to topic; Free XML editors suck! Anyone want to help me write a good one with wxWidgets+libxml2?
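A long, flat list of articles like that is usually read one record at a time rather than loaded as a whole document. The sketch below shows the idea in Python with the standard library’s `iterparse`, used here as a stand-in for a pull-style reader such as libxml2’s Reader API, not as that API itself. The element names `articles`, `article` and `title`, the `id` attribute and the sample data are invented for the example; the provider’s real format is not known.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample feed; a real feed would be a large file or network stream.
SAMPLE = b"""<articles>
  <article id="1"><title>First</title></article>
  <article id="2"><title>Second</title></article>
</articles>"""


def iter_articles(source):
    """Yield (id, title) for each article as soon as its end tag is seen."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "article":
            yield elem.get("id"), elem.findtext("title")
            # Release the article's children and text once it has been handled.
            elem.clear()


rows = list(iter_articles(io.BytesIO(SAMPLE)))
for article_id, title in rows:
    print(article_id, title)
```

Each yielded pair could be written straight to a database row, so the whole list never has to sit in memory as a tree.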
“I still say you’re not getting it. Readability may be a function of the tool in the XML world, but it’s not in REBOL, YAML and other sane formats. Well, did you read any of the information I provided about them? ”
Rhetorical question: Do you read YAML, REBOL, or anything else directly in binary, or do you use an editor to interpret what those bytes actually mean? Sane, or otherwise is a function of your tool. The computer doesn’t give two wits either way.
“I also beg to differ that this is moot and off-topic. We’re still talking about XML tools, just not specific tools, but their fundamental nature.”
The “fundamental nature” of an XML tool is to “interpret”. That’s it. The “fundamental nature” of XML however isn’t the subject of the article.
“And I don’t understand how someone can consider performance differences on the order of 10 times and 100 times a moot point – both machine performance and programmer performance.”
Since you raised the issue. Here’s your homework for today. Try coming up with technology that can’t be “misapplied”. Bet even your YAML, and REBOL can be misused as well. Guess we better stop using it.
” By the way, I never said that I hate XML, although I agree with you that it gives me 1001 reasons to. :-)”
The problem lies in the fact that most complainers don’t understand the art of compromise. In a perfect world, there would always be the perfect answer to every problem.
“The very fact you’re having these problems is because this site is encoded in HTML, a form of XML. ”
Since you like links so much…
http://www.w3.org/People/Raggett/book4/ch02.html
“”I still say you’re not getting it. Readability may be a function of the tool in the XML world, but it’s not in REBOL, YAML and other sane formats. Well, did you read any of the information I provided about them? ”
Rhetorical question: Do you read YAML, REBOL, or anything else directly in binary, or do you use an editor to interpret what those bytes actually mean? Sane, or otherwise is a function of your tool. The computer doesn’t give two wits either way.”
Does rhetorical mean that you don’t want to be criticized? I’ll give a rhetorical answer then. I had a suspicion, but I didn’t want to insult your intelligence by suggesting that either you don’t understand that YAML and REBOL are text formats (although a binary format is being defined for REBOL), or that you meant that reading anything requires a tool called editor because text files are stored in a binary form. That’s like saying that we need a tool that we call computer to read chips and magnetic disks, and that we need the tool chip and the tool disk to read molecules and atoms and electrons, and the tools atom and electron to read quarks and waves, and so on. I’m just trying to impress my teacher, since you gave me homework that started with particle physics.
I’ll tell you what most humans don’t give a wit about: that text files are encoded in binary form by most modern computers. I thought that you wanted to keep this discussion on topic, so when I was talking about tools, I meant specific XML tools, not Notepad or cat.
“”I also beg to differ that this is moot and off-topic. We’re still talking about XML tools, just not specific tools, but their fundamental nature.”
The “fundamental nature” of an XML tool is to “interpret”. That’s it. The “fundamental nature” of XML however isn’t the subject of the article.”
Your idea of a tool is rather limited, even above the level of electrons. With many tools, I prefer that they can save as well as load.
“”And I don’t understand how someone can consider performance differences on the order of 10 times and 100 times a moot point – both machine performance and programmer performance.”
Since you raised the issue. Here’s your homework for today. Try coming up with technology that can’t be “misapplied”. Bet even your YAML, and REBOL can be misused as well. Guess we better stop using it.
” By the way, I never said that I hate XML, although I agree with you that it gives me 1001 reasons to. :-)”
The problem lies in the fact that most complainers don’t understand the art of compromise. In a perfect world, there would always be the perfect answer to every problem.”
I guess you were looking in the mirror when you wrote this. I said that I don’t hate XML, but that I complain about its current overuse, resulting in many cases of human and machine performance losses of one and two orders of magnitude. You suggest that nobody should complain because that amounts to demanding a perfect world, that large numbers of projects with huge losses is a good compromise, and the many alternatives that would improve the situation are not perfect, either.
“”The very fact you’re having these problems is because this site is encoded in HTML, a form of XML. ”
Since you like links so much…
http://www.w3.org/People/Raggett/book4/ch02.html
”
I suppose you don’t like me backing up my statements with references, because then I can ask you to actually read them.
That history article ends in 1998. I assume you are aware that both HTML and XML were based on SGML, that HTML was since reformulated in XML as XHTML, and that XHTML was appointed to be the newest version of HTML by the W3C.
Look, I entered this discussion to clarify my amazement at the article calling XML efficient. I want to give people insight into why that’s my opinion, but if you want to keep twisting linguistics and then arguing about it, I have better things to do.