Linked by Thom Holwerda on Thu 18th Jan 2007 15:29 UTC, submitted by Matthew Cruickshank
Ten months after its 2.0 release comes version 3 of Docvert. It builds upon OpenOffice.org or Abiword and converts any word processing document to HTML, DocBook, RSS, or any other XML format. People can migrate away from Word with this tool, or integrate it into their tool-chain with its REST interface.
It is a nice tool, but...
by antwarrior (1.8) on Thu 18th Jan 2007 16:52 UTC
antwarrior
Member since:
2006-02-11
Fans: 0

I can't really access the website, so before I get RTFM'd by the masses: I only have the article description to go by in the way of comment. It is a nice idea, but will it really help people migrate from MS Word? If it is built on OpenOffice.org then it will rely on its Word filters, and we all know that they are not suitable for reading all Microsoft documents.

To help people move away from MS Office solutions, a robust tool is needed to ensure, or give the user the confidence, that there will be minimal mangling of the document's formatting. There are many situations where huge documents make it impractical to search visually for formatting errors.

If this tool were built on the Windows side, where most MS users are, and integrated nicely into MS Word, where most MS users will obviously be, then I can see it giving businesses an option to start migration... This more or less applies to the academic case, and to the user case to a lesser degree.
Now that I have reached this far, and the web page has finally loaded, I realise that I have somewhat missed the mark with my commentary. :-( I guess I should rightfully say: ignore the above.

Another option: antiword
by situation (1.84) on Thu 18th Jan 2007 17:01 UTC
situation
Member since:
2006-01-10
Fans: 0

Another, simpler option is called antiword. It basically dumps out a Word doc as plain text, which is nice for piping around and whatever else. Not as robust as going to an XML solution or anything the tool in the article mentions, but hey, it's portable, small, and has no dependencies.

http://www.winfield.demon.nl/
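
For anyone who wants to script that kind of piping, here is a minimal Python sketch. It assumes antiword is installed and on the PATH, and relies only on the fact that `antiword file.doc` writes the plain text to standard output. The function names (`antiword_command`, `doc_to_text`, `count_words`) and the file name are made up for illustration.

```python
import subprocess
from typing import Callable


def antiword_command(doc_path: str) -> list[str]:
    """Build the argument list for a plain-text dump to stdout."""
    return ["antiword", doc_path]


def doc_to_text(doc_path: str, run: Callable = subprocess.run) -> str:
    """Run antiword on a Word document and return its text output.

    The `run` parameter exists only so the call can be swapped out
    (for example in a test); by default it is subprocess.run.
    """
    result = run(
        antiword_command(doc_path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise if antiword exits with an error
    )
    return result.stdout


def count_words(text: str) -> int:
    """A stand-in for whatever you would pipe the text into next."""
    return len(text.split())


if __name__ == "__main__":
    # "report.doc" is a placeholder file name.
    print(count_words(doc_to_text("report.doc")))
```

The same thing on a shell is just `antiword report.doc | wc -w`; the Python version is only worth it if the text feeds into something bigger.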

I'll have to try out the article's tool too, although it looks like once everything is set up, I could have just opened the Word doc in OO.org and re-saved it in a different format...

RE: Another option: antiword
by eMagius (2.92) on Thu 18th Jan 2007 17:57 UTC in reply to "Another option: antiword"
eMagius Member since:
2005-07-06
Fans: 1

Antiword can do PDF, PS, and XML output as well. In my experience (admittedly with rather simple documents), Antiword does an excellent job.

Besides, Abiword uses wv2 for import of MS Word documents--and wv2 is available as a standalone console application. (KWord also uses wv2.)

I dont understand
by gemidjy (1) on Thu 18th Jan 2007 18:22 UTC
gemidjy
Member since:
2006-10-11
Fans: 0

If you have MS Office installed and you need to convert a 'doc' file into an 'odt' file, then you'd need an application that can open the 'odt' file anyway. And each of those applications, Abiword, OO Writer and KWord, can deal with both MS Office and open standard formats, which means you can simply do 'Save as...' and save the 'doc' file as 'odt', HTML or plain text (and many more formats).

What is so spectacular about this tool?

RE: I dont understand
by holywood (1.56) on Thu 18th Jan 2007 20:27 UTC in reply to "I dont understand"
holywood Member since:
2006-09-25
Fans: 0

maybe because they just released version 3 :/ !

So that explains it...
by Sphinx (2.84) on Thu 18th Jan 2007 18:44 UTC
Sphinx
Member since:
2005-07-09
Fans: 12

Could be why Microsoft announced that Office 2007 is changing the file format (snuck it in while everyone was blinded by the new toolbar): the old one is just too easily converted to something usable.

RE: So that explains it...
by sappyvcv (2.36) on Thu 18th Jan 2007 19:13 UTC in reply to "So that explains it..."
sappyvcv Member since:
2005-07-06
Fans: 11

The new format is probably easier to convert than the old actually.

RE[2]: So that explains it...
by glarepate (2.16) on Thu 18th Jan 2007 23:02 UTC in reply to "RE: So that explains it..."
glarepate Member since:
2006-01-04
Fans: 0

The new format is probably easier to convert than the old actually.

What feature of the new format makes it easier to convert than the old one?

RE[3]: So that explains it...
by n4cer (2.6) on Thu 18th Jan 2007 23:18 UTC in reply to "RE[2]: So that explains it..."
n4cer Member since:
2005-07-06
Fans: 5

What feature of the new format makes it easier to convert than the old one?

It's publicly documented XML rather than internally documented binary.
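
As a rough illustration of that point: a .docx file is a ZIP package, and the main text lives in `word/document.xml` as `w:p` (paragraph) and `w:t` (text) elements, so standard tools can read it. The Python sketch below is not a full converter. The helper names are made up for illustration, `make_sample_package` builds only a bare-bones package for the demo (not a complete file that Word would open), and formatting, tables and nested text boxes are ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of the main WordprocessingML vocabulary.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source) -> list[str]:
    """Return the text of each paragraph in a .docx path or file-like object."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        pieces = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def make_sample_package(texts) -> io.BytesIO:
    """Build a minimal in-memory package holding only word/document.xml."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(t)}</w:t></w:r></w:p>" for t in texts
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    sample = make_sample_package(["First paragraph", "Second paragraph"])
    for line in docx_paragraphs(sample):
        print(line)
```

Doing the equivalent for the old binary .doc format means parsing an OLE compound file, which is why tools like antiword and wv2 exist in the first place.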

RE[4]: So that explains it...
by hal2k1 (3.16) on Fri 19th Jan 2007 05:03 UTC in reply to "RE[3]: So that explains it..."
hal2k1 Member since:
2005-11-11
Fans: 5

//
{{What feature of the new format makes it easier to convert than the old one?}}

It's publicly documented XML rather than internally documented binary.//

I beg to differ.

http://www.consortiuminfo.org/standardsblog/article.php?story=20070...
http://www.groklaw.net/article.php?story=2007011720521698
http://www.grokdoc.net/index.php/EOOXML_objections
http://www.grokdoc.net/index.php/EOOXML_at_JTC-1

The new version of the Microsoft Office formats (known as OOXML) is publicly documented, obscured, internal (and unspecified), Microsoft-dependent, reinvent-the-wheel-at-every-turn-in-order-to-avoid-open-standards, locked-in XML.

The best advice is to avoid OOXML like the plague.

DO NOT use Office Open XML format to save your documents in. If you save documents in OOXML, you will be up for an absolute fortune going forward.

Edited 2007-01-19 05:09

Interesting concept....
by robinh (2.12) on Thu 18th Jan 2007 20:13 UTC
robinh
Member since:
2006-12-19
Fans: 0

Thumbs up to this one for an interesting implementation, and smart use of XSLT. What would be really nice is if they could build the necessary OO libs into a PHP extension - imagine seeing some of the millions of PHP frameworks/CMSs/blogs/etc. start using something like this.

iWork support?
by DevL (4.32) on Thu 18th Jan 2007 22:09 UTC
DevL
Member since:
2005-07-06
Fans: 0

Does it handle the Pages native format?

A better typewriter
by Doc Pain (2.68) on Fri 19th Jan 2007 23:51 UTC
Doc Pain
Member since:
2006-10-08
Fans: 6

All these converters are useless while people keep doing stupid things:

Aunt Joan Q. Average wants to send a video clip to Joe Q. Sixpack. She opens "Word", copies and pastes the video, and clicks on "send via e-mail", which produces some MICROS~1 memory garbage stuff (not RFC-conformant).

Student Timmybob Dumb is writing a paper for his exam. He uses the bold, italics, and underline functions along with font faces and font sizes to structure his document. Of course, it does not contain a title page or a table of contents. But most of the text is in "Comic Sans MS" because it looks funny. Imagine his joy if he ever wants to restructure his text! Wow, how big the files get, the contents must be good! Furthermore, he draws his formulas with "Paint" and needs a DVD to take his document to his professor.

Chief analyst Chester R. T. Fullbrain got the mentioned video clip from Aunt Joan. He includes the DOC file in a PPT presentation, but because he's clever, he's making a RAR archive out of it and then he embeds the RAR archive in an "Excel" file which he sends to his deskmate.

Don't tell me anything, I've seen it all. :-)

Most people use "Word" as a better typewriter, no matter if they're employed at the ministry of finance or if they're just doing homework for school. They don't know of (and don't care about) simple functionalities like document templates and paragraph formatting. And why? Because they have never heard about them. They got a pirated copy of some older "Word" version ("Word '97" or "Word 2000") and are happy with it. (At least, that's common in Germany.)

So you better use catdoc and typeset it properly with LaTeX. :-)

BTW, as was mentioned before, working converters are a good way to help people migrate to open standard formats. That's not very interesting for home PCs, because the documents created there do not need to last for long; but in corporate settings it might be the right approach. Because if you have migrated to a standard, you can go anywhere with your documents. That's what companies should be interested in, because the IT infrastructure as they know it will not be around for long, so they should decide now where they want to be in the future - with their data.