What stands a better chance of surviving 50 years from now, a framed photograph or a 10-megabyte digital photo file on your computer’s hard drive? The concern for archivists and information scientists is that, with ever-shifting platforms and file formats, much of the data we produce today could eventually fall into a black hole of inaccessibility.
As long as the file specification is open, the most important file types will be usable in the future. One might consider storing a few virtual machines with legacy applications to open such older files.
Problem solved.
It may be solvable on a personal level if you stick to it, but not in general. Where are you going to find legacy applications in the future? Or even guarantee there is a virtual machine to run them?
Anyway, this is already happening now. I have a few examples, all related to messaging: a huge archive of instant messages stored by Miranda in one large file nobody understands (last time I checked there was no export feature), an SMS archive stored on the iPhone (it seems to use the SQLite format, but that isn't helping much), and an RTF file generated by yet another piece of software that is only readable on Windows… Time passes and you inevitably give up on it, because there's just no time to go back to installing virtual machines or to make existing software convert the old formats to newer ones. Sadly, he may be right.
Try importing it using Miranda IM (miranda-im.org)… then you can export it again to the format you want using plugins…
(It’ll probably run via WINE too if you don’t use Windows.)
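And for the iPhone SMS archive mentioned above: if it really is SQLite, a few lines of Python are often enough to dump it to plain text. A minimal sketch; the table and column names here are guesses for illustration, not the real schema, so inspect the file first:
[code]
import sqlite3

# Hypothetical schema: the table and column names below are placeholders,
# not the real layout of the phone's database. Run
# "SELECT name FROM sqlite_master" first to see what is actually in there.
def dump_messages(db_path, out_path):
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT date, address, text FROM message ORDER BY date")
        with open(out_path, "w", encoding="utf-8") as out:
            for date, address, text in rows:
                out.write(f"{date}\t{address}\t{text or ''}\n")
    finally:
        conn.close()

dump_messages("sms.db", "sms_export.txt")
[/code]
Once it is plain text, the format problem for that archive is gone for good.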
I think in about two or three hundred years ReactOS ‘might’ have made v1.0. We can always use that to view the ‘lost formats’.
Nope.
Open data can still be locked away by:
Disk encryption
Proprietary file systems
Proprietary storage hardware
Proprietary communication protocols
Being stored by some company that then goes under
Yep, but moving to open data formats and standards when archiving and preserving important documents is still a huge step, and among the most important steps forward in solving the problems mentioned. Proprietary, locked and old data formats are by far the biggest problem in the whole issue.
Like the article says, many archives and national governments have moved from proprietary data formats to open data formats for this exact reason.
But, of course, CDs and magnetic tapes, for example, may still corrupt over time, etc.
You're so right, kroc. Even a very recent format is unusable: an archive from Windows XP's built-in backup tool, for example, can't be used in Vista. If a format from one generation back can't even be used, then indeed we're looking at a black hole.
Isn't this basically why the ODF project exists? AFAIK it only covers OpenOffice-like apps, but is their goal (at least eventually) to include all kinds of data, like email, IMs, even backup archives? Or is there perhaps another project with those goals?
Rob
Indeed, that does solve the file format problem, but that’s not the only problem.
Clever idea, but you still sit with a “what medium” issue. For example, let's say we use VMs for the legacy applications and thousands of SATA hard drives to store all the data.
Who says we will be able to read data from a SATA drive in 50 years? I have stacks of 5.25″ floppies in my garage. So the data is there, but ten years down the line I no longer have the hardware to read it. We could have a similar problem in 50 years with SATA hard drives.
The other problem is failing hardware. You store valuable data on a 1-terabyte drive for future use. The drive goes faulty and you lose an unbelievable amount of data in one go. Books and other printed material don't have that vulnerability.
This is actually a serious issue…
What about fire? Or any other kind of natural disaster? Obviously if your house burns down your drive will burn too, but it’s much easier to back up all of your data onto new drives than it is to “back up” your books and other physical documents.
Maybe the problem is solved. Let's take a file from several generations ago: say, a book report originally done on a 5.25″ floppy with a Commodore 64 or Apple IIc computer.
My old floppies actually still work and are even still writable, so the argument that the medium isn't trustworthy doesn't fly here. So let's assume that I no longer have the machines, nor a hard-copy printout to just type it all over again like a scribe back in the day. In this scenario, how would a person go about getting that data off the 5.25″ floppy so it can be used again, or at least viewed, printed or exported, etc.?
Off the top of my head I think one would need:
a) a 5.25 floppy drive
b) software to read it, which would probably be written in PASCAL; I think that's what those machines used back then.
c) would an emulator work? I think those only use ROM files so they are kind of “dumb” for lack of a better term.
I think that is a realistic example for this general problem we are discussing.
We should just go back to those; they last pretty much forever.
For really important historical documents, perhaps we should, for the reason you have given. One would imagine that the Flintstones had a laser printer, and that design might be a good place to start for designing modern stone tablet printing technologies. However, I suspect that it may only have had a small dinosaur inside, likely now extinct, chipping the letters out with its teeth.
Actually, the tech was a little more sophisticated than that. It utilized a small but highly trained prehistoric bird using a hammer and chisel.
So the idea could still be feasible.
Yeah, but stone tablets are no use if you cannot read the content written on them… That is the case when trying to interpret texts written in lost languages and writing systems…
It can be a somewhat similar problem with proprietary data formats after, say, a century, if those companies are gone and have taken their proprietary formats and software to the grave with them.
To be honest, I think it's much ado about nothing. If there's a need to read these formats, someone will find a way. I think we can reasonably presume that the internet and its archived content will still exist in some way or other in 20 to 50 years. Beyond that, let's say that technology will probably solve it.
To paraphrase someone else in this thread, it's not like the Vikings sat around wondering, “Gee, I wonder if this runestone will last thousands of years. And what if no one understands my writing?”
The problem is not the formats but how to store our data for such extended periods of time. If you can store it, you can always store the instructions for the formats with the data.
There are problems we need to worry about NOW and there are those that we don't. This is one of the don'ts.
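For what it's worth, “storing the instructions with the data” can be as simple as dropping a small sidecar manifest next to each archived file. A minimal Python sketch; the field names are purely illustrative, not any established standard:
[code]
import json
from pathlib import Path

def write_format_manifest(data_file, format_name, spec_reference, notes=""):
    """Drop a small plain-text manifest next to the archived file so a
    future reader at least knows what the bytes are supposed to be."""
    manifest = {
        "file": Path(data_file).name,
        "format": format_name,           # e.g. "OASIS ODF 1.2 text document"
        "specification": spec_reference, # where the format is documented
        "notes": notes,
    }
    manifest_path = Path(data_file).with_suffix(".manifest.json")
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path

write_format_manifest("thesis.odt", "OASIS ODF 1.2", "ISO/IEC 26300",
                      "Written with OpenOffice 3.0")
[/code]
Plain text plus a pointer to the spec is about as future-proof as instructions get.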
That only suggests that you may not be too interested in history?
We would know much less about Viking history if we didn’t have any original Viking documents left. They, for example, wrote about their travels to America, which made many historians interested in researching the subject, and now we know, from those documents and from archeological findings together, that Vikings really did travel to America before Christopher Columbus.
Uh, and? Preserving their writings for thousands of years wasn’t their concern and that’s not why they wrote.
My point was, as long as we have a way to actually store our data for that long the format is of lesser importance.
But why risk data in obfuscated formats when you can adopt an open file format right now that guarantees it will always be easily readable, with no fuss or headaches?
Sure, the medium where you store your data is more important, but dismissing the file format issue with a “what could possibly go wrong” is still irresponsible.
If you really want to preserve stuff, both sides of the coin are important, of course. Like the other commentator said: “dismissing the file format issue with a 'what could possibly go wrong' is irresponsible”.
Probably? But how? That is what the article tries to ask.
You can, but is it always possible? If you use open standards and formats, that is easy. If you use proprietary closed formats, it can get much trickier, especially if companies don’t want to fully open those instructions and details of their proprietary formats for others to use too.
That's mostly nonsense, again suggesting only that you seem to know and care little about history. This goes way off the original topic, but anyway:
In old and ancient times, including the Viking age, when writing was not such an everyday thing as it is today, it was often very much the idea that the few written documents of those times, like chronicles of history or books containing religious myths, were meant to be carefully preserved and/or copied, for decades and centuries. If the Viking people had not been interested in preserving their sagas for many future generations, also in written form, we would probably not have them left to read nowadays.
People shouldn't be shortsighted and undervalue the importance of preserving historical documents. Most of our culture (and a note to narrow, one-track tech geeks: that includes technology) is based on history and historical achievements. Because of historical documents we don't have to trust false claims about important historical events. Such things often matter for political decision-making too, for example. Actually, we are human beings largely because of our culture and history.
Unfortunately we don’t.
I don’t know about Vikings. But ancient Egyptians thought a great deal about this kind of thing. At least about the medium. Although they had a goal of transmitting certain information and stories on forever, I’m not sure that they appreciated the format problem. Egyptians, living in a time in which change was quite gradual by our standards, tended to think of their world and culture as being ever-unchanging. They expected that in a thousand years their descendants would live in the same world as they did. That people would come and go, but that the world they knew would remain forever.
The fabric of our world, in contrast, is ever changing. What world will we leave behind when our days are ended? And what kinds of worlds might follow? One of those possible worlds is post nuclear holocaust. Another is post asteroid strike. Another is post-plague or post bio-weapon. All that we take for granted is very fragile, indeed. It is possible… and perhaps even likely that the human race may have to pull itself up by its bootstraps one day. And likely, if that were to happen, it would happen with little to no warning. If we care about those future generations (and that is not a rhetorical question) then we should remain prepared at all times to leave the human race the best chance to be able to recover knowledge quickly. This involves every one of the many links in that chain that allows us to bring up Google, type in a term, hit “I'm feeling lucky” and read/play/render/hear the information we want.
The point of all this? I guess that this is potentially about more than whether we’ll be able to listen to the Eurythmics or view our family photos 15 years hence. This could potentially affect how long the human race remains trapped, suffering, in the next dark age. The human race is yet too shortsighted for us to dismiss that scenario out of hand.
You’re not the first person to think of that…
http://kk.org/kk/2008/08/very-longterm-backup.php
… as ‘inaccessible’ demonstrates a terrible misunderstanding of the workings of a black hole
This can be a problem in the short term, like 20-30 years, but not in the long term. If in 200-300 years some archaeologist wants to read any data from now (assuming, of course, the medium like a disc or CD survives), it will be no problem at all. It does not matter if the specification for the file is open or completely lost; the power of computers will be so great that they will “break” the code in minutes.
It is like the Egyptian hieroglyphs. Nobody had used that language for thousands of years, but we are able to read it now.
The 20th century will be a “dark age”. Right at the beginning of that century we changed the technology for making paper, and it is 100% certain that books printed in the 20th century (and now) will completely fall apart after 100-200 years.
Thanks to French soldiers finding a stone with engravings of the same text in two Egyptian scripts and one Greek…
You ever hear of the Rosetta Stone? The only reason we are able to read that language is because we found a common text, in languages we already knew, that helped us decipher it. Otherwise we still wouldn't be able to read it. Information loss is a real and long-existing problem. The Romans had running water and roads the likes of which didn't get built again until well past 1800.
You are totally off the mark. No language can be translated simply by raw processing power. This is why Navajo speakers were used for top-secret radio communications in WW2. The Welsh Guards also communicated by radio in Welsh in Bosnia because they knew that the Serbs couldn't understand Welsh.
The Egyptian hieroglyphs were only translated because the Rosetta Stone was discovered in the early 1800s. It carried exactly the same passage written in three different scripts (two of which were well known), including the hieroglyphs. It was then realised that the hieroglyphs were a written form of Coptic, a language still used in Egypt. This eventually made further translations possible.
I really do believe that the digital storage issue is going to be a big problem in years to come, especially for the majority of the computer- and digital-camera-owning public, who aren't as proactive as us.
The biggest problem I see is redundancy in storage. I mean, FFS: I backed up all 30GB of my photos from one PC to another over the weekend, and the source PC's hard drive decided to die while copying!
Luckily I mirror that particular directory on a *third* hard drive on another PC I have.
I see JPEG being able to be read many many many years from now…. Reading ancient formats won’t be a problem…. Getting the digital data to even LAST THAT LONG is where the issue will be.
IMHO the safest photo is one which has been printed to paper! By the time the oceans rise due to global warming, flood my garage and destroy my prized lifetime of photos – I probably would have had a few dozen hard drives die on me 🙂
I seriously doubt people who have taken digital photos and kept them by the normal means (on their PC) will have them in 50 years' time, unlike my parents and grandparents, who still have theirs.
Yep, I would love to see a hard drive or SSD consist of a collection of smaller parts that one could disassemble and replace without the need for a clean room and identical parts.
That's why I'm of two minds when it comes to using hard drives as archive media.
They take a whole lot of data with them if they die, and there is basically no chance of recovery except via an expensive specialist company.
With optical media there is more limited space, but you have the reader and the media itself separated, so if the reader dies you can just replace that without risk to the media itself.
And being as cheap as it is, having two copies of optical media is less of a problem than having two copies of a hard drive.
[quote]And being as cheap as it is, having two copies of optical media is less of a problem than having two copies of a hard drive.[/quote]
Yeah, but it also makes a difference whether you only need 2x 1 TB of hard drive space or a few hundred CDs/DVDs.
So in that case, I’d rather buy two 1 TB drives to save all my stuff twice instead of burning it to lots and lots of CDs which all last for a few years and then have to be copied again.
IMO, it's time to triage the data then. Find out what one really cares about and what's “junk” (program-related files and so on).
If one were to store that amount of data as physical paper, one would run out of living space quickly.
But as one can just slap it all onto a device the size of a large book and stuff it in the back of the closet, one no longer considers the value or waste of it.
You are pretty much spot on. Most people do not back up, or do not back up adequately.
Only 30GB? I probably have 30GB in RAW files alone; I shudder to think how much storage my 16-bit TIFF files and full-size, low-compression JPEG files take…lol!!!
Formats won't really be an issue, I think; it's going to be more along the lines of storage medium, and storage medium longevity.
Dave
I remember when we were told that CDs would last all our lives, and well, I rescued some of the first CDs I burned 10 years ago, and they had become almost transparent; obviously they couldn't be read and I had to throw them away. And I was a bit angry, because they contained some music I had downloaded via dialup that I wanted to listen to again.
I feel hard disks are much safer, but I don't know if they will last longer than 10 years or so.
Pressed CDs are supposed to last a hundred years or so, but of course they haven’t been around long enough for us to know their actual usable lifespan.
You are not up to date on this.
Now it's known that after 20 years the polycarbonate has degraded enough (microcracks and so on) that you have to rely on error correction to read even a disc kept in perfect condition.
And that's with the large structures on CDs.
So how long will a DVD last? 10 years?
A BD? 5 years?
We will see…
The problem I see with hard drives is that if the read heads or control electronics die, there is just no way to take the platters out, pop them into a new enclosure and power back up.
With an optical disc you do not have any dedicated read hardware; any optical drive that can read the format will do.
Now, the survivability of the discs themselves, that's a whole different story…
…punch cards.
One word: Termites.
Three words:
Carbon-fibre punch cards.
For text-based data I always keep in mind RTF files. I have some papers from 20 years ago that I can open in OpenOffice. I like XML files for data too, since you can always create a script to parse the XML elements regardless of whether a utility still exists. Binary data is where you can get in trouble. I have an old, old Photoshop version 1 file that I have trouble opening. Using open standards like PNG or ODF at least allows you to write an application to read your files. The MS Word 97 format is so convoluted and difficult to read that its only benefit is that Office is still around to read it.
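To illustrate the XML point: a minimal Python sketch that walks an XML file with nothing but the standard library. The element names are invented for illustration; the point is that no original application is needed.
[code]
import xml.etree.ElementTree as ET

# The element names below are invented for illustration; any XML file can
# be walked with a few lines like this, no original application required.
tree = ET.parse("old_notes.xml")
for entry in tree.getroot().iter("entry"):
    title = entry.findtext("title", default="(untitled)")
    body = entry.findtext("body", default="")
    print(title)
    print(body)
    print("-" * 40)
[/code]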
Three letters: ZFS
32 words (or 33 if you untangle the contraction):
And how is a file system supposed to solve ANY of the problems discussed…? ZFS doesn’t magically make old, proprietary, undocumented file formats readable, nor does it make worn out media readable.
If you store your data on multiple hard drives with a suitable redundancy scheme, the risk of losing data due to media corruption is dramatically reduced. One straightforward way of doing this is, I think, RAID-Z on ZFS. My comment was in response to worries about media corruption (as in termites eating punch cards). I was not addressing other concerns, like the use of proprietary formats and protocols.
6 words: Genetically engineered termites with titanium teeth.
http://it.wikipedia.org/wiki/PDF/A
Also, freeing the int4rw3b of some TBs of blog posts wouldn’t really make me cry
I think this is probably going to be solved in one way with the rise of cloud services, as more and more of our data will be stored remotely on larger servers.
However, this is a worry: is there any digital medium which can withstand the effects of ageing? HDDs are incredibly volatile for archive purposes, being affected by magnetic fields and by simple decay.
CDs also lose their readability after only a few years. This can of course be lengthened if the CDs are kept in a dark, stable environment.
And as mentioned in the article, there is the big problem of legacy software. Concepts such as Adobe's DNG (digital negative) are a great idea, but I've found very few manufacturers willing to adopt the format, thus leaving you to do a manual conversion. This of course leads us to raw formats, which have 1001 variations depending on manufacturer and camera. It is quite worrying.
So far the only formats I can see sticking around for a while longer are PDF, ODF, DOC/XLS/PPT (I've not included the new Office 2007 formats, as I still think these have to prove themselves in the workplace, so currently we are left with millions of MS Office document files which of course suffer compatibility problems between versions, i.e. tables misalign every now and again), JPG and BMP. Of course there are many others, but these are the first to come to mind.
I suppose we will just have to wait for a really reliable method of data storage and archiving. In the meantime I use the method of scattering data over a variety of devices and media. So my photos are stored on my main server; every time a set of photos is uploaded, I back them up to CD-R/DVD-R and then take the backup to a remote disk-based server.
There is no ‘the RAW format’. Raw formats are a byte-for-byte dump of how the chips in a camera record the image data, so of course the format will vary from chip to chip.
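To illustrate: if you do know the geometry, turning such a dump into an open format takes only a few lines. A minimal Python sketch with an assumed width, height and bit depth (real camera raw files also carry metadata, Bayer patterns, compression and so on, so this is only the idea, not a universal converter):
[code]
# A headerless raw dump is just pixels; you must already know the geometry.
# The width/height/bit depth here are assumptions for illustration only.
WIDTH, HEIGHT = 640, 480          # assumed sensor geometry

with open("capture.raw", "rb") as f:
    pixels = f.read(WIDTH * HEIGHT)  # assume 8 bits per pixel, grayscale

# Write the same pixels out as a PGM, a dead-simple open image format.
with open("capture.pgm", "wb") as out:
    out.write(f"P5\n{WIDTH} {HEIGHT}\n255\n".encode("ascii"))
    out.write(pixels)
[/code]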
I don't think that data will become inaccessible. Recently I bought a transfer cable which allows me to transfer data from my C64 to my PC. So I can still reuse everything that I created a long time ago on that machine. It may be in entirely different formats, but I can still use the old software to make it readable. And I could also write some little converter that converts that data into something modern.
As long as there are enough hobbyists who create cables or solutions like that, everything will be completely fine. There are converters, adapters, cables, emulators, and with that, everything should be recoverable.
And if it’s done soon enough, it even solves the problem of unreliable storage media.
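Such a converter can be tiny. A rough Python sketch for C64 text (SEQ) files; the mapping below is a simplification of shifted-mode PETSCII, good enough to make old text legible rather than a complete or authoritative table:
[code]
# Very rough PETSCII-to-ASCII converter for C64 text (SEQ) files.
# This is a simplification of shifted-mode PETSCII, not a full mapping.
def petscii_to_ascii(data: bytes) -> str:
    out = []
    for b in data:
        if 0x41 <= b <= 0x5A:            # PETSCII shifted: lowercase letters
            out.append(chr(b + 0x20))
        elif 0xC1 <= b <= 0xDA:          # PETSCII shifted: uppercase letters
            out.append(chr(b - 0x80))
        elif b == 0x0D:                  # carriage return -> newline
            out.append("\n")
        elif 0x20 <= b <= 0x3F:          # digits and punctuation match ASCII
            out.append(chr(b))
        else:
            out.append("?")              # graphics/control codes left unmapped
    return "".join(out)

with open("BOOKREPORT.SEQ", "rb") as f:
    print(petscii_to_ascii(f.read()))
[/code]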
The problem is active maintenance of data. Keeping data I consider important around isn't too hard; the problem is stuff I consider unimportant. If I chuck a couple of CDs with photos on them in a box and forget about them, will my grandchildren be able to view those photos in 70 years' time when they find that box in my attic, much like I can view photos I found of my grandparents taken when they were kids? Stuff that I don't consider worth backing up may be of great historic value to people a couple of generations down the line.
Well okay, that's an argument, and in that case it's of course the “unreliable media” problem. But if the CDs are still in good physical condition, it wouldn't be a problem to convert the data into the “JPG of the 2070s” format or something.
But I agree, in that case unreliable media would be a problem. Maybe, though, in 70 years, there might exist some good CD drives that are ten times better at recovering data (if it’s considered a common problem then).
By that I mean that even if devices are no longer available in working condition, or software that can natively handle some archive format isn't at hand, as long as the specifications and plans of those devices and the specifications of those formats are available, a new device can be built to support the old technology anytime in the future. And yes, you're right to sense an error in that sentence if you're thinking, OK, but how will we store those specs? I'd say we need something established to address that issue, but still, it's easier to store the specs for a machine than the machines, or to store the specs of a format than a proprietary app which handles the data.
…another ‘prophecy of doom’. Do you think Egyptians sat around going:
“Damn, you think this papyrus will last long?”
I have not had a problem in around 15 years because when new releases of things come up, you import into a new file format. Seriously, thus far from Aldus Freehand files to Adobe Illustrator CS…not a problem.
This is another case of the geek need to gather things, have manuals, books… blah blah… cling to that MS-DOS floppy for no reason.
If it doesn't fit in my laptop bag, I don't need it. My books are on PDF, music on MP3, and movies on AVI or MP4. Now I have to relocate for work and guess what… it will be soooo easy! Once a new “hot, geek-ass” file format comes along and becomes universal, I will convert and move on with the other, cooler aspects of life.
I think you are considering the issue too much from your personal point of view, although that is not what the subject is about. Did you read the article, or only the teaser? The article is not about people's personal information needs at home.
The problem is in big archives (like a national history archive) and in accessing historical resources for research after a few decades. More and more documents and archives are moved into electronic format. What will happen if those original materials are no longer usable after, say, 50 years? The article mentions several examples of this too.
These kinds of problems can be quite serious for future generations, the more we move information to electronic formats, and especially if those formats are proprietary and can be accessed only with some commercial software that may not even be available anymore.
Many societies deliberately planned tombs and temples to last for an eternity. They knew that texts carved in stone or rock paintings protected from the elements would remain intact for future generations. They anticipated that their societies would be ongoing for ever and wanted to maintain records.
Free software to the data rescue!
Kodachrome slide (and movie) film is extremely archival. Slides taken in the 1930s and 1940s look like they could have been taken yesterday, they’re in such good shape. Thus if someone has important digital photos, they could “write” them to the slides. The one thing, though, is that Kodachrome is made in extremely limited quantities these days and is only processed by one remaining lab, so the ability to save our photos may be limited.
NASA found out the hard way when it came to data from the Apollo missions.
They have tapes in a format that could no longer be read. Apparently a computing museum in Australia happens to have a tape drive that can read these tapes.
http://www.abc.net.au/news/stories/2008/11/10/2415393.htm
NASA lucked out this time, they may not be so fortunate in the future.
IIRC, there is a similar issue with old Stasi computer files. As the computers were Soviet-made, and the file formats likewise, nobody knows the correct encodings for the bits…
I have some 1994 Microsoft Word docs that are so old they won't open in Word 2003.
Would OpenOffice or AbiWord work?
Come on!
Such apocalyptic pronouncements simply suck! Filling our minds with FUD about progress, new technologies and science is an old “tradition” running since the Middle Ages!
The world will still turn, and all the information available to the masses will continue in a better, improved way!
What if a file format is no longer readable? I think emulators and virtual machines will keep the legacy stuff available…
What if my digital library gets corrupted? I think hard-printed books will always be available…
Come on! Did you even glimpse the actual article, or only its title? The article isn't painting any sort of apocalyptic scenario, not to mention opposing technical progress (where on earth did you get that idea??), but simply trying to find new smart technical solutions to a real existing problem. The article also mentions several real-life examples of the problem.
Oh, and the article isn't talking about your digital library, nor about your neighbours' digital libraries, but about archives and problems on a much bigger level, like national archives, university research etc.
Do you really think the things described in the article are actually problems?
Ok, probably I will not be able to find the e-mail I received ten years ago, but… who cares?
The important information is stored in huge databases, and that kind of information is what really matters… The article mentions some storage problems found in real life… but… what about the huge amount of information stored on YouTube, Google or Wikipedia? They store lots and lots and lots of information and they do not seem to have such problems.
And about legacy software… the VMs will have a lot of utility there.
Yes. Maybe not to you, but to many others, yes. If you worked at a big archive with lots of different kinds of electronic documents in various formats, maybe you would understand better.
Not now, but what if you wanted to see some important YouTube video, related to, say, Obama's election campaign, after a hundred years? Would it still be available anywhere? Could your software read it?
Wikipedia is just plain HTML with images. Thus it is based on open standards and data formats easily readable by almost any software. Google is mostly just a search engine, so not so relevant to the issues discussed here. YouTube uses Flash, a partly proprietary format, which causes a somewhat bigger problem, although many programs other than Adobe's own Flash tools can read Flash files too, which makes the problem smaller.
A VM itself cannot understand proprietary data formats. Besides, it should be easier for us common people to read older electronic documents too. Probably every one of us reading OSnews has had irritating problems trying to read some document saved in a different program, or a different version of a program, than what we have, making reading the file practically impossible.
Now, how many different old versions of Word / Works / WordPerfect should and could a big archive keep around, not to mention all the other dozens of different data formats and programs? If all that data had been saved using open document formats (like HTML) that other programs can read and support, it would help a lot.
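To illustrate why plain HTML ages so well: the text can be pulled back out with a few lines in nearly any language, with no original application involved. A minimal sketch using Python's built-in parser:
[code]
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the visible text, ignoring markup, scripts and styles."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

with open("saved_article.html", encoding="utf-8", errors="replace") as f:
    extractor = TextExtractor()
    extractor.feed(f.read())
print("\n".join(extractor.parts))
[/code]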
So, some kind of “open standard pdf-like file format” could be a partial solution to this problem?
How do you get a digital photo to last 500 years?
Put it on a chan.. it’ll remain in the cache of millions..for all time.
Something that isn’t directly related to this article, but is certainly related to digital archiving.
Printed pages are immutable. You cannot change them, and if you try, it is obvious that editing has been done. Can we, in good conscience, trust what will be a digital representation of things such as world history? How do we know that it will not be edited to suit the time in which it is being viewed? Editing electronic data can be done seamlessly and simply. This is what concerns me about digital archiving, rather than the unreliability of the media and/or file formats. Those have solutions. But the all-too-human temptation to create revisionist history has no solution to prevent it. In many ways we already have revisionist history, even with printed materials. It's just more effort. How easy will this be when everything's digital?
Never heard of forgery?
Of course I have heard of forgery. But forgery often takes a decent amount of effort, at least in situations where official documents have been watermarked or have other means of being positively identified. Further, forgery involves creating a look-alike sufficiently good to fool someone into thinking it is the original. Unless every copy of the original is destroyed, the forgery can eventually be revealed as such, or at the very least as questionable. It often involves a decent amount of effort (not as much as it used to, perhaps), but still a bit more than going into a document and saving your changes. This depends, of course, on what is being copied.
This is not the case for electronic data. One change in the right file, on the right server or disk, is all it can take to propagate the change to the entire set of mirrors, and to all backups from that point on; indeed, this is one of many reasons for regular backups. But not all backups are kept forever, and as has already been pointed out many times here, no electronic media will last forever, at least none we have currently.
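One partial mitigation is to publish or separately store cryptographic digests of archived documents: it doesn't stop revisionism, but any silent edit becomes detectable, as long as the digest list itself is kept somewhere trustworthy. A minimal sketch with Python's hashlib (the file name is just an example):
[code]
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file without loading it all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# At archiving time: record the digest in a separate, widely copied list.
print("state_papers_1963.pdf", sha256_of("state_papers_1963.pdf"))

# Decades later: recompute and compare. Any silent edit to the file changes
# the digest, so tampering is at least detectable, if not preventable.
[/code]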
SIMH, while an excellent project (http://simh.trailing-edge.com/), reminds me of the question: who will know how to OPERATE these ancient things…?
Does the world care about TOPS-20, how to log onto it, or anything related to it? I mean, sure, it was one of the first systems on the internet, but are such things preserved?
32V and 4BSD, while critical and important OSes, have only been able to run under emulation in the last few years… And sure, there is information available *now*, but what about 10 years from now? 100? It's a shame that more isn't done now to save and preserve the old stuff. It's all a mess, much like OS/2 and AmigaDOS. The lawyers have made certain that those will never be released, just lost to the winds of time.
An article like this has been posted on OSnews before. I cannot find it right now, but it should be around here somewhere.
But let's try it. Write a document in MS Word 2.0 and try to open it with a modern word processor. I suppose this is already happening. Data becomes unusable. And in 100 years scientists will be analysing data bit by bit, trying to figure out what it meant.
Not only the data, but also the data carrier. Remember the Commodore 64 and the way it formatted its floppy disks: incompatible with PC floppy disk controllers. Some newer machines don't even have such a controller anymore.
Welcome to the new dark age. The present won’t exist in the future.
It should be a legal requirement that all data is stored in fully documented open formats. All formats should also be freely available as ISO or IEEE Standards. There should be no exceptions allowed at all.
All existing formats and software code including documentation should also pass into the public domain 20 years after being released. This can be achieved by all software being made available at the time of release to an archive such as the Library of Congress. The software (including all source code and documentation) would then be automatically made public domain either after 20 years or immediately when official support ceases.
In Australia a copy of all physical publications (books, magazines, videos, newspapers etc) must be provided free of charge to the National Library of Australia for archival purposes.
Arg, back again with the stupidity…
First and foremost the loss of data accessibility will never occur without something very catastrophic occurring.
You see, for every file format ever devised, closed or open, that is “extinct”, there is some non-legacy piece of software to open or convert that data to something usable today, so long as the format holds data important enough to care about.
For instance, what was the first image file format?
Well, scouring the web I can't find that one out, so I don't know, but it is probably TGA or RAW (which TGA basically is).
Viewing that data is simple today: even if a program doesn't support opening those file types directly, they hold approximately the data programs keep in memory (the raw pixel data) to display images.
Text is stored in ASCII, so that should never be lost. Most file formats will never die so long as someone needs the data contained therein and hasn't yet converted it.
So, let us journey into the worst-case situation, where a file in a closed format that has been extinct for 50 years is discovered in the old digital archives, containing some data you need (or want to investigate).
Now, you have to consider what may or may not be in the file; if you know, your job is a lot easier. If you don't, then you will have more trouble.
If you don't know what is in the file (the type of data), then you need to look for clues. You need to check for all known compression techniques, look for the tell-tale signs of encryption algorithms, and look for any standard markers in the file.
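The “standard markers” part is already routine today: many common formats announce themselves with a fixed signature in the first few bytes. A minimal Python sketch; the list below is only a tiny sample of the markers a real identification tool checks:
[code]
# A few well-known file signatures ("magic numbers"); this is only a tiny
# sample of the markers a real identification tool would check.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff",      "JPEG image"),
    (b"%PDF",              "PDF document"),
    (b"PK\x03\x04",        "ZIP container (also ODF, OOXML, JAR...)"),
    (b"\x1f\x8b",          "gzip-compressed data"),
    (b"GIF87a",            "GIF image (87a)"),
    (b"GIF89a",            "GIF image (89a)"),
]

def identify(path):
    with open(path, "rb") as f:
        header = f.read(16)
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown (no recognised marker in the first bytes)"

print(identify("mysterious_archive_file"))
[/code]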
Now, this being 50 years in the future, the machine working on a solution can test thousands of solutions at once, following various possible solution paths (quantum CPUs, AI algorithms). There are a finite number of possible tests and solutions, regardless of the file format or how much we know about it.
So, all you do is grab the data and tell the computer to find out what it is and how to read it, and in 15 ms or less, you have the data. Possibly 60 ms for very complex files with large amounts of data; I think I can tolerate the wait 😉 Gotta love them qubits!
Of course, someone will need to write the software, but by this time we will not be writing programs the way we are today, by any means. In fact, the software will begin to write itself from just a few instructions as to what you want it to do; this is the future role of the operating system. The best AI wins.
Custom software will very much remain, due to market forces and technical reasons; it will just evolve into problem declarations with some of the possible solution paths plotted, allowing the AI to find new paths and optimize everything until the optimized version becomes the “software” to execute. (You should already know how qubits work and what they are; if not, what are you doing here? Google it!)
Of course, that is the worst-case scenario. Here is the likely scenario:
Most data not needed after a jump in format will likely never be needed again, and will likely hold no value in 50 years. No loss if the data is lost.
Most data formats that have enough value that it should be preserved will be migrated into the future formats by those who care to keep their data in that format, otherwise they will just upgrade the data to a newer format ( which plenty of software does automatically today already ).
If a file format is lost completely, and you're very interested in the file named “mysecrets.???”, then all you need to do is what I already mentioned. There is already software out there which will check files to see what kind of data they may contain; this will evolve, naturally.
Of course, there are formats out there that make very little sense. But if the data is needed or desired strongly enough, there will be a way to get that data. Period.
After all, in 10 years the 256-bit encryption algorithms which are currently unbreakable by any computer in existence will be easy to decipher, should one try. 10 years should mark the actual deployment of truly usable quantum processors, if it doesn't happen sooner. Self-assembling computers are on the way. Can you even imagine taking a block the size of a deck of cards, putting it into a “microwave” and pulling out a laptop a couple of hours (or less) later?
This is becoming reality fast, people!
And yet, somehow, people still think that we will care about the file formats of yore (or think of computers/advanced tech as anything other than just extensions of ourselves), or that, somehow, 50 years of progress will make it impossible to decode data if you don't know what the file format is; we can do that today with enough processing power. All data is organized within rather simple paradigms, because it is human-created and must be somewhat comprehensible. Even the most advanced encryption routines will be completely crackable.
We will eventually hit a point in our progress where, even if the most advanced encryption routine available today were started on the world's fastest supercomputer, encrypting an 8GB HD movie and then encrypting the output in an endless cycle, growing the file size beyond the petabyte range, the then-current hardware of the future would be able to undo the job in a reasonable amount of time, such as overnight.
So why all the worry?
–The loon
Much of your proposal is still sci-fi. We need reliable methods to preserve data now and cannot count only on some expensive and maybe time-consuming future technology. What reliable technology do we have now to help solve the situation?
Did you also consider the problem of corrupting media? Hard drives, CDs, DVDs etc. can also get corrupted, while paper often does not.
In an archive, customers expect to get their information quickly. A reply that a supercomputer and AI might be able to help them a few years from now will not make them happy.
Anyone who archives data on corruptible media deserves their data loss.
Otherwise, data stored in archives will generally be migrated to new storage devices as the systems receive upgrades, provided that data is important.
The only data to be lost is data that no longer serves any purpose – so where is the disaster?
Yes, CDs/VHS/Beta/8-Tracks etc all degrade, but it is still possible today to take an 8-track and convert it to a digital format.
The only things lost are those which are not worth the investment to recover.
Indeed, even the oldest formats can be read today by one means or another, which isn’t to say everyone is capable of doing it.
BTW, quantum computing isn't sci-fi, nor is quantum storage or information transmission. Sure, it is mostly done within labs, but there are commercial products already available, and more being developed. Ten years will see some of this entering the highest-end markets.
Now, as far as keeping data today, you have to keep backups & keep them up to date. Data lost to stupidity is NOT due to format issues, it is due to stupidity – which the debate is not about.
So, if any data is lost, it is no real loss. Seriously, find me one extinct format that holds valuable data which cannot be, in some manner, recovered today. Of course, this doesn’t address degrading mediums. But if that data hasn’t been needed in so long that it has degraded on its medium, then the data probably has no real value.
If the data has value, and is being stored on a degradable medium it is the lack of intelligence that causes the data loss, not the medium on which it is stored, and not the format. If the data is degrading, and humans know it is degrading, and no one is willing to invest the effort to recover the data, then the data mustn’t be worth anything – so no loss.
The only thing that would cause the problems in the debate would be a global nuclear war – and then we have larger problems.
Mind you, I’m no stranger to losing irreplaceable – and valuable – data. And I still haven’t learned my lesson, I have AT LEAST 250GB of irreplaceable data that is in no way backed up. Losing it would be the result of my own stupidity, nothing more.
It is up to me to take care of my data, after all, as it is with everyone.
Indeed, if we had a universal storage cloud, the problem scenario would be of concern, but we don't. We will lose small bits of useless data; even if that data once had value, it no longer does. If it did, the storage medium would be sent to a data recovery center to retain that value (something I couldn't afford).
–The loon
Have you visited a big national library or archive? The person interviewed in the article is talking about such environments: endless shelves containing not only books and other paper documents but also photos, films, recordings, tapes, CDs, CD-ROMs, DVDs, data saved in various kinds of databases, etc.
Do you have any clue how much money and how many man-hours it would cost to convert all that stuff located at a big national library into newer formats? Besides, archives and libraries also aim at preserving stuff in its original format if possible. The original media are important too, not only the content.
It would therefore be ideal if, from the start, when the information (like a movie) is first saved to some medium, the formats and media used were as durable and open as possible, in order to allow future generations to use those same documents and media as easily as possible.
If the desire is to preserve the original format in addition to the data, then that work should have started a long time ago.
Otherwise, a simple plan for data safety can be pursued.
Items must be categorized by the vulnerability of their storage medium versus their value. Priorities are made.
Then, a list of requirements for safer, more future-proof and resilient archiving must be generated. What formats must be read, how would those formats be transferred, and finally how to maintain data integrity during the transfer. At this point, you would likely have the equipment needed to make an original-medium duplicate, then work on protecting and preserving the original.
The originals will be lost, little doubt about that.
There will be data loss, and already has been. Oh well, “we ain’t perfect” 🙂
However, none of that changes the fact that most data of significant value has been migrated from one medium to the next with little to no degradation.
Sure, we will lose (and have lost) some data which holds (mostly sentimental) value, such as the earliest movies and photographs. Paintings of such glory that they change the onlooker, music so horrid it makes you commit suicide on the third measure, and so on.
I'm not saddened by this, and I do not consider it dire. It could present problems in 100+ years for historians when they are trying to find out who really killed JFK, and what was all this UFO crap? But I suspect the possibly up-and-coming nuclear holocaust could be even more devastating.
Indeed, a good solar flare could wipe out all magnetically stored data on the planet in a single swipe. That is a more grave concern – protecting what we have now. Solid state storage will go a long way to helping, but there seem to be limits with that paradigm which can only be overcome with quantum storage devices.
So the next step in storage is solid state (MLC SSD), which will undergo revision upon revision. But we have already made inroads into quantum storage, and that will ultimately take the place of everything.
Indeed, we can already send data from one point to another without interacting with a single point in between. After all, it is actually possible to be in two (or more) places at once. Data will be stored as a universe state, within the fabric of the universe itself.
Then we have to figure out what will destroy the data then. What will cause those states to change without us saying so?
D*mn, time goes too slow.
–The loon
But your personal data and personal needs at home have not much if anything to do with the article subject here. The article is not really about your or my personal data saved at our homes. We may very well live happily – or maybe even happier – even after our kid destroys our precious Britney Spears MP3 collection while playing Quake on our PC.
The article is really about big archives and libraries on a national, and even bigger, level and scale, and about the interests of those institutions and their customers. Such institutions couldn't afford to lose access to a lot of their documents from recent decades just because of media corruption and/or old, locked, proprietary data formats that are not supported anymore. Converting all their stuff into other data formats is usually out of the question too; and actually it might even be illegal if we are talking about locked and encrypted commercial multimedia formats.
They, at national archives and libraries, at least have to carefully consider in advance what they can do to prevent possible problems related to corrupting media and proprietary data formats not supported any more. That is the whole point of the article.
I include those interests in my assessment.
Fact is, those archives will be brought onto modern media as storage requirements increase. The only problem is when all the eggs are in one basket, and that one basket is destroyed.
–The loon
As someone who has 24 years of documents spread across a half dozen OSes over the years, it's too expensive to keep converting data to newer formats. ODF is a lifeline in this case. But as far back as 1992 I knew this would be a problem, and I have since either written or saved every document in simple text format.
Photo formats I don’t bother with. But as others have said, secure your own work by using ONLY open formats built with Open Standards.
In yesterday's 'Australian' newspaper there was an article on recovering data from some old Apollo 11 mission data tapes. It requires a special tape reader, of which only one still exists. The tape reader needs to be rebuilt before transcription can occur. This data is only 39 years old.
Here’s a perfect example of this:
http://www.abc.net.au/news/stories/2008/11/10/2415393.htm
Well, it’s one of the reasons.
(But I do use patent-infected software. All the time. I wouldn’t be able to do what I do without them. Sad but true.)
“McDonough cites Brazil, the Netherlands and Norway as examples of countries that have mandated the use of non-proprietary file formats for government business.”
Our government has passed a law. However… the only place I've ever seen an OpenOffice file in use was in a private school. (But I guess change takes time…)