Since I started working on Twisted Storage way back in 2003, I’ve spent a lot of time thinking about how we use storage systems. Today we still use the same concepts from 40 or 50 years ago, when computers cost millions of dollars, were scarce resources and were surrounded by the “Priests of High Tech”. The economics are now completely different, and yet we haven’t changed our thinking very much.
When disks were big and had small capacities and computers were slow, it wasn’t a surprise to find file systems that recorded just the minimal amount of information: the file name and little else. Today we have disk drives that top out at 2TB and cost around $200, so I wonder why we still have the same file systems. We store hundreds of times more than we did in the past, and finding things is becoming the real challenge.
You can see things creeping into “file system add-ons” like the open source Gnome Beagle project or the KDE Akonadi software. But I wonder why we need to have them separated. Wouldn’t something with tighter integration work better? After all, if indexing were built into the system, then when I moved a file from one “directory” to another the references would be updated automatically.
That gets me to another gripe. Directories are so 1960s! They were fine when disks were small and we wanted to stash stuff away and isolate it from other goodies. Now they serve very little purpose (almost). What I really want to do is search all my files and create a virtual directory with just those objects. I can just imagine not needing an email client: all my emails are sent to a smart storage device where they are indexed and cross-referenced. Then I tell the operating system to create a “virtual directory” of all the emails dated today. Now my client is nothing more than a file system browser, and when I double-click one of the files, up pops a simple little notepad-like program (which happens to know how to send reply emails)!
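To make the “virtual directory” idea concrete, here is a minimal sketch in Python. It is not how Twisted Storage or any real file system works: the `SmartStore` class, its `put` and `virtual_directory` methods, and the sample file names and dates are all hypothetical, made up for illustration. It assumes only that the storage device records metadata for each object as it arrives and can answer simple equality queries over it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class StoredObject:
    name: str
    content: str
    metadata: dict = field(default_factory=dict)


class SmartStore:
    """Hypothetical storage device that indexes objects as they arrive."""

    def __init__(self):
        self._objects = []

    def put(self, name, content, **metadata):
        # Keep the metadata alongside the object so it can be queried later.
        obj = StoredObject(name, content, dict(metadata))
        self._objects.append(obj)
        return obj

    def virtual_directory(self, **criteria):
        """Return a name -> object view of everything matching all criteria."""
        return {
            obj.name: obj
            for obj in self._objects
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        }


# Sample data (invented): two emails and one document.
store = SmartStore()
store.put("hello.eml", "Lunch at noon?", kind="email", dated=date(2009, 6, 1))
store.put("old.eml", "Last week's minutes", kind="email", dated=date(2009, 5, 25))
store.put("notes.txt", "Shopping list", kind="document", dated=date(2009, 6, 1))

# "All the emails dated today", with 2009-06-01 standing in for today.
todays_mail = store.virtual_directory(kind="email", dated=date(2009, 6, 1))
print(sorted(todays_mail))  # ['hello.eml']
```

The point of the sketch is that the “directory” is just the result of a query. Nothing is copied or moved, so the same email can appear in as many virtual directories as there are questions to ask about it.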
This gets me to the issue of backup and archive. When disks, tapes and computers were expensive, it made sense to have a “backup” program that took files, lumped them together and wrote them to tape. That tape was unmounted and carried to a tape library. Today we are only slightly better: there is no person dismounting the tape and carrying it somewhere. And if you factor in virtual tape systems, one has to wonder why we still use 1950s technology and solutions in the 21st century!
When we write programs we think of individual data records: a record is the input that triggers some effect, and the effect is to change another record. So why do we need to aggregate all those little records into one big table and write it out? When it was necessary to get the most out of all those expensive systems, that was the best way to do it. Nowadays I think of archiving individual items and never batching up all sorts of data to back it up. It just makes more sense.
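As a rough illustration of archiving item by item instead of in batches, the sketch below keeps every version of a record the moment it changes. It is only a sketch under my own assumptions, not a description of Twisted Storage: the `ItemArchive` class, the `account-42` key and the use of a SHA-256 digest to identify each version are all hypothetical choices.

```python
import hashlib
import json


class ItemArchive:
    """Hypothetical archive that stores every record version individually."""

    def __init__(self):
        self._versions = {}  # digest -> serialized record
        self._history = {}   # record key -> list of digests, oldest first

    def archive(self, key, record):
        # Serialize deterministically so identical records get the same digest.
        data = json.dumps(record, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(data).hexdigest()
        self._versions[digest] = data
        history = self._history.setdefault(key, [])
        # Only record a new version when the content actually changed.
        if not history or history[-1] != digest:
            history.append(digest)
        return digest

    def versions(self, key):
        return [json.loads(self._versions[d]) for d in self._history.get(key, [])]


archive = ItemArchive()
archive.archive("account-42", {"balance": 100})
archive.archive("account-42", {"balance": 75})
archive.archive("account-42", {"balance": 75})  # unchanged, no new version
print(archive.versions("account-42"))  # [{'balance': 100}, {'balance': 75}]
```

There is no nightly job here and no big table to write out. Each change is archived when it happens, and how long a version is kept becomes a policy decision about that one item.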
The current meaning of backup, as far as I can tell, is something you keep for at most a few years. Anything longer is an archive. But that seems totally arbitrary. The word backup should be scrubbed from memory as the 1950s technology it is. Instead, archive should be the word used today: any bit of data kept for a day, a week, a month, seven years or one hundred years.
Now, if you have read all this and understood it, you should be able to see why Twisted Storage has some of the features it has today. I don’t believe in backup technology. I think storage systems are more than just a place to dump bits.