File systems, backup and archives

November 11th, 2009

Since I started working on Twisted Storage way back in 2003 I’ve been spending a lot of time thinking about how we use storage systems. Today we still use the same concepts from 40 or 50 years ago, when computers cost millions of dollars, were scarce resources and were surrounded by the “Priests of High Tech”. The economics are now completely different, and yet we haven’t changed our thinking very much.

When disks were big and had small capacities and computers were slow, it wasn’t a surprise to find file systems that recorded just the minimal amount of information – the file name and little else. Today we have disk drives that top out at 2TB and cost around $200, so I wonder why we still have the same file systems. We store hundreds of times more than we did in the past, and finding things is becoming the real challenge.

You can see things creeping in as “file system add-ons” like the open source Gnome Beagle project or the KDE Akonadi software. But I wonder why we need to have them separated. Wouldn’t something with tighter integration work better? After all, if it were built into the system, then when I moved a file from one “directory” to another the references would be automatically updated.

That gets me to another gripe. Directories are so 1960s! They were fine when disks were small and we wanted to stash stuff away and isolate it from other goodies. Now they serve very little purpose (almost). What I really want to do is search all my files and create a virtual directory with just those objects. I can imagine not needing an email client: all my emails are sent to a smart storage device where they are indexed and cross-referenced. Then I tell the operating system to create a “virtual directory” of all the emails dated today. Now my client is nothing more than a file system browser, and when I double click one of the files up pops a simple little notepad-like program (which happens to know how to send reply emails)!
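To make the “virtual directory” idea concrete, here is a minimal sketch in Python. It is not how any real file system or Twisted Storage works; the `StoredObject` record, its fields and the `virtual_directory` function are all hypothetical names made up for illustration. The point is only that a directory can be the result of a query over indexed metadata rather than a fixed place on disk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical metadata record the storage device keeps for each object."""
    name: str
    kind: str        # e.g. "email", "spreadsheet"
    stored_on: date  # the date the object arrived


def virtual_directory(objects, predicate):
    """Return the sorted names of every object that matches the query."""
    return sorted(obj.name for obj in objects if predicate(obj))


# A tiny in-memory stand-in for the indexed store.
store = [
    StoredObject("msg-001.eml", "email", date(2009, 11, 11)),
    StoredObject("msg-002.eml", "email", date(2009, 11, 10)),
    StoredObject("budget.xls", "spreadsheet", date(2009, 11, 11)),
]

# "All the emails dated today" becomes a query, not a folder.
today = date(2009, 11, 11)
todays_mail = virtual_directory(
    store, lambda o: o.kind == "email" and o.stored_on == today
)
print(todays_mail)
```

Nothing is moved or copied to build the listing, so the same object can show up in as many virtual directories as there are queries that match it.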

This gets me to the issue of backup and archive. When disks, tapes and computers were expensive, it made sense to have a “backup” program that took files, lumped them together and wrote them to tape. That tape was unmounted and carried to a tape library. Today we are only slightly better off: there is no person dismounting the tape and carrying it somewhere. And if you factor in virtual tape systems, one has to wonder why we still use 1950s technology and solutions in the 21st century!

When we write programs we think of individual data records – it is the input that triggers some effect. And the effect is to change another record. So why do we need to aggregate all those little records into one big table and write it out? When it was necessary to get the most out of all those expensive systems, it was the best way to do it. Nowadays I think of archiving individual items and never batching up all sorts of data to back it up. It just makes more sense.

The current meaning of backup, as far as I can tell, is something you keep for at most a few years. Anything longer is an archive. But that seems totally arbitrary. The word backup should be scrubbed from memory as the 1950s technology it is. Instead, archive should be the word used today: any bit of data kept for a day, a week, a month, seven years or one hundred years.

Now if you read all this and understood you should be able to see why Twisted Storage has some of the features it has today. I don’t believe in backup technology. I think storage systems are more than just a place to dump bits.

Social Responsibility

October 28th, 2009

While Twisted Storage will focus on “storage for the 21st century enterprise”  it will never forget that behind every great idea or product are the people that make up Twisted Storage or are our customers. I have always believed in the “Golden Rule”:  one has a right to just treatment, and a responsibility to ensure justice for others.  It has been around since the time of ancient Egypt.  Today it is becoming an anachronism.

Companies treat employees like cogs – to be used and disposed of just like any other waste product.  Customers are collected and treated like milking cows – to be milked for their money.  Even within the corporate environment the rule was tossed aside: the top management made sure they were given big salaries and bonuses while their companies tumbled and their employees suffered layoffs. The companies exist solely to advance their cause of making money and adding very little else to the community of humanity.

Twisted Storage has decided that one of the foundations of the company is to make the Golden Rule core to the operation of the company.   It will affect every aspect of the company  from the products it develops to the companies that become our customers and to the people that make Twisted Storage its work place.

If you are a customer, we will stand behind what we build and deliver. We won’t point fingers, we won’t shrug our shoulders and we won’t make excuses. We will work with you to decide what the problem is and how to solve it, and then do it! If you are ever unsatisfied you can always contact me. We want you to be successful, knowing that if you are, we will be too.

If you are applying for a job, a real person will contact you and become your guide as you navigate the process.  While I won’t guarantee you will be hired I will tell you that if you are ever dissatisfied with the process, I want to hear about it personally!

I’m sure you’ve heard the stories about the companies that fire the bottom 5 or 10% of their employees. If you haven’t, the story goes like this: you are hired by a company and if you fall in the bottom 5 or 10% you are fired. This takes place every so often. Personally I don’t know what idiot came up with that scheme, but it is so stupid it leaves me speechless. What I will tell you is that if you are an employee we will work with you to make you useful and productive.

Because we are members of the community of humans that occupy this planet, we can’t escape our other responsibilities to those around us who aren’t directly in Twisted Storage or don’t use our products. We pledge to take 10% of every sale and invest that money in people around us. 5% of the money will go to food pantries and housing projects. The other 5% will go to environmental and other non-profits and open source projects. If you are a customer, let us know which charities interest you and we will make sure they are added to our list.

I’m sure by now you are getting the idea that we aren’t just about technology. We are a company of people and a member of the community of humans. We occupy this earth, and it is the only one we know about. So we want to make sure it is well tended; after all we are just the guardians of our children’s legacy.

The Dawning of a New Day …

October 24th, 2009

In 2003 I released a piece of software I called LTS, which today is called “Cloud Optimized Storage”. Originally I was trying to implement the functionality of EMC’s Centera, which I did, but I went way past it. I realized that I wanted finer control over where I wanted the data to be stored. LTS was part of an email archiving system, and it held some 2 million email messages at one time!

I also wanted to control how an object was named and how it was protected. After all, I thought, why in the heck would I want to take a hash of an object, use it as the name and store it? It made no sense as a way to protect a document! Didn’t we all learn in college that you don’t want to take one thing and make it do something else? So to me a name was just a name, and if I wanted to protect an object I ought to protect it.
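A small Python sketch of that separation follows. It is an illustration only, not LTS code: the `ObjectStore` class and its methods are hypothetical, and using a SHA-256 checksum as the protection mechanism is an assumption made for the example. What it shows is that the caller picks the name freely, while integrity information is stored next to the object and never doubles as its name.

```python
import hashlib


class ObjectStore:
    """Hypothetical store where naming and protection are separate concerns."""

    def __init__(self):
        # name -> (data, checksum); the checksum is never used as the key.
        self._objects = {}

    def put(self, name, data):
        """Store bytes under a caller-chosen name with a separate checksum."""
        checksum = hashlib.sha256(data).hexdigest()
        self._objects[name] = (data, checksum)

    def rename(self, old_name, new_name):
        """Change the name only; the data and its checksum are untouched."""
        self._objects[new_name] = self._objects.pop(old_name)

    def verify(self, name):
        """Recompute the checksum and compare it with the stored one."""
        data, checksum = self._objects[name]
        return hashlib.sha256(data).hexdigest() == checksum


store = ObjectStore()
store.put("contracts/acme-2003.pdf", b"signed contract bytes")
store.rename("contracts/acme-2003.pdf", "archive/acme.pdf")
print(store.verify("archive/acme.pdf"))
```

Because the two concerns are independent, the protection scheme could be swapped for something stronger without renaming a single object.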

Having been in the network world since before the advent of TCP/IP (now who doesn’t remember UUCP?) I decided that I ought to be able to talk to this device from anywhere in the world, and if I could put a front end on LTS I could spread them out everywhere. Hence LTS started to look like what EMC Atmos is today, and it was done in 2003!

Sure I stumbled on some things. Coming from a telecom space I thought Linux HA was great (after all I used it in the design and implementation of Vonage’s first voicemail system). Only when I started to do the numbers – it would take 500 servers, each with 4 disk drives of 500 GB each to store 1 petabyte – did I realize that conventional HA wouldn’t work. So I needed to find some other approach.

I also started to realize that all those 2000 disk drives would act differently than the one I had in my house. After all, 2000 of anything is a large data set and something will always be failing. Even 500 servers is a large number, and one or more of them would fail. So I needed to deal with that issue too.
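The numbers are easy to check with a few lines of Python. The server count, drives per server and drive size come from the text above. The annual failure rate is purely an assumption picked for illustration (3% per drive per year); it is not a measured figure from any Twisted Storage deployment.

```python
# Figures from the text.
SERVERS = 500
DRIVES_PER_SERVER = 4
DRIVE_GB = 500

# ASSUMPTION: an illustrative 3% annual failure rate per drive.
ASSUMED_ANNUAL_FAILURE_RATE = 0.03

drives = SERVERS * DRIVES_PER_SERVER

# Decimal units: 1 PB = 1,000,000 GB.
raw_capacity_pb = drives * DRIVE_GB / 1_000_000

# Expected number of drive failures across the whole fleet in one year.
expected_failures_per_year = drives * ASSUMED_ANNUAL_FAILURE_RATE

# Average number of days between one drive failure and the next.
days_between_failures = 365 / expected_failures_per_year

print(f"{drives} drives, {raw_capacity_pb:.1f} PB raw")
print(f"about {expected_failures_per_year:.0f} failures per year, "
      f"one every {days_between_failures:.1f} days")
```

Under that assumed rate the fleet loses a drive roughly every six days, which is why a failure has to be treated as a routine event rather than an emergency.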

Since I had spent so much time in my career dealing with network and server management (I was part of the IETF that worked on specifying SNMP and part of the Compaq team to put CPQ servers in the enterprise), I started to realize that configuring those 500 servers or 2000 drives was going to be a challenge, as was watching them operate.

All this led me to the 2005/2006 release of Twisted Storage. Well, that and I wanted to do it in Python. Since that initial release I decided that I didn’t quite hit the mark on some of the things I wanted in the system. And I have been working to solve them ever since.

Today I’ve started to release the newest version of Twisted Storage. It is a complete redesign of the 2005 release (in fact there was one between 2005 and today that didn’t see the light of day!). This is the fourth version of the system! It is as different as the 2005 version was from LTS.

Version 4 is based on some very different design goals, and over the next few weeks I will start to discuss them. Here is the list:

  • Content always available
  • No backup, recovery or restore
  • Incremental, linear scalability
  • Policy driven storage
  • Work flow enabled processing
  • Tune-able “knobs” to trade-off durability and performance
  • No special hardware; support old hardware
  • Minimal administrative overhead
  • Totally distributed, loosely coupled, asynchronous design
  • Support external, legacy storage systems and linkage

Today I think I got darn close to meeting all those objectives. I’ve left some of the harder ones off the list. After all, you have to have something to work toward. I’ve always wanted a system that could take common sense reasoning and fix itself (doesn’t that sound very HAL-like?). In that same vein I want the system to be able to dynamically reconfigure itself – not just take nodes and storage out of use but actually change how the program operates! Yes, I know, really out there, but I love challenges.

Over the next month I will be releasing each of the new components of Twisted Storage. Yes, it is slow but I want to make sure there is documentation in place (I hate projects that assume you are going to read code or look at generated lists of calls! – That is not the way to do it!).

The first part has been released and I call it TSnosql. It is a Python implementation of a rather sophisticated key-value system. Check it out and look at the documentation. It is the heart of Twisted Storage’s management configuration system.

Welcome

October 11th, 2009

This blog, while tied into the Twisted Storage project, is not only about Twisted Storage. Instead it is meant to cover the whole area of enterprise storage software.  I hope this site covers the space well.

I’ve been in this business a long time. What I have noticed over the years is that the computer business was once a really diverse group of people. You would find ex-astronomers. You would find ex-business admin types. I once even ran into an ex-roofer. The point is that this diverse group would see different solutions to problems. We would all come at it from our own perspective. Why is this important? Because if the prevailing thought is that only data storage people know data storage, we will have more of the same old thing. Sometimes it just takes that different view to see an opportunity.

But enough of my “rants”. From now on I will talk about storage solutions, the storage marketplace and trends I see in the market. Hopefully you will find this all useful. I don’t believe any one person can do it all so I invite you to send me information you might find interesting about storage.

In the meantime, all the best.