Do you ever think about how much your data costs? I wonder how many DBAs and software architects consider the hidden costs of data when cool trends such as big data or fun phrases such as "disk is cheap" are bandied about.
I've been in the database field for more than 20 years, and for most of that time the projects I worked on didn't place a great premium on reducing the amount of data an application stored. In some ways, the amount of storage a company could afford in a typical server set a natural limit on how much data it kept. I think many designs accounted for that limit implicitly, without anyone thinking about it very deliberately during the design period. But today, when you can buy a laptop from Costco with a 1TB drive for much less than $1,000, I wonder how many companies are thinking about big data when perhaps they don't really need to.
Evaluating Disk Space Needs
Before I go much further, I'd like to say, "Please hold the hate mail." I'm not a Luddite who doesn't understand the importance of data to a business. I get that modern business-oriented computing architectures are becoming increasingly data-centric. I love that. But I do wonder if we might be entering a period in which some businesses are storing more data than their business needs dictate.
I've recently been involved with a system that has a total data size in the 1.5TB range. We recently realized that close to two-thirds of that data is seldom used. I'd say it's never used, except that I try to avoid words like "never" and "always." But the system's design means that it must support very high I/O rates against the full 1.5TB. I don't want to go into specifics, but achieving this hasn't been cheap in terms of hardware and services. I can't quite put my finger on it, but I'm just not sure this would have happened in quite the same way five or more years ago, when 1.5TB was a lot of disk space for a small company.
Overbuilt Database Systems
Tom LaRock, a SQL Server MVP, has been writing a great series on SQL Azure over the past few months (follow LaRock on Twitter: @SQLRockStar), and several of his blog posts on Azure pricing piqued my interest. He showed how you could figure out how much the data in your SQL Azure database was costing you per megabyte. I don't think most DBAs of our generation are trained to think this way. Embarrassingly, I certainly haven't thought this way much over the years. These days I'm probably more focused on running a business than on hard-core technology issues, and perhaps that's why LaRock's blog struck my fancy. As I was reading his posts, it occurred to me that I would have made many different decisions over the years if I had been spending my own money when I was architecting systems, rather than accepting the grossly oversimplified mantras of "disk is cheap" and "memory is cheap."
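The cost-per-megabyte idea boils down to simple division. The Python sketch below is not LaRock's method, only a rough illustration of the arithmetic under stated assumptions: the monthly price is a hypothetical placeholder (not actual SQL Azure pricing), sizes use binary units (1TB = 1,048,576MB), and the helper names `cost_per_mb` and `cost_of_cold_data` are made up for this example. It also estimates what the seldom-used two-thirds of a 1.5TB system costs, assuming cost scales linearly with stored size, which real pricing tiers may not do.

```python
# Hypothetical sketch of "cost per megabyte" arithmetic.
# The price used below is a HYPOTHETICAL placeholder, not real SQL Azure pricing.

MB_PER_TB = 1024 * 1024  # binary units: 1 TB = 1,048,576 MB


def cost_per_mb(monthly_cost: float, size_mb: float) -> float:
    """Return the monthly cost of each stored megabyte."""
    if size_mb <= 0:
        raise ValueError("size_mb must be positive")
    return monthly_cost / size_mb


def cost_of_cold_data(monthly_cost: float, cold_fraction: float) -> float:
    """Return the part of the monthly bill attributable to seldom-used data.

    Assumes (simplistically) that cost scales linearly with stored size.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    return monthly_cost * cold_fraction


if __name__ == "__main__":
    hypothetical_monthly_cost = 3000.0  # HYPOTHETICAL dollars per month
    size_mb = 1.5 * MB_PER_TB           # a 1.5TB system, as described above
    print(f"Cost per MB: ${cost_per_mb(hypothetical_monthly_cost, size_mb):.6f}")
    print(f"Cost of seldom-used data: ${cost_of_cold_data(hypothetical_monthly_cost, 2 / 3):.2f}")
```

Even with made-up numbers, putting a per-megabyte price on data that is seldom touched makes the trade-off far more concrete than "disk is cheap."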
Big data is great. But I wonder how many businesses are falling into the trap of building much more than they need. I don't have much research to support this line of thinking; however, I can't help but wonder whether this is a pervasive problem in today's database systems.
What do you think?