Big Data: What Is the True Cost of Your Data?

Data storage based on business needs

Do you ever think about how much your data costs? I wonder how many DBAs and software architects consider the hidden costs of data when cool trends such as big data or fun phrases such as "disk is cheap" are bandied about.

I've been in the database field for more than 20 years, and for most of that time the projects that I worked on didn't place a great premium on reducing the amount of data that an application stored. In some ways, the capacity of a typical server that a company could afford set a natural limit on how much data was stored. I think that many designs accounted for that limit without anyone thinking about it in a very deliberate way during the design period. But today, when you can buy a laptop from Costco with a 1TB drive for much less than $1,000, I wonder how many companies are thinking big data when perhaps they don't really need to.

Evaluating Disk Space Needs

Before I go much further, I'd like to say, "Please hold the hate mail." I'm not a Luddite who doesn't understand the importance of data to a business. I get that modern business-oriented computing architectures are increasingly becoming data-centric. I love that. But I do wonder if we might be entering a period in which some businesses are storing more data than their business needs dictate.

I've recently been involved in a system that has a total data size in the 1.5TB range. We recently realized that close to two-thirds of that data is seldom used. I'd like to say that it's never used, but I try to avoid saying 'never' or 'always.' Even so, design characteristics of the system have led to a situation in which it needs to be built to support very high I/O rates against the full 1.5TB. I don't want to go into specifics, but it hasn't been cheap to achieve this in terms of hardware and services. I can't quite put my finger on it, but somehow I'm just not sure this would have happened in quite the same way five or more years ago, when 1.5TB was a lot of disk space for a small company.

Overbuilt Database Systems

Tom LaRock, a SQL Server MVP, has been doing a great series on SQL Azure over the past few months (follow LaRock on Twitter: @SQLRockStar), and several of his blog postings on Azure pricing piqued my interest. He was showing how you could figure out how much the data in your SQL Azure database was costing you per megabyte. I don't think most DBAs of our generation are trained to think this way. Embarrassingly, I certainly haven't thought this way much over the years. Now, I'm probably more focused on running a business than I am on hard-core technology issues. Perhaps this is why LaRock's blog struck my fancy. As I was reading his posts, it occurred to me that I would have made many different decisions over the years if I had been spending my own money when I was architecting systems rather than accepting the grossly oversimplified mantras of "disk is cheap" and "memory is cheap."
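
To make the per-megabyte idea concrete, here is a rough sketch of the arithmetic in Python. It is not LaRock's method and it does not use real Azure pricing: the monthly cost is a made-up figure, the function names are hypothetical, and the only numbers taken from this article are the 1.5TB total and the roughly two-thirds of it that is seldom used. It also assumes cost scales evenly with data size, which is a simplification.

```python
def cost_per_mb(monthly_cost: float, data_size_mb: float) -> float:
    """Return the monthly cost of each stored megabyte, in dollars."""
    if data_size_mb <= 0:
        raise ValueError("data_size_mb must be positive")
    return monthly_cost / data_size_mb


def cost_of_cold_data(monthly_cost: float, cold_fraction: float) -> float:
    """Return the share of the monthly bill spent on seldom-used data.

    Assumes cost is spread evenly across all stored data (a simplification).
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    return monthly_cost * cold_fraction


# Hypothetical monthly spend on hardware and services, for illustration only.
MONTHLY_COST_DOLLARS = 3000.0
# 1.5TB expressed in megabytes (1TB = 1024 * 1024 MB).
DATA_SIZE_MB = 1.5 * 1024 * 1024
# Roughly two-thirds of the data is seldom used.
COLD_FRACTION = 2 / 3

per_mb = cost_per_mb(MONTHLY_COST_DOLLARS, DATA_SIZE_MB)
cold_cost = cost_of_cold_data(MONTHLY_COST_DOLLARS, COLD_FRACTION)

print(f"Cost per MB per month: ${per_mb:.6f}")
print(f"Monthly spend on seldom-used data: ${cold_cost:,.2f}")
```

Plug in your own monthly bill and data size. The point is less the exact number than seeing how much of the bill goes to data that almost nobody touches.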

Big data is great. But I wonder how many businesses are falling into the trap of building much more than they need? I don't have much research to support this line of thinking; however, I can't help but wonder if this isn't a pervasive problem in today's database systems.

What do you think?


