What he thinks about the latest SQL Server release, Kilimanjaro, Madison, and Azure
Executive Summary:
Dave Campbell gives us his thoughts on SQL Server 2008’s features and upcoming SQL Server-related Microsoft products such as Kilimanjaro, Madison, and the Azure Services Platform. Some other topics he touches on are data management trends, how SQL Server works with Visual Studio and Microsoft Office, and data hubs.
Dave Campbell, a Microsoft technical fellow in the Data and Platform division and SQL Server raconteur, met with Karen Forster and me for a lively discussion about SQL Server—past, present, and future. We quizzed him about SQL Server 2008’s features and SQL Server’s path to the cloud. Campbell’s comments about SQL Services, the upcoming Kilimanjaro release, and the Madison project illuminate changes on the horizon in the way you’ll deal with data.
SQL Server Magazine: Did you know that February is the 10th anniversary of SQL Server Magazine?
Campbell: Pretty exciting! Congratulations.
SQL Server Magazine: Thanks! What about SQL Server 2008 excites you most?
Campbell: I’m excited about the number of features we were able to get into it in less than three years. The story behind the story is that we completely redesigned the process for how we make SQL Server. We envisioned a world with millions of servers and millions of enterprise database servers. Then we redesigned the product with that in mind, to make it take care of itself and to make it much easier to care for.
SQL Server Magazine: What was your role in this process?
Campbell: In SQL Server 2005, I ran a good chunk of the product development team. I went to Paul Flessner \[then the vice president of Microsoft’s Data and Storage division\] when we were getting ready to ship SQL Server 2005 and said, "Hey, we’re no longer chasing tail lights. We’re in the leaders’ pack now, and there are a bunch of things we can do to distinguish ourselves from the others." I put together a team that looked at the market, the needs of our customers and ISVs, how we built the product, and the return on our engineering investment. We prioritized things, such as the MERGE statement, that people had requested for a long time, and got them done.
We noticed that space and time would become integral data types going forward, so we added support for spatial data types in SQL Server 2008. We also noticed that even though SQL Server is easier than a lot of other database products to manage server-per-server, managing thousands of SQL Servers was still a lot of work. So that’s how policy-based management came about. It really reduces the administrative burden if you can define a few classes of service (the mission-critical server, the workgroup server, the tier-2 server, the one that’s under someone’s desk), define policies for those classes, bind those servers to one of those classes, and make sure that they remain in compliance.
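As a rough illustration of the idea Campbell describes here, the sketch below defines policies per class of service, binds servers to a class, and reports which servers are out of compliance. It is not SQL Server's policy-based management API. The `Server` fields, the policy names, and the thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Server:
    # Hypothetical server description; the fields are illustrative only.
    name: str
    service_class: str
    backup_interval_hours: int
    auto_shrink: bool


@dataclass
class Policy:
    # A policy is a named condition that a server must satisfy.
    name: str
    check: Callable[[Server], bool]


# Policies are defined once per class of service, not once per server.
POLICIES_BY_CLASS: Dict[str, List[Policy]] = {
    "mission-critical": [
        Policy("backup at least every 4 hours",
               lambda s: s.backup_interval_hours <= 4),
        Policy("auto-shrink disabled", lambda s: not s.auto_shrink),
    ],
    "workgroup": [
        Policy("backup at least daily",
               lambda s: s.backup_interval_hours <= 24),
    ],
}


def violations(server: Server) -> List[str]:
    """Return the names of the policies this server fails for its class."""
    policies = POLICIES_BY_CLASS.get(server.service_class, [])
    return [p.name for p in policies if not p.check(server)]


def compliance_report(servers: List[Server]) -> Dict[str, List[str]]:
    """Map each server name to its list of failed policies."""
    return {s.name: violations(s) for s in servers}


servers = [
    Server("payroll-01", "mission-critical", 2, False),
    Server("payroll-02", "mission-critical", 12, True),
    Server("team-share", "workgroup", 24, False),
]
report = compliance_report(servers)
for name, failed in report.items():
    print(name, "compliant" if not failed else failed)
```

The point of the design is that adding a thousandth server means assigning it a class, not writing new rules for it.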
SQL Server Magazine: What are the big trends you’re seeing in data management?
Campbell: There are three things to think about. We’re entering a world in which the cost of acquiring data is going to zero. Everything is born digital today. The cost of storing data is approaching media cost. The final thing is ubiquitous connectivity and increase in bandwidth. Those three things together are transformative.
SQL Server Magazine: What impact will lower storage costs have on SQL Server?
Campbell: People need the right data, in the right form, in the right place, at the right time. CIOs and CTOs want to know if we can do it faster and cheaper than the other guys. There are some areas in the data warehouse market where we have not seriously gone . . . yet.
SQL Server Magazine: Are you referring to the Madison project?
Campbell: We see more businesses not wanting to throw data away, so they’re building larger data warehouses. The ability to collect, mine, analyze, and provide information back to the business can be the difference between success and failure. The Madison project, which is the work we’ve done with the DATAllegro acquisition, is going to let us enter the data warehousing market at the scale of hundreds of terabytes. The volume is staggering. I think that as people store more and purge less often, they can analyze things over time and look for historical trends. These trends are the basis on which we do predictive analytics to infer correlations over time and predict into the future.
SQL Server Magazine: How does Kilimanjaro fit into this picture?
Campbell: The other half of the story is how we take the data and the information in the large data warehouse and get it out to the people who are actually doing the work and making decisions every day. We want to enable both the very large data warehouses and self-service business intelligence (BI). It’s not enough just to produce the reports; it’s about putting information in the end user’s and information analyst’s hands so they can slice it and dice it, and add information that may only be available from their workgroup. A major theme of the Kilimanjaro release is self service.
SQL Server Magazine: In addition to self service, what are the other elements of Kilimanjaro?
Campbell: In Kilimanjaro, SQL Server will support more than 64 hardware threads in the relational engine. Another theme is multiserver management. Consider friction in terms of self service. Can you provision servers easily? For companies that have hundreds or thousands of SQL Servers, the multiserver management aspect of Kilimanjaro will enable the administrator to manage them all as a unit.
SQL Server Magazine: Can you discuss why SQL Server and storage are in the same division at Microsoft?
Campbell: That’s on purpose! Go back to my point that all data is born digital. Things that aren’t properly stored in the database directly still need the accountability and control that you want in a database. The unstructured data world and the structured data world are coming together. There are three motivating forces for this: One is that there are just so many of these unstructured data things. Wouldn’t you like a database to manage all the files on your computer, for example? The second is that unstructured data often has other attributes. For example, attributes of instant message \[IM\] conversations include the person it’s from, the person it’s to, and the date. The third is feature extraction. We can do text indexing of that IM conversation. There’s value in being able to query over it.
One big advance in SQL Server 2008 is the FILESTREAM data type. If you’re building an application that’s going to store BLOBs, do you put them in the file system and store the name in the database, or do you stick them in the database? Everyone’s got their own opinion. We wanted to just take the debate out of the situation and say that we’re going to use the file system for what it does best and the database for what it does best. Working with the storage teams allowed us to bring those two distinct worlds together and add more value.
SQL Server Magazine: This discussion brings WinFS to mind.
Campbell: A lot of what we’ve shipped in the last year or two is stuff that was a part of WinFS: the Entity Framework, the Entity Data Model, FILESTREAM integration, the synchronization framework. It’s just a different way to get it to market sooner. When we decided we weren’t going to put WinFS into the version of Windows that became Vista, we said that this technology is going to go forward. It’ll surface first through SQL Server.
SQL Server Magazine: How does SQL Server work with Microsoft Visual Studio and Microsoft Office?
Campbell: Ten years ago, people made purchasing decisions based on the capabilities of the relational database engine. We no longer think of SQL Server as a database server. We think of it as a data platform with broader horizontal services. For BI, we have extraction, transformation, and loading (ETL) in the box, and we have the ability to connect that with SharePoint. We have the ability to connect with Visual Studio. We’ve been using the Visual Studio shell for the ETL design environment, so there’s a commonality of skills. We did a scenario with Visual Studio 2008 called Occasionally Connected Systems for people who want to build mobility apps and not worry about whether the state of the data is connected or not connected. Synchronization is a key part of the data platform. All of this stuff is in the box.
SQL Server Magazine: What value does LINQ add to SQL Server 2008?
Campbell: I think about how to reduce the friction for building applications. There’s been a huge impedance mismatch between the application logic and the database interface layer. And if you go back 15 to 20 years, whether it was database integration with COBOL or database precompilers, in some ways we were in a better spot then than we were after we went to the call-level interfaces. So think about ODBC: It opened the database up to a larger class of development environments. But for the developer it was a little harder for a while. I think LINQ will do a great job of letting developers express queries in very natural ways that are easy to consume in their programs, whether they’re against the database, against in-memory structures, or against XML.
SQL Server Magazine: What are your thoughts on the Azure Services Platform and delivering the data platform as a service?
Campbell: Our vision is to extend the data platform to the cloud. The Azure platform is another dimension of self service. So today, in the provisioning of a database, you need to go find the server; you need to go find disk space. Imagine if you could just provision a database endpoint and wherever you had Internet access you could get at that database endpoint. That’s an aspect of self service. Azure is about moving databases from the world of the LAN (client/server) to the world of the Internet. Think of how client/server databases liberated data and allowed us to bring it down to people’s desktops and PCs. The services world will liberate data beyond the corporate LAN and out to any authorized person with an Internet connection.
SQL Server Magazine: What other work can developers do with the Azure platform?
Campbell: It fascinates me to see the new types of applications that people are building. A lot of people think “I’m not going to move my line-of-business applications to the cloud.” And I agree with them. But I see many scenarios, such as supply-chain, business-to-business, electronic design automation, and product lifecycle management, where corporations can share data, put data up on a data hub, and share data from one device to another.
We announced Project Huron in the SQL Services Labs at the Professional Developers Conference \[PDC\]. Huron lets you take an Access database, publish it into SQL Services, and then let suitably authorized people subscribe to it. It pulls the application down but also synchronizes all of the data. It’s a data hub. Individual clients don’t need to be connected with each other; they just need to be able to connect to the hub. The data is secure and backed up on the hub. This opens up a number of new scenarios for developers. We’ve announced some support for data mining and reporting in SQL Services Labs as well. Data integration services will be another interesting piece: You’ll be able to pull data from multiple places into a hub and then do some BI activity over it.
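To make the hub topology concrete, here is a minimal sketch in which clients never talk to each other and exchange changes only through a central hub. It does not show how Huron or SQL Services synchronize data. The `Hub` and `Client` classes, the per-record version counter, and the "higher version wins" rule are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Each record is stored as (version, value). This scheme is hypothetical.
Record = Tuple[int, str]


class Hub:
    """Central store that every client synchronizes with."""

    def __init__(self) -> None:
        self.records: Dict[str, Record] = {}

    def merge(self, incoming: Dict[str, Record]) -> Dict[str, Record]:
        """Accept strictly newer records, then return the hub's full state."""
        for key, (version, value) in incoming.items():
            current = self.records.get(key)
            if current is None or version > current[0]:
                self.records[key] = (version, value)
        return dict(self.records)


class Client:
    """Occasionally connected client holding a local copy of the data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: Dict[str, Record] = {}

    def write(self, key: str, value: str) -> None:
        # A local edit bumps the record's version by one.
        version = self.records.get(key, (0, ""))[0] + 1
        self.records[key] = (version, value)

    def sync(self, hub: Hub) -> None:
        # Push local changes and pull everyone else's in one round trip.
        self.records = hub.merge(self.records)


hub = Hub()
alice, bob = Client("alice"), Client("bob")

alice.write("order-1", "pending")
alice.sync(hub)      # Alice publishes her change to the hub.
bob.sync(hub)        # Bob picks it up without ever contacting Alice.

bob.write("order-1", "shipped")
bob.sync(hub)
alice.sync(hub)      # Alice now sees Bob's update.
```

This toy version keeps the hub's copy when two clients edit the same record at the same version, so one edit is silently lost. A real synchronization service needs an explicit conflict policy.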
SQL Server Magazine: What else should readers know about data hubs?
Campbell: With SQL Services you’ll be able to build out the generic data hubs and reduce friction. We’ll be working with the .NET Services team, also a part of the Azure platform, to do the access control for it, so you can take your existing Active Directory \[AD\] information and federate it by projecting it into the cloud. Plus you’ll be able to write authorization policies over the data to securely share it across a set of applications and people.
SQL Server Magazine: How do you think SQL Data Services (SDS) will add value to the platform in the cloud?
Campbell: SDS is just the first of a number of services. With SDS you don’t have to know where the machine is. The SDS model is inherently multi-tenanted. If you’re developing an application, you can focus on that application, and as part of the installation process you can have an endpoint provisioned. You don’t have to worry about how to scale databases to petabytes \[PB\]. We built the abstractions in SDS with multitenancy in mind.
SQL Server Magazine: SDS completely changes the idea of the corporate database. Your database becomes much more abstract than in the past.
Campbell: Today one of the challenges a corporation faces is where to put its databases. A database used to be under someone’s desk. Now imagine that IT can say "I’ll let you provision databases at Internet endpoints." All a developer has to say is "provision me an endpoint for 10GB" or something like that. A lot of big corporations have self-provisioning of SharePoint sites; it’s really easy to do. Imagine in this model that developers don’t have to wait for new hardware. They don’t have to wait to talk to the IT staff to get a virtual machine set up. It’s just a matter of filling out a form requesting a database, getting an endpoint, and then developing your app. It will increase agility and reduce friction in that space.
SQL Server Magazine: In an article earlier this year in SQL Server Magazine UPDATE \["Cloud Computing: How Will It Affect Corporate IT?" (InstantDoc ID 99835)\], Brian Moran said that cloud computing is a little scary because there won’t be a need for corporate IT anymore.
Campbell: I don’t think that at all. I think it will open up more opportunity. A lot of people confuse cloud computing with outsourcing your hosting—just take the whole mess and give it to me. I think if cloud computing is done right, people can focus on one thing and do it very well. Then there’s an opportunity for IT to have a higher degree of interdependence and remake the value chain, not just the supply chain.
SQL Server Magazine: So what skill set should DBAs and developers be acquiring?
Campbell: I would think about the data lifecycle from birth to archival. We have to wake up and think of a world in which the cost of data acquisition has gone to zero. The cost of data storage has gone to zero. And bandwidth is a bit of a constraint. Latency will be a constraint.
Just think about it—we’re in a world where you can collect vast amounts of data and process it. If you can detect correlations between things, in order to exploit them for business value, you don’t even have to understand causality; you just know that there’s a correlation. Work with data is just going to continue to change dramatically. More and more I see businesses that can capitalize on that. That’s the difference between success and failure.
The frame of mind your readers need in order to face the future isn’t, "Who’s going to take my database and move it to the cloud?" It’s about imagining you have the platform pieces in place to build a generic synchronized data hub.