A SQL Server 2000 approach to global data-warehouse systems
A large US federal government agency required an advanced decision-support system. Because this agency has arguably the most complex network infrastructure in the world, its IT team had to deal with (and enforce) security, bandwidth consumption, and connectivity every day. But the existing system took too long to give users a clear picture of how and where money was being spent. The government agency needed to empower its workforce to make more intelligent business decisions, but to make these decisions, the workers needed more-current information.
The government needed a robust decision-support system that was capable of giving excellent performance to more than 10,000 federal government personnel worldwide. This system would have to receive, evaluate, translate, and publish information in near realtime from 19 different systems and more than 450 locations worldwide. It needed to be flexible, scalable, and capable of publishing gigabytes of data throughout the day while making terabytes of data available to its users. And, the solution needed to return results of ad hoc queries, regardless of user location, in only seconds.
To meet all the requirements, the Teksouth team engineered and built a suite of tightly coupled software components to deliver a robust, scalable, and completely metadata-driven end-to-end data-warehouse solution. The components handle all aspects of any data-warehouse implementation:
- A data dictionary that defines everything from the extraction, transformation and loading (ETL) and business rules to the end user's data representation
- An ETL engine that provides parallel publishing capability
- An automated data-mart publishing and maintenance engine
- Middle-tier software that handles authentication, authorization (role-based security), and load balancing
- An end-user business intelligence (BI) tool
- Web reporting capabilities ranging from executive summary to transaction-level detail
We built these components to allow for flexibility and capitalize on the scale-out architecture that SQL Server 2000 provides. The data warehouse comprises 12 SQL Servers with multiple databases, including both data marts and star schemas. The tables contain from 3 million to 300 million rows. The complete solution receives, evaluates, transforms, and publishes 5GB to 15GB of data to multiple databases within the system throughout the day.
Teksouth's proprietary Intelligent Mid-Tier and Role Server software components evaluate each query request. Role-level security ensures that users see only the specific data elements and rows that they need. Users are allowed to view or access data on a need-to-know basis depending on their job requirements, but no user has direct access to the warehouse. The Role Server works with the Intelligent Mid-Tier software to apply the appropriate constraints to the user's query. The Intelligent Mid-Tier then considers which resources (servers, databases, and tables) are available, builds the appropriate SQL statement, and directs the query to the resource that can provide the fastest answer.
The integrated security roles for handling both the BI tool queries and user-defined custom Web report queries, coupled with detailed audit trails, ensure that this solution's security model meets or exceeds this government agency's security requirements. The Intelligent Mid-Tier returns all results to the Teksouth BI tool and custom Web reports via secure sockets layer (SSL) encryption in compressed format.
After decades of using 30- to 45-day-old data to make decisions, this government agency now has global visibility into financial data that's less than 24 hours old. Visibility into data at both executive and detail levels is available 24x7. The system has scaled up to handle 65 percent more users and 150 percent more queries while returning responses 28 percent faster during each of the past 2 years. Now, more than 10,000 users, querying more than 1TB of data, run an average of 300,000 queries per month, nearly all of which are ad hoc queries. Some 66 percent of all queries return results in 3 seconds or less, and 85 percent return results in 10 seconds or less. During peak usage in September 2004 (at up to 3700 transactions per hour), the system handled more than 600,000 ad hoc queries with an average response time of 19 seconds.