Servers and systems have outages; it's a fact of life in IT. Clustering SQL Server instances gives you a measure of protection against these failures. Even in well-designed environments in which web and application servers have redundant backups, the SQL Server system often sits there as a single point of failure, even though it takes the longest to repair and restore. (Getting a replacement database server set up and the database restored to it from tape can take hours or even days, depending on the size of the database and the speed of the restore.) You can avoid much of this hardware and application downtime by setting up a SQL Server cluster in your environment. In this article, I'll explain how clustering works and the hardware and software needed to create a SQL Server cluster. I'll also show you how to create a failover cluster using SQL Server 2005 and Windows Server 2003 Enterprise Edition.

How Clustering Works

When you get right down to it, clustering is a fairly basic concept. A service runs on each node of the cluster and checks whether the SQL Server service is running anywhere in the cluster. If it isn't, the service is started on one of the nodes. What's actually happening under the hood during this process is a bit more complex.
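The basic decision that the cluster service makes can be sketched in a few lines. This is only a conceptual illustration of the check-and-restart logic described above, not the actual Windows cluster service implementation; the node names follow the naming convention used later in this article.

```python
def failover_check(nodes, service_owner):
    """Decide which node should own the SQL Server service.

    nodes: dict mapping node name -> True if that node is healthy.
    service_owner: node currently owning the service, or None.
    """
    # If the current owner is still healthy, nothing changes.
    if service_owner is not None and nodes.get(service_owner):
        return service_owner
    # Otherwise, start the service on the first healthy node found.
    for name, healthy in nodes.items():
        if healthy:
            return name
    return None  # no healthy node: the service stays offline


# The service runs on exactly one node at a time; a failed owner
# causes ownership to move to a surviving node.
owner = failover_check({"SQL01A": True, "SQL01B": True}, None)
```

The key point the sketch captures is that the service has exactly one owner at any moment, and ownership moves only when the current owner is no longer healthy.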

Learning Path

SQL SERVER MAGAZINE RESOURCES

For more information about clustering:

"Disk Configuration for Failover Clustering"

"Active/Passive vs. Active/Active Clustering"

"Clustering SQL Server"

When you install SQL Server in a clustered configuration, the services are actually installed on all the nodes in the cluster. (Note that SQL Server must be installed using the clustering option; you can’t change a single-server installation into a clustered installation.) If you go into Administrative Tools, Services on any node, you’ll see all the SQL Server services in manual startup mode. When SQL Server is running in a cluster, the service runs on only one physical node at a time. If you manually move the service from one node to another, you’re actually stopping the service on one machine and restarting it on another machine.

When the machine running SQL Server goes offline (e.g., because of a hardware failure), the passive SQL Server system will detect that the active system has gone offline, take ownership of the SQL Server service as well as the other resources in the resource group (e.g., IP address, network name, disk drives), and bring the service back online. With SQL Server 2000, this process can take anywhere from a few seconds to several minutes, depending on how much data in the transaction log must be rolled forward or rolled back. SQL Server 2005 comes online much faster because it rolls forward committed transactions, brings the database online, and only then rolls back any incomplete transactions.
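The recovery sequence can be sketched as follows. This is a deliberately simplified illustration of the redo/undo split; real crash recovery operates on the transaction log on disk, not on in-memory lists.

```python
def recover(log_records):
    """Simplified crash-recovery sketch.

    log_records: list of (txn_id, committed) tuples found in the
    transaction log at startup. Returns (redone, undone): committed
    work that is rolled forward, and incomplete work that is rolled back.
    """
    redone = [txn for txn, committed in log_records if committed]
    undone = [txn for txn, committed in log_records if not committed]
    return redone, undone


# Example log: T1 and T3 committed before the crash, T2 did not.
log = [("T1", True), ("T2", False), ("T3", True)]
redone, undone = recover(log)
# SQL Server 2005 can bring the database online after the redo phase
# and perform the undo phase in the background; SQL Server 2000
# completes both phases before the database comes online.
```

The comment at the end is the whole difference in startup time: both versions do the same redo and undo work, but SQL Server 2005 doesn't make users wait for the undo phase.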

Hardware and Software Requirements

Clustering isn’t easy to set up, nor is it cheap. You need at least two servers and a shared storage solution. Although the servers don’t need to be identical, configuration is easier if they are. It’s recommended that the servers have at least the same number of CPUs and the same amount of RAM. Technically, it isn’t necessary for the CPU count and RAM to match, but it does make it easier to estimate the load the backup server can take. Using identical servers also gives you the luxury of not having to worry about which server the database is running on because all servers will perform the same.

The storage can’t be the RAID array that you plug into the server. Instead, it must be an array specifically designed to be used in a cluster (i.e., shared storage). This storage is most often handled by a Fibre Channel or iSCSI SAN solution; however, the major server vendors all have SCSI or Serial Attached SCSI (SAS) solutions that can be used as well.

The version of SQL Server and the number of nodes (servers) in your cluster determine what software you need to purchase. The same rules apply to both 32-bit (x86) and 64-bit (x64 or Itanium) systems. You must be running Windows Server 2008 Enterprise Edition, Windows Server 2003 Enterprise Edition, or Windows 2000 Advanced Server to cluster. If you want to cluster more than two nodes, you need Windows 2003 Enterprise Edition, which supports up to eight nodes, or Server 2008 Enterprise Edition, which supports up to 16 nodes. If you're using a SCSI, SAS, or Serial ATA (SATA) storage array, the maximum number of nodes supported is two, no matter which OS or SQL Server edition you're running. Table 1 shows the number of nodes supported by each version and edition of SQL Server. To create a cluster of more than two nodes, both the OS and SQL Server must be an edition and version that supports that configuration (e.g., a three-node SQL Server 2005 cluster requires Windows 2003 Enterprise Edition or later and SQL Server 2005 Enterprise Edition).
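These limits combine as a simple minimum. A hypothetical helper function makes the rule concrete; the SQL Server edition limit is a parameter here because the per-edition node counts come from Table 1.

```python
def max_cluster_nodes(os_limit, sql_edition_limit, scsi_style_storage):
    """Maximum cluster size is the smallest of three limits: the OS
    edition, the SQL Server edition, and the storage interconnect.
    SCSI, SAS, and SATA arrays cap the cluster at two nodes."""
    storage_limit = 2 if scsi_style_storage else os_limit
    return min(os_limit, sql_edition_limit, storage_limit)


# Windows Server 2003 Enterprise (8 nodes) with a Fibre Channel SAN:
fc_nodes = max_cluster_nodes(8, 8, scsi_style_storage=False)
# The same OS and SQL Server edition with a SAS array is capped at 2:
sas_nodes = max_cluster_nodes(8, 8, scsi_style_storage=True)
```

In other words, the most restrictive component, whether OS, SQL Server edition, or storage, always wins.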

For Windows 2003 and Win2K clusters, all the servers must have a static IP address and be on the same subnet that the cluster IP address will be on. (Although Server 2008 supports cluster nodes being on different subnets and using DHCP for the IP addresses, I recommend using static IP addresses in the same subnet for simplicity.) In addition, each SQL Server instance will need an IP address and cluster name. These are required so that each SQL Server instance can fail over from one node of the cluster to another node independently of the other SQL Server instances.

Planning the Cluster

Just as with any other database, you’ll want your data files, log files, and tempdb to be on different volumes on the server and in different RAID groups within the SAN. You don’t want the high writes of the tempdb and the logs to get in the way of the reads for the database. When dealing with very large databases, you might find that 26 drive letters just aren’t enough. You might consider using mount points within your drive letter to assign additional drive space and I/O capacity to the drive. For more information about mount points, see the web-exclusive sidebar “Creating Mount Points.” Note that if you’re going to be installing more than one SQL Server instance, make sure to allocate each instance its own drive letter. When clustered, each instance has to have its own drive because each instance is in its own resource group.

Each node needs not only a regular IP address but also a heartbeat IP address, typically on a different subnet than the one your network uses. The heartbeat network is what the Windows cluster service uses to determine whether the other machines in the cluster are working correctly. In a two-node cluster, the heartbeat network can be as simple as a crossover cable connecting the two servers; in a cluster with three or more nodes, a network switch is needed, because all the machines' heartbeat network adapters have to be on the same network. In addition, you'll need an IP address for the Windows cluster itself, which owns the Quorum drive and any Windows services that need to be clustered. Finally, you'll need to assign an IP address to SQL Server.

You'll also need several names for your cluster. Each node needs its own name, as does the Windows cluster itself, which owns the Quorum and other clustered Windows services. In addition, each SQL Server instance needs its own host name; because each instance has its own IP address, host names can't be shared. This is where a good naming convention comes in very handy. A generic name with an incrementing number works fine for standalone servers but becomes hard to manage when you start clustering, so extend the convention to fit clusters. For example, name the nodes SQL01A and SQL01B and name the Windows cluster SQL01. This convention tells you at a glance that SQL01A and SQL01B are the nodes and that SQL01 is the base of the cluster. Name the SQL Server cluster SQL01V01; this tells you that SQL01V01 is the first SQL Server virtual server on the SQL01 cluster. If you need to install a second instance, name it SQL01V02, with each instance being a separate server in the SQL01 cluster.
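The convention above is mechanical enough to generate. The helper below is a hypothetical illustration of the scheme (letter suffixes for nodes, the bare base name for the Windows cluster, and V01, V02, and so on for the SQL Server virtual servers), not a tool from SQL Server or Windows.

```python
def cluster_names(base, node_count, instance_count):
    """Generate names under the article's convention: nodes get a
    letter suffix (SQL01A, SQL01B), the Windows cluster is the base
    name (SQL01), and each SQL Server virtual server gets a V<nn>
    suffix (SQL01V01, SQL01V02)."""
    nodes = [f"{base}{chr(ord('A') + i)}" for i in range(node_count)]
    instances = [f"{base}V{i + 1:02d}" for i in range(instance_count)]
    return {"cluster": base, "nodes": nodes, "instances": instances}


names = cluster_names("SQL01", node_count=2, instance_count=2)
# names["nodes"] is ["SQL01A", "SQL01B"]; names["instances"] is
# ["SQL01V01", "SQL01V02"]; names["cluster"] is "SQL01".
```

Whatever tooling you use, the point is that any name in the environment should tell you immediately which cluster it belongs to and whether it's a physical node or a virtual server.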

When configuring the cluster, you'll be prompted for a domain account, which will be used to monitor the cluster. Don't use your personal domain account; instead, set up a separate account on the domain for this purpose. This account needs administrative rights on all nodes in the cluster, and it needs to be able to log in to SQL Server. It doesn't need to be a member of the sysadmin fixed server role; it only needs to be able to connect. If the account doesn't have rights to log in to the instance, the database will fail back and forth between the nodes until you force the database online and add the login. If you plan to remove the BUILTIN\Administrators group from SQL Server as part of your security lockdown, make sure to add this new account as a login to the SQL Server instance before removing the group.

Finally, you need to create the domain groups that you'll supply on the installer's Domain Groups for Clustered Services page. These are Windows domain groups that contain the domain accounts that will run the various services. These groups are given administrative rights within the SQL Server instance, so each should contain only the domain account that its service runs under. Three domain groups are needed; however, if the full-text service, SQL Server Agent service, and SQL Server service all run under the same domain account, only one domain group is needed.

Creating and Configuring the Cluster

Now that you’ve planned for your SQL Server cluster, you can configure the cluster and install SQL Server. Open the Cluster Administrator from the Start, Programs, Administrative Tools menu and select Create new cluster from the Action drop-down menu, as shown in Figure 1. Next, select the domain that the servers are located in and enter the cluster name (in this case SQL01) in the Cluster name text box of the New Server Cluster Wizard. On the next screen, enter the username, password, and domain of the account that will be used to monitor the cluster.

If you have more than one disk configured on the server when you configure the cluster, click the Quorum button and make sure the selected disk is set up in the Cluster Configuration Quorum dialog box, which is shown in Figure 2. The quorum disk should be a dedicated disk not used for application data or SQL Server data files. It needs to be only 1GB in size, although larger disks will work just fine. If everything is configured correctly, the following screen will look similar to Web Figure 1. After closing the New Server Cluster Wizard, you can see the Cluster Administrator connected to the cluster, as shown in Figure 3.

Now add the second node of the cluster using the Add Nodes Wizard, which you can access via the Cluster Administrator. Click the File drop-down menu and select Open. Then select Add nodes to cluster from the Action drop-down menu and enter the cluster name in the Cluster or server name field, as Web Figure 2 shows. You can add one machine or many machines using the Add Nodes Wizard. Enter the password for the cluster account on the Cluster Service Account page. Note that if you enter an incorrect password, the wizard will still let you continue to the next screen; the password isn't verified until two pages later. Once the nodes have been configured correctly, the wizard's status bar will turn green.

Next, create the resource group for the SQL Server instance and add the storage to it. To create a new resource group from within the Cluster Administrator, right-click Groups, select New, and then select Group. To move your storage resources into the resource group, select the group the storage resource is currently in, drag and drop the resource into the new group, and click Yes in the two pop-up windows that appear.

Now you can install SQL Server on the cluster. The primary differences between a standard SQL Server installation and a clustered installation are that you have to select the resource group to install SQL Server into, assign a virtual name and IP address to the SQL Server instance, and provide SQL Server with the Windows domain groups that contain the domain accounts that run the services. When you select the resource group, the disk resources must be online, and the resource group can't already contain another SQL Server instance. When selecting the location of the binaries, don't install them on shared storage; the binaries should be installed on local storage. The SQL Server installer automatically installs the binaries on all the nodes in the cluster, so the installation will take a little longer than usual.

Proceed through the license and preinstallation requirements. Note that the System Configuration Check screen, shown in Web Figure 3, will take a while to load because it’s checking all the nodes in the cluster. When selecting the services, select the Create a SQL Server failover cluster check box, which is located below the SQL Server Database Services check box, as Figure 4 shows. If you want to have SQL Server Integration Services (SSIS) installed on the cluster, you’ll need to install SSIS manually on each node and cluster it manually.

To install SSIS on the second node, run the SQL Server installer on that node and select only the SSIS service from the service list page. After SSIS is installed on both nodes, open the Cluster Administrator and connect to the cluster. Right-click the resource group you want to put the SSIS service into, and select New, Resource. Enter SQL Server Integration Services as the resource name and select Generic Service from the Resource type drop-down menu. On the following screen, select the nodes that will run the SSIS service and click Add. On the next screen, select the Cluster Name resource and click Add. (If you're installing into the same resource group as SQL Server, the resource name will be SQL Server Name.) Enter MsDtsServer in the Service name text box, leaving the Start parameters text box blank. Leave the Registry Replication list blank and finish the wizard. Back in the main SQL Server installation, enter the new failover cluster name.

On the Virtual Server Configuration page (shown in Web Figure 4), enter the IP address that SQL Server will listen on (i.e., the failover cluster's IP address) and click Add. Make sure that the selected network adapter is on the public network, not the heartbeat network. Select the cluster group and enter the path to the data files and logs, as Web Figure 5 shows. Then select the nodes of the cluster that SQL Server will be installed on, as shown in Web Figure 6. These are the nodes that can potentially host the SQL Server service; the Required node is the node you're currently logged in to, and the Available nodes are the other nodes in the cluster.

Next, enter the password for the account that you used to log in to the server. This password is used to authenticate against the other nodes of the cluster so that SQL Server can be installed on them. Then enter the username, password, and domain for the account that will run the SQL Server services. In a cluster, you can't run the SQL Server services under the local system account; you must run them under a domain account. The Domain Groups for Clustered Services dialog box will then prompt you for the domain groups you created earlier; the membership of each group should be the domain account that runs the respective service. The installer will then ask you for the system administrator (sa) password, authentication mode, and collation. When the SQL Server installer has finished on both nodes, the installation is complete, and you can connect to SQL Server using the name assigned to the SQL Server clustered instance during installation.
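That last point is worth emphasizing: clients should always connect to the clustered instance by its virtual name, never by a physical node name, so their connections survive a failover. The hypothetical connection-string builder below illustrates the idea; the names and database are examples only.

```python
def connection_string(virtual_server, database, instance=None):
    """Build an ADO-style connection string that targets the cluster's
    virtual server name (e.g., SQL01V01), not a physical node name.
    For a named instance, the target is VirtualName\\InstanceName."""
    server = virtual_server if instance is None else f"{virtual_server}\\{instance}"
    return f"Server={server};Database={database};Integrated Security=SSPI;"


conn = connection_string("SQL01V01", "Sales")
# After a failover, this same string keeps working, because the
# virtual name and IP address move to the surviving node along
# with the SQL Server service.
```

Because the virtual name and its IP address are resources in the same group as the SQL Server service, they fail over together, and clients never need to know which physical node is currently active.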

Reducing Downtime

Although clustering SQL Server can be expensive, a properly designed and configured SQL Server cluster provides a reliable, high-uptime platform to host your databases on. That extra uptime lets you rest easy when hardware problems arise, and outages last only a few seconds, instead of minutes, when Windows needs to be patched and rebooted.