Few things in the world of the DBA create more anxiety than clustering a SQL Server, partly because clustering was difficult in SQL Server 7.0 and earlier releases. In SQL Server 2000, however, clustering is less intimidating. The six steps I outline in this article form a basic framework that you can follow to set up a clustered environment for SQL Server 2000.
Failover clustering is the most effective way to achieve high availability in a SQL Server environment. When you cluster any Windows servers, you use Microsoft Cluster service to link several servers together. With Cluster service, if a failure occurs in a crucial hardware component or in the SQL Server service, then the drives, SQL Server, and associated services will fail over to the secondary server. The entire failover is automatic and generally takes between 30 and 60 seconds. With other SQL Server high-availability solutions such as log shipping or replication, someone has to manually change the roles to the secondary server if a disaster strikes the main server. Although log shipping can provide a reliable redundancy solution, it relies on manual maintenance and can be difficult to set up.
Before you begin clustering, you need to understand that you use Windows clustering for high availability only. Clustering doesn't improve SQL Server performance because only one server works at a time—the linked servers don't process queries together. For an overview of how clustering fits into the overall SQL Server high-availability puzzle, see Michael Hotek's article "High Availability Solutions."
Basic Cluster Architecture
Figure 1 shows a simplified cluster architecture in which two Windows servers, known as nodes, share a disk array or storage area network (SAN). You can have as few as two nodes and as many as eight nodes in a cluster, depending on which versions and editions of SQL Server and Windows you're running. For example, a cluster running the 32-bit version of SQL Server can include as many as four nodes. Although some clusters can support two active nodes, the architecture that Figure 1 shows includes only one active node; the other node is passive—a standby server. Every few seconds, Cluster service runs a simple query against SQL Server to make sure that the standby server is still online. Client applications such as Enterprise Manager or an application that you develop connect to SQL Server through a virtual name or a virtual IP (VIP) address instead of through the real server name and IP address. When a failover occurs, the ownership of the VIP address moves to the standby server, so you don't have to change the connection string for your application to work.
You can choose between two types of clustering: single-instance clustering or multiple-instance clustering (known in SQL Server 7.0 as active/passive and active/active clustering, respectively). A single-instance configuration has only one SQL Server instance installed in the cluster of two or more nodes. A multiple-instance cluster has multiple SQL Server instances installed in the cluster—typically one instance on each node. When you choose to implement multiple-instance clustering, you must make sure that the server or servers participating in the cluster have enough processor power and RAM to support the workload of multiple instances of SQL Server.
In a single-instance cluster, you have to purchase a license for only the active node. The exception is when you have a per-processor SQL Server licensing agreement and you have more processors on the passive node or nodes than the active node. In such a case, you must license the additional processors. In a multiple-instance cluster, you must license all active nodes that have SQL Server installed.
To create a clustered environment for SQL Server 2000, you first need to cluster two or more Windows Server 2003, Windows 2000, or Windows NT servers. Then, you can install SQL Server 2000 on the cluster. In the example I walk through in this article, I show how to set up a two-node cluster on Win2K Enterprise Edition, but Windows 2003 Datacenter Server supports as many as four nodes, and Windows 2003 expands the number of nodes you can use to eight.
As I mentioned earlier, many DBAs and systems administrators are intimidated by the complexity of clustering. But you can put the process into perspective by breaking it into six high-level steps:
- Set up shared drives and networks to use for the cluster
- Create the cluster
- Configure the cluster
- Install SQL Server on the cluster
- Configure the cluster drives
- Install the necessary Windows and SQL Server service packs
Between each of these steps, you need to check the Windows Event Viewer to make sure no errors have cropped up. Serious problems that you might see include drive-related problems such as sharing-violation errors. After you've installed the cluster, you need to check the Windows system and application logs; sqlstpn.log, the SQL Server setup log, in which n in the filename represents a sequential number of setup attempts; and sqlclstr.log, the log of clustered SQL Servers. (The two SQL Server logs don't exist until you try to install SQL Server for the first time.) The combined information from these logs tells you what's going on in the cluster and whether any problems exist. You can also type SET CLUSTERLOG at a command prompt to see where the Cluster service cluster logs are located. Be sure to correct any errors at each step before proceeding to the next step, or the installation might fail. Now, let's walk through the six steps and see how to set up a two-node cluster for SQL Server.
1: Set Up Shared Drives and Networks
Setting up the drives for your clustered servers—and getting Windows to recognize the shared drives—is generally the most tedious part of creating a cluster because the process requires at least three reboots. To set up a shared drive, you can use either a SAN or a shared SCSI device. I usually choose a SAN because it lets me pool drive space across many servers and lets me easily add drives. After you begin the clustering process, you can't add space to an existing drive, so make sure you add plenty of space at setup time for each drive you want to cluster. Although you can add drives later, adding space to existing drives is difficult because clustering doesn't support dynamic drives (a type of drive configuration that typically allows this type of growth). To add space to an existing drive, you use the Diskpart utility, which is included in the Windows 2000 Resource Kit and is built into Windows 2003. But not all hardware vendors support the Diskpart utility, so creating adequate drive space at setup time is preferable to adding space later.
When setting up the drives on the first server (or node) in the cluster (which I call SQL2KNODE1), make sure you have the second server (SQL2KNODE2) powered off. To set up the drives on the first node, right-click My Computer, select Manage, open the Microsoft Management Console (MMC) Computer Management snap-in, and open the Disk Management tool, which you use for managing drives and disk partitions.
When you name your drives, try to use a consistent naming standard and name each drive according to its purpose. I typically choose names like those that Table 1 shows because a DBA or systems administrator can easily see what a drive's purpose is by looking at its name.
After you've formatted and labeled the drives for SQL2KNODE1, power off SQL2K- NODE1 and reboot SQL2KNODE2. Go back to the Computer Management snap-in's Disk Management tool; you'll see that all the drive letters are incorrect but the volume names are correct. Use the volume names to figure out what the drive letters should be, and reset the drive letters so that they match the letters on SQL2KNODE1. Now the drives for both servers are configured. Power off SQL2KNODE2 and power on SQL2K- NODE1.
You have several options for how to set up the cluster network from a simple single-point-of-failure configuration to a configuration that provides full redundancy. To learn about these options, see the Web-exclusive sidebar "Network Setup Options." For this article, I chose the first option that the sidebar describes—using two network cards—for simplicity; but this configuration does leave a communications single point of failure. When you've chosen the setup you want, name each network communications connection based on its purpose. For example, I called my private communications connection LAN_PRIVATE and the public communications connection LAN_PUBLIC. Now that you've set up the shared drives and network, you're ready to create the cluster.
2: Create the Cluster
Before installing Cluster service on the first server, you need to obtain several IP addresses. You need one IP address for each NIC port on each server, one for the Windows VIP address, and one for each SQL Server instance you plan to install.
Go to the Control Panel and select Add/Remove Programs, Windows Components, Cluster service to launch the Cluster service Configuration Wizard. You'll notice when you check Cluster service that the common components of Internet Information Services (IIS) are automatically installed when you install Cluster service. These IIS components are required for Cluster service to install and communicate properly. The first screen in the wizard asks you to confirm that all the hardware you're running Windows on is on the Hardware Compatibility List (HCL). If your hardware complies, select I Understand, and click Next. If even one component in your server isn't on the HCL, you risk creating an unstable environment. On the next wizard screen, select First Node in the Cluster, and click Next.
The wizard then asks what you want to name the Windows cluster. Choose a name no longer than 15 characters that you can easily increment—for example, SAN-CLUSTER01 can easily become SAN-CLUSTER02 and SANCLUSTER03 when you add clusters. Next, type the username, password, and domain name of the account that you want to start the Cluster service. The account needs to be in the Administrators group and must have a password that doesn't expire.
Select the drives that you want Cluster service to manage, then on the next screen, choose a drive to act as the quorum drive. The quorum is a shared drive (typically about 800MB) that holds common logs and files that all the cluster nodes need to function. If the quorum drive becomes corrupt or has problems, your entire cluster will fail. You need to ensure that your quorum drive is backed up as part of your routine backup rotation.
The next set of wizard screens sets up the network communications connections for the cluster. You must select the network name and specify what you want to use the network for (internal communications, public communications, or both). The first screen in this set asks you what network you want to use for the private network connection. Select the Internal Cluster Communication Only option. For the public network connection or connections, select the All Communications option. The next screen asks you which network has priority for internal communications. The last network screen asks you which IP address the cluster will use for client access.
After you've specified how network communications will work, Cluster service finishes the configuration. You can then boot the SQL2KNODE2 server and start the Cluster service Configuration Wizard again on the second node. This time, select the Second or Next Node in the Cluster option, and type the name of the cluster you want to join the node to. Type the account name and password to connect the new node to the cluster. In the next screen, the wizard prompts you to confirm the password. Then, Cluster service joins SQL2KNODE2 to the cluster.
3: Configure the Cluster
Now that your cluster is installed, you're ready to configure it. In this step, you organize the cluster resources into logical groups so that you can later set up dependencies between the resources. The dependencies between the cluster resources let SQL Server use all the cluster's drives.
To configure the cluster, open Cluster Administrator in the Control Panel's Administrative Tools program group. Cluster Administrator will prompt you for the name of the cluster you want to connect to. If you want to connect to the cluster that's on the server you're signed in to, type a period (.) and click OK.
When you expand the Cluster Administrator's Groups section, which Figure 2 shows, you'll see that by default, Cluster service creates a cluster group for each disk in the cluster. A cluster group is a collection of drives and services such as SQL Server that can be interdependent or tied together with dependencies. In a clustered environment, SQL Server can see only the drives that it shares a group with. The default setup after a cluster install is for each drive to be in its own group.
Organizing groups logically is important because you can't fail over individual resources—just groups. Manual failover is a great way to test a failover or to just do maintenance on another node. To manually fail over a group that contains SQL Server, right-click a cluster group in the Groups section, and select Move Group. You'll see the resources in the group switch from online to offline, then to online again in the new node. The Owner column in the right pane of the Cluster Administrator window shows which node currently owns the given resource group. When you first open Cluster Administrator, you'll want to make sure that every group is where it should be (with its preferred owner).
Because you can establish dependencies only among resources that are in the same group, you need to consolidate the cluster groups by node. To consolidate groups, I usually create a new group and give it the same name as the server of the preferred owner for that group. For example, after cluster setup occurs, the SQL Server installation will use the M, N, O, and P drives that Table 1 shows. So, I might consolidate these drives into a cluster group called SQL2K- NODE1. To create a new group, right-click the Groups icon and select New, Group. Name the group, and specify which nodes can own the group. If you want to move the O drive to its preferred group, right-click the drive, select Change Group, then select the name of the group you want to move the O drive to. Cluster Administrator will prompt you to confirm the selection, then the resource will move to the new group. Repeat the same change-group actions for the other nodes in the cluster.
Sometimes when you're configuring your cluster, you might receive the error message The cluster node is not the owner of the group. This error occurs because resources can't move freely between owners. For example, if SQL2KNODE2 owns the Disk P resource and you want to move Disk P to a group that SQL2KNODE1 owns, you'll receive this error. To remedy the problem, simply fail over the resource group as I explained earlier, then try the change again. After you've made all the group changes you want, you can remove the names of groups that are now empty by selecting the group name and pressing the Delete key.
After you install Cluster Services, you might also notice that both nodes might be able to see a given drive in Windows Explorer but only the node that owns the drive can open and use the drive. When you try to access a drive from a node that doesn't own the drive, you won't be able to see the volume name of the drive—you'll see only the drive letter. And when you try to access the drive, you'll receive the error message Drive is Not Accessible. Now that you've configured the cluster, you're ready to install SQL Server.
4: Installing SQL Server in a Cluster
After you've moved all the cluster resources to their new cluster groups, you're ready to install SQL Server on the cluster. But before you perform the installation, you need to run the comclust.exe command-line utility from a command prompt on the primary node. This tool adds the Microsoft Distributed Transaction Coordinator (MS DTC) resource to the group called Cluster Group so that all nodes can share the resource. Then, run comclust.exe on each of the other nodes in the cluster.
Now, you can install SQL Server on the cluster. Installing SQL Server on a cluster is similar to installing SQL Server on a local drive. Figure 3 shows one of the first installation screens you'll see. When you name the virtual server, SQL Server automatically recognizes that you're trying to install the instance on a cluster and selects the appropriate option (Virtual Server) by default. Type a unique virtual server name—again, I prefer to use a simple naming convention that I can increment (e.g., VSQL01)—and click Next. After you click Next, you might notice a long pause for DNS registration and verification.
Next, SQL Server prompts you to assign a VIP address to the virtual SQL Server name and specify a network for the SQL Server to use, as Figure 4 shows. Type the VIP address and the network name, and click Add.
The next parts of the installation process let you specify where you want to put the data files and what nodes are possible owners of your SQL Server instance. On the screen where you specify the path of the data and binary files, confirm the path to your data once more. I've noticed, for example, that sometimes even though I specified that I wanted the data on the P drive, the selection might change on the final confirmation screen. The SQL Server program binaries will install on the local servers across any node that you selected earlier as a possible owner of your instance. The data files will reside on the shared cluster drive.
The last step in installing SQL Server on the cluster is to type the username and password of a valid administrator account for all nodes that are possible owners of SQL Server. You use this account to copy the data from the node where you've installed SQL Server to the other participating nodes. Then, SQL Server will copy the binaries to all the nodes and register the ActiveX controls (as it does in a typical SQL Server installation).
This process can take as long as 10 minutes, during which time you'll see only the message Setup is performing required operations on cluster nodes. This may take a few minutes. Eventually, the SQL Server service and supporting services start, and the installation is complete. As with a typical SQL Server installation, you might need to reboot both nodes, depending on whether you use updated Microsoft Data Access Components (MDAC) and whether the setup could obtain exclusive access to some of the files.
If an installation fails, you might be in for a long debugging process because during setup, you don't see the error on the screen, so you have to dig through many logs to determine the root cause. To learn some techniques for trouble-shooting a SQL Server installation in a clustered environment, see the Web-exclusive sidebar "Troubleshooting a Failed Installation."
5: Configuring Cluster Drives
After you install SQL Server on the cluster, you have to configure the cluster's drives. SQL Server's default cluster setup allows only one clustered drive for its data. This means that if you have more than one clustered drive, SQL Server won't be able to see the other drives to create backups or write data to. If you want to use additional drives, you need to first ensure that the drives are in the cluster group with the SQL Server service. Then, you need to make the SQL Server resource dependent on the new drives because SQL Server can write only to drives that it knows are available. If the drive fails, then SQL Server too must fail or you risk database corruption.
To set a SQL Server resource to be dependent on the new drives, you must first take the SQL Server resource offline by right-clicking the resource in the MMC Cluster Administrator snap-in, then selecting Take Offline. This selection stops the SQL Server service, resulting in downtime for applications connecting to your database. Next, right-click the resource, and select Properties. On the SQL Server Properties screen, you can select the Dependencies tab to view the current dependencies. Click the Modify button to add new drives.
To find out which clustered drive resources are accessible to the SQL Server instance, you can use the T-SQL function fn_servershareddrives(). In the query
the function will list the drives that SQL Server can access in the cluster. You can use this function for troubleshooting if you've forgotten to create a dependency.
Now that you've set up the cluster and installed and configured your first SQL Server instance, you have a single-instance cluster. To create a multiple-instance cluster, you simply install another instance of SQL Server on a separate node.
6: Installing Service Packs
Even though you've successfully set up your cluster, you're not finished yet. The last step in creating your cluster is to install Win2K Service Pack 4 (SP4) and SQL Server 2000's latest service pack, SP3a. Win2K SP4 includes many clustering fixes and solves some frustrating problems you might encounter, such as a mysterious failure when SQL Server loses drive connectivity. SQL Server SP3a also provides clustering fixes and is as simple to install on a cluster as it is to install in a typical SQL Server environment.
One final note about running your cluster long term: If your SQL Server is dedicated to database processes (i.e., it's not a Web server), you don't need to install antivirus software on the server. Installing antivirus software on your SQL Server might cause problems when a failover occurs. For example, during a failover, the antivirus software sees new drives on the standby node and begins to scan them. If you have drives that contain hundreds of gigabytes or terabytes of data, the antivirus scan can force your processors to spend too much time scanning and not enough time taking SQL Server requests, resulting in degraded performance during the scan. If you want to run antivirus software on your SQL Server, as many corporations mandate, make sure you don't scan your quorum drive or data, log, or backup files.
Clustering is not a mysterious art; it's a straightforward process that you can break into manageable steps. Although this article describes the bulk of what you might encounter when installing a cluster, you might need to practice deploying your cluster a couple of times before you find all the quirks of your hardware.