Using LOCK_TIMEOUT to Skip Locked Data

Besides the READPAST hint, SQL Server 7.0 has a SET option that gives you another way to skip locked data. The option, LOCK_TIMEOUT, tells SQL Server not to wait more than a specified number of milliseconds for a lock to be released. Setting LOCK_TIMEOUT to zero means that SQL Server won't wait at all if it finds any locked data. The documentation for LOCK_TIMEOUT implies that it's interchangeable with READPAST, but there's a crucial difference. READPAST skips individual locked rows. Setting LOCK_TIMEOUT to zero causes SQL Server to stop processing the statement when it finds a locked row.

Although the LOCK_TIMEOUT setting might sound good, use it with extreme caution. The timeout doesn't automatically roll back a transaction. So if SQL Server reaches its lock timeout value, it stops trying to modify rows in the current table and moves on to the next statement. Instead of the transaction being an atomic, all-or-nothing operation, you might be left with part of the transaction incompletely executed. If you want the transaction to be all-or-nothing, you need to include a specific test for error 1222 and include an explicit ROLLBACK TRANSACTION as the action to perform when SQL Server encounters the error. You might consider adding this test to all data-modification statements in applications that adjust the LOCK_TIMEOUT value. This test is the only way to guarantee that the transactions maintain their consistency.
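The pattern described above can be sketched as follows; the table, column, and values are hypothetical:

```sql
-- Fail immediately on locked data instead of waiting.
SET LOCK_TIMEOUT 0

BEGIN TRANSACTION

UPDATE Orders              -- hypothetical table
SET Status = 'Shipped'
WHERE OrderID = 1001

-- Error 1222 = "Lock request time out period exceeded."
-- Without this test, the batch would continue and leave the
-- transaction partially executed.
IF @@ERROR = 1222
BEGIN
    ROLLBACK TRANSACTION
    RETURN
END

COMMIT TRANSACTION
```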

Within SQL Server's storage engine, the T-SQL BULK INSERT statement, bcp, and the Data Transformation Services (DTS) Bulk Insert task all execute the same code. The T-SQL BULK INSERT statement can perform faster than the others because BULK INSERT executes completely within the SQL Server process. Bcp and DTS, utility programs that execute in their own processes, must bear the overhead of interprocess communications to pass data from the source text file to SQL Server. When you run bcp or DTS on a client computer rather than on the SQL Server system, the need to copy the data across a network connection adds significant overhead and further slows the bulk copy operation.

In my test results, the potential advantage of the T-SQL BULK INSERT statement didn't always translate into faster loading. According to Microsoft Product Support Services (PSS), the advantage is significant with very large tables (Microsoft routinely tests with 50 million–row tables), when you load multiple tables into a database in parallel, and when you use file groups to segregate onto different disk volumes the tables that you plan to load in parallel. Other factors also influence the overall performance of a bulk copy operation: Floating-point fields are faster than character fields, and files created to store data in SQL Server's native, internal, binary data format import more quickly than files in external ASCII character format.

A Database by a Different Name

Q. I'm having a problem restoring a database. I backed up database ABC from server1 to the abc.bak file. Then, I created a new database called XYZ on server2. The servers belong to different companies and aren't connected to each other. I wanted to restore the abc.bak backup on the XYZ database on server2, so I selected Restore, From device, and on the Option tab, I selected Force restore over the existing database. In the option below the Force restore option, I specified current data and log file path for Move to Physical File Name. However, the Logical File Name shows ABC database names. If I try to change the ABC database names to XYZ names, the restore fails. But if I keep the ABC database names, the restore works. How can I overcome the Logical File Name problem?

A. You can't change the logical name of a file during a restore, so you must use the existing database names. Simply restore the database by using the ABC database names, and move the files to their new locations. You can then use the following ALTER DATABASE statement to change the logical names to XYZ names:

-- The logical file name here is an example; substitute your own
ALTER DATABASE XYZ
MODIFY FILE (NAME = 'ABC_Data', NEWNAME = 'XYZ_Data')

Note that this solution works only on SQL Server 2000, not on SQL Server 7.0.
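The restore step itself, keeping the original logical names while moving the files to new locations, might look like this; the backup path, logical names, and physical paths are assumptions for illustration:

```sql
-- Restore the ABC backup over the XYZ database. The MOVE clauses
-- reference the original (ABC) logical names but place the files
-- wherever you want them on server2.
RESTORE DATABASE XYZ
FROM DISK = 'C:\Backup\abc.bak'
WITH REPLACE,
    MOVE 'ABC_Data' TO 'C:\Data\XYZ_Data.mdf',
    MOVE 'ABC_Log' TO 'C:\Data\XYZ_Log.ldf'
```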

Hidden Treasure

You often need to query the master database for all tables or databases and perform an action on the results, such as DBCC CHECKDB. Traditionally, you used a cursor to perform this type of query, but a query this size leaves a lot of room for error. In SQL Server 6.5 and 7.0, Microsoft ships a set of undocumented stored procedures that let you perform the query in one line.

Two little-known stored procedures, sp_MSforeachtable and sp_MSforeachdb, ship with SQL Server 6.5 and 7.0 and let you substitute a question mark for a table or database name. You can run DBCC CHECKTABLE on each table in the database you're in by using the one-line query

sp_MSforeachtable "print '?' DBCC CHECKTABLE ('?')"

Microsoft has improved sp_MSforeachdb in SQL Server 7.0. In SQL Server 6.5, you have to create a temporary table, populate the table, then run sp_MSforeachdb. However, in SQL Server 7.0, you can run the following one-line query to perform a DBCC CHECKDB on all databases:

sp_MSforeachdb @command1="print '?' DBCC CHECKDB ('?')"

Safely Changing SQL Server's Service Account

When you use the Control Panel Services applet to change the service account that SQL Server 7.0 runs under, beware. You might need to change the server account from a local system account to a domain user account, for example, to enable email capability with a Messaging API (MAPI)-compliant email system. But using the Services applet to change the account causes Full-Text Search to stop working.

To safely change the service account SQL Server runs under, open the server's Properties sheet in Enterprise Manager, then select the Security tab. In the Startup Service Account box, make the appropriate account selection. If you want to change the account to a domain user, you must be logged in as the account that you're changing SQL Server to run under (i.e., domain\user_account). If you're not logged in as the domain user that you're setting the account to, you'll receive the following error message:
A required privilege is not held by the client.

If you've already changed the account under the Services applet and Full-Text Search isn't working, follow the above procedures, then rebuild the full-text index for the table.

Difficult Data Types

Q. I have a table with a decimal data type in a field called COD. If I submit a query with a filter that passes a number, as in

SELECT * FROM table1 WHERE COD = 123

then SQL Server applies an index scan and returns the results very slowly. If I use a decimal point after the number, as in

SELECT * FROM table1 WHERE COD = 123.

then SQL Server applies an index seek and returns the results much faster. Why does using the decimal point make a difference in the way SQL Server handles the query?

A. If you don't use a decimal point at the end of the number, then SQL Server sees the number as an integer value and needs to convert the data to compare it to an integer. If you use the decimal point, SQL Server correctly sees the data type as numeric (which is what the decimal data type is) and can use an index seek, which is a faster method than using an index scan in this case.
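To avoid relying on the trailing decimal point, you can make the conversion explicit; the column's precision and scale here are assumptions:

```sql
-- Comparing against a value that's already the column's type
-- lets the optimizer use an index seek on COD.
SELECT *
FROM table1
WHERE COD = CAST(123 AS decimal(9, 0))
```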

Log Shipping Tips

Large databases present the greatest challenges when you're performing management tasks, including log shipping. Here are some of the things I've learned through experience.

Size matters. Databases larger than, say, 20GB can take a long time to back up, copy, and restore. A 30GB database can take more than an hour to back up and can produce a backup file nearly as large as the database itself. I found that, over time, setting up log shipping for large databases one by one was easier than trying to set them up all at once.

Avoid interference. Long-running database backups block scheduled transaction log backups, producing error messages in the SQL Agent jobs. I decided to schedule transaction log backups only outside the database backup window.

The database as a stage. Loading large amounts of data into the production server's databases and performing massive updates on the loaded data often dramatically increased the size of the transaction log backup files. I loaded data that needed massaging into a staging database that didn't need log backups and shut down the transaction log backup jobs on other log-shipping databases during large data loads.

No reporting, please! You can manually restore the database and transaction logs to the standby server with STANDBY, which puts the database into read-only mode, or with NORECOVERY, which leaves the database in a non-operating mode. However, in STANDBY mode, if any user is querying the database or maintaining any kind of connection to it, the log-shipping load job can't restore any logs because restoring a transaction log requires exclusive use of the database. Because our transaction log backup frequency was 5 minutes, we had to implement a policy that users (and developers) can't access the standby server databases.

Multiple databases with STANDBY. When you specify a database restore with STANDBY, you must also specify an undo file name. I found that if I tried to restore more than one database at a time, the databases couldn't use the same undo filename. If they did, the restores failed. Perform your full database restores with NORECOVERY instead.
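In code, the two restore modes discussed above look roughly like this; the database, share, and file names are assumptions:

```sql
-- Initialize the standby database with NORECOVERY (no undo file,
-- database left in a non-operating mode).
RESTORE DATABASE MyDB
FROM DISK = '\\backupserver\share\MyDB_full.bak'
WITH NORECOVERY

-- Apply a log backup in read-only STANDBY mode. Each database
-- being restored this way needs its own undo file.
RESTORE LOG MyDB
FROM DISK = '\\backupserver\share\MyDB_log1.trn'
WITH STANDBY = 'C:\Undo\MyDB_undo.dat'
```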

Yet another service pack. When you install a SQL Server 7.0 service pack, all databases must be in a recovered mode. Consequently, to apply a service pack, you must recover the databases, apply the service pack, reinitialize the standby server's databases from a full database backup, then restart log shipping.

Get Real About Data Modeling

When you're creating the physical model, place yourself in a production environment and salt your model with a healthy dose of pragmatism. For example, in a perfect world, each salesperson would represent one of your vendor companies. But in the real world, you might meet a salesperson and want to record that salesperson's name and phone number in your database before you start buying from (and consequently add data about) the represented company. Also, salespeople change jobs; if there's little difference between the companies, you might prefer to maintain your business relationship with the salesperson. For example, if you constrain the relationship between Vendor and SalesPerson with a composite primary key of VendorID + PersonID, you can't enter the salesperson's information in the SalesPerson table because the associated company isn't in your database. You can't add a record to a table until you have the entire primary key. So, rather than restrict what you and your data users can do, you model your database to accommodate the world's imperfections.

Profiler Trace Paused

Q. When I use SQL Server Profiler to trace the execution of a procedure, including each statement the procedure executes, Profiler generates TraceAutoPause events. These events seem to stop Profiler from tracing until a TraceAutoRestart event occurs. Why is Profiler generating the TraceAutoPause events?

A. Occasionally, Profiler can't consume the information it's tracing fast enough. Rather than slow down SQL Server or run the risk of capturing only intermittent commands (which would be worse than an explicit warning that Profiler has paused the trace), Profiler stops producing events until it can read all existing events from the Profiler queue. Event production starts again when Profiler catches up.

Here's a way around the pause. You might have noticed the Server Processes SQL Server trace data check box on the General tab of Profiler's Trace Properties dialog box. Selecting this check box causes SQL Server to capture all events and write them to a trace file. SQL Server captures all events when it runs a trace in this mode, but you run the risk of slowing SQL Server performance under heavy trace-event loads.

DTS Migration from Access

Beware when trying to import Microsoft Access tables into SQL Server 2000 by using the DTS wizard. After you successfully create the tables, the indexes that were on the tables in Access no longer exist, and you have to rebuild each index manually in SQL Server. DTS won't move all of your schema (including indexes) unless the source is SQL Server. DTS will move the base table structure, but it won't move other important schema components, such as indexes. You can move the indexes manually or use the Access Upsizing Wizard. If you choose to use the Access Upsizing Wizard, you need to upgrade to the newest online version of the wizard, described in the Microsoft article "OFF2000: Administrative Update Available for Access 2000 and SQL Server 2000 Readiness" (http://support.microsoft.com/support/kb/articles/q287/4/84.asp). Be careful: The Access 2000 Upsizing Wizard version that ships in the box doesn't work with SQL Server 2000. Even the downloadable wizard does a poor job of upsizing SQL Server schema. A reasonable compromise might be to upsize the Access database .mdb file to a new SQL Server database simply to generate the indexes. Then, you can use Enterprise Manager to script out the Access-generated indexes and apply them to the SQL Server database.

The LIKE Clause Problem

Every once in a while, the "LIKE clause problem" shows up in the SQL Server newsgroups. Say your users need to perform string searches against the title_id field in the Pubs database. They need the complete functionality that the LIKE clause provides, including the ability to use wildcards. Entering the following SELECT statement against the Pubs database shows that some title_ids begin with the letter P:

SELECT * FROM titles WHERE title_id LIKE 'P%'

Now, let's duplicate the LIKE clause problem. Declare a local variable to accept the value the user types in, then use the local variable with the LIKE clause in your SELECT statement:

DECLARE @i char(10)
SELECT @i = 'P%'
SELECT * FROM titles WHERE title_id LIKE @i

When you execute the query, you get no records in your resultset. You know the records exist, so what's the problem? Char columns are fixed-length columns. When you store a shorter string than the column can hold, SQL Server automatically pads the rest of the column with spaces. So SQL Server interprets the SELECT statement above as if you had entered

SELECT * FROM titles WHERE title_id LIKE 'P% '

This SELECT statement asks for title_ids that begin with 'P' and end with eight spaces—and of course, none exist. The solution is simply to use varchar columns for local variables that you use with the LIKE clause. The following code snippet changes the data type of @i from char to varchar and solves the problem:

DECLARE @i varchar(10)
SELECT @i = 'P%'
SELECT * FROM titles WHERE title_id LIKE @i

I recommend that you always use varchar local variables or parameters to contain strings you use with the LIKE clause.

Flexible Extended Properties

Q. When you use Enterprise Manager in SQL Server 2000 to create a table, the bottom half of the screen lists several properties of the selected column: Description, Default Value, Precision, Scale, Identity, Identity Seed, Identity Increment, Is RowGuid, Formula, and Collation. How can I use a SELECT statement or function to return the Description property for a particular column?

A. Enterprise Manager creates and stores the Description property as an extended property. You can use extended properties to store application- or site-specific information about the database and the following database objects: tables, views, procedures, functions, defaults, rules, columns, parameters, indexes, constraints, and triggers.

You use three system stored procedures and a function to create and manage extended properties:

  • sp_addextendedproperty
  • sp_updateextendedproperty
  • sp_dropextendedproperty
  • fn_listextendedproperty()

Enterprise Manager uses these commands for creating, managing, and retrieving the description property. Figure 1 shows how to add a description for the au_id column of the authors table in the Pubs database. The following query shows how you can use fn_listextendedproperty() to retrieve the extended property information you just added:

SELECT *
FROM ::fn_listextendedproperty (NULL, 'user', 'dbo',
'table', 'authors', 'column', default)

This code produces a table that describes the extended property, named MS_Description. This naming taxonomy is consistent with objects that you name through Enterprise Manager. For more information about extended properties, see Bill Vaughn, "Managing Extended Properties," July 2001, and the "Property Management" topic in SQL Server Books Online (BOL).
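The add step that Figure 1 illustrates can be sketched as follows; the description text is an example:

```sql
-- Parameters, in order: property name and value, then the
-- level-0, level-1, and level-2 object types and names.
EXEC sp_addextendedproperty
    'MS_Description', 'Author ID - primary key',
    'user', 'dbo',
    'table', 'authors',
    'column', 'au_id'
```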

Backing Up an MSDE Database

You can use several techniques to back up an MSDE database. One of the easiest is to use Enterprise Manager wizards to set up a SQL Server Agent maintenance job. But if you don't want to set up a job, you can easily write the backup code yourself. You can let users initiate the backup, or have the application perform it behind the scenes.

My sample application (part of which Listing 1 shows) lets users choose when to back up, but the application reminds them to back up if they haven't done so for a couple of days. Because this database doesn't have much data and changes are infrequent, skipping a couple of days shouldn't be a problem. The code at callout H shows the routine to back up the database. Note that I'm using an open ADO Connection cn to execute the backup action query. This code doesn't use SQL-DMO; it simply executes T-SQL's BACKUP DATABASE command with a few options to make sure the database backup file is initialized each time it's backed up. If you have a more complex or more heavily used database, you probably won't want to use the WITH INIT option but instead will let the BACKUP command append each backup to the existing backup set. That way, you can restore the database from any of the existing backups. The sample code also records the date of the last backup to the registry so that you can keep track of when to prompt for another backup.
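The backup statement at the heart of that routine (callout H isn't reproduced here) is roughly the following; the database and file names are assumptions:

```sql
-- WITH INIT overwrites the previous backup each time rather than
-- appending to the backup set, keeping the file small. Drop INIT
-- if you want to retain multiple restore points.
BACKUP DATABASE MyMSDEDb
TO DISK = 'C:\Backup\MyMSDEDb.bak'
WITH INIT
```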

When you set up your database script, be sure the user has rights to back up and restore the database. If you set up a special admin account, you'll have to use this account to open another connection before attempting the backup.

A Simple Tool for SQL Server 2000 Log Shipping

Enterprise Manager's log shipping utility is available only for the SQL Server 2000 Enterprise Edition and SQL Server 2000 Developer Edition. So can you log-ship databases that run on SQL Server 2000 Standard Edition? Microsoft's answer for the Standard Edition is the Simple Log Shipper tool, which you can find in the Microsoft SQL Server 2000 Resource Kit. Simple Log Shipper uses a linked-server relationship between the primary server and the secondary server. The log shipping activity is controlled from a stored procedure called sp_ShipLog, which you run in a SQL Server Agent job on the primary server. This stored procedure backs up the primary server database's transaction log to a Universal Naming Convention (UNC) file location and makes a remote procedure call (RPC) to sp_ApplyStandbyLog on the secondary server to restore the log file to the secondary database. These stored procedures typically are located in each server's master database, but that location isn't a requirement. Simple Log Shipper doesn't provide cleanup for transaction-log backup files, so you have to archive them and delete the old ones manually. You can monitor Simple Log Shipper by reading the primary server's SQL Server Agent job history, SQL Server error log, and Windows event log.

Upgrading from SQL Server 7.0 to 2000

You can take any of three approaches to upgrading from SQL Server 7.0 to 2000. First, you can simply upgrade in-place, replacing your SQL Server 7.0 installation with SQL Server 2000, so that all the databases are upgraded and functioning. Alternatively, you can install a named instance of SQL Server 2000 on the server and run it side by side with SQL Server 7.0. (Note that you can't run SQL Server 7.0 and the default instance of SQL Server 2000 at the same time on one computer.) Then, you can transfer the databases one at a time to the SQL Server 2000 instance. The third option is to install SQL Server 2000 on a new computer and move the databases over to the new computer. While you're upgrading, you can also change to a different edition of SQL Server. For example, you can change from SQL Server 7.0 Standard Edition to SQL Server 2000 Enterprise Edition.

The first upgrade option, the in-place upgrade, removes your SQL Server 7.0 installation, so make sure that you have backups of the databases and the SQL Server 7.0 installation before you begin the upgrade, just in case. The new instance of SQL Server 2000 is the default instance.

The second and third options let you keep SQL Server 7.0 up and running, with no risk of downtime for the production databases. You can then transfer the databases and users while retaining the SQL Server 7.0 versions as a fallback option. You can move the databases manually, using a detach/reattach strategy, but the Database Copy Wizard offers an even easier option. It builds the detach and reattach jobs as a DTS package. You can copy not only the databases but also the logins, users, and security information.
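The manual detach/reattach route mentioned above can be sketched like this; the database name and file paths are assumptions:

```sql
-- On the SQL Server 7.0 server, after disconnecting all users:
EXEC sp_detach_db 'MyDB'

-- Copy MyDB.mdf and MyDB_log.ldf to the SQL Server 2000
-- instance's data directory, then on that instance:
EXEC sp_attach_db 'MyDB',
    'C:\MSSQL2000\Data\MyDB.mdf',
    'C:\MSSQL2000\Data\MyDB_log.ldf'
```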

Note that to avoid losing data, you must disconnect the users from the SQL Server 7.0 database before you move it, then reconnect them to the SQL Server 2000 database when it's ready. From that point on, the SQL Server 7.0 database won't be updated unless you specifically set up replication or distributed transactions to keep it up-to-date. As always, careful planning is the key to upgrade success.

If you elect to use the upgrade method that lets you run a named instance of SQL Server 2000 alongside SQL Server 7.0, tools such as Enterprise Manager and Query Analyzer will be upgraded to the 8.0 (SQL Server 2000) versions. Your Start menu will show two program groups, one for SQL Server 2000 and one for SQL Server 7.0. Many entries in the SQL Server 7.0 group will be modified to point to the 8.0 tools. The BOL entry in the SQL Server 7.0 program group will continue to point to SQL Server 7.0 BOL.

If you have replication set up, you have to use the in-place upgrade. The alternative is to remove and reestablish your replication setup.

Running Virus-Scanning Software on the Same Server as SQL Server

Let's evaluate three kinds of scanning software—active virus scanning, virus sweep, and vulnerability scanning—to see how they apply to a SQL Server installation.

Active virus-scanning software. We strongly recommend active virus scanning for SQL Servers that

  • run on the Internet
  • read and execute files from other servers
  • run HTTP servers such as Microsoft IIS or Apache (e.g., if you use SQLXML to access SQL Server through HTTP on IIS)
  • host file shares
  • use SQL Mail to handle inbound or outbound email messages

The main performance hit takes place at file open and close, and although active virus scanning can cause a perceptible drain on a heavily loaded system, we recommend it for safety purposes.

Virus-sweep software. If the virus-sweep software has no way to exclude SQL Server database files from its scan, the software can degrade SQL Server performance by flooding the I/O subsystem with reads, especially on servers with large numbers of files. And if the sweep has opened a file for scanning and you start SQL Server, the software could mark the database as suspect, rendering the database unavailable for access. This type of software catches viruses after they've invaded your hard disk, so your organization is better off reducing the risk of infection in the first place by adopting sound security practices.

Vulnerability-scanning software. Vulnerability-scanning software has been available for HTTP servers for some time. Internet Security Systems (ISS), for example, offers a scanner for SQL Server 7.0 and 6.5. (You can download a trial version at http://education.iss.net/products/dbsql.) This kind of virus scan is a great way to lock down potential holes. We're not aware of any SQL Server performance loss when you use vulnerability-scanning software.

We also recommend that you subscribe to Microsoft's e-mail security alert alias. For information about how to subscribe, go to http://www.microsoft.com/security, then click Security Bulletins under For IT Professionals.

Importing a Word Table into SQL Server

Have you ever needed to move data from a Word table into SQL Server? If so, you know that Word has no OLE DB provider, so DTS can't easily access a Word document. Your best bet is to write a VBScript macro to connect Word to SQL Server. Using the VBScript macro, you can load the data directly into SQL Server without using a SQL Server utility.

Alternatively, you can partially automate the import by copying the Word table into Microsoft Excel, then exporting the data to DTS. Follow these steps:

  1. In Word, select a table by clicking the Table Select icon in the table's top left corner, then press Ctrl+C to copy the table.
  2. In Excel, paste the table into a workbook by clicking the top left cell (A1), then pressing Ctrl+V.
  3. To transpose the newly pasted table into a single row, copy it, tab to a new worksheet, click in the A1 cell, click the drop-down arrow on the paste icon, then select the Transpose option. The transposing action turns each column into a row.
  4. Select the data row, then paste it to the end of the master sheet. When all tables have the same form and structure, include the row headers only the first time you paste a row into the sheet.
  5. Repeat Steps 1 through 4 for each Word table.

When you've completed this process, you'll have a worksheet with one title row and all the data rows in a format that DTS can load. If you're planning to export many Word tables and you want to avoid keystrokes, you might consider using macro automation to drive both Word and Excel.

Choose Custom SQL Server 2000 Installation

In the Setup Type dialog box, which Figure 2 shows, the Installation Wizard asks you to choose among three installation types: Typical, Minimum, and Custom. If you choose Typical or Minimum, SQL Server uses default options for components and subcomponents, collations, and Network Libraries. Because of the Typical installation's potential to cause tricky problems, I strongly recommend that you always choose Custom even if you think that the defaults meet your installation needs. Some of the aforementioned options—particularly collation settings—are very hard to change after installation if the defaults turn out not to meet your needs. Custom installation lets you double-check those options. Walking through a few more dialog boxes is a small price to pay to make sure that you select the right installation options.

When to Use Multiserver Jobs

Multiserver jobs work well when you need to run the same job on multiple servers—for example, when you need to back up the system's databases or multiple replicated copies of a database on different servers. As federated databases grow in popularity, you might find even more uses for multiserver jobs. In a federated database, a table's data is partitioned among multiple databases on multiple servers. If a federated database has the same name on each server—but you can't assume that it does—a multiserver job is a great way to back up an entire distributed table. If you have any influence on design, make sure that the database uses the same name on each server instead of, for example, Customers1, Customers2, and so on.

Multiserver jobs aren't necessarily as useful when you have several servers, each with a different set of databases. You couldn't set up the same job for different databases unless you wrote a script and set up a job to cycle through the user databases one by one on each server.

Because Microsoft decided not to let SQL Server Personal Edition be a master server or a target server, the company has eliminated a couple of possible scenarios in which multiserver jobs could be very useful. For example, you can't set up a computer with SQL Server and Windows 2000 Professional to act as an administrative center and push jobs to the production servers. Also, you can't use the multiserver job functionality to control jobs for a group of users who have SQL Server Personal Edition on their desktop machines, although these users are probably least likely to be able to set up their own maintenance tasks. Once again, marketing decisions, rather than technical issues, affect how you administer your SQL Server network.

Scheduling the Log Reader Agent

If you have multiple replicated databases that you don't have to update in realtime, you typically schedule your Distribution Agent to run hourly or even once a day. This practice decreases the processor load on your SQL Server machine. You can further alleviate the processor load if you also schedule your Log Reader Agent instead of running it continuously, which is the default.

Set Log Reader Agent start times so that they don't overlap. SQL Server will execute one Log Reader Agent job at a time in a cycle that encompasses all the replicated databases. Thus, you change many jobs executing in parallel into a series of jobs that cycle through the databases.

Besides decreasing the processor load, another less obvious advantage of using scheduled Log Reader Agents is that now you can access the Log Reader Agent history and see, for example, the time of day when the maximum number of transactions occurred.

Listing 2 shows the code that will schedule all Log Reader Agents to run one after the other. The start time is arbitrary; for this example, I chose midnight plus the recurrence interval. The interval between jobs depends on the number of replicated databases and the cycle time, which you tailor so that the jobs won't overlap. If you have four replicated databases and you choose to schedule the Log Reader Agent to run at 15-minute intervals on each database, the stagger interval can be between 1 and 3 minutes. To include more replicated databases in the cycle, you must increase the total length of the cycle or decrease the stagger interval to accommodate all the jobs. In the example that Listing 2 shows, you could extend the number to 14 databases running the Log Reader Agent at 1-minute intervals.
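Listing 2 isn't reproduced here, but the staggering idea can be sketched with msdb's sp_update_jobschedule; the job name, schedule name, and times are assumptions:

```sql
-- Run one database's Log Reader Agent job daily, every 15 minutes,
-- starting at midnight. The job and schedule names below are
-- hypothetical; check your server's actual replication job names.
EXEC msdb.dbo.sp_update_jobschedule
    @job_name = 'REPL-LogReader-MyDB1',
    @name = 'Replication agent schedule.',
    @enabled = 1,
    @freq_type = 4,             -- daily
    @freq_interval = 1,
    @freq_subday_type = 0x4,    -- repeat in units of minutes
    @freq_subday_interval = 15,
    @active_start_time = 0      -- 00:00:00
-- Stagger the next database's job by setting its
-- @active_start_time to 200 (00:02:00), the next to 400, and so on.
```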

Listing 3 shows the stored procedure in the master database that will let the code in Listing 2 schedule all Log Reader Agents on the server. To reset the Log Reader Agents to the default (i.e., running continuously), run the stored procedure with no parameters besides the database name.

Migrating Backward

Q. I need to migrate my SQL Server 2000 database for use on a SQL Server 7.0 database. I haven't used any SQL Server 2000-specific features. What's the best way to move this database back to SQL Server 7.0?

A. You can migrate the database in three ways, depending on the volume of data involved and how often you will perform this move. If you have small amounts of data, you can set up a linked server (SQL Server 7.0 linked to SQL Server 2000). On the SQL Server 7.0 machine, use SELECT INTO against the linked server to move data table by table. Or if you have a lot of data, use the SQL Server 2000 bulk copy program (bcp) to pull data out of SQL Server 2000 in Native format and write the data into SQL Server 7.0. You can easily write a bit of code to generate an appropriate batch file to copy the data to SQL Server 7.0, as the following code sample shows:

SELECT 'bcp.exe "' + TABLE_CATALOG + '.[' +
TABLE_SCHEMA + '].[' + TABLE_NAME + ']" OUT
"C:\TEMP\' + TABLE_NAME + '.bin" -n -T'
FROM INFORMATION_SCHEMA.TABLES
-- To generate the statements for input at the
-- SQL Server 7.0 server
SELECT 'bcp.exe "' + TABLE_CATALOG + '.[' +
TABLE_SCHEMA + '].[' + TABLE_NAME + ']" IN
"C:\TEMP\' + TABLE_NAME + '.bin" -n -T'
FROM INFORMATION_SCHEMA.TABLES

Native mode (the -n switch) is release-specific, so either use the bcp.exe that comes with the earliest SQL Server release involved (in this case, SQL Server 7.0) or move the data in character mode with the -c flag. If you're crossing character-set boundaries, consider using the -w flag so that you can apply Unicode data formats. However, using Unicode data formats can considerably expand the size of the transferred files and increase the processing required to do the task. Use Enterprise Manager 2000 to script out definitions for all objects (e.g., tables, indexes, primary keys, foreign keys, triggers, stored procedures). Then, run the scripts against SQL Server 7.0 to create the empty schema. Finally, if moving the data is an ongoing requirement, consider a third way to migrate your data—replication from SQL Server 2000 to SQL Server 7.0.

A faster alternative to bcp for loading data is BULK INSERT. However, BULK INSERT's error handling isn't as strong as bcp's error handling, and typically you would use BULK INSERT only with very clean data. To use BULK INSERT to load data, modify the earlier sample SELECT statement as follows:

SELECT 'BULK INSERT ' + TABLE_CATALOG + '.['
+ TABLE_SCHEMA + '].[' + TABLE_NAME + '] FROM
''C:\TEMP\' + TABLE_NAME + '.bin'' WITH (DATAFILETYPE=''native'')'
FROM INFORMATION_SCHEMA.TABLES

Before you use any of these methods to migrate your data, examine your data. If the data contains IDENTITY columns or NULL values on fields whose default values you need to retain, remember to handle them appropriately. Use the -E flag to control the way bcp.exe handles identities (or use KEEPIDENTITY for BULK INSERT), and use the -k flag to preserve NULLs with bcp (or the KEEPNULLS option for BULK INSERT). You need to consider these settings when you use the IDENTITY field to generate surrogate keys for linking tables because if the value changes, the link between the rows could be lost.

One final comment: These examples assume the use of the default SQL Server. If you're unloading the data from or loading the data into a named instance of SQL Server or from a remote SQL Server, you should add the -S flag to the generated bcp.exe statements to specify the server name.
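Putting these bcp options together, the generated load commands might look like the following sketch; the server name MyServer\SQL70 and the C:\TEMP path are placeholders, and the WHERE clause simply keeps views out of the generated list:

```sql
-- Generate load commands that keep identity values (-E), keep NULLs (-k),
-- and target a named instance (-S). All names here are illustrative.
SELECT 'bcp.exe "' + TABLE_CATALOG + '.[' + TABLE_SCHEMA + '].['
   + TABLE_NAME + ']" IN "C:\TEMP\' + TABLE_NAME
   + '.bin" -n -T -E -k -S MyServer\SQL70'
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
-- The BULK INSERT equivalents of -E and -k are the KEEPIDENTITY and
-- KEEPNULLS options: WITH (DATAFILETYPE='native', KEEPIDENTITY, KEEPNULLS)
```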

Views in Control

Q. I have a Microsoft Access 2000 application with SQL Server 2000 as the back end. To prevent Access users from viewing all the data in a SQL Server 2000 table, I want to use a view that displays only data rows that the user has been authorized to see. Do SQL Server views support such a scenario?

A. Yes, you can create a view that limits user access to SQL Server data—if each user logs in to Access with a unique userid. The following sample statement creates such a view:

CREATE VIEW v_data AS
SELECT a.*
FROM dbo.mytable AS a
INNER JOIN dbo.authtable AS b
ON (a.Pkey = b.DataKey
AND b.userid = suser_sname())

This view will restrict access based on userid and will require you to maintain a table (authtable) with the user name matched to specific primary keys in the data table (mytable). If your situation is less complex and doesn't require you to manage row access for multiple users, you could insert the userid column into the data table, as the following code shows:

CREATE VIEW v_data AS
SELECT a.*
FROM dbo.mytable AS a
WHERE a.userid = suser_sname()
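For either view to be effective, users must query the view rather than the base table. A minimal sketch, assuming the object names above and a hypothetical AccessUsers database role:

```sql
-- Let Access users read through the view only; deny direct reads
-- on the base table (AccessUsers is a hypothetical database role).
GRANT SELECT ON dbo.v_data TO AccessUsers
DENY SELECT ON dbo.mytable TO AccessUsers
```

Note that because dbo owns both the view and the table, ownership chaining lets the view run without the user needing (or being blocked by) permissions on mytable; the DENY guards only against direct queries on the table.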

A Quick Recovery

One of the biggest benefits of using the file and filegroup backup strategy is the ability to recover quickly from isolated disk failures. If only one file or filegroup is damaged, you can easily recover by restoring only the damaged files or filegroups.

Using the case study from the main article, imagine that only file PubsTestRWFile2 has been damaged. In this case, the database is suspect but most of the database files are still accessible. To recover up to the time of the failure, you proceed very much like you do with the whole database recovery, except you need to look at only one file: the damaged PubsTestRWFile2.

However, before you proceed, be aware of a problem that exists with read-only filegroups and isolated file and filegroup failures. During recovery, transaction log restores will fail if a filegroup (even one that isn't being restored) had the READONLY property set when the log being restored was backed up. The failure occurs only when you're attempting to restore a subset of the database (as in the case of isolated failures); it doesn't occur when you're recovering the whole database. I recommend that you avoid this problem by using filegroups for read-only purposes—perhaps even logically setting up a filegroup solely for read-only data.

I don't recommend enforcing read-only activity by setting the READONLY property. As of SQL Server 2000 Service Pack 2 (SP2), you can't restore transaction logs after an isolated failure occurs and only file or filegroup backups have been applied. One workaround (if you're in the middle of recovery and discover the problem) is to restore the entire database as in the main article. This solution can add a tremendous amount of downtime, but you won't have data loss and you'll be able to recover up to the minute of failure. Or, you can avoid this bug entirely by not using the READONLY attribute for filegroups.

If you want to experiment with this scenario, you can use the PubsTest case study to set the READONLY property for the ROFG filegroup. A section in the FileFilegroupStrategiesCaseStudy.sql downloadable script has this code commented out. Search for "MODIFY FILEGROUP ROFG READONLY" and remove the comments on this ALTER DATABASE statement. Next, run the entire script to recreate the PubsTest database. Finally, follow the strategy detailed here to recover from isolated disk failure. The script that Listing 4 shows will successfully restore the NewPubsTest database after an isolated failure only if the READONLY attribute has never been used for any filegroups within the database.

So, to recover up to the minute in a case of isolated failure in the PubsTestRWFile2 file, you need to start by recovering that file. In the example backups, you backed up the entire filegroup of which PubsTestRWFile2 is a member, so you can restore just this file from this filegroup. The last full filegroup backup was number 3. The syntax to restore this full file from the full filegroup backup is

RESTORE DATABASE PubsTest
FILE = 'PubsTestRWFile2'
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL\BACKUP\pubsbackup.dat'
WITH FILE = 3, -- This is the backup number by position.
NORECOVERY

The next step is to restore the latest differential backup of this file. Again, you backed up this file only as part of the RWFG filegroup. You can use the last differential backup of RWFG (number 11) to get the differential backup of this file:

RESTORE DATABASE PubsTest
FILE = 'PubsTestRWFile2'
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL\BACKUP\pubsbackup.dat'
WITH FILE = 11, -- This is the backup number by position.
NORECOVERY

Finally, you must apply the correct sequence of transaction log backups to roll forward the database to the time of the failure. Using the msdb query from the main article to determine the minimum effective log sequence number (LSN), you'll find that the first log backup you need to apply is number 12. In the event of an isolated failure in a large database, the file and filegroup backup strategy ensures rapid recovery and minimal downtime.
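The log-restore sequence against the same backup device might look like the following sketch; the backup numbers follow the example, and you'd apply every later log backup the same way before recovering:

```sql
-- Apply log backups starting with the minimum effective LSN (number 12).
RESTORE LOG PubsTest
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL\BACKUP\pubsbackup.dat'
WITH FILE = 12, NORECOVERY

-- ...apply any remaining log backups, then bring the database online.
RESTORE DATABASE PubsTest WITH RECOVERY
```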

Filegroups and the Large Database

Deciding what type of user data should go into your filegroups can take some planning. To start, I recommend that the primary filegroup contain only one file (an .mdf) that contains only the system tables (sysobjects, sysindexes, sysusers, and so on). For all user-defined objects, I recommend user-defined filegroups. Also, I recommend that you have only one transaction log file (an .ldf). Only one transaction log file is necessary because frequent log backups minimize the space the changes require. If you back up your log every minute, the log needs to hold only 1 minute's worth of log entries.

However, long-running transactions might require a larger log. The following three hints can help give you the best performance for your transaction log:

  1. Place the transaction log on a drive that's isolated from all other activity.
  2. Make sure the log's initial size is large enough to handle the bulk of the database activity that occurs between backups without autogrowth.
  3. If autogrowth is still necessary, make sure the autogrowth rate is set reasonably. If your database has several small autogrowths, the transaction log will be fragmented into many small virtual log files (VLFs). VLFs aren't visible—they're part of the transaction log and are for SQL Server's use only—but they can degrade performance.
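Hints 2 and 3 boil down to presizing the log and using a fixed growth increment instead of the default percentage. A sketch, assuming a hypothetical log file named PubsTestLog and sizes chosen for illustration:

```sql
-- Presize the log and set a fixed, reasonable growth increment so any
-- autogrowth is rare and doesn't create many small VLFs.
ALTER DATABASE PubsTest
MODIFY FILE (NAME = PubsTestLog, SIZE = 500MB, FILEGROWTH = 100MB)
```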

Group as many files as necessary to create a filegroup large enough to hold the data destined for that filegroup. Here are a few recommendations for filegroup assignments:

  1. One filegroup to hold a large table that might need backup and restore granularity at the table level
  2. One filegroup that holds read-only tables
  3. One filegroup for read/write tables
  4. One filegroup for text or image data (text, ntext, and image data types)
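Those assignments might be set up along these lines; the filegroup, file, path, and table names are all illustrative, and the TEXTFG filegroup is assumed to have been created the same way as ROFG:

```sql
-- A filegroup for read-only tables (organized logically, without
-- setting the READONLY property).
ALTER DATABASE PubsTest ADD FILEGROUP ROFG
ALTER DATABASE PubsTest ADD FILE
   (NAME = PubsTestROFile1,
    FILENAME = 'e:\data\PubsTestROFile1.ndf',
    SIZE = 100MB)
TO FILEGROUP ROFG

-- Place a table on the filegroup and route its text data to another.
CREATE TABLE dbo.PriceHistory
   (ItemID int NOT NULL,
    PriceDate datetime NOT NULL,
    Price money NOT NULL,
    Notes text NULL)
ON ROFG TEXTIMAGE_ON TEXTFG
```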

Ensure Msdb Accessibility

To make sure you have the information necessary to successfully (and easily) restore from the file and filegroup backup strategy I discuss in "Before Disaster Strikes," September 2002, you need to ensure the accessibility of msdb by backing it up often. However, by default, msdb's recovery model is set to Simple. This recovery model prevents quick, easy, and frequent log backups; the Simple recovery model is typically used for smaller databases or development databases where data loss isn't a problem. However, having access to msdb during recovery can simplify recovery. Regular backups of msdb and frequent log backups will ensure that the information within msdb is protected. To do this, you must change the recovery model. Unfortunately, even if you change the recovery model to Full, SQL Server Agent resets the recovery model back to Simple each time it starts up. To solve this problem, you can use the following series of steps:

  1. Create a job that resets msdb's recovery model on agent startup, then does a full database backup of msdb.
  2. Create a job that performs log backups every n minutes. I suggest something fairly frequent—for example, every 10 minutes.
  3. Copy the msdb backups (both the database and log backups) to an offsite location to ensure accessibility in case of site failure. The usage of msdb on another server will be limited because some tables within msdb rely on the servername. If msdb is being restored to another server solely for recovering backup history from sysbackuphistory, restore msdb to a database with a different name. If you need to restore and use msdb on another server, I recommend that you script out the jobs and rescript them in. However, the following query will help you update the msdb database to use the new server's name and will update all jobs on the new server:
UPDATE msdb.dbo.sysjobs
SET originating_server = 'NewServerName'
WHERE originating_server = 'OldServerName'
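Steps 1 and 2 might be implemented with job steps along these lines (the backup paths are placeholders):

```sql
-- Step 1 job (runs on agent startup): restore Full recovery,
-- then take a full backup of msdb.
ALTER DATABASE msdb SET RECOVERY FULL
BACKUP DATABASE msdb TO DISK = N'C:\MSSQL\BACKUP\msdb_full.bak'

-- Step 2 job (runs every 10 minutes): back up the msdb log.
BACKUP LOG msdb TO DISK = N'C:\MSSQL\BACKUP\msdb_log.bak'
```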

The Proper Authority

A primary condition for enabling SSL encryption is that your server and clients must have a digital certificate from a trusted root Certificate Authority (CA). The server and client certificates must be from the same CA. Most Windows systems have Microsoft Certificate Services installed on the PDC, but you can use a third-party provider such as VeriSign. For the example in this article, I used Microsoft Certificate Services to issue certificates for all clients inside the company.

You can choose from two major types of CA: an enterprise root CA or a standalone root CA. A standalone CA doesn't require Active Directory (AD) services, which makes it more popular; many small and mid-sized companies don't use AD. By default, a standalone CA collects certificate requests, then processes them. The CA administrator must approve or deny each request for a certificate. An enterprise CA processes each request immediately.

To request a certificate, SQL Server must run under a domain service account, not the default LocalSystem service account. To verify that you're using the proper account, right-click the name of the SQL Server instance in Enterprise Manager, choose Properties, and select the Security tab. Verify that the option This account is selected and that the proper domain account name is specified.

Do Consider a DSS

Indexing in a transactional environment is a balancing act. You try to enhance query response by creating indexes—but doing so decreases performance of inserts, updates, and deletes. If your querying compromises the database's transactional requirements, and if you decide that you can't resolve the two conflicting requirements, consider creating a separate decision support system (DSS). A DSS is essentially a database and a set of queries that produce reports. The DSS can be a simple copy of the transactional database enhanced by tables of summary data and views that make running reports simpler and quicker than using the transactional database. The DSS doesn't need to be a full-fledged data mart or data warehouse running on its own server. The DSS can share the same hardware platform that the transactional database is running on as long as the hardware is robust enough to support all the user connections.
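The "tables of summary data" can be as simple as aggregates rebuilt on a schedule. A sketch with hypothetical Sale and SaleItem tables feeding a DailySalesSummary table that reports query instead of the transactional tables:

```sql
-- Nightly rebuild of a reporting summary (all names are illustrative).
TRUNCATE TABLE dbo.DailySalesSummary
INSERT dbo.DailySalesSummary (SaleDate, ProductID, UnitsSold, Revenue)
SELECT CONVERT(char(8), s.SaleDate, 112), si.ProductID,
       SUM(si.Quantity), SUM(si.Quantity * si.UnitPrice)
FROM dbo.Sale AS s
JOIN dbo.SaleItem AS si ON si.SaleID = s.SaleID
GROUP BY CONVERT(char(8), s.SaleDate, 112), si.ProductID
```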

Don't Over-Index Transactional Tables with Heavy I/O Activity

A transactional table is a table in an online transaction processing (OLTP) database. You can use transactional tables for reports or as input to programs that export data (e.g., in a SELECT ... FOR XML query). However, the OLTP database's primary purpose is capturing data, and if you over-index a transactional table, you might inhibit performance of INSERT, UPDATE, and DELETE operations. Each index is a database structure—a table that contains a column of values arranged in ascending order, with bookmarks that point to associated rows in the data pages. Therefore, when you add a record to a transactional table, an INSERT operation occurs not only on the data page, but also in all associated indexes. And when you delete a row from the data page, SQL Server removes the corresponding row from all associated indexes. Certainly you need to create some indexes on heavily used OLTP tables, but don't overdo it—an UPDATE operation involves not only the data page but also the number of index pages that matches the number of indexed columns that you need to modify. For more information about how indexing affects UPDATE operations, see Kalen Delaney, Inside SQL Server, "Indexing Updates," July 2002.

Ensure Runtime Security

Although a package's definition might be stored in SQL Server, at runtime, packages run externally from SQL Server through either the DTSRun or the DTSRunUI utility. When running through these client executables, DTS packages assume the security context of the user who initiates the package. This rule is important to remember because it also extends to packages that run as a scheduled job through the SQL Server Agent. Specifically, if a package's schedule was created by a user who belongs to the SQL Server sysadmin security role, that package assumes the security context of the SQL Server Agent service. A package whose schedule was created by a user who doesn't belong to the sysadmin role assumes the context of the user who created the schedule. Therefore, to avoid runtime security conflicts, you need to ensure that each user's role memberships—or more importantly, the security roles of the person or process executing and scheduling the DTS package—are consistent across your development, testing, and production environments.

More DTS Portability Tips

Thanks to Shane Dovers for his great article about seven tips for building portable DTS packages ("DTS on the Move," June 2003). I wish I could have read this article 12 months ago. Another take on the topic could be, "How do I build in portability—and control the transfer from test to live data?" Our company has one development box and two live environments at opposite ends of the United Kingdom. We put a lot of work into importing data from flat files into various databases. We needed a test location for flat files and a test database, but when we sent the package to the live server, the locations needed to be live. We solved the problem by using Universal Data Link (UDL) data sources, which we always stored on the S:\ drive. By mapping the S:\ drive appropriately for each machine, we ensured that all sources and destinations were either live or test. (Unfortunately, this didn't work with the AS/400 queries, where the library name needed to be in the SQL explicitly.)

Another portability issue we uncovered was that when calling a child package, if the parent had been moved from test to live data, the references to the child package would still be pointing to the test child. We solved this problem by having the first task for each package delete the reference to the PackageID for each DTS Execute Package task.

Don't Auto-Grow Tempdb on Startup

Most DBAs know that growing a file dynamically is an expensive operation that degrades performance because write and read activity to the file must temporarily be blocked while the file is growing. However, when managing space for the tempdb database, many DBAs forget the performance impact of auto-growing files. SQL Server rebuilds tempdb each time the database system is stopped and restarted. Tempdb files might have auto-grown, which is the default configuration, since SQL Server was last started. SQL Server doesn't remember the size of the tempdb files when it rebuilds tempdb during a restart. Instead, SQL Server resets the files to the size they were manually configured to be.

Consider the following situation. Tempdb in SQL Server 2000 defaults to having one data file that has the logical name tempdev. The initial size of this file is 8MB. Assume that you manually increase the file size to 500MB by using the ALTER DATABASE command. The file subsequently grows to 550MB through auto-grow operations. SQL Server will reset the tempdev file to 500MB when it rebuilds tempdb. Now, assume that no one manually increased the file size from 8MB to 500MB. Instead, SQL Server auto-grew tempdb to 550MB. In this case, SQL Server resets tempdev back to 8MB when it's stopped and restarted.

The operations that grow a database file can diminish performance because reads and writes are blocked each time the file is grown. In addition, auto-growing the file might contribute to fragmentation on the physical disks. Also, the default growth increment for tempdb is 10 percent. So, for example, SQL Server would need to invoke many auto-grow operations to grow tempdev from 8MB to 550MB. That's bad. However, it's worse if the same cycle happens whenever SQL Server is restarted, assuming that 550MB is the average high-water mark that tempdb reaches during typical operations. Even worse, you might have a hard time troubleshooting why performance is sometimes slower immediately after a server is rebooted.

You can use the query that Listing 5 shows to find tempdb files that have auto-grown since SQL Server was last stopped and started. In general, manually setting tempdb files to a reasonable size will help you avoid significant auto-growth operations. Most importantly, if tempdb files do auto-grow, you need to determine whether the new file sizes are reasonable high-water marks. If they are, consider manually setting the file size to compensate. For example, if a file auto-grew from 8MB to 550MB, you could manually set the file size to 551MB. The file would then be rebuilt to a size of 551MB when SQL Server was stopped and restarted the next time.
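Listing 5 isn't reproduced here, but a query in the same spirit compares each tempdb file's current size (tempdb..sysfiles) with the configured size recorded in master..sysaltfiles; treat this as a sketch rather than the article's listing:

```sql
-- tempdb files that have auto-grown past their configured size
-- (sizes are in 8KB pages).
SELECT f.name, f.size AS current_pages, a.size AS configured_pages
FROM tempdb.dbo.sysfiles AS f
JOIN master.dbo.sysaltfiles AS a
  ON a.fileid = f.fileid AND a.dbid = DB_ID('tempdb')
WHERE f.size > a.size

-- If 551MB is a reasonable high-water mark, make it permanent:
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 551MB)
```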

An Ounce of Prevention

By design, you can prevent a user from dropping a table. First, only the table owner, a member of the db_ddladmin or db_owner role, or a systems administrator (sa) has the rights to drop a table. Second, you can prevent a drop from succeeding even when someone who has permission to drop the table attempts it. If the table is referenced by a foreign key or another enforced dependency chain, SQL Server won't allow the referenced object to be dropped until the reference (i.e., the dependency) is removed. (The dependencies force the dependent objects to stay intact with no modifications or removal.) However, not every object has a foreign key dependency on it. When an object has no foreign key dependency on it, you can use schemabound views to create a dependency chain for the sole purpose of preventing an accidental table drop. For a description of how to use schemabinding to prevent accidental table drops, see "T-SQL Tutor Quick Tip."
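The schemabinding approach looks roughly like the following; the table, column, and view names are hypothetical:

```sql
-- The view exists only to create a dependency; while it exists,
-- DROP TABLE dbo.ImportantTable fails with an error.
CREATE VIEW dbo.v_DropGuard
WITH SCHEMABINDING
AS
SELECT KeyCol FROM dbo.ImportantTable
```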

More Mixed-Mode Authentication Tips

After reading Geoff Craig's "Best Practices for Mixed Authentication" (August 2003), I wanted to share a few tips of my own. First, administrators often choose mixed-mode authentication instead of Windows authentication for Web environments because many properly secured environments don't have domain relationships between their neutral zone and the databases that reside inside the firewalls. In regard to the article's section about brute-force attacks against the sa password, you can counter such attacks by using a simple procedure that tracks failed user logins. We've set up alerts to page a DBA if the procedure detects more than three failed login attempts.

The article warns about xp_cmdshell being an open window to the OS, but sp_OACreate and other sp_OA* stored procedures also let you create multiple threaded ActiveX objects to do whatever you want on the local system or remote systems. Simply put, you should disable all xp_ and sp_OA* procedures or remove them when possible.

Craig's article focuses on securing the sa password, but you also have to secure your environment correctly. A user account that has access to xp_cmdshell or any other extended procedure can do a lot of damage, as can users allowed to schedule jobs within SQL Server. One major hole to guard against: scheduling a user-created DTS package in SQL Server Agent; 90 percent of the time, the DBA will schedule the package under sa or under the account he or she is logged in to with NT authentication and think nothing of it. But you need to check each package before you create and schedule it to ensure it won't cause problems within your environment. Securing your database is more than just securing the database server. You need to open up conversations with your developers and talk about these various security issues. For example, many applications are poorly written and vulnerable to hacking from within the Web browser. It's an ongoing education process.

Use CREATE INDEX's DROP_EXISTING Clause When Recreating a Clustered Index

The internal structure of a nonclustered index in SQL Server varies based on whether the table also has a clustered index. Tables that have a clustered index are clustered tables, and tables that don't have a clustered index are heap tables. Nonclustered indexes on a heap table contain a row ID (RID) that lets SQL Server find a data page from the leaf level of the nonclustered index; nonclustered indexes on a clustered table instead use the clustering key as the bookmark. Using the clustering key generally improves performance because it dramatically reduces the overhead associated with maintaining a nonclustered index when you update data. (For more information about nonclustered indexes, see Kalen Delaney's articles "Time for a Tune-Up," August 2001, and "The Big Cover-Up," September 2001).

However, because the nonclustered index contains the clustering key, if you drop a clustered index on a table that has nonclustered indexes, SQL Server must also drop the nonclustered indexes and recreate them by using a RID rather than the clustering key as a record locator. SQL Server must drop and recreate the nonclustered index again if you add back a clustered index because the nonclustered index will need to shift from using a RID back to using the clustering key. This dropping-and-recreating process can take a tremendous amount of time and I/O and CPU resources when you're dealing with large tables.

Changing your clustered indexes by using the CREATE INDEX statement's DROP_EXISTING clause is faster. The DROP_EXISTING clause tells SQL Server that the existing clustered index is being dropped but that a new one will be added in its place, letting SQL Server defer updating the nonclustered index until the new clustered index is in place. (Note that you can use DBCC DBREINDEX to rebuild existing indexes because it won't cause SQL Server to rebuild a nonclustered index. But you can't use DBCC DBREINDEX to change the columns in an existing index.) With DROP_EXISTING, you save one complete cycle of dropping and recreating nonclustered indexes. Additionally, SQL Server won't rebuild the nonclustered index at all if the clustered index key doesn't change and is defined as UNIQUE, which isn't an obvious performance benefit of defining a clustered index as UNIQUE. Using the DROP_EXISTING clause can be a huge time-saver when you need to change the clustered index on a table that also has nonclustered indexes.
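For example, to rebuild a clustered index in place (the index and table names are illustrative):

```sql
-- Rebuild without the drop-and-recreate cycle; because the key is
-- unchanged and defined as UNIQUE, the nonclustered indexes aren't touched.
CREATE UNIQUE CLUSTERED INDEX ci_Orders
ON dbo.Orders (OrderID)
WITH DROP_EXISTING
```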

Configuring Oracle Linked Servers

Many organizations need to connect their SQL Server databases to Oracle databases. One of the best ways to do that is to use SQL Server's linked servers. To connect a SQL Server linked server to Oracle, you first need to install the Oracle client software on your SQL Server system. Then you can create a linked server either by using Enterprise Manager or by executing T-SQL statements. I prefer using T-SQL because it documents the exact values that SQL Server uses in the creation of the linked server, so you can more easily create similar objects later by copying and changing your T-SQL linked-server creation script.

The following code is an example of creating a linked server to the Oracle tecao system that I defined by using the Oracle Net Configuration Assistant:

EXEC sp_addlinkedserver 'TecaOracle', 'Oracle', 'MSDAORA', 'tecao'

This example code tells SQL Server to refer to the linked server as TecaOracle. The second parameter is the OLE DB product name; you can leave this value as NULL or change it to any other value. In this instance, I used the word Oracle to describe the linked server. The third parameter specifies the OLE DB provider that the linked server will use. MSDAORA is the name of the Microsoft OLE DB Provider for Oracle. This provider installs by default with SQL Server 2000. The final required parameter of the sp_addlinkedserver stored procedure is the Oracle Net Service name. I created this example's Net Service Name tecao earlier with the Oracle Net Configuration Assistant.
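After you create the linked server, you typically map a login and query it. A sketch: the scott/tiger credentials and the EMP table are the classic Oracle samples, used here as placeholders:

```sql
-- Map all local logins to one Oracle account.
EXEC sp_addlinkedsrvlogin 'TecaOracle', 'false', NULL, 'scott', 'tiger'

-- Four-part name query (Oracle object names are usually uppercase).
SELECT * FROM TecaOracle..SCOTT.EMP

-- Or push the work to Oracle with OPENQUERY.
SELECT * FROM OPENQUERY(TecaOracle, 'SELECT ename, sal FROM emp')
```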

Don't Use Printed Reports as Database Templates

One of the most common errors novice data modelers make is to use a printed report as a template for a table by transferring the fields from a printed report into a table or entity. Perhaps the most obvious example is the invoice—a printed report that you send to customers after you've sold them a service or product. I've seen entities and tables named Invoice that contain many different arrangements of attributes that really belong in three different tables: Sale, Product, and SaleItem.

A printed report is both a business requirement for and one outcome of your database design. To produce a report, operationally you need to store the attributes that compose the report in the database. But a printed report is rarely a template for a table layout.
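A minimal sketch of the three tables behind an invoice (the column choices are illustrative):

```sql
CREATE TABLE dbo.Product
   (ProductID int NOT NULL PRIMARY KEY,
    ProductName varchar(50) NOT NULL,
    UnitPrice money NOT NULL)

CREATE TABLE dbo.Sale
   (SaleID int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL,
    SaleDate datetime NOT NULL)

-- Each line on the printed invoice is one SaleItem row.
CREATE TABLE dbo.SaleItem
   (SaleID int NOT NULL REFERENCES dbo.Sale (SaleID),
    ProductID int NOT NULL REFERENCES dbo.Product (ProductID),
    Quantity int NOT NULL,
    PRIMARY KEY (SaleID, ProductID))
```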

Use a CASE Tool

A CASE tool is to a data modeler what a versioning tool (e.g., Visual SourceSafe) is to a programmer: an invaluable aid to getting the job done. CASE (Computer-Assisted Software Engineering) is more than a way to stay organized. A good CASE tool lets you:

  • quickly create different types of data models (functional decompositions models, system diagrams, context diagrams, business process models, both data-flow and process-flow, entity-relationship models)
  • visualize the data through the use of the models
  • share your models with others on development and management teams (requires a shared directory or metadata repository)
  • enable team design projects (requires support for sub-models and a check-out/check-in/synchronize process)
  • document data and processes
  • document change in the data and processes
  • rapidly develop new database structures
  • rapidly integrate different database management system (DBMS) platforms

Most people are put off by the high cost of entry into the CASE software market. The initial user license can be costly (from a few hundred dollars for low-end products to thousands of dollars for high-end packages). And learning to use a CASE software package can mean an investment of days or weeks. The payback is in time saved, enhanced knowledge of corporate data, and reduced time-to-market on new database projects.

In considering possible CASE tools, remember one thing: No CASE tool will ever replace a person who understands how the business works. The CASE tool can only assist that person in creating visual representations of the business process.

Make a Quick Database Copy

You've implemented your database design in your development environment, and now you want to move it to an environment where you can share it with others. You can always back up and restore, but my personal favorite technique for moving databases from one environment to another is to detach them, then reattach at the new location.

Sp_detach_db is quick and easy to use. The following code detaches the pubs database and updates the statistics before the database is detached from your originating server.

USE master
GO
EXEC sp_detach_db 'pubs'

You can then use any method of file transfer (including Windows drag-and-drop for small databases) to relocate the .mdf, .ndf, and .ldf files to the target server.

To attach the pubs database to the target server, you have to know the full path to the individual database files, and you have to specify each file individually, or the attach process will fail.

EXEC sp_attach_db 'pubs',
'c:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs.mdf',
'c:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs_log.ldf'

Before you run sp_attach_db, you can use the sp_helpfile and sp_helpfilegroup stored procedures to confirm the location of your database files.

If you've used SQL Server filegroups in the database, and you have to attach more than just the .mdf and .ldf files, write the sp_attach_db command with the .mdf file first, followed by the .ndf files:

EXEC sp_attach_db 'ColoProd2001',
'h:\SQL7data\TY_data_files\ColoProd2001_data1.mdf',
'h:\SQL7data\TY_data_files\ColoProd2001_land.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_sale.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_data2.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_valu.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_prop.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_bldg.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_log.ldf'

Finally, if you want to rename your database (in your development environment, you can do that), detach the database, then reattach it with the new name:

USE master
GO
EXEC sp_detach_db 'pubs'
GO
EXEC sp_attach_db 'MyTestPubs',
'c:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs.mdf',
'c:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs_log.ldf'

Checking Results, T-SQL Style

Sometimes when I'm trying to tune the performance of a production query, I need a way to check the results of two queries to verify that they match. Until recently, I'd never taken the time to automate the comparison because the resultsets were simple enough to validate visually in Query Analyzer. On my last consulting assignment, however, I encountered complicated queries with aggregate functions that combined multiple tables and views that used left and right outer joins, and frequently, some—but not all—of the columns matched. I had to write a stored procedure that would compare not only row to row but column to column. Listing 6 shows the first part of the resulting spCompareTables procedure.

First, the code assembles the column list. The stored procedure expects two SELECT queries as parameters. The two dynamically created views let the procedure use the INFORMATION_SCHEMA.COLUMNS table to insert the list of columns, along with their data types, into a temporary table. Here's an example call to spCompareTables:

EXEC spCompareTables
'SELECT TOP 100 PERCENT keycol, col1, col2 FROM Table1 ORDER BY keycol',
'SELECT TOP 100 PERCENT keycol, col3, col4 FROM Table2 ORDER BY keycol'

The stored procedure will accept any query, including one containing ORDER BY, but the syntax rules for views force some modifications. This example shows the trick of using the TOP clause to force SQL Server to accept an ORDER BY clause in a view. The primary purpose of having the sorted resultset is to make sure that both resultsets have the same sequence of rows.
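For example, the dynamically created views use the TOP trick along these lines (a simplified sketch; the actual view name and column list are generated by the procedure):

CREATE VIEW vCompareQuery1
AS
SELECT TOP 100 PERCENT keycol, col1, col2
FROM Table1
ORDER BY keycol

Without the TOP 100 PERCENT clause, SQL Server rejects the ORDER BY clause in the view definition.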

The procedure compares the two resultsets by building a cursor for each query and iterating through the rows. To make simultaneously iterating through two cursors easier, the procedure builds and dynamically executes a temporary stored procedure. The syntax for cursors requires that each column value be loaded into a variable; Listing 7 shows the next step of creating the variable declarations for the temporary stored procedure. The column definitions are in the full listing, which you can download from http://www.sqlmag.com.

Listing 8 shows the last part of the procedure, which builds the code that iterates through the cursors. You could modify this code to iterate through the rows in the second resultset until it finds a match for the current row in the first resultset. This change would let the procedure compare resultsets with an unequal number of rows. When you're trying to identify the differences between two queries, it can be helpful to know when two queries have identical values in the rows they share.
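In outline, the generated comparison loop looks something like the following sketch (the cursor names, column names, and data types are placeholders; the real procedure builds the declarations dynamically from the column list):

DECLARE @key1 int, @val1 varchar(255)
DECLARE @key2 int, @val2 varchar(255)

DECLARE cur1 CURSOR FOR
   SELECT keycol, col1 FROM Table1 ORDER BY keycol
DECLARE cur2 CURSOR FOR
   SELECT keycol, col3 FROM Table2 ORDER BY keycol

OPEN cur1
OPEN cur2
FETCH NEXT FROM cur1 INTO @key1, @val1
FETCH NEXT FROM cur2 INTO @key2, @val2
WHILE @@FETCH_STATUS = 0
BEGIN
   -- Compare the current row from each resultset
   IF @key1 <> @key2 OR @val1 <> @val2
      PRINT 'Mismatch at key ' + CONVERT(varchar(20), @key1)
   FETCH NEXT FROM cur1 INTO @key1, @val1
   FETCH NEXT FROM cur2 INTO @key2, @val2
END
CLOSE cur1
DEALLOCATE cur1
CLOSE cur2
DEALLOCATE cur2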

SpCompareTables is a useful tool for proving that two queries return the same results, and because it executes on the server, it runs quickly even for large resultsets. When you're looking for alternatives to a long-running query, knowing that the results are identical will help you determine whether a better-performing query is a good replacement.

Watch Out for Multivalued Attributes!

Failing to recognize and resolve a multivalued attribute is one of the most common mistakes data modelers make. A multivalued attribute contains a list or a collection of values for each instance of that attribute. Typically, neophyte designers treat these lists or collections as separate attributes of an entity or separate columns in a table, as the SERVER table (entity) in Figure 3 shows. This flattened design might initially be easy to program against (it's especially compatible with the detail report that displays the general system configuration), but this design is terrible to work with for summary data and is counterproductive when you manage or update data in the database. A multivalued attribute needs to become a child table in which each of the multiple values becomes a row in the child table and each row of the child table contains a foreign key that links to the parent table's primary key. In Figure 4, the data model for the transition from multivalued attributes to child tables, I broke out the multivalued attributes from Figure 3 into the DISK_DEVICES and INSTALLED_COMPONENT child tables.
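In T-SQL, the corrected design looks something like the following sketch (the column names and data types are illustrative, not taken from Figure 4):

CREATE TABLE SERVER (
   server_id   int         NOT NULL PRIMARY KEY,
   server_name varchar(30) NOT NULL
)

CREATE TABLE DISK_DEVICES (
   server_id   int         NOT NULL REFERENCES SERVER (server_id),
   device_seq  smallint    NOT NULL,
   device_name varchar(30) NOT NULL,
   PRIMARY KEY (server_id, device_seq)
)

Each disk device becomes a row in DISK_DEVICES, so adding a fifth disk to a server is an INSERT rather than a schema change.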

More Mixed-Mode Authentication Tips

After reading Geoff Craig's "Best Practices for Mixed Authentication" (August 2003), I wanted to share a few tips of my own. First, administrators often choose mixed-mode authentication instead of Windows authentication for Web environments because many properly secured environments don't have domain relationships between their neutral zone and the databases that reside inside the firewalls. In regard to the article's section about brute-force attacks against the sa password, you can counter such attacks by using a simple procedure that tracks failed user logins. We've set up alerts to page a DBA if the procedure detects more than three failed login attempts.
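SQL Server 7.0 and 2000 have no login trigger, so one workable approach is a scheduled job that scans the error log for failed logins (a sketch only; xp_readerrorlog is undocumented and its output format varies by release, and the threshold and alert mechanism shown here are my own choices):

-- Pull the current error log into a temporary table
CREATE TABLE #errlog (logtext varchar(4000), continuation int)
INSERT #errlog EXEC master..xp_readerrorlog

-- Raise an alertable error if too many failed logins appear
IF (SELECT COUNT(*) FROM #errlog
    WHERE logtext LIKE '%Login failed%') > 3
   RAISERROR ('More than three failed login attempts detected', 16, 1)

DROP TABLE #errlog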

The article warns about xp_cmdshell being an open window to the OS, but sp_OACreate and the other sp_OA* stored procedures also let you instantiate ActiveX objects that can do whatever you want on the local system or on remote systems. Simply put, you should disable the xp_* and sp_OA* procedures or remove them when possible.

Craig's article focuses on securing the sa password, but you also have to secure your environment correctly. A user account that has access to xp_cmdshell or any other extended procedure can do a lot of damage, as can users allowed to schedule jobs within SQL Server. One major hole to guard against is scheduling a user-created DTS package in SQL Server Agent; 90 percent of the time, the DBA will schedule the package under sa or under the account he or she is logged in to through NT authentication and think nothing of it. But you need to check each package before you create and schedule it to ensure it won't cause problems within your environment. Securing your database means more than just securing the database server. You need to open up conversations with your developers and talk about these security issues; for example, many applications are poorly written and vulnerable to hacking from within the Web browser. It's an ongoing education process.

Use CREATE INDEX's DROP_EXISTING Clause When Recreating a Clustered Index

The internal structure of a nonclustered index in SQL Server varies based on whether the table also has a clustered index. Tables that have a clustered index are clustered tables, and tables that don't have a clustered index are heap tables. A nonclustered index on a heap table stores a row ID (RID) in its leaf level, which lets SQL Server locate the data page directly. A nonclustered index on a clustered table stores the clustering key instead, and SQL Server navigates the clustered index to find the row. Using the clustering key generally improves performance because it dramatically reduces the overhead of maintaining nonclustered indexes when you update data; when a row moves, SQL Server doesn't have to update a RID in every nonclustered index. (For more information about nonclustered indexes, see Kalen Delaney's articles "Time for a Tune-Up," August 2001, and "The Big Cover-Up," September 2001.)

However, because the nonclustered index contains the clustering key, if you drop a clustered index on a table that has nonclustered indexes, SQL Server must also drop the nonclustered indexes and recreate them by using a RID rather than the clustering key as a record locator. SQL Server must drop and recreate the nonclustered index again if you add back a clustered index because the nonclustered index will need to shift from using a RID back to using the clustering key. This dropping-and-recreating process can take a tremendous amount of time and I/O and CPU resources when you're dealing with large tables.

Changing your clustered indexes by using the CREATE INDEX statement's DROP_EXISTING clause is faster. The DROP_EXISTING clause tells SQL Server that the existing clustered index is being dropped but that a new one will be added in its place, letting SQL Server defer updating the nonclustered indexes until the new clustered index is in place. (Note that DBCC DBREINDEX also rebuilds an existing index without causing SQL Server to rebuild the nonclustered indexes twice, but you can't use DBCC DBREINDEX to change the columns in an existing index.) With DROP_EXISTING, you save one complete cycle of dropping and recreating nonclustered indexes. Additionally, SQL Server won't rebuild the nonclustered indexes at all if the clustered index key doesn't change and is defined as UNIQUE, a non-obvious performance benefit of defining a clustered index as UNIQUE. Using the DROP_EXISTING clause can be a huge time-saver when you need to change the clustered index on a table that also has nonclustered indexes.
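For example, to recreate a clustered index in place (the table and index names here are hypothetical):

-- Rebuild the clustered index without the double rebuild
-- of the table's nonclustered indexes
CREATE UNIQUE CLUSTERED INDEX ci_Orders
ON Orders (OrderID)
WITH DROP_EXISTING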

Configuring Oracle Linked Servers

Many organizations need to connect their SQL Server databases to Oracle databases. One of the best ways to do that is by using SQL Server's linked servers. To connect a SQL Server linked server to Oracle, you first need to install the Oracle client software on your SQL Server system. Then you can create a linked server either by using Enterprise Manager or by executing T-SQL statements. I prefer using T-SQL because it documents the exact values that SQL Server uses in the creation of the linked server, so you can more easily create similar objects later by copying and changing your T-SQL linked-server creation script.

The following code is an example of creating a linked server to the Oracle tecao system that I defined by using the Oracle Net Configuration Assistant:

EXEC sp_addlinkedserver 'TecaOracle', 'Oracle', 'MSDAORA', 'tecao'

This example code tells SQL Server to refer to the linked server as TecaOracle. The second parameter is the OLE DB product name; you can leave this value as NULL or change it to any other value. In this instance, I used the word Oracle to describe the linked server. The third parameter specifies the OLE DB provider that the linked server will use. MSDAORA is the name of the Microsoft OLE DB Provider for Oracle, which installs by default with SQL Server 2000. The final required parameter of the sp_addlinkedserver stored procedure is the Oracle Net Service name; I created this example's Net Service Name, tecao, earlier with the Oracle Net Configuration Assistant.
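After you create the linked server, you typically map a local login to an Oracle login with sp_addlinkedsrvlogin, then query through a four-part name or OPENQUERY. For example (the Oracle user name, password, and table here are placeholders):

-- Map all local logins to one Oracle login
EXEC sp_addlinkedsrvlogin 'TecaOracle', 'false', NULL, 'scott', 'tiger'

-- Query through the linked server; Oracle object names usually
-- appear in uppercase in the four-part name
SELECT * FROM TecaOracle..SCOTT.EMP

-- Or pass the query through to Oracle with OPENQUERY
SELECT * FROM OPENQUERY(TecaOracle, 'SELECT * FROM emp')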

Don't Use Printed Reports as Database Templates

One of the most common errors novice data modelers make is to use a printed report as a template for a table, transferring the report's fields directly into a table or entity. Perhaps the most obvious example is the invoice, a printed report that you send to customers after you've sold them a service or product. I've seen entities and tables named Invoice that contain many different arrangements of attributes that really belong in three different tables: Sale, Product, and SaleItem.

A printed report is both a business requirement for and one outcome of your database design. To produce a report, operationally you need to store the attributes that compose the report in the database. But a printed report is rarely a template for a table layout.
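A minimal sketch of the three-table design might look like this (the columns are illustrative):

CREATE TABLE Product (
   product_id int         NOT NULL PRIMARY KEY,
   name       varchar(50) NOT NULL,
   unit_price money       NOT NULL
)

CREATE TABLE Sale (
   sale_id     int      NOT NULL PRIMARY KEY,
   customer_id int      NOT NULL,
   sale_date   datetime NOT NULL
)

CREATE TABLE SaleItem (
   sale_id    int NOT NULL REFERENCES Sale (sale_id),
   product_id int NOT NULL REFERENCES Product (product_id),
   quantity   int NOT NULL,
   PRIMARY KEY (sale_id, product_id)
)

The printed invoice then becomes a query that joins all three tables, not a table of its own.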

Use a CASE Tool

A CASE tool is to a data modeler what a versioning tool (e.g., Visual SourceSafe) is to a programmer: an invaluable aid to getting the job done. CASE (Computer-Assisted Software Engineering) is more than a way to stay organized. A good CASE tool lets you:

  • quickly create different types of data models (functional decomposition models, system diagrams, context diagrams, business process models for both data flow and process flow, entity-relationship models)
  • visualize the data through the use of the models
  • share your models with others on development and management teams (requires a shared directory or metadata repository)
  • enable team design projects (requires support for sub-models and a check-out/check-in/synchronize process)
  • document data and processes
  • document change in the data and processes
  • rapidly develop new database structures
  • rapidly integrate different database management system (DBMS) platforms

Most people are put off by the high cost of entry into the CASE software market. The initial user license can be costly (from a few hundred dollars for low-end products to thousands of dollars for high-end packages). And learning to use a CASE software package can mean an investment of days or weeks. The payback is in time saved, enhanced knowledge of corporate data, and reduced time-to-market on new database projects.

In considering possible CASE tools, remember one thing: No CASE tool will ever replace a person who understands how the business works. The CASE tool can only assist that person in creating visual representations of the business process.

Make a Quick Database Copy

You've implemented your database design in your development environment, and now you want to move it to an environment where you can share it with others. You can always back up and restore, but my personal favorite technique for moving databases from one environment to another is to detach them, then reattach at the new location.

Sp_detach_db is quick and easy to use. The following code detaches the pubs database from your originating server; by default, sp_detach_db also runs UPDATE STATISTICS on all tables before detaching (pass 'true' for the @skipchecks parameter to skip that step).

USE master
GO
EXEC sp_detach_db 'pubs'

You can then use any method of file transfer (including Windows drag-and-drop for small databases) to relocate the .mdf, .ndf, and .ldf files to the target server.

To attach the pubs database to the target server, you have to know the full path to the individual database files, and you have to specify each file individually, or the attach process will fail.

EXEC sp_attach_db 'pubs',
'c:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs.mdf',
'c:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs_log.ldf'

Before you run sp_attach_db, you can use the sp_helpfile and sp_helpfilegroup stored procedures to confirm the location of your database files.

If you've used SQL Server filegroups in the database, and you have to attach more than just the .mdf and .ldf files, write the sp_attach_db command with the .mdf file first, followed by the .ndf files:

EXEC sp_attach_db 'ColoProd2001',
'h:\SQL7data\TY_data_files\ColoProd2001_data1.mdf', 'h:\SQL7data\TY_data_files\ColoProd2001_land.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_sale.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_data2.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_valu.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_prop.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_bldg.ndf',
'h:\SQL7data\TY_data_files\ColoProd2001_log.ldf'

Finally, if you want to rename your database, which is fine to do in a development environment, detach the database, then reattach it with the new name:

USE master
GO
EXEC sp_detach_db 'pubs'
GO
EXEC sp_attach_db 'MyTestPubs',
'c:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs.mdf',
'c:\Program Files\Microsoft SQL Server\MSSQL\Data\pubs_log.ldf'
