Don't Index Wide Columns
If a table's columns are wide (i.e., the columns have a data type of char(25) or larger), indexes created on those columns might result in a large number of pages, thereby negating the benefit of using an index in the first place. Nonclustered indexes in SQL Server 2000 and 7.0 are structured differently from those in earlier releases. If you're not careful, you can create indexes that take more space on disk than the original data pages. (For more information about changes in index structure and their possible detrimental effect on your database, see "Move Up to SQL Server 2000.")
Related: Move Up to SQL Server 2000
Help the Optimizer Help You
The query optimizer can't do its job well without the right tools and an environment that lets it choose from a broad range of possibilities. Here are a few guidelines to help you supply the right tools:
- Create indexes on frequently joined columns. Clustered indexes enable merge joins and fast lookups with nested-loop joins.
- Create covering indexes on combinations of frequently fetched columns. These indexes satisfy queries from the index's leaf level, and you can use them as input for merge joins if the first index column is also the join column.
- Create indexes with care, especially covering indexes. Indexes improve performance of SELECT queries but degrade performance of modification operations.
- Avoid performing joins on columns with few unique values. These joins can be more costly than table scans, so the query processor might decide not to use them.
- Separate tables that participate in joins onto different disks by using filegroups to exploit parallel disk I/O.
Constraints vs. Triggers
How do you choose between a constraint and a trigger? If a constraint will do the job, such as enforcing referential integrity, use it. If you need more complex logic, use a trigger. Understanding the basic differences can help you choose between the two.
First, constraints are proactive: They prevent unwanted actions from happening. Triggers are reactive: They undo the damage. A constraint tells you not to delete a customer with outstanding invoices. A trigger rolls back the DELETE operation if you try to delete a customer with outstanding invoices. Constraints apply to the current table. Triggers can reference other tables, even in another database or server. You need to program the action you want a trigger to take. With constraints, SQL Server defines the actions. Also, triggers allow custom error messages. A constraint just returns a cryptic message about a constraint violation of the foreign key relationship, which means something to you, but not to most database users. You'd have to handle the returned error codes and translate them to a friendly message from the client application to the user.
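The contrast can be sketched in T-SQL. The two approaches below are alternatives for the same rule (you'd use one or the other, because a constraint would reject the DELETE before a trigger ever fired); the Customers and Invoices tables and the trigger name are hypothetical:

```sql
-- Constraint (proactive): the DELETE is rejected before any damage is done,
-- but the error message is SQL Server's generic constraint-violation text.
ALTER TABLE Invoices
ADD CONSTRAINT FK_Invoices_Customers
    FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID)

-- Trigger (reactive): the DELETE happens, then the trigger undoes it
-- and returns a message your users can actually understand.
CREATE TRIGGER trg_Customers_NoDeleteWithInvoices
ON Customers
FOR DELETE
AS
IF EXISTS (SELECT * FROM Invoices i
           JOIN deleted d ON i.CustomerID = d.CustomerID)
BEGIN
    RAISERROR ('You cannot delete a customer who has outstanding invoices.', 16, 1)
    ROLLBACK TRANSACTION
END
```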
You can quickly determine which tables in a database don't have an index by using the query that Listing 1 shows. This query returns a result set containing the names of tables that don't have an index and uses the rows column in sysindexes to return an estimate of the number of rows in those tables. Note that the only way to find the exact number of rows in a table is to execute a SELECT COUNT (*) against the table.
Queries that use the OBJECTPROPERTY function can provide a variety of helpful information about your SQL Server databases. For example, most tables should have a carefully chosen clustered index. In the above example, you can replace the TableHasIndex property check with TableHasClusteredIndex to quickly generate a list of tables that don't have a clustered index.
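Listing 1 isn't reproduced here, but a query along the lines the text describes might look like the following sketch, which joins sysobjects to sysindexes and filters on the TableHasIndex property:

```sql
-- List user tables with no index at all, plus an estimated row count.
-- The rows column in sysindexes is only an estimate; SELECT COUNT(*)
-- is the only way to get an exact count.
SELECT o.name AS TableName, i.rows AS EstimatedRows
FROM sysobjects o
JOIN sysindexes i ON o.id = i.id
WHERE o.type = 'U'
  AND OBJECTPROPERTY(o.id, 'TableHasIndex') = 0
  AND i.indid IN (0, 1)   -- the heap (0) or clustered index (1) row carries the count
ORDER BY i.rows DESC
```

Swapping 'TableHasIndex' for 'TableHasClusteredIndex' in the WHERE clause gives you the list of tables without a clustered index instead.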
Profiles in T-SQL
To list which files belong to which filegroups for a particular database, you can run sp_helpfile, which shows information about the files in a database. But if you want to use the T-SQL SELECT queries for a custom application, look at the T-SQL code in the sp_helpfile procedure. You'll be amazed at how much advanced T-SQL you can learn by reading system procedures.
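For example, the following pair of calls shows both sides of the tip: what sp_helpfile returns, and how to read its source so you can borrow its SELECT queries (pubs is one of the standard sample databases):

```sql
USE pubs
EXEC sp_helpfile                  -- files and filegroups for the current database
EXEC sp_helptext 'sp_helpfile'    -- the procedure's own T-SQL source
```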
Another solution to your problem is to let SQL Server Profiler find the answer for you. When you want a particular piece of information from Enterprise Manager but don't know a T-SQL query that will display the information, simply run Profiler while you perform your task in Enterprise Manager. Profiler will display the T-SQL queries that Enterprise Manager generates to provide the information. Enterprise Manager issues T-SQL queries to get information about the filegroup that a file belongs to, then displays that information in the Properties dialog box for the database you're working in.
The Best (Indexed) View
To get the most out of indexed views,
- analyze your workload so that you're familiar with the types of queries your users run
- enable the proper SET options before creating any tables used in the view definition
- use the OBJECTPROPERTY function's IsIndexable option to make sure you can index the view
- make sure the first index on the view is unique and clustered (other indexes don't have to be unique or clustered)
And remember that you don't need to change your applications or queries to take advantage of any indexed views you build. The optimizer determines whether an indexed view is available and useful, and your applications need never change.
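Putting the checklist together, a minimal indexed-view sketch might look like the following. The dbo.Orders table, its OrderTotal column (assumed NOT NULL, because indexed views disallow SUM over a nullable expression), and the view name are hypothetical:

```sql
-- The required SET options must be in effect when the view is created.
SET NUMERIC_ROUNDABORT OFF
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON
GO
CREATE VIEW dbo.v_OrderTotals
WITH SCHEMABINDING                         -- required for an indexed view
AS
SELECT CustomerID,
       COUNT_BIG(*) AS OrderCount,         -- COUNT_BIG(*) is required with GROUP BY
       SUM(OrderTotal) AS TotalSales
FROM dbo.Orders
GROUP BY CustomerID
GO
-- Verify the view is indexable, then create the required first index:
-- it must be unique and clustered.
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.v_OrderTotals'), 'IsIndexable')
CREATE UNIQUE CLUSTERED INDEX IXC_OrderTotals
    ON dbo.v_OrderTotals (CustomerID)
```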
After the Upgrade
You need to attend to a few housekeeping tasks after upgrading to SQL Server 2000. First, refresh the statistics to make sure that the query optimizer has the best possible information for its decision-making. SQL Server Books Online (BOL) warns that using SQL Server 7.0 statistics with the SQL Server 2000 optimizer can result in poor performance. You can use the sp_updatestats stored procedure to refresh the statistics.
If you're using full-text indexing, be aware that the upgrade process marks the databases as full-text disabled because the format of the full-text indexes has changed. The upgrade process could have made the change automatically, but updating full-text indexes can be time-consuming. Rather than make the upgrade process excessively long, the programmers opted to let you rebuild your indexes after the database upgrade is complete.
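Both housekeeping tasks come down to a couple of system procedure calls, sketched below (the full-text catalog name is hypothetical; substitute your own):

```sql
-- Refresh statistics for every user table in the current database
EXEC sp_updatestats

-- Re-enable full-text indexing and rebuild a catalog after the upgrade
EXEC sp_fulltext_database 'enable'
EXEC sp_fulltext_catalog 'PubsCatalog', 'rebuild'
```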
Don't Be Afraid of Supertypes and Subtypes
Data modelers use supertype and subtype entities to distinguish the different categories of an entity, such as the PERSON entity and its subtypes EMPLOYEE and AUTHOR, as Figure 1 shows. When modeling, you need to demonstrate the differences between an employee and an author, and you need to show which attributes modify each of the three entities (PERSON, EMPLOYEE, AUTHOR). The supertype/subtype paradigm forces you to identify the attributes and relationships that interact with both the supertype and the subtypes. The common attributes—which in Figure 1 are PersonID, FirstName, MInit, LastName, Address, Phone, and Email—modify the supertype entity PERSON. You need to record these attribute values for all people, both employees and authors. For each subtype entity, you have to identify the attributes and relationships that are specific to that subtype. In Figure 1, an employee is involved in publication activities and has specific skill sets. The author writes articles and receives royalties.
Some relationships involve only the supertype, rather than the subtypes. For example, the PERSON entity is involved in the relationship PERSON_PUBLISHER. The implication is that all people—employees and authors—work for a publisher.
If you need to represent categories of an entity in your design but fail to use the supertype/subtype structure to help analyze the data requirements for your database, you risk incomplete understanding of your data. You also run the risk of building modification anomalies into your database. For instance, in Figure 1, a person can be both an employee of a publishing house and an author for that firm. If you didn't use the supertype/subtype structure in this design (i.e., if you included only an EMPLOYEE entity and an AUTHOR entity), data about that dual-role person would be stored in both the EMPLOYEE and AUTHOR tables. This duplication would lead to non-key data redundancy within the database, which can result in insert, update, and deletion anomalies that cause unsynchronized data and loss of data integrity.
Index Sorting, Grouping, and Aggregating Columns
Consider indexing columns that you use to order by and those that you use in a grouping expression. You might benefit from indexing the columns that the MIN(), MAX(), COUNT(), SUM(), and AVG() functions use to aggregate the data. When you use the MIN() and MAX() functions, SQL Server does a simple lookup for the minimum and maximum values in the column, respectively. If an index's data values are arranged in ascending order, SQL Server can read the index to quickly determine the correct values of MIN() or MAX(). The range-of-values query incorporates a filter or constraint (expressed in the SELECT query's WHERE clause or HAVING clause) to limit the rows that the query returns. Similarly, when you have an index, you can optimize data sorting (by using the ORDER BY clause) and data grouping (by using the GROUP BY clause), especially if the table or tables you're querying contain many rows.
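A single index can serve all of these roles. In the following sketch (the dbo.Orders table and its OrderDate column are hypothetical), one index supports MIN()/MAX() lookups, a range filter, grouping, and ordering:

```sql
-- Index the column used for filtering, grouping, ordering, and aggregating
CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate)

-- MIN() and MAX() become simple lookups at either end of the sorted index
SELECT MIN(OrderDate) AS FirstOrder, MAX(OrderDate) AS LastOrder
FROM dbo.Orders

-- The same index supports a range-of-values query with grouping and sorting
SELECT OrderDate, COUNT(*) AS OrdersPerDay
FROM dbo.Orders
WHERE OrderDate >= '20040101' AND OrderDate < '20040201'
GROUP BY OrderDate
ORDER BY OrderDate
```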
Rules for Consistent Naming
Using inconsistent naming conventions (or using no naming convention at all) for database objects can lead to confusion and errors in data retrieval. Here are some rules for creating useful names for database objects:
- Use descriptive names that are meaningful to the entire organization; don't incorporate jargon that's familiar to only a small section of the company.
- Use a name that clearly identifies the database object. Consider using a variation of the notation scheme that Microsoft introduced in Access 1.1, in which you prefix objects with three-letter identifiers (e.g., you designate the Employee table as tblEmployee).
- Use the minimum number of words necessary to convey the meaning of the database object. In SQL Server, object names can be as long as 128 characters—far longer than you'd want to repeat throughout your SQL Server queries and stored procedures.
- Don't confuse the meaning of the name by adding redundant terms to it (e.g., tblRedundantTable).
- Don't use acronyms, and use abbreviations judiciously. You can't count on the availability of a business metadata repository to document and decode the meaning of acronyms and abbreviations.
- Don't use names that implicitly or explicitly identify more than one topic (for tables) or characteristic (for columns).
- Use the singular form of a name wherever possible, especially for entities and tables; it helps you correctly distinguish the entity relationships (1:1, 1:M, M:N).
- Don't include spaces in database object names (such as Employee ID). No other database management system (DBMS) supports these spaces, and you might have to integrate another DBMS with your SQL Server environment someday.
As you choose a naming convention, keep in mind that no convention is perfect and that any naming convention is better than no convention. Pick one convention and use it.
Setting Up File Locations
I follow three basic rules when I set up the physical file locations for a database:
- Keep the log and the data on separate physical drives.
- Keep the data and the master database on separate physical drives.
- Mirror the log.
These simple rules will help improve performance and form the foundation for no-data-loss recoveries.
File and Filegroup Backup and Restore Tips
Here are a few details to remember when you're backing up and restoring files and filegroups:
- You can back up individual files and filegroups only when your database is using the Full or Bulk_Logged recovery model, because you must apply log backups after you restore a file or filegroup; you can't make log backups of a database that's using the Simple model.
- Unlike full database and differential backups, a backup of a file or filegroup doesn't back up any portion of the transaction log. So a file or filegroup restore includes no part of the transaction log, and you still need to restore the log separately.
- You can restore individual file or filegroup backups from a full database backup.
- Immediately before restoring an individual file or filegroup, you must back up the transaction log. You must have an unbroken chain of log backups from the time the file or filegroup backup was made until the time the files or filegroups are restored.
- After restoring a file or filegroup backup, you must restore all the transaction logs made between the time you backed up the file or filegroup and the time you restored it. Restoring these interim transaction logs guarantees that the restored files are synchronized with the rest of the database.
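The sequence those rules describe can be sketched as follows (the Sales database, the SalesFG1 filegroup, and the backup file paths are hypothetical):

```sql
-- Routine filegroup backup while the database is online
BACKUP DATABASE Sales FILEGROUP = 'SalesFG1'
    TO DISK = 'D:\Backup\Sales_FG1.bak'
BACKUP LOG Sales TO DISK = 'D:\Backup\Sales_Log1.bak'

-- Later, when files in SalesFG1 are damaged:
-- back up the tail of the log first, then restore in sequence
BACKUP LOG Sales TO DISK = 'D:\Backup\Sales_TailLog.bak'
RESTORE DATABASE Sales FILEGROUP = 'SalesFG1'
    FROM DISK = 'D:\Backup\Sales_FG1.bak' WITH NORECOVERY
RESTORE LOG Sales FROM DISK = 'D:\Backup\Sales_Log1.bak' WITH NORECOVERY
RESTORE LOG Sales FROM DISK = 'D:\Backup\Sales_TailLog.bak' WITH RECOVERY
```

Applying every log backup made since the filegroup backup, in order, is what brings the restored files back in sync with the rest of the database.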
Check Database Consistency
Ensuring database consistency and recoverability comes first. To guarantee that you have this job covered, develop a checklist of steps that brings your database system in line with this goal—and make sure you can check off every step. The first thing on the list is to plan your system's on-disk subsystem and on-disk structures with recoverability in mind. Except perhaps for the smallest databases, you should have overridden SQL Server's installation defaults and put your data and log files on different drives, with log files on mirrored volumes. Next, you need to understand SQL Server's recovery models and how they affect the way your system handles transaction logs. SQL Server's recovery models are part of the database properties. SQL Server 2000's Simple recovery model is suited only to the smallest installations and doesn't allow for point-in-time recovery. The vast majority of applications require Full or Bulk-Logged recovery models.
Make sure your checklist includes a backup plan that accounts for both backup and disaster-recovery scenarios. In other words, you need to be able to restore the SQL Server system itself as well as the data that resides on the system. Last, but certainly not least, you need to test all aspects of your restore and recovery plans. The absolute worst time to test these plans is when you need them. I can think of better ways to spend a Saturday morning than watching a system's backup tapes whir. However, there's no other way to know whether your backup and disaster-recovery plans will work when you need them.
7 Hints for Better Performance
SQL Server performance depends on the interaction of many factors, ranging from the hardware that the system runs on to the application coding techniques you can use. Here are seven ways you can improve SQL Server's performance:
7. Use SQL Server Profiler.
6. Use the SQL Performance Monitor.
5. Use the Query Analyzer ShowPlan option.
4. Use the Index Tuning Wizard.
3. Use automated tuning.
2. Plan disk subsystem usage.
1. Get enough memory.
Using T-SQL to Avoid Unwanted NULLs
You can avoid returning NULL values from your queries by using the ISNULL() function in your T-SQL query or stored procedure, as the code in Listing 2 shows. This way, if a column contains a NULL, the server substitutes either a string or another specified value (the data types must match). In Listing 2's code, for example, I return -1 as a substitute for any NULLs encountered in the Year_Born column data.
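Listing 2 isn't reproduced here, but a query along those lines might look like this sketch (the dbo.Authors table and column names are assumptions; Year_Born is assumed to be an integer column so that the -1 substitute matches its data type):

```sql
-- Replace NULLs in Year_Born with -1 so the query never returns NULL
SELECT Au_FName, Au_LName,
       ISNULL(Year_Born, -1) AS Year_Born
FROM dbo.Authors
```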
Criteria for Evaluating Candidates for Primary Key
| Criterion | Desired answer |
| --- | --- |
| Not Null: Will the candidate value ever be null? | No |
| Brevity: Is the candidate value more than a single column? | No |
| Simplicity: Does the candidate value contain embedded spaces, special characters, or differential capitalization? | No |
| Data Type: Is the candidate value something other than a number or fixed-length character data type? | No |
| Nonidentifying Value: Does the candidate value contain any intelligence or have any meaning? | No |
| Never Change: Is the candidate value likely to change? | No |
An Unbroken Chain
Ownership chains can help you accomplish security objectives, which include making data available only to authorized users and ensuring data integrity by providing consistent and controlled access to data. You can employ an unbroken ownership chain to limit users to a subset of columns (vertical partitioning), a subset of rows (horizontal partitioning), or both by creating objects such as filtering views, stored procedures, and functions that filter data from underlying tables. Authorized users who don't have direct access to the tables can still access data through the referencing object because the ownership chain is unbroken. However, users are confined to the filtered data subset. Consequently, you can better control access to sensitive data by granting permissions on the objects rather than allowing direct access to tables.
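A minimal sketch of such a filtering view follows; the dbo.Employee table, its columns, and the PhoneListUsers role are hypothetical. Because dbo owns both the table and the view, the ownership chain is unbroken, and users need permissions only on the view:

```sql
-- Filter the underlying table both vertically and horizontally
CREATE VIEW dbo.v_EmployeePhone
AS
SELECT FirstName, LastName, Phone    -- vertical partition: a subset of columns
FROM dbo.Employee
WHERE IsActive = 1                   -- horizontal partition: a subset of rows
GO
-- Grant access to the filtered view, never to the base table
GRANT SELECT ON dbo.v_EmployeePhone TO PhoneListUsers
```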
A View to a System Table
Microsoft stresses that writing queries directly against system tables is a bad idea. Microsoft reserves the right to change the underlying definition of system tables, so the correct and safest way to query system data is through a system function, a stored procedure, or one of the ANSI-standard information schema views. Information schema views are handy, and I use them all the time. However, sometimes I want to see how the information schema views get their information, so I peek under the covers. The information schema views are visible in the master database, just as any other view is. Reading Microsoft-supplied stored procedures is a great way to learn undocumented tips and cool T-SQL coding tricks. And reading the information schema views is a great way to learn about the inner workings of SQL Server.
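For example, you can use a documented view for day-to-day work and then read its definition to see which system tables it draws on (the sp_helptext call against an information schema view is run from master, where the views are defined):

```sql
-- The safe, documented way to list tables
SELECT TABLE_NAME, TABLE_TYPE
FROM INFORMATION_SCHEMA.TABLES

-- Peeking under the covers at how the view gets its answer
USE master
EXEC sp_helptext 'INFORMATION_SCHEMA.TABLES'
```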
A New Default Location
When SQL Server is first installed, it sets the default directory for data and transaction log files. Sometimes, when users use Enterprise Manager to create a database, they don't change the default transaction log path in the Database Properties dialog box. The default places the transaction log in the same directory as the data. Regardless of which version of SQL Server you use, you should store the log on a separate physical disk so that you'll still have access to the transaction log if you lose the disk containing the data. In SQL Server 7.0 and earlier releases, you can't permanently change the default location for data and the transaction log. You have to manually select the path in the Database Properties dialog box and change it to the desired location every time you create a new database in Enterprise Manager.
However, in SQL Server 2000, you can permanently change the default. Open Enterprise Manager, right-click the server name, and select the Properties option from the pop-up menu. Next, click the Database Settings tab, and set a new location for the default data and log directory.
Don't Create Duplicate Fields; Model to 3NF
Creating the same attribute in more than one entity is a temptation that many novice data modelers can't resist because they don't yet believe in data normalization or they haven't yet understood that SQL joins can successfully reunite data fields that need to be together on a report or form. The only attributes you have to duplicate are primary and foreign keys.
Learning to model a database to third normal form (3NF) might be the hardest and most controversial task facing a DBA or programmer, especially if you've been pulled into database design without any background in data modeling theory. To fully understand a data model's business requirements and the way the data items interact with one other, you must model to 3NF, at least at the conceptual stage of database design. Then, when you implement the design, you can denormalize the design for production efficiencies or for programming convenience, if you need to. However, when you denormalize a data model, you have to expect to compensate for the resulting data anomalies with client-side or middle-tier programming and database triggers. The benefit of having modeled to 3NF is that you know exactly where to expect the synchronization problems caused by insert, update, and delete anomalies that the denormalizing operation introduces. And you can more easily create the compensatory programs that keep the redundant data in check and synchronized.
—Michelle A. Poolet, March 2004; see also "Why You Need Database Normalization," March 1999
Connecting to Oracle Linked Servers
If your linked server connection doesn't work, you won't get far when you try to run distributed queries. Here are some common problems you might encounter when configuring the Oracle client, along with ways to overcome them:
Receiving an Invalid Login message. This problem occurs when the login you use to connect to the linked server isn't valid on the Oracle server. To test the login's validity, first try to connect using the Oracle sample Scott/tiger user ID and password.
Receiving the message Credential Retrieval Failed. This message appears when the client machine is trying to use NTS authentication and authentication is failing. You can correct this problem by commenting out the following parameter in the sqlnet.ora file on the client machine: #Sqlnet.Authentication_services = (NTS). (The # symbol designates a comment.) You can find the sqlnet.ora file in the \oracle\ora92\network\ADMIN directory.
Can't connect from a WAN link or from a different subnet. This problem can occur when the Oracle listen port isn't available. Make sure port 1521 is open. If this port is open, double-check the Oracle listener's configuration to determine whether the default port has changed.
Centralize Data Files
When a Data Transformation Services (DTS) package is executing, it runs under the security context of the person or process executing the package. So the security context, be it through the user's account or the SQL Server Agent service's startup account, must have appropriate permissions to read from source files and write to destination files. To reduce administrative overhead, consider using one folder on one server for all data files. By centralizing data files to one server and folder, you can easily manage security access. If data files are in multiple locations, you have to manage the security for all source and destination folders separately. By keeping data files in one place, you know exactly where to locate your source and destination files, reducing the time you spend looking for them. If you need to separate a test file from a production file, simply create a separate folder in your main directory.
Metadata from Oracle Linked Servers
If you aren't comfortable using Oracle tools to query the Oracle data dictionary, some SQL Server system stored procedures can help you obtain Oracle metadata from a linked Oracle server. The procedures in Table 1 accept from four to seven parameters. They all accept the same first three parameters (linked server name, table name, schema name). SQL Server Books Online (BOL) gives full details about the use of these procedures.
In addition, SQL Server's sp_columns_ex system procedure, which returns a list of columns in an Oracle table, requires at least three non-null parameters. The sp_linkedservers procedure, which returns a list of all linked-server definitions, takes no parameters. The sp_catalogs system procedure works when your linked servers are both SQL Server, but it doesn't work with Oracle linked servers.
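Typical calls might look like the following sketch, where ORA1 is a hypothetical linked server name and SCOTT/EMP are the Oracle sample schema and table:

```sql
-- List every linked-server definition; takes no parameters
EXEC sp_linkedservers

-- List the tables in an Oracle schema on the linked server
EXEC sp_tables_ex @table_server = 'ORA1', @table_schema = 'SCOTT'

-- List the columns of one table; sp_columns_ex needs at least
-- three non-null parameters to work against Oracle
EXEC sp_columns_ex @table_server = 'ORA1',
     @table_name = 'EMP', @table_schema = 'SCOTT'
```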
One way to get good performance from your database is to stripe the data over as many physical disks as possible. For data files, the best-performing solution is using a hardware-based RAID-10 controller. However, you don't always have the budget to implement such an I/O subsystem, so you might choose to use several regular disks or several RAID 1 controllers and place each data file on a separate disk, thereby achieving the striping that SQL Server performs anyway.
If you choose this method, you need to beware of one gotcha. SQL Server applies a proportional fill algorithm to the data files, meaning that it distributes writes among the files in a filegroup in proportion to the free space available in each file. This algorithm results in all the files filling up at the same time. If you use the auto-growth feature of database files, SQL Server uses a round-robin algorithm to determine which file to expand when the database gets full. This means that only one file expands when the database gets full, and all the following inserts are directed to it and not to the other files because it's the only file with free space. When this file gets full, SQL Server expands another file, and so on.
For better performance, manually expand the files before they get completely full, making sure to expand all files, not just one. You can use an automated script that expands the files before the database gets full if their fill percentage passes a certain threshold that you determine.
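Expanding the files manually comes down to one ALTER DATABASE statement per file, run for every file in the filegroup so proportional fill keeps striping across all of them (the database and logical file names below are hypothetical):

```sql
-- Grow every data file in the filegroup to the same new size
ALTER DATABASE Sales MODIFY FILE (NAME = 'Sales_Data1', SIZE = 2000MB)
ALTER DATABASE Sales MODIFY FILE (NAME = 'Sales_Data2', SIZE = 2000MB)
```

A scheduled job could compare the size and free space that sp_spaceused or sysfiles reports against your chosen threshold and run these statements before any file fills up.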
The first step in ensuring Data Transformation Services (DTS) package portability is to localize all components. Component localization means that all the workings your DTS package requires must be present on the local system. This is important because a DTS package executes on the system from which you initiate it. If you initiate a DTS package from your workstation, the processing takes place locally. Therefore, all objects, third-party add-ins, ODBC or OLE DB drivers, and access to data sources must be available on all workstations and servers from which a DTS package might be executed. If any of these things are missing from the initiating system, you'll get errors such as Unable to create object.