Some dos and don'ts for getting the best views
When you create views, it’s likely that users can and will use them appropriately. But do you know what you can’t put in a view? And if something isn't allowed, is there a workaround? Even if you can put something in a view, should you do it? Can you update, delete, and insert data through a view? Is every SELECT statement valid within a view’s definition, or will some SELECT statements generate an error and keep you from creating the view? Let's explore the limitations on the kind of SELECT statement that you can place within a view’s definition. We'll also touch on a few good habits and some things to avoid.
There are many reasons why you might create a view; simplification is a typical one. Whether you're simplifying user access to data, simplifying the granting of permissions, or isolating schema changes from the user, you're certainly simplifying access to commonly accessed sets of data. If you have data sets that need to be gathered and returned or analyzed often, you might want to create a view to simplify access to that data. As with any data set, you might perform numerous operations on a view: computations against columns, ordering for readability, aggregations of sets for statistical analysis, and so on. So, what if you want to include all these operations within a view? Not every SELECT statement is valid. In fact, depending on syntax, the view creation might fail.
Remember, a view defines a set of data almost in the same way that a table does. Table data doesn't have a defined order, meaning that you can't expect the data to be returned with any specific ordering unless you add an ORDER BY clause to the query. Although you can use indexes to logically order a table, the table simply defines the set of stored data, not how that data set is presented to the user. A view does the same thing: It defines a set of data that can be used the same way that a table is used. (In fact, as of SQL Server 2000, a view can be indexed, but such a view is still only virtual. I'll discuss indexed views in a future article.) But because a view is nothing more than a named, saved SELECT statement that can include computed columns, there are some restrictions on what you can include in a view definition.
Because a view resembles a table in usage (to access a view, you use the view name in places where a table name is allowed), one restriction on views is that a view must return a data set that resembles a table in form. In other words, the view must return a result set, and that result set must be consistent in shape: Every row must have the same number of columns and the same type of data as a table does. A view can't return results for any SELECT that produces a nontabular result set. At this point, you might be thinking, What kind of SELECT returns a nontabular data set? The answer is a clause type you might not be familiar with: COMPUTE and COMPUTE BY.
You can't include COMPUTE or COMPUTE BY clauses within a view definition. This restriction is based on the way the data is delivered. COMPUTE and COMPUTE BY statements produce a separate result set for each of the computed values (just as GROUP BY would) while including the details that produce the computed values (which GROUP BY would not). COMPUTE and COMPUTE BY produce multiple data sets—one for each computation and one for each set of data you're aggregating. If you need to aggregate a data set by using COMPUTE or COMPUTE BY, use the view to create the data set, then query against the view to aggregate it. (I find COMPUTE and COMPUTE BY so useful for analyzing aggregated data that I cover these clauses in my next column.)
So far, the primary restriction with a view is that the view must produce a tabular set of data, but that's the whole advantage to creating a view to begin with. A view is nothing more than a logical definition of a data set, which doesn't exist except when someone queries the view. In other words, you can think of views as logical or virtual tables; no storage location is associated directly with the data in a view, so the view doesn't take up additional disk space. The data always resides with the base tables (the tables on which you define the view).
Moreover, because the purpose of a view is to provide logical, not physical, access to a set of data, a view can't include the INTO clause. If you want to copy a data set to another table (permanent or temporary), use SELECT INTO when querying the view, not in the view definition. For example, you can execute
to create a new table based on the result set that the view query returns.
Another restriction is that you can't use ORDER BY in a view definition unless you also specify TOP. This restriction returns to the idea that a view defines a data set but not how it's displayed. With ORDER BY and TOP, you define the criteria from which to derive the top n rows. For example, if the data is in descending order by the sum of sales, asking for the TOP 10 will return the top 10 highest sales. Using the Northwind sample database, you can write a query to see the top 10 best-selling products based on the sum of their sales. Listing 1 contains the view definition that represents this data set. The view’s SELECT is straightforward; using TOP restricts the data set to only the TOP n rows. (If you’re not familiar with TOP, see the SQL Server Books Online—BOL—entry about TOP and TOP WITH TIES. Adding TOP to a query may significantly affect the results returned.) TOP n is ideal for creating a small, commonly accessed set of only the most important data based on an ORDER BY clause. Thus, ORDER BY is allowed in a view when you also use TOP.
Another option for TOP is using TOP n PERCENT instead of TOP n. TOP n returns the first n rows based on the ORDER BY clause. TOP n PERCENT returns that percentage of rows based on the number of rows that would have been in the complete result set. For example, if you ask for TOP 10 PERCENT of a result set that has 100 rows, you receive 10 rows of data. What about TOP 100 PERCENT? Here, you can put an ORDER BY in a view of the entire table.
Before you get too excited about this "trick," let me add a bit of caution. Using this syntax in the wrong places can compromise performance. The only time I suggest ordering the data that a view returns is when you know that the view is used only for reporting. Using an ordered view in conjunction with other objects (i.e., joining it to something else) will cause unnecessary overhead in processing the join.
For example, Listing 2 shows two views that gather data about product sales. Comparing Listing 1 to Listing 2 shows that Listing 2 returns the entire data set rather than just the TOP 10 rows as Listing 1 does. However, Listing 2 creates two views. The first view lists all product sums without an ORDER BY; the second view lists the product sums with an ORDER BY coupled with TOP 100 PERCENT. Also in Listing 2 are two queries that access these views. The first query uses an ORDER BY when accessing the view and the second lets the view perform the ordering.
At first glance, including the ORDER BY in the view seems like an excellent shortcut, especially because, as Figure 1 shows, the execution plans for these two queries are exactly the same. If the queries have the same execution plan and their performance is identical, why would I caution you to avoid TOP 100 PERCENT? Why not include the ORDER BY in the view? Continue to Listing 3, which shows a more complex query that joins the two views Listing 2 created to the Products table to produce the name of the product as well as its sales. Listing 3's queries order the data by ProductName; Figure 2 shows the execution plan for these two queries. Notice the difference in the execution plans as well as the total cost of the plans relative to the batch. In the second plan, you can see two sorts—one happens because of the sort in the view and one because of the ORDER BY in the query. The sort performed before the join is unnecessary—a waste of time and resources. Although using TOP 100 PERCENT can be extremely useful for reporting, I recommend against it as a regular practice because you might not be aware of how your users use the view. Instead, use ORDER BY in views only when you’re really asking for a subset of rows based on a TOP criteria or when you’re creating a view to be used for reporting only (meaning that it's queried directly, with no joins).
There are further restrictions on what you can place in a view if you plan to create indexes on the view or if you plan to update through the view. And there are even more complex restrictions for modifications through a view if the view is based on partitioned tables. And even when you've successfully created the view, there are further restrictions on modifications to certain columns within a view. But these are all topics for future articles, so stay tuned.
Other Points of View
Before I close this article, I want to make sure you realize that you should avoid some things even though they're allowed. Let me also mention a few habits that can improve your performance and consistency when using and returning data from a view.
I recommend that you always explicitly state all the columns you’re interested in. One of the worst things to place in a view definition is SELECT *, yet I see it all too often. Some people create views for every table in a database, each with nothing more than SELECT *. Seeing this always surprises me; the main reasons to create a view are to simplify object access and to present more interesting data sets to the users. A view you define with SELECT * isn't restrictive or particularly interesting. When you're creating a view with SELECT *, SQL Server replaces the * with the set of columns that exists at the time you create the view. If you add columns to the table later, they won't be visible when you query through the view unless you first alter the view or drop and recreate it. If an application is expecting a certain structure and the view is inconsistent, the application might not be able to handle these differences properly. But if the view always uses a defined (static) column list, you never have to worry about applications breaking from inconsistent results. Generally, I recommend that you never use SELECT * in views.
As another good practice with views, try to include as much information about your objects as possible. When you specify a table, always owner-qualify the table. Owner qualification can improve performance by eliminating ambiguous object names. I also recommend including a table qualifier or alias for every table and every column of a view, mainly to improve readability. Six months after you’ve written a myriad of complex views and you need to go back and change something, you’ll thank me. You’ll have no problem jumping right back in and remembering the query because all the tables will have reasonable aliases and all the columns will be qualified with their aliases.
Here's one more tip: Be wary of using optimizer hints in a view. Remember, a view should act just as a table does. Because you can join a view to other tables, views, or functions, you don’t want to restrict the view to handling the data access with only one specific, forced plan. Restricting the view with a hint can have negative effects similar to what we saw with Listing 2, in which the ORDER BY was forced even when it was unnecessary.
You should avoid using hints with the logical data set; instead, use hints only when absolutely necessary and only when you’re accessing or manipulating the data through the view. For SELECT INTO, COMPUTE, COMPUTE BY, and optimizer hints, the better choice is to use them in a query against the view. If you want to save those queries for reuse, consider using a stored procedure instead of creating a view. The access and usage differs in that you can only execute a stored procedure (as opposed to selecting from a view), so you can't change the result set. Views are more flexible in access because you can change the columns and rows with restrictive SELECT lists and WHERE clauses. But where you don’t need the ability to restrict the result set, you can use a stored procedure instead of a view.
Overall, the restrictions on a view's SELECT statement aren't very restrictive. What's more, for each restriction, there’s likely a workaround. Where restrictions exist, you can usually use the view where you would have used the table and then execute the options such as SELECT INTO, COMPUTE BY, ORDER BY, or optimizer hints against the view. Above all, make sure you thoroughly test each view before you put it into production.
Next time: COMPUTE and COMPUTE BY