You can get too comfortable with a favorite view
"Views are nothing more than named, saved SELECT statements" is always my opening line in a lecture or conversation about views. But no matter how simple views are, they still generate many questions and create a lot of confusion. As soon as I start to talk about views, people ask:
- Do views have any data stored directly with them?
- Do views slow performance?
- What permissions do I need to create views?
- What permissions do I need to use views?
- Can I update a view?
- Can I use SELECT * in a view?
- Can I index a view?
- How does a view differ from a function or a stored procedure?
- Can I export or replicate a view?
- Can I access a view if the tables on which the view is defined are dropped?
Because so many people have questions like these, I decided to write a series of articles that explain everything you need to know about using views and achieving the best query performance with them. To test the syntax for this article's examples, you can download the ViewsPartIExamples .sql commented script file at http://www.tsqlsolutions.com, InstantDoc ID 27229.
By definition, a view is a query that you've decided to save for reuse. Instead of retyping the query every time you need the query's data set, you simply access the view as if it were a table. A query that references the view (a view query) uses the same plan and records the same performance as a query that refers directly to the base tables (a base-table query). For example, let's assume you need to see the details of an order in the Northwind sample database. You can write the five-table join that Listing 1 shows to return data about who purchased the order items, what they purchased, the quantity, and the order ID. Or you can create a view to hide the complexity of this five-table join, then access the view every time you need this data. If you want to base the view on the SELECT statement, you first need to name it according to the data that the base-table query defines. For Listing 1's data, let's call the view ProductCategoryDetailsView. Then, to create the view, you add the following code to Listing 1's query:
In this example, you can cut and paste the five-table join query into the SELECT section of the view. (However, in some situations, you can't place the same query directly into a view because of certain restrictions within views. I'll address that topic in a future article.)
When you need to access the data that this view references, the only thing you need to type is the following view query:
You can also add a WHERE clause, as the following code snippet shows:
WHERE CategoryName = 'Beverages'
Whether you type the base-table query with the added WHERE clause or type the view query with the WHERE clause, both queries execute at the same level of performance. Listing 2 shows the view query with the added WHERE clause and the base-table query with the added WHERE clause. Figure 1 shows the query plans that the two queries in Listing 2 produce. As you can see, the queries take the same amount of time, claim the same overhead, and generate the same plan. The only difference between the two is the amount of T-SQL code you need to type in each time.
The most fundamental benefit of using a view is simplifying access to the data. The view that you just created successfully simplifies access because
- you don't have to know the underlying schema to retrieve the data you need.
- you don't have to type in the base-table query every time you need to access this data.
- you can use the asterisk (*) shortcut to simplify accessing all the columns that the view defines.
- you can add WHERE clauses to limit the data that the query returns and customize your query against the data.
I have one warning about the * shortcut. We programmers, who are an inherently lazy lot, often forget that SELECT * simplifies the query but often complicates the query processing. When you use SELECT * against base tables, SQL Server must return all columns. However, when you reduce in the view definition the number of columns that the view returns, using SELECT * is efficient because SQL Server doesn't have to access all the columns from the base tables. In fact, using * against a view imposes no negative effect on performance.
Although you need to be judicious in your use of the * shortcut, adding WHERE clauses has the biggest potential for problems. If you use the five-table join query or view to return order data that includes customer, product name, category name, and ID, you can easily add a WHERE clause. And if you want to add a WHERE clause that limits only the product type or the customers or the categories, using the view to execute this query works well. For example, if you want to return only beverages, you can execute the following statement:
WHERE CategoryName = 'Beverages'
If you want to retrieve orders only for customers whose name begins with B, the following statement will efficiently return the correct results:
WHERE CompanyName like 'B%'
However, what happens if you refine your search further? Would you feel comfortable using this view if you wanted to return a list of customers who've placed an order and whose name begins with B? To return that specific data set, you can write a view query as above — with the SELECT statement referencing the view and with the added WHERE clause — or if you have access to the base tables, you can write the query as follows:
FROM dbo.Orders AS O
INNER JOIN dbo.Customers AS C
ON C.CustomerID = O.CustomerID
WHERE CompanyName like 'B%'
The result set from these two queries is the same. However, the view query gathers data from five tables but needs data from only the Customers and Orders tables. Because the joins to Order Details, Products, and Categories only further describe the data, the view's inner join doesn't change the number of rows in the join. In fact, this schema is called a snowflake schema. Imagine a snowflake comprised of the query's components. The largest table — Order Details — is the fact table. This fact table branches out to descriptor tables, or lookup tables (e.g., Products), which in turn branch out to additional descriptor tables (Categories). When you're looking only for details about an existing Order Details row (the join to the Products table), your result set doesn't increase or decrease in size; you only further describe the Order Details data. For example, in the Order Details table, you can see information about products. Each Order Details row has a specific ProductID ordered, and each Order Details row only exists with a valid ProductID. The database design (which shows that the ProductID column doesn't allow NULLs) and referential integrity constraints (which tell you that a foreign key is defined from \[Order Details\].ProductID to Products.ProductID) give you this information. Therefore, every Order Details row produces exactly one valid ProductID. In this case, the number of rows joining from Order Details to Products doesn't change. When you use the view query, you force the five-table join and cause a severe performance hit. However, the base table query accesses only two tables and promotes smooth performance. So why join five tables when you can join only two?
Let's look at the example query in a different way. When you want to access detailed information about the category (such as CategoryName) and information about the product (such as ProductName), you must apply the join to the Order Details, Category, and Products tables. If some products don't have a CategoryID, the join eliminates Order Details rows that lack a CategoryID. In the Northwind database, however, all products have a valid CategoryID, and the Products and Category tables are used only as supporting or descriptor tables. In this database, queries that join to the Products and Category tables will only detail the data and not increase or decrease the number of rows that the set defines.
So where's the problem? Listing 3 shows the view query, which uses a WHERE clause against the five-table join view, and the simplified two-table base-table query. Figure 2, page 14, shows their respective execution plans. The five-table join against the view is the first (top) plan, and the rewritten two-table join is the second (bottom) plan. The first thing you notice is the difference in the plans. The first plan is more complex (it contains more items) than the second. The Query cost (relative to the batch) numbers are also significant. By running both queries at the same time, you can determine their cost relative to the batch as well as their cost relative to each other. The first plan costs 82 percent of the batch plan, and the second plan costs a little more than 18 percent of that plan. These results show that the first plan is more than four times as expensive in resource consumption as the second plan. What started as a seemingly simple view to hide the complexity of a five-table join was abused. Instead of querying the data directly, the user incorrectly queried the view. This behavior is typical: Users who write their own queries often become comfortable with one or two important views, which they query for all the data they need. Poor performance follows. In this case, to access a list of customers who've placed an order and whose name begins with B, you need a view that accesses only the two relevant tables — Orders and Customers.
Views are flexible and can limit the data returned to the user. By working closely with users to determine the information they need, developers can maintain better control of the requested data and keep the server running smoothly. Here are a few questions developers can ask users to determine whether they're using the views for other than their intended purposes:
- Do you use DISTINCT a lot? This question should generate a "no" answer. If users constantly filter out duplicate rows, they might be querying the wrong view for the information. Returning excess data from the wrong view requires users to eliminate the rows that the join gathered.
- Do you always ask for SELECT * against your views, or do you often eliminate the columns from the view query? The preferred answer to this question is choosing SELECT *, which is a sign that users are querying the view correctly.
- Do you use different views, or does one view seem to support most of your queries? Developers hope to hear that the user relies on a variety of views. Sometimes, users find a view that answers most of their data needs. Then, they continue to use that view for all their queries, constantly tweaking the view to return the exact data they require. More than likely, though, the tweaking slows query performance — and the server.
Views are powerful tools — when you use them properly. To create the views that match user requirements, developers need to discuss with their users what their needs are and create lots of views to cover all their data requests. Then, to prevent misuse, developers need to document the views and constantly work with users to monitor the slowest-performing and highest-priority views. Well-designed views simplify access and improve performance. In a future article, I'll describe which types of SELECT statements you can't include in a view, why you can't include them, and how else you can retrieve the required data. In the meantime, check out the Web sidebar "T-SQL Tutor Project," InstantDoc ID 27361, for a homework assignment that will help you hone your skills for creating and accessing views.