Think inside the box with geometric T-SQL
Editor's Note: Send your experts-only T-SQL tips to Itzik Ben-Gan at email@example.com. If we use your tip in the magazine, you'll receive $100 and an exclusive T-SQL Black Belt shirt.
Several industries—including cartography, or mapmaking—collect and manipulate geographical data to produce maps or for other purposes. People in these fields sometimes run across interesting problems that require the exercise of geometry and SQL against data that's stored in a database. For example, suppose you have a table containing the coordinates of points of interest such as motels. Given a list of certain locations—for example, highway exits—you might need to find the coordinates of the smallest rectangle that encloses all the input locations. You might then feed these coordinates to a mapping application that uses them to generate the smallest map that includes all the locations. This problem is easy to solve in T-SQL because you're not limited by a list of possible rectangles—rather, you use the locations' coordinates to calculate the rectangle's coordinates. You simply use the MIN() and MAX() functions. However, I recently encountered in the Microsoft SQL Server Programming public newsgroup (news://msnews.microsoft.com/microsoft.public.sqlserver.programming) a problem that has similar requirements but is much more complex than the problem I just mentioned. Let's look at it, then examine one solution I came up with.
The scenario involves center and target locations in a two-dimensional plane. The person who posted the problem didn't specify what those center and target locations were, so let's use the motel example. Suppose that the motel-chain company stores in its database the locations of its motels and highway exits leading to each motel. Center (motel) locations information is stored in a table called Centers and includes the center ID and its x, y coordinates. Target (exit) locations information is stored in a table called Targets and includes an arbitrary location ID, the center ID to which the target location belongs, and the x, y coordinates of the target location. The company's mapping application can generate a map from each combination of four target locations that are positioned in the four corners of a rectangle. You can run the script that Listing 1 shows to create the Centers and Targets tables and populate them with sample data.
Here's the problem you need to solve: For each center, find the coordinates of a rectangle with the smallest area such that the rectangle is formed by four target locations at its corners. In the motels example, this means that for each motel, you need to find the smallest rectangle formed by four highway exits enclosing the motel. The mapping application will use the coordinates of the four exits nearest each motel to generate maps for the motels' clients. You need to generate the smallest possible map that includes the center location, using only target locations that form a rectangle. For simplicity's sake, you can assume that at least one such enclosing rectangle exists for each center and that if more than one "smallest enclosing rectangle" exists, you can return them all. In reality, you'd need to add a factor that takes into account the fact that you probably can't form an exact rectangle from highway exits.
To get a better understanding of the problem, look at Figure 1. I've marked the target locations with an x and the center locations from the sample data as circles with a cross. The figure also contains the smallest enclosing rectangles that you're looking for. The elements that belong to Center 1 appear in red, and the elements that belong to Center 2 appear in gold.
Creating a single query that solves the problem is a difficult task. Such a query would be highly complex and hard to understand and maintain. A better approach is to break the problem into steps and use views to encapsulate the intermediate queries. You could express the first step this way: "For each center, find the coordinates of all rectangles that can be formed from the target locations that belong to the center." In our example, this means "for each motel, find all rectangles that can be formed from the highway exits that belong to it." Note that a valid rectangle must contain four target locations at the four corners. However, returning the coordinates of only the top-left and bottom-right corners in the output is sufficient.
Running the script in Listing 2 creates the VRectangles view, which returns the results of the first step—finding all rectangles that belong to a center—by joining the Targets table to itself. We're interested in the top-left corners from the Targets instance called TL and in the bottom-right corners from the instance called BR. To match each top-left corner in TL with all possible bottom-right corners from BR, the code in Listing 2 uses the following JOIN condition:
TL.centerid = BR.centerid AND TL.X < BR.X AND TL.Y > BR.Y.
The center IDs in each pair must match. In addition, the x coordinate of the top-left corner must be smaller than the x coordinate of the bottom-right corner, and the y coordinate of the top-left corner must be larger than the y coordinate of the bottom-right corner.
But the JOIN condition alone isn't enough to find whole rectangles. So far, the query returns pairs of target locations that might or might not be the top-left and bottom-right corners of a rectangle that can be formed from four target locations. The two EXISTS() predicates that appear in the WHERE clause verify that matching top-right and bottom-left corners appear in the Targets table.
A SELECT * query against the VRectangles view returns the output that Table 1 shows. The query found three rectangles for each center. Note that the rectangles don't necessarily enclose the centers, as in the highway-exit example. You can verify which rectangles are valid by imagining the valid rectangles from Figure 1. Note that you can form two enclosing rectangles for Center 1, but the valid one is the rectangle with the smallest area. For contrast, Figure 2 shows some non-enclosing rectangles.
The next step is to create a view that returns only the enclosing rectangles for each center. Run the script that Listing 3 shows to create the VEnclosingRecs view, which returns the desired results of this step. The view definition contains a simple query that joins the Centers table with the VRectangles view from the previous step. The JOIN condition matches each center with the rectangles that belong to the center, where the center's coordinates fall within the rectangle's coordinates. A SELECT * query against the VEnclosingRecs view returns the output that Table 2 shows.
Now that you have only the enclosing rectangles, you need to find the ones that have the smallest area for each center. You can achieve this goal by using a simple correlated subquery against the VEnclosingRecs view, as the code in Listing 4 shows. Note that you can return the coordinates of all four corners from the coordinates of the two corners that exist in the view. The x coordinate of the top-right corner is equal to the x coordinate of the bottom-right corner, the y coordinate of the top-right corner is equal to the y coordinate of the top-left corner, and so on. Table 3 shows the output of Listing 4's query. You can revisit Figure 1 to verify that the result is accurate.
Different Way of Thinking
I'm always surprised at the wealth of problems that T-SQL can solve. As this and my previous two columns demonstrate, T-SQL can be useful for graphical and geometrical puzzles, which require a different way of thinking from the usual business-scenario problems. Practicing different ways of thinking can prove useful when you need to think "outside the box" to implement similar ideas in other business situations you face. Every puzzle you solve makes you more prepared for the next puzzle—and for real-world problem solving.