Congratulations to Ken Haley, development manager at SPECTRUM Human Resource Systems in Denver, Colorado, and Ewen Stewart, Software Architect at Active Media Australia in Melbourne. Ken won first prize of $100 for the best solution to the August Reader Challenge, "Speeding Up the Query." Ewen won second prize of $50. Here’s a recap of the problem and the solution to the August Reader Challenge.

Learn from other Reader Challenges at "Unencrypted Information" and  "Cutting Down on Servers."

Problem


While troubleshooting a slow distributed query in his application code, Donald finds that SQL Server is choosing an inefficient execution plan. The distributed query performs a join between a large table residing on remote server Prod1 and a small table residing on local server Prod2 and retrieves most columns from the remote table. However, SQL Server retrieves all the remote table’s rows and performs the join on the local server.

To debug the query, Donald uses Northwind’s Orders and Order Details tables to create a test scenario. On the remote server, he creates the following test table:

-- On remote server Prod1
USE Northwind
SELECT od.OrderID, od.ProductID, od.UnitPrice, od.Quantity,
        od.Discount
  INTO \[New Order Details\]
  FROM (
        SELECT TOP 500000 od1.OrderID, od1.ProductID,
               od1.UnitPrice, od1.Quantity, od1.Discount
          FROM \[Order Details\] AS od1
         CROSS JOIN \[Order Details\] AS od2
         ORDER BY od1.OrderID
     ) AS od
CREATE CLUSTERED INDEX IX_NewOrderDetail_ID ON
       \[New Order Details\]( OrderID )

Then, on the local server, he writes the following test query, which resembles his production distributed query:

-- On local server Prod2
SELECT o.OrderID, od.ProductID, od.UnitPrice, od.Quantity,
        od.Discount
  FROM Northwind.dbo.Orders AS o
  JOIN Prod1.Northwind.dbo.\[New Order Details\] AS od
    ON od.OrderID = o.OrderID

Help Donald improve his slow distributed query’s I/O performance and execution time.

Solution


Donald decides to use the REMOTE JOIN hint because it instructs SQL Server 2000 (and SQL Server 7.0) to perform the join on the remote table’s server instead of on the local server. Because the remote table contains considerably more rows than the local table, the REMOTE JOIN hint proves effective in speeding up Donald’s distributed query. The REMOTE JOIN hint is useful when the left table in the query is local and the right table is remote.

Donald modifies the sample query that he used for testing by adding the REMOTE JOIN hint, as the following code shows:

SELECT o.OrderID, od.ProductID, od.UnitPrice, od.Quantity,
        od.Discount
  FROM Northwind.dbo.Orders AS o
 INNER REMOTE JOIN Prod1.Northwind.dbo.\[New Order Details\]
         AS od
    ON od.OrderID = o.OrderID

By analyzing the SQL Server Profiler trace output, Donald confirms that the REMOTE JOIN hint improves his query’s performance. (He can also execute SET STATISTICS TIME ON and observe the output in Query Analyzer.) The following sample results from a Profiler trace show statistics for the two queries:

Test Run 1
*  CPU: 28581, Reads: 500302, Writes: 3, Duration: 90550
       (without hint)
*  CPU: 12708, Reads: 50, Writes: 0, Duration: 77593
       (with REMOTE JOIN hint)
Test Run 2
*  CPU: 33057, Reads: 500321, Writes: 0, Duration: 123840
       (without hint)
*  CPU: 14010, Reads: 50, Writes: 0, Duration: 97840
       (with REMOTE JOIN hint)
Test Run 3
*  CPU: 33057, Reads: 500321, Writes: 0, Duration: 125483
       (without hint)
*  CPU: 14391, Reads: 50, Writes: 0, Duration: 102446
       (with REMOTE JOIN hint).