T-SQL conditional sort
I've often encountered this situation: a stored procedure needs to display a list of records ordered by a dynamic parameter. In Transact SQL, the Microsoft SQL server, one cannot do this elegantly in any way. I will list them all and tell you what the problem with each is.
First of all, let's start with an example. Assume we have a table called Test with a lot of rows, which has a datetime column which has an index on it. Let's call that TheDate to avoid any SQL keywords. We want to do something like this:
Notice that I want to get the top 10 rows, which means I only need a small part of the total. I also order directly by TheDate. In order to release a piece of code we also need to test it for performance issues. Let's look at the execution plan:
Now, let's try to order it dynamically on a string parameter which determines the type of the sort:
As you see, I've used CASE to determine the sort order. There is no option to give a parameter as the sort order. The execution plan is this:
Surprise! The execution plan for the second query shows it is ten times slower. What actually happens is that the entire table is sorted by the case expression in a intermediate table result, then 10 items are extracted from it.
There must be a solution, you think, and here is an ingenious one:
I transform the datetime value into a float and then I use a mathematical expression on it, multiplying it with 1 or -1. It is the simplest expression possible under the circumstances. The execution plan is:
Bottom line, there is no exception to the rule: when you order by an expression, SQL Server does not use indexes, even if the expression is easily decompilable. Don't get mislead by the apparent functional programming style of SQL syntax. It doesn't really optimize the execution plan in that way.. Even if the column is an integer, it will not work. Ordering by TheInteger is fundamentally faster than ordering by -TheInteger.
And now the solution, ugly as it may be (imagine the select is a large one, with joins and where conditions):
Yes, the dreaded duplication of code. But the execution plans are now equivalent: 50%/50%.
This post was inspired by real events, where the production SQL server went into overdrive trying to create and manage many temporary result tables from a stored procedure that wanted to avoid duplication using the CASE method.
Update: there is, of course, another option: creating an SQL string in the stored procedure that is dynamically modified based on the sort parameter, then the SQL executed. But I really dislike that method, for many reasons.
More information from another blogger: Conditional Order By. They also explore rank using windows functions and in one of the comments there is a reference to SQL 2008 "Grouping Sets" which I have not covered yet.
First of all, let's start with an example. Assume we have a table called Test with a lot of rows, which has a datetime column which has an index on it. Let's call that TheDate to avoid any SQL keywords. We want to do something like this:
SELECT TOP 10 * FROM Test ORDER BY TheDate ASC
Notice that I want to get the top 10 rows, which means I only need a small part of the total. I also order directly by TheDate. In order to release a piece of code we also need to test it for performance issues. Let's look at the execution plan:
Now, let's try to order it dynamically on a string parameter which determines the type of the sort:
SELECT TOP 10 * FROM Test ORDER BY CASE WHEN @sort='ASC' THEN TheDate END ASC, TheDate DESC
As you see, I've used CASE to determine the sort order. There is no option to give a parameter as the sort order. The execution plan is this:
Surprise! The execution plan for the second query shows it is ten times slower. What actually happens is that the entire table is sorted by the case expression in a intermediate table result, then 10 items are extracted from it.
There must be a solution, you think, and here is an ingenious one:
DECLARE @intSort INT = CASE WHEN @sort='ASC' THEN 1 ELSE -1 END
SELECT TOP 10 * FROM Test ORDER BY CAST(TheDate AS FLOAT)*@intSort ASC
I transform the datetime value into a float and then I use a mathematical expression on it, multiplying it with 1 or -1. It is the simplest expression possible under the circumstances. The execution plan is:
Bottom line, there is no exception to the rule: when you order by an expression, SQL Server does not use indexes, even if the expression is easily decompilable. Don't get mislead by the apparent functional programming style of SQL syntax. It doesn't really optimize the execution plan in that way.. Even if the column is an integer, it will not work. Ordering by TheInteger is fundamentally faster than ordering by -TheInteger.
And now the solution, ugly as it may be (imagine the select is a large one, with joins and where conditions):
IF @sort='ASC'
BEGIN
SELECT TOP 10 * FROM Test ORDER BY TheDate ASC
END
ELSE
BEGIN
SELECT TOP 10 * FROM Test ORDER BY TheDate DESC
END
Yes, the dreaded duplication of code. But the execution plans are now equivalent: 50%/50%.
This post was inspired by real events, where the production SQL server went into overdrive trying to create and manage many temporary result tables from a stored procedure that wanted to avoid duplication using the CASE method.
Update: there is, of course, another option: creating an SQL string in the stored procedure that is dynamically modified based on the sort parameter, then the SQL executed. But I really dislike that method, for many reasons.
More information from another blogger: Conditional Order By. They also explore rank using windows functions and in one of the comments there is a reference to SQL 2008 "Grouping Sets" which I have not covered yet.
One more option, maybe even more ugly, is having a stored procedure for each possible sort criteria
ReplyDeleteWe are smart people, we can find ever uglier methods of doing the same thing. The point of this article was to find the elegant ones :)
ReplyDeleteWhat I found especially annoying is not being able to "cache" a CTE in order to use it in several statements. It would have been a breeze to give a name to the select expression and then only duplicate the ORDER BY part.