Weird SQL Server query problem

Question

So I have this weird problem with an SQL Server stored procedure. Basically I have this long and complex procedure. Something like this:

SELECT Table1.col1, Table2.col2, col3
FROM Table1 INNER JOIN Table2
     Table2 INNER JOIN Table3
     -----------------------
     -----------------------
     (Lots more joins)
WHERE Table1.Col1 = dbo.fnGetSomeID() AND (More checks)
     -----------------------
     -----------------------
(6-7 More queries like this with the same check)

The problem is that check in the WHERE clause at the end Table1.Col1 = dbo.fnGetSomeID(). The function dbo.fnGetSomeID() returns a simple integer value 1. So when I hardcode the value 1 where the function call should be the SP takes only about 15 seconds. BUT when I replace it with that function call in the WHERE clause it takes around 3.5 minutes.

So I do this:

DECLARE @SomeValue INT
SET @SomeValue = dbo.fnGetSomeID()
--Where clause changed
WHERE Table1.Col1 = @SomeValue

So now the function is only called once. But still the same 3.5 minutes. So I go ahead and do this:

DECLARE @SomeValue INT
--Removed the function, replaced it with 1
SET @SomeValue = 1
--Where clause changed
WHERE Table1.Col1 = @SomeValue

And still it takes 3.5 minutes. Why the performance impact? And how to make it go away?

How long does it take to execute `dbo.fnGetSomeID()` on its own — Eoin Campbell, Apr 22 '09 at 08:53
But that shouldn't matter cause I replace with the hard coded value in the end. — Malik Daud Ahmad Khokhar, Apr 22 '09 at 08:57
The 2nd one should be faster. In general, you don't want functions in WHERE clauses as they get re-evaluated for every single potential row in the resultset. — Joe, Apr 22 '09 at 10:40

score 2 · Answer 1 · answered Apr 22 '09 at 13:08

Even with @SomeValue set at 1, when you have

WHERE Table1.Col1 = @SomeValue

SQL Server probably still views @SomeValue as a variable, not as a hardcoded 1, and that would affect the query plan accordingly. And since Table1 is linked to Table2, and Table2 is linked to Table3, etc., the amount of time to run the query is magnified. On the other hand, when you have

WHERE Table1.Col1 = 1

The query plan gets locked in with Table1.Col1 at a constant value of 1. Just because we see

WHERE Table1.Col1 = @SomeValue

as 'hardcoding', doesn't mean SQL sees it the same way. Every possible cartesian product is a candidate and @SomeValue needs to be evaluated for each. So, the standard recommendations apply - check your execution plan, rewrite the query if needed.

Also, are those join columns indexed?

score 1 · Answer 2 · edited May 23 '17 at 11:47

As is mentioned elsewhere, there will be execution plan differences depending on which approach you take. I'd look at both execution plans to see if there's an obvious answer there.

This question described a similar problem, and the answer in that case turned out to involve connection settings.

I've also run into almost the exact same problem as this myself, and what I found out in that case was that using the newer constructs (analytic functions in SQL 2008) was apparently confusing the optimizer. This may not be the case for you, as you're using SQL 2005, but something similar might be going on depending on the rest of your query.

One other thing to look at is whether you have a biased distribution of values for Table1.Col1 -- if the optimizer is using a general execution plan when you use the function or the variable rather than the constant, that might lead it to choose suboptimal joins than when it can clearly see that the value is one specific constant.

If all else fails, and this query is not inside another UDF, you can precalculate the fnGetSomeID() UDF's value like you were doing, then wrap the whole query in dynamic SQL, providing the value as a constant in the SQL string. That should give you the faster performance, at the cost of recompiling the query every time (which should be a good trade in this case).

score 0 · Answer 3 · answered Nov 12 '10 at 03:01

 (Lots more joins)
WHERE Table1.Col1 = dbo.fnGetSomeID() AND (More checks)

This is not a nice problem to have. It shouldn't matter, finally, whether the value is returned by a function or subquery or variable or is a constant. But it does, and at some level of complexity it's very hard to get consistent results. And you can't really debug it, because neither you nor anyone else here can peer inside the black box that is the query optimizer. All you can do is poke at it and see how it behaves.

I think the query optimizer is behaving erratically because there are many tables in the query. When you tell it to look for 1 it looks at the index statistics and makes a good choice. When you tell it anything else, it assumes it should join based on what it does know, not trusting your function/variable to return a selective value. For that to true, Table1.Col1 must have an uneven distribution of values. Or the query optimizer is not, um, optimal.

Either way, the estimated query plan should show a difference. Look for opportunities to add (or, sometimes, remove) an index. It could be the 3.5 plan is reasonable in a lot of cases, and what the server really wants is better indexes.

Beyond that is guesswork. Sometimes, sad to say, the answer lies in finding the subset of tables that produce a small set of rows, putting them in a temporary table, and joining that to the rest of the tables. The OPTIMIZE FOR hint might be useful, too.

Keep in mind, though, that any solution you come with will be fragile, data and version dependent.

score 0 · Answer 4 · answered Apr 22 '09 at 10:24

Another thing to try. Instead of loading the id into a variable, load it into a table

if object_id('myTable') is not null drop myTable
select dbo.fnGetSomeID() as myID into myTable

and then use

WHERE Table1.Col1 = (select myID from myTable)

in your query.

score 0 · Answer 5 · answered Apr 25 '09 at 00:46

0

You could try the OPTIMIZE FOR hint to force a plan for a given constant, but it may have inconsistent results; in 2008 you can use OPTIMIZE FOR UNKNOWN

answered Apr 25 '09 at 00:46

SqlACID

4,008
18
29

score 0 · Answer 6 · answered May 01 '09 at 03:27

I think that since the optimizer has no idea how much work the function does, it tries to evaluate them last.

I would try storing the return value of the function in a variable ahead of time, and using that in your where clause.

Also, you might want to try schema binding your function, because apparently sometimes it seriously affects peformance.

You can make your function schema bound like so:

create function fnGetSomeID()
with schema_binding
returns int
... etc.

Weird SQL Server query problem

6 Answers6