Sunday, February 26, 2012

Eliminating a table scan from a query

I'm doing a simple join from an inventory table to a product table to look up
the product name. Both the inventory table and product table have indexes on
the shared key. Nevertheless its using a table-scan against the product
table, which is taking up a lot of time (over 1/2 of the outer query this is
part of). Here it is:
SELECT position, value, a.productId, a.productname
FROM tblPositions p INNER JOIN tblProducts a ON p.productId= a.productId
WHERE p.bank not in (300, 400, 1866) and ABS(position) > 0.001
Looking at the output of Query Analyzer, it is using the index on positions
for the bank, then table scanning products.
Any ideas?
MauryABS(position) > 0.001
Don't have functions etc on the column side of the query. In most cases it will prohibit the
optimizer from using a column. Unless I misunderstand ABS, you should be able to write it as:
position > 0.001 OR position < -0.001
Above doesn't mean that the plan *will* change but at least the optimizer has a chance now to use an
index on that column. Whether or not that happens depends on a lot of factors, like type of join,
join density, number of rows in the tables, selectivity of each restriction, accuracy of statistics
etc.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi
"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in message
news:D2261870-06AC-429B-9870-5CC6ACAA14E2@.microsoft.com...
> I'm doing a simple join from an inventory table to a product table to look up
> the product name. Both the inventory table and product table have indexes on
> the shared key. Nevertheless its using a table-scan against the product
> table, which is taking up a lot of time (over 1/2 of the outer query this is
> part of). Here it is:
> SELECT position, value, a.productId, a.productname
> FROM tblPositions p INNER JOIN tblProducts a ON p.productId= a.productId
> WHERE p.bank not in (300, 400, 1866) and ABS(position) > 0.001
> Looking at the output of Query Analyzer, it is using the index on positions
> for the bank, then table scanning products.
> Any ideas?
> Maury|||> SELECT position, value, a.productId, a.productname
> FROM tblPositions p INNER JOIN tblProducts a ON p.productId= a.productId
> WHERE p.bank not in (300, 400, 1866) and ABS(position) > 0.001
> Looking at the output of Query Analyzer, it is using the index on positions
> for the bank, then table scanning products.
> Any ideas?
> Maury
Hi,
That is why it is taking such a long time. It should be doing an index
join on products. For some reason optimizer decided to chose the index
on position. It is very smart but not smart enough.|||"Tibor Karaszi" wrote:
> position > 0.001 OR position < -0.001
No luck, unfortunately.
Oh wow, I was just able to eliminate the scan! I have no idea why this
works, but when I added a criterion on the products by joining against
another table of product types, then it worked instantly.
Now here's something interesting... when I use ABS in _that_ query it
behaves as you mention above and the table scan comes back. Very odd!
Maury|||Ok, this is just too weird.
In order to try to give the optimizer a little more to work with, I added a
join against a product type table. This table has a column that can be used
to find all products of a particular type (pumbing, etc.), and has seven
unique values. Here is the new version of the query:
SELECT position, value, a.productId, a.productname
FROM tblPositions p
INNER JOIN tblproducts a ON p.productId= a.productId
INNER JOIN tblSecType t ON t.sectypeId = p.sectypeId AND t.assetType > 0
WHERE p.bank not in (300, 400, 1866)
AND (position > 0.01 OR position < -0.01)
Ok, ready for this? If the value on assetType is > 1 the table scan
disappears and the query runs instantly. If the value is >0 or >=1, the table
scan re-appears!
So is there some way to force the system to use the plan I want?
Maury|||It should not surprise you that you get different execution plans when you change the query, and
also when you change search criteria. This is what a cost-based optimizer is all about. Perhaps
there is a significant change in selectivity between > 0 and > 1, explaining why different execution
plans are chosen. Yes, there are several methods to force different aspects of a plan. For example
an index hint. I suggest you read about "optimizer hints" in Books Online for more information.
--
Tibor Karaszi, SQL Server MVP
http://www.karaszi.com/sqlserver/default.asp
http://sqlblog.com/blogs/tibor_karaszi
"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in message
news:1503F046-4007-4228-93D8-D6BA32D1DD7E@.microsoft.com...
> Ok, this is just too weird.
> In order to try to give the optimizer a little more to work with, I added a
> join against a product type table. This table has a column that can be used
> to find all products of a particular type (pumbing, etc.), and has seven
> unique values. Here is the new version of the query:
> SELECT position, value, a.productId, a.productname
> FROM tblPositions p
> INNER JOIN tblproducts a ON p.productId= a.productId
> INNER JOIN tblSecType t ON t.sectypeId = p.sectypeId AND t.assetType > 0
> WHERE p.bank not in (300, 400, 1866)
> AND (position > 0.01 OR position < -0.01)
> Ok, ready for this? If the value on assetType is > 1 the table scan
> disappears and the query runs instantly. If the value is >0 or >=1, the table
> scan re-appears!
> So is there some way to force the system to use the plan I want?
> Maury|||One more thing. Try rebuilding the indexes and updating statistics on the
tables to see if the optimizer can get better heuristics to work with. It
isn't that the optimizer isn't 'smart enough' as someone suggested. It is
indeed PLENTY smart - but it only has a limited set of data to work with and
that comes from the statistics information. It will usually switch from
index usage to table scan if it estimates more than roughly 10-15% (IIRC) of
the number of rows will match your join/where clauses.
--
TheSQLGuru
President
Indicium Resources, Inc.
"Maury Markowitz" <MauryMarkowitz@.discussions.microsoft.com> wrote in
message news:1503F046-4007-4228-93D8-D6BA32D1DD7E@.microsoft.com...
> Ok, this is just too weird.
> In order to try to give the optimizer a little more to work with, I added
> a
> join against a product type table. This table has a column that can be
> used
> to find all products of a particular type (pumbing, etc.), and has seven
> unique values. Here is the new version of the query:
> SELECT position, value, a.productId, a.productname
> FROM tblPositions p
> INNER JOIN tblproducts a ON p.productId= a.productId
> INNER JOIN tblSecType t ON t.sectypeId = p.sectypeId AND t.assetType > 0
> WHERE p.bank not in (300, 400, 1866)
> AND (position > 0.01 OR position < -0.01)
> Ok, ready for this? If the value on assetType is > 1 the table scan
> disappears and the query runs instantly. If the value is >0 or >=1, the
> table
> scan re-appears!
> So is there some way to force the system to use the plan I want?
> Maury|||"TheSQLGuru" wrote:
> that comes from the statistics information. It will usually switch from
> index usage to table scan if it estimates more than roughly 10-15% (IIRC) of
> the number of rows will match your join/where clauses.
Ahhh, a useful number I'll try to remember.
Maury

No comments:

Post a Comment