BIGINT BETWEEN is slow

Last Post 26 Jul 2011 04:14 PM by Ray SQL. 7 Replies.
AddThis - Bookmarking and Sharing Button
Author Messages
SQLUSA
New Member
New Member

--
14 Jun 2007 05:27 PM
SELECT u.UserID, i.DMA
INTO GeoMap
FROM User u
JOIN IPMapToDMA i
ON u.BigIP BETWEEN i.FromIP and i.ToIP

IPMapToDMA 1 million
User 3 million

BigIP, FromIP, ToIP are bigint,
userid, dma are int.

Everything is freshly indexed. High performance platform.

It does not finish in 3 hours (output around 2 million rows of 2 int columns).

Kalman Toth, Database Architect
SQL Server 2005 Training - http://www.sqlusa.com/order2005grandprix
SQLUSA
New Member
New Member

--
15 Jun 2007 12:47 AM
It never finishes, would statistics io show something?

The plan show HASH JOIN with cost into the millions, also NESTED....


SELECT COUNT(*) ...runs fast in 30sec.

SELECT DMA, count(*)...... GROUP BY DMA is also fast into the minutes.

SELECT ...INTO.... or INSERT...SELECT... hits a road block.

YES THE BIGINT BETWEEN. There is a cluster index on FromIP, index on ToIP

SQLUSA
New Member
New Member

--
15 Jun 2007 01:17 PM
Everything is indexed, clustered.....

If I use the same FROM with a GROUP BY on DMA with count(*), it returns the results in 30 sec.

SELECT i.DMA, count(*)
INTO GeoMap
FROM User u
JOIN IPMapToDMA i
ON u.BigIP BETWEEN i.FromIP and i.ToIP
GROUP BY i.DMA
SQLUSA
New Member
New Member

--
17 Jun 2007 12:25 AM
DMA population is 220, UserID population is 3 million. I will test different clustering.


The problem seems to be the following:

the ranges are disjoints: 1-10, 20-25, 40-40, 55-58, 800-801, 9999-10100, .....

basically only one range has to be checked the first FromIP <= BigIP (order by FromIP asc) range

if no hit in this range, BigIP should be abandoned and check should proceed to the next one

However, the query does not tell the optimizer really that the ranges are disjoint, so the optimizer goes on and checks the following, for example, 800K ranges where FromIP < BigIP. Actually it appears that the optimizer is doing the right thing.

I am experimenteing with a self-join on the ID column of IPMapToDMA which would force only a single range check.

Kalman Toth
http://www.sqlusa.com/register
SQLUSA
New Member
New Member

--
18 Jun 2007 03:38 PM
SOLUTION FOUND!

SELECT u.UserID, i.DMA
INTO GeoMap
FROM User u
JOIN IPMapToDMA i
ON u.BigIP BETWEEN i.FromIP and i.ToIP

Here is the final configuration which runs in a few minutes:

From User table ip varchar(20) column removed (3M rows).

Clustered index on BigIP(bigint), index on UserID (int), no more columns.

Clustered index on FromIP, index on ToIP, both bigint .



Kalman Toth - Database, Data Warehouse & Business Intelligence Architect
The Best SQL Server 2005 Training in the World - SQLUSA:
http://www.sqlusa.com/order2005grandprix/

tom27
New Member
New Member

--
12 Oct 2007 02:42 AM
Just curious, what about the new execution plan?

I think since you made the data types compatible the implicit conversion avoided which might have accounted the performance gain. It's not BIGINT making your query slow.

TOM
robbertnix
New Member
New Member

--
01 Apr 2008 05:19 AM
Since the ranges in IPmap do not overlap, every lookup results in max one value.
You cannot tell the optimizer that there is no overlap, therefore this query will not have the optimal execution plan.

If you set a unique clustered index on [fromIP] from [IPmapToDMA], the following query can use this extra information.


SELECT u.userID,
(SELECT DMA FROM IPmapToDMA
WHERE
fromIP = (SELECT max(fromIP) FROM IPmapToDMA
WHERE fromIP <= u.bigIP)
AND
toIP >= u.bigIP)
FROM user u

My results for a table1 with 30k records and table2 with 80k records on a simple laptop:

using the join between syntax: no result after 30 minutes
using the syntax above: full result within one second

Ray SQL
New Member
New Member

--
26 Jul 2011 04:14 PM
I have a similar challenge trying to join with large bigint mapping tables. This works like a charm! Thanks robbertnix!


Acceptable Use Policy
---