Anyone using table partitioning?

Last Post 10 Jan 2011 05:06 AM by russellb. 5 Replies.
AddThis - Bookmarking and Sharing Button Printer Friendly
  •  
  •  
  •  
  •  
  •  
Sort:
PrevPrev NextNext
You are not authorized to post a reply.
Author Messages
Kasper Bengtsen
New Member
New Member

--
06 Jan 2011 07:17 AM
We tried to use table partitioning to get a large database (at least for us) spilt across high/low priority storage, but ran into this bug:

http://connect.microsoft.com/SQLSer...erformance

The bug results in certain queries performing a complete index sort, rather than just using the known order of an index to determine the result.
This is very unfortunate if your table contains billions of rows.

Examples of simple queries affected by the bug:
A. "SELECT MAX(ColumnN) FROM PartitionedTable" (with a non-clustered aligned index on ColumnN)
B. "SELECT TOP 10 ColumnX, ColumnY, ColumnZ FROM PartitionedTable ORDER BY ColumnX, ColumnY, ColumnZ" (with a clustered aligned index on ColumnX, ColumnY, ColumnZ)

In our case the above queries resulted in a difference in execution duration of milliseconds vs. hours, for the exact same queries, on non-partitioned and partitioned tables respectively.

As far as I know Microsoft has no plans to fix this bug in current releases of SQL Server, and has no plans to fix it in the next major either.

So for an enterprise solution that allows for us to:
- prioritize data in the same table on physical storage (because all data has to be available online, but only some is frequently queried)
- keep the physical prioritization transparent to the layers above the database (because you already have existing applications using the database, and/or is using ORMs in the data access layer)
- keep your indexes aligned to utilize PARTITION SWITCH (because your maintenance window for moving less relevant data to low priority storage is limited)
- use TOP N/ORDER BY style queries on a large amount of rows (because who doesn't )

... we were not able to use table partitioning.


The bug made me wonder if anyone was using table partitioning at all, and how do you get around this bug?

So please share if you are



russellb
New Member
New Member

--
06 Jan 2011 09:37 AM
Somebody proposed a workaround in the connect issue. I haven't tried it, but it looks viable. Basically, you query each partition individually then combine the results. Should work particularly well in your SELECT MAX(columnN) query.

As for your SELECT TOP 10 against the cluster key, you already KNOW those will live in a single partition, so you can query only that partition.

You can also create indexed views on the frequently accessed data.
Kasper Bengtsen
New Member
New Member

--
07 Jan 2011 12:12 AM
The proposed workaround does work yes, but is only an options if you have control over what SQL is generated from the above data access layers. Which you might not be in control of if you are using ORMs.

The problem could also be that you have 100's of SPs written already, that need to be changed and verified if simple things like MAX(ColumnN) is no longer possible.

... and then just the fact that loosing the transparency of whether or not the table is partitioned is a pain.


SELECT SELECT MIN (VisitId) FROM Traffic

Vs.

DECLARE @Partition int, @MaxPartition int SELECT @Partition = (SELECT TOP 1 $PARTITION.TrafficMonthYear(RequestDateTime) FROM Traffic ORDER BY RequestDateTime) SELECT @MaxPartition = (SELECT TOP 1 $PARTITION.TrafficMonthYear(RequestDateTime) FROM Traffic ORDER BY RequestDateTime DESC) CREATE TABLE #Maxes (VisitId int) WHILE (@Partition <= @MaxPartition) BEGIN INSERT INTO #Maxes SELECT MAX (VisitId) as VisitId FROM Traffic (NOLOCK) WHERE $PARTITION.TrafficMonthYear(RequestDateTime) = @Partition SET @Partition = @Partition + 1 END DECLARE @StartingNewVisitId int SELECT @StartingNewVisitId = ISNULL (MAX (VisitID), 0) + 1 FROM #Maxes DROP TABLE #Maxes


seems pretty over the top

But thanks for the input
Kasper Bengtsen
New Member
New Member

--
07 Jan 2011 12:40 AM
Posted By russellb on 06 Jan 2011 11:37 AM

As for your SELECT TOP 10 against the cluster key, you already KNOW those will live in a single partition, so you can query only that partition.


True ... but a query like

SELECT TOP 10 ColumnX, ColumnY, ColumnZ FROM PartitionedTable WHERE ColumnY = @SomeValue ORDER BY ColumnX, ColumnY, ColumnZ

for example, generates the same inefficient queryplan.
Kasper Bengtsen
New Member
New Member

--
09 Jan 2011 11:11 AM
Posted By Kasper Bengtsen on 07 Jan 2011 02:40 AM
Posted By russellb on 06 Jan 2011 11:37 AM

As for your SELECT TOP 10 against the cluster key, you already KNOW those will live in a single partition, so you can query only that partition.


True ... but a query like

SELECT TOP 10 ColumnX, ColumnY, ColumnZ FROM PartitionedTable WHERE ColumnY = @SomeValue ORDER BY ColumnX, ColumnY, ColumnZ

for example, generates the same inefficient queryplan.

Got ahead of myself there

The clustered index does not tell you anything about in which partition the data reside, the partitioning key does. So you can't assume this from knowing it's the clustered key.
russellb
New Member
New Member

--
10 Jan 2011 05:06 AM
True. My mistake. I should have said the partitioning key.
You are not authorized to post a reply.

Acceptable Use Policy