MSSQLTips.com - your daily source for SQL Server tips

Google
 
Web mssqltips.com

ESSENTIALS: Home | Tips | Search | Categories | Top 10 | Products | Authors | Blogs | Forums | Webcasts | Advertise | About
Non-Unique Clustered Index and Duplicate Value limits... - Chad Boyd

in Search

Chad Boyd

Non-Unique Clustered Index and Duplicate Value limits...

I spent this past week with a customer giving a workshop covering Sql 2000 architecture and performance related information (interpreting query plans, index structures, etc., etc.), and during the course of the week a few good points came to light that I decided to blog about.  This first one has to do with the creation of non-unique clustered indexes...

Many of you probably already realize that if you don't explicitly create a unique clustered index in Sql Server, the engine will unique-ify the clustered index for you anyhow, since it's used for pointers to data in nonclustered indexes. The engine will append a 4-byte value (when necessary) to any non-unique cluster key value as it's inserted into the cluster to unique-ify the key value within the cluster...there are a variety of places that cover this, including Books Online, and this isn't the topic of my blog post, but it's the basis for it, so I needed to ensure we covered that.

Now, in the workshop I gave this last week, I mentioned how I had once upon a time tried to produce a situation where a Duplicate Key error is generated on a non-unique clustered index by simply creating a table with a single column, then clustering the table on this column (in a non-unique fashion), and then inserting a whole-lot-o-records with the same value. I mentioned in the workshop that I was unsuccessful at producing this type of situation, and attributed it to the fact that the 4-byte value was probably allowed to make use of any character, not just numeric values, which would increase the amount of duplicate key values that could be sustained on a single cluster exponentially compared to the case where the value could only be numeric data. Well, if any of you guys from the workshop are reading along, I WAS WRONG!!!!  You can in fact produce this type of scenario, and it's even easier than I originally thought before mistakenly thinking the 4-byte value could include characters in addition to numeric values...

I started thinking about it more on the flight back from the customer, and then pulled up the script I used originally to try and produce the situation, and found that I had a logic error in my data insertion (nice one huh?)...in addition to this, I noticed on an internal DL that someone mentioned that the 4-byte value that is used to unique-ify a nonunique cluster key is in fact integer based, and doesn't include character data.

Well, this should make it easier to reproduce then, assuming I can get the script right anyhow :-). After I produced the situtation, I was surprised to find that not only does the uniqueifier contain integer data only, but only positive integer data at that...so, given that we know a 4-byte integer can only contain values between -2^31 and 2^31-1 (-2,147,483,648 thru 2,147,483,647), which simple math tells us results in about 4,294,967,295 distinct values. I figured originally that I'd be able to produce a Duplicate Key error as soon as I tried inserting more than 4,294,967,295 records with the same key value - however, the Duplicate Key error is actually produced once you try and insert the 2,147,483,648th record, which indicates that the unique-ifier is actually made up of postive 4-byte integer values only...intersting huh?

So, to produce the scenario I mention is pretty simple...just create yourself a table with a single column, then create a nonunique clustered index on the column, then start inserting data into the table using the same value continuously over and over and over and over and over until you get to just about 2,147,483,647 records...then, as soon as you try inserting the next one, you'll get an error as follows:

   Cannot insert duplicate key row in object 'dbo.ztblDupClusterKeyTest' with unique index 'ix_ztblDupClusterKeyTest_clust'.

So, if you ever plan to have a scenario where you are deploying an application that will make use of a clustered table that contains more than 2,147,483,647 records per distinct cluster key value, you'll have to figure out an alternative design...this could include any of the following:

   1) Add additional keys to the cluster key
   2) Physically partition the data into seperate tables
  
      NOTE: Using partitioning with Sql 2005 wouldn't solve this problem, since the cluster is table-wide
     
   3) Use a heap
  
Of course, this really isn't that much of a problem, since you probably won't see this scenario due to a variety of reasons, and not to mention that if it was a problem, we'd have known about it long ago from customers :-). But, nonetheless, something interesting...

I've included the script I used to produce this situation below...a single run of the entire test took about 2.75 hours on my machine...by the time I was producing the error, my test table had 2,147,483,647 records and was about 45,000 MB in total size...

On a side note, if anyone out there is working with record counts where you think this may be an issue, I'd be curious to hear from you...

Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.


----------------------------------------------------------------------
------------------ CODE ONLY BELOW ------------------
----------------------------------------------------------------------

-- 292,754,944 million rows to start with on my machine...
select 1 as clustId
into dbo.ztblDupClusterKeyTest
from sys.columns a
cross join sys.columns b
cross join sys.columns c;
go

-- Cluster now...
create clustered index ix_ztblDupClusterKeyTest_clust on dbo.ztblDupClusterKeyTest (clustId);
go

set nocount on;
go

-- Ok...now keep adding until we break...
declare @i int;
while 1=1 begin

 begin try
  -- about 10.17 million rows on my machine per iteration...
  insert dbo.ztblDupClusterKeyTest (clustId)
  select 1
  from sys.columns a
  cross join sys.columns b
  cross join (select top 23 column_id from sys.columns) c;
 end try
 begin catch
  print 'ERROR: ' + error_message();
  break;
 end catch

 -- Just to introduce a bit of a delay between iterations...
 checkpoint;
end

 

Published Oct 15 2007, 11:28 PM by Chad Boyd
Filed under: ,

Comments

No Comments

About Chad Boyd

Chad is an Architect, Administrator, and Developer with technologies such as Sql Server (and all related technologies), Windows Server, and Windows Clustering. He currently works as an independent consultant and also spends a significant amount of time writing, talking, presenting and blogging about Sql Server in person and online at http://mssqltips.com. In the past, Chad has worked with companies and organizations such as Microsoft Corporation and The American Red Cross, and provided consulting/support services at companies such as Bank of America, HP, Citigroup, Qualcomm, Scottrade, TJX, SunTrust, and Zurich Financial Services. For over 3 years with Microsoft Corporation Chad was responsible for providing onsite and remote support, guidance, and advice with SQL Server products to some of Microsoft’s foremost enterprise customers running the largest, most complex SQL Server installations and configurations in the world. This included all SQL Server products and versions, including SQL Server 7.0, 2000, 2005, and recently 2008, the SQL Server database engine, Reporting Services, SSIS/DTS, Notification Services, and Analysis Services on both 32 and 64 bit systems. Chad's primary responsibilities today include troubleshooting critical server situations, performance tuning and monitoring, disaster recovery planning and execution, architectural guidance for new Sql Server related deployments, and delivering deep technical workshops/presentations/proof-of-concept sessions covering a variety of technologies and functionality. Chad regularly posts Sql Server related content, tools, and advice with the mssqltips team at http://blogs.mssqltips.com/blogs and http://mssqltips.com. Chad can be contacted via his blog or email at chad dot boyd dot tips at gmail dot com.

This Blog

Syndication