MSSQLTips.com - your daily source for SQL Server tips

Google
 
Web mssqltips.com

ESSENTIALS: Home | Tips | Search | Categories | Top 10 | Products | Authors | Blogs | Forums | Webcasts | Advertise | About
Sorting and comparing unicode and binary data - why do I get a weird sort order? - Chad Boyd

MSSQLTips

MSSQLTips.com - your daily source for SQL Server tips
Welcome to MSSQLTips Sign in | Join | Help
in Search

Chad Boyd

MSSQLTips - SQL Server Blog

Sorting and comparing unicode and binary data - why do I get a weird sort order?

A customer recently questioned me regarding a sorting issue they were having in the following scenario:

   My database is running under the "SQL_Latin1_General_CP1_CI_AS" Sql collation, but when I run
   the script below I seem to get incorrect sort results.  Why does the "abc-test" item come 
   immediately after the "abctest" item and not after "abctest1" as I would expect it to?

      create table #temp1 (
       ServerItem nvarchar(260) collate SQL_Latin1_General_CP1_CI_AS primary key clustered
      )

      insert #temp1
      select 'abctest'
      insert #temp1
      select 'abc-test'
      insert #temp1
      select 'abctest1'
      insert #temp1
      select 'abc-test1'

      select * from #temp1
      order by ServerItem

      drop table #temp1

      RESULTS:
     
      ServerItem
      ------------
      abctest
      abc-test
      abctest1
      abc-test1


   Is this right? Is this a bug?

No, this is not a bug - and yes, this is correct and by design.  The reasoning lies in how certain SQL collations particularly handle sorting rules for unicode data...this collation in particular uses a word sort algorithm to sort the unicode data, which ignores punctuation characters (and hyphen is one of those, amoung many other such as apostrophe's, dialect characters, etc.). The simple rule to follow when comparing or sorting multiple unicode strings is to not assume that it is done character-by-character, as it isn't.

A similar distinction occurs when using a Binary code page, like Latin1_General_BIN for example (popular amoung many cross-platform COTS applications). Did you know that if you are using a binary sort order that UPPERCASE letters will always sort before lowercase letters?  This is because binary sorting sorts based on the actual bit representation of the characters, and UPPERCASE letters have a lower ascii-code than that of lowercase letters.  Using a similar example as above, here is what you would see using a binary sort order:

   create table #temp1 (ServerItem nvarchar(260) collate Latin1_General_BIN primary key clustered)

   insert #temp1
   select 'abctest'
   insert #temp1
   select 'ABCTEST'
   insert #temp1
   select 'BBCTEST'
   insert #temp1
   select 'bbctest'
  
   select * from #temp1
   order by ServerItem

   drop table #temp1

   RESULTS:
  
   ServerItem
   -------------
   ABCTEST
   BBCTEST
   abctest
   bbctest

See how BBCTEST comes before abctest?  You'd see the same type of result if you changed "BBCTEST" to just "Bbctest".

Multiple things are taken into consideration when sorting/comparing data in Sql Server (and many other platforms) including your collation setting, code page, binary/non-binary, unicode vs. ascii data, etc. Be sure to understand differences between all characteristics of your system before deploying...


Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

 

Comments

No Comments

About Chad Boyd

Chad is an Architect, Administrator, and Developer with technologies such as Sql Server (and all related technologies), Windows Server, and Windows Clustering. He currently works as an independent consultant and also spends a significant amount of time writing, talking, presenting and blogging about Sql Server in person and online at http://mssqltips.com. In the past, Chad has worked with companies and organizations such as Microsoft Corporation and The American Red Cross, and provided consulting/support services at companies such as Bank of America, HP, Citigroup, Qualcomm, Scottrade, TJX, SunTrust, and Zurich Financial Services. For over 3 years with Microsoft Corporation Chad was responsible for providing onsite and remote support, guidance, and advice with SQL Server products to some of Microsoft’s foremost enterprise customers running the largest, most complex SQL Server installations and configurations in the world. This included all SQL Server products and versions, including SQL Server 7.0, 2000, 2005, and recently 2008, the SQL Server database engine, Reporting Services, SSIS/DTS, Notification Services, and Analysis Services on both 32 and 64 bit systems. Chad's primary responsibilities today include troubleshooting critical server situations, performance tuning and monitoring, disaster recovery planning and execution, architectural guidance for new Sql Server related deployments, and delivering deep technical workshops/presentations/proof-of-concept sessions covering a variety of technologies and functionality. Chad regularly posts Sql Server related content, tools, and advice with the mssqltips team at http://blogs.mssqltips.com/blogs and http://mssqltips.com. Chad can be contacted via his blog or email at chad dot boyd dot tips at gmail dot com.

This Blog

Syndication