join the MSSQLTips community

MSSQLTips.com - your daily source for SQL Server tips

Google
 
Web mssqltips.com

 
Chad Boyd

MSSQLTips

MSSQLTips.com - your daily source for SQL Server tips
Welcome to MSSQLTips Sign in | Join | Help
in Search

Chad Boyd

MSSQLTips - SQL Server Blog
  • Invoke-Nz - Powershell script similar to Nz VBA function

    I got a few pings about the Restore-SqlDb script requiring the Invoke-Nz script to execute - here it is. Similar functionality to that of the Nz function in VBA - basically pass along 2 values and the function will return the first value if it is present (i.e. non-null), otherwise the second (could also say it's similar to the isnull() tsql function, or a coalesce with only 2 parameters). Sorry I missed this with the original Restore-SqlDb posting.

    Invoke-Nz.ps1

    Thanks to everyone who reminded me,

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

    -------------------------------- CODE ONLY BELOW --------------------------------

    # See the Get-Usage function contents in the begin{} section for usage/comment details
    param
    (
     $value,
     $nullValue
    )
    begin {
     function Get-Usage {
    @"
     NAME
      Invoke-Nz
     
     SYNOPSIS
      You can use the Nz function to return a specified value when an input value is null/empty/unspecified.
     
     SYNTAX
      Nz <value> <nullValue>
     
     DETAILED DESCRIPTION
      You can use this function to convert a null/empty value to another value and prevent it from propagating
      through an expression. If the value parameter is not present/null, the nullValue value is returned.

      If the value of variant isn't Null, then the Nz function returns the value of variant.

      You can often use the Nz function as an alternative to the IIf function. For example, in the following code,
      two expressions including the IIf function are necessary to return the desired result. The first expression
      including the IIf function is used to check the value of a variable and convert it to zero if it is Null.
     
     PARAMETERS
      -value <any>
       Value to check for null/empty/etc. - if not present, the -nullValue is returned
     
       Required?   True
       Position?   1
       Default value  <required>
       Accept pipeline? True
       Accept wildcards? False
     
      -nullValue <any>
       A Variant that supplies a value to be returned if the variant value argument is not present.
     
       Required?   False
       Position?   2
       Default value  <required>
       Accept pipeline? False
       Accept wildcards? False
     
     INPUT TYPE
      Any,Any
     
     RETURN TYPE
      Either `$value or `$nullValue, depending on if the `$value is present.
     
     NOTES
      Alias created called "Nz"
     
      -------------------------- EXAMPLE 1 --------------------------
       Nz $testObj "this is null"
       
      -------------------------- EXAMPLE 2 --------------------------
       dir *.ps1 | Nz -n "NOFILE"

    "@
     } # Get-Usage
     if (($MyInvocation.InvocationName -ne '.' -and $MyInvocation.InvocationName -ne '&') -and
      ($Args[0] -eq "-?" -or $Args[0] -eq "/?" -or $Args[0] -eq "-help" -or $Args[0] -eq "/help")) {
       $showUsage = $true;
       &Get-Usage;
       return;
     }
     if ($MyInvocation.InvocationName -ne '.') { Write-Debug "Invoke-Nz::BEGIN"; }
     $InPipeline = $false;
     
     function Invoke-Nz {
      param
      (
       $value,
       $nullValue
      )
      begin {} # Invoke-Nz::begin
      process {
       # Store meta-data depending on pipeline vs. invoked processing...
       if (($_) -or ($_ -eq 0)) { $InPipeline = $true; }
       
       # If we don't have a $value value, figure it out now
       if (-not $value) {
        if ($_) {
         $value = $_;
        } else {
         # No pipeline input - so, if we are invoked throw an error, otherwise the $Try will be nothing...
         if ($MyInvocation.InvocationName -eq '&') {
          # Invoked, throw an exception
          Throw "The parameter -value is required. [$($MyInvocation.InvocationName)] [$($MyInvocation.MyCommand.Name)] [$_]";
         }    
        }
       }
       # If we have a value, show it if debugging, otherwise show a warning
       if ($value) {
        Write-Debug "Invoke-Nz::Have `$value value of [$value] [$_]";
       } else {
        # Not invoked, show a warning - don't throw an error, as that will stop pipeline processing if we're in the pipe...
        # Write-Warning "Invoke-Nz::No value could be determined for the `$value parameter"
        # NOTE: Traditionally I'd throw this warning, but since the entire purpose for this particular command is to return
        #   a value when a particular value isn't present, kind of makes sense not to do so here...
       }
       
       # Process the test
       if (($value) -or ($value -eq 0)) { $value; } else { $nullValue; }
      } # Invoke-Nz::process
      end {} # Invoke-Nz::end
     }
     if (-not (Get-Alias -Name nz -ErrorAction SilentlyContinue)) {
      Set-Alias -Name nz -Value Invoke-Nz -Description 'Nz function in VBA - if value parameter is not present, nullValue is returned.';
     }
    } # Invoke-Nz::begin
    process {
     # No processing if we are dot-sourced or if we were just asked for a little help...
     if ($showUsage -or $MyInvocation.InvocationName -eq '.') {
      return;
     }

     # pass processing to the function via pipelining or invoke...
     if (($_) -or ($_ -eq 0)) {
      $InPipeline = $true;
      $_ | Invoke-Nz $value $nullValue;
     } else { # if ($_)
      Invoke-Nz $value $nullValue
     } # if (($_) -or ($_ -eq 0))

    } # Invoke-Nz::process
    end {
     if ((-not $showUsage) -and ($MyInvocation.InvocationName -ne '.')) { Write-Debug "Invoke-Nz::END"; }
    } # Invoke-Nz::end

     

  • Restore-SqlDb - Automate a Database Restore (improved with Powershell)

    UPDATE: This script requires the Invoke-TryCatch.ps1 and the newly posted Invoke-Nz.ps1 scripts as well - thanks to all who pinged me about the Inovke-Nz.ps1 being required. 

    A while back I posted on Automating a Database Restore and it was quite a popular post, I also received multiple emails with requests for other features in the script such as allowing a restore to a different instance, extending support for SQL 2008 (if you try that script on a 2008 instance you'll notice an error occurs due to the result-set format of the "restore headeronly" statement to support new features like compression and encryption), support for restoring specific files/filegroups, removing the use of xp_cmdshell, etc. I've been wanting to update the procedure for a while anyhow to support some of these things as well and also to migrate the functionality over to PoSh (which is a more appropriate fit to be honest).

    This new PoSh version of the script supports everything from the original script plus other features like allowing a restore to be automated/initiated from any machine (as opposed to having to be on the server itself), restoring from one instance to another instance, simply outputting a script of the restore statement(s) and/or execute the restore, removing the dependency on using things like xp_cmdshell, supports SQL 2005 and 2008, exclude differential and/or log backups if you like, and a few others as well. On the PoSh side, this script supports all the major considerations any good PoSh script should such as:

    • Can be dot-sourced into a script
    • Can be invoked from a script (i.e. &restore-sqldb)
    • Fully supports pipeline processing for SMO Database objects and/or any object that can be string-expanded to a database name
    • Friendly usage output (run the script with a single '-?' parameter)
    • Debug and Verbose optional output
    • I don't support a -whatIf directly, but you get this by basically excluding the -execute switch (you'll get a restore script output)

    If you haven't read the original script post, I'd encourage you to take a look at it quickly (the text of the blog post, not necessarily the script) to give you an idea of what can be done with the script and hence this script - some of the functionality included allows for things like:

    • Restore a database with nothing more for information than what database and what instance it resides on
    • By default will pull restore information from the msdb database for the instance being restored from. This will basically query the appropriate backup meta-data tables for backup information on the database in question and build the restore statement(s) from that data appropriately including proper ordering, grouping of media sets/families, etc.
    • Can specify '-paths' that support wildcards and can include 1 or more locations to backup files for the database in question - the backup files will be:
      • Investigated for proper ordering of restore sequence
      • Expanded (if they are backup set files containing multiple backups) appropriately
      • Grouped correctly if part of a media family/set (i.e. if you use a backup statement with multiple output files)
    • Can now restore from one instance to a totally different instance (obviously you need to be able to connect to each and have appropriate privileges to do so)
      • Use the '-fromInstance' parameter to specify where to restore from
      • Use the '-toInstance' parameter to specify where to restore to
    • Can specify a new location to move log file(s) to during the restore - this will build the appropriate 'with move...' statement(s) into the restore script to move log files to the specified location(s)
      • You do not need to know anything about where the log file(s) already existed within the backup
    • Can specify new location(s) to move data file(s) to during the restore - this will build the appropriate 'with move...' statement(s) into the restore script to move data files to the specified location(s)
      • You do not necessarily need to include the same number of new locations as existing locations - if there are more data files than new locations, the script will simply round-robin the data files among the new locations
      • You do not need to know anything about where the data file(s) already existing within the backup
    • You can specify a '-stopAt' value that will mimic the 'STOPAT' statement within the restore
    • You can choose to ignore differential backups and/or log backups - by default the script uses all possible backups, this provides some flexibility
    • You can perform a page restore that will pull pages to be restored automatically from the msdb.dbo.suspect_pages table
    • You can perform a restore of only specific files or filegroups - simply include the appropriate logical filenames and/or filegroup names in the '-files' and '-filegroups' parameters
    • If you don't want to incur the overhead of a restore headeronly/filelist only operation and you write backups with a timestamp, you can specify the '-timeStampInFileNames' option and the script will shred each filename for a timestamp value that will act as the ordering/grouping values instead of performing a restore headeronly/filelistonly operation on each
    • Can restore the database with a new name via the '-newDbName' parameter
    • Support for liteSpeed syntax via the '-liteSpeed' switch
    • Checksum support via the '-checksum' switch
    • And much more...(just like on TV)

    For those of you familiar with PoSh arguments, you realize you don't have to necessarily include the entire name of a script parameter, just enough of it so the PoSh engine can distinguish it from the other parameter names - this will allow you to short-hand things like the '-toInstance' parameter to just '-to', or the '-fromInstance' to just '-from', or the '-dbName' parameter to just '-db', or just '-lite' for liteSpeed vs. the full '-liteSpeed', or...well, you get the picture.

    For detailed usage scenarios and some examples, just PoSh Restore-SqlDb.ps1 -?.

    Restore-SqlDb.ps1

    Enjoy!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • Try/Catch/Finally with Powershell

    I ran across Adam Weigert's posting/script that provides a PoSh script that allows try/catch/finally like behavior - very cool. I started using the script myself and found that as much as I loved it, I wanted to add some additional 'features' to what it provides. I was looking for pipeline support (including the ability to throw 'soft' errors in the pipeline that would allow the pipeline to continue processing), some optional verbose debugging/error related information (particularly handy when using the script in a large distributed environment and you need information in your error logs about where the error came from, what system the script failed on, etc.), alias configuration, and further support for allowing friendly dot-sourcing of the script.

    So, here's a version adapted from what Adam wrote about, links to his original posting are included in the script as well (see the get-usage function).

    Enjoy!

    Invoke-TryCatch.ps1

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

    ----------------------------------------- CODE BELOW -----------------------------------------

    # See the Get-Usage function contents in the begin{} section for usage/comment details
    param
    (
        [ScriptBlock] $Try,
        [ScriptBlock] $Catch = { Throw $_ },
        [ScriptBlock] $Finally,
        [switch] $SoftErrorInPipeline
    )
    begin {
        function Get-Usage {
    @"
        NAME
            Invoke-TryCatch
        SYNOPSIS
            Simulates Try,Catch,Finally handling for scripts
        SYNTAX
            Try {<try_block>} -Catch {<catch_block>} [-Finally {<finally_block>}] [-SoftErrorInPipeline]
        DETAILED DESCRIPTION
            Same functionality as try/catch/finally with .NET - specify the script blocks that make
            up the try/catch/finally handling and call into the function using the sytax specified
            above and below.
            Supports input from the pipeline, dot-sourcing, and invoking directly - if a value is not
            specified for the $Try block and we are in the pipeline, the input object will be either
            cast directly to the $Try block (if the input object is a ScriptBlock object) or will be
            invoked via Invoke-Expression (which supports string-type input to be tried).
            Adapted from Adam Weigert's blog entry here:
    http://weblogs.asp.net/adweigert/archive/2007/10/10/powershell-try-catch-finally-comes-to-life.aspx
        PARAMETERS
            -command <ScriptBlock>
                Script block that makes up the Try section
                Required?            True
                Position?            1
                Default value        <required>
                Accept pipeline?    False
                Accept wildcards?    False
            -catch <ScriptBlock>
                Script block that makes up the Catch section
                Required?            True
                Position?            2
                Default value        <required>
                Accept pipeline?    False
                Accept wildcards?    False
            -finally <ScriptBlock>
                Script block that makes up the Finally section
                Required?            False
                Position?            3
                Default value        {}
                Accept pipeline?    False
                Accept wildcards?    False
            -SoftErrorInPipeline <switch>
                If set and being used in the pipeline, errors caught from the $try block in the
                catch will be raised via the write-error cmdlet vs. a direct Throw - this allows
                the pipeline to continue processing vs. being stopped as it will be if a Throw
                is used.
                Required?            False
                Position?            4
                Default value        False
                Accept pipeline?    False
                Accept wildcards?    False
        INPUT TYPE
            ScriptBlock,ScriptBlock,[ScriptBlock],[Switch]
        RETURN TYPE
            Dependent on the script blocks passed
        NOTES
            Alias created called "Try"
            Adapted from Adam Weigert's blog entry here:
    http://weblogs.asp.net/adweigert/archive/2007/10/10/powershell-try-catch-finally-comes-to-life.aspx
            -------------------------- EXAMPLE 1 --------------------------
                Try {
                    echo " ::Do some work..."
                    echo " ::Try divide by zero: `$(0/0)"
                } -Catch {
                    echo "  ::Cannot handle the error (will rethrow): `$_"
                    #throw `$_
                } -Finally {
                    echo " ::Cleanup resources..."
                }
            -------------------------- EXAMPLE 2 --------------------------
            The following example will pull all ps1 files from the current directory,
            pull the content of the file into the pipeline, and pass the content directly
            to the try block (default Catch block will simply throw any error).
                dir *.ps1 | get-content | try

    "@
        }    # Get-Usage
        if (($MyInvocation.InvocationName -ne '.' -and $MyInvocation.InvocationName -ne '&') -and
            ($Args[0] -eq "-?" -or $Args[0] -eq "/?" -or $Args[0] -eq "-help" -or $Args[0] -eq "/help")) {
                $showUsage = $true;
                &Get-Usage;
                return;
        }
        if ($MyInvocation.InvocationName -ne '.') {    Write-Debug "Invoke-TryCatch::BEGIN"; }
        $InPipeline = $false;
        # Info for reporting error data...
        if ($VerbosePreference -ne "SilentlyContinue") {
            $myInfoScriptFullPath = $MyInvocation.ScriptName;
            $myInfoScriptName = $myInfoScriptFullPath.Substring($myInfoScriptFullPath.LastIndexOf("\") + 1);
            &{
                # we simply ignore errors initializing myInfo
                $local:ErrorActionPreference = "SilentlyContinue";
                $script:myInfo = @{
                    "MachineName" = (get-content env:computername);
                    "MachineDomain" = (get-content env:userdomain);
                    "MachineDnsDomain" = (get-content env:userdnsdomain);
                    "UserName" = (get-content env:username);
                    "UserProfile" = (get-content env:userprofile);
                    "ScriptName" = $myInfoScriptName;
                    "ScriptFullPath" = $myInfoScriptFullPath;
                    "ScriptPath" = $($myInfoScriptFullPath.Substring(0, $myInfoScriptFullPath.Length - $myInfoScriptName.Length));
                }
            }
        }    # if ($VerbosePreference -ne "SilentlyContinue")
        function Show-Exception {
            param (
                [System.Management.Automation.ErrorRecord] $errobj
            )
            # Output some verbose error information if requested...
            if ($VerbosePreference -ne "SilentlyContinue") {
                $local:ErrorActionPreference = "Continue"
                $myInfo.PipelineProcessing = $InPipeline;
                $msg = $myInfo | Sort-Object | Format-Table -autosize | Out-String;
                $msg = "`n `n-------VERBOSE EXCEPTION INFORMATION------- `n `n$msg`n `n-------------------------------------------`n ";
                # toss it up...
                Write-Error $msg;
            }

            # Rethrow the error, either continuing the pipeline or not...
            if (($SoftErrorInPipeline) -and ($InPipeline)) {
                # If we are in a pipeline and
                $local:ErrorActionPreference = "Continue"
                Write-Error $errobj;
            } else {
                #Throw the exception...
                Throw $errobj;
            }
        }
        function Invoke-TryCatch {
            param
            (
                [ScriptBlock] $Try,
                [ScriptBlock] $Catch = { Throw $_ },
                [ScriptBlock] $Finally,
                [switch] $SoftErrorInPipeline
            )
            begin {}    # Invoke-TryCatch::begin
            process {
                # Store meta-data depending on pipeline vs. invoked processing...
                if (($_) -or ($_ -eq 0)) {
                    $InPipeline = $true;
                    if ($VerbosePreference -ne "SilentlyContinue") {
                        $local:ErrorActionPreference = "SilentlyContinue";
                        $myInfo.PipelineObject = $_;
                        $myInfo.PipelineObjectType = $_.GetType();
                    }
                }    # if($VerbosePreference -ne "SilentlyContinue")
                # If we don't have a $Try value, figure it out now
                if (-not $Try) {
                    if ($_) {
                        # Try setting/casting the input object to a script block - if it works, we'll use the pipeline
                        # object as the $try, otherwise we'll throw a parameter exception...
                        trap {
                            Write-Debug "Invoke-TryCatch::Could not convert pipeline object to ScriptBlock [$_]."
                            continue;
                        }
                        # Get the script block...
                        if ("$($_.GetType())" -eq "System.Management.Automation.ScriptBlock") {
                            $Try = $_;
                        } else {
                            $Try = {Invoke-Expression $_};
                        }
                    } else {
                        # No pipeline input - so, if we are invoked throw an error, otherwise the $Try will be nothing...
                        if ($MyInvocation.InvocationName -eq '&') {
                            # Invoked, throw an exception
                            Throw "The parameter -Try is required. [$($MyInvocation.InvocationName)] [$($MyInvocation.MyCommand.Name)] [$_]";
                        }               
                    }
                }
                # If we have a value, show it if debugging, otherwise show a warning
                if ($Try) {
                    Write-Debug "Invoke-TryCatch::Have `$Try value of [$Try] [$_]";
                } else {
                    # Not invoked, show a warning - don't throw an error, as that will stop pipeline processing if we're in the pipe...
                    Write-Warning "Invoke-TryCatch::No value could be determined for the `$Try block, will continue pipeline without processing this object"
                }
                # Process the try/catch/finally...
                & {
                    # Explicitly set our preference for local use...
                    $local:ErrorActionPreference = "SilentlyContinue";
                    trap {
                        trap {
                            & {
                                if ($Finally) {
                                    trap {
                                        Write-Debug "TryCatch::TRAP::Finally Block 2 Error [$_]";
                                        Show-Exception $_;
                                    }
                                    Write-Debug "TryCatch::Finally Block 2 Start";
                                    &$Finally
                                }    # if ($Finally)
                            }
                            Write-Debug "TryCatch::TRAP::Catch Block Error [$_]";
                            Show-Exception $_;
                        }
                        Write-Debug "TryCatch::TRAP::Try Block Error [$_]";
                        if ($Catch) {
                            # Pipe along the current exception to the catch block to be processed
                            $_ | & {
                                Write-Debug "TryCatch::Catch Block Start";
                                # Store the original exception record...
                                if ($VerbosePreference -ne "SilentlyContinue") {
                                    & {
                                        $local:ErrorActionPreference = "SilentlyContinue";
                                        $myInfo.Error = $_;
                                        $myInfo.ErrorInvocationName = $_.InvocationInfo.InvocationName;
                                        $myInfo.ErrorLine = $_.InvocationInfo.Line;
                                        $myInfo.ErrorCommand = $_.InvocationInfo.MyCommand;
                                        $myInfo.ErrorOffsetInLine = $_.InvocationInfo.OffsetInLine;
                                        $myInfo.ErrorPipeLineLength = $_.InvocationInfo.PipeLineLength;
                                        $myInfo.PipeLinePosition = $_.InvocationInfo.PipeLinePosition;
                                        $myInfo.ErrorPositionMessage = $_.InvocationInfo.ErrorPositionMessage;
                                        $myInfo.ErrorScriptLineNumber = $_.InvocationInfo.ScriptLineNumber;
                                        $myInfo.ErrorScriptName = $_.InvocationInfo.ScriptName;
                                        $myInfo.ErrorCategoryInfo = $_.CategoryInfo.GetMessage();
                                    }
                                }    #    if ($VerbosePreference -ne "SilentlyContinue")
                                &$Catch
                            }
                        }    # if ($Catch)
                    }
                    if ($Try) {
                        Write-Debug "TryCatch::Try Block Start";
                        &$Try
                    }    # if ($Try)
                }
                & {
                    trap {
                        Write-Debug "TryCatch::TRAP::Finally Block 1 Error [$_]";
                        Show-Exception $_ ;
                    }
                    if ($Finally) {
                        Write-Debug "TryCatch::Finally Block 1 Start";
                        &$Finally
                    }    # if ($Finally)
                }
            }    # Invoke-TryCatch::process
            end {}    # Invoke-TryCatch::end
        }
        if (-not (Get-Alias -Name try -ErrorAction SilentlyContinue)) {
            Set-Alias -Name try -Value Invoke-TryCatch -Description 'Simulates Try,Catch,Finally handling for scripts';
        }
    }    # Invoke-TryCatch::begin
    process {
        # No processing if we are dot-sourced or if we were just asked for a little help...
        if ($showUsage -or $MyInvocation.InvocationName -eq '.') {
            return;
        }

        # pass processing to the function via pipelining or invoke...
        if (($_) -or ($_ -eq 0)) {
            $InPipeline = $true;
            if ($SoftErrorInPipeline) {
                $_ | Invoke-TryCatch $Try $Catch $Finally -SoftErrorInPipeline;
            } else {
                $_ | Invoke-TryCatch $Try $Catch $Finally;
            }
        } else {    # if ($_)
            if ($SoftErrorInPipeline) {
                Invoke-TryCatch $Try $Catch $Finally -SoftErrorInPipeline;
            } else {
                Invoke-TryCatch $Try $Catch $Finally;
            }
        }    # if (($_) -or ($_ -eq 0))

    }    # Invoke-TryCatch::process
    end {
        if ((-not $showUsage) -and ($MyInvocation.InvocationName -ne '.')) { Write-Debug "Invoke-TryCatch::END"; }
    }    # Invoke-TryCatch::end

     

  • Estimating Data Compression ratios for all...

    One of my favorite features with SQL 2008 has been Data and Backup compression (which I discuss in more detail technically here) - this is not only because of the actual functionality it brings to the table, but also because of all the technical intricacies that it involves and the impact it can have on many other fun topics (fragmentation, storage, internals, etc.). Of course, the functionality is pretty cool too...

    One customer of mine was asking how they can get an idea of the level of compression the different flavors of data compression would have on all the different structures within their database - of course, most folks realize the system procedure sp_estimate_data_compression_savings that exists to provide just that - but, this customer wanted to be able to see this type of information for all structures within their database (partitions, indexes, heaps, etc.) and see where they would get the biggest bang for their buck so-to-speak.

    So, I went to work putting together a fairly simple procedure that would basically run through a database and execute that for each partition for each type of compression that each given partition wasn't currently set in (i.e., if the partition is NONE compressed, we want to see estimations for ROW and PAGE compression; however, if the partition is already ROW compressed, show estimations for NONE and PAGE compression (or un-compression in the case of NONE)). We also wanted to be able to filter on specific objects and/or thresholds for the minimum size of partition to bother checking.

    What came out was sp_estimate_data_compression_savings_all, and I figured we may as well be nice and share with everyone. There's no rocket science here or anything, but a pretty cool procedure nonetheless. Of course, we wouldn't recommend you run this on large production systems during peak hours or anything like that, but it is perfectly well suited for scanning on non-production systems to figure out where to concentrate your time in further investigation.

    And, as a final side note, it also includes some of the simple TSQL enhancements that only work with SQL 2008 (compound assignment, inline initialization, etc.) that I usually exclude from my system procedures for backward-compatibility, but since this applies to only 2008 anyhow, I could use them - makes for much cleaner, more easy to write code, that's for sure...

    Enjoy!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • Automate a Database Restore

    NOTE: See an updated version of this functionality using Powershell and supporting some additional functionality here

    It's quite common to see automated/custom procedures for backing up a database/log - nearly everywhere I go companies have custom backup procedures and processes to handle backups, logging of backups, naming standards, locations, etc. However, it's rare that I find processes/procedures in place to automate the restore of database/log backups - so rare in fact that I can recall only a single time in the last 3 years that I've seen it. Kind of odd considering you typically want to ensure the recovery process goes quickly when needed - nothing underlines this more than an actual disaster, and when you're in the middle of one, the last thing you want to have to worry about is writing lines of code to build the restore statements for your recovery path, which could be made up of hundreds or thousands of backup sets.

    Enter the sp_backup_restoredb procedure - I wrote this procedure to automate the most common restore paths in a variety of scenarios. With it you can automate the restore of a database using msdb backup history information for the given database - don't have the history information, or are restoring to another server? No worries, just point it at the directory location (or locations) that hold the backups using an optional pattern filter and let it rip. Need to restore just a single file, file group, or page(s)? No problem either. Want to rename the database during the restore? Ok, no problem. Need to move the locations of the log/data files for the database during the restore? Just tell it where you want them to go and it will do the rest (and you don't even have to know what files exist in the database, or where they previously existed, nor do you have to know even how many files there are within the database - it will simply round-robin the files among the locations you specify). Want to use LiteSpeed? Sure, we can do that.  Do you use a single mediaset for each backup, or do you use a single mediaset for a group of backups, or do you use a single mediaset forever and always? No problem, handle them all. Need to recovery to a specific point in time? Just specify the value. Want to recover the database? Leave it in recovery? Use a checksum (or not)? Silently ignore restore errors? Suppress execution and output just the restore statements? Check, check, check, check, and check.

    There are lots of interesting uses for this type of procedure beyond just disaster recovery situations - can be leveraged to test recovery processes, to restore varying types of backups to a single reporting server, to validate what you have for backups, or for migration scenarios (and there are probably other cases as well).

    Here are some samples executions:

    -- Restore the testDb database, suppressing actual execution, using data from msdb, not performing

    -- recovery, native restore, and the most efficient path

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @opts = 1;

    -- Same thing, only instead of using data in MSDB, use the 2 specified locations for any .bak file

    -- starting with 'testDb'

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @restorepaths = 'c:\temp;\\backupServerB\backupShare\testDb;',

    @filePattern = 'testDb*.bak', @opts = 1;

    -- Same thing, only use LiteSpeed syntax...

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @restorepaths = 'c:\temp;\\backupServerB\backupShare\testDb;', @opts = 5;

    -- How about changing the name on restore?

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @newDbName = 'testDb_newName',
    @restorepaths = 'c:\temp;\\backupServerB\backupShare\testDb;',

    @opts = 5;

    -- What about moving the log/data files around? Here we will place data files for the database in the

    -- 4 specified locations (semi-colon delimited) - if there are less than 4 data files, they will simply

    -- be placed in the locations in the order specified up to the number of data files there are (so, if

    -- there were 2 data files, 1 would go to M:\SqlData and 1 to N:\SqlData). If there are more than 4

    -- data files, they will continue to round-robin among the specified locations in order specified

    -- until there are no more files (so, with 7 data files, you'd end up with 2 in M,N,O and 1 in P)

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @newDbName = 'testDb_newName',
    @restorepaths = 'c:\temp;\\backupServerB\backupShare\testDb;',

    @moveLogsTo = 'l:\SqlLogs\',

    @moveDataTo = 'm:\SqlData\;n:\SqlData\;o:\SqlData\;p:\SqlData',

    @opts = 5;

    -- Want to stop at a particular point?

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @newDbName = 'testDb_newName',
    @restorepaths = 'c:\temp;\\backupServerB\backupShare\testDb;',

    @moveLogsTo = 'l:\SqlLogs\',

    @moveDataTo = 'm:\SqlData\;n:\SqlData\;o:\SqlData\;p:\SqlData',

    @stopAt = '2008-07-29 15:52:20.310',

    @opts = 5;

    -- Same thing, only ignore the use of an DIFFERENTIAL backups

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @newDbName = 'testDb_newName',
    @restorepaths = 'c:\temp;\\backupServerB\backupShare\testDb;',

    @moveLogsTo = 'l:\SqlLogs\',

    @moveDataTo = 'm:\SqlData\;n:\SqlData\;o:\SqlData\;p:\SqlData',

    @stopAt = '2008-07-29 15:52:20.310',

    @opts = 21;

    -- Perform recovery at the end of the restore process...

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @newDbName = 'testDb_newName',
    @restorepaths = 'c:\temp;\\backupServerB\backupShare\testDb;',

    @moveLogsTo = 'l:\SqlLogs\',

    @moveDataTo = 'm:\SqlData\;n:\SqlData\;o:\SqlData\;p:\SqlData',

    @stopAt = '2008-07-29 15:52:20.310',

    @opts = 23;

    -- Force existing users out of the new database prior to restoring...

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @newDbName = 'testDb_newName',
    @restorepaths = 'c:\temp;\\backupServerB\backupShare\testDb;',

    @moveLogsTo = 'l:\SqlLogs\',

    @moveDataTo = 'm:\SqlData\;n:\SqlData\;o:\SqlData\;p:\SqlData',

    @stopAt = '2008-07-29 15:52:20.310',

    @opts = 31;

    -- Perform a PAGE level restore, getting the pages to be restored from the msdb

    -- suspectpages database table...

    exec dbo.sp_backup_restoredb @dbname = 'testDb', @opts = 65; 

    The usage output looks like follows:

    USAGE:

    exec dbo.sp_backup_restoredb @dbname, @restorepaths, @moveLogsTo, @moveDataTo, @fileFilGroupPageString, @newDbName, @filePattern, @stopAt, @opts

    PARAMETERS:

    @dbname

    DB to be restored

    @restorepaths

    Path(s) to the files containing backups to be restored from, semi-colon delimited...if not passed, we try to grab information from MSDB table instead

    @moveLogsTo

    Path to location that log files for the database being restored should be moved to...semi-colon delimited list...

    @moveDataTo

    Path(s) to location(s) that data files for the database being restored should be moved to...semi-colon delimited list...if more data files exist than paths are passed, data files are simply restored in a round-robin fashion to the locations specified...if more paths are specified here than there are data files, the first paths listed are used up to the # of data files, then the other paths are simply ignored...

    @fileFilGroupPageString

    Is a string of either a file, filegroup, or page string that will be used (if passed) as the <file_or_filegroup_or_pages> portion of the restore string - should match the proper format as outlined in BOL for this section exactly, since we basically just append here.

    @newDbName

    Name of the restored database - if left default/null, the @dbname is used...

    @filePattern

    Pattern of files to match for within the @restorepaths - by default, is everything (i.e. *) - only valid if a value is specified for @restorepaths

    @stopAt

    Date/time to stop at within the restore, if specified...

    @opts

    Options that drive execution for the proc. As follows: 
    1 bit - If set, execution is suppressed and the strings are simply output...
    2 bit - If set, recovery is performed at the end of all restores...by default, db is left in norecovery state...
    4 bit - If set, LiteSpeed is used for recovery statements...
    8 bit - If set, we will forcefully drop existing connections to the db in order to allow restore...
    16 bit - If set, we will NOT use diff backups in the restore, only full and tlog backups...
    32 bit - If set, CHECKSUM is used for the restore - this is only valid if a native restore is used...
    64 bit - If set, PAGE level restore is used, and the pages to be restored are built from the data in the suspect_pages table in the MSDB. This cannot be used currently with LiteSpeed restores...
    128 bit - If set, and a value is set in @restorepath, we will try to find a time/date stamp within the name of each file found in the @restorepath matching @filePattern - we will simply try to find 14 concurrent numbers within the name to signify as such...
    256 bit - If set and the 1 bit is not set (i.e. we are executing), errors raised during the execution of the restore statements will be silently captured and reported without re-raising back to the calling code. Error number and message will be output as a print statement, but no error will be raised...

    Enjoy!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

     

    Posted Jul 29 2008, 06:19 PM by Chad Boyd with 4 comment(s)
    Filed under: ,
  • Installing Clustered SQL Servers - Outline, Checklists, Document Sheets

    A high percentage of my interactions with clients revolve around clustering SQL Server in some manner or another - could be to review a high-availability solution, could be to stand up a new clustered instance of SQL, could be to troubleshoot a system that is having stability issues, etc. In all cases, I start with a set of checklists and informational-gathering sheets and tools to both review the existing state of the system, and also to understand the system we are working with - once we have all the configuration data, we run through the checklists to ensure the system is configured in a manner matching years of best-practices. If we are installing and configuring a new SQL Server instance within a cluster, I usually send the configuration sheets to my clients a few weeks in advance for them to fill-out in preparation for the install event - I've found that in nearly every case where we have all the configuration data prior to the install event, the installation and configuration goes off very smoothly and typically without a hitch...however, whenever the data isn't available, or a client wants to 'wing-it', the percentages of a smooth sail event drop significantly.  Additionally, whenever I arrive onsite to troubleshoot a server issue for a client, using the checklists as a guide to review the configuration generally yields a wide variety of things that are contributing to and/or causing the issue at hand.

    These checklists and informational sheets have been put together by myself over years of experience, with input from a variety of sources, most of which are referenced in the first section within the primary document included with the pages - the Installation and Configuration Outline document. This document is the starting point, and contains a plethora of links, supporting documents, and descriptions / experiences with why each setting or configuration recommended in the checklists is so.

    The checklists are meant as a guide/verification during the actual server cluster and sql cluster configuration/install, or while reviewing an existing system.

    These info-sheet documents are meant to aid in documenting the both the required information for proper/easy Sql cluster configuration (ip’s, network names, user accounts, etc.) as well as your cluster solution as a whole moving forward for reference and supporting documentation purposes.  Each of these documents should be completed in their entirety prior to beginning a cluster/Sql installation/configuration, and as such, can be used also as a guide for required information prior to beginning a configuration.

    So, here are the links to the actual checklists, info sheets, the outline document, and the sample completed info sheets (for a guide on the types of things you may want to consider including in the sheets, or as a guide on how you may want to consider filling them out). Note that included with each page is a downloadable word version of the documents as well:

    1) Installation and Configuration Outline

    This is the primary document and contains a complete outline of what needs to occur and when it needs to occur when preparing for/installing/configuring a cluster solution. At the very top, there are multiple sections which contain reference material for many of the different components of Sql/Windows/Clustering/etc. involved in the process; I’d recommend becoming as familiar with the contents of this document as possible, as it outlines much of the necessary/possible knowledge/best practices/issues/concerns/etc. involved with Sql Server clustering solutions. This document also references all of the other documents included in the attached file. This is supposed to act as an outline/guide for the actual process of preparing for, installing, and configuring a given Sql cluster solution, so typically it is read/followed from the top to the bottom as steps are completed, though you are more than welcome to browse through the entirety of the document(s) and even perform a ‘mock’ walkthrough if you like to become familiar with the process/documents/requirements/etc. 

    2) Info Sheet - Cluster Configuration

    3) Info Sheet - Node Configuration

    4) Info Sheet - Disk Configuration

    5) Info Sheet - SQL Configuration

    6) Checklist #1 - Pre-Cluster Configuration

    7) Checklist #2 - Post-Cluster Configuration

    8) Checklist #3 - Pre-SQL Installation

    9) Checklist #4 - Post-SQL Installation

    10) Sample Completed Info Sheets

     

    Enjoy, send along comments, recommendations, etc.

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • Katmai (Sql 2008) - Transaction Logging Enhancements

    Given some of the "larger" features being introduced with Sql 2008 which are getting lots of coverage and attention (understandably so), there are actually quite a few "smaller" features that are included which will provide a great deal of benefit to SQL Server users everywhere - one of those "smaller" features that hasn't been getting as much attention is improvements made in the database engine for bulk-logging / minimal logging of standard INSERT INTO statements and the new MERGE statement as well. Prior to this functionality, to get minimal-logging for an operation that required pushing data into an existing table with existing data would have required the use of partitioned tables/indexes and a merge/split/switch type operation where the data would have been bulk-loaded from a source into an empty staging table in your server, then switched into an empty partition within your pre-existing table. Naturally, this would necessitate the use of partitioning on the table, and inherently require you use the Enterprise edition of SQL Server (which is the only edition that supports partitioning). If you either didn't want to (or couldn't) partition your existing table, or ran a non-Enterprise version of the server, you really didn't have any options for bulk-loading into existing tables with existing data (baring a partitioned view configuration perhaps). This new enhancement in 2008 will allow bulk-loading / minimally-logged operations for many more scenarios than are possible today.

    Similar to the existing minimally-logged operations, there are some prerequisites for these statements to actually be minimally-logged - you can find a full and detailed list in SQL 2008 Books Online, and also a discussion about the different operations on the SQL Server Storage Engine team's blog.

    Sunil, a PM on the SQL Server Storage Engine Team, has a great 3-part series covering the enhancements, so I won't bother repeating what he has already described extremely well, instead I'll simply point you to each of the posts:

    Part 1

    Part 2

    Part 3

    I'll also leave a very simple sample script you can run to see some of the performance differences between the fully-logged operation in 2008 and the same statement in a minimally-logged execution...Sunil's posts referenced above have additional samples that go into much greater detail and cover a wide-variety of possible scenarios.

    Enjoy!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

     

    -------------------------------------------------------------------------------------------------------------
    CODE ONLY BELOW
    -------------------------------------------------------------------------------------------------------------

    use AdventureWorks;
    go

    -- Ensure full recovery...
    alter database AdventureWorks set recovery full;
    go

    -- Create a simple table...
    if object_id('dbo.insertLoadTest') > 0
        drop table dbo.insertLoadTest;
    go
    create table dbo.insertLoadTest (id int, charval char(36), filler char(250));
    go

    -- Fully logged insert...
    use AdventureWorks;
    go
    truncate table dbo.insertLoadTest;
    go
    declare @d datetime2;
    select @d = sysdatetime();
    insert    dbo.insertLoadTest with(tablock) (id, charval, filler)
    select    top 500000
            row_number() over (order by a.object_id), newid(), 'filler'
    from    sys.columns a with(tablock)
    cross join sys.columns b with(tablock);
    -- Get the time difference...
    select datediff(millisecond, @d, sysdatetime());
    go

    -- Minimally logged insert...
    use master;
    go
    -- Using simple vs. bulk-logged simply to ease the fact that I'd have to
    -- perform log backups with bulk-logged...this makes it obviously easier...
    alter database AdventureWorks set recovery bulk_logged;
    go

    -- Rerun the same tests as above again...should notice a significant
    -- improvement in not only run-time, but also a large difference in
    -- log-space usage as well...
    use AdventureWorks;
    go
    truncate table dbo.insertLoadTest;
    go
    declare @d datetime2;
    select @d = sysdatetime();
    insert    dbo.insertLoadTest with(tablock) (id, charval, filler)
    select    top 500000
            row_number() over (order by a.object_id), newid(), 'filler'
    from    sys.columns a with(tablock)
    cross join sys.columns b with(tablock);
    -- Get the time difference...
    select datediff(millisecond, @d, sysdatetime());
    go

  • SSD and SQL - Fragmentation Impact

    In the last post on Solid State and it's impact on SQL Server operations, we looked at a variety of different IO patterns and sizes on multiple systems to see where and when SSD would help out and when it wouldn't. If you read the post and analyzed the data, you could clearly see the huge gains Solid State provides with random read IO over traditional spindles - writing and large sequential reads seem to still favor traditional spindles. So, assuming you are in a system that performs lots of random read IO, SSDs will provide you tons of benefits, and will be much more resilient to the impact of fragmentation on the system (since fragmentation leads to heavier random io as a general rule). Since we recently had a series about fragmentation, and in that series we had a post showing the impact of fragmentation in different types of IO and patterns, let's see how those same tests are impacted with a SSD vs. a traditional drive.

    I re-ran the same exact tests that I had run previously to test the different levels of fragmentation and their impact on types of and patterns of IO and then folded them into the same spreadsheet I posted previously alongside my desktop results. See my prior SSD post for specs on my desktop and laptop machines, and see this post in the fragmentation series for the original sheet with fragmentation impact results on my desktop only.

    Given what we know about SSD and my laptop, we'd expect to see the laptop be much more resilient to fragmentation in general, outperform the desktop for cold-cache random IO read operations, and likely perform better comparatively in single-threaded operations vs. multi-threaded operations (since my desktop is a quad-core xeon and my laptop is a dual-core centrino). Additionally, we don't expect the SSD to out-perform my desktop in any warm-cache operations, since, well, they'd never touch the drive in either scenario (and again, my desktop has 16gb of cache, and my laptop has 4gb of cache).

    I've attached the spreadsheet in it's entirety to the post, so you can take a peak at the total results if you like, it's actually quite interesting. As expected, the SSD is much, much, much more resilient to fragmentation impact vs. traditional spindles - for heavily fragmented data, the traditional spindle system degraded from 300% up to 2,500%, whereas the SSD system for the same tests ranged in degradation of 10% up to 580%. My laptop/SSD system outperformed my desktop by as much as 300% under fragmented conditions. In nearly all cold-cache tests the SSD system destroyed the traditional spindle system with the few exceptions coming on large scans of contiguous data (which would take advantage of serialized large sequential IOs with the read-ahead manager), which we already know traditional spindles are still better at.

    I've attached the full spreadsheet with detailed results, enjoy!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • Partitioning Data for Query Performance - Where's the benefit?

    I spend a lot of time interfacing with SQL engineers of all types (developers, dba's, architects, ETL engineers, etc.) and something I hear very frequently is "I want to horizontally partition my data to improve my query response times to customers". This usually makes me follow up the statement(s) with questions of my own such as "what types of queries are you trying to improve response times on by partitioning your data?" or "what systems are you thinking of employing this type of strategy on ?".  Too often I get responses that in turn send us down a path of discussing why partitioning data isn't a viable strategy for improving many types of queries/workloads (such as typical OLTP workloads, seeks of data - particularly seeks that don't filter/join on partitioning keys, etc.). Sometimes engineers and others are downright shocked to see some of results of different query types against partitioned data - usually, by the end of a 1/2 hour demo and discussion and walking through some scripts (which I've attached as well), everything makes good sense and there's a good understanding of why things work the way they do. It's frequently a misconception that using the horizontal partitioning features built-in to the engine in SQL 2005 will lead to large benefits for OLTP-like workloads/queries/etc. (i.e. seeks of data, singleton lookups, etc.) when the reality is that horizontal partitioning is really more about improving performance for things like administrative tasks, bulk data loads/switches, and scans of data when the scanning can be limited by the partition ranges.

    This post is not about switching data with partitions, rolling window scenarios, administrative tasks with partitioning, bulk loading/moving data with partitioning, etc. - there is plenty of that in other places (but if someone would like to see some examples, by all means comment or contact me via email and I'll be happy to oblige). Instead, for the remainder of this post we are going to walk through query execution differences for a range of access methods (i.e. seeks, scans, singleton lookups, etc.) against a partitioned structure and a non-partitioned structure that are exactly the same with the exception of one having all indexes partitioned and one not being partitioned at all.

    The schema for the partitioned table is as follows:

    -- Create a sample table that will mimic the main table...

    create table dbo.Fund_Dtl (

    Fund_Dtl_ID int identity(1,1) not null,

    fillerColumn char(150) not null,

    Fund_Summary_ID int not null,

    Txn_Dt datetime not null,

    Partition_Column char(5) not null default ('xyz')

    ) on [PartitionPSFD] ([Partition_Column]);

    go

    -- Cluster...

    create clustered index Fund_Dtl_Fund_Summary_ID_CI on dbo.Fund_Dtl (

    Fund_Summary_ID,

    Partition_Column

    ) on [PartitionPSFD] ([Partition_Column]);

    go

    -- PK...

    alter table dbo.Fund_Dtl with check add

    constraint Fund_Dtl_pk primary key nonclustered

    (Fund_Dtl_ID,Partition_Column)

    on [PartitionPSFD] ([Partition_Column]);

    go

    -- Nonclustered...

    create nonclustered index ix_TxnDt on dbo.Fund_Dtl (

    Txn_Dt

    ) include (

    Partition_Column,Fund_Summary_ID

    ) on [PartitionPSFD] ([Partition_Column]);

    go

    As you can see, we include a clustered index and 2 nonclustered indexes (1 of which is the primary key) - the clustered index includes the partitioning column as the 2nd key column of the index, as does the primary key. Notice however that the 2nd nonclustered index doesn't include the partitioning column as a key column of the index, but instead only as a covered column (as an included column).  The script then proceeds to load 602,112 records into the partitioned table, filling the Partition_Column column with 7 distinct values (7 partitions) each with 86,016 records.

    Ok, then we create the non-partitioned table as follows:

    -- Create another table from the data in partitioned table that isn't partitioned...

    select *

    into dbo.Fund_Dtl_NoPartition

    from dbo.Fund_Dtl with(nolock)

    go

    -- Index it just like the main table, with no partitioning...

    -- Cluster...

    create clustered index c2 on dbo.Fund_Dtl_NoPartition (

    Fund_Summary_ID,

    Partition_Column

    );

    go

    -- PK...

    alter table dbo.Fund_Dtl_NoPartition with check add

    constraint Fund_Dtl_np_pk primary key nonclustered

    (Fund_Dtl_ID,Partition_Column);

    go

    -- Nonclustered...

    create nonclustered index ix_TxnDt_np on dbo.Fund_Dtl_NoPartition (

    Txn_Dt

    ) include (

    Partition_Column,Fund_Summary_ID

    );

    go

    As you can see, the non-partitioned table is identical to the partitioned table in every way with the exception that it isn't partitioned at all - the columns are the same, the indexes are the same, the data is the same, etc.

    Ok, at this point the script continues to run 6 different identical tests against the partitioned and non-partitioned tables, and commenting on the performance differences on my machine - I'd encourage you to download the attached script and run through it on your own system to become familiar with the setup, results, etc. (it is heavily commented). If anyone has additional tests they'd like to try out, or if anyone sees any discrepancies, by all means, let me know. I've included a brief summary of the tests included in the script and the results from my machine below for those who don't have access to anything for using the attached script (or mobile readers :-)). Enjoy!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

    TESTS:

    1. SEEK (small range-scan) of data on the Fund_Dtl_ID column. This query does not filter on the partitioning column in any way, and we iterate 30,000 executions of the query. In this test, the run against the partitioned table finishes consistently in about 3.7 seconds on my machine, performs 8 scans (partition constant scans) and 14 logical reads. The non-partitioned table finishes consistently in about 2.1 seconds (1.6 seconds faster), performs a single scan and 3 logical reads. Obviously this scenario is won by the non-partitioned table.
    2. Same as test #1, only we also filter on the partitioning column with an equality against a constant value. The run against the partitioned table improves to 3.4 seconds with a single scan (partition elimination due to the constant equality) and 2 logical reads. The non-partitioned run drops to about the same number, finishing in 3.3 seconds with 3 logical reads and a single scan. The non-partitioned query lost some time due to a residual check of the data for the partitioning column. Call this scenario a tie.
    3. Same as test #2, only we create 2 additional indexes with the key columns reversed (i.e. Partition_Column then Fund_Dtl_ID). Both runs improve by a second or so in duration, but the results end in a tie again.
    4. Same as test #2, only instead of an equality against the partitioning column, we use a non-equality filter (>=). The run against the partitioned table finishes in about 5 seconds with 7 scans and 14 logical reads. The non-partitioned column wins significantly again, finishing in about 3.45 seconds with a single scan and 3 logical reads.
    5. This test introduces a SCAN operation, but for this test there is no filtering for the partitioning column, just a standard scan of the clustered index for each table, and performing only 30 iterations instead of 30,000. The partitioned table run finishes in just over 2.6 seconds consistently with 8 scans and 14,112 logical reads. The non-partitioned test finishes in nearly the same time (just over 2.5 seconds) performing a single scan and slightly less reads with 14,052. Another tie.
    6. The final test is again a SCAN operation, this time including an additional equality filter on the partitioning column and extending the iterations from 30 to 75. Here is the scenario where we expect partitioning to help immensely with a query, and it doesn't disappoint us for sure.  The partitioned table test run finishes in well under a second consistently (850 milliseconds or so) with a single scan and 2,016 logical reads. The non-partitioned table tests take just over 6 seconds consistently, performing a single scan and again performing 14,052 logical reads (just like the last test).
  • SSD and SQL - SQLIO performance

    Let's start the SSD blogs off with a comparison of the SQLIO tool running on my laptop and compare the results to some other traditional spindle based systems.

    First, let's outline the systems we'll be comparing throughout the SSD blogs:

    • First, my laptop. This is a loaded Dell XPS M1330 laptop running Windows Vista Ultimate x64. Specs include a 2.2Ghz Intel Core2Duo T7500 processor (dual-core, 800MHz FSB, 4MB L2 cache), 4gb DDR2 SDRAM @ 667Mhz, and a single 64gb Solid State Drive (Samsung)
    • Next, my desktop.  This is a Dell Precision Workstation T5400 running Windows Vista Business x64. Specs include a 2.0Ghz Intel Xeon E5405 (quad-core, 1333MHz FSB, 12MB L2 cache), 16gb DDR2 SDRAM @ 667Mhz, a single SATA II 10k RPM 146gb boot-drive (system, swap), and a single SATA II 7.2k RPM 750gb additional drive.
    • Server #1. This server is a Dell PowerEdge 2950 running Windows Server Enterprise 2003 R2 x64. Specs include 2 x Intel Xeon 5160 3.0Ghz processors (dual-core, 1333MHz FSB, 4MB L2 cache each (8MB total)), 16gb RAM (8 x 2 gb Dual-Rank), dual PERC 5/E adapters, a RAID-1 boot-drive (system,swap) with 2xSATA II 7.2k RPM 250gb drives, and 2 LUNs attached via an external Dell MD-1000 DAS enclosure. Each external LUN is made up of 7 SATA 7.2k RPM 500gb spindles each in a RAID-5 configuration. For tests, we'll only make use of the 2 externally-attached LUNs.
    • Server #2. This server is an HP DL380 running Windows Server Enterprise 64-bit Edition attached to an HP EVA 6100 SAN. Specs include 2 x Intel Xeon X5450 3.0GHz processors (quad-core, 1333MHz FSB, 12MB L2 cache each (24MB total)), 32gb RAM, tri emulex LP1050 HBA's, 2 internal 15k SAS drives RAID 1 for boot/swap, external log LUN with 8 SAS 15k 146gb drives RAID10 configured, 2 external data LUNs each with 20 SAS 15k 146gb drives RAID 10 configured for data files. SAN is connected via 2Gb FC fabric and dual Brocade 4gb silkworm fiber switches.

    As you can see, we have quite a range of systems from my laptop, a beefy desktop, a mid-size server/storage configuration, and a larger-size server/storage configuration. This should give us a good range of tests throughout the different scenarios. So, let's get this party started, first with some SQL IO comparisons.

    For our SQL IO test, we'll use the following parameters:

    • All tests are run back-to-back in a serial manner - each test hits either the LOG file or the DATA files (so any tests that have the log file on the same LUN as a data file are never mixing IO operations)
    • All tests use 8 processing threads, run for 10 minutes each, are enabled for multi-io, and allow up to between 16 and 64 outstanding IOs each
    • Files used include 2 data files and 1 log file
      • On my laptop, all files reside on the solid state drive and are sized at 15gb each.
      • On my desktop the log file and one of the data files resides on the 10k boot drive - the other data file resides on the 7.2k drive. All files are sized at 75gb each.
      • On Server #1, the log file and one of the data files resides on the first LUN from the DAS - the other data file resides on the 2nd LUN from the DAS. All files are sized at 100gb each.
      • On Server #2, the log file resides on the log lun, and the 2 data files each reside on a separate data LUN each. All files are sized at 100gb each.
    • The tests include each of the following:
      • 8kb random writes to the data files (pattern for checkpoints, tempdb, etc.)
      • 8kb random reads to the data files (pattern for random data reads, singleton seeks, etc.)
      • 64kb sequential writes to the log file (bulk log writes)
      • 8kb random reads to the log file (rollbacks, log reader, etc.)
      • 1kb sequential writes to the log file (small log writes)
      • 64kb sequential writes to the data files (checkpoints, reindex, bulk inserts)
      • 64kb sequential reads to the data files (read-ahead, reindex, checkdb)
      • 128kb sequential reads to the data files (read-ahead, reindex, checkdb)
      • 128kb sequential writes to the data files (bulk inserts, reindex)
      • 256kb sequential reads to the data files (read-ahead, reindex)
      • 1MB sequential reads to the data files (backups)

    So, let's take a look at the results - in each test we've captured the number of IOs performed per second, the number of MBs per second, the total amount of data transferred in GBs, and the Cost per GB where I took the cost of each system and correlated it against the total GBs transferred.

    First, here are the results for operations against the simulated DATA files/luns:

    image

    And secondly, here are the results for the tests against the simulated LOG files/luns:

    image

    So, where did the SSD perform well? Clearly, it is king in the random read scenarios, nearly keeping pace with Server #2 on each LUN - and, if you include costing in the figures, it's a landslide in favor of the SSD. These benefits for random read patterns are great for day-to-day computing, which is why I see such improvements on my laptop for day-to-day use. In the SQL Server world, this would be beneficial for traditional OLTP-like read patterns (i.e. small, singleton type reads of data) and also for something else we've looked at in detail recently - reads against heavily fragmented structures.

    For writes, it's a different story - the SSD lagged significantly behind all other systems in nearly every write scenario.  The same holds true for sequential read patters as well, which aligns with findings elsewhere. On a side note, I did perform some sequential read/write tests with smaller IO sizes (1kb and 4kb) for the data files on each of the systems except the Server #2 system and the SSD performs quite a bit better with these smaller sized IOs than the larger sized IOs that are performed typically in a SQL Server system - seems to follow findings from other reviews I've found and researched a bit on the web. Of course, we really care about how it performs with SQL for these posts, and as you can see, it really is beneficial for smaller, random IO patterns - with the larger and sequential type patterns, it looks like for now most traditional spindle systems will outperform them for now (at least the commodity versions that you can get in a laptop for example, I'm sure some of the solid-state systems built for enterprise deployments are a bit more valid for server scenario type tests, you can find these types of systems by most major storage manufacturers today like EMC, HP, Hitachi, etc.).

    Good start to our SSD posts - next I think will be the impact of these drives on fragmentation (since it's fresh in our minds and all).

    Enjoy!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • SSD and Me

    I recently purchased 2 laptops with Solid State Drives in them (my loaded Dell XPS 1330 and my wife's Apple MacBook Air), and I have to say, I will never, ever again own a computer that doesn't have a Solid State Drive in it (at least not until they make something even better). These suckers are great, well worth the money IMHO (of course, I am a techie and find huge value in having the latest and greatest of technology to work with, so that may impact things I'm sure). Of things to note:

    1. Obviously, the performance - for boot times, running multiple concurrent applications, paging operations, and anything else that requires large amounts of random IO, these can't be beat today
    2. Power consumption - I've run my laptop for up to 6 hours with a full-brightness on the display - I do have a 9-cell battery, but still, 6 hours of battery time on my laptop is awesome.
    3. Heat - more accurately, the lack of - there is none...the fan never runs, it never gets warm on my lap, etc.
    4. Noise - again, more accurately the lack of - the laptops don't make any noise hardly at all. It's gotten to the point now where I think my keys making that clicky noise as I type is annoying.

    I used to leave my systems running all the time, 24*7, just so I wouldn't have to wait for a boot-up and re-launching all my apps, etc. (granted, this is a little over-the-top if I can't wait a few minutes and click 10 or 12 icons and wait another minute or so for them all to launch and load, but hey, you get used to things you have I guess) - now, I freely shut-down my laptop every night and boot-up in the AM with all my 7 or 8 apps that I launch in my startup group. Boot time on this thing is lightning fast (I have a work PC that has a 10k Raptor drive for it's boot system with 16gb of RAM and my laptop smokes it), and starting up applications is a breeze compared to traditional spindles.

    I also have to plug a little for the MacBook Air - IMHO again, this is by far the best laptop on the market today for home use. My M1330 is small for certain, but the MacBook Air is TINY - my wife tried using my XPS once and actually taunted me about it being heavy (it weighs just under 4 pounds - no heavyweight for sure, but again, all what you are used to). I love the XPS, don't get me wrong - but if I'm at home on the couch surfing the web, emailing, blogging, navigating pics/videos/etc., etc., I'll go for the Air every time. It's not just the size, but the usability, the gestures, touch integration, etc. makes it so enjoyable to use.

    Ok, so back to this blog. So given my pleasure with the performance of the drives, I started thinking it was high time to put them through some tests with regards to SQL Server operations and measure some of the impact they might have on different areas in the engine and otherwise. Over the next few blog posts, I'll be providing some analysis of the performance of different workloads/sql operations on my new laptop and SSD compared to the same operations on some other machines (like my work machine, and hopefully a production-like system or 2).  Things I plan to address are:

    1. SQLIO benchmarks for each system
    2. Checkpoint operations
    3. Backup/Restore comparisons
    4. Bulk load operations
    5. Heavy Sequential and Random read/write workloads
    6. Impact on Fragmentation

    I'm open to recommendations of other things I should plan to test against, feel free to email or comment on the post and I'll try to include what I can.

    Given that we know the SSD's are much better at Random IO operations than the traditional spindles, I'm expecting any heavy random IO operation to see some significant benefits over the traditional systems. Sequential write IO operations are typically better today on traditional spindles, so we'll verify if that impacts these operations in SQL Server and if so how.

    Look for the first in the series to come shortly!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • Custom Index Defrag / Rebuild Procedures

    I often get asked for custom procedures to help with index maintenance, including things  such as reorganizing, rebuilding, stats updates, etc. There are quite a variety of these out there today (a simple google search will get you a bunch), and all have their pros/cons. The big thing I generally stress to customers when asked about this type of thing is to do what is best for their environment - a custom procedure that works perfectly for me in my environment(s) might be completely wrong for your environment.  Of course, there are some general rules of thumb that you can use in most any environment, but you get the idea - there is usually never a solution that is perfect for all environments (though that doesn't mean we can't keep trying to make one :-)).

    Some general observations that I've seen in multiple custom procedures in the past include:

    1. Remember that an INDEX REBUILD (or DBREINDEX) also updates statistics - there is no need at all to update statistics for the index after rebuilding (or before). In reality, you could end up with a worse case than you had - when performing an index rebuild, stats are updated using data gathered from a look at all the data in the index (after all, all the data had to be touched to rebuild anyhow), similar to if you had performed a stats update with a full-scan. If you then update stats on the index after the rebuild and use anything less than a full-scan option, you'll most likely end up with stats that aren't as 'accurate' as the full-scan version.
    2. Do not simply blindly rebuild/reorganize all indexes for all tables in all databases - I see this far too often.  Doing so not only wastes resources and processing time that can/could be used for other things, it also typically fails to give you any insight/feedback on the state of fragmentation in your system, or how fragmentation impacts your workload (if at all).
    3. Generally you should include some simple logic to weed-out small indexes (say for example, indexes with less than 5,000 or 10,000 pages). In many, many database schemas, there are 100's or 1000's of 'small' tables with multiple indexes - spending time rebuilding these can add up as the number grows. Use that time instead on the 10's or 100's of tables that really make a difference.
    4. If your custom procedure samples fragmentation of indexes, consider storing that data somewhere for analysis - you've done the bulk of the work by having the engine retrieve the stats, why not store them and report/analyze it so you can gain insight into the fragmentation levels of your data over time?
    5. Consider placing a time limit on the operation - it won't be an exact limit, but something that stops the execution of the procedure after a certain amount of time has passed and the last full operation completed would suffice in most cases - too many times I've had people call about sluggish/slow responding servers early in the morning when it boiled down to large nightly maintenance operations running long overnight.
    6. When using methods to determine fragmentation levels of an index to determine if it should be rebuilt or not, I typically see people use the "avg_fragmentation_in_percent" value as the hard guideline - you may also want to consider page density as a contributor, since that is logical fragmentation also (and impacts operations in cache as well as IO). If you're already looking for a level of avg_fragmentation_in_percent, consider using the same value as a guideline for "100 - avg_page_space_used_in_percent" as well.

    I'm sure there are more, but that's my starting list anyhow.  So, the next question I usually get is "well, what procedure do you use?" - that's a loaded question :-). I do have a generic procedure that I'll typically start with, and if the workload or environment requires it, I will modify it to be effective in that environment. I really can't stress this enough again - it ALWAYS depends on the environment - you should know your environment enough to be able to determine what is best for it, I typically won't be able to tell you what's best without any knowledge, or even a little knowledge.  So, to this post I've attached my generic version of a custom procedure that allows you to specify some options, and protects against a few things, like:

    1. You can specify a fragmentation threshold (%) that will be the low-water mark for considering which indexes to bother with (by using both the avg_fragmentation_in_percent and (100-avg_page_space_used_in_percent) values both)
    2. You can specify a date/time that the operation should stop if it passes
    3. Given the date/time specification, indexes are operated on in order of highest fragmentation levels to lowest, so the time spent is hopefully most beneficial
    4. You're allowed to define online/offline rebuilds, reorganizations, stats updates, partition level operations, etc.
    5. You can specify a location where you'd like to store the results of the stats check off indexes
    6. Stats will not be updated if you perform a rebuild
    7. Only indexes with > 10,000 pages will be touched

    It's a fairly simple procedure for general use.  If you'd like to get a look at a custom procedure that is a bit more specialized, offers additional options, etc., take a look at Ola Hallengren's version. This version allows you to specify a multitude of options and operations, can iterate over multiple databases, additional logging operations, parameterized values for things like page counts and fragmentation levels, etc. (it's also documented quite well).

    Between the 2 solutions, I'm sure you could likely cover a wide variety of scenarios and workloads to find what works best for you.  Enjoy!

    sp_index_defrag.sql

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • Fragmentation Station - Stop #8 - Hands On Experience

    In our final stop for the fragmentation series we're going to walk through a complete script that will cover almost everything we've discussed in the series, including:

    • New object creation and allocation (heaps, clustered, non-clustered indexes)
    • Impact of parallel operations vs. serial, sort_in_tempdb, etc.
    • Full scan, range scan, and singleton seek performance tests against cold and warm caches with:
      • No fragmentation
      • Logical fragmentation (unordered pages with high page density and low page density)
      • Extent fragmentation (from concurrent allocation operations and multiple files for a single index)
    • Comparison of Defrag vs. Rebuild for multiple types of structures with multiple types of fragmentation
    • Output of full comparison for performance related measures (physical reads, logical reads, test durations, etc.) of each of the above for multiple tests

    This script should allow you to see and understand first-hand the impact of different types of operations against different types of structures in a variety of fragmentation levels and validate many of the things we discussed in terms of fragmentation's performance impact on operations, what causes fragmentation, understanding the output of detection for fragmentation, the different types of fragmentation and the impact on operations, and the different ways to address fragmentation once you have it and need to fix it. I'm going to walk through the script and what happens at a very high level in the remainder of this post, but to get details you'll want to run it and view the output for yourself in your own test environment.

    First, some logistics:

    • The script is built to be executed all at once and spits out a bunch of information and notes for you to consume once complete - alternatively, you can simply walk down the script manually if you like as well, following step-by-step
    • This script uses both the built-in SQL methods for detecting fragmentation as well as the custom procedure for viewing page linkage and summary information
    • The script needs to be run in SQLCMD mode, as there are 5 different variables you need to set for customizing to your environment. This is a script that will work with SQL 2005 only.
    • The script will create 2 databases (named as you specify and creating files in the location you specify) and also make use of Database Snapshots in the final portion of the script (defraging/rebuilding comparisons), so to successfully run that portion of the script it needs to be tested on either the Enterprise or Developer edition of SQL 2005...the databases will not be dropped at the end of the script, so be sure to do that if you want to remove them when finished testing
    • The output of the script will include detailed notes about preceding/following statistics if you set the "printComments" flag to 1 at the top of the script. Notes about results and operations are delimited by strings of dashes ("-") printed across the screen, then the comments, then an ending delimiter of dashes again.
    • Major 'sections' of the output are delimited by large headers that are equal signs (i.e. "=") printed across the screen, then the comment about the section, then an ending equal signs delimiter

    Ok, that should cover the logistics, below is an analysis of the results from the test run on my desktop which have been attached to this post in a spreadsheet called "desktopResults.xlsx". Additionally, I've attached to this post the output of the script on my desktop, as well as the script itself.

    My desktop machine is a Dell Precision Workstation T5400 64-bit machine, quad-core Xeon E5410 (2.33Ghz, 12Mb L2 cache, 1333 FSB), 16 GB DDR2 SDRAM, a 160GB 10k 16mb cache SATA boot drive, and a 500GB 7.2k 16mb cache SATA data drive. For these tests, the log files for the databases resided on the data drive and the data files resided on the faster boot drive. Multiple test runs were performed for both MAXDOP=1 and MAXDOP=0 configurations and I am reporting the results of one test from each that fell into the 'middle' of the test results for the given configuration.

    Some key points to takeaway from and inspect in the output include:

    • As expected, "cold cache" tests are significantly more impacted by fragmentation than "war cache" tests - the slowdown in warm-cache tests is not from improperly ordered page chains or interleaved pages, but instead from the page density of each page in cache being lower and requiring more logical reads (i.e. reading more pages) to return all the data. Notice the warm-cache tests for our logically fragmented structure with a higher page density is significantly less impacted than the same tests with a low page density.
    • Also as expected, scan operations are significantly impacted by higher fragmentation levels, whereas seek operations are hardly impacted at all (both warm and cold cache tests).
    • Comparing run times for parallel vs. sequential plans (i.e. maxdop = 0 vs maxdop = 1), you should notice that warm-cache tests benefited significantly from a parallel operation (both fragmented and non-fragmented data), cold-cache tests with non-fragmented data were often impacted negatively, and cold-cache tests with fragmented data benefited marginally. This is as expected, since parallelizing a plan is all about helping performance with added CPU power, not for improving IO throughput (the read-ahead manager is responsible for that). Warm-cache operations are all about churning through cached pages as fast as possible, so parallel plans would help this significantly. Non-fragmented data in a cold-cache scan operation would be optimized by the read-ahead manager feeding a single sequential operator as much as possible - adding additional threads to read additional data wouldn't help this type of operation much at all. With fragmented data, additional threads requesting different data pages may help marginally since the read-ahead manager can't do it's job optimally (non-contiguous pages), however again, it will be marginal at best typically. Since the disk system and IO operations are where fragmentation introduces the most pain, optimizing IO throughput is where benefits will be seen, not in adding CPU power.
    • A clustered object typically inherently contains a bit more "fragmentation" than a heap, particularly with insert-heavy type workloads. A big difference between a heap and a clustered table is that in a cluster, the storage structure is a B-Tree (balanced tree) where the leaf pages are the data and above the leaf there are intermediate pages and a single root page. These non-leaf levels of the B-tree are maintained as data is modified within the index. A heap does not have these non-leaf pages, since a heap is not a B-tree, it is simply a bunch-o-pages that are allocated as data is modified in the table.So, as data is inserted into a heap, the engine will simply try to get a free page somewhere, if it does not exist, it will allocate a new extent, and then continue using pages, there is no maintenance for non-leaf pages here, since they don't exist in a heap - in this way, no matter how much data you continue to add, so long as the extents that are next in the file haven't been taken up by something else, you'll get relatively contiguous data pages.With a cluster, as data is inserted, even if its ever-increasing data like an identity or date/time value, as the non-leaf pages are filled up, additional non-leaf pages will have to be allocated for storing the data for the upper-levels of the index.
    • Using a serial build (maxdop = 1) and using the sort_in_tempdb option can help significantly with building indexes that are more contiguous when initially built. When SORT_IN_TEMPDB is set to OFF, the default, the sort runs are stored in the destination filegroup. During the first phase of creating the index, the alternating reads of the base table pages and writes of the sort runs move the disk read/write heads from one area of the disk to another. The heads are in the data page area as the data pages are scanned. They move to an area of free space when the sort buffers fill and the current sort run has to be written to disk, and then move back to the data page area as the table page scan is resumed. The read/write head movement is greater in the second phase. At that time the sort process is typically alternating reads from each sort run area. Both the sort runs and the new index pages are built in the destination filegroup. This means that at the same time the Database Engine is spreading reads across the sort runs, it has to periodically jump to the index extents to write new index pages as they are filled.  If the SORT_IN_TEMPDB option is set to ON and tempdb is on a separate set of disks from the destination filegroup, during the first phase, the reads of the data pages occur on a different disk from the writes to the sort work area in tempdb. This means the disk reads of the data keys generally continue more serially across the disk, and the writes to the tempdb disk also are generally serial, as do the writes to build the final index. Even if other users are using the database and accessing separate disk addresses, the overall pattern of reads and writes are more efficient when SORT_IN_TEMPDB is specified than when it is not.  The SORT_IN_TEMPDB option may improve the contiguity of index extents, especially if the CREATE INDEX operation is not being processed in parallel. The sort work area extents are freed on a somewhat random basis with regard to their location in the database. If the sort work areas are contained in the destination filegroup, as the sort work extents are freed, they can be acquired by the requests for extents to hold the index structure as it is built. This can randomize the locations of the index extents to a degree. If the sort extents are held separately in tempdb, the sequence in which they are freed has no effect on the location of the index extents. Also, when the intermediate sort runs are stored in tempdb instead of the destination filegroup, there is more space available in the destination filegroup. This increases the chances that index extents will be contiguous.
    • Note that the performance of a multi-file index on the same spindles is more like the extent fragmented performance than the contiguous performance. Of course, this is my desktop system and not a high-end many-spindle SAN/DAS system or anything, which could obviously produce different results. Key thing to understand is that multi-file systems introduce some level of extent fragmentation into a system for the added benefit of IO throughput and space (assuming you have a properly configured IO system with multiple spindles and IO paths and a BUS that isn't overloaded, etc., etc.). If you'd like to see what a difference it would make, rerun this same test on the same database after adding the -T1118 and -E startup switches and restarting the server. You would notice that the fragment sizes go back up to the 30-40 range, and the fragment count is back down to near single-file-database numbers. This shows what type of benefit the -E and -T1118 startup switches can provide when used appropriately.
    • While running the tests, consider monitoring the impact on your IO system and cache. If you look at the buffer pool cache size and IO sizes during the above tests, you should notice that when performing contiguous IO operations, the avg. IO sizes are consistently above 64k in size, up to 256k in size maybe even. When performing non-contiguous IO operations, the avg. IO sizes are most likely consistently lower than 64k - the more those sizes approach 8k, the more IO operations that are necessary to transfer the same number of data, and hence the longer the run times.
    • Pay close attention to the impact on defragmenting structures that a defrag vs. a rebuild has and correlate that back to what you know about performance - might make you rethink your current de-fragmentation plan.

    Enjoy, post or email any questions!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • Fragmentation Station - Stop #7 - How to address it

    In our 2nd to last post in the Fragmentation series, we'll discuss our options on addressing and removing/correcting fragmentation. In our next and final post in the series, we'll end with a full-fledged SQL script that will walk you through all the different things we've talked about in the series to give you some up close and personal experience with what we've gone through.

    Once you've detected that you have fragmentation, understand what it impacts in your workload, and understand the different types of fragmentation and the type you're trying to correct, you have a couple of options to address it - realistically, you have 3:

    • DEFRAG / REORGANIZE - A defrag operation (called a reorganize in 2005) will address logical fragmentation by trying to both increase page density in an index(es) and also by re-ordering pages in-place to create properly ordered pages as best as possible. A defrag will not however create contiguous pages if the existing pages are not contiguous, as a defrag does not allocate new pages during the operation, it only shuffles existing allocated pages in place trying to reorder them into proper logical order. Therefore, a defrag/reorg will not address extent fragmentation at all, only logical fragmentation - however this is traditionally the most intrusive type of fragmentation, so many times a defrag/reorg is the best option.
    • REINDEX / REBUILD - A reindex/rebuild operation will do everything that a defrag/reorganize does and also try and build fully contiguous pages as well by allocating new pages to the index where appropriate. A rebuild is a completely atomic operation (i.e. it is all or nothing), and is an 'offline' operation as well (though 2005 Enterprise Edition allows a new online rebuild option as well). One other side-benefit to a rebuild is that statistics related to the index are rebuilt in addition to the index itself (unlike a defrag)
    • DROP and CREATE - Typically there is no difference between a full rebuild and a drop/create operation. One benefit to a drop/create is that you can also change the index keys and included columns (sql 2005).

    The most frequently used options are the 1st and 2nd listed - here's a matrix comparing some of the key functional differences between the rebuild and reorganize options:

    Functionality REBUILD / DBREINDEX REORGANIZE / DEFRAG
    Online / Offline Offline; Online possible with 2005 Enterprise Edition Online
    Faster when logical fragmentation is High Low
    Parallel Processing Yes No
    Compacts Pages Yes Yes
    Can be stopped/started without losing completed work to that point No Yes
    Able to untangle interleaved pages Yes No
    Additional free space required in data file for process Yes - 1.2x - 2x existing size No
    Faster on larger indexes Yes No
    Rebuilds Statistics Yes No
    Log Space Used Full Recovery - High; Bulk/Simple Recovery - Low; Depends on work performed
    May skip pages on busy system No Yes
    Can specify additional options (fillfactor, etc.) Yes No

    Of course, one other option to 'addressing' fragmentation could be to choose to not fix it. Hopefully you're not the type of engineer who simply rebuilds/defrags indexes every day just for the sake of doing so or because it makes you feel better - instead, try to work out a plan to understand which indexes in your schema actually matter the most, are impacted by the different kinds of fragmentation, etc. and those that aren't.

    Let's take a look at the internals for defrag and reorganizing - reorganizing an index is performed in basically 2 stages:

    1. The page compaction stage - this stage is meant to try and make pages have a 'fullness' near the original fillfactor specification for the index. This operates at the leaf-level of the index only, and pages are compacted by shuffling rows towards the left-side of the B-tree and dropping ghosted records and freeing pages made empty.
    2. Page defrag-ing stage - this stage is meant to make the logical order of pages match the allocation order. Again this operates at the leaf-level only, by performing a logical-ordered scan and an allocation-ordered scan in lockstep with one another, re-establishing a scan position after every page. This is a totally online operation, aside from an eXclusive lock on each page for the duration of each given re-ordering. Remember that no new pages are allocated during a defrag operation (aside from the single page for temporary space), which means that logical re-ordering is achieved by shuffling pages that are already allocated to the index only - this is why a defrag operation can't solve interleaved pages, since only existing pages are shuffled.

    Let's take a look at some visual examples on how these stages work - the defrag-ing stage is the same for both 2000 and 2005, but the page compaction stage uses a different and improved in 2005 using a new sliding-window algorithm.

    SQL 2000 Defrag Compaction (left-to-right, top-to-bottom)

    clip_image002[5]clip_image004[8]clip_image006[8]

    clip_image008[7]clip_image010clip_image012

    clip_image014clip_image016clip_image018

    clip_image020

    SQL 2005 Reorganize Compaction (left-to-right, top-to-bottom)

    clip_image002clip_image004

    clip_image006clip_image008

    Page Reordering Stage (left-to-right, top-to-bottom)

    clip_image002[7]clip_image004[10]clip_image006[10]

    clip_image008[9]clip_image010[5]clip_image012[5]

    clip_image014[5]clip_image016[5]

    That wraps up our 2nd to last post in the series, in our next and final post we'll wrap it up with an all-in-one script for everyone to use in a test environment of your own to verify everything we've discussed.

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

  • Fragmentation Station - Stop #6b - Custom Detection Procedure, Page Linkage

    In post #6 of the series, I mentioned that I often get asked for alternative methods of looking at fragmentation, a way to view the page chain or linkage, and ways to get insight into which pages are out of order in a given structure. There is nothing super graceful to be honest, but you can make use of a DBCC statement (DBCC IND()) to get page linkage and order data, then using some relatively complex querying against the data captured you can see some of this type of information.

    This post has an attachment (customFragInfo.sql) that contains the following procedures:

    • zcpReinitDbccInd - This procedure takes a single parameter (@tableName) which defines which table to get page-level information for. The procedure will then create a single table called 'ztblDbccInd' (if it exists it will drop it) which is used to store the results from DBCC IND() into. This will produce a table that contains a single record for every allocated page for all indexes in the given table. We then strip out all pages except data pages, add a few columns to the table, and update them to include indicators for the appropriate logical and physical positions of each page (based on the page linkage information present on the pages).  This is all done with a fairly complex recursive CTE and update statement. We then build some indexes and call it good. This procedure has to be called first on the table you want to analyze with the following procedures.
    • zcpShowPageLinkage - This will show a record for each page at the leaf-level of each index (or all data pages in a heap) sorted by the logical ordinal position of each page - the logical ordinal position meaning the order the page should fall in if you were to scan the data from front-to-back using an ordered-index scan of the data (doesn't apply for heaps, since they don't hold any logical order). If you see a value other than '0' in the physicalPageDiff column, this tells you that this page is that # of pages away from the prior page in the logical order, which indicates that there are other pages of data between the 2 pages - if the value is negative, then that page is earlier in the file; if positive, that page is later in the file (which comes into play when performing a scan of data).
    • zcpShowFragSummary - This will show a summary of the fragmentation for the given table's index(es) and optionally an additional 2 result sets if you set the single parameter (@showFragPages) to 1. The 1st is a summary of the count of logically fragmented pages, file-level fragmented pages, non-contiguous pages, non-leaf pages, and the total # of pages for each index/heap in a given table. The 2ndshows each page that is logically fragmented, and earlier physically within the file than the prior page, as well as the previous and next page logically ordered for comparison purposes (each logically unordered page will show a 'position' value of '00', whereas the previous page will have a position value of '-1', and the next page will show a position value of '+1'). The 3rd result set is the same as the2nd, except it shows logically unordered pages that are later physically in the file than the prior page.

    That's all of them, we'll be using these in the final post in the series where we get a full hands-on script, enjoy!

    Chad Boyd ~~~ This posting is provided "AS IS" with no warranties, and confers no rights. Use of any included script samples are subject to the terms specified at http://www.mssqltips.com/disclaimer.asp and http://www.mssqltips.com/copyright.asp.

More Posts Next page »

This Blog

Syndication