Thursday, 14 May 2009

SQL Server 2008 Resource Governor

Background
As the number of queries hitting a database grows, contention for memory and CPU resources increases. This gives rise to problems such as runaway queries, which monopolize resources and deny them to other applications, and unpredictable workload execution, where jobs run quickly when competition for resources is low and slowly when it is high.

Predictable workload execution
Resource Governor, a new SQL Server 2008 Enterprise Edition feature, addresses these problems by letting administrators apply explicit, administrator-defined CPU and memory limits to specified workloads, such as regular maintenance jobs or periodic resource-intensive queries. This makes query execution more predictable and can prevent runaway queries from occurring.

You define the available resources by creating resource pools, and you use classifier functions and workload groups to identify workloads and apply limits.

Resource pools
Resource pools limit the available resources in terms of maximum and minimum CPU and memory percentages, which you define. You can create multiple pools, each with different resource settings to reflect the different workloads on the system. Resource Governor will use the percentage values that you supply to calculate a shared percentage value and an effective maximum value for CPU and memory for each of the pools. When calculating these figures, Resource Governor takes account of the competing requirements of each pool so that resource usage is balanced. In addition to user defined resource pools, there are two default resource pools, called ‘Internal’ and ‘Default’, which are used by the Internal and Default workload groups respectively.
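For example, here is a minimal T-SQL sketch of creating a resource pool; the pool name and limit values are illustrative rather than recommendations:

CREATE RESOURCE POOL ReportingPool
WITH
(
    MIN_CPU_PERCENT = 0,
    MAX_CPU_PERCENT = 30,
    MIN_MEMORY_PERCENT = 0,
    MAX_MEMORY_PERCENT = 25
);

-- Resource Governor changes only take effect after reconfiguration.
ALTER RESOURCE GOVERNOR RECONFIGURE;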

Workload groups
A workload group is effectively a container into which incoming sessions are placed. Administrators can define multiple workload groups to represent the different types of workload that the system experiences, and SQL Server also has the two built-in workload groups, ‘Internal’ and ‘Default’, that we mentioned above. Connecting sessions are allocated to the Default group when they are not explicitly allocated to any other group, and the internal group is for use by SQL Server itself, for processes such as Lazy Writer.

Each workload group is associated with a resource pool, which dictates the memory and CPU resources that are available to the workload group, as described earlier. Workload groups have additional settings that can be used to fine-tune resource usage, including the maximum degree of parallelism for requests in the group (that is, the number of processors a single request can use), and LOW, MEDIUM, and HIGH importance settings that prioritize one workload group relative to others. By allocating sessions to a specific workload group, you limit the resources that an application has access to, which prevents problems such as runaway queries from ever arising.
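Continuing the sketch above, a workload group might be created and bound to the ReportingPool pool like this (again, the name and settings are purely illustrative):

CREATE WORKLOAD GROUP ReportingGroup
WITH
(
    IMPORTANCE = LOW,                        -- deprioritize relative to other groups
    MAX_DOP = 4,                             -- cap parallelism for requests in this group
    REQUEST_MAX_MEMORY_GRANT_PERCENT = 25    -- cap the memory grant per request
)
USING ReportingPool;

ALTER RESOURCE GOVERNOR RECONFIGURE;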

Classifier functions
Classifier functions examine incoming sessions as they connect to the server and allocate them to user-defined or system workload groups. When you define a classifier function, you can use functions such as HOST_NAME(), SUSER_SNAME(), and APP_NAME() to identify sessions and stream them into the appropriate workload group. For example, you can create a function that uses APP_NAME() to check the names of the applications that initiate sessions and place them into workload groups that were created to apply limits specifically for those applications. For maximum efficiency, classifier functions should ideally group together sessions that perform similar tasks so that each workload group is as uniform as possible.
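The following is a minimal sketch of a classifier, assuming the ReportingGroup workload group created above and a hypothetical application name of 'ReportApp'. Classifier functions must be created in the master database and must be schema-bound:

USE master;
GO

CREATE FUNCTION dbo.fnClassifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    -- Route sessions from the (hypothetical) reporting application;
    -- everything else falls through to the Default group.
    DECLARE @group sysname;
    SET @group = N'default';
    IF APP_NAME() = N'ReportApp'
        SET @group = N'ReportingGroup';
    RETURN @group;
END;
GO

-- Register the classifier and apply the change.
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;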

Summary
Resource Governor is a powerful tool that is likely to become an essential part of the way we manage SQL Server databases. However, it is not a panacea for poorly performing queries, and you still need to consider all of the ‘traditional’ ways of improving query performance, such as good index and query design. There are also a couple of limitations to consider before you race off to implement it on your servers. Firstly, in the current release, Resource Governor can only manage workloads within a single instance of SQL Server; if you run a multi-instance set-up, workloads from one instance can still impact the execution of workloads in other instances, just as they always did. Secondly, Resource Governor only manages the database engine itself; it cannot manage Integration Services, Reporting Services, or Analysis Services.

Friday, 1 May 2009

SQL Server 2008 Star Join Query Optimization

As Business Intelligence (BI) has become ever more important in helping businesses to maintain their competitive edge, the databases that handle BI queries have had to cope with a corresponding increase in workload. Nowadays, it is not uncommon for queries run against large databases such as data warehouses and data marts to return millions of rows, and when you consider that these queries frequently involve joining multiple tables, it’s easy to see why users often experience extended query response times. SQL Server 2008 Enterprise Edition addresses this issue with a new feature called ‘Star Join Query Optimization’, a mechanism that uses bitmap filtering to improve the performance of certain types of queries by making the retrieval of rows from fact tables more efficient.

Improved Query Response Times
Data warehouses generally use dimensionally modelled star or snowflake schemas, which feature one or more fact tables containing transactional data, and various dimension tables containing the information (such as product data, customer information, and times and dates) that gives meaning to the fact table data. Foreign keys are usually used to maintain the relationships between the rows in fact tables and those in dimension tables. SQL Server 2008 Enterprise can recognize databases that have star schemas and use the new star join optimization logic to process queries against them more quickly, with the result that typical data warehouse queries run 15 to 20 percent faster on average.

Automatically Implemented
Star Join Query Optimization is implemented automatically by SQL Server, and requires no special database or application configuration. The query processor will usually elect to optimize queries that are of medium selectivity (that is, those that retrieve between approximately 10% and 75% of the rows from a fact table). Such queries are generally handled by using hash joins to join the dimension and fact tables, using the foreign keys to identify matching rows. Hash joins work by building a hash table for each dimension table referenced in the query, and the optimization process uses these hash tables to derive bitmap filters. Bitmap filters identify the key values from each dimension table that qualify for inclusion in the query. The bitmap filters are applied to the fact table as it is scanned and they act to eliminate the fact table rows that do not qualify for inclusion in the result set. The most selective bitmap filter is applied first because it eliminates the greatest number of rows. Because the eliminated rows require no further processing, subsequent filters do not have to be applied to them, and the efficiency of the process is improved.
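As a hedged illustration, the following is the kind of medium-selectivity star join query that might qualify; the fact and dimension table names are hypothetical:

SELECT d.CalendarYear, p.ProductCategory, SUM(f.SalesAmount) AS TotalSales
FROM FactSales AS f
JOIN DimDate AS d ON f.DateKey = d.DateKey
JOIN DimProduct AS p ON f.ProductKey = p.ProductKey
WHERE d.CalendarYear = 2008
GROUP BY d.CalendarYear, p.ProductCategory;

If the optimizer chooses this strategy, you can confirm it by looking for the Bitmap operators that implement the filters in the query execution plan.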

Query Selectivity
Bitmap filtering improves performance in medium-selectivity queries because it filters out rows early on, before any joins are performed, so the number of rows that each join must process is reduced. Bitmap filtering is generally not used when queries are highly selective (returning fewer than approximately 10% of the rows in a fact table), because a nested loop join will usually be more efficient. Likewise, for queries that are not very selective at all (returning more than approximately 75% of the rows in a fact table), bitmap filtering will not be used, because so few rows would be filtered out that there is no performance benefit to be gained.

Integer Data Types
Star join optimization is most efficient when the columns used in joins are of the integer data type, because this enables the filter to be applied as part of the initial table or index scan rather than at a later stage in the query plan. Because it is common practice to implement foreign key relationships by using integer-based surrogate key columns, the majority of queries should be able to benefit from star join optimization.

Monday, 27 April 2009

Live Mesh Part 2

In my previous entry about Live Mesh I mentioned that you could use it to transfer data to and from the cloud in addition to straightforward file-sync operations.

To develop Mesh-enabled applications you need to sign up to the Mesh developer site, but this is straightforward. You do need to remove the Live Mesh client to install the Live Framework Client, and connect to https://developer.mesh-ctp.com rather than http://www.mesh.com, but presumably this will be rectified for the final release. You can download the SDK from here.

In the Live Framework SDK there are samples that demonstrate Mesh functionality. The Live Folders sample creates, edits, and deletes files and folders in the Mesh. These files and folders appear exactly the same as the synchronized folders created with Live Mesh. I would expect that in the long term you will be able to synchronize with these folders, but at the moment the Live Framework Client is sandboxed and does not support the synchronization or remote control features of Live Mesh.

The following code creates a Live Mesh folder and you can see that the resource type is LIVE_MESH_FOLDER:

public static string CreateRootFolder(Mesh mesh, string title)
{
    // Check whether a folder with the same name already exists.
    foreach (MeshObject oneObject in mesh.MeshObjects.Entries)
    {
        if (oneObject.Resource.Title.Equals(title))
        {
            return "Folder already Exists!!!";
        }
    }

    // Create the folder object.
    MeshObject meshObject = new MeshObject(title);

    // Mark it as a Mesh folder.
    meshObject.Resource.Type = MeshConstants.LIVE_MESH_FOLDER;

    // Add the folder to the collection of mesh objects.
    mesh.MeshObjects.Add(ref meshObject);

    // Create the feed for files (required).
    DataFeed fileDataFeed = new DataFeed(MeshConstants.LIVE_MESH_FILES);

    // Set the type and handler type (required).
    fileDataFeed.Resource.Type = MeshConstants.LIVE_MESH_FILES;
    fileDataFeed.Resource.HandlerType = MeshConstants.FILE_HANDLER_TYPE;

    // Add the new data feed to the collection.
    meshObject.DataFeeds.Add(ref fileDataFeed);

    //this.LoadMeshObjects();

    return "Root folder " + title + " created successfully";
}


There is also a Project Manager sample in the SDK. This is a simple application for creating projects and milestones, and it creates non-standard objects in the Mesh. As you can see from the project code below, you can define your own class of object; you are not constrained to standard folders and files:


/// <summary>
/// UUID of the parent MeshObject
/// </summary>
[DataMember]
public string MOID { get; set; }

/// <summary>
/// Title of the milestone (during save/update, this matches MeshObject.Resource.Title,
/// but is stored here when the custom object is databound instead of the resource)
/// </summary>
[DataMember]
public string Title { get; set; }

/// <summary>
/// Date when project will be started
/// </summary>
[DataMember]
public DateTime KickOffDate { get; set; }

/// <summary>
/// Estimated date for completion
/// </summary>
[DataMember]
public DateTime CompletionDate { get; set; }

/// <summary>
/// Date when project was shipped
/// </summary>
[DataMember]
public DateTime ShippedDate { get; set; }

/// <summary>
/// Description of project
/// </summary>
[DataMember]
public string Description { get; set; }


When you deploy a Mesh application, it appears both on the user's Live Mesh website and as a shortcut on their desktop. In this way, a user can just run the app without any knowledge of Live Mesh. Furthermore, because the data is synchronized between the local machine and the cloud, they can run the application while disconnected and it will synchronize automatically when they reconnect, although there is no conflict resolution built in.

This does come with some caveats. Live Mesh is currently a CTP and is not fully functional. As previously mentioned, the developer environment does not currently integrate with the Live Mesh client. Also, every time I tried to run an application from the desktop, it said it was waiting for me to sign in, with no opportunity for me to do so, regardless of whether I was actually signed in or not. There is also a risk that this is a solution waiting for a problem. It looks interesting, and it's quick and straightforward to Mesh-enable applications, but there needs to be a compelling requirement for this to go beyond the trying-out-samples stage and into developing real applications.


The code samples above are just snippets of the complete solutions, but they show that Live Mesh offers both straightforward user file and folder synchronization and an API that enables you to build cloud-based solutions.
As a final note, I looked into this as a way to synchronize favorites in IE8. You can now do this by signing up to SkyDrive and reinstalling the Live Toolbar.

Monday, 20 April 2009

Table and Index Partitioning in SQL Server 2008 Enterprise Edition

Introduction

SQL Server 2008 Enterprise enables you to horizontally partition individual tables and indexes so that data storage is spread over multiple drives. Horizontal partitioning is the process of dividing the data in a table or index by rows, so that some of the rows are contained in one location, some in a second location, some in a third, and so on. For example, with SQL Server 2008 Enterprise, you can partition large tables by date, so that each partition stores rows that correspond to a specific time period.
Partitioning has multiple benefits:
• Firstly, it enables you to make efficient use of the available disk capacity by controlling the placement of data from a single table or index across multiple disks. Partitions are mapped to filegroups, which in turn contain the files that hold the actual data.
• Secondly, partitioning enables flexible management of large tables, for example by enabling you to back up just a subset of the rows in a table. Because partitions are mapped to filegroups, you can use SQL Server backup to back up just the filegroups that contain changed data.
• Thirdly, partitioning can make loading new data a much faster procedure by using the SWITCH option, which we'll discuss shortly.
• Finally, and perhaps most significantly, partitioning can improve query response times for users.
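To make this concrete, here is a minimal sketch of the date-based partitioning described above; the object names and boundary dates are illustrative, and it assumes that filegroups FG1 to FG4 already exist in the database:

-- Three boundary values with RANGE RIGHT yield four partitions,
-- hence the four filegroups in the scheme.
CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01', '2009-01-01');

CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate TO (FG1, FG2, FG3, FG4);

CREATE TABLE dbo.Orders
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
)
ON psOrderDate (OrderDate);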

Partition switching

Faster data loading can be achieved when you use the SWITCH clause of the ALTER TABLE statement. This enables you to swap a populated table or partition with an empty table or partition, provided that they both have the same structure, including the number, order, naming, and nullability of columns. For example, you could load new data by switching a populated staging table into a partitioned table, or archive old data by switching a named partition (say, partition 1) out of that table. All that actually happens in this operation is that the metadata of the table and the partition, including the table name and the partition name, is switched. No data actually needs to be moved, and as a result the process is very quick. You can also use the ALTER PARTITION FUNCTION statement with the MERGE and SPLIT clauses to merge two partitions together or to split an existing partition in two.
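A hedged sketch, continuing the dbo.Orders example above; it assumes a staging table with an identical structure, created on the filegroup that holds partition 1 and constrained to the matching date range:

-- Switch a fully loaded staging table in as partition 1
-- (a metadata-only operation, so it is near-instantaneous).
ALTER TABLE dbo.OrdersStaging
SWITCH TO dbo.Orders PARTITION 1;

-- Switch a partition out to an empty archive table.
ALTER TABLE dbo.Orders
SWITCH PARTITION 1 TO dbo.OrdersArchive;

-- SPLIT adds a new boundary point (the scheme must first be told
-- which filegroup to use next); MERGE removes an existing one.
ALTER PARTITION SCHEME psOrderDate NEXT USED FG1;
ALTER PARTITION FUNCTION pfOrderDate() SPLIT RANGE ('2010-01-01');
ALTER PARTITION FUNCTION pfOrderDate() MERGE RANGE ('2007-01-01');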

Performance improvements

Partitioning improves query response times on partitioned tables when users issue queries that filter rows by date, as is very often the case in large databases such as data warehouses. This is because SQL Server only needs to search the partition or partitions that contain the relevant rows, a process known as partition elimination, which is more efficient. You can also limit lock escalation to the partition rather than the whole table, reducing the risk of blocking and deadlocking.
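Partition-level lock escalation is controlled per table, sketched here against the hypothetical dbo.Orders table from the earlier example:

-- AUTO permits escalation to the partition level on partitioned
-- tables instead of escalating straight to a full table lock.
ALTER TABLE dbo.Orders SET (LOCK_ESCALATION = AUTO);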
SQL Server 2008 offers a considerable improvement in performance over SQL Server 2005 for queries against partitioned tables that run on multiprocessor systems. In SQL Server 2005, if a user issued a query against a partitioned table and the query touched only one of the partitions in that table, all available threads could be allocated to that query, resulting in fast response times. If, however, the query touched more than one partition, only one thread could be allocated per partition, resulting in comparatively slower response times, particularly when the majority of the data resided in just one of the partitions. This behaviour was by design, and was actually intended to improve performance in the special case of a query touching only one partition, but from the users' point of view it sometimes made execution times seem unpredictable.

SQL Server 2008 addresses this issue with improved partitioned table parallelism. Queries that touch a single partition behave as they do in SQL Server 2005, with all available threads allocated to the query. Queries that touch multiple partitions now also have all available threads allocated to them, with the threads distributed across the partitions in a round-robin fashion. For example, on a server that has four cores, a query that touches one partition will have all four threads allocated to it. A query that touches two partitions will again have all four threads allocated, with two available per partition; in SQL Server 2005, this second query would have had only one thread available per partition. This new behaviour can result in dramatic improvements, with some queries running ten to fifteen times faster.

Summary

Table and index partitioning is a powerful feature that offers many benefits to administrators who manage large databases. SQL Server 2008’s enhanced partitioning delivers impressive performance on multi-core servers, enabling you to get more out of your existing hardware.

Monday, 6 April 2009

Live Mesh

I came across a new Windows Live component the other day called Live Mesh when searching for a way to synchronize my favorites. In IE7 you could synchronize favorites with Live Favorites. It worked pretty well, although by no means perfectly. This functionality was removed from IE8, which annoyed me somewhat. I work from home most of the time, but occasionally go into the office and take a laptop. Virtually all the documents I use are checked into SharePoint, so I don't lose any data when I switch machines; however, there are always useful bits and pieces that I pick up on the Web, and I usually just add these to favorites. This is where the upgrade to IE8 is giving me problems. Not insurmountable, obviously, but I want everything to work smoothly. A blogger called Laurent Duveau has used Live Mesh to do just this. Looking further into Live Mesh made me realise its other capabilities. It interested me through its connection with the Azure Services Platform, of which it is a part, and through its data synchronization and remote control functionality.
Live Mesh allows you to synchronize folders with the cloud. You set up a device and then simply right-click a folder to begin synchronizing data. This sounds a bit like functionality already provided by Live Sync and SkyDrive; however, Live Sync requires both computers to be connected at the same time, and SkyDrive is just online storage with no more advanced functionality.


So what more does Live Mesh offer? Well, for a start, the whole process is automatic and painless. You add a folder and the folder is synched. You can add another device and keep the data synched across multiple devices. These devices will include Windows Mobile and Apple Macs in the future. The data is both held locally, so that it can be accessed offline, and in the cloud, so that it can be accessed anywhere. This solved my favorites problem, but also made me curious as to the capabilities of Live Mesh.

Other features include remote desktop. Now not only do I have all my data synchronized, I can also connect to my desktop PC from anywhere. So far, these are all features you could put together from a few existing tools, but there is another trick up the Live Mesh sleeve. The Live Framework provides .NET, Silverlight, and JavaScript APIs for using the Live Mesh services. These not only allow you to share folders as you normally would with Live Mesh, but also allow any data to be stored and synchronized between devices. I'll leave the Live Framework for another day, but it appears to offer interesting possibilities.
Of course, this is a beta, but when the minor errors are ironed out this could be a valuable data synchronization tool.