Monday 28 February 2011

Understanding List Query Throttling Limits in SharePoint 2010

By now, most SharePoint developers will have come across the list query throttling settings in SharePoint 2010. Essentially, farm administrators can impose limits on the number of items returned by list queries, in order to protect the performance of the farm as a whole. Limits are applied on a per-Web application basis and are typically managed through the Web application settings in Central Admin.


So far, so good. The concepts of query throttling are well documented, and the rationale will be obvious to anyone who has seen a SharePoint environment grind to a halt under heavy-handed list queries. (For a good explanation of query throttling, together with information on how you can avoid hitting the limits through careful indexing, take a look at Query Throttling and Indexing by the patterns & practices team.) However, it's not always entirely clear how these settings are applied.

First of all, "administrators" is a loose term. Let's clarify who qualifies as "auditors and administrators" for the purposes of these settings. Site collection administrators do not qualify. Farm administrators do not qualify. The only people who qualify are users who have specific permission levels assigned at the Web application level. Specifically, the policy level assigned to the user must include the Site Collection Administrator or the Site Collection Auditor permissions, as shown below.



Now for the bit that took me a little longer to grasp. What does the object model override actually do? Firstly, it doesn't allow you to submit database queries that hit an unlimited number of rows in the database. Secondly, it doesn't change the list view threshold for regular users at all. All the object model override does is allow our auditors and administrators, as defined by the Web application user policy, to submit queries at the higher threshold value. In other words, if you don't use the object model override, auditors and administrators are stuck with the same standard list view threshold as everyone else.

To dig a little deeper into how these thresholds are applied, I provisioned a basic list and used a feature receiver to add 10,000 items. This puts me nicely between the default list view threshold (5,000 items) and the default threshold for auditors and administrators (20,000 items). Next, I created a Web Part that attempts to retrieve all the items from the list. The core code is as follows:

SPWeb web = SPContext.Current.Web;
SPList list = web.Lists["BigList"];
SPQuery query = new SPQuery();
query.QueryThrottleMode = SPQueryThrottleOption.Override;
SPListItemCollection items = list.GetItems(query);
litMessage.Text = String.Format("This list contains {0} items", items.Count);


The important bit is the 4th line down:

query.QueryThrottleMode = SPQueryThrottleOption.Override;

The SPQueryThrottleOption enumeration has three values: Default, Override, and Strict. If you use the default value, the standard list view threshold applies to all users except local server administrators, who are not bound by either threshold. If you set the query throttle mode to Override, users who have the required permissions in the Web application user policy can query at the higher "auditors and administrators" threshold. Local server administrators remain unbound by either threshold. Finally, if you set the query throttle mode to Strict, this closes down the local server administrator loophole and the standard list view threshold applies to all users.

The following table shows which threshold applies to which users for each of the SPQueryThrottleOption values:

Type of user                             | Default   | Override  | Strict
Site member                              | Standard  | Standard  | Standard
Site owner                               | Standard  | Standard  | Standard
Site collection admin                    | Standard  | Standard  | Standard
Web app policy: site collection admin    | Standard  | Higher    | Standard
Web app policy: site collection auditor  | Standard  | Higher    | Standard
Farm admin                               | Standard  | Standard  | Standard
Local server admin                       | Unlimited | Unlimited | Standard
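
As an aside, the two threshold values themselves are exposed on the SPWebApplication class, so you can check what the farm administrator has configured. Here's a minimal sketch of how you could read them (the site URL is hypothetical):

// Minimal sketch: reading the throttling thresholds for a Web application.
// Requires references to Microsoft.SharePoint and Microsoft.SharePoint.Administration.
using (SPSite site = new SPSite("http://intranet"))  // hypothetical URL
{
    SPWebApplication webApp = site.WebApplication;

    // The standard list view threshold (5,000 items by default)
    int standardThreshold = webApp.MaxItemsPerThrottledOperation;

    // The higher threshold for auditors and administrators (20,000 items by default)
    int adminThreshold = webApp.MaxItemsPerThrottledOperationOverride;

    litMessage.Text = String.Format("Standard threshold: {0}, auditor/admin threshold: {1}",
        standardThreshold, adminThreshold);
}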

Finally, I found an interesting quirk for local server admins. The list view threshold exemptions for local server administrators apply only to users who are explicit members of the Administrators group on the local server. For example, domain admins are implicit members of the local Administrators group by virtue of their membership of the Domain Admins group, yet the standard list view threshold still applied to my test domain admin account because it wasn't an explicit member.

I hope this helps to clarify things for anyone else who's confused by list view thresholds. If you want to know more, Steve Peschka's blog is the best source of information I've seen in this area.

Cross-posted from Jason Lee's Blog

Tuesday 22 February 2011

SQL Server “Denali” Integration Services – Projects and Parameters

Some previous posts in this blog have discussed new features in the SQL Server “Denali” database engine. For this post however, I want to focus on some of the key enhancements in SQL Server “Denali” Integration Services (SSIS). SSIS first appeared in SQL Server 2005 as an evolution of the Data Transformation Services (DTS) component in previous releases, and has steadily become a core element of Extract, Transform, and Load (ETL) operations for many data warehousing implementations.

The big news in the “Denali” release of SQL Server Integration Services is a whole new deployment model for SSIS solutions. In previous releases, the only available unit of deployment was the package (a .dtsx file), and this could be deployed either to the file system or to the MSDB database in a SQL Server instance. This single-package deployment model is at odds with the development model for SSIS solutions, in which a developer can create a single project that contains multiple packages. Prior to “Denali”, each package must be deployed individually, and any variables that need to be set at runtime must be managed through a package configuration for each individual package. SSIS in “Denali” still supports this “legacy” deployment model, but now also supports project-level deployment to the new Integration Services catalog, and project-level parameters that can be used to set variables across multiple packages within a project.

The first thing you need to do to take advantage of this new deployment model is to create an Integration Services catalog on an instance of SQL Server. The Integration Services catalog is a central database in which SSIS projects can be stored and managed, and you can have one catalog per SQL Server instance. The Integration Services catalog uses the SQLCLR (the .NET common language runtime hosted within SQL Server), so you need to enable this first by using the following Transact-SQL:

sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO

Now you’re ready to create an Integration Services catalog, which you can do in SQL Server Management Studio as shown here.

Picture1

When you create the Integration Services catalog, you’re prompted for a password that is used to protect the database master key used to encrypt the data.

Picture2

 Picture3

After clicking OK, refreshing the Object Explorer view reveals two new items as shown here. The first is a database named SSISDB, and the second is an SSISDB node beneath the Integration Services folder. The database is a regular SQL Server database that contains a number of tables, views, and stored procedures that you can use to manage and run SSIS projects and packages stored in the catalog. It is also where the projects and  packages in your catalog are physically stored. The SSISDB node under the Integration Services folder provides a management interface for the catalog and enables you to define a logical folder structure for your catalog.

Picture4

To create a folder in your catalog, simply right-click the SSISDB node under the Integration Services folder, and click Create Folder.  Here I’ve created a folder with the imaginative name My Folder. Note that subfolders named Projects and Environments have automatically been created – we’ll return to these later.

OK, so now we have an Integration Services catalog that contains a folder to which we can deploy an SSIS project; so I guess it’s time we went ahead and created a project to deploy. For our purposes, we’ll create a simple SSIS project that includes a data flow task that extracts the list of database names from the sysdatabases system view in the master database and copies it to a table in another database. I’m going to copy the database list to a table in a database called CmSampleDB, and to make matters a little more interesting, I’m going to create two tables that can act as the destination for the list – one to be used when testing the SSIS solution, and another to be used in production. We’ll design the SSIS project to support a project-level parameter so you can specify which table to use at runtime. Here’s my Transact-SQL code to create the destination tables:

USE CmSampleDB
GO
CREATE TABLE TestDBList
(name nvarchar(250))
GO
CREATE TABLE DBList
(name nvarchar(250))
GO

Now we can go ahead and create the SSIS project using SQL Server Business Intelligence Development Studio (BIDS). Creating an SSIS project in “Denali” is exactly the same as in previous versions, just select the Integration Services Project template as shown here:

Picture5

When the new project is created, it contains a single package named Package.dtsx, which you can rename to suit your own requirements – I’m going to name mine My Package.dtsx. You can add more packages to the project as required, so for example, I’ll add a second package which I’ll name, um, My Other Package.dtsx. In Solution Explorer, my project now looks like this.

Picture6

So far, nothing is very different from how you would create an SSIS project in previous releases of SQL Server, but here’s where we’re going to use a new feature – Project Parameters. Project parameters are, as the name suggests, parameters that can be used to pass variable values to the project at runtime. Because these parameters are scoped at the project level, they can be used by any package in the project. To add a project parameter, right-click the project in Solution Explorer and click Project Parameters, or click Project Parameters on the Project menu. Either of these actions displays the Parameters pane as shown here:

Picture7

As you can see, I’ve used this pane to create a project-level parameter named TableName with a default value of TestDBList. This default value is more correctly known as the Design default value, since it’s used when I run the project within BIDS. When I deploy the project, I can set a Server default value that will override this one when packages in this project are run on the server.

Now I need to create the data flow task that copies the database names from sysdatabases in the master database to the table indicated by the TableName parameter in the CmSampleDB database. To do this I just need to drag a Data Flow task to the design surface of My Package.dtsx as shown here:

Picture8

Next, I’ll double-click the data flow task to view the data flow design surface, and use the Source Assistant item on the SSIS Toolbox to create a new connection to the master database on my SQL Server instance. Then I can configure the OLE DB source that gets created to extract the name column from the sysdatabases system view by using the following SQL command:

SELECT name FROM sysdatabases

The data flow surface now looks like this:

Picture9

Next I’ll use the Destination Assistant to add a connection to the CmSampleDB database on my SQL Server instance, and connect the output from the source to the destination as shown here:

Picture10

To complete the data flow, I need to configure the destination to insert the output from the source into the table specified in the project-level TableName parameter, as shown here:

Picture11

Now I’m ready to build and deploy the project to the Integration Services catalog I created earlier. Building the project in BIDS creates a .ispac file, which you can then import into the catalog using SQL Server Management Studio, or deploy directly to the catalog from BIDS by clicking Deploy on the Project menu (or by right-clicking the project in Solution Explorer and clicking Deploy). Whichever approach you use, deployment to the catalog is accomplished via the Integration Services Deployment Wizard. After the Welcome screen, the wizard prompts you to select the project you want to deploy – in this case, the .ispac file I just built.

Picture12

Next, the wizard loads and validates the project before prompting you for the destination. This consists of the server where the Integration Services catalog is hosted, and the path to the folder where you want to deploy the project. Here, I’ve selected the My Folder folder I created earlier.

Picture13

Finally, the wizard prompts you to set Server default values for any project parameters. You can use the Design default value, specify a new value, or use an environment variable. We’ll look at environment variables shortly, but for now I’ve set the Server default value for the TableName parameter to DBList.

Picture14

 

Completing the wizard deploys the project to the catalog, which you can verify in SQL Server Management Studio. Note that the project is actually saved to the Projects sub-folder of the path specified in the wizard, and that all packages within the project are deployed as a single unit.

Picture15

The final thing I want to do is to define a test environment and a production environment that can be used to control the execution context for the project. To do this, I’ll right-click the Environments folder and click Create Environment. Using this approach I’ve created two environments called Test and Production.

Picture16

You can edit the properties of each environment to create environment variables, which in turn can be used to set project parameters when project packages are run in the context of the environment. For example, here I’m creating an environment variable named tName in the Test environment with a value of TestDBList. I’ve also created an environment variable with the same name in the Production environment and assigned the value DBList.

Picture17

Finally, I can hook the environments up to the project by editing the properties of the project in the Integration Services catalog and adding environment references, as shown here…

Picture18

… and setting parameters to get their values from environment variables as shown here (note that in the CTP release, you must click OK after adding the environment references on the References page before re-opening the Properties window and changing the parameter value on the Parameters page):

Picture19

So now we have a project deployed to our Integration Services catalog. The project contains two packages - one of which doesn’t actually do anything, and another that copies the list of database names from the sysdatabases system view in the master database to a table in the CmSampleDB database. There is a project-level parameter that is used to indicate which table the database names should be copied to, and this is set to TestDBList or DBList depending on the environment that the package is executed in. To test this, I can right-click My Package.dtsx in Object Explorer and click Run, which produces the following dialog box:

Picture20

Note that I can select the environment reference I want to use, which will determine the value of the tName environment variable, which will in turn set the value for the TableName project parameter and ultimately determine which table the data is copied to. For this example, I’ll select the Test environment and run the package, and the data is copied to the TestDBList table as shown below:

Picture21
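
As an aside, you don’t have to use the dialog box to run the package; the catalog also exposes stored procedures for starting executions. The following is only a rough sketch based on the stored procedure names in the shipped SSISDB catalog (they may well differ in the CTP), and the project name used here is hypothetical:

DECLARE @execution_id bigint;

-- Create an execution for the package in the deployed project
EXEC catalog.create_execution
    @folder_name = N'My Folder',
    @project_name = N'My Project',        -- hypothetical project name
    @package_name = N'My Package.dtsx',
    @reference_id = NULL,                 -- supply an environment reference id to run under Test or Production
    @execution_id = @execution_id OUTPUT;

-- Start the execution
EXEC catalog.start_execution @execution_id;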

To review past operations in the catalog, you can right-click SSISDB under the Integration Services folder in Object Explorer and click Operations. This shows a list of all operations that have been performed in the catalog in reverse order of occurrence, so in this example you can see that a project was deployed and then a package was executed.

Picture22

Double-clicking an entry in the list reveals more information about the operation. For example, here are the details for the package execution:

Picture23

Note that clicking the Parameters tab shows the parameter values that were used for this particular execution:

Picture24
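
The Operations report is handy, but the same information is also exposed through views in the SSISDB database, so you can query it directly. Again, this is only a sketch using the view names from the shipped SSISDB catalog, which may differ in the CTP:

USE SSISDB;
GO

-- Recent catalog operations (deployments, executions, and so on)
SELECT operation_id, operation_type, object_name, status, start_time, end_time
FROM catalog.operations
ORDER BY start_time DESC;

-- The parameter values used for each execution
SELECT execution_id, parameter_name, parameter_value
FROM catalog.execution_parameter_values;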

I’ve used this article to give you a quick tour of the new deployment model for SSIS and how you can use project parameters and environment variables to create a more flexible but manageable ETL solution with SSIS in SQL Server “Denali”. For more information about what’s new in SSIS in “Denali”, see SQL Server Books Online.

Wednesday 16 February 2011

SQL Server Denali - Query Acceleration for Data Warehouses with ColumnStore Indexes

ColumnStore indexes are a new feature of SQL Server Denali that boosts query performance in data warehouses. Although this feature is not a part of the latest CTP release, it is a significant development given the energy that Microsoft has expended in recent years to compete with its main rivals as a serious platform for enterprise data warehousing and business intelligence (BI). With the SQL Server 2008 and SQL Server 2008 R2 releases, Microsoft introduced features such as data compression, improved performance for partitioned tables and indexes (http://cm-bloggers.blogspot.com/2009/04/table-and-index-partitioning-in-sql.html), star join query optimization (http://cm-bloggers.blogspot.com/2009/05/sql-server-2008-star-join-query.html), the Fast Track reference architectures (http://cm-bloggers.blogspot.com/2009/06/fast-track-data-warehouse-reference.html), Resource Governor (http://cm-bloggers.blogspot.com/2009/05/sql-server-2008-resource-governor.html), and the new Parallel Data Warehouse (PDW) offering. These have given SQL Server the manageability, scalability, and fast, predictable performance times required to deliver true enterprise data warehousing. Meanwhile, PowerPivot technologies, deeper integration with SharePoint and Office, Report Builder (http://cm-bloggers.blogspot.com/2009/01/sql-server-reporting-services-report.html), and a whole host of improvements to Reporting Services and Analysis Services have helped to put Microsoft in the leading pack for BI solutions.

ColumnStore indexes should be seen in the light of these earlier developments as a way of further boosting data warehouse performance, and by extension, BI performance. Early reports suggest this could be by factors of ten or more and even by up to one thousand times in some cases, so it really does have the potential to seriously speed up long-running queries.

ColumnStore indexes use database pages to store their data, just like standard indexes; however, unlike the pages of standard indexes, which store rows of data, the pages in a ColumnStore index store only the contents of individual columns. Pages are grouped into sets, one set for each column in the table; for example, a table with five columns would have five sets of pages in its ColumnStore index. Storing data in this way improves disk read performance because it enables SQL Server to retrieve and place in memory only the data columns that are specified in the query. Unnecessary columns of data that would also be returned in a standard index are not returned because they are not present in the index data pages.

Compression on ColumnStore indexes also has a big part to play in boosting performance. This is because columns generally tend to contain repetitive data and one way in which SQL Server compression works is by removing exactly this kind of redundant, repeated data. As a result, once it is loaded into memory, the index takes up much less space and is therefore much less likely to be paged out, ensuring that it remains available for subsequent queries that reference the same columns. Many queries that are run against data warehouses centre around a limited set of columns which are contained in fact tables, so a few carefully planned ColumnStore indexes could result in a major performance boost for these types of queries.
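
To give a sense of what this looks like, the published syntax for creating a ColumnStore index is a minor variation on a normal CREATE INDEX statement. This is just a sketch against a hypothetical fact table (and, as I describe below, the current CTP wouldn't accept it when I tried):

-- Hypothetical fact table; include the columns that warehouse queries touch most often
CREATE NONCLUSTERED COLUMNSTORE INDEX csx_FactSales
ON dbo.FactSales (OrderDateKey, ProductKey, StoreKey, SalesAmount, Quantity);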

I wasn't able to try ColumnStore indexes out for myself, because the CTP doesn't seem to support them, as my attempt to create one shows:





However, if Microsoft's performance figures translate into real-world scenarios, then this should add even more strength to an already compelling data warehousing / BI package. For more information, go to: http://download.microsoft.com/download/8/C/1/8C1CE06B-DE2F-40D1-9C5C-3EE521C25CE9/Columnstore%20Indexes%20for%20Fast%20DW%20QP%20SQL%20Server%2011.pdf




Thursday 10 February 2011

WP7 Guide

It’s finally here – I’ve just received my copy of “Windows Phone 7 Developer Guide – Building connected mobile applications with Microsoft Silverlight,”  a book that I worked on last year.

The book describes a scenario that uses Windows Phone 7 as a client platform for the Tailspin Surveys application that runs in Windows Azure (this cloud-based application is described in the previous book in the series “Developing Applications for the Cloud”), and focuses on building a Windows Phone 7 app that integrates with a range of cloud-hosted services.

The book will help you get started with Windows Phone 7 development, providing practical, real-world examples, and detailed discussions of the trade-offs and design decisions you may have to make. The sample application uses the MVVM design pattern, so if you want to learn more about that, then this book is a great place to start. To download the source code and read more, look here on MSDN.

We’ve now started work on the next project – extending the “Guide to Claims-based Identity and Access Control” to include Windows Azure AppFabric Access Control Services and SharePoint in some of the scenarios.

Wednesday 9 February 2011

Reactive Extensions

The Windows Phone 7 platform includes the Reactive Extensions for .NET Framework (Rx for short) to enable you to implement a wide range of asynchronous behaviours on the phone. Rx is “reactive” because it enables you to write applications that  react to external events such as data arriving over the network, a stream of data arriving from a hardware device or a user clicking a button in the UI. While Rx is bundled with the Windows Phone 7 API, you can also download it for use with standard .NET applications (and for use with Silverlight and JavaScript).
Rx requires a slightly different way of thinking about asynchronous behaviour and events. The fundamental concept of Rx is based around the idea of there being something that is observable that generates a sequence of things for an observer to observe. The other things to be aware of about the way that Rx works are:
  • It’s a push model — the observable instance pushes information to the observer instance.
  • The observer can filter the information that it receives by using LINQ queries.
Before going any further, here are a couple of very simple examples to illustrate some of these points. Let’s start with a simple enumerable sequence in C#:
int[] numbers = { 1, 2, 3, 5, 8, 13, 21 };
Arrays in .NET implement the IEnumerable interface that enables you to iterate over the array, pulling each item from the array:
foreach (var number in numbers)
{
Console.WriteLine("Number: {0}", number);
}
For this example we can convert the array to an observable sequence that implements the IObservable interface:
IObservable<int> observableNumbers = numbers.ToObservable();
We can then observe this sequence by subscribing to the IObservable instance like this:
observableNumbers.Subscribe(
    number => Console.WriteLine("Number: {0}", number),
    ex => Console.WriteLine("Something went wrong: {0}", ex.Message),
    () => Console.WriteLine("At the end of the sequence")
);
This overload of the Subscribe method receives each number pushed at it by the observable sequence, and the three lambda expressions allow you to:
  • Handle each item in the sequence as it arrives.
  • Handle any exceptions raised.
  • Handle the end of the sequence.
Rx supports a wide range of use cases, and there are some excellent resources available online to help you start exploring how you can use Rx.
However, to start understanding the Rx approach I found it useful to work out how to do things in Rx that I already knew how to do with other approaches. The following examples show some of these scenarios:

Handling Events

Writing code to handle events is a very common task. The following code shows how you might prevent a user from entering non-numeric characters into a text box:
<TextBox Height="23" Name="textBox1" Width="120" KeyDown="textBox1_KeyDown" />
private void textBox1_KeyDown(object sender, KeyEventArgs e)
{
    if (e.Key < Key.NumPad0 || e.Key > Key.NumPad9)
    {
        e.Handled = true;
    }
}
To do the same thing using Rx, you could do this:
<TextBox Height="23" Name="textBox2" Width="120" />
Notice that you don’t need the KeyDown attribute in the XAML code for this approach to work.
var keys = from evt in Observable.FromEvent<KeyEventArgs>(textBox2, "KeyDown")
where (evt.EventArgs.Key < Key.NumPad0 || evt.EventArgs.Key > Key.NumPad9)
select evt.EventArgs;
The FromEvent method enables you to treat an event as an observable sequence. This example also shows how you can use a LINQ expression to filter/query the items in the observable sequence; here you are only selecting non-numeric characters.
// Ideally you should dispose of keysSubscription when the Window is disposed.
var keysSubscription = keys.Subscribe(evt =>
{
    evt.Handled = true;
    label1.Content = "Invalid character: " + evt.Key.ToString();
});
The Subscribe method here receives any non-numeric keystrokes in the textBox2 control, discards them and then updates a label control.

Running a Background Task

The following example shows the outline of a standard approach to running a background task in a WPF application using the Task class. See John Sharp’s post here for more background on the Task class and the Task Parallel Library.
private void button1_Click(object sender, RoutedEventArgs e)
{
    Task task = new Task(doWork);
    task.Start();
}

delegate void ContentSetterDelegate(Label label, string text);

private void doWork()
{
    // Work really hard at something.
    Thread.Sleep(2000);
    this.Dispatcher.Invoke(new ContentSetterDelegate(setContentProperty),
        label1, "Finished the work at last!");
}

private void setContentProperty(Label label, string text)
{
    label.Content = text;
}
One thing to note about this is the use of the Dispatcher.Invoke method to update the UI from a background thread.
To achieve the same results using Rx, you could use the following code:
var clicks = from evt in Observable.FromEvent<RoutedEventArgs>(button2, "Click")
             select evt;

var clicksSubscription = clicks.Subscribe(evt =>
{
    var backgroundTask = Observable.Start(() =>
    {
        // Work really hard at something.
        Thread.Sleep(3000);
    }).ObserveOnDispatcher();

    backgroundTask.Subscribe(
        _ => { },
        () => label1.Content = "It's all done now!"
    );
});
This example responds to a click event using the same technique as the previous Rx example, and then runs the background task by using the Observable.Start method. The ObserveOnDispatcher method ensures that the observer runs on the dispatcher thread so that it can safely update the UI.

Synchronizing Tasks

John Sharp’s post here also described how you could synchronize several background tasks using the WaitAll and WaitAny methods in the Task Parallel Library. You can do a similar thing using Rx like this:
var task1 = Observable.Start(() =>
{
    // Work really hard and return a result
    Thread.Sleep(4000);
    return 10;
});

var task2 = Observable.Start(() =>
{
    // Work really hard and return a result
    Thread.Sleep(3000);
    return 30;
});

var tasks = Observable.ForkJoin(
    task1, task2
).Finally(() => Console.WriteLine("Done!"));

// Wait for all the tasks to finish
var results = tasks.First();

// Process all the results
int sum = 0;
foreach (int r in results)
    sum += r;
Console.WriteLine("The sum of all the tasks was: {0}", sum);
This example uses the Observable.Start method to define two background tasks, each of which returns an integer result. The ForkJoin method enables you to wait for both tasks to complete, and the example code then prints a “Done” message using the Finally method. You can then access the results of all the background tasks by using the First method (First because you want the first, and only set of results from the tasks sequence).
There’s an interesting discussion about the differences between running multiple background tasks with Rx and using the Task Parallel Library here: http://social.msdn.microsoft.com/Forums/en-US/rx/thread/12d3f79a-0a53-4854-976e-5fa0d86f01ee/.

Friday 4 February 2011

Step by Step Guide to Setting up HADR in SQL Server Denali

Following on from my previous post, I wanted to try out the HADR functionality in SQL Server “Denali” CTP1.

To test the solution, I wanted to run SQL Server HADR in a virtual machine environment, with all the VMs running on one physical host.

This seemed like a pretty straightforward task at first. There were a number of blogs with step-by-step instructions; however, none of them actually showed HADR running, and I was soon to find out why. After following the steps and finding that it did not work, I did some digging around online and, thanks to the help of Justin Erickson at Microsoft, found that Books Online is unclear on the prerequisites (this is soon to be corrected). Essentially, Books Online makes it sound like you can run HADR on a single-node cluster, and this is what the other blog posters, and initially I, had done. This is not the case: HADR must run on a multi-node cluster. So here are the steps to set this up in a virtual machine environment:

 

Pre-requisites

Hardware (physical)

One machine running Windows Server 2008 R2 with enough memory to run three virtual machines (I had 8GB and this was sufficient)

Hardware (virtual)

One domain controller running Windows Server 2008 R2, which I have named DenaliDC.

Two domain joined Windows Server 2008 R2 VMs with SQL Server Denali installed. Note: These cannot be a copy of the same VHD or clustering will fail (and it will take you ages to work out why). I have named these DenaliOne and DenaliTwo.

Note: I had some problems connecting to the default instance in Management Studio and so I have used named instances DenaliOne\DenaliOne and DenaliTwo\DenaliTwo.

Virtual Network

I created five private virtual networks as follows:

clip_image001

This is a bit over the top, but virtual network cards are free, so why not?

Domain Controller Networking

I added two network cards, one on the DataOne network and one on the DataTwo network. The DataOne adapter has a fixed IP address of 192.168.10.1 and the DataTwo adapter has a fixed IP address of 192.168.20.1.

I added DHCP with a scope of 192.168.10.10 – 192.168.10.20.

I added DNS and created a default forward lookup zone.

DenaliOne Networking

I added five network cards as follows:

Name: DataOne

IP Address: Obtain automatically

Name: DataTwo

IP Address: 192.168.20.11

Default gateway and preferred DNS Server: 192.168.10.1.

Name: iSCSIOne

Remove Client for Microsoft Networks, QoS Packet Scheduler, File and Printer Sharing for Microsoft Networks, IP Version 6

IP Address: 192.168.2.1

Register this connection’s address in DNS: Unchecked

Name: iSCSITwo

Remove Client for Microsoft Networks, QoS Packet Scheduler, File and Printer Sharing for Microsoft Networks, IP Version 6

IP Address: 192.168.3.1

Register this connection’s address in DNS: Unchecked

Name: Heartbeat

Remove Client for Microsoft Networks, QoS Packet Scheduler, File and Printer Sharing for Microsoft Networks, IP Version 4

IP Address: Obtain an IPv6 address automatically

Register this connection’s address in DNS: Unchecked

DenaliTwo Networking

I added five network cards as follows:

Name: DataOne

IP Address: Obtain automatically

Name: DataTwo

IP Address: 192.168.20.12

Default gateway and preferred DNS Server: 192.168.10.1.

Name: iSCSIOne

Remove Client for Microsoft Networks, QoS Packet Scheduler, File and Printer Sharing for Microsoft Networks, IP Version 6

IP Address: 192.168.2.2

Register this connection’s address in DNS: Unchecked

Name: iSCSITwo

Remove Client for Microsoft Networks, QoS Packet Scheduler, File and Printer Sharing for Microsoft Networks, IP Version 6

IP Address: 192.168.3.2

Register this connection’s address in DNS: Unchecked

Name: Heartbeat

Remove Client for Microsoft Networks, QoS Packet Scheduler, File and Printer Sharing for Microsoft Networks, IP Version 4

IP Address: Obtain an IPv6 address automatically

Register this connection’s address in DNS: Unchecked

Additional Steps

There is a slight problem with clustering in a virtual machine which is the inability to share a VHD. The way around this is to use a virtual iSCSI device. This involves either using Windows Storage Server or a product called Starwind. I hadn’t used either before and opted for a free trial of Starwind as there was less to download.

Starwind Setup

1. Download and install Starwind on the DC from here.

2. Add a host with an IP address of 127.0.0.1 and Port of 3261.

3. Double click the host and logon with a username of root and password of starwind.

4. Add a target with a suitable name (I used Quorum) and the following settings:

a. Storage type of Hard Disk

b. Device type of Basic Virtual and Image File device

c. Device creation method: Create new virtual disk (place it wherever you want with a .img extension)

d. Image File device parameters: Allow multiple concurrent iSCSI connections

e. Accept defaults for all other parameters.

5. You should end up with something like this:

clip_image003

6. Click on the host and then click Network in Server Settings. You should then see the IP Address and port that the host is using. In this case I will use 192.168.10.1 and 3260. Note down these values.

clip_image005

Install iSCSI Initiator

On both SQL Server VMs:

1. Install the iSCSI Initiator from here. There is also a Starwind version, but I used the Microsoft one.

2. Run iSCSI Initiator and click the Discovery tab.

3. Click Discover Portal and enter the values from the Starwind software that you noted earlier.

4. Now click the Targets tab and click Refresh.

5. Connect to the Starwind target.

6. Run the Disk Management console (Diskmgmt.msc) and you should see a new disk.

7. Bring this disk online in both virtual machines, but only format it from one VM (DenaliOne)

Add Windows Clustering

On both SQL Server boxes run Server Manager and add the Failover Clustering feature. After any reboots check that you can still see the iSCSI disk and if not run iSCSI Initiator and check that you are connected to the target. It is quicker to disconnect and connect again than wait for the automatic reconnection.

Follow these steps to build the cluster:

1. On DenaliOne run Failover Cluster Manager and in the Actions box click Create a Cluster.

2. Skip the introduction screen and add DenaliOne and DenaliTwo. They should appear with their FQDNs.

3. Click Next through all of the warnings until the Cluster Validation runs.

4. Resolve any Cluster Validation errors. Don’t worry about warnings for this example.

Install SQL Server HADR

1. On DenaliOne:

a. Create a folder for transferring logs. I created C:\HADRShare\Logs.

b. Share this folder.

2. On both SQL Server VMs:

a. Start SQL Server Configuration Manager.

b. Right click the SQL Server service and click Properties.

c. Click the SQL HADR tab.

d. Click Enable SQL HADR service and click OK.

e. Stop and Restart SQL Server.

3. On DenaliOne run SQL Server Management Studio.

a. Create a database (I called this SQLHADR). For simplicity with backup and restore, place the files in a directory that exists on both servers. I created C:\Databases\Data on both virtual machines and used this.

b. Create a table in this database and add a row to the table.

c. Back up this database and restore it on DenaliTwo with the RESTORE WITH NORECOVERY option (see the Transact-SQL sketch after these steps).

d. Expand Management and right click Availability Groups.

e. Click New Availability Group and click Next.

f. Enter a suitable name (I used SQLHADR) and click Next.

g. On the Select Databases page select SQLHADR and click Next.

clip_image007

h. Click Add and connect to DenaliTwo.

i. I want to use the secondary, or mirror, database to offload querying from the primary server. It is no longer necessary to create a snapshot and the querying happens with live data. To configure this change the read mode in Secondary Role to Allow Read Intent Connections and click Next:

clip_image009

j. Click Finish.

clip_image011
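
For step 3c above, the backup and restore boils down to a couple of Transact-SQL statements along these lines. The backup file name and the UNC path to the share are just my assumptions – adjust them to match your own share:

-- On DenaliOne: back up the database to the shared folder
BACKUP DATABASE SQLHADR
TO DISK = 'C:\HADRShare\Logs\SQLHADR.bak';   -- hypothetical file name

-- On DenaliTwo: restore it, leaving the database able to receive further log records
RESTORE DATABASE SQLHADR
FROM DISK = '\\DenaliOne\HADRShare\Logs\SQLHADR.bak'   -- hypothetical UNC path to the share
WITH NORECOVERY;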

Start Data Synchronization

1. Click Start Data Synchronization.

2. Specify the share that you created previously and click Test.

clip_image013

3. Click OK.

clip_image014

4. Click Close.

Querying the Secondary Instance

With SQL HADR you can now query the secondary database. Previously, you could not query the mirror. Now, as long as we allowed connections when we set up the availability group, we can offload queries from the primary database:

clip_image015
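
The query itself is nothing special; something like the following, run while connected to the DenaliTwo\DenaliTwo instance (the table name is hypothetical, standing in for whatever you created in step 3b):

-- Connected to the secondary instance, DenaliTwo\DenaliTwo
USE SQLHADR;
GO

SELECT * FROM dbo.TestTable;   -- hypothetical name for the table created in step 3b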

Now, with SQL Server HADR, we have a high availability solution providing a failover database, but we can also offload querying to this database and have a reduced workload on our primary server. We’ve come a long way since log-shipping!

Performing a Failover

1. In Management Studio on the DenaliTwo\DenaliTwo instance expand Management, expand Availability Groups, expand SQLHADR, expand Availability Replicas and right click DenaliTwo\DenaliTwo.

2. Click Force Failover.

image

3. Click OK.

Note: Do not try to failover the primary database. This will fail.

 

Notes

This wasn’t the most straightforward environment to set up, mostly due to inaccurate step-by-step instructions on other blogs (never trust a solution that doesn’t show it working at the end). The key things that I found were:

  • You need a two node cluster. This requires a shared disk, although SQL Server doesn’t use this shared disk.
  • For some reason, I had inconsistent problems with default instances. If this is the case, try a named instance.
  • When you create the networks in Hyper-V, create one at a time and configure it in the VMs before creating the next network. This ensures that you know which network is which and helps with troubleshooting.
  • If anything doesn’t work and you change the configuration, you must disable HADR on all instances, restart the instances, re-enable HADR and restart the instances again.
  • I had HADR running on every instance on every server, although it now seems to work with only the mirrored instance having HADR enabled.
  • Perform failovers on the secondary database, not the primary. This makes sense when you have multiple secondary databases, because you are choosing which secondary becomes the new primary.

SQL Server “Denali” HADR

Continuing with the SQL Server Denali blogs, I thought I’d take a look at the new high availability disaster recovery, or HADR, functionality.

With SQL Server 2005, Microsoft introduced Database Mirroring. Prior to this we had a system called Log Shipping, which was essentially an automatic backup and recovery system. Log Shipping did supply a failover instance of SQL Server, but, by default, it replicated every 15 minutes, which could result in the loss of important data. Furthermore, Log Shipping had no automatic failover. With Database Mirroring we had a system that was easier to set up, allowed us to balance availability against performance, and, with SQL Native Client, had automatic failover. One limitation was that you couldn’t query the mirror. This could be remedied by taking periodic snapshots and using these for reporting; however, if you wanted a real-time reporting database to take the load off your principal instance, then database mirroring wasn’t the answer. The other limitation was that you could only have one mirror for each database.

Five years on and what has changed? Well, we now have High Availability Disaster Recovery (HADR). HADR is mirroring on steroids. You can have multiple mirrors (or secondary databases) to provide improved availability and, furthermore, we can query these mirrors. We also have availability groups. Each availability group can contain multiple databases, and we can now fail over the databases together as a group rather than one database at a time.

In the CTP version of Denali there are some limitations that will not exist in the final product. For now, we can have only one secondary database, but in the release version we will be able to have four. We also only have asynchronous mode, which is the equivalent of high-performance mode, but again this is expected to be supplemented in the release version. Failovers can only be forced, which has the potential for data loss, and they are not currently automatic, but again, this will change.

There are some caveats. Obviously, there is a load on the server to transfer the logs. The CTP release uses an asynchronous mode whereby the primary server does not wait until the secondary server confirms that it has written the log records to disk before the transaction is committed. This reduces load on the primary server, but does have the potential that an update could be lost if the primary server fails just after a transaction is committed, but before it has been written to the secondary server. With the final release there will be a synchronous mode, but you will need to decide between protection and performance.

Currently, there are not many tools beyond setup and Force Failover available in Management Studio, although there are the following PowerShell cmdlets allowing you to also suspend and resume availability group databases:

Add-SqlAvailabilityGroupDatabase
Join-SqlAvailabilityGroup
New-SqlAvailabilityGroup
New-SqlAvailabilityReplica
Remove-SqlAvailabilityGroup
Resume-SqlAvailabilityGroupDatabase
Suspend-SqlAvailabilityGroupDatabase
Switch-SqlAvailabilityGroup

Although this is much more complex to setup than database mirroring, it provides a genuine high availability solution that also improves performance by offloading query operations to a mirror server. It will be very interesting to see how this evolves in the final product, but from these beginnings, SQL Server HADR looks to be an excellent availability solution.

Tuesday 1 February 2011

Using the New THROW Keyword in SQL Server "Denali"

When SQL Server 2005 introduced TRY...CATCH blocks, SQL Server developers were finally able to add structured exception handling to their code. However, passing error information back to the application from the CATCH block using a RAISERROR statement is not as simple as it seems. To raise an error with your own error number and ensure that the correct information is passed back to the caller, you have to add your error information to the sys.messages table. That's easy enough to do, but if your application moves onto another server, there's a risk that you'll forget to add the error to sys.messages on the new server and your application may not function correctly.
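
To make the contrast concrete, here's roughly what that pre-Denali pattern looks like; a sketch only, with an arbitrary message number:

-- Register the message on every server the application might run against
EXEC sp_addmessage @msgnum = 50001, @severity = 16,
    @msgtext = N'No results found.';
GO

-- Later, raise it by number; this fails if 50001 hasn't been added to sys.messages on this server
RAISERROR (50001, 16, 1);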

SQL Server "Denali" introduces the THROW keyword, which extends the exception handling functionality of SQL Server 2008. It eliminates the issue of having to define errors in sys.messages and also enables you to handle an error and then rethrow that error back to the calling code.

So let's look at some examples of how you can use THROW in your Transact-SQL code. If you are implementing search functionality, you might want to customize the information returned when a SELECT statement finds no records. In this situation, you can simply use the THROW statement to return your own error message to the calling application.

SELECT * FROM Person.Person
WHERE LastName = 'Smyth';   -- the statement before a THROW must end with a semicolon

IF @@ROWCOUNT = 0
    THROW 50001, 'No results found.', 1;

You can see in this example that I've used an error number greater than 50000. THROW requires the error number to be 50000 or higher, to avoid clashing with any of the system error numbers. I've also specified a meaningful message that can be used directly by the calling application, and a state parameter that you can use to locate where the error originated.

The output when you run this code will be:

(0 row(s) affected)
Msg 50001, Level 16, State 1, Line 7
No results found.

You can also use THROW in a TRY ... CATCH block. This enables you to execute Transact-SQL code such as rolling back a transaction or logging information before passing the error back to the calling application.

BEGIN TRY
    DECLARE @BusEntityID int;
    SET @BusEntityID = (SELECT MAX(BusinessEntityID) FROM Person.BusinessEntity);
    INSERT Person.Person(BusinessEntityID, PersonType, NameStyle, FirstName, LastName)
    VALUES(@BusEntityID, 'EM', 0, 'Lin', 'Joyner');
END TRY
BEGIN CATCH
    -- Handle the error.
    -- Log the error in the SQL Server application log.
    THROW;
END CATCH

By using the THROW statement on its own, you will simply rethrow the existing error that has occurred. This is useful when the SQL Server error message is meaningful to the user or application. If you run this code with a non-unique value for the BusinessEntityID field, you will see the following error:

(0 row(s) affected)
Msg 2627, Level 14, State 1, Line 5
Violation of PRIMARY KEY constraint 'PK_Person_BusinessEntityID'. Cannot insert duplicate key in object 'Person.Person'. The duplicate key value is (20780).

You can use the parameters of THROW to make the error information more meaningful to the calling application or user as shown in the following example.

BEGIN TRY
    DECLARE @BusEntityID int;
    SET @BusEntityID = (SELECT MAX(BusinessEntityID) FROM Person.BusinessEntity);
    INSERT Person.Person(BusinessEntityID, PersonType, NameStyle, FirstName, LastName)
    VALUES(@BusEntityID, 'EM', 0, 'Lin', 'Joyner');
END TRY
BEGIN CATCH
    -- Handle the error.
    -- Log the error in the SQL Server application log.
    THROW 51111, 'Invalid business entity id.', 2;
END CATCH

In this case, the output when you violate the primary key will be as follows:

(0 row(s) affected)
Msg 51111, Level 16, State 2, Line 9
Invalid business entity id.

You'll notice that the severity level when you throw your own error defaults to 16. You cannot define a severity level for custom errors; however, when you rethrow a SQL Server error, the original severity level is retained. You can only rethrow an error from inside a CATCH block; outside a CATCH block, you must specify the parameters.

So, as you can see, the new THROW statement in SQL Server "Denali" extends the structured exception handling capabilities of earlier versions of SQL Server.