Thursday, 28 October 2010

Make Your Applications More Efficient with Multitasking

Multitasking is the ability to do more than one thing at the same time. As far as Windows applications are concerned, it is one of those concepts that is easy to describe but that, until recently, has been difficult to implement without getting embroiled in the threading mechanism provided by the .NET Framework and the operating system. A good implementation of multitasking achieves two objectives:

  1. It helps to ensure that an application remains responsive; a desktop application can wait and respond to user input on one thread while another thread performs any processing required.
  2. It assists in making the application scalable; if the application is deployed to a computer with a multicore processor, it can take advantage of the additional processing power to perform operations concurrently.

In the optimal scenario, an application running on a multicore processor performs as many concurrent tasks as there are processor cores available, keeping each of the cores busy. However, there are many issues you have to consider to implement concurrency, including the following:

  • How can you divide an application into a set of concurrent operations?
  • How can you arrange for a set of operations to execute concurrently, on multiple processors?
  • How can you ensure that you attempt to perform only as many concurrent operations as there are processors available?
  • If an operation is blocked (such as while it is waiting for I/O to complete), how can you detect this and arrange for the processor to run a different operation rather than sit idle?
  • How can you determine when one or more concurrent operations have completed?
  • How can you synchronize access to shared data to ensure that two or more concurrent operations do not inadvertently corrupt each other’s data?

To an application developer, the first question is a matter of application design. The remaining questions depend on the programmatic infrastructure—Microsoft provides the Task Parallel Library (TPL) to help address these issues.

Tasks, Threads, and the ThreadPool

The most important type in the TPL is the Task class in the System.Threading.Tasks namespace. The Task class is an abstraction of a concurrent operation. You create a Task object to run a block of code. You can instantiate multiple Task objects and start them running in parallel if sufficient processor cores are available.

Internally, the TPL implements tasks and schedules them for execution by using Thread objects and the ThreadPool class (in the System.Threading namespace). Multithreading and thread pools have been available with the .NET Framework since version 1.0, and you can use the Thread class directly in your code. However, the TPL provides an additional degree of abstraction that enables you to easily distinguish between the degree of parallelization in an application (the tasks) and the units of parallelization (the threads). On a single-processor computer, these items are usually the same. However, on a computer with multiple processors or with a multicore processor, they are different items. If you design a program based directly on threads, you will find that your application might not scale very well; the program will use the number of threads you explicitly create, and the operating system will schedule only that number of threads. This can lead to overloading and poor response time if the number of threads greatly exceeds the number of available processors, or inefficiency and poor throughput if the number of threads is less than the number of processors.

The TPL optimizes the number of threads required to implement a set of concurrent tasks and schedules them efficiently according to the number of available processors. The TPL uses a set of threads provided by the ThreadPool, and implements a queuing mechanism to distribute the workload across these threads. When a program starts a task, the task is added to a global queue. When a thread becomes available, the task is removed from the global queue and is executed by that thread. The ThreadPool implements a number of optimizations and uses a work-stealing algorithm to ensure that threads are scheduled efficiently.

Note: The ThreadPool was available in earlier versions of the .NET Framework, but it has been enhanced significantly in the .NET Framework 4.0 to support Tasks.

You should be aware that the number of threads created by the .NET Framework to handle your tasks is not necessarily the same as the number of processors. Depending on the nature of the workload, one or more processors might be busy performing high-priority work for other applications and services. Consequently, the optimal number of threads for your application might be less than the number of processors in the machine. Alternatively, one or more threads in an application might be waiting for long-running memory access, I/O, or a network operation to complete, leaving the corresponding processors free. In this case, the optimal number of threads might be more than the number of available processors. The .NET Framework follows an iterative strategy, known as a hill-climbing algorithm, to dynamically determine the ideal number of threads for the current workload.

The important point is that all you have to do in your code is divide your application into tasks that can be run in parallel. The .NET Framework takes responsibility for creating the appropriate number of threads based on the processor architecture and workload of your computer, associating your tasks with these threads and arranging for them to be run efficiently. It does not matter if you divide your work up into too many tasks as the .NET Framework will only attempt to run as many concurrent threads as is practical; in fact, you are encouraged to "overpartition" your work as this will help to ensure that your application scales if you move it on to a computer that has more processors available.
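As a minimal sketch of overpartitioning, the following example splits one calculation into many more tasks than a typical machine has cores and leaves the thread management to the .NET Framework. The class name OverpartitionSketch, the method SumOfSquares, and the task count of 64 are hypothetical choices for illustration. The code uses Task.Factory.StartNew and Task.WaitAll, which are described later in this article.

```csharp
using System;
using System.Threading.Tasks;

public static class OverpartitionSketch
{
    // Hypothetical helper: sums the squares of 1..n by splitting the
    // work into taskCount tasks, regardless of how many cores exist.
    public static long SumOfSquares(int n, int taskCount)
    {
        long[] partialSums = new long[taskCount]; // one slot per task, so no locking is needed
        Task[] tasks = new Task[taskCount];

        for (int t = 0; t < taskCount; t++)
        {
            int index = t; // copy the loop variable so each closure sees its own value
            tasks[t] = Task.Factory.StartNew(() =>
            {
                // Task 'index' handles index + 1, index + 1 + taskCount, ...
                for (int i = index + 1; i <= n; i += taskCount)
                {
                    partialSums[index] += (long)i * i;
                }
            });
        }

        // Block until every task has finished before combining the results
        Task.WaitAll(tasks);

        long total = 0;
        foreach (long sum in partialSums)
        {
            total += sum;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine("Processors: " + Environment.ProcessorCount);
        Console.WriteLine("Sum of squares: " + SumOfSquares(1000, 64));
    }
}
```

The code never refers to the number of processors when deciding how to divide the work; it only prints it for information.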

Creating, Running, and Synchronizing Tasks

You can create Task objects by using the Task constructor. The Task constructor is overloaded, but all versions expect you to provide an Action delegate as a parameter. The task uses this delegate to run the method when it is scheduled. The following example creates a Task object that uses a delegate to run the method called doWork:

Task task = new Task(doWork);
...

private void doWork()
{
    // The task runs this code when it is started
    ...
}

The default Action type references a method that takes no parameters. Other overloads of the Task constructor take an Action<object> parameter representing a delegate that refers to a method that takes a single object parameter. These overloads enable you to pass data into the method run by the task. The following code shows an example:

Action<object> action;
action = doWorkWithObject;
object parameterData = ...;
Task task = new Task(action, parameterData);
...

private void doWorkWithObject(object o)
{
...
}

After you create a Task object, you can set it running by using the Start method, like this:

Task task = new Task(...);
task.Start();

The Start method is also overloaded, and you can optionally specify a TaskScheduler object to control the degree of concurrency and other scheduling options. It is recommended that you use the default TaskScheduler object built into the .NET Framework, although you can define your own custom TaskScheduler class if you really need to take more control over the way in which tasks are queued and scheduled.
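The following is a small, self-contained sketch that puts these pieces together: it passes data to a task through an Action<object> delegate and starts the task with the Start overload that takes a TaskScheduler. It passes TaskScheduler.Default explicitly, purely to show where a scheduler would go. The class name StartWithSchedulerSketch, the Run method, and the doubling logic are assumptions made for illustration. The Wait method used here is covered later in this article.

```csharp
using System;
using System.Threading.Tasks;

public static class StartWithSchedulerSketch
{
    private static int result;

    // Method run by the task; the data arrives as a plain object
    private static void doWorkWithObject(object o)
    {
        int value = (int)o; // cast the state object back to its real type
        result = value * 2;
    }

    // Hypothetical helper: runs the task and returns what it computed
    public static int Run(int input)
    {
        Action<object> action = doWorkWithObject;
        Task task = new Task(action, input);

        // Start the task on an explicitly specified scheduler
        task.Start(TaskScheduler.Default);

        // Wait for the task to finish before reading the result
        task.Wait();
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Run(21));
    }
}
```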

You can also use a TaskFactory object to create and run a task in a single step. The constructor for the TaskFactory class enables you to specify a task scheduler and additional task creation options. The TaskFactory class provides the StartNew method to create and run a Task object. Like the Start method of the Task class, the StartNew method is overloaded, but all overloads expect a reference to a method that the task should run. Even if you do not currently specify any particular task creation options and you use the default task scheduler, you should still consider using a TaskFactory object; it ensures consistency, and you will have less code to modify to ensure that all tasks run in the same manner if you need to customize this process in the future. The Task class exposes the default TaskFactory used by the TPL through the static Factory property. You can use it like this:

Task task = Task.Factory.StartNew(doWork);
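If you do need to customize how tasks are created, one possible approach is to construct your own TaskFactory and create every task through it. The sketch below assumes that the work is long running and therefore specifies TaskCreationOptions.LongRunning; the class name TaskFactorySketch and the Run method are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskFactorySketch
{
    private static void doWork()
    {
        Console.WriteLine("Working...");
    }

    // Hypothetical helper: starts a task through a customized factory
    // and reports the creation options the task ended up with.
    public static TaskCreationOptions Run()
    {
        // Every task started through this factory uses the same options
        TaskFactory factory = new TaskFactory(
            TaskCreationOptions.LongRunning,
            TaskContinuationOptions.None);

        Task task = factory.StartNew(doWork);
        task.Wait();

        return task.CreationOptions;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

Because the options live in one place, changing them later means editing the factory rather than every call that starts a task.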

When the method run by a task completes, the task finishes, and the thread used to run the task can be recycled to execute another task.

A common requirement of applications that invoke operations in parallel is to synchronize tasks. The Task class provides the Wait method, which implements a simple task coordination method. It enables you to suspend execution of the current thread until the specified task completes, like this:

task.Wait(); // Wait at this point until task completes

You can wait for a set of tasks by using the static WaitAll and WaitAny methods of the Task class. Both methods take a params array containing a set of Task objects. The WaitAll method waits until all specified tasks have completed, and WaitAny waits until at least one of the specified tasks has finished. You use them like this:

// Wait for both task1 and task2 to complete
Task.WaitAll(task1, task2);

// Wait for either of task1 or task2 to complete
Task.WaitAny(task1, task2);
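Here is a runnable sketch of both methods. To make the outcome predictable, the second task is held back by a ManualResetEventSlim until WaitAny has returned; that gate, the class name WaitSketch, and the Run method are assumptions added for this illustration. WaitAny returns the index of the completed task within the arguments you passed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitSketch
{
    // Hypothetical helper: returns the index reported by WaitAny
    public static int Run()
    {
        using (ManualResetEventSlim gate = new ManualResetEventSlim(false))
        {
            Task task1 = Task.Factory.StartNew(() =>
                Console.WriteLine("task1 finished"));

            Task task2 = Task.Factory.StartNew(() =>
            {
                gate.Wait(); // task2 cannot finish until the gate is opened
                Console.WriteLine("task2 finished");
            });

            // Only task1 can complete at this point, so its index comes back
            int firstCompleted = Task.WaitAny(task1, task2);

            // Release task2, then wait for both tasks to complete
            gate.Set();
            Task.WaitAll(task1, task2);

            return firstCompleted;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Index of first completed task: " + Run());
    }
}
```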

In summary, the TPL makes it easy to build multitasking applications without having to be concerned about the details of multithreading. The TPL provides a large number of features beyond those described in this simple overview, such as the Parallel class that implements a concurrent version of some common programming constructs. The TPL also includes a number of collection classes in the System.Collections.Concurrent namespace that support synchronized concurrent access to data shared by multiple tasks.
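As a brief taste of those additional features, the following sketch uses Parallel.For as a concurrent counterpart to an ordinary for loop and collects the results in a ConcurrentBag<int>, which can safely be added to from several tasks at once. The class name ParallelSketch and the Squares method are hypothetical, and the results are sorted at the end because the bag does not preserve insertion order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Hypothetical helper: computes the squares of 0..count-1 in parallel
    public static int[] Squares(int count)
    {
        ConcurrentBag<int> bag = new ConcurrentBag<int>();

        // Iterations may run concurrently; the bag handles synchronization
        Parallel.For(0, count, i => bag.Add(i * i));

        // Sort, because iterations can finish in any order
        return bag.OrderBy(x => x).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Squares(10)));
    }
}
```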

Wednesday, 13 October 2010

Using a Transparent Background in Reporting Services

While watching the Japanese Formula 1 Grand Prix on Sunday, it struck me that TV sports broadcasters make a lot of use of transparent overlays when showing scores, results, times, statistics, or whatever. In the case of the Grand Prix, the driver rankings in the world championship were displayed on a semi-transparent overlay, behind which the live footage of the race circuit could be seen.

So, naturally I started to wonder how I could achieve a similar visual effect in a Reporting Services report, like this:

[Image: Product Sales Report]

My first thought was to look at the BackgroundColor property of the Tablix data region and set the Transparency level. However, when I looked at the color picker control for the property, this is what I saw:

[Image: ColorProperties]

Note that the Transparency control is disabled. It turns out you can only set a transparency level for gauges and charts in Reporting Services – not for shapes or data regions. So, I needed to find an alternative approach.

The answer I came up with was to create a semi-transparent .png graphic, and use it as the background image for the data region. I created this with PowerPoint, though of course you can use any graphics tool you like. I also used PowerPoint to find a suitable clipart image to use as the background for the report (on which the semi-transparent data region will be overlaid). In this case, I’m using the Adventure Works Cycles sample data, so a photo of a cyclist seems like a good choice.

[Image: PowerPoint]

You can take one of two approaches when it comes to sizing the semi-transparent image – you can make an extremely small image and then set the BackgroundRepeat property of the data region to Repeat, or you can make it bigger than the data region is ever likely to be and set the BackgroundRepeat property to Clip (or Repeat – it won’t matter since the image will be bigger than the data region anyway!). I found that PowerPoint tends to add some whitespace to the edge of a shape when you save it as a .png image, which showed up when repeating the background image, so I went with a large background image. Of course, had I used a more comprehensive graphics tool, I could have easily avoided this issue and got away with repeating a smaller image.

To embed the images in the report, I added them to the Images folder in the Report Data pane in Report Designer.

[Image: ReportData]

Then I set the BackgroundImage property of the tablix data region in which the report data is displayed, like so:

[Image: TablixProperties]

I’ve also used the semi-transparent image as the background for the report title textbox, which appears above the tablix data region.

The next challenge was to apply the cyclist image to the background of the report, and ensure that the layout of the report overlays the data neatly. If you have a small dataset with a known number of records (for example, in a “top 10 products” report), then this is relatively straightforward. However, for a dataset with an unknown size, the data region will be resized dynamically, and automatic pagination may break the report into multiple pages. In my case, I want to ensure that the report title appears on all pages, and that the table of data has a suitable space above and below it on all pages.

To accomplish this, I added a page header and footer to the report and put the report title in the header. This ensures that if the report is paginated, the table on the second and all subsequent pages doesn’t start right at the top of the page. Similarly, the page footer ensures that there’s always a space after the table – it never goes all the way to the bottom of the page. I set the BackgroundImage of the report to the cyclist picture (clipped so it doesn’t repeat), and I set the InteractiveSize property of the report so that when viewed in the browser, the report has a maximum size that will keep the tablix well within the background image area. This was made tricky by the fact that Report Designer does not show the background image of the report in design view, so I had to preview the report and assess the right size through trial and error.

[Image: Report Designer]

Obviously, the report size is optimized for interactive viewing. You can set the PageSize property of the report to an appropriate size for any other renderers you plan to use, but my experience is that using background images and contrived layouts in reports you intend to render to a different format can result in some pretty horrible-looking exported reports. One solution I have used in the past is to create the version that’s tailored for online viewing, and include a link to an offline version that has more conventional formatting for printing or exporting.

You can download the sample report I created from here. You’ll also need SQL Server 2008 R2 with Reporting Services (you can get the free Express edition from here) and the AdventureWorksDW2008R2 sample database (which you can get from here).
