Wednesday 16 February 2011

SQL Server Denali - Query Acceleration for Data Warehouses with ColumnStore Indexes

ColumnStore indexes are a new feature of SQL Server Denali that boost query performance in data warehouses. Although this feature is not a part of the latest CTP release, it is a significant development given the energy that Microsoft has expended in recent years to compete with its main rivals as a serious platform for enterprise data warehousing and business intelligence (BI). With the SQL Server 2008 and SQL Server 2008 R2 releases, Microsoft introduced features such as data compression, improved performance for partitioned table and indexes (http://cm-bloggers.blogspot.com/2009/04/table-and-index-partitioning-in-sql.html), star join query optimization (http://cm-bloggers.blogspot.com/2009/05/sql-server-2008-star-join-query.html), the Fast Track reference architectures (http://cm-bloggers.blogspot.com/2009/06/fast-track-data-warehouse-reference.html), Resource Governor (http://cm-bloggers.blogspot.com/2009/05/sql-server-2008-resource-governor.html), and the new Parallel Data Warehouse (PDW) offering. These have given SQL Server the manageability, scalability, and fast, predictable performance times required to deliver true enterprise data warehousing. Meantime, PowerPivot technologies, deeper integration with SharePoint and Office, Report Builder (http://cm-bloggers.blogspot.com/2009/01/sql-server-reporting-services-report.html), and a whole host of improvements to Reporting Services and Analysis Services have helped to put Microsoft in the leading pack for BI solutions.

ColumnStore indexes should be seen in the light of these earlier developments as a way of further boosting data warehouse performance, and by extension, BI performance. Early reports suggest this could be by factors of ten or more and even by up to one thousand times in some cases, so it really does have the potential to seriously speed up long-running queries.

ColumnStore indexes use database pages to store their data, just like standard indexes; However, unlike the pages of standard indexes, which store rows of data, the pages in a ColumnStore index store only the contents of individual columns. Pages are grouped into sets, one set for each column in the table. For example, for a table with five columns, a ColumnStore index would have five sets of pages, one set for each column in the table. Storing data in this way improves disk read performance because it enables SQL Server to retrieve and place in memory only the data columns that are specified in the query. Unnecessary columns of data that would also be returned in a standard index are not returned because they are not present in the index data pages.

Compression on ColumnStore indexes also has a big part to play in boosting performance. This is because columns generally tend to contain repetitive data and one way in which SQL Server compression works is by removing exactly this kind of redundant, repeated data. As a result, once it is loaded into memory, the index takes up much less space and is therefore much less likely to be paged out, ensuring that it remains available for subsequent queries that reference the same columns. Many queries that are run against data warehouses centre around a limited set of columns which are contained in fact tables, so a few carefully planned ColumnStore indexes could result in a major performance boost for these types of queries.

I wasn't able to try ColumnStore indexes out for myself, as the CTP doesn't seem to support them, as my attempt to create one shows:





However if Microsoft's performance figures translate into real-world scenarios, then this should add even more strength to an already compelling data warehousing / BI package. For more information, go to: http://download.microsoft.com/download/8/C/1/8C1CE06B-DE2F-40D1-9C5C-3EE521C25CE9/Columnstore%20Indexes%20for%20Fast%20DW%20QP%20SQL%20Server%2011.pdf




No comments: