Friday, 1 May 2009

SQL Server 2008 Star Join Query Optimization

As Business Intelligence (BI) has become ever more important in helping business to maintain their competitive edge, the databases that handle BI queries have had to cope with a corresponding increase in their workload. Nowadays, it is not uncommon for queries that are run against large databases such as data warehouses and data marts to return millions of rows, and when you consider that these types of queries frequently involve joining multiple tables, it’s easy to see why users often experience extended query response times. SQL Server 2008 Enterprise Edition addresses this issue with a new feature called ‘Star Join Query Optimization’, which is a mechanism that uses bitmap filtering to improve the performance of certain types of queries by making the retrieval of rows from fact tables more efficient.

Improved Query Response Times
Data warehouses generally use dimensionally modelled star or snowflake schemas, which feature one or more fact tables that contain transactional data, and various dimension tables which contain the information (such as product data, customer information, and times and dates) that gives meaning to the fact table data. Foreign keys are usually used to maintain the relationships between the rows in fact tables and those in the dimension tables. SQL Server 2008 Enterprise can recognize databases that have star schemas and use the new Star Join Query logic to process queries against them more quickly, with the result that typical data warehouse queries run 15 to 20 percent faster on average.

Automatically Implemented
Star Join Query Optimization is implemented automatically by SQL Server, and requires no special database or application configuration. The query processor will usually elect to optimize queries that are of medium selectivity (that is, those that retrieve between approximately 10% and 75% of the rows from a fact table). Such queries are generally handled by using hash joins to join the dimension and fact tables, using the foreign keys to identify matching rows. Hash joins work by building a hash table for each dimension table referenced in the query, and the optimization process uses these hash tables to derive bitmap filters. Bitmap filters identify the key values from each dimension table that qualify for inclusion in the query. The bitmap filters are applied to the fact table as it is scanned and they act to eliminate the fact table rows that do not qualify for inclusion in the result set. The most selective bitmap filter is applied first because it eliminates the greatest number of rows. Because the eliminated rows require no further processing, subsequent filters do not have to be applied to them, and the efficiency of the process is improved.

Query Selectivity
Bitmap filtering improves performance in medium selectivity queries because it filters out rows early on, before any joins are implemented and as a result, the number of rows that are processed by each join is reduced. Bitmap filtering is not generally used when queries are very highly selective (returning fewer than 10% of the rows in a fact table) because a nested loop join will usually be more efficient. Likewise, on queries that are not very selective at all (returning more than 75% of the rows in a fact table), bitmap filtering will not be used because there are very few rows to be filtered out, so there is no performance benefit to be gained.

Integer Data Types
Star join optimization is most efficient when the columns used in joins are of the integer data type because this enables the filter to be applied as part of the initial table or index scan rather than at a later stage in the query plan. Because it is common practice to implement foreign key relationships by using integer based alternate key columns, the majority of queries should be able to benefit from star join optimization.

No comments: