Tuning Multi-Dimensional Cube Processing

In my last post (HERE) we talked about troubleshooting and resolving issues with problematic MDX queries. In this post we will look at techniques to tune and troubleshoot the processing side of your Analysis Services cube.

Understanding Cube Processing

Some of the common questions I hear as a consultant are “Why does my cube take 4 hours to process?” or “How can I reduce the time it takes to process my cube?”. The answer to both of these questions starts with identifying the processing bottleneck but before we can start poking around we need to understand how cubes are processed.

Dimensions and Measure Groups are processed independently of one another yet they are intimately linked together. The decisions made when processing dimensions can have a profound impact on Measure Groups (and their partitions). Let’s start by briefly looking at the various processing options:

Process Type Description
ProcessDefault Evaluates the current processed state of each object to bring it into a fully processed state.
ProcessFull It discards the data/contents of each object (dimensions, measure groups, partitions, etc.) and its descendants before rebuilding it.
ProcessData Processes data into the object including attributes, hierarchies and fact data. Any existing data is dropped. No aggregations or indexes are built.
ProcessIndexes Requires that data is already loaded into object. Aggregations and bitmap indexes are build.
ProcessAdd Adds new data to dimensions, measure groups and partitions. No content is updated and Aggregations/Indexes are left intact.
ProcessUpdate Applies only to dimensions. Acts like an incremental load by adding, updates or marking rows as deleted. Flexible aggregations are invalidated when data is either updated or marked for deletion.
ProcessClear Drops all the data from an object and its descendants and leaves it in an unprocessed state.

Identifying the Bottleneck

If you understand at a high level how Analysis Services goes about processing data into the cube, you can begin to dig into where it’s spending time. For our purposes we will break down the process into three distinct parts: loading data from the relational data source, processing data and building aggregations/indexes meaning that the bottleneck will either be in the relational database (SQL Server) or in Analysis Services.

To find and subsequently tune the bottle neck, you work will tools and techniques you are probably already familiar with:

  • SQL Server Profiler – used to build traces and identify queries issued against the relational database during processing
    image
  • Database Tuning Advisor – used to help identify and build optimal indexes need to load data out of the relational db during processing
    image
  • Performance Monitor – used to monitor processor, memory, disk i/o and progress during processing and building of aggregations and indexes. Some of the common Analysis Service performance counters are listed below.
    image image
    Performance Counter Description
    MSOLAP\Processing Rows read/sec
    Rows written/sec
    MSOLAP\Memory Memory
    Quota Blocked
    Memory Limit High KB
    Memory Limit Low KB
    MSOLAP\Proc Indexes Rows Read/sec
    Current Partitions
    MSOLAP\Pro Aggregations Temp file bytes written
    MSOLAP: Threads rocessing pool idle threads
    Processing pool job queue length
    Processing pool busy threads
  • DMVs – helps identify contention issues on loading data from the relational database. The most common DMV to use is sys.dm_os_wait_stats. Using this DMV you can identify:
    1. I/O System: PAGEIOLATCH_XX
    2. Locking: PAGELATCH_XX / LATCH_XX
    3. Network: ASYNC_NETWORK_IO
    4. Parallelism: CXPACKET / SOS_SCHEDULER_YIELD

High-Performance Processing

After you identify the bottleneck either in the relational database or in Analysis Services its time to tune it up. Below are a subset of the best practices and common techniques for each subject-matter area. For a more comprehensive discussion on tuning techniques check out the SQL Server Analysis Server 2008 R2 Performance Guide white paper.

Relational Database

  1. Optimize data access by using views with the TABLOCK, NOLOCK and/or READONLY hints or by using query binding.
  2. For SQL Server I/O issues confirm that physical file/storage layout is configured across multiple drives as appropriate.
  3. Check for and create missing indexes.
  4. Ensure that SQL Server Statistics are up to date
  5. Enable/Use compression
  6. Check data types to ensure appropriate usage
  7. While not strictly part of the underlying relational db add connections by setting the Max Number of Connections on the cube data source (you may need to test with this as too many connections can slow things down).
  8. Also related to the data source, configure the network for jumbo frames and set the network packet size on the data source to 32K (32,767).
  9. Probably the most obvious, check your data schema. A star schema will typically out perform a snowflake schema.

Analysis Services

  1. Reduce the number of dimensions that relate to your measure groups.
  2. Set AttributeHierarchyOptimizedState to false for attributes that are do not typically need fast retrieval.
  3. Unneeded dimension attributes should have their AttributeHierarchyEnabled property set to false.
  4. Remove unnecessary or out of date aggregations.
  5. Avoid processing large unrelated measure groups in parallel (process sequentially).
  6. Choose a partitioning strategy that allows you to process the minimum number of partitions and/or align your cube partitions to the partition strategy used in the relational db.
  7. If you are going to use ProcessUpdate, try and make relationships Rigid when possible.
  8. If process is thread-bound and adjust the ThreadPool max threads
  9. If process is memory constrains particularly during aggregation processing adjust the AggregationMemoryLimitMin/Max values
  10. If processing is I/O bound, you will either need to purchase faster disk (go Solid State) or consider a strategy such as “short-stroking” disk (using only 20% of disk capacity)

Wrap-Up

Much like tuning MDX queries, tuning and troubleshooting the processing of a cube is part Art, part Science. Every step along the way you will find multiple ways to solve problems, each has its own tradeoffs.

To be successful make sure you use a methodical step-based approach. Document and test at every iteration and avoid trying to push too many buttons and turn too many knobs at one time.

If you have any tips you would like me to add, please shoot me an email or find me on Twitter and I’ll be happy to add them.

Till next time!

Chris

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s