Overview
• Efficient Computation of Data Cubes
– General Strategies for Cube Computation
– Multiway Array Aggregation for Full Cube Computation
– Computing Iceberg Cubes
• BUC
– High-dimensional OLAP: A Minimal Cubing Approach
– Computing Cubes with Complex Conditions
• Exploration and Discovery in Multidimensional Databases
– Discovery-Driven
• Summary
General Strategies for Cube Computation
• Sorting, hashing, and grouping operations are applied to the dimension attributes in order to reorder and cluster related tuples
• Aggregates may be computed from previously computed aggregates, rather than from the base fact table
– Smallest-child: computing a cuboid from the smallest previously computed cuboid
– Cache-results: caching the results of a cuboid from which other cuboids are computed, to reduce disk I/O
– Amortize-scans: computing as many cuboids as possible at the same time to amortize disk reads
– Share-sorts: sharing sorting costs across multiple cuboids when a sort-based method is used
– Share-partitions: sharing the partitioning cost across multiple cuboids when hash-based algorithms are used
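The smallest-child strategy can be illustrated with a minimal sketch. The cell values, the `aggregate` helper, and the `smallest_parent` helper below are hypothetical and chosen only for illustration; the sketch also assumes a distributive measure (SUM), since only such measures can be rolled up from a coarser cuboid without returning to the base table.

```python
from collections import defaultdict

def aggregate(cuboid, dims, keep):
    """Roll up a cuboid (dict: key tuple -> SUM) to the dimensions in `keep`."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cuboid.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

# Hypothetical base cuboid over dimensions (A, B, C) with a SUM measure.
base_dims = ("A", "B", "C")
base = {
    ("a0", "b0", "c0"): 1,
    ("a0", "b1", "c0"): 2,
    ("a0", "b2", "c1"): 3,
    ("a1", "b0", "c1"): 4,
    ("a1", "b1", "c0"): 5,
}

# Cuboids computed so far, keyed by their dimension tuple.
computed = {base_dims: base}
for dims in [("A", "B"), ("A", "C")]:
    computed[dims] = aggregate(base, base_dims, dims)

def smallest_parent(target, computed):
    """Pick the already-computed cuboid with the fewest cells that contains `target`."""
    parents = [d for d in computed if set(target) < set(d)]
    return min(parents, key=lambda d: len(computed[d]))

# Cuboid A can come from ABC, AB, or AC; smallest-child picks the smallest.
parent = smallest_parent(("A",), computed)
cuboid_a = aggregate(computed[parent], parent, ("A",))
print(parent, cuboid_a)
```

In this toy data AC has fewer cells than AB or ABC, so A is derived from AC; the result is the same as aggregating the base cuboid directly, but fewer cells are scanned.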
Multiway Array Aggregation for Full Cube Computation
• Computes a full data cube by using a multidimensional array as the basic data structure
• Typical MOLAP approach
– Partition the array into chunks
• Chunk: A subcube small enough to fit into memory
– Compute aggregates by visiting cube cells
• The order in which the cube cells are visited can be optimized to reduce memory access and storage cost
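The chunk-by-chunk traversal described above can be sketched as follows. This is a toy illustration, not the full algorithm: the 4 x 4 x 4 array, the chunk size, the cell values, and the `chunk_cells` helper are all assumptions, and the three 2-D aggregates are kept whole in memory, whereas a real implementation would hold only the partial planes required by the chosen chunk order.

```python
from itertools import product

# Hypothetical tiny array: 4 x 4 x 4 cells, split into 2 chunks per dimension.
SIZE, CHUNK = 4, 2
N_CHUNKS = SIZE // CHUNK

# Deterministic made-up cell values so the result is easy to check by hand.
array = {(a, b, c): a + 10 * b + 100 * c
         for a, b, c in product(range(SIZE), repeat=3)}

def chunk_cells(ca, cb, cc):
    """Yield the coordinates of every cell inside chunk (ca, cb, cc)."""
    for da, db, dc in product(range(CHUNK), repeat=3):
        yield ca * CHUNK + da, cb * CHUNK + db, cc * CHUNK + dc

ab, ac, bc = {}, {}, {}
cells_read = 0

# Visit chunks with the A index varying fastest, then B, then C.
for cc, cb, ca in product(range(N_CHUNKS), repeat=3):
    for a, b, c in chunk_cells(ca, cb, cc):
        value = array[(a, b, c)]
        cells_read += 1
        # Multiway aggregation: one read feeds all three 2-D cuboids.
        ab[(a, b)] = ab.get((a, b), 0) + value
        ac[(a, c)] = ac.get((a, c), 0) + value
        bc[(b, c)] = bc.get((b, c), 0) + value

print(cells_read, len(ab), len(ac), len(bc))
```

Each base cell is read exactly once and contributes to AB, AC, and BC in the same pass, which is what allows the cost of scanning the array to be shared across cuboids.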
Consider a 3-D data array containing three dimensions: A, B, C
– Each dimension is divided into 4 chunks
– A (a0, a1, a2, a3), B (b0, b1, b2, b3), C (c0, c1, c2, c3)
– A has 40 distinct values, B has 400, and C has 4000
– So each partition of A has size 10, each partition of B has size 100, and each partition of C has size 1000
– Full cube computation requires computing
• The base cuboid ABC, which is already computed (the 3-D array)
• The 2-D cuboids AB, AC, BC
• The 1-D