Table - The definition of a Hive table used as a source for cubes. Tables must be synced before cubes are built.
Data Model - This describes a STAR SCHEMA data model, which defines the fact and lookup tables and the filter condition.
Cube Descriptor - This describes the definition and settings of a cube instance: which data model to use, which dimensions and measures to include, how to partition the cube into segments, how to handle auto-merge, and so on.
Cube Instance - An instance of a cube, built from one cube descriptor. It consists of one or more cube segments, according to the partition settings.
Partition - A user can define a DATE/STRING column as the partition column in the cube descriptor, which splits one cube into several segments covering different date periods.
Cube Segment - This is the actual carrier of cube data, and it maps to an HTable in HBase. Each build job creates one new segment for the cube instance. When data changes within a given date period, only the related segments need to be refreshed, which avoids rebuilding the whole cube.
Aggregation Group - Each aggregation group is a subset of the dimensions, and cuboids are built only from combinations of the dimensions inside a group. The aim is to prune the number of cuboids for optimization.
Derived - On lookup tables, some dimensions can be derived from the table's PK, so there is a specific mapping between them and the FK in the fact table. Such dimensions are DERIVED and do not participate in cuboid generation.
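
To make the pruning effect of an Aggregation Group concrete, here is a minimal Python sketch that counts dimension combinations with and without groups. It is a simplified model for illustration only, not Kylin's actual cuboid scheduling; the dimension names and the helper functions `cuboids` and `cuboids_for_groups` are hypothetical.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every non-empty combination of the given dimensions."""
    dims = sorted(dimensions)
    return {
        frozenset(combo)
        for size in range(1, len(dims) + 1)
        for combo in combinations(dims, size)
    }


def cuboids_for_groups(groups):
    """Union of the combinations built inside each aggregation group."""
    result = set()
    for group in groups:
        result |= cuboids(group)
    return result


# Hypothetical dimensions, chosen only for this example.
all_dims = ["year", "month", "country", "city", "category"]

# Without aggregation groups: every combination of all five dimensions.
full = cuboids(all_dims)

# With two aggregation groups: combinations never cross group boundaries.
groups = [["year", "month"], ["country", "city", "category"]]
pruned = cuboids_for_groups(groups)

print(len(full), len(pruned))
```

In this simplified model, five dimensions yield 2^5 - 1 = 31 combinations, while the two groups yield 3 + 7 = 10, because a combination such as (year, country) spans two groups and is never built.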
Original article: http://blog.csdn.net/jianghuxiaojin/article/details/51508662