As a data warehouse, Hive follows a development process similar to that of a relational database: the data has to be modeled first. Modeling means specifying the relationships between tables. Modeling in Hive can be roughly divided into three types: star, snowflake, and constellation. To understand modeling in depth, we first need to understand the table concepts in a Hive warehouse. Physically, Hive tables are divided into internal tables, external tables, bucketed tables, and partitioned tables. Logically, they are divided into dimension tables and fact tables. A dimension table is comparable to a dictionary (lookup) table; a fact table is a data table other than a dictionary table.

1.1 Star model

A star model has multiple dimension tables and one fact table, with no relationships between the dimension tables. Query performance is better, at the cost of redundant storage. The star model is the more commonly used one.
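The structure described above can be sketched in a few lines. This is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for Hive, and every table and column name (`fact_sales`, `dim_product`, `dim_region`, and so on) is a hypothetical example, not something taken from a real system.

```python
import sqlite3

# Hypothetical star schema: one fact table plus two dimension tables
# that have no relationship with each other.
star_conn = sqlite3.connect(":memory:")
star_conn.executescript("""
CREATE TABLE dim_product (
    product_id    INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT   -- stored redundantly inside the dimension table
);
CREATE TABLE dim_region (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT
);
CREATE TABLE fact_sales (
    product_id INTEGER,  -- points at dim_product
    region_id  INTEGER,  -- points at dim_region
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'laptop', 'electronics'), (2, 'desk', 'furniture');
INSERT INTO dim_region  VALUES (10, 'north'), (20, 'south');
INSERT INTO fact_sales  VALUES (1, 10, 1000.0), (1, 20, 500.0), (2, 10, 200.0);
""")


def sales_by_category(connection):
    """Total sales per category: a single join from the fact table to one dimension."""
    return connection.execute("""
        SELECT p.category_name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON f.product_id = p.product_id
        GROUP BY p.category_name
        ORDER BY p.category_name
    """).fetchall()


star_result = sales_by_category(star_conn)
print(star_result)
```

Because the category name lives directly in `dim_product`, the query needs only one join. The price is that the same category text is repeated for every product in that category.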

1.2 Snowflake model

The snowflake model is an extension of the star model in which the dimension tables are related to one another. It reduces storage redundancy but loses some query performance, because multi-level joins are required. Like the star model, it has only one fact table.
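A minimal sketch of the same idea, again using Python's `sqlite3` as a stand-in for Hive with hypothetical table and column names: the category is moved out of the product dimension into its own table, so a query by category needs a two-level join.

```python
import sqlite3

# Hypothetical snowflake schema: dim_product references dim_category,
# so the dimension tables are related to each other.
snow_conn = sqlite3.connect(":memory:")
snow_conn.executescript("""
CREATE TABLE dim_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER,
    amount     REAL
);
INSERT INTO dim_category VALUES (100, 'electronics'), (200, 'furniture');
INSERT INTO dim_product  VALUES (1, 'laptop', 100), (2, 'phone', 100), (3, 'desk', 200);
INSERT INTO fact_sales   VALUES (1, 1000.0), (2, 400.0), (3, 200.0);
""")

# Fact table -> product dimension -> category dimension: a multi-level join.
snowflake_result = snow_conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON f.product_id  = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(snowflake_result)
```

Each category name is stored once, which is the reduced redundancy, but the category report now crosses two joins instead of one.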

1.3 Constellation model

The constellation model is also an extension of the star model. It contains multiple fact tables.
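As a rough illustration, the sketch below has two fact tables that share one dimension table. It uses Python's `sqlite3` in place of Hive, and the tables `fact_sales`, `fact_inventory`, and `dim_product` are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical constellation schema: two fact tables share one dimension table.
const_conn = sqlite3.connect(":memory:")
const_conn.executescript("""
CREATE TABLE dim_product    (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales     (product_id INTEGER, quantity INTEGER);
CREATE TABLE fact_inventory (product_id INTEGER, stock INTEGER);
INSERT INTO dim_product    VALUES (1, 'laptop'), (2, 'desk');
INSERT INTO fact_sales     VALUES (1, 3), (1, 2), (2, 1);
INSERT INTO fact_inventory VALUES (1, 10), (2, 4);
""")

# Aggregate each fact table on its own first, then join through the shared
# dimension; joining the raw fact tables directly could multiply rows.
constellation_result = const_conn.execute("""
    SELECT p.product_name, s.sold, i.stock
    FROM dim_product p
    JOIN (SELECT product_id, SUM(quantity) AS sold
          FROM fact_sales GROUP BY product_id) s ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(stock) AS stock
          FROM fact_inventory GROUP BY product_id) i ON i.product_id = p.product_id
    ORDER BY p.product_id
""").fetchall()
print(constellation_result)
```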


Extended discussion

For example, suppose a business system does not provide a summary table for a new line of business, so a new data table has to be added. To count business volume, we had to join dozens of data tables, and the reports ran very slowly. We then built a summary table for the new business directly in the data warehouse, on behalf of the source system. This improved the processing efficiency of downstream reports and also shielded the reports from the impact of new business types.
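The summary-table approach can be sketched as follows. This is an assumption-laden toy: two hypothetical source tables (`src_orders`, `src_payments`) stand in for the dozens of tables mentioned above, the summary table name `dws_biz_daily_summary` is invented for the example, and Python's `sqlite3` is used instead of Hive.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
-- Hypothetical source-system tables.
CREATE TABLE src_orders   (order_id INTEGER, biz_type TEXT, order_date TEXT);
CREATE TABLE src_payments (order_id INTEGER, paid_amount REAL);
INSERT INTO src_orders VALUES
    (1, 'new_biz', '2020-01-01'),
    (2, 'new_biz', '2020-01-01'),
    (3, 'old_biz', '2020-01-01');
INSERT INTO src_payments VALUES (1, 10.0), (2, 20.0), (3, 5.0);

-- The joins run once, when the summary table is built in the warehouse.
CREATE TABLE dws_biz_daily_summary AS
SELECT o.biz_type,
       o.order_date,
       COUNT(*)           AS order_cnt,
       SUM(p.paid_amount) AS paid_total
FROM src_orders o
JOIN src_payments p ON o.order_id = p.order_id
GROUP BY o.biz_type, o.order_date;
""")

# Reports read only the summary table and never touch the source tables.
summary_rows = dw.execute("""
    SELECT biz_type, order_cnt, paid_total
    FROM dws_biz_daily_summary
    ORDER BY biz_type
""").fetchall()
print(summary_rows)
```

If the source system later adds another business type, only the statement that builds the summary table needs to change; the report query stays the same.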

Here we can consider the basic characteristics that a data warehouse design should have. This is where the concept of layering comes in:

* Clear data structure
* Data lineage tracking
* Reduce duplication of development
* Simplify complex problems
* Shield anomalies in the raw data
* Shield the impact of business changes: a change in the business does not require re-ingesting the data

Following these ideas, we can decide how to divide our base data into several databases and, combined with business requirements, build the data mart layer for each business layer by layer. This can effectively reduce later operation and maintenance costs.


(4) Application scenarios

The main advantage of the star model is query efficiency. The fact table has already been preprocessed and holds the main data, so many queries can be answered by scanning the fact table alone, without a large number of joins. In addition, dimension tables are usually small, so they can be loaded into memory at join time to speed up the join. Finally, the fact table of a star model is more readable: most of the core information is available without joining multiple tables, and design and maintenance are relatively simple.
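The in-memory join mentioned above can be sketched in plain Python. This is only an illustration of the general idea (in Hive it is usually called a map join), not Hive's actual implementation; the function name `map_side_join` and the sample rows are hypothetical.

```python
def map_side_join(fact_rows, dim_rows):
    """Join fact rows to a small dimension table held entirely in memory.

    fact_rows: iterable of (dim_id, amount) tuples
    dim_rows:  iterable of (dim_id, name) tuples, assumed small enough for memory
    """
    # Build the lookup once from the small dimension table.
    dim_lookup = {dim_id: name for dim_id, name in dim_rows}

    joined = []
    for dim_id, amount in fact_rows:
        name = dim_lookup.get(dim_id)
        if name is not None:  # inner-join semantics: drop unmatched fact rows
            joined.append((name, amount))
    return joined


# Hypothetical sample data: a tiny dimension table and a larger fact table.
dim_region = [(10, "north"), (20, "south")]
fact_sales = [(10, 100.0), (20, 50.0), (10, 25.0), (30, 1.0)]

joined_rows = map_side_join(fact_sales, dim_region)
print(joined_rows)
```

The fact table is scanned once, and each row costs one dictionary lookup, so no sorting or shuffling of the large table is needed.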

The snowflake model is closer to the normal forms of database design: the design is more normalized and there is less data redundancy. However, queries may need to join multiple tables, which lowers query efficiency, and the normalized structure is more complex to maintain later on.

(5) Summary

From the comparison above, we can see that a data warehouse is better suited to building its underlying Hive tables with the star model, trading a large amount of redundancy for query efficiency. The star model is also better supported by OLAP analysis engines, as can be seen in Kylin. The snowflake model, by contrast, is very common in relational databases such as MySQL and Oracle, especially in the database tables of e-commerce systems. In a data warehouse the snowflake model has relatively few application scenarios, but they do exist. In a specific design, therefore, we can consider whether the advantages of the two models can be combined to arrive at a better design.
