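How testing data products differs from testing other products:
Users make decisions based on accurate, timely data analysis;
Data is integrated and retrieved far more often than it is stored;
Data must be timely and accurate;
A large amount of historical data must be maintained;
Retrieval performance;
Data security.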
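Obstacles to testing data products:
Performance problems, stale data, functional problems, and scalability problems;
The business mainly focuses on end reports; heavy integration and implementation work combined with a lack of professional documentation means important business logic is hidden in a complex architecture; the importance of white-box testing is overlooked; testers are not involved in the design phase; knowledge sharing is lacking and processes are immature, all of which leads to customer attrition;
Data volume and complexity keep changing;
Changes in upstream data directly affect the integration process and require changes to existing models and transformation logic;
Quality problems in upstream data;
Real-time analysis requires timely data;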
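Recommendations for the testing process:
Requirement & Analysis (analyze the data sources), Design & Coding (acquire the data, implement the business logic and dimensional modeling, build and populate the multidimensional cubes), QA & Deployment (produce the reports)
Designing the test process: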
A realistic example of this is acquiring only the last 5 years of product sales data from the United States for a company. This rule should be applied when designing the system, since it doesn’t make sense to acquire all the data if the customer only wants reports based on the last 5 years of United States data.
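The acquisition rule above can be expressed as a simple scope filter. The following is a minimal sketch, not the original article's implementation; the row layout (`country`, `sale_date`) and the function name `in_scope` are assumptions made for illustration.

```python
from datetime import date

def in_scope(row, as_of, years=5, country="United States"):
    """Return True if the row falls inside the agreed acquisition scope.

    Hypothetical row layout: {"country": str, "sale_date": date, ...}.
    """
    try:
        cutoff = as_of.replace(year=as_of.year - years)
    except ValueError:
        # as_of is Feb 29 and the target year is not a leap year.
        cutoff = as_of.replace(year=as_of.year - years, day=28)
    return row["country"] == country and cutoff <= row["sale_date"] <= as_of

# Illustrative rows only; not real sales data.
rows = [
    {"country": "United States", "sale_date": date(2012, 1, 1), "amount": 10.0},
    {"country": "United States", "sale_date": date(2009, 12, 31), "amount": 20.0},
    {"country": "Canada", "sale_date": date(2014, 6, 1), "amount": 30.0},
]
acquired = [r for r in rows if in_scope(r, as_of=date(2015, 4, 2))]
```

A test for this rule would check both directions: rows inside the scope are acquired, and rows outside it (older than the window, or from another country) never reach staging.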
Verifying metadata, which includes constraints such as nulls, default values, PKs, check constraints, referential integrity (PK-FK relationships), surrogate keys/natural keys, cardinality (1:1, m:n), etc.
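Several of these metadata checks can be written as plain SQL counts. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names (`dim_customer`, `fact_sales`), columns, and sample rows are hypothetical, and the tables are deliberately created without constraints so that violations can be loaded and then detected, as they might be in a staging area.

```python
import sqlite3

# Hypothetical schema: a customer dimension and a sales fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER, customer_id TEXT, name TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER, customer_key INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'C001', 'Alice'), (2, 'C002', 'Bob');
    INSERT INTO fact_sales VALUES
        (10, 1, 99.5), (11, 2, 20.0), (12, 3, 5.0), (13, 1, NULL);
""")

def count(sql):
    """Run a query that returns a single count and return it as an int."""
    return conn.execute(sql).fetchone()[0]

checks = {
    # Null check: the measure must never be missing.
    "null_amounts": count(
        "SELECT COUNT(*) FROM fact_sales WHERE amount IS NULL"),
    # PK check: surrogate keys that appear more than once.
    "duplicate_customer_keys": count(
        "SELECT COUNT(*) FROM (SELECT customer_key FROM dim_customer "
        "GROUP BY customer_key HAVING COUNT(*) > 1)"),
    # Referential integrity: fact rows whose FK matches no dimension row.
    "orphan_fact_rows": count(
        "SELECT COUNT(*) FROM fact_sales f LEFT JOIN dim_customer d "
        "ON f.customer_key = d.customer_key WHERE d.customer_key IS NULL"),
}
failed = {name: n for name, n in checks.items() if n > 0}
```

With the sample rows, `failed` reports one null amount (sale 13) and one orphan fact row (sale 12 points at a customer key that does not exist). Each check returns a count of offending rows, so a passing run is simply an empty `failed` dictionary.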
Ensuring traceability from Data Sources -> Staging -> Data Warehouse -> Data Marts -> Cube -> Reports
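One common way to test this kind of traceability is to reconcile row counts and measure totals between each pair of adjacent layers. The sketch below assumes, for simplicity, that the compared layers share the same grain; for aggregated layers such as marts or cubes, only the totals comparison would apply. The function name `reconcile` and the in-memory layer data are hypothetical.

```python
def reconcile(layers, measure, tolerance=1e-9):
    """Compare each layer with the one before it.

    layers: dict mapping layer name -> list of row dicts, in pipeline order.
    Returns a list of (from_layer, to_layer, problem) tuples; an empty list
    means the layers are consistent for this measure.
    """
    problems = []
    names = list(layers)  # dicts preserve insertion order
    for prev, curr in zip(names, names[1:]):
        prev_rows, curr_rows = layers[prev], layers[curr]
        if len(prev_rows) != len(curr_rows):
            problems.append((prev, curr, "row count"))
        prev_total = sum(r[measure] for r in prev_rows)
        curr_total = sum(r[measure] for r in curr_rows)
        if abs(prev_total - curr_total) > tolerance:
            problems.append((prev, curr, "total"))
    return problems

# Illustrative data: one row is lost between staging and the warehouse.
layers = {
    "source": [{"amount": 100.0}, {"amount": 50.0}, {"amount": 25.0}],
    "staging": [{"amount": 100.0}, {"amount": 50.0}, {"amount": 25.0}],
    "warehouse": [{"amount": 100.0}, {"amount": 50.0}],
}
problems = reconcile(layers, "amount")
```

Reporting the pair of layers where the mismatch first appears narrows the search for the defective load step.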
The dimensional approach enables a relational database to emulate the analytical functionality of a multidimensional database and makes the data warehouse easier for users to understand and use. Retrieval of data from the data warehouse also tends to be very fast. In the dimensional approach, transaction data are partitioned into either "facts" or "dimensions".
For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.
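To make the split concrete, the sketch below turns one flat sales transaction into a fact row that holds the measures plus keys into dimension tables. It is only an illustration: the class `SalesFact`, the function `split_transaction`, the dictionary-based dimensions, and the key-assignment scheme are assumptions, not part of any particular warehouse tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SalesFact:
    order_date_key: int   # key into a date dimension, e.g. 20150402
    customer_key: int     # surrogate key into the customer dimension
    product_key: int      # surrogate key into the product dimension
    quantity: int         # measure: number of products ordered
    price_paid: float     # measure: price paid for the products

def split_transaction(txn, customer_dim, product_dim):
    """Split one flat transaction into dimension entries and a fact row.

    Each dimension is a dict mapping a natural value to a surrogate key;
    values not seen before are assigned the next integer key.
    """
    def key_for(dim, value):
        return dim.setdefault(value, len(dim) + 1)

    return SalesFact(
        order_date_key=int(txn["order_date"].replace("-", "")),
        customer_key=key_for(customer_dim, txn["customer_name"]),
        product_key=key_for(product_dim, txn["product_number"]),
        quantity=txn["quantity"],
        price_paid=txn["price_paid"],
    )

customers, products = {}, {}
fact = split_transaction(
    {"order_date": "2015-04-02", "customer_name": "Alice",
     "product_number": "P-100", "quantity": 3, "price_paid": 29.97},
    customers, products,
)
```

A second transaction from the same customer reuses the existing customer key, which is the property a test of the dimension load should verify.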
In the normalized approach, the data in the data warehouse are stored according to database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The main advantage of this approach is that it is straightforward to add information to the database.
Data warehousing procedures can subdivide an ETL process into smaller pieces that run sequentially or in parallel in a specific order. The approach chosen can have a direct impact on the performance and scalability of the system.
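As a rough illustration of the two orderings, the sketch below runs two independent extract steps either one after the other or concurrently, followed by a load step that depends on both. The step functions and their data are hypothetical stand-ins; a real ETL tool would schedule these pieces itself.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_customers():
    """Hypothetical extract step; stands in for a query against a source."""
    return [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

def extract_orders():
    """Hypothetical extract step, independent of extract_customers."""
    return [{"customer_id": 1, "amount": 10.0}, {"customer_id": 1, "amount": 5.0}]

def load(customers, orders):
    """Depends on both extracts, so it must run after them."""
    totals = {c["id"]: 0.0 for c in customers}
    for order in orders:
        totals[order["customer_id"]] += order["amount"]
    return totals

def run_sequential():
    return load(extract_customers(), extract_orders())

def run_parallel():
    # The two extracts do not depend on each other, so they can overlap.
    with ThreadPoolExecutor(max_workers=2) as pool:
        customers = pool.submit(extract_customers)
        orders = pool.submit(extract_orders)
        return load(customers.result(), orders.result())
```

Whichever path is chosen, a useful regression test is that both orderings produce the same loaded result, so that a performance-driven change to the schedule does not alter the data.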
The entire data set can be pulled from the source on every run, or only the delta since the last run can be pulled, which reduces the network transfer of large amounts of data on each run.
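The difference between the two strategies can be sketched with a watermark: the delta extract keeps the latest modification timestamp it has seen and, on the next run, pulls only rows changed after it. This assumes the source exposes a reliable last-modified column, here the hypothetical `updated_at` stored as an ISO date string so that string comparison matches date order.

```python
def extract_full(source_rows):
    """Full extract: pull every row on every run."""
    return list(source_rows)

def extract_delta(source_rows, last_watermark):
    """Delta extract: pull only rows modified after the last run.

    Returns the changed rows and the new watermark to store for the next run.
    """
    delta = [r for r in source_rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in delta), default=last_watermark)
    return delta, new_watermark

# Illustrative source rows only.
source = [
    {"id": 1, "updated_at": "2015-04-01"},
    {"id": 2, "updated_at": "2015-04-02"},
    {"id": 3, "updated_at": "2015-04-03"},
]
delta, watermark = extract_delta(source, last_watermark="2015-04-01")
```

Here the delta contains rows 2 and 3, and rerunning with the returned watermark yields nothing new. Tests for a delta load typically cover exactly these cases, plus rows updated at the watermark boundary.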
Adventures with Testing BI/DW Application
Original article: http://www.cnblogs.com/stay-sober/p/4387322.html