1. Non-functional testing
Because big data is applied to specific industries, non-functional testing is required across the whole big data processing framework in addition to functional testing. The main categories are as follows:
a. Performance testing
Performance is the most critical dimension for evaluating a big data analysis system. The main indicators include throughput, task completion time, and memory utilization, along with the platform's overall processing capacity and resource utilization. Performance indicators and bottlenecks can be monitored through the Hadoop performance monitor. Performance testing is carried out in an automated way, exercising the system under different load conditions.
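As a minimal, self-contained sketch of automated load testing (the `run_task` workload and the load levels below are placeholders; a real test would submit jobs to the cluster and read the Hadoop performance monitor), throughput and completion time can be measured under increasing load:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_task(payload):
    # Placeholder for a real job submission (e.g. a Hadoop job);
    # a small computation keeps the sketch runnable.
    return sum(payload)

def measure_throughput(n_tasks, n_workers):
    """Run n_tasks concurrently; report completion time and throughput."""
    payload = list(range(1000))
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_task, [payload] * n_tasks))
    elapsed = time.perf_counter() - start
    return {"tasks": n_tasks,
            "seconds": elapsed,
            "throughput": n_tasks / elapsed,       # tasks per second
            "all_ok": all(r == sum(payload) for r in results)}

# Exercise the system under several load levels.
for load in (10, 50, 100):
    print(measure_throughput(load, n_workers=8))
```

The same harness can be rerun at each load level and the resulting throughput curve inspected for the knee where completion time starts to degrade.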
b. Fault tolerance testing
The system should recover automatically from partial failures without affecting overall performance. In particular, when a fault occurs, the big data analysis system should continue to operate in an acceptable, possibly degraded, manner while recovering. Test plans are designed according to the application scenario and the specific deployment, and then executed manually.
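The expectation that a partial failure must not fail the operation as a whole can be sketched as a failover read; the `Replica` class and node names below are hypothetical stand-ins for real data nodes:

```python
class Replica:
    """Hypothetical stand-in for a data node holding one replica."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def read(self, key):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"value-of-{key}"

def read_with_failover(replicas, key):
    """Try each replica in turn: one node being down must not
    fail the read as a whole."""
    last_err = None
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError as err:
            last_err = err
    raise last_err

# dn1 has failed; the read should still succeed via dn2.
replicas = [Replica("dn1", healthy=False), Replica("dn2")]
print(read_with_failover(replicas, "block-42"))
```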
c. Availability testing
High availability is an indispensable feature of a big data analysis system, ensuring the continuity of data application services. Because high availability is critical for many applications, it requires strict testing and validation, which is mainly manual.
d. Scalability testing
Elastic scalability is particularly important for file systems in the big data era. Scalability testing mainly covers the system's elastic scaling (expansion/contraction) and the performance impact of scaling, verifying that the system scales linearly. This is mainly manual testing.
e. Stability testing
A big data analysis system usually runs continuously for a long time, so the importance of stability is self-evident. Stability testing mainly verifies whether the system still operates normally, and whether its functions remain correct, after running for an extended period (7/30/180/365 days x 24 hours). Stability testing is usually automated: tools such as LTP, IOzone, Postmark, and fio generate load on the system under test, and functionality must also be verified.
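A scaled-down sketch of such a soak run follows; the duration here is seconds rather than days, and the health check is a placeholder for real functional verification run alongside the LTP/fio load:

```python
import time

def stability_run(check, duration_s, interval_s):
    """Run for duration_s, invoking the functional health check every
    interval_s; return the number of failed checks (a scaled-down
    stand-in for a 7/30/180/365-day soak test)."""
    deadline = time.monotonic() + duration_s
    failures = 0
    while time.monotonic() < deadline:
        if not check():
            failures += 1
        time.sleep(interval_s)
    return failures

# Trivial health check; a real one would exercise reads, writes,
# and metadata operations against the loaded system.
print(stability_run(lambda: True, duration_s=0.2, interval_s=0.05))
```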
f. Deployment mode testing
Big data systems scale out, which makes it possible to build large-scale, high-performance file system clusters. The way the file system is deployed differs significantly across applications and solutions.
Deployment mode testing needs to cover system deployment in different scenarios, including automatic installation and configuration, cluster size, hardware configuration (servers, storage, network), and automatic load balancing. This part of the testing is unlikely to be automated; test plans must be designed for the application scenario and specific deployment, and then executed manually.
g. Data consistency testing
Data consistency here means that the data in the file system is consistent with the data written from outside, i.e. the data read back matches the data written. Data consistency shows that the file system can guarantee data integrity, with no data loss or corruption; this is the most basic function of a file system. Tests can be automated with diff and md5sum scripts, and LTP also provides data consistency testing tools.
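The md5sum approach can be sketched in a few lines; `mount_point` would be a path inside the file system under test (the local temp directory is used here so the sketch is runnable):

```python
import hashlib
import tempfile

def md5sum(path, chunk_size=1 << 20):
    """Equivalent of the md5sum tool: hash a file in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def check_write_read_consistency(data: bytes, mount_point: str) -> bool:
    """Write data into the file system under test, read it back, and
    compare checksums, as a diff/md5sum script would."""
    dst = tempfile.NamedTemporaryFile(dir=mount_point, delete=False)
    dst.write(data)
    dst.close()
    return md5sum(dst.name) == hashlib.md5(data).hexdigest()

print(check_write_read_consistency(b"x" * 4096, tempfile.gettempdir()))
```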
h. Stress testing
The load capacity of a big data analysis system is limited. When the system is overloaded, it may show performance degradation, functional anomalies, denied access, and so on. Stress testing verifies whether the system still operates normally, and whether its functions remain correct, under high pressure such as many concurrent clients, high OPS, and high IOPS/throughput, and it measures system resource consumption, providing a basis for operating the big data system.
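A minimal sketch of a many-client, high-OPS stress run follows; the `TargetSystem` counter is a stand-in for the system under test, against which real clients would issue file or metadata operations:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class TargetSystem:
    """Trivial stand-in for the system under test: a lock-protected
    counter whose final value tells us whether every op landed."""
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def op(self):
        with self._lock:
            self.value += 1

def stress(system, clients=16, ops_per_client=1000):
    """Drive the target with many concurrent clients, then verify that
    functionality is still correct (no lost operations)."""
    def client():
        for _ in range(ops_per_client):
            system.op()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        for _ in range(clients):
            pool.submit(client)
    return system.value == clients * ops_per_client

print(stress(TargetSystem()))
```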
2. Functional testing
Functional testing mainly covers the POSIX API that the system implements for big data analysis applications, including file read/write and access control, metadata operations, lock operations, and other functions.
Big data analysis systems differ in their POSIX semantics and in the file system APIs they implement, so functional testing must cover the APIs and function points the system actually implements.
The functional test workload is heavy, so automated testing should be emphasized, supplemented by manual testing; recommended automation tools include ltp, fstest, and locktests.
When big data is processed across multiple nodes, 'garbage data' and data quality problems can arise. Functional testing is mainly used to identify data problems caused by coding errors or node configuration errors.
It includes the following stages:
a. Data import / preprocessing verification stage
According to the specific application background and business requirements, data from sources such as web logs, the Internet of Things, social networks, and Internet text and files is loaded into HDFS for processing. In this process, incorrect copying or storage can produce erroneous data. Testing can be carried out as follows:
1. Compare the input files with the source files to ensure data consistency;
2. Verify that the acquired data is accurate with respect to the data requirements;
3. Verify that files are loaded into HDFS correctly, and are split and replicated to different data nodes.
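Point 3 can be checked mechanically. The sketch below assumes the per-block replica locations have already been obtained (for example from `hdfs fsck <path> -files -blocks -locations`) and only verifies the split and replication arithmetic; the node names are invented:

```python
import math

def expected_block_count(file_size: int,
                         block_size: int = 128 * 1024 * 1024) -> int:
    """Number of HDFS blocks a file of file_size bytes should split
    into (the default HDFS block size is 128 MiB)."""
    return max(1, math.ceil(file_size / block_size))

def verify_load(file_size, block_locations, replication=3):
    """block_locations: for each block, the list of data nodes holding
    a replica. Check that the file was split into the expected number
    of blocks and each block has enough distinct replicas."""
    if len(block_locations) != expected_block_count(file_size):
        return False
    return all(len(set(nodes)) >= replication for nodes in block_locations)

# A 300 MiB file should occupy 3 blocks, each on 3 distinct nodes.
locations = [["dn1", "dn2", "dn3"]] * 3
print(verify_load(300 * 1024 * 1024, locations))
```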
b. MapReduce output verification stage
After the data is loaded into HDFS, MapReduce starts processing the data from the different sources. Coding errors can occur during MapReduce processing, such as a job that runs correctly on a single node but incorrectly on multiple nodes, with problems including incorrect aggregation, node configuration, and output format. For problems at this stage, the following verification methods can be used:
1. Verify that data processing completes normally and the output files are produced;
2. Verify the business logic of the big data job on a single node, then perform the same verification on multiple nodes;
3. Verify that MapReduce processing produces the key/value pairs correctly;
4. Verify that data aggregation and merging are correct after the reduce phase;
5. Validate the output data against the source files to ensure that data processing completed correctly;
6. Verify that the format of the output data files meets the requirements of the big data business.
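Point 2, the same result on one node and on many, can be sketched with a toy word-count job: the multi-partition run must match the single-partition run, and the reduce output exercises points 3 and 4 as well. The data is invented for illustration:

```python
from collections import Counter

def map_phase(records):
    """Map step: emit (word, 1) key/value pairs."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: aggregate counts per key."""
    out = Counter()
    for key, value in pairs:
        out[key] += value
    return dict(out)

def run_job(records, partitions=1):
    """Run the job over `partitions` splits and merge the partial
    results, mimicking a multi-node run; the merged result must match
    the single-partition run."""
    chunk = max(1, len(records) // partitions)
    splits = [records[i:i + chunk] for i in range(0, len(records), chunk)]
    partials = [reduce_phase(map_phase(s)) for s in splits]
    merged = Counter()
    for p in partials:
        merged.update(p)
    return dict(merged)

data = ["a b a", "b c", "a"]
single = run_job(data, partitions=1)
multi = run_job(data, partitions=3)
assert single == multi == {"a": 3, "b": 2, "c": 1}
print(single)
```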
c. Verifying big data ETL into the data warehouse
After the MapReduce process finishes, the generated output files are moved into a data warehouse or other transactional systems as required. In this process, problems can arise from conversion rules being applied incorrectly, or from incomplete data being extracted from HDFS. For problems at this stage, the following methods can be adopted:
1. Verify that the conversion rules are applied correctly;
2. Check for data corruption by comparing the target table data with the HDFS file data;
3. Verify that the data is loaded into the target system successfully;
4. Verify the data integrity of the target system.
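A minimal sketch of checks 2 to 4 follows, using an in-memory SQLite table as a stand-in for the target system and a CSV string as the extracted HDFS output file; the schema and data are invented for illustration:

```python
import csv
import io
import sqlite3

def load_and_verify(csv_text: str) -> bool:
    """Load an extracted output file (CSV here, for illustration) into
    a target table, then verify row count and content integrity."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO target VALUES (?, ?)",
                     [(r[0], int(r[1])) for r in rows])
    # Check 1: row count must match the source file (no lost rows).
    (count,) = conn.execute("SELECT COUNT(*) FROM target").fetchone()
    if count != len(rows):
        return False
    # Check 2: content must match; compare sorted rows from both sides.
    loaded = conn.execute(
        "SELECT id, amount FROM target ORDER BY id").fetchall()
    expected = sorted((r[0], int(r[1])) for r in rows)
    return loaded == expected

print(load_and_verify("a,10\nb,20\nc,30"))
```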
d. Validating analysis reports
Analysis reports are generated from data warehouse or Hive data through reporting tools. In this process, report data problems can arise when the report definition does not meet the requirements. Queries can be used to verify whether the reports meet the business requirements.
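Validating a report figure by re-running the underlying query can be sketched as follows; SQLite stands in for the warehouse or Hive, and the table and figures are invented for illustration:

```python
import sqlite3

def report_matches_source(sales, report_total):
    """Re-run the query behind a report total against the warehouse
    data and compare it with the figure the report shows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
    (total,) = conn.execute("SELECT SUM(amount) FROM sales").fetchone()
    return total == report_total

sales = [("north", 100), ("south", 250), ("north", 50)]
print(report_matches_source(sales, 400))  # the report claims 400
```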