One. Non-functional testing

Because big data is applied to specific industries, non-functional testing is required across the whole big data processing framework in addition to functional testing. The main categories are as follows:

a. Performance testing

Performance is the most critical dimension for evaluating a big data analysis system. Key indicators include throughput, task completion time, and memory utilization, along with the processing capacity and resource utilization of the big data analysis platform. The Hadoop performance monitor can be used to track performance indicators and locate bottlenecks. Performance testing is carried out in an automated way, exercising the system under different load conditions.
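As a minimal sketch of the automated load-and-measure approach described above, the following harness times a workload at several load levels and reports throughput. The workload function and load levels are hypothetical placeholders, not part of any Hadoop tool:

```python
import time

def measure_throughput(task, n_requests):
    """Run `task` n_requests times; return (elapsed seconds, requests/sec)."""
    start = time.perf_counter()
    for _ in range(n_requests):
        task()
    elapsed = time.perf_counter() - start
    return elapsed, n_requests / elapsed

# Hypothetical workload standing in for a real job or file-system request.
def sample_task():
    sum(range(1000))

# Exercise the "system" under different load levels, as the text suggests.
results = {n: measure_throughput(sample_task, n)[1] for n in (100, 1000)}
```

In a real performance test, `sample_task` would submit work to the cluster and the results at each load level would be compared against throughput targets.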

b. Fault tolerance testing

The system should recover automatically from partial failures without the overall performance being affected. In particular, when a fault occurs, the big data analysis system should continue to operate in an acceptable, possibly degraded, way while recovering. Test solutions are designed according to the application scenario and the specific deployment, and are then executed manually.

c. Availability testing

High availability is an indispensable feature of big data analysis, as it ensures the continuity of data application services. Because high availability is critical for many applications, it requires strict testing and validation, performed mainly by manual testing.

d. Scalability testing

Elastic scalability is particularly important for file systems in the era of big data. Scalability testing mainly covers the system's elastic scaling (expansion/contraction) and the performance impact of scaling, verifying linear scalability. It is performed mainly by manual testing.

e. Stability testing

A big data analysis system usually runs continuously for a long time, so the importance of stability is self-evident. Stability testing mainly verifies whether the system still operates normally, and whether its functions remain correct, after running for an extended period (7/30/180/365 days × 24 hours). Stability testing is usually automated: tools such as LTP, IOzone, Postmark, and fio generate load on the system under test, and functional verification is performed as well.
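A soak test of this kind can be sketched as a loop that generates load for a fixed duration while checking functional correctness after each iteration. The operation and check below are hypothetical stand-ins for a real I/O workload (such as one driven by fio) and its verification:

```python
import time

def soak_test(operation, check, duration_s):
    """Repeatedly run `operation` for `duration_s` seconds, verifying each
    result with `check` - generating load *and* verifying function."""
    deadline = time.monotonic() + duration_s
    iterations = 0
    failures = 0
    while time.monotonic() < deadline:
        result = operation()
        if not check(result):
            failures += 1
        iterations += 1
    return iterations, failures

# Hypothetical workload: a trivial operation with a known correct result.
iters, fails = soak_test(lambda: sorted([3, 1, 2]),
                         lambda r: r == [1, 2, 3],
                         duration_s=0.1)
```

In a real stability run the duration would be days or months and the operation would exercise the actual storage path.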

f. Deployment mode testing

Big data systems have scale-out characteristics and can be built into large-scale, high-performance file system clusters. The way a file system is deployed varies significantly across applications and solutions.

Deployment mode testing needs to cover the system's deployment modes in different scenarios, including automatic installation and configuration, cluster size, hardware configuration (servers, storage, network), and automatic load balancing. This part of the testing is unlikely to be automated; test solutions must be designed according to the application scenario and the specific deployment, and then executed manually.

g. Data consistency testing

Data consistency here means that the data in the file system is consistent with the data written from outside, that is, the data read back is identical to the data written. Data consistency shows that the file system can guarantee data integrity with no data loss or corruption, which is the most basic function of a file system. Tests can be automated with diff or md5sum scripts; LTP also provides data consistency testing tools.
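The md5sum-style check described above can be sketched as follows: write known data, read it back through the file system, and compare digests. This uses a local temporary file as a stand-in for the file system under test:

```python
import hashlib
import os
import tempfile

def md5_of(path, chunk=1 << 20):
    """Stream a file through MD5, as an md5sum-style consistency check."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

# Write known data, read it back through the file system, compare digests.
payload = os.urandom(4096)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

written_md5 = hashlib.md5(payload).hexdigest()
read_md5 = md5_of(path)
consistent = (written_md5 == read_md5)
os.unlink(path)
```

A diff-based variant would compare the files byte for byte instead of by digest; the digest form is cheaper when the reference data lives on another machine.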

h. Stress testing

The load capacity of a big data analysis system is limited. When the system is overloaded, it may suffer performance degradation, functional anomalies, or denied access. Stress testing verifies whether the system still operates normally under high pressure, including many concurrent clients, high OPS, and high IOPS/throughput pressure, whether its functions remain correct, and how much system resource it consumes, providing a basis for big data operations.
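A multi-client stress driver can be sketched with a thread pool that fires many concurrent operations and tallies successes against errors, making functional anomalies and denied requests visible. The client operation here is a hypothetical placeholder for a real file-system or API request:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def client_op(i):
    """Hypothetical client operation standing in for a real request."""
    if i < 0:
        raise ValueError("bad request")
    return i * i

def stress_test(n_clients, ops_per_client):
    """Drive many concurrent clients; return (successes, errors)."""
    successes = errors = 0
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        futures = [pool.submit(client_op, i)
                   for i in range(n_clients * ops_per_client)]
        for fut in as_completed(futures):
            try:
                fut.result()
                successes += 1
            except Exception:
                errors += 1
    return successes, errors

ok, err = stress_test(n_clients=8, ops_per_client=50)
```

In practice the error rate and latency at each pressure level would be recorded alongside system resource consumption (CPU, memory, I/O) sampled on the servers.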

Two. Functional testing

Functional testing mainly concerns the system's implementation of the POSIX API used by big data analysis applications, including file read/write and access control, metadata operations, lock operations, and other functions.

Big data analysis systems differ in their POSIX semantics, and the file system APIs they implement differ accordingly. Functional testing should cover the APIs and function points implemented by the big data system.

The workload of functional testing is heavy, so automated testing should be emphasized, supplemented by manual testing. Recommended automation tools are ltp, fstest, and locktests.

When big data is processed on multiple nodes, problems can arise from 'useless data' and poor data quality. Functional testing is mainly used to identify data problems caused by coding errors or node configuration errors.

It includes the following stages:

a. Data import/preprocessing verification stage

According to the specific application background and business requirements, data from various sources such as web logs, Internet of Things devices, social networks, and Internet text and files is loaded into HDFS for processing. In this process, incorrect copying or storage errors may produce wrong data. In such cases, testing can proceed in the following ways:

1. Compare the input files with the source files to ensure data consistency;

2. Ensure the accuracy of data acquisition according to the data requirements;

3. Verify that files are loaded into HDFS correctly, split, and replicated to different data nodes.

b. MapReduce data output verification stage

After the data is loaded into HDFS, MapReduce starts processing data from different data sources. Coding errors can occur during MapReduce processing: a job may run correctly on a single node but incorrectly on multiple nodes, with problems including incorrect aggregation, node configuration, and output format. For problems at this stage, the following verification methods can be used:

1. Verify that data processing completes normally and the output files are produced correctly;

2. Verify the business logic of the big data job on a single node, then perform the same verification on multiple nodes;

3. Verify that MapReduce processing produces key/value pairs correctly;

4. Verify that the aggregation and merging of data are correct after the reduce phase;

5. Validate the output data against the source files to ensure data processing completed correctly;

6. Verify that the format of the output data files meets the requirements of the big data business.
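Checks 2-4 above can be sketched with an in-memory word count: run the map and reduce logic over the whole input ("single node"), then over split inputs that are merged afterwards ("multiple nodes"), and compare. This is an illustrative simulation, not the Hadoop API:

```python
from collections import defaultdict

def map_phase(lines):
    """Emit (key, value) pairs, as in check 3."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    """Aggregate values per key, as in check 4."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

lines = ["a b a", "b c"]

# "Single node": process all lines at once.
single = reduce_phase(map_phase(lines))

# "Multiple nodes": split the input across two simulated nodes and merge,
# which is where incorrect aggregation would show up (check 2).
part1 = reduce_phase(map_phase(lines[:1]))
part2 = reduce_phase(map_phase(lines[1:]))
merged = defaultdict(int)
for part in (part1, part2):
    for k, v in part.items():
        merged[k] += v
merged = dict(merged)
```

If the single-node and merged multi-node results disagree, the aggregation or merge logic is the first place to look.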

c. Verifying big data ETL into the data warehouse

After the MapReduce process ends, the generated output files are moved into the data warehouse or other transactional systems as required. In this process, problems can be caused by conversion rules being applied improperly or by incomplete data being extracted from HDFS. For problems at this stage, the following methods can be adopted:

1. Verify that the conversion rules are applied correctly;

2. Check for data corruption by comparing target table data with the HDFS file data;

3. Verify that the target system data is loaded successfully;

4. Verify the data integrity of the target system.
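The four checks above can be sketched against in-memory records. The conversion rule, field names, and sample data below are hypothetical, chosen only to illustrate the comparisons:

```python
# Hypothetical conversion rule: normalise the name, cast amount to int.
def convert(record):
    return {"name": record["name"].strip().lower(),
            "amount": int(record["amount"])}

# Records "extracted from HDFS" (hypothetical sample data).
hdfs_records = [{"name": " Alice ", "amount": "10"},
                {"name": "Bob", "amount": "7"}]

# Records as loaded into the target table.
target_table = [convert(r) for r in hdfs_records]

# Checks 1/2: rule applied correctly, no corruption vs the source data.
rule_ok = all(t == convert(s) for s, t in zip(hdfs_records, target_table))
# Check 3: load completeness - same number of rows extracted and loaded.
counts_match = len(target_table) == len(hdfs_records)
# Check 4: basic integrity - required fields present and non-null.
integrity_ok = all(t["name"] and t["amount"] is not None
                   for t in target_table)
```

Against a real warehouse, the same comparisons would typically be expressed as row-count and checksum queries on staging and target tables.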

d. Analysis report validation

From the data in the data warehouse or Hive, analysis reports can be generated with reporting tools. In this process, report data problems can arise when the report definition fails to meet the requirements. Queries can be used to verify whether a report meets the business requirements.
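The query-based validation described above can be sketched with an in-memory SQLite database standing in for the warehouse. The table, data, and reported figure are hypothetical:

```python
import sqlite3

# Hypothetical warehouse table and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100), ("east", 50), ("west", 30)])

reported_total_east = 150  # figure shown on the report under test

# Re-derive the figure with an independent query, as the text suggests.
(actual_total_east,) = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east'").fetchone()

report_matches = (reported_total_east == actual_total_east)
conn.close()
```

Against Hive, the same cross-check would run an independent HiveQL aggregation and compare it with the figure the reporting tool displays.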
