<> Partition concept

The term "partition" will be familiar to most developers. Many middleware products in the Java ecosystem use it, for example Kafka partitions and MySQL partitioned tables. The point of partitioning is to divide data sensibly according to business rules, so that the data in each partition can then be processed efficiently.

<>Hadoop partition

In Hadoop, partitioning decides which ReduceTask each piece of map output is sent to. Each ReduceTask then writes its results to its own output file.

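Hadoop's default partitioning rule

* Hash partitioning is used by default
* Partition number = hashCode of the key % number of ReduceTasks
* The default number of ReduceTasks is 1; it can be changed in the driver with job.setNumReduceTasks(n)
The rule is implemented by the HashPartitioner class, and the logic is easy to follow: it clears the sign bit of the key's hash code and takes the remainder, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.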
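As a rough illustration of the default rule, the following plain-Java sketch applies the same formula outside of Hadoop. The class name HashPartitionSketch and the sample words are made up for this example. It hashes ordinary String keys, whose hashCode is computed differently from that of Hadoop's Text, so the partition numbers it prints may not match those of a real job.

```java
// Plain-Java sketch of the default hash partitioning rule (no Hadoop dependency).
final class HashPartitionSketch {

    // Applies the formula (hashCode & Integer.MAX_VALUE) % numReduceTasks.
    // Masking with Integer.MAX_VALUE clears the sign bit, so a key with a
    // negative hashCode can never produce a negative partition number.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] words = {"hadoop", "spark", "hive", "flink"}; // made-up sample keys
        for (String word : words) {
            System.out.println(word + " -> partition " + partitionFor(word, 2));
        }
    }
}
```

With a single ReduceTask the remainder is always 0, which is why the default configuration produces exactly one output file.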

Hash partitioning code demonstration

Below is the driver code from the WordCount example. With no extra settings, the job produces a single output file containing the word counts. Suppose we add this line to the driver:

job.setNumReduceTasks(2);

The full driver with this line added is shown here. What happens when we run it again?
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DemoJobDriver {

    public static void main(String[] args) throws Exception {
        // 1. Get the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);

        // 2. Set the jar path
        job.setJarByClass(DemoJobDriver.class);

        // 3. Associate the Mapper and Reducer
        job.setMapperClass(DemoMapper.class);
        job.setReducerClass(DemoReducer.class);

        // 4. Set the key/value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5. Set the key/value types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6. Set the input and output paths
        String inputPath = "F:\\Net disk\\csv\\hello.txt";
        String outPath = "F:\\Net disk\\csv\\wordcount\\hello_result.txt";

        // Use 2 ReduceTasks, which produces 2 output files
        job.setNumReduceTasks(2);

        FileInputFormat.setInputPaths(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outPath));

        // 7. Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}


As you can see, the job now produces two result files with different contents. This is the default behavior: when the number of reducers is set to more than one, each key is assigned to a partition by the hash partitioning algorithm, and its result is written to the output file for that partition.
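To make the link between partitions and output files concrete, here is a small sketch that groups a few keys by the file they would end up in. It assumes the usual reducer output naming (part-r-00000, part-r-00001, and so on); the class OutputFileSketch and its helper methods are hypothetical, and the keys are plain Strings rather than Hadoop Text objects, so the grouping only illustrates the mechanism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: shows which output file each key would be written to.
final class OutputFileSketch {

    // Default hash rule, applied to a String key.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Reducer output files are numbered by partition: part-r-00000, part-r-00001, ...
    static String outputFileFor(int partition) {
        return String.format("part-r-%05d", partition);
    }

    // Groups keys by the output file of the partition they hash to.
    static Map<String, List<String>> groupByFile(List<String> keys, int numReduceTasks) {
        Map<String, List<String>> files = new TreeMap<>();
        for (int p = 0; p < numReduceTasks; p++) {
            // Every ReduceTask gets an entry, even if no key hashes to it
            files.put(outputFileFor(p), new ArrayList<>());
        }
        for (String key : keys) {
            files.get(outputFileFor(partitionFor(key, numReduceTasks))).add(key);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hello", "world", "hadoop", "java"); // made-up sample
        groupByFile(words, 2).forEach((file, keys) -> System.out.println(file + " <- " + keys));
    }
}
```

Each key appears in exactly one file, which is why the two result files never overlap.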

<> To customize a partition

* Write a custom class that extends Partitioner
* Override the getPartition method, and in it route records to different partitions according to business rules
* In the job's driver class, register the custom Partitioner class
* After defining the custom Partitioner, set the number of ReduceTasks to match its partitioning logic
<> Business requirements

Given an input file of people's names, partition the records by surname: names with the surname Ma (马) go to the first partition, names with the surname Li (李) go to the second partition, and all other names go to the third partition.

Custom partitioner
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        String key = text.toString();
        if (StringUtils.isNotEmpty(key.trim())) {
            if (key.startsWith("马")) {        // surname Ma: first partition
                return 0;
            } else if (key.startsWith("李")) { // surname Li: second partition
                return 1;
            }
        }
        // All other keys, including blank ones, go to the third partition.
        // Returning numPartitions itself would be out of range and fail the job.
        return 2;
    }
}
Register the custom partitioner in the driver class. Note that the number of ReduceTasks must match the number of partitions the custom partitioner produces:
job.setNumReduceTasks(3);
job.setPartitionerClass(MyPartioner.class);
Now run the driver class and check the output. As expected, records with different surnames are written to different files.

Summary of custom partitioning

* If the number of ReduceTasks > the number of partitions produced by the custom Partitioner, the extra ReduceTasks receive no data and produce empty output files
* If 1 < the number of ReduceTasks < the number of partitions produced by the custom Partitioner, some records are assigned to a partition that has no ReduceTask, and the job fails with an exception
* If the number of ReduceTasks = 1, then no matter how many partitions the custom Partitioner defines, all data goes to that single ReduceTask and only one result file is produced
* Partition numbers must start at 0 and increase by one, with no gaps
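The rules above can be summarized in a small model. This is a simplified sketch of the observable behavior, not Hadoop's actual source: the class PartitionRulesSketch and the method resolvePartition are hypothetical, and the exception message is only an approximation of what a real job reports.

```java
import java.io.IOException;

// Simplified model of how a partition number is accepted or rejected.
final class PartitionRulesSketch {

    // customPartition is whatever a custom getPartition(...) would return.
    static int resolvePartition(int customPartition, int numReduceTasks) throws IOException {
        if (numReduceTasks == 1) {
            // A single ReduceTask receives everything; the custom result is not used
            return 0;
        }
        if (customPartition < 0 || customPartition >= numReduceTasks) {
            // No ReduceTask exists for this partition number
            throw new IOException("Illegal partition (" + customPartition + ")");
        }
        return customPartition;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(resolvePartition(2, 3)); // 3 ReduceTasks cover partitions 0..2
        System.out.println(resolvePartition(2, 5)); // accepted; ReduceTasks 3 and 4 get no data
        System.out.println(resolvePartition(2, 1)); // single ReduceTask takes all records
        try {
            resolvePartition(2, 2);                 // partition 2 has no ReduceTask
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The second call corresponds to the empty-output-file case: nothing fails, but the ReduceTasks that no partition number maps to have nothing to write.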
