I graduated in 2014 and joined my current company. Data mining was very popular at that time. To some people we were mysterious and our work seemed very high-end; to others we were handymen who went wherever we were needed; and some were convinced that all we did was talk big.
The real situation is this: when there is a data mining project, we work on it; when there is no project, we run training sessions, do system requirements analysis and product design. It does look high-end, but in fact it is a job of doing odd chores and talking big ~
For those 4 years I was drifting most of the time. At first I felt that what I did really was high-end: I only had to say a word the developers had never heard of, and the concepts were relatively new. As the popularity of data mining declined, and after the three major events of life (getting married, buying a house, having a baby), I started to look back, take stock of the present and look to the future. Only then did I begin to ask: what field am I actually in? Where is its future? What needs to be done now? Over these years I have touched a great deal of domain knowledge and gathered plenty of workplace and life experience. I know a bit of everything but nothing well enough. My picture of the industry, my work and my life all urgently need sorting out in writing. Only then can I, in my fifth year in the workplace, plan the next five years.
To start: what data mining is, the tasks of data mining, the problems it has to solve, and the data mining process. Most of this article comes from books or business theory, but it has been verified in my own industry work, and I typed it out word by word. It is material that is widely recognized.
What is data mining: discovering previously unknown, useful information in massive data
Data mining tasks: classification, prediction, association, clustering
Problems to be solved in data mining: massive volume, high dimensionality, scalability, and multiple data types such as heterogeneous data and complex data (with improved performance 【 efficiency and effectiveness 】 as the standard)
The fields of data mining: data mining is a comprehensive discipline and application that draws on several fields
{
Application (improve modeling effect): statistics, artificial intelligence, machine learning and pattern recognition
Foundation (improve operating efficiency): database technology, parallel computing, distributed computing
}
Data mining process
The data mining process described below is a common cross-industry data mining process. With a good methodology, the job is half done. This methodology covers the whole course of a data mining project, including the goals, tasks and key implementation points of each stage. It is very practical, and it is also an industry-recognized standard.
There are two key points to remember when using the project process :
1. Data preprocessing may take up a large share of the working time in a data mining project;
2. The data mining process is not completed in a single pass; it is iterated and optimized continuously until the best result is obtained.
Business understanding :
【 Stage objectives 】
Identify the business problem and the data mining objectives
Develop the project plan
【 Tasks 】
Business requirements research: understand the background of the business problem
Project environment assessment: identify the resources needed (people, cost, data, parties involved)
Business objective definition: define the business objectives and success criteria
Mining objective definition: define the data mining objectives and success criteria
Project planning: guide project implementation
【 Key points of implementation 】
Thorough requirements research and communication
Reasonable resources, constraints and assumptions
Appropriate application scenarios for the mining results
Data understanding :
【 Stage objectives 】
Determine the data needed for modeling
Explore the target variables needed for modeling
【 Tasks 】
Data dictionary compilation: sort out internal and external data types
Data extraction definition: clarify the business meaning of each data indicator (the extraction definition, extraction period and range of each feature)
Mapping rule definition: clarify the business rules for data usage
Quality verification: ensure the data is usable
Target variable exploration: prepare for model building
【 Key points of implementation 】
The necessary internal and external data are available
Data consistency, completeness and accuracy
Preliminary analysis and determination of target factors
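The quality verification task above can be illustrated with a small check for missing values and duplicate keys. This is only a minimal sketch using the Python standard library; the function name quality_report and the field names customer_id, age and monthly_fee are hypothetical and do not come from any particular project.

```python
from typing import Any


def quality_report(rows: list[dict[str, Any]], required: list[str], key: str) -> dict[str, Any]:
    """Count missing values per required field and duplicate key values."""
    missing = {col: 0 for col in required}
    seen: set[Any] = set()
    duplicate_keys = 0
    for row in rows:
        for col in required:
            # Treat absent fields, None and empty strings as missing.
            if row.get(col) in (None, ""):
                missing[col] += 1
        value = row.get(key)
        if value in seen:
            duplicate_keys += 1
        else:
            seen.add(value)
    return {"rows": len(rows), "missing": missing, "duplicate_keys": duplicate_keys}


# Hypothetical sample records, for illustration only.
rows = [
    {"customer_id": 1, "age": 34, "monthly_fee": 58.0},
    {"customer_id": 2, "age": None, "monthly_fee": 88.0},
    {"customer_id": 2, "age": 41, "monthly_fee": ""},
]
report = quality_report(rows, required=["age", "monthly_fee"], key="customer_id")
print(report)
# {'rows': 3, 'missing': {'age': 1, 'monthly_fee': 1}, 'duplicate_keys': 1}
```

In a real project the same checks would normally run against the source tables, and the thresholds for "usable" data would be agreed with the business side.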
Data preparation :
【 Stage objectives 】
Build a data mart or wide table
Load the data effectively
【 Tasks 】
Data mart or wide table design
ETL script development
Data cleaning, loading and transformation
Data quality verification
Data standardization
【 Key points of implementation 】
Sound coding standards to guide development
Accurate data mapping rules
Efficient ETL to safeguard project progress and quality
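One common reading of the data standardization task is rescaling a numeric column to zero mean and unit variance (z-score), so that features on different scales become comparable. The sketch below assumes that reading; in some projects "standardization" instead means unifying codes and formats. The function name standardize is hypothetical.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical monthly fees, for illustration only.
scaled = standardize([2.0, 4.0, 6.0])
print(scaled)
```

The mean and standard deviation should be computed on the training data and then reused for new data, otherwise the scaled values are not comparable between modeling and production.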
Data modeling :
【 Stage objectives 】
Choose the right technique for modeling
Achieve the data mining objectives
【 Tasks 】
Technique selection: select the appropriate model and algorithm
Sample selection: determine the training, test and validation samples
Modeling: variable selection, model training, model testing
Model evaluation: evaluate whether the model meets the data mining objectives
【 Key points of implementation 】
The right technique helps achieve the mining objectives
Sample data that truly reflects business needs
Variables that effectively explain business phenomena
Comprehensive evaluation of the model's data mining effect
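The sample selection task above (training, test and validation samples) can be sketched as a seeded random split. This is a minimal illustration, assuming a simple random split is appropriate; the function name split_samples and the 60/20/20 proportions are assumptions, and real projects often need stratified or time-based splits instead.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_samples(
    records: Sequence[T],
    train: float = 0.6,
    validation: float = 0.2,
    seed: int = 42,
) -> tuple[list[T], list[T], list[T]]:
    """Shuffle records and split them into train, validation and test lists."""
    if not 0 < train < 1 or not 0 <= validation < 1 or train + validation >= 1:
        raise ValueError("train and validation must leave a positive share for test")
    shuffled = list(records)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],
    )


train_set, validation_set, test_set = split_samples(list(range(10)))
print(len(train_set), len(validation_set), len(test_set))  # 6 2 2
```

Keeping the test sample untouched until the final model evaluation is what makes the evaluation result trustworthy.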
Model evaluation :
【 Stage objectives 】
Test the model in business application
Determine whether the business objectives are achieved
【 Tasks 】
Model trial: determine the business scenarios, run application tests of the model, collect feedback
Impact assessment: evaluate and analyze the test results to determine whether the model meets the business objectives
Marketing advice: extract marketing rules from the trial results and give marketing suggestions
【 Key points of implementation 】
A trial plan suited to the business scenario
Comprehensive and rigorous evaluation of the effect
Targeted marketing suggestions
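One way to carry out the impact assessment above, assuming the trial is a marketing campaign with a model-selected group and a randomly selected control group, is to compare response rates and compute the lift. This is a hedged sketch rather than a prescribed method; the function names and the sample counts are hypothetical.

```python
def response_rate(responders: int, contacted: int) -> float:
    """Share of contacted customers who responded."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return responders / contacted


def lift(model_rate: float, baseline_rate: float) -> float:
    """How many times better the model group responds than the baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return model_rate / baseline_rate


# Hypothetical trial counts, for illustration only.
model_rate = response_rate(responders=120, contacted=1000)
control_rate = response_rate(responders=40, contacted=1000)
trial_lift = lift(model_rate, control_rate)
print(f"model {model_rate:.1%}, control {control_rate:.1%}, lift {trial_lift:.2f}")
```

A lift clearly above 1 suggests the model selects better targets than random selection; whether that meets the business objective depends on the success criteria agreed in the business understanding stage.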
Model deployment :
【 Stage objectives 】
Deploy the data mining results to the business environment and apply them in production
【 Tasks 】
Planning and deployment: develop the deployment plan and scheme
Monitoring and maintenance: track in real time and verify that the business objectives are achieved
Summary report: accumulate experience
【 Key points of implementation 】
Sound planning to ensure seamless deployment
Real-time monitoring and maintenance response to keep operations running
Comprehensive summary and analysis to accumulate experience
The skills involved in the data mining process include business understanding, data development, and knowledge of statistics and artificial intelligence. It demands very strong all-round ability from the individual: communication skills, business analysis ability, SQL skills, mining and modeling ability, and so on. The charm of data mining is that you have to keep broadening your knowledge and keep looking for the best approach. You need to communicate with the people on the project, understand the business, apply the technology, and also manage the whole project, so the role is more like that of a project manager. In the future, one can move in the direction of project management or product management.