I graduated in '14 and went straight into my current company. Data mining was very popular at that time. In some people's eyes we were mysterious, doing research that felt very high-end; in other people's eyes we were handymen, sent wherever help was needed; and some people had decided that all we did was talk big.

   The real situation: when there is a data mining project, we work on it; when there is no project, we do training, system requirements analysis, and product design. It does look high-end, but in fact the job is doing odd jobs and talking big ~

  For 4 years, most of the time I was drifting along. At first I felt that what I did was really high-end: I could just say a word the developers had never heard of, and the concepts were all relatively new. As the popularity of data mining declined, and after the three major events of life (getting married, buying a house, having a baby), I started looking back, positioning the present, and looking to the future. Only then did I start to ask: what field am I in? Where is the future? What needs to be done now? Over these years I have picked up a lot of domain knowledge and workplace experience, and my life experience is rich enough. I know a bit of everything, but nothing well enough. The picture of the industry in my mind, along with my work and life, urgently needs to be sorted out in writing. Only in this way can I, in my 5th year in the workplace, lay out a plan for the next 5 years.

   First: what is data mining? This post covers the tasks of data mining, the problems it has to solve, and the data mining process. Most of it is textbook or business theory, but it has been verified in my own industry practice, and I typed it out word by word. These are things I strongly agree with.

   What is data mining: discovering previously undiscovered, useful information in massive data

   Data mining tasks: classification, prediction, association, clustering

   Problems to be solved in data mining: massive data, high dimensionality, scalability, and multi-type data, that is, heterogeneous data and complex data (with improved performance 【efficiency and effectiveness】 as the standard)

   The fields of data mining: data mining is both an interdisciplinary subject and an applied field


   Application (improve modeling effect): statistics, artificial intelligence, machine learning, and pattern recognition

   Foundations (improve operational efficiency): database technology, parallel computing, distributed computing


   Data mining process

   The data mining process described below is a common cross-industry data mining process. With a good methodology, the work is half done. This methodology covers the whole process of a data mining project, including the objectives, tasks, and key implementation points of each stage. It is very practical, and it is also an industry-recognized standard.



   There are two key points to remember when using the project process :

  1. Data preprocessing in a data mining project may take up a large share of the working time;

  2. A data mining project is not completed in a single pass; it is optimized through continuous iteration until the best result is obtained.

   Business understanding:

  【Stage objectives】

   Identify the business problem and the data mining objectives

   Develop the project plan

  【Tasks】

   Business requirements research: understand the background of the business problem

   Project environment assessment: identify the resources needed (people, cost, data, stakeholders)

   Business objective definition: define the business objectives and success criteria

   Mining objective definition: define the data mining objectives and success criteria

   Project planning: guide project implementation

  【Key points of implementation】

   Thorough requirements research and communication

   Reasonable resource, constraint, and assumption settings

   An appropriate application scenario for the mining results

   Data understanding:

  【Stage objectives】

   Determine the data needed for modeling

   Explore the target variable needed for modeling

  【Tasks】

   Data dictionary compilation: sort out internal and external data types

   Data extraction definition: clarify the business meaning of the data indicators (extraction definition, extraction period, and range of each feature)

   Mapping rule definition: clarify the business rules for data usage

   Quality verification: ensure the data is usable

   Target variable exploration: prepare for model building

  【Key points of implementation】

   The necessary internal and external data are available

   Data consistency, completeness, and accuracy

   Preliminary analysis and determination of target factors
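
The quality verification task above can be illustrated with a small Python sketch. It is only one possible check, not a standard tool: the `completeness` helper, the field names, and the sample records are all hypothetical, and it measures completeness only (the share of non-missing values per field), under the assumption that `None` and empty strings count as missing.

```python
from typing import Any


def completeness(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Return, for each field, the share of records with a non-missing value."""
    total = len(records)
    result = {}
    for field in fields:
        # Assumption for this sketch: None and empty strings count as missing.
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = present / total if total else 0.0
    return result


# Hypothetical customer records gathered during data understanding.
records = [
    {"customer_id": 1, "age": 34, "monthly_fee": 58.0},
    {"customer_id": 2, "age": None, "monthly_fee": 88.0},
    {"customer_id": 3, "age": 27, "monthly_fee": None},
    {"customer_id": 4, "age": 45, "monthly_fee": 128.0},
]

report = completeness(records, ["customer_id", "age", "monthly_fee"])
for field, share in report.items():
    print(f"{field}: {share:.0%} complete")
```

A real project would add consistency and accuracy checks on top of this, driven by the mapping rules agreed with the business side.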

   Data preparation:

  【Stage objectives】

   Build a data mart or wide table

   Load the data effectively

  【Tasks】

   Data mart or wide table design

   ETL scripting

   Data cleaning, loading, and transformation

   Data quality verification

   Data standardization

  【Key points of implementation】

   Sound coding standards to guide scripting

   Accurate data mapping rules

   Efficient ETL to ensure the progress and quality of the project
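
One common reading of the data standardization task is rescaling numeric columns so that features on different scales become comparable. The sketch below assumes that reading and uses z-score scaling; the `z_score` helper and the `monthly_fee` column are hypothetical, and other schemes (such as min-max scaling) may suit a given project better.

```python
from statistics import mean, pstdev


def z_score(values: list[float]) -> list[float]:
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


monthly_fee = [58.0, 88.0, 128.0, 158.0]  # hypothetical wide-table column
scaled = z_score(monthly_fee)
print([round(s, 3) for s in scaled])
```

In practice the mean and standard deviation would be computed on the training sample only and then reused for the other samples.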

   Data modeling:

  【Stage objectives】

   Choose the right technique for modeling

   Achieve the data mining objectives

  【Tasks】

   Technique selection: select an appropriate modeling algorithm

   Sample selection: determine the training, test, and validation samples

   Modeling: variable selection, model training, and model testing

   Model evaluation: evaluate whether the model meets the data mining objectives

  【Key points of implementation】

   The right technique helps achieve the mining objectives

   Sample data that truly reflects the business needs

   Variables that effectively explain business phenomena

   Comprehensive evaluation of the model's mining performance
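
The sample selection task above can be sketched as a simple random split. This is a minimal illustration under assumed proportions (60% training, 20% validation, 20% test); the `split_samples` helper and its parameters are hypothetical, and a real project might need stratified or time-based sampling instead.

```python
import random


def split_samples(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test samples."""
    if not 0 < train + validation < 1:
        raise ValueError("train + validation must leave room for a test sample")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


train_set, val_set, test_set = split_samples(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Keeping the three samples disjoint is what allows the later model testing step to say something about unseen data.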

   Model evaluation:

  【Stage objectives】

   Test the model in business application

   Determine whether the business objectives are achieved

  【Tasks】

   Model trial: determine the business scenarios, run the model application test, and collect feedback

   Impact assessment: evaluate and analyze the trial results to determine whether the model meets the business objectives

   Marketing advice: extract marketing rules from the trial results and give marketing suggestions

  【Key points of implementation】

   A suitable trial plan for the business scenario

   Comprehensive and rigorous evaluation of the results

   Targeted marketing suggestions
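
One way the impact assessment could be quantified is by comparing the response rate of a model-selected group with that of a control group. The sketch below assumes that setup; the helper names and all the numbers are hypothetical and are not results from any real trial.

```python
def response_rate(responders: int, contacted: int) -> float:
    """Share of contacted customers who responded."""
    if contacted <= 0:
        raise ValueError("contacted must be positive")
    return responders / contacted


def lift(model_rate: float, baseline_rate: float) -> float:
    """Ratio of the model group's response rate to the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return model_rate / baseline_rate


# Hypothetical trial feedback: a model-selected group and a randomly chosen control group.
model_rate = response_rate(responders=120, contacted=1000)
control_rate = response_rate(responders=40, contacted=1000)
trial_lift = lift(model_rate, control_rate)
print(f"model {model_rate:.1%}, control {control_rate:.1%}, lift {trial_lift:.2f}")
```

Whether a given lift counts as meeting the business objective depends on the success criteria fixed during business understanding.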

   Model deployment:

  【Stage objectives】

   Deploy the data mining results to the business environment and apply them in production

  【Tasks】

   Deployment planning: develop the deployment plan and approach

   Monitoring and maintenance: track in real time and verify that the business objectives are achieved

   Summary report: accumulate experience

  【Key points of implementation】

   Sound planning to ensure seamless deployment

   Real-time monitoring and maintenance response to keep the system running

   Comprehensive summary and analysis to accumulate experience

   The skills involved in the data mining process include business understanding, data development, and knowledge of statistics and artificial intelligence. It demands very strong all-round ability from the individual: communication skills, business analysis ability, SQL skills, mining and modeling ability, and so on. The charm of data mining is that you have to keep broadening your knowledge and looking for the best approach. You need to communicate with the people in the project, understand the business, apply the technology, and also manage the whole project, which makes the role feel more like that of a project manager. In the future, one can move toward project management or product management.


©2019-2020 Toolsou All rights reserved.