Someone asked: as a college student, can you earn 3,000 a month with Python web crawling? If so, how many hours would you need to work each day, and how long does self-study take to reach that level?

Before I graduated, I used Scrapy to collect a large amount of data and sold it to many companies. For roughly three years I lived mainly on selling data, and at the time that income exceeded my salary.

The barrier to entry for crawling has dropped sharply in the past two years. Many companies already employ full-time crawler engineers, and many have built dedicated crawler tools that handle the basic scraping work and sell the results to whoever needs them. Pay for low-end crawler work is gradually falling, and demand for outsourced crawling has shrunk considerably as well.

Still, a monthly income of 3,000 is not a problem, provided you have reasonably mature practical skills: the mainstream crawling techniques on the market, the ability to understand requirements, the ability to communicate with clients, and so on. Maybe you haven't touched any of this yet, so how do you build these abilities?

1. College students:
A mathematics or computer-related major is ideal. If your programming is decent, study some crawling basics: a crawler library in one language, HTML parsing, content storage, and so on. Since students lack engineering experience, take on only small data-scraping jobs at first; avoid monitoring projects and large-scale scraping jobs. Take your time, and don't bite off too much at once.

2. Working professionals:
If you are already a crawler engineer, side jobs are easy to find. If not, it doesn't matter: as long as you work in IT, picking up some crawling knowledge shouldn't be hard. The advantage professionals have is familiarity with the project development process and rich engineering experience, which lets them reasonably assess a task's difficulty, time, and cost. You can try taking on large-scale scraping tasks, monitoring tasks, simulated login and scraping on mobile, and so on; the profit is considerable.

Starting from zero, the path generally divides into three stages:

* Stage 1, getting started: master the necessary basics, such as Python fundamentals and how network requests work;

* Stage 2, imitation: follow other people's crawler code, understand every line of it, and get familiar with the mainstream crawler tools;

* Stage 3, doing it yourself: at this stage you start to have your own ideas for solving problems and can design a crawler system independently.

The technology crawling involves includes, but is not limited to: proficiency in one programming language (Python is used as the example here), HTML knowledge, the basics of the HTTP protocol, regular expressions, database knowledge, common packet-capture tools, and crawler frameworks. Large-scale crawling additionally requires an understanding of distributed systems, message queues, common data structures and algorithms, and caching, and may even involve machine learning; a large-scale system rests on many technologies.
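To make two of these building blocks (HTML knowledge and regular expressions) concrete, here is a minimal extraction sketch; the HTML fragment and the pattern are invented for illustration:

```python
import re

# A made-up HTML fragment standing in for a downloaded page.
html = '<a href="/post/1">First</a> <a href="/post/2">Second</a>'

# Capture each link target and its visible text.
links = re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)
print(links)  # [('/post/1', 'First'), ('/post/2', 'Second')]
```

Regular expressions are brittle against real-world HTML, which is why dedicated parsers come up later in the reading list.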

Data analysis, data mining, and even machine learning are all inseparable from data, and that data is often obtained by crawlers. So even pursued as a specialty, crawling has a promising future.

Below are some books I read when teaching myself to crawl. Some can be skimmed, some deserve a close read, and the hands-on project parts need to be worked through at the keyboard:

《Python Crawler Development and Project Practice》

The book is divided into a basics part, an intermediate part, and an advanced part, covering in depth the knowledge and skills crawler development requires. It is suitable for beginners: it explains the fundamentals and also analyzes and solves the key difficulties.

Basics

Chapter 1: Reviewing Python programming

* Installing Python
* Setting up the development environment
* IO programming
* Processes and threads
* Network programming

Chapter 2: Web front-end fundamentals

* The W3C standards
* The HTTP standard
* Summary

Chapter 3: First look at web crawlers

* Overview of web crawlers
* Implementing HTTP requests in Python
* Summary
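The topic of issuing HTTP requests from Python can be sketched with the standard library's urllib (regardless of which library the book itself uses); a data: URL stands in for a real web page so the sketch runs offline:

```python
from urllib.request import urlopen

# A data: URL stands in for a real http:// page, so no network is needed.
url = "data:text/html,<title>hello</title>"
with urlopen(url) as resp:
    body = resp.read().decode("utf-8")
print(body)  # <title>hello</title>
```

Against a real site you would pass an http:// or https:// URL instead, and typically set a User-Agent header via urllib.request.Request.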

Chapter 4: HTML parsing methods

* Getting to know Firebug
* Regular expressions
* The powerful BeautifulSoup
* Summary
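As a taste of HTML parsing, here is a sketch using only the standard library's html.parser; it crudely mimics what a tag search in BeautifulSoup does (the HTML fragment is made up):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, similar in spirit to
    BeautifulSoup's find_all('a'), but using only the stdlib."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<p><a href="/a">A</a><a href="/b">B</a></p>')
print(parser.links)  # ['/a', '/b']
```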
Chapter 5: Data storage (without a database)

* Extracting text from HTML
* Extracting multimedia files
* Email notifications
* Summary
Chapter 6: Hands-on project: a basic crawler

* Basic crawler architecture and workflow
* The URL manager
* The HTML downloader
* The HTML parser
* Data storage
* The crawler scheduler
* Summary
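The architecture this chapter describes (URL manager, downloader, parser, storage, scheduler) can be sketched in a few lines. The downloader here is stubbed with a hard-coded dict, so the "site" and its pages are entirely invented:

```python
# A toy version of the basic crawler architecture: URL manager, downloader,
# parser, data store, and a scheduler loop tying them together.
FAKE_SITE = {
    "/": ("home", ["/a", "/b"]),
    "/a": ("page a", ["/b"]),
    "/b": ("page b", []),
}

def download(url):                 # downloader (stubbed, no network)
    return FAKE_SITE[url]

def parse(page):                   # parser: yields (data, outgoing urls)
    text, links = page
    return text, links

def crawl(seed):
    todo, seen, store = [seed], set(), []   # URL manager + data store
    while todo:                             # scheduler loop
        url = todo.pop()
        if url in seen:                     # skip already-visited pages
            continue
        seen.add(url)
        data, links = parse(download(url))
        store.append(data)
        todo.extend(links)
    return store

print(sorted(crawl("/")))  # ['home', 'page a', 'page b']
```

A real crawler swaps the stubbed downloader for HTTP requests and adds politeness delays and error handling, but the control flow is the same.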

Chapter 7: Hands-on project: a simple distributed crawler

* Structure of a simple distributed crawler
* The control node
* The crawler nodes
* Summary

Intermediate

Chapter 8: Data storage (with a database)

* SQLite
* MongoDB, a better fit for crawlers
* …
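For the database-backed storage this chapter covers, here is a minimal SQLite sketch (the table name and rows are invented for illustration):

```python
import sqlite3

# In-memory database for illustration; a real crawler would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT)")

# INSERT OR REPLACE lets re-crawled pages overwrite stale rows.
rows = [("https://example.com/1", "First"), ("https://example.com/2", "Second")]
conn.executemany("INSERT OR REPLACE INTO pages VALUES (?, ?)", rows)
conn.commit()

titles = [t for (t,) in conn.execute("SELECT title FROM pages ORDER BY url")]
print(titles)  # ['First', 'Second']
```

The `?` placeholders matter: never build SQL by string formatting, especially with scraped (untrusted) content.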

Chapter 9: Crawling dynamic websites

* Ajax and dynamic HTML
* Dynamic crawler 1: crawling movie review data
* PhantomJS
* Selenium
* Dynamic crawler 2: crawling Qunar
* …
Chapter 10: Web-side protocol analysis

* Analyzing the login POST request
* The captcha problem
* www > m > wap
* …

Chapter 11: Client-side protocol analysis

* Packet-capture analysis of PC clients
* Packet-capture analysis of apps
* API crawlers: crawling mp3 resources

Chapter 12: A first look at the Scrapy framework

* The Scrapy crawler architecture
* Installing Scrapy
* Creating a cnblogs project
* Creating the crawler module
* Selectors
* Command-line tools
* Defining Items
* Pagination
* Building an Item Pipeline
* Built-in data storage
* Built-in image and file downloading
* Running the crawler
* Enhancing the crawler
* …

Chapter 13: Scrapy in depth

* Spiders revisited
* Item Loaders
* Item Pipelines revisited
* Requests and responses
* Downloader middleware
* Spider middleware
* Extensions
* Defeating anti-crawler measures
* …

Chapter 14: Hands-on project: a Scrapy crawler

* Creating a Zhihu crawler
* Defining Items
* Creating the crawler module
* Pipelines
* Optimizations
* Deploying the crawler
* …

Advanced

Chapter 15: Incremental crawlers

* Deduplication schemes
* The BloomFilter algorithm
* Scrapy with BloomFilter
* …
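The BloomFilter deduplication idea from this chapter can be sketched with nothing but hashlib; the sizes and hashing scheme below are illustrative choices, not the book's implementation:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: k hash probes into an m-bit array.
    May report false positives, but never false negatives, which
    makes it a compact "have I crawled this URL?" check."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # Derive k probe positions by salting the item with the probe index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("https://example.com/seen")
print("https://example.com/seen" in bf)   # True
print("https://example.com/new" in bf)    # likely False (tiny false-positive chance)
```

For serious use, m and k are sized from the expected URL count and acceptable false-positive rate.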

Chapter 16: Distributed crawling with Scrapy

* Redis basics
* Python and Redis
* MongoDB clusters
* …

Chapter 17: Hands-on project: distributed Scrapy

* Creating a Yunqi Academy crawler
* Defining Items
* Writing the crawler module
* Pipelines
* Anti-crawler countermeasures
* Deduplication optimization
* …

Chapter 18: The user-friendly PySpider framework

* PySpider vs. Scrapy
* Installing PySpider
* Creating a Douban crawler
* Selectors
* Ajax and HTTP requests
* PySpider and PhantomJS
* Data storage
* The PySpider crawler architecture
* …

A note: with learning materials, refinement matters more than quantity; more is not better. A programmer's study time is precious, so invest 80% of your time in the most valuable 20% of the material.

