<> Preface

When it comes to learning distributed crawlers, or any technology, there are no shortcuts. There are two paths: first, practice repeatedly, because practice makes perfect; second, read code that others have shared and rewrite it over and over until you can write it yourself.

I believe that once you have a simple distributed crawler running, you will find that going distributed is mostly a change of mindset; the way the code is written does not change much. Keep in mind, though, that we built the distributed crawler directly with scrapy-redis, so we were standing on the shoulders of our predecessors. Still, as a first step toward understanding the "structure" of distributed crawlers, we have reached a milestone. Next, we need to make this milestone more solid so that the climb ahead is easier.

Next, I'll use 3 cases to practice writing distributed crawlers again and again, until it becomes second nature.

Today's reference website is Everyone Is a Product Manager. Disclaimer: this is for learning purposes only, and the crawled data will be deleted promptly.

<> Building the distributed crawler

<> Create a Scrapy crawler project

scrapy.exe startproject woshipm

The command above creates a crawler project. Note that if scrapy.exe is not on your PATH environment variable, you need to navigate to the directory containing scrapy.exe first and then run the command.
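
If it is unclear whether scrapy can be found from the command line, a small check like the one below may help. It is a minimal sketch using only the Python standard library. The function name locate_scrapy is made up for illustration, and the sketch assumes Scrapy was installed into the same Python environment that runs the script.

```python
import shutil
import sysconfig
from pathlib import Path


def locate_scrapy():
    """Return the path to the scrapy executable, or None if it cannot be found."""
    # First, see whether the command is reachable through PATH.
    found = shutil.which("scrapy")
    if found:
        return found
    # Otherwise look in this interpreter's scripts directory
    # (for example venv\Scripts on Windows, venv/bin elsewhere).
    scripts_dir = Path(sysconfig.get_path("scripts"))
    for name in ("scrapy.exe", "scrapy"):
        candidate = scripts_dir / name
        if candidate.is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    print(locate_scrapy() or "scrapy not found; check that it is installed")
```

If the function returns a path that is not on PATH, that full path can be used in place of scrapy.exe in the commands in this article.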

<> Create a CrawlSpider crawler file

D:\python100\venv\Scripts\scrapy.exe genspider -t crawl pm xxx.com

When creating the crawler file with the command above, make sure it ends up inside the spiders folder. If it does not, copy the generated pm.py file and paste it into the spiders folder.
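
The copy step can also be scripted. The following is a rough sketch rather than part of Scrapy itself: the helper name ensure_in_spiders is hypothetical, and the paths in the usage note are assumptions based on the project name woshipm used above.

```python
import shutil
from pathlib import Path


def ensure_in_spiders(spider_file, spiders_dir):
    """Copy spider_file into spiders_dir unless a file of that name is already there."""
    spider_file = Path(spider_file)
    spiders_dir = Path(spiders_dir)
    target = spiders_dir / spider_file.name
    if target.exists():
        # Already in place, nothing to do.
        return target
    if not spider_file.is_file():
        raise FileNotFoundError(f"{spider_file} does not exist")
    # Create the spiders folder if it is missing, then copy the file over.
    spiders_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(spider_file, target)
    return target
```

For example, if pm.py was generated in the current directory, calling ensure_in_spiders("pm.py", "woshipm/woshipm/spiders") would place a copy in the project's spiders folder, assuming the default project layout.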

With these two steps completed, the project has the standard Scrapy layout, with pm.py sitting inside the spiders folder.
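
To inspect the layout on your own machine, a short script such as the following can print the directory tree. This is an illustrative sketch using only the standard library, and render_tree is a made-up helper name. For a default Scrapy project one would typically expect scrapy.cfg at the top level, plus an inner woshipm package containing items.py, middlewares.py, pipelines.py, settings.py, and the spiders folder with pm.py. __pycache__ folders may also appear.

```python
from pathlib import Path


def render_tree(root, indent=""):
    """Return the directory tree under root as a list of text lines."""
    root = Path(root)
    lines = [f"{indent}{root.name}/"]
    # Directories and files are listed together in alphabetical order.
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        if entry.is_dir():
            lines.extend(render_tree(entry, indent + "    "))
        else:
            lines.append(f"{indent}    {entry.name}")
    return lines
```

Running print("\n".join(render_tree("woshipm"))) from the directory where the project was created prints the tree, assuming the project folder is named woshipm as above.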
