<> Before we begin
For learning distributed crawlers, or any technology, there is no shortcut. There are two paths: first, practice repeatedly until it becomes second nature; second, read code shared by others and imitate it again and again until you can write it yourself.
If you have already run a simple distributed crawler, you may have found that writing one is mostly a shift in mindset: the way the code is written does not change much. Keep in mind, though, that we built the distributed crawler directly on scrapy-redis, standing on the shoulders of those who came before us. Still, as a first step toward understanding the "structure" of a distributed crawler, we have reached this milestone; next we need to make it more solid, so it is easy to build on.
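For reference, turning a plain scrapy project into a scrapy-redis distributed one is mostly a matter of settings. Below is a minimal sketch of the scrapy-redis settings involved; the Redis URL is a placeholder you would replace with your own server:

```python
# settings.py -- minimal scrapy-redis additions (a sketch; adjust for your project)

# Use the scrapy-redis scheduler so requests are queued in Redis
# and shared between all crawler processes.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Deduplicate request fingerprints in Redis instead of in local memory,
# so different machines do not crawl the same URL twice.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the Redis queues between runs so the crawl can be paused and resumed.
SCHEDULER_PERSIST = True

# Where the shared Redis server lives -- a placeholder, replace with yours.
REDIS_URL = "redis://127.0.0.1:6379/0"
```

With these settings in place, multiple copies of the same spider, started on different machines, all pull requests from the same Redis queue.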
Next, I'll work through three cases, writing distributed crawlers repeatedly until it feels effortless.
Today's reference site is Woshipm (Everyone Is a Product Manager). Apologies to the site: this is purely for learning purposes, and the crawled data will be deleted promptly.
<> Building the distributed crawler
<> Create a scrapy crawler project
scrapy.exe startproject woshipm
The command above creates the crawler project. Note that if scrapy.exe is not in your PATH environment variable, you need to change into the directory containing scrapy.exe before running the command.
<> Create a CrawlSpider crawler file
D:\python100\venv\Scripts\scrapy.exe genspider -t crawl pm xxx.com
When you create the crawler file with the command above, make sure you run it from inside the spiders folder; if you did not, copy the generated pm.py file into the spiders folder.
Once the two steps above are complete, the directory structure is as follows.
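Assuming scrapy's default templates, the layout after these two steps should look roughly like this (with pm.py moved into the spiders folder if needed):

```
woshipm/
├── scrapy.cfg
└── woshipm/
    ├── __init__.py
    ├── items.py
    ├── middlewares.py
    ├── pipelines.py
    ├── settings.py
    └── spiders/
        ├── __init__.py
        └── pm.py
```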