<>Python Crawler Learning 19

After studying the urllib and requests libraries, you should have a preliminary grasp of Python crawlers. Next, we will learn how to use regular expressions (remember the loose end we left earlier?).

<>3. Regular expressions

When studying the requests library, we used its methods to fetch the source of a web page, that is, its HTML code. But the data we actually want is buried inside that HTML. By learning regular expressions, we can pull the information we want out of the HTML code.
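
As a rough illustration of the idea, the sketch below pulls the page title out of an HTML string with Python's built-in `re` module. The HTML string is a made-up sample rather than a real page, and the pattern is only one simple way to do it, assuming the title sits on a single line.

```python
import re

# Hypothetical HTML, standing in for the source code a request would return.
html = "<html><head><title>Example Page</title></head><body><p>Hello</p></body></html>"

# Capture whatever sits between <title> and </title>; .*? matches as little as possible.
match = re.search(r"<title>(.*?)</title>", html)

# re.search returns None when nothing matches, so guard before reading the group.
title = match.group(1) if match else None
print(title)  # Example Page
```

In a real crawler, `html` would be the text of a response instead of a hard-coded string.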

<>3-1. An introductory example

OSChina (Open Source China) provides a regular expression testing tool: enter the text to match, choose one of the commonly used regular expressions, and the corresponding matches are shown.

Open the site and enter some text. Suppose we want to match the URLs in it:

Similarly, suppose we want to match the phone numbers:

This is regular expression matching. Isn't it amazing?

Looking at it more closely, we find that to match a URL we need the following regular expression:
# Match a URL: [a-zA-Z]+://[^\s]*
At first glance it looks like a jumble of symbols, but it actually follows well-defined rules:

For example, a-z matches any lowercase letter, \s matches any whitespace character, and * matches the preceding element any number of times (including zero).

Once we write an expression like the one above according to these rules, the program uses it to search a messy string for the substrings that satisfy the rules.
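
The URL expression above can be tried directly with Python's built-in `re` module. This is only a minimal sketch: the sample text and its addresses are invented for the illustration, and the name `URL_PATTERN` is just a label chosen here.

```python
import re

# The URL pattern quoted above: one or more letters, "://",
# then any run of non-whitespace characters.
URL_PATTERN = re.compile(r"[a-zA-Z]+://[^\s]*")

# Hypothetical sample text, made up for this example.
text = "Hello, my site is https://www.example.com/index.html and my mirror is ftp://files.example.org ok"

# findall returns every non-overlapping match as a list of strings.
urls = URL_PATTERN.findall(text)
print(urls)  # ['https://www.example.com/index.html', 'ftp://files.example.org']
```

Because `[^\s]*` stops at the first whitespace character, each match ends where the URL is followed by a space.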

A table of the rules is attached below:

That's all for today. To be continued tomorrow…

