Regular expressions are a tool for working with strings and are used often in Python. Web crawlers, for example, commonly use them to pull specific strings out of scraped data. Support for regular expressions is built into Python and becomes available by importing the re module. Most Python beginners have at least heard the term "regular expression".

Today's article is a fairly detailed reference for regular expressions in Python. Working through it should leave you comfortable with the common patterns.

<> I. The re module

Before looking at regular expressions themselves, we need to know where they are used. In this article they are mostly passed to findall(), which can handle most string-retrieval tasks.

1. Import the re module
The re module must be imported before regular expressions can be used.
import re
2. Syntax of findall()

With the re module imported, findall() is ready to use, so let's be clear about its syntax:
re.findall(pattern, string)
findall() takes a regular expression (the pattern) and a target string. The target string is the text you want to search; the pattern describes what to look for, and it is the focus of this article.

findall() returns a list containing every substring of the target string that matches the pattern.
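As a small illustration of the return value, the sketch below calls re.findall() once with a pattern that occurs in the text and once with one that does not. The sample string and variable names are made up for this example.

```python
import re

# Sample text invented for this illustration.
text = "cat bat cat rat"

# Every non-overlapping match is returned, in the order it appears.
matches = re.findall("cat", text)
print(matches)

# When nothing matches, the result is an empty list rather than None.
no_matches = re.findall("dog", text)
print(no_matches)
```

Because the result is always a list, it can be looped over or passed to len() without first checking for None.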

<> II. Regular expressions

<>(I) Matching strings

1. Ordinary characters
Most letters and characters simply match themselves.
import re
a = "abc123+-*"
b = re.findall('abc', a)
print(b)
Output:
['abc']
2. Metacharacters

Metacharacters are special characters such as . ^ $ * + ? {} \ []. They let us tailor the search of the target string and return exactly the result we want.

This section introduces 10 common metacharacters and explains how to use each one in turn.

(1) []

[] has three main uses:

* Specifying a character set.
s = "a123456b"
rule = "a[0-9][1-6][1-6][1-6][1-6][1-6]b"
# This long-winded form is used only for now; a shorter way that avoids typing [1-6] so many times comes later
l = re.findall(rule, s)
print(l)
Output:
['a123456b']
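* Expressing a range.
For example, to pick out of the string "abcabcaccaac" every three-character substring that starts with a, ends with c and has a, b or c in between:
s = "abcabcaccaac"
rule = "a[a-c]c"  # same as "a[abc]c"; commas are not separators inside [], they are literal characters
# Ranges can be combined in one set, e.g. [a-z0-9]
l = re.findall(rule, s)
print(l)
Output:
['abc', 'abc', 'acc', 'aac']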
* Metacharacters lose their special meaning inside [] and stand for ordinary characters.
For example, to pick out "caa^" from the string "caa^bcabcaabc":
print(re.findall("caa[a,^]", "caa^bcabcaabc"))
Output:
['caa^']
Note: when ^ is the first character inside [], it negates the set, so the set matches any character except the ones listed (here, the comma and a). For example, move ^ to the front:
print(re.findall("caa[^,a]", "caa^bcabcaabc"))
Output:
['caa^', 'caab']
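The three behaviours of [] described above can be compared side by side. This is only a sketch on an invented sample string; it shows a literal comma inside a set, a negated set, and a caret that is not in first position.

```python
import re

# Sample text invented for this illustration.
sample = "a,c abc a^c axc"

# Inside [] a comma is an ordinary character, so this set contains 'b' and ','.
with_comma = re.findall(r"a[b,]c", sample)
print(with_comma)

# A leading ^ negates the set: any single character except 'b' and ','.
negated = re.findall(r"a[^b,]c", sample)
print(negated)

# A ^ that is not first is just a literal caret.
literal_caret = re.findall(r"a[b^]c", sample)
print(literal_caret)
```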

(2) ^

^ is usually used to match the start of a line, for example:
print(re.findall("^abca", "abcabcabc"))
Output:
['abca']

(3) $
$ is usually used to match the end of a line, for example:
print(re.findall("abc$", "accabcabc"))
Output:
['abc']
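One detail worth checking for yourself: by default, ^ and $ anchor to the start and end of the whole string, and they only work line by line when the re.MULTILINE flag is passed. The sketch below uses an invented three-line string to show the difference.

```python
import re

# Sample multi-line text invented for this illustration.
log = "abc one\nabc two\nend abc"

# By default ^ anchors only at the very start of the string.
start_default = re.findall(r"^abc", log)
print(start_default)

# With re.MULTILINE, ^ also matches right after each newline.
start_multiline = re.findall(r"^abc", log, flags=re.MULTILINE)
print(start_multiline)

# $ anchors at the end of the string (or of each line with re.MULTILINE).
end_default = re.findall(r"abc$", log)
print(end_default)
```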


(4) \

A backslash followed by certain characters takes on a special meaning. The three most common cases are listed below. The patterns are written as raw strings, r"...", so that Python passes the backslash through to re unchanged.

* \d
Matches any decimal digit; equivalent to [0-9], for example:
print(re.findall(r"c\d\d\da", "abc123abc"))
Output:
['c123a']
A backslash can also escape a metacharacter so that it is treated as an ordinary character, for example:
print(re.findall(r"\^abc", "^abc^abc"))
Output:
['^abc', '^abc']
* \s
Matches any whitespace character, for example:
print(re.findall(r"\s\s", "a    c"))
Output:
['  ', '  ']
* \w
Matches any letter, digit or underscore; for ASCII text this is equivalent to [a-zA-Z0-9_], for example:
print(re.findall(r"\w\w\w", "abc12_"))
Output:
['abc', '12_']
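The shorthand classes \d, \s and \w can be combined in one pattern. The sketch below runs them on an invented record string, and also checks that a raw string r"\d" is the same pattern text as the escaped form "\\d", which is why raw strings are the usual choice for patterns.

```python
import re

# Sample text invented for this illustration.
record = "id_42 was seen on 2020 05 17"

# Four decimal digits in a row.
digits = re.findall(r"\d\d\d\d", record)
print(digits)

# Two word characters, an underscore, then two digits.
ids = re.findall(r"\w\w_\d\d", record)
print(ids)

# Every single whitespace character.
spaces = re.findall(r"\s", record)
print(len(spaces))

# A raw string keeps the backslash as-is; the non-raw form must double it.
same_pattern = r"\d" == "\\d"
print(same_pattern)
```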


(5) {n}

{n} saves repeated typing. In the \w example above we wrote \w three times; with {n}, where n is the number of repetitions, once is enough. For example:
print(re.findall(r"\w{2}", "abc12_"))
Output:
['ab', 'c1', '2_']

(6) *

* means zero or more repetitions (matching as many as possible), for example:
print(re.findall(r"010-\d*", "010-123456789"))
Output:
['010-123456789']
(7) +

+ means one or more repetitions, for example:
print(re.findall(r"010-\d+", "010-123456789"))
Output:
['010-123456789']
(8) .

. is a dot (easy to miss here). It matches any character except a newline, for example:
print(re.findall(".", "010\n?!"))
Output:
['0', '1', '0', '?', '!']
(9) ?

? means zero or one repetition, for example:
print(re.findall(r"010-\d?", "010-123456789"))
Output:
['010-1']
Note the difference between greedy and non-greedy mode.

Greedy mode: match as much as possible. This is the default when \d is followed by a quantifier, for example \d*:
print(re.findall(r"010-\d*", "010-123456789"))
Output:
['010-123456789']
Non-greedy mode: match as little as possible. Add ? after the quantifier, for example \d*?:
print(re.findall(r"010-\d*?", "010-123456789"))
Output:
['010-']
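The difference between greedy and non-greedy matching is often easiest to see on text with repeated delimiters. The sketch below is an illustration on an invented snippet of tag-like text, not a recommendation to parse HTML with regular expressions.

```python
import re

# Sample text invented for this illustration.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string, so there is one long match.
greedy = re.findall(r"<.*>", html)
print(greedy)

# Non-greedy: .*? stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r"<.*?>", html)
print(lazy)
```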
(10) {m,n}

m and n are decimal numbers: the preceding item is repeated at least m times and at most n times, for example:
print(re.findall(r"010-\d{3,5}", "010-123456789"))
Output:
['010-12345']
Adding ? makes it match as few as possible:
print(re.findall(r"010-\d{3,5}?", "010-123456789"))
Output:
['010-123']
{m,n} can also be written in other flexible ways, for example:

* {1,} is equivalent to + described earlier
* {0,1} is equivalent to ? described earlier
* {0,} is equivalent to * described earlier
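These equivalences can be spot-checked by running each brace form and its shorthand on the same inputs. The sketch below does that for a few invented sample strings; it demonstrates the equivalence on these samples rather than proving it in general.

```python
import re

# Sample inputs invented for this illustration.
samples = ["010-", "010-1", "010-123456789", "no digits here"]

# Each pair holds a brace form and the shorthand it should behave like.
pairs = [
    (r"010-\d{1,}", r"010-\d+"),
    (r"010-\d{0,1}", r"010-\d?"),
    (r"010-\d{0,}", r"010-\d*"),
]

# True only if every pair gives identical results on every sample.
all_equal = all(
    re.findall(brace, text) == re.findall(short, text)
    for brace, short in pairs
    for text in samples
)
print(all_equal)
```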

That covers the common metacharacters and how to use them. Next, let's look at other aspects of regular expressions.

<>(II) Using regular expressions

1. Compiling a regular expression

In Python, the re module can compile a regular expression with compile(), called as re.compile(pattern), for example:
s = "010-123456789"
rule = r"010-\d*"
rule_compile = re.compile(rule)  # returns a compiled pattern object
# print(rule_compile)  # uncomment to see what compile() returns
s_compile = rule_compile.findall(s)
print(s_compile)
Output:
['010-123456789']
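A compiled pattern is mainly useful when the same expression is applied many times. The sketch below compiles one pattern and reuses it over several invented input lines; the variable names are chosen only for this example.

```python
import re

# Compile once, then reuse the pattern object for every line.
phone = re.compile(r"010-\d+")

# Sample lines invented for this illustration.
lines = ["call 010-12345678", "no number here", "fax 010-87654321 or 010-111"]

found = [phone.findall(line) for line in lines]
print(found)

# The original pattern text stays available on the compiled object.
print(phone.pattern)
```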
2. Using pattern objects

A pattern object is not limited to the findall() method introduced earlier. It has other methods with different effects; here is a brief summary:

* findall(): finds all substrings the pattern matches and returns them as a list
* search(): scans the string for the first location where the pattern matches
* match(): determines whether the pattern matches at the beginning of the string
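The three pattern methods findall(), search() and match() can be compared on one input. In this sketch, with an invented sample string, the text does not begin with a digit, so match() returns None while search() still finds the first number.

```python
import re

pattern = re.compile(r"\d+")

# Sample text invented for this illustration.
text = "abc 123 def 456"

# findall(): every match, as a list of strings.
all_numbers = pattern.findall(text)
print(all_numbers)

# search(): a Match object for the first match anywhere in the string.
first = pattern.search(text)
print(first.group())

# match(): only succeeds if the pattern matches at the start of the string.
at_start = pattern.match(text)
print(at_start)
```

Since search() and match() return None when nothing matches, code that calls methods on the result usually checks for None first.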

Taking the object returned by compile() above as an example, let's use match() instead of findall() and see what comes back:
s = "010-123456789"
rule = r"010-\d*"
rule_compile = re.compile(rule)  # returns a compiled pattern object
# print(rule_compile)  # uncomment to see what compile() returns
s_compile = rule_compile.match(s)
print(s_compile)
Output:
<re.Match object; span=(0, 13), match='010-123456789'>
The result is a Match object: the match spans indices 0 to 13 (end exclusive), and the matched text is 010-123456789. Since an object is returned, let's look next at some of the methods of the Match object.

3. Methods of the Match object

The methods are listed first, followed by an example. A Match object has several commonly used methods:

* group(): returns the string matched by the pattern
* start(): returns the position where the match starts
* end(): returns the position where the match ends
* span(): returns a tuple of the (start, end) positions
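The Match methods group(), start(), end() and span() can be tried together on a match that does not begin at index 0, which makes the positions easier to tell apart. The sample string below is invented for this sketch.

```python
import re

# Sample text invented for this illustration.
text = "order 12345 shipped"

m = re.search(r"\d+", text)

# search() returns None when nothing matches, so guard before using the result.
if m is not None:
    matched_text = m.group()          # the matched substring
    start, end = m.start(), m.end()   # end is one past the last matched index
    span = m.span()                   # the same two positions as a tuple
    print(matched_text, start, end, span)
    # Slicing the original string with start and end gives the match back.
    print(text[start:end] == matched_text)
```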

For example, use span() on the object returned by match():
s = "010-123456789"
rule = r"010-\d*"
rule_compile = re.compile(rule)  # returns a compiled pattern object
s_compile = rule_compile.match(s)
print(s_compile.span())  # use span() on the returned object
The result is:
(0, 13)

Functions of the re module

Besides findall() covered above, the re module has other functions. Here is a brief introduction:

(1) findall(pattern, string)
Returns all substrings that match the regular expression. It has been used throughout this article, so there is no need to say more.

(2) sub(pattern, replacement, string)
sub() replaces the parts of a string that match the pattern, for example:
s = "abcabcacc"  # original string
l = re.sub("abc", "ddd", s)  # string after processing by sub()
print(l)
Output:
ddddddacc  # every abc is replaced with ddd
(3) subn(pattern, replacement, string)
subn() also replaces matches, and additionally returns the number of replacements made:
s = "abcabcacc"  # original string
l = re.subn("abc", "ddd", s)  # result of subn(): (new string, count)
print(l)
Output:
('ddddddacc', 2)
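Both re.sub() and re.subn() also accept an optional count argument that limits how many matches are replaced. The sketch below reuses the string from the examples above to show the three variants side by side.

```python
import re

s = "abcabcacc"

# Replace every occurrence.
replaced_all = re.sub("abc", "ddd", s)
print(replaced_all)

# Replace only the first occurrence by passing count=1.
replaced_once = re.sub("abc", "ddd", s, count=1)
print(replaced_once)

# subn() returns a (new_string, number_of_replacements) tuple.
new_text, n = re.subn("abc", "ddd", s)
print(new_text, n)
```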
(4) split(pattern, string)
split() splits a string wherever the pattern matches, for example:
s = "abcabcacc"
l = re.split("b", s)
print(l)
Output:
['a', 'ca', 'cacc']
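Where re.split() goes beyond str.split() is that the separator is itself a pattern. The sketch below splits an invented string on any run of commas, semicolons or whitespace, and uses the optional maxsplit argument to stop after the first split.

```python
import re

# Sample text invented for this illustration, with mixed separators.
row = "red, green;blue  yellow"

# One or more of: comma, semicolon, whitespace.
fields = re.split(r"[,;\s]+", row)
print(fields)

# maxsplit=1 splits only at the first separator and keeps the rest intact.
first_two = re.split(r"[,;\s]+", row, maxsplit=1)
print(first_two)
```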

That is all for regular expressions. They are a basic skill in almost every area of Python work. Good luck on your Python journey!
