Regular expressions are a tool for processing strings, and they are used often in Python. For example, crawlers frequently rely on regular expressions to extract strings from the data they fetch. Regular expression support is built into Python and becomes available by importing the re module. Most Python beginners have at least heard the term "regular expression".

Today, let's go through a fairly detailed reference on Python regular expressions. After working through it, you should be comfortable using them.

<> I. The re module

Before we talk about regular expressions themselves, we first need to know where they are used. In this article they are passed to the findall() function; most string retrieval tasks can be done with findall().

1. Import the re module
Before using regular expressions, import the re module.
import re
2. Syntax of findall()

With the re module imported, findall() is ready to use, so we first need to be clear about its syntax:
findall(regular expression, target string)
findall() takes a regular expression and a target string. The target string is the text you want to search, and the regular expression describes how to search it, which is our focus today.

findall() returns a list containing the strings that match the regular expression.
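As a small sketch of the return value (the sample strings are made up for illustration), findall() gives back a list in both cases, and the list is simply empty when nothing matches:

```python
import re

# Sample text invented for this illustration.
text = "cat bat cat"

hits = re.findall("cat", text)    # every non-overlapping match, left to right
misses = re.findall("dog", text)  # no match gives an empty list, not None

print(hits)    # ['cat', 'cat']
print(misses)  # []
```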

<> II. Regular expressions

<> (I) String matching

1. Ordinary characters
Most letters and characters simply match themselves.
import re
a = "abc123+-*"
b = re.findall('abc', a)
print(b)
Output:
['abc']
2. Metacharacters

Metacharacters are special characters such as . ^ $ * + ? {} \ []. They let us tailor how the target string is searched and return exactly the result we want.

This section introduces 10 commonly used metacharacters and explains how to use each one in turn.

(1) []

[] has three main uses:

* Specifying a character set.
s = "a123456b"
rule = "a[0-9][1-6][1-6][1-6][1-6][1-6]b"
# This verbose form is only temporary; a shorter way that avoids repeating [1-6] comes later
l = re.findall(rule, s)
print(l)
Output:
['a123456b']
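* Representing a range, such as [a-z] or [0-9].
For example, to select the three-character substrings that start with a and end with c from the string "abcabcaccaac":
s = "abcabcaccaac"
rule = "a[abc]c"  # set members are listed without separators; a comma inside [] would itself be matched
# rule = "a[a-z0-9][a-z0-9][a-z0-9][a-z0-9]c"  # a range-based rule to try (it matches six characters, so the result differs)
l = re.findall(rule, s)
print(l)
Output:
['abc', 'abc', 'acc', 'aac']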
* Metacharacters inside [] lose their special meaning and stand for ordinary characters.
For example, to select "caa^" from the string "caa^bcabcaabc":
print(re.findall("caa[a^]", "caa^bcabcaabc"))
Output:
['caa^']
Note: when ^ is the first character inside [], it negates the set, so the set matches any character except those listed. For example, swap the positions of ^ and a inside []:
print(re.findall("caa[^a]", "caa^bcabcaabc"))
Output:
['caa^', 'caab']
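One detail about [] is easy to get wrong: characters in a set are written one after another, and a comma placed between them is not a separator but an ordinary member of the set. A minimal sketch, using a sample string invented for this purpose:

```python
import re

# Hypothetical sample string containing a literal comma.
s = "a,c abc"

with_commas = re.findall("a[a,b,c]c", s)   # the set is a, comma, b, c
without_commas = re.findall("a[abc]c", s)  # the set is a, b, c

print(with_commas)     # ['a,c', 'abc']
print(without_commas)  # ['abc']
```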
(2) ^

^ is usually used to match the beginning of a line, for example:
print(re.findall("^abca", "abcabcabc"))
Output:
['abca']
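By default ^ matches only at the very start of the string; it matches at the start of every line only when the re.MULTILINE flag is passed. The following sketch, with a two-line sample string made up for illustration, shows the difference:

```python
import re

# Two-line sample string invented for this illustration.
text = "abc one\nabc two"

default_mode = re.findall("^abc", text)             # start of the string only
multiline = re.findall("^abc", text, re.MULTILINE)  # start of every line

print(default_mode)  # ['abc']
print(multiline)     # ['abc', 'abc']
```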

(3) $
$ is usually used to match the end of a line, for example:
print(re.findall("abc$", "accabcabc"))
Output:
['abc']

(4) \

A backslash followed by certain characters takes on a special meaning. The three most common cases are listed below. The patterns are written as raw strings (r"...") so that Python passes the backslash to re unchanged.

* \d
Matches any decimal digit; equivalent to [0-9]. For example:
print(re.findall(r"c\d\d\da", "abc123abc"))
Output:
['c123a']
A backslash can also escape a metacharacter so that it is treated as an ordinary character, for example:
print(re.findall(r"\^abc", "^abc^abc"))
Output:
['^abc', '^abc']
* \s
Matches any whitespace character, for example:
print(re.findall(r"\s\s", "a    c"))
Output:
['  ', '  ']
* \w
Matches any letter, digit, or underscore; equivalent to [a-zA-Z0-9_]. For example:
print(re.findall(r"\w\w\w", "abc12_"))
Output:
['abc', '12_']
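Because the backslash is also Python's own escape character in string literals, the r prefix can change what the pattern means. The sketch below uses \b (a word boundary, which is not one of the three cases above and is used here only as an assumption-free illustration of the problem) to show a pattern that silently fails without the prefix:

```python
import re

# In a normal string literal "\b" is a single backspace character; in a raw
# string r"\b" is a backslash followed by b, which re reads as a word boundary.
lengths = (len("\b"), len(r"\b"))
print(lengths)  # (1, 2)

s = "cat concat"  # sample string invented for this illustration
raw_hits = re.findall(r"\bcat", s)   # "cat" at a word boundary
plain_hits = re.findall("\bcat", s)  # looks for backspace + "cat"

print(raw_hits)    # ['cat']
print(plain_hits)  # []
```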

(5) {n}

{n} avoids writing the same item repeatedly. For example, in the \w example above we wrote \w three times; with {n} we write it once, where n is the number of repetitions. For example:
print(re.findall(r"\w{2}", "abc12_"))
Output:
['ab', 'c1', '2_']
(6) *

* means zero or more repetitions (matching as many as possible), for example:
print(re.findall(r"010-\d*", "010-123456789"))
Output:
['010-123456789']
(7) +

+ means one or more repetitions, for example:
print(re.findall(r"010-\d+", "010-123456789"))
Output:
['010-123456789']
(8) .

. is a period, which is easy to overlook on the page. It matches any character except the newline character, for example:
print(re.findall(".", "010\n?!"))
Output:
['0', '1', '0', '?', '!']
(9) ?

? means zero or one repetition, for example:
print(re.findall(r"010-\d?", "010-123456789"))
Output:
['010-1']
Note the difference between greedy mode and non-greedy mode.

Greedy mode: match as much as possible. This is the default when \d is followed by a quantifier, for example \d*:
print(re.findall(r"010-\d*", "010-123456789"))
Output:
['010-123456789']
Non-greedy mode: match as little as possible. Add ? after the quantifier, for example \d*?:
print(re.findall(r"010-\d*?", "010-123456789"))
Output:
['010-']
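The difference between the two modes is easier to see when the pattern has something to stop at. Here is a minimal sketch on a piece of markup invented for illustration:

```python
import re

# Sample markup invented for this illustration.
html = "<b>bold</b><i>italic</i>"

greedy = re.findall("<.*>", html)  # .* runs on to the last > in the string
lazy = re.findall("<.*?>", html)   # .*? stops at the first > it can reach

print(greedy)  # ['<b>bold</b><i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```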
(10) {m,n}
m and n are decimal integers: the preceding item is repeated at least m times and at most n times, for example:
print(re.findall(r"010-\d{3,5}", "010-123456789"))
Output:
['010-12345']
Adding ? makes it match as few as possible:
print(re.findall(r"010-\d{3,5}?", "010-123456789"))
Output:
['010-123']
{m,n} can also be written in other flexible ways, for example:

* {1,} is equivalent to + described earlier
* {0,1} is equivalent to ? described earlier
* {0,} is equivalent to * described earlier
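These equivalences can be checked directly. The sketch below compares each brace form with its shorthand on a sample string made up for the purpose; it is a spot check on one input, not a proof:

```python
import re

# Sample string invented for this illustration.
s = "010-123456789 010- 010-7"

pairs = [
    (r"010-\d{1,}", r"010-\d+"),
    (r"010-\d{0,1}", r"010-\d?"),
    (r"010-\d{0,}", r"010-\d*"),
]

results = []
for braces, shorthand in pairs:
    # Both spellings should produce the same list of matches.
    same = re.findall(braces, s) == re.findall(shorthand, s)
    results.append(same)
    print(braces, shorthand, same)
```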

That covers the commonly used metacharacters and how to use them. Next, let's look at other aspects of regular expressions.

<> (II) Using regular expressions

1. Compiling a regular expression

In Python, the re module can compile a regular expression with the compile() function, called as re.compile(regular expression), for example:
s = "010-123456789"
rule = r"010-\d*"
rule_compile = re.compile(rule)  # returns a compiled pattern object
# print(rule_compile)  # uncomment to see what compile() returns
s_compile = rule_compile.findall(s)
print(s_compile)
Output:
['010-123456789']
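One common reason to compile is to reuse the same pattern on many strings. A minimal sketch, where the variable names and sample lines are assumptions made for illustration:

```python
import re

# Compile once, then reuse the pattern object on several strings.
phone = re.compile(r"010-\d+")

# Sample lines invented for this illustration.
lines = ["call 010-123", "no number here", "010-456 or 010-789"]

found = [phone.findall(line) for line in lines]
print(found)  # [['010-123'], [], ['010-456', '010-789']]
```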
2. Using pattern objects

A compiled pattern object is not limited to the findall() method introduced earlier; it has other methods with different effects. Here is a brief summary:

(1) findall()
Finds all substrings that the RE matches and returns them as a list.

(2) search()
Scans the string and returns the first location where the RE matches.

(3) match()
Determines whether the RE matches at the beginning of the string.
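The three methods can be compared side by side on one input. In this sketch the sample string is invented, and group() is used only to print the text that search() found:

```python
import re

s = "tel: 010-12345, 010-67890"  # sample string invented for this illustration
pattern = re.compile(r"010-\d+")

all_hits = pattern.findall(s)  # every match, as a list of strings
first = pattern.search(s)      # Match object for the first match anywhere
at_start = pattern.match(s)    # None here, because s does not start with 010-

print(all_hits)       # ['010-12345', '010-67890']
print(first.group())  # 010-12345
print(at_start)       # None
```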

Taking the object returned by compile() in the earlier example, let's use match() instead of findall() and see what the result is:
s = "010-123456789"
rule = r"010-\d*"
rule_compile = re.compile(rule)  # returns a compiled pattern object
# print(rule_compile)  # uncomment to see what compile() returns
s_compile = rule_compile.match(s)
print(s_compile)
Output:
<re.Match object; span=(0, 13), match='010-123456789'>
The result is a Match object: the match spans indices 0 to 13 (end exclusive), and the matched text is 010-123456789. Since an object is returned, let's look next at some methods of the Match object.

3. Match object methods

Let's introduce the methods first and give an example afterwards. A Match object has several commonly used methods:

(1) group()
Returns the string matched by the RE.

(2) start()
Returns the position where the match starts.

(3) end()
Returns the position where the match ends.

(4) span()
Returns a tuple of the (start, end) positions.

Example: use span() on the object returned by match():
s = "010-123456789"
rule = r"010-\d*"
rule_compile = re.compile(rule)  # returns a compiled pattern object
s_compile = rule_compile.match(s)
print(s_compile.span())  # use span() on the returned object
Output:
(0, 13)
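The same four methods work on the object returned by search(). A short sketch with an invented sample string, including the None check that is needed because search() returns None when nothing matches:

```python
import re

s = "tel: 010-12345"  # sample string invented for this illustration
m = re.search(r"010-\d+", s)

if m is not None:     # search() returns None when there is no match
    print(m.group())  # 010-12345
    print(m.start())  # 5
    print(m.end())    # 14
    print(m.span())   # (5, 14)
```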
4. Functions of the re module

Besides the findall() function above, the re module provides other functions. Here is an introduction:

(1) findall()
Returns all strings that match the regular expression. It has been covered throughout this article, so there is nothing more to add here.

(2) sub(regular expression, new string, original string)
sub() replaces matching parts of a string, for example:
s = "abcabcacc"  # original string
l = re.sub("abc", "ddd", s)  # string after processing by sub()
print(l)
Output:
ddddddacc  # every abc is replaced with ddd
(3) subn(regular expression, new string, original string)
subn() also replaces matching parts of a string, and returns a tuple of the new string and the number of replacements:
s = "abcabcacc"  # original string
l = re.subn("abc", "ddd", s)  # result after processing by subn()
print(l)
Output:
('ddddddacc', 2)
(4) split()
split() splits a string wherever the regular expression matches, for example:
s = "abcabcacc"
l = re.split("b", s)
print(l)
Output:
['a', 'ca', 'cacc']
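Since the first argument is a regular expression, split() and sub() can use the metacharacters covered earlier. A minimal sketch on an invented sample string; count is an optional parameter of re.sub() that limits the number of replacements:

```python
import re

s = "a1b22c333d"  # sample string invented for this illustration

parts = re.split(r"\d+", s)                   # split on runs of digits
first_only = re.sub(r"\d+", "-", s, count=1)  # replace only the first run

print(parts)       # ['a', 'b', 'c', 'd']
print(first_only)  # a-b22c333d
```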

That is all I have to say about regular expressions. They are an essential foundation in almost every area of Python. Good luck on your Python journey!
