28 May 2024

Regular expressions for advanced Python basics

One. match
re.match tries to match a pattern at the beginning of the string only.

It returns a Match object if the match succeeds; otherwise it returns None.

re.match(pattern, string, flags=0)
pattern: The regular expression to match.

string: The string to match.

flags: Flag bits that control how the regular expression is matched, for example whether matching is case-sensitive or multi-line.

Optional flag modifiers control how the pattern is matched. Multiple flags are combined with bitwise OR ( | ), for example: re.I | re.M.

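Modifier: Description

re.I: Make matching case-insensitive.

re.L: Do locale-aware matching. In Python 3 this flag is only valid with bytes patterns.

re.M: Multi-line matching; ^ and $ also match at the start and end of each line.

re.S: Make . match every character, including newlines.

re.U: Interpret \w, \W, \b, \B according to the Unicode character set. In Python 3 this is already the default for str patterns, so the flag is redundant there.

re.X: Verbose mode: for readability, whitespace in the pattern and comments after # are ignored.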
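As a minimal sketch of how flags combine, the snippet below runs the same pattern with and without re.I | re.M. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text: three lines with mixed case.
text = "WWW.baidu.com\nwww.example.org\nftp.example.net"

# Without flags, ^ anchors only at the start of the whole string
# and matching is case-sensitive.
plain = re.findall(r"^www", text)

# re.I ignores case; re.M lets ^ match at the start of every line.
flagged = re.findall(r"^www", text, flags=re.I | re.M)

print(plain)    # []
print(flagged)  # ['WWW', 'www']
```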

import re
print(re.match('www', 'www.baidu.com'))
# Matches at the beginning: <re.Match object; span=(0, 3), match='www'>

print(re.match('com', 'www.baidu.com'))
# Does not match at the beginning: None

print(re.match('www', 'www.baidu.com').span())
# Start and end positions of the match: (0, 3)

print(re.match('www', 'www.baidu.com').group())
# The matched string: www
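Because re.match returns None on failure, calling .span() or .group() directly on the result raises AttributeError when nothing matches. One way to guard against that is sketched below; the helper name leading_www is hypothetical.

```python
import re

def leading_www(url):
    """Return the span of a leading 'www', or None if the URL does not start with it."""
    m = re.match(r"www", url)
    # re.match returns None on failure, so check before using the Match object.
    if m is None:
        return None
    return m.span()

print(leading_www("www.baidu.com"))  # (0, 3)
print(leading_www("baidu.com"))      # None
```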
Two. search
re.search scans the entire string and returns the first successful match.

It returns a Match object if a match is found; otherwise it returns None.

re.search(pattern, string, flags=0)
pattern: The regular expression to match.

string: The string to match.

flags: Flag bits that control how the regular expression is matched, for example whether matching is case-sensitive or multi-line.

import re
print(re.search('www', 'www.baidu.com'))
# Match at the beginning: <re.Match object; span=(0, 3), match='www'>

print(re.search('com', 'www.baidu.com'))
# Match that is not at the beginning: <re.Match object; span=(10, 13), match='com'>

print(re.search('www', 'www.baidu.com').span())
# Start and end positions of the match: (0, 3)

print(re.search('com', 'www.baidu.com').group())
# The matched string: com
Note:

re.match only matches at the beginning of the string: if the beginning does not match the regular expression, the match fails and the function returns None. re.search scans the whole string until it finds a match.
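The difference can be seen side by side in the short sketch below. The variable names are illustrative; the last call shows that anchoring a search with \A makes it behave like match for this input.

```python
import re

s = "www.baidu.com"

at_start = re.match(r"baidu", s)       # 'baidu' is not at position 0
anywhere = re.search(r"baidu", s)      # found further into the string
anchored = re.search(r"\Abaidu", s)    # \A pins the search to position 0

print(at_start)         # None
print(anywhere.span())  # (4, 9)
print(anchored)         # None
```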

Three. sub
re.sub replaces the matches in a string and returns the resulting string.

re.sub(pattern, repl, string, count=0, flags=0)
pattern: The regular expression to match.

repl: The replacement string, which can also be a function.

string: The original string to be searched and replaced.

count: The maximum number of replacements. The default 0 replaces all matches.

flags: Flag bits that control how the regular expression is matched, for example whether matching is case-sensitive or multi-line.

import re

print(re.sub('w', 'www', 'w.baidu.com'))
# Every match is replaced by default (here there is only one): www.baidu.com

print(re.sub('w', 'WWW', 'w.baidu.www', count=2))
# At most count matches are replaced: WWW.baidu.WWWww
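The repl argument can also be a function, as noted in the parameter list. A minimal sketch, assuming a made-up input string and a hypothetical callback named double: the function receives each Match object and returns the replacement text. The second call shows a backreference inside a replacement string.

```python
import re

def double(m):
    """Replacement callback: receives a Match object, returns the new text."""
    return str(int(m.group()) * 2)

# Each run of digits is passed to double() and replaced by its result.
result = re.sub(r"\d+", double, "a1 b22 c333")
print(result)  # a2 b44 c666

# \1 and \2 in the replacement refer to the first and second capture groups.
swapped = re.sub(r"(\w+)\.(\w+)", r"\2.\1", "baidu.com")
print(swapped)  # com.baidu
```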
Four. subn
re.subn works like re.sub, but it returns a tuple containing the new string and the number of replacements made.

re.subn(pattern, repl, string, count=0, flags=0)
pattern: The regular expression to match.

repl: The replacement string, which can also be a function.

string: The original string to be searched and replaced.

count: The maximum number of replacements. The default 0 replaces all matches.

flags: Flag bits that control how the regular expression is matched, for example whether matching is case-sensitive or multi-line.

import re

print(re.subn('w', 'www', 'w.baidu.com'))
# Every match is replaced by default (here there is only one): ('www.baidu.com', 1)

print(re.subn('w', 'WWW', 'w.baidu.www', count=2))
# At most count matches are replaced: ('WWW.baidu.WWWww', 2)
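The replacement count is handy for checking whether anything changed. A small sketch with illustrative variable names:

```python
import re

text = "w.baidu.www"

# Unpack the (new_string, number_of_replacements) tuple.
new_text, n = re.subn(r"w", "W", text)
print(new_text, n)  # W.baidu.WWW 4

# A count of 0 means the pattern never matched and the string is unchanged.
unchanged, n0 = re.subn(r"x", "X", text)
print(unchanged, n0)  # w.baidu.www 0
```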
Five. compile
The compile function compiles a regular expression into a Pattern object, whose methods such as match() and search() can then be called directly.

re.compile(pattern[, flags])
pattern : a regular expression as a string

flags : optional, indicating the matching mode, such as ignoring case, multi-line mode, etc.

import re

x = re.compile('www')
print(x.match('www.baidu.com'))
# <re.Match object; span=(0, 3), match='www'>

y = re.compile('com')
print(y.search('www.baidu.com'))
# <re.Match object; span=(10, 13), match='com'>
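Compiling is mostly useful when the same pattern is applied many times. The sketch below compiles once with a flag and reuses the Pattern object; the URL list is made up for illustration.

```python
import re

# Compile once with a flag, then reuse the Pattern object's own methods.
domain = re.compile(r"baidu", re.I)

urls = ["www.BAIDU.com", "www.example.org", "Baidu.cn"]
hits = [u for u in urls if domain.search(u)]
print(hits)  # ['www.BAIDU.com', 'Baidu.cn']

# Pattern.match also accepts an optional start position.
span = domain.match("www.baidu.com", 4).span()
print(span)  # (4, 9)
```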
Six. findall
findall finds all substrings matched by the regular expression and returns them as a list. If the pattern contains more than one capture group, it returns a list of tuples; if no match is found, it returns an empty list.

Pattern.findall(string[, pos[, endpos]])
string: The string to match.

pos: Optional; the index at which the search starts. The default is 0.

endpos: Optional; the index at which the search ends. The default is the length of the string.

The module-level form is re.findall(pattern, string, flags=0); it does not accept pos or endpos.

import re

x = re.compile('www')
print(x.findall('www.baidu.www'))
# ['www', 'www']

x = re.compile('yyy')
print(x.findall('www.baidu.www'))
# []
Note:

match and search return at most one match; findall returns all of them.
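What findall returns depends on the capture groups in the pattern, which the following sketch illustrates on a made-up string. The last two calls use pos and endpos on a compiled pattern.

```python
import re

text = "a=1, b=22, c=333"

# No capture group: a list of whole matches.
whole = re.findall(r"\w=\d+", text)
# One capture group: a list of that group's text.
one_group = re.findall(r"\w=(\d+)", text)
# Two or more groups: a list of tuples, one element per group.
two_groups = re.findall(r"(\w)=(\d+)", text)

print(whole)       # ['a=1', 'b=22', 'c=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('a', '1'), ('b', '22'), ('c', '333')]

# pos and endpos restrict the searched region (compiled patterns only).
digits = re.compile(r"\d+")
from_b = digits.findall(text, 5)
only_b = digits.findall(text, 5, 9)
print(from_b)  # ['22', '333']
print(only_b)  # ['22']
```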

Seven. finditer
finditer finds all substrings matched by the regular expression and returns them as an iterator of Match objects.

re.finditer(pattern, string, flags=0)
pattern: The regular expression to match.

string: The string to match.

flags: Flag bits that control how the regular expression is matched, for example whether matching is case-sensitive or multi-line.

import re

x = re.compile('www')
y = x.finditer('www.baidu.www')
for i in y:
    print(i, i.group())

# <re.Match object; span=(0, 3), match='www'> www
# <re.Match object; span=(10, 13), match='www'> www
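Since each item from finditer is a Match object, positions are available as well as the matched text. A brief sketch, with an illustrative variable name:

```python
import re

text = "www.baidu.www"

# Collect (start, end, matched text) for every match.
positions = [(m.start(), m.end(), m.group()) for m in re.finditer(r"www", text)]
print(positions)  # [(0, 3, 'www'), (10, 13, 'www')]
```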
Eight. split
The split method splits the string at each match of the pattern and returns a list.

re.split(pattern, string, maxsplit=0, flags=0)
pattern: The regular expression to match.

string: The string to split.

maxsplit: The maximum number of splits; maxsplit=1 splits once. The default 0 means no limit.

flags: Flag bits that control how the regular expression is matched, for example whether matching is case-sensitive or multi-line.

import re

x = re.compile(r'\.|:')
print(x.split('www.baidu:com'))
# ['www', 'baidu', 'com']
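Two variations are worth sketching: limiting the number of splits with maxsplit, and wrapping the separator in a capture group so that the separators are kept in the result. Variable names are illustrative.

```python
import re

s = "www.baidu:com"

# maxsplit=1 stops after the first separator.
once = re.split(r"[.:]", s, maxsplit=1)
print(once)  # ['www', 'baidu:com']

# A capture group around the separator keeps the separators in the list.
kept = re.split(r"([.:])", s)
print(kept)  # ['www', '.', 'baidu', ':', 'com']
```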
Nine. Expression patterns
Pattern : Description

^ : Matches the beginning of the string.

$ : Matches the end of the string.

. : Matches any single character except '\n'. To match '\n' as well, use the re.S flag or a pattern like '[\s\S]'.

+ : Matches the preceding character one or more times.

* : Matches the preceding character zero or more times.

? : Matches the preceding character zero times or one time.

| : Matches either the expression on its left or the one on its right.

[ ] : Matches any one of the characters listed inside the brackets.

( ) : Treats the expression inside the parentheses as a group.

{ } : Matches the preceding character the specified number of times, for example {3} or {2,5}.

\A : Matches only at the beginning of the string. Unlike ^ , it is not affected by re.M.

\b : Matches a word boundary, that is, the position between a word character and a non-word character, or the start or end of the string next to a word character.

\B : Matches a position that is not a word boundary.

\d : Matches a digit. Equivalent to [0-9] for bytes patterns or with re.ASCII; for str patterns it also matches other Unicode decimal digits.

\D : Matches a non-digit character; the opposite of \d.

\s : Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v] with re.ASCII; for str patterns it also matches other Unicode whitespace.

\S : Matches any non-whitespace character; the opposite of \s.

\w : Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]' with re.ASCII; for str patterns it also matches Unicode letters and digits.

\W : Matches any non-word character; the opposite of \w.

\Z : Matches only at the end of the string. Unlike $ , it does not match before a trailing newline.
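A few of the patterns in the table can be tried out with findall. The sketch below uses made-up strings and covers the dot, the quantifiers, and the word boundary.

```python
import re

# . does not match a newline unless re.S is given.
dot_default = re.findall(r"a.c", "abc a\nc")
dot_all = re.findall(r"a.c", "abc a\nc", flags=re.S)
print(dot_default)  # ['abc']
print(dot_all)      # ['abc', 'a\nc']

# Quantifiers applied to the preceding character b.
star = re.findall(r"ab*", "a ab abb")
plus = re.findall(r"ab+", "a ab abb")
optional = re.findall(r"ab?", "a ab abb")
print(star)      # ['a', 'ab', 'abb']
print(plus)      # ['ab', 'abb']
print(optional)  # ['a', 'ab', 'ab']

# {2,3} matches two or three digits, taking as many as it can.
counted = re.findall(r"\d{2,3}", "1 22 333 4444")
print(counted)  # ['22', '333', '444']

# \b only matches 'cat' when it stands alone as a word.
words = re.findall(r"\bcat\b", "cat concat cats cat.")
print(words)  # ['cat', 'cat']
```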

Ten. Expression application
1. The character '^' matches at the beginning of the string, so a match is returned only if the string starts with the pattern.

import re

x = re.compile(r'^w')
for i in ['www', 'ywy']:
    print(x.search(i))

# <re.Match object; span=(0, 1), match='w'>
# None
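With the re.M flag, '^' also matches at the start of each line, while \A stays pinned to the start of the whole string. A minimal sketch on a made-up three-line string:

```python
import re

text = "www\nywy\nwxy"

first_only = re.findall(r"^w", text)               # start of the string only
each_line = re.findall(r"^w", text, flags=re.M)    # start of every line
string_start = re.findall(r"\Aw", text, flags=re.M)  # \A ignores re.M

print(first_only)    # ['w']
print(each_line)     # ['w', 'w']
print(string_start)  # ['w']
```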
2. The character '$' matches at the end of the string, so a match is returned only if the string ends with the pattern.

import re

x = re.compile(r'w$')
for i in ['www', 'ywy']:
    print(x.search(i))

# <re.Match object; span=(2, 3), match='w'>
# None
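One subtlety: '$' also matches just before a newline at the very end of the string, whereas \Z does not. The sketch below shows the difference on a made-up line of input.

```python
import re

line = "www\n"

dollar = re.search(r"w$", line)                     # $ tolerates one trailing newline
strict = re.search(r"w\Z", line)                    # \Z requires the true end of the string
stripped = re.search(r"w\Z", line.rstrip("\n"))     # matches once the newline is removed

print(dollar.span())    # (2, 3)
print(strict)           # None
print(stripped.span())  # (2, 3)
```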
3. The character '.' matches any character except '\n', unless the re.S flag is set.

import re

x = re.compile(r'1.3')
for i in ['12333', '1\t333', '1\n333'