ZH-BLOG (notes blog)

Python Regular Expressions (Part 2)

Date: 2018-07-19 18:07:35  Category: Python basics

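Common methods

1) re.match() matches only at the start of the string and returns a match object, or None if there is no match

2) re.fullmatch() succeeds only if the entire string matches the pattern

3) re.search() scans the string from left to right and returns the first match found at any position

4) re.compile() compiles a pattern string and returns a pattern object

5) re.findall() returns all non-overlapping matches as a list; with one group it returns that group's strings, and with several groups it returns a list of tuples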
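The five functions listed above can be compared side by side on one input. This is only a small sketch: the sample string and the variable names are made up for illustration.

```python
import re

text = 'a1 b22 c333'

# re.match() only succeeds if the pattern matches at position 0.
at_start = re.match(r'[a-z]\d', text)        # matches 'a1'

# re.fullmatch() requires the whole string to match.
whole = re.fullmatch(r'[a-z]\d', text)       # None: text remains after 'a1'
whole_ok = re.fullmatch(r'[a-z]\d', 'a1')    # matches 'a1'

# re.search() scans forward and returns the first match it finds.
first_long = re.search(r'\d{2,}', text)      # matches '22'

# re.compile() returns a reusable pattern object with the same methods.
pair = re.compile(r'([a-z])(\d+)')

# With two groups, findall() returns a list of tuples.
pairs = pair.findall(text)

# With no groups, findall() returns the full matches as strings.
numbers = re.findall(r'\d+', text)

print(at_start.group(), whole, whole_ok.group(), first_long.group())
print(pairs)
print(numbers)
```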

search vs. match

>>> import re
>>> s1 = 'Python,i love'
>>> s2 = 'i love Python'
>>> re.search(r'P[iy][ty]hon', s1)
<re.Match object; span=(0, 6), match='Python'>
>>> re.search(r'P[iy][ty]hon', s2)
<re.Match object; span=(7, 13), match='Python'>
>>> re.match(r'P[iy][ty]hon', s1)
<re.Match object; span=(0, 6), match='Python'>
>>> re.match(r'P[iy][ty]hon', s2)  # returns None, so nothing is printed

match is equivalent to search with a leading ^ (as long as re.MULTILINE is not set); both calls below return None

>>> re.search(r'^P[iy][ty]hon', s2)
>>> re.match(r'P[iy][ty]hon', s2)
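The equivalence holds only without the re.MULTILINE flag. The sketch below uses a made-up two-line string to show where the two calls diverge.

```python
import re

text = 'i love\nPython'

# Without flags, ^ anchors only at the start of the string,
# so both calls fail here.
plain_search = re.search(r'^Python', text)
plain_match = re.match(r'Python', text)

# With re.MULTILINE, ^ also matches right after each newline,
# so search() finds 'Python' on the second line...
multi_search = re.search(r'^Python', text, re.MULTILINE)

# ...while match() still only tries position 0.
multi_match = re.match(r'Python', text, re.MULTILINE)

print(plain_search, plain_match, multi_search.span(), multi_match)
```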

Useful methods on match objects

>>> match = re.search(r'[0-9]+', 'my phone 35423874,call me')
>>> match
<re.Match object; span=(9, 17), match='35423874'>
>>> match.group()
'35423874'
>>> match.span()
(9, 17)  # a tuple
>>> match.start()
9
>>> match.end()
17
>>> match.span()[0]
9
>>> match.span()[1]
17
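Because search() returns None when nothing matches, calling group() directly on the result can raise AttributeError. A minimal sketch of a guarded lookup follows; the helper name first_number is hypothetical and not part of the re module.

```python
import re

def first_number(text):
    """Return (digits, start, end) for the first run of digits, or None."""
    m = re.search(r'[0-9]+', text)
    if m is None:
        # No match: there is no match object to call group() on.
        return None
    return m.group(), m.start(), m.end()

found = first_number('my phone 35423874,call me')
missing = first_number('no digits here')
print(found)
print(missing)
```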

Capturing groups and (?:...) non-capturing groups

>>> match = re.search(r'([0-9]+).*: (.*)', 'my phone 35423874,call me;Date: 2018-07')
>>> match
<re.Match object; span=(9, 39), match='35423874,call me;Date: 2018-07'>
>>> match.group()
'35423874,call me;Date: 2018-07'
>>> match.group(0)
'35423874,call me;Date: 2018-07'
>>> match.group(1)
'35423874'
>>> match.group(2)
'2018-07'
>>> match.group(1,2)
('35423874', '2018-07')
>>> match = re.search(r'(?:[0-9]+).*: (.*)', 'my phone 35423874,call me;Date: 2018-07')
>>> match
<re.Match object; span=(9, 39), match='35423874,call me;Date: 2018-07'>
>>> match.group()
'35423874,call me;Date: 2018-07'
>>> match.group(1)
'2018-07'
>>> match.groups()
('2018-07',)
>>> 
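The difference between the two kinds of group is especially visible with findall(), which returns group contents whenever the pattern has a capturing group. The file names below are invented for illustration.

```python
import re

text = 'cat.png dog.jpg notes.txt'

# Capturing group: findall() returns only the text of the group.
captured = re.findall(r'\w+\.(png|jpg)', text)

# Non-capturing group: the alternation is still grouped, but findall()
# returns the whole match because no capturing group is left.
whole = re.findall(r'\w+\.(?:png|jpg)', text)

print(captured)
print(whole)
```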

Backreferences to groups

>>> s = '<html><body><span>hello world</span></body></html>'
>>> match = re.search(r'<([a-z]+)><([a-z]+)><([a-z]+)>.*</\3></\2></\1>', s)
>>> match
<re.Match object; span=(0, 50), match='<html><body><span>hello world</span></body></html>
>>> match.group(1)
'html'

Groups are numbered from left to right by their opening parenthesis and are referenced as \1, \2, \3, ...
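The numbering rule matters most for nested groups. The sketch below, with made-up sample strings, shows how nested groups are numbered and how \1 forces a repeat of the captured text.

```python
import re

# Groups are numbered by the position of their opening parenthesis,
# so the outer group comes before the groups nested inside it.
m = re.search(r'((\d{4})-(\d{2}))', 'Date: 2018-07')
numbered = m.groups()

# \1 must repeat exactly the text captured by group 1,
# which makes it easy to spot a doubled word.
dup = re.search(r'\b(\w+) \1\b', 'this is is a test')

print(numbered)
print(dup.group(), dup.group(1))
```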

Named groups

>>> s = 'now: 2018-07-19 15:18:20'
>>> match = re.search(r'\b(?P<hours>\d\d):(?P<minutes>\d\d):(?P<seconds>\d\d)', s)
>>> match
<re.Match object; span=(16, 24), match='15:18:20'>
>>> match.group('hours')
'15'
>>> match.group('minutes')
'18'
>>> match.group('seconds')
'20'
>>> match.span('seconds')
(22, 24)
>>> match.groups()
('15', '18', '20')
>>> match.group(1)
'15'
>>> match.group(0)
'15:18:20'
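Named groups can also be read all at once with groupdict(), which returns a dict keyed by group name. A short sketch that reuses the same time pattern; the names TIME and total_seconds are illustrative only.

```python
import re

# Same pattern as above, compiled once so it can be reused.
TIME = re.compile(r'\b(?P<hours>\d\d):(?P<minutes>\d\d):(?P<seconds>\d\d)')

m = TIME.search('now: 2018-07-19 15:18:20')

# groupdict() maps each group name to the text it captured.
parts = m.groupdict()

# The captured values are strings, so convert before doing arithmetic.
total_seconds = (int(parts['hours']) * 3600
                 + int(parts['minutes']) * 60
                 + int(parts['seconds']))

print(parts)
print(total_seconds)
```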

The re.split() method

>>> s = 'name: abc, ai: food, desc: this is abc'
>>> re.split(r',* *\w*:', s)
['', ' abc', ' food', ' this is abc']
>>> re.split(r',* *\w*:', s)[1:]
[' abc', ' food', ' this is abc']
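The split above drops the keys and leaves a leading space on each value. One possible way to keep both, sketched here on the same string, relies on the fact that re.split() also returns the text of any capturing group in the pattern.

```python
import re

s = 'name: abc, ai: food, desc: this is abc'

# Stripping each piece removes the leading space left by the split.
values = [part.strip() for part in re.split(r',* *\w*:', s)[1:]]

# With a capturing group around the key, re.split() returns the keys too:
# ['', key, value, key, value, ...]
pieces = re.split(r',* *(\w*):', s)

# Pair every key (odd index) with the value that follows it (even index).
fields = {k: v.strip() for k, v in zip(pieces[1::2], pieces[2::2])}

print(values)
print(fields)
```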

The re.sub() method

>>> s = 'abc123456efg'
>>> re.sub(r'\d+', '0', s)
'abc0efg'
>>> re.sub(r'\d+', '-', s)
'abc-efg'
>>>
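Beyond a fixed replacement string, re.sub() accepts a count argument, group references in the replacement, and a function as the replacement. A brief sketch of all three, with sample inputs chosen for illustration:

```python
import re

s = 'abc123456efg'

# count limits how many matches are replaced; \d matches one digit at a time.
first_two = re.sub(r'\d', '#', s, count=2)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r'(\d{4})-(\d{2})', r'\2/\1', 'Date: 2018-07')

# A function receives each match object and returns the replacement text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), s)

print(first_two)
print(swapped)
print(doubled)
```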