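Python: Regular Expressions

Part 1: What is a regular expression?

I. Purpose of regular expressions

II. Pros and cons of regular expressions

Part 2: Basic usage of the re module

I. The re module

II. Basic usage of the re module

re.search (finds the first match anywhere in the string)

re.match (matches only at the start of the string)

The r in r'sanle' stands for raw (raw string)

match.group(default=0): returns the matched string

findall and finditer: finding multiple matches

re.sub('pattern', 'replacement', 'string')

Compiling a pattern: re.compile('pattern')

Part 3: Basic pattern matching

I. Basic pattern matching: character classes

Character class: []

Matching a or b: a|b

Negation: [^abc]

Any character: the "." placeholder

Shortcuts

Start and end: ^, $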

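Part 1: What is a regular expression?

I. Purpose of regular expressions

1. Data mining: extract a small piece of text from a large body of text.

2. Validation: use a regular expression to confirm that the data received is what was expected, e.g. whether a username is legal.

Note: avoid regular expressions unless they are needed; if a simpler method exists, use that instead.

A regular expression specifies a matching rule and tests whether that rule occurs inside a larger text string, e.g. grep "xxx" file.

Besides telling whether text matching the rule exists, a regular expression can split the rule into one or more sub-rules and show the text matched by each sub-rule.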
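
The two purposes above, extraction and validation, can be sketched as follows. This is a minimal illustration: the sample text, the username rule (3 to 16 letters, digits, or underscores), and the helper name `is_valid_username` are assumptions, not something the article defines.

```python
import re

# Hypothetical sample text, used only for illustration.
log_text = "user=alice id=42; user=bob id=7"

# Extraction: pull every number that follows "id=" out of the larger text.
ids = re.findall(r"id=(\d+)", log_text)
print(ids)  # ['42', '7']

# Validation: an assumed rule, 3 to 16 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,16}")


def is_valid_username(name):
    """Return True if the whole string satisfies the assumed username rule."""
    return USERNAME_RE.fullmatch(name) is not None


print(is_valid_username("alice_01"))  # True
print(is_valid_username("a!"))        # False
```

`fullmatch` is used for validation so that the whole input must satisfy the rule, not just a part of it.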

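II. Pros and cons of regular expressions

• Pros: improves productivity and saves code.
• Cons: complex and hard to read.

Part 2: Basic usage of the re module

I. The re module

The re module matches substrings within text.

• Official documentation: https://docs.python.org/3/library/re.html
• Installation: re is part of the Python standard library, so nothing needs to be installed; just import it (import re).

II. Basic usage of the re module

1. match and search: finding the first match

re.search (finds the first match anywhere in the string)

• Searches for a match.
• Takes a regular expression and a string and returns the first match found.
• If no match is found at all, re.search returns None.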

>>> import re
>>> re.search("sanchuang","hello world this is sanchuang")
<_sre.SRE_Match object; span=(20, 29), match='sanchuang'>
>>> re.search("sanchuang","hello world this is")
>>> result = re.search("sanchuang","hello world this is")
>>> print(result)
None
>>> re.search("sanchuang","hello world sanchuang this is")
<_sre.SRE_Match object; span=(12, 21), match='sanchuang'>
>>> re.search("sanchuang","hello world,sanchuang  this is sanchuang")
<_sre.SRE_Match object; span=(12, 21), match='sanchuang'>
>>>
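
Because re.search returns None when nothing matches, calling a method on the result without checking it raises an AttributeError. A minimal sketch of the usual guard follows; the helper name `find_word` is hypothetical.

```python
import re


def find_word(word, text):
    """Return the (start, end) span of the first occurrence of word, or None."""
    # re.escape keeps characters such as "." from being read as regex syntax.
    result = re.search(re.escape(word), text)
    if result is None:
        return None
    return result.span()


print(find_word("sanchuang", "hello world this is sanchuang"))  # (20, 29)
print(find_word("sanchuang", "hello world this is"))            # None
```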

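re.match (matches only at the start of the string)

• Looks for a match starting at the beginning of the string.
• Takes a regular expression and a string, starts matching at the first character of the string, and returns the match found there.
• If the start of the string does not fit the regular expression, the match fails and re.match returns None.

>>> import re
>>> re.match("sanchuang","sanchuang hello world this is")
<_sre.SRE_Match object; span=(0, 9), match='sanchuang'>
>>> result1 = re.match("sanchuang","hello world this is ")
>>> print(result1)
None
>>> result2 = re.match("sanchuang","hello world,sanchuang  this is sanchuang")
>>> print(result2)
None

The r in r'sanle' stands for raw (raw string)

• The difference from a normal string is that a raw string does not interpret \ as the start of an escape sequence.
• Raw strings are common and useful when writing regular expressions.

>>> # Raw pattern: the regex \\\\ matches two literal backslashes
>>> re.search(r"\\\\tsanle","hello\\\\tsanle")
<_sre.SRE_Match object; span=(5, 13), match='\\\\tsanle'>
>>> # Normal string: Python first reduces \\\\ to \\, so the regex matches one backslash
>>> re.search("\\\\tsanle","hello\\\\tsanle")
<_sre.SRE_Match object; span=(6, 13), match='\\tsanle'>
>>> # Here the pattern reaches the regex engine as \t, which matches a tab character
>>> re.search("\\tsanle","hello\\\tsanle")
<_sre.SRE_Match object; span=(6, 12), match='\tsanle'>

Basic usage of the re module: the match object

match.group(default=0): returns the matched string.
• Groups exist because a regular expression can be split into subgroups, each of which captures only part of the match.
• 0 is the default argument and means the whole match; n means the n-th group.

match.start()
• Returns the index in the original string where the match starts.

match.end()
• Returns the index in the original string just past the end of the match.

>>> msg = "It's raining cats and dogs"
>>> match = re.search(r"cats",msg)
>>> match.start()
13
>>> match.end()
17

match.groups()
• Returns a tuple containing the strings of all groups, from group 1 up to the last group.

>>> match.groups()
()
>>> match = re.search(r"(cats)",msg)
>>> match.groups()
('cats',)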
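
To see how group numbers relate to the parentheses in a pattern, here is a small sketch with two groups. The pattern `(cats) and (dogs)` is an illustrative choice, not taken from the article.

```python
import re

msg = "It's raining cats and dogs"
match = re.search(r"(cats) and (dogs)", msg)

print(match.group())               # cats and dogs  (same as match.group(0))
print(match.group(1))              # cats
print(match.group(2))              # dogs
print(match.groups())              # ('cats', 'dogs')
print(match.start(), match.end())  # 13 26
print(match.span(2))               # (22, 26), the span of group 2 only
```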

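findall and finditer: finding multiple matches

re.findall
• Finds all matches and returns them as a list of strings.

re.finditer
• Finds all matches and returns an iterator of match objects, which can be consumed with a for loop.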

msg = "It's raining cats and dogs, cats1 cats2"
result = re.findall("cats",msg)
print(result)
result2 = re.finditer("cats",msg)
print(result2)
for i in result2:
    print(i.group())

Output:
['cats', 'cats', 'cats']
<callable_iterator object at 0x...>
cats
cats
cats


msg = "It's raining cats and dogs, cats1 cats2"
result3 = re.finditer("cats",msg)
print(list(result3))

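Output:
[<_sre.SRE_Match object; span=(13, 17), match='cats'>, <_sre.SRE_Match object; span=(28, 32), match='cats'>, <_sre.SRE_Match object; span=(34, 38), match='cats'>]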
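
The practical difference between the two functions can be sketched as follows: findall gives only the text, finditer gives match objects (and so positions), and the iterator it returns can be walked only once.

```python
import re

msg = "It's raining cats and dogs, cats1 cats2"

# findall returns only the matched text.
print(re.findall("cats", msg))  # ['cats', 'cats', 'cats']

# finditer yields match objects, so positions are available too.
spans = [m.span() for m in re.finditer("cats", msg)]
print(spans)  # [(13, 17), (28, 32), (34, 38)]

# An iterator is consumed by the first pass; a second pass yields nothing.
it = re.finditer("cats", msg)
first_pass = [m.group() for m in it]
second_pass = [m.group() for m in it]
print(first_pass, second_pass)  # ['cats', 'cats', 'cats'] []
```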

re.sub('pattern', 'replacement', 'string')

• Replaces every part of string that matches the pattern with the replacement.

msg = "I am learning python"
print(re.sub("python","PYTHON",msg))

Output:
I am learning PYTHON
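
As a small extension of the example above, re.sub accepts an optional count argument, and re.subn reports how many replacements were made. The sample sentence is made up for illustration.

```python
import re

msg = "cats and cats and cats"

# count limits how many matches are replaced (0, the default, means all).
print(re.sub("cats", "dogs", msg, count=1))  # dogs and cats and cats

# re.subn returns the new string together with the number of replacements.
new_msg, n = re.subn("cats", "dogs", msg)
print(new_msg, n)  # dogs and dogs and dogs 3
```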

Compiling a pattern: re.compile('pattern')

msg = "I am learning python"
msg2 = "I am learning English"
msg3 = "hello world"
reg = re.compile("python")
print(reg.findall(msg))
print(reg.findall(msg2))
print(reg.findall(msg3))
print(re.findall("python",msg))
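
A compiled pattern object offers the same operations as the module-level functions, without passing the pattern each time. The sketch below uses made-up sample lines and shows the expected output.

```python
import re

# Compile once, then reuse the pattern object's own methods.
reg = re.compile("python")

lines = ["I am learning python", "I am learning English", "python and more python"]
results = [reg.findall(line) for line in lines]
print(results)  # [['python'], [], ['python', 'python']]

# search and sub are available on the pattern object as well.
print(reg.search(lines[0]).span())  # (14, 20)
print(reg.sub("PYTHON", lines[2]))  # PYTHON and more PYTHON
```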

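Characteristics of compiled patterns:

• A complex pattern can be reused.
• A compiled pattern is more convenient to use, because the pattern argument is omitted from each call.
• The re module caches the patterns it compiles on the fly, so in most cases using compile brings no large performance gain.

Part 3: Basic pattern matching

I. Basic pattern matching: character classes

Character class: []

• Regular expression matching is case-sensitive.
• Match any letter: [a-zA-Z]
• Match any letter or a hyphen: [a-zA-Z\-]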

ret = re.findall("python","Python 3 python")
print(ret)
ret1 = re.findall("[Ppfg]ython","Python 3 python fython Fython")
print(ret1)
# Raw string, so that \- reaches the regex engine unchanged
ret2  = re.findall(r"[a-zA-Z\-]","abcABC-123-")
print(ret2)

Output:
['python']
['Python', 'python', 'fython']
['a', 'b', 'c', 'A', 'B', 'C', '-', '-']
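
A few more character-class cases, as a sketch with made-up input strings: a class always matches exactly one character, ranges can be combined, and a hyphen placed first or last in the class needs no escape.

```python
import re

# A range inside [] matches exactly one character from that range.
print(re.findall("[0-9]", "a1b22"))        # ['1', '2', '2']

# Several ranges and single characters can be combined in one class.
print(re.findall(r"[A-Za-z\-]", "aB-1_"))  # ['a', 'B', '-']

# A hyphen placed first or last in the class is literal.
print(re.findall("[a-z-]", "x-y"))         # ['x', '-', 'y']
```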

Matching a or b: a|b

• Match cats or dogs

msg = "It's raining cats and dogs"
ret = re.search("cats|dogs",msg)
print(ret.group())
ret1 = re.findall("cats|dogs",msg)
print(ret1)

Output:
cats
['cats', 'dogs']

# re.search finds the first match; re.findall finds all matches
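
One point worth illustrating is that | applies to everything on either side of it, so parentheses are needed to limit its scope. The gray/grey text below is an assumed example, not from the article.

```python
import re

text = "gray grey"

# Without parentheses, | splits the whole pattern: "gra" OR "ey".
print(re.findall("gra|ey", text))      # ['gra', 'ey']

# A non-capturing group (?:...) limits the alternation to one position.
print(re.findall("gr(?:a|e)y", text))  # ['gray', 'grey']

# With a capturing group, findall returns the group text, not the whole match.
print(re.findall("gr(a|e)y", text))    # ['a', 'e']
```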

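Negation: [^abc]

• [^abc] matches any single character other than a, b, or c.
• a[^a-z] matches an a followed by one character that is not a lowercase letter.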

ret = re.findall("[0-z]","lab3cb3ala#>=？！aB")
print(ret)
ret1 = re.findall("[^0-9A-Za-z]","lab3cb3al#>=?!aB")
print(ret1)
ret2 = re.findall("a[^a-z]","lab3cb3al#>=?!aB")
print(ret2)

Output:
['l', 'a', 'b', '3', 'c', 'b', '3', 'a', 'l', 'a', '>', '=', 'a', 'B']
['#', '>', '=', '?', '!']
['aB']
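
Two details of the examples above deserve a short sketch: a range such as [0-z] follows ASCII code order and therefore includes punctuation, and ^ negates a class only when it comes first. The input strings here are made up for illustration.

```python
import re

# [0-z] spans every ASCII code from "0" (48) to "z" (122), so it also
# covers punctuation such as : ; < = > ? @ that lies between those codes.
print(re.findall("[0-z]", "a=1?#"))  # ['a', '=', '1', '?']

# ^ negates only when it is the first character inside the brackets.
print(re.findall("[^0-9]", "a1^2"))  # ['a', '^']

# Anywhere else inside the brackets, ^ is an ordinary character.
print(re.findall("[0-9^]", "a1^2"))  # ['1', '^', '2']
```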

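Any character: the "." placeholder

• Matches any single character except \n. It has this meaning only outside a bracketed character class; inside [] it is a literal dot.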

ret = re.findall("p.thon","python pYTHON Python pYthon Pthon p=thon")
print(ret)
ret = re.findall("p.thon","python pYTHON Python pYthon Pthon p thon p\nthon")
print(ret)

Output:
['python', 'pYthon', 'p=thon']
['python', 'pYthon', 'p thon']
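
A short sketch of the two exceptions mentioned above: the re.DOTALL flag lets "." match a newline as well, and a dot inside brackets (or escaped with a backslash) is literal. The sample text is an assumption.

```python
import re

text = "p\nthon p.thon python"

# By default "." does not match a newline.
print(re.findall("p.thon", text))             # ['p.thon', 'python']

# With re.DOTALL, "." matches a newline as well.
print(re.findall("p.thon", text, re.DOTALL))  # ['p\nthon', 'p.thon', 'python']

# Inside [] or after a backslash, "." is a literal dot.
print(re.findall(r"p[.]thon", text))          # ['p.thon']
print(re.findall(r"p\.thon", text))           # ['p.thon']
```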

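Shortcuts

Shortcut	Meaning
\A	Matches only at the start of the string
\bword\b	Word boundary
\w	Matches any word character, including the underscore. For ASCII text, equivalent to [A-Za-z0-9_]
\W	Matches any non-word character. For ASCII text, equivalent to [^A-Za-z0-9_]
\d	Matches a digit character. For ASCII text, equivalent to [0-9]
\D	Matches a non-digit character. For ASCII text, equivalent to [^0-9]
\s	Matches any whitespace character, including space, tab, form feed, and so on. For ASCII text, equivalent to [ \f\n\r\t\v]
\S	Matches any non-whitespace character. For ASCII text, equivalent to [^ \f\n\r\t\v]

In Python 3, \w, \d, and \s are Unicode-aware by default for str patterns; the bracket equivalents hold exactly only when the re.ASCII flag is set.

Examples (write patterns that use these shortcuts as raw strings, with the r prefix):

# \bword\b: digits, letters, and underscores are word characters, so none of them forms a boundary next to a letter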
ret = re.finditer(r"\bworld","hello world 123world =world  world123 ##world## abcworldabc")
print(list(ret))
ret1 = re.finditer(r"world\b","hello world 123world =world  world123 ##world## abcworldabc")
print(list(ret1))
ret2 = re.finditer(r"\bworld\b","hello world 123world =world  world123 ##world## abcworldabc")
print(list(ret2))

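Output:
[<_sre.SRE_Match object; span=(6, 11), match='world'>, <_sre.SRE_Match object; span=(22, 27), match='world'>, <_sre.SRE_Match object; span=(29, 34), match='world'>, <_sre.SRE_Match object; span=(40, 45), match='world'>]
[<_sre.SRE_Match object; span=(6, 11), match='world'>, <_sre.SRE_Match object; span=(15, 20), match='world'>, <_sre.SRE_Match object; span=(22, 27), match='world'>, <_sre.SRE_Match object; span=(40, 45), match='world'>]
[<_sre.SRE_Match object; span=(6, 11), match='world'>, <_sre.SRE_Match object; span=(22, 27), match='world'>, <_sre.SRE_Match object; span=(40, 45), match='world'>]

\B matches where there is no word boundary, so \Bworld\B matches "world" only when it has word characters on both sides.

ret = re.finditer(r"\Bworld\B","hello _world world123 123world =world ##world## abcworldabc")
print(list(ret))

Output:
[<_sre.SRE_Match object; span=(51, 56), match='world'>]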

# \w \W
ret = re.findall(r'\w',"python3#")
print(ret)
ret = re.findall(r'\W',"python3#")
print(ret)

Output:
['p', 'y', 't', 'h', 'o', 'n', '3']
['#']
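
The remaining shortcuts from the table can be sketched the same way. The sample strings are made up; the last two lines show the difference between the default Unicode-aware \w and \w under the re.ASCII flag.

```python
import re

text = "tel: 010-1234\tname_1"

print(re.findall(r"\d+", text))  # ['010', '1234', '1']
print(re.findall(r"\s", text))   # [' ', '\t']
print(re.findall(r"\w+", text))  # ['tel', '010', '1234', 'name_1']

# For str patterns, \w is Unicode-aware by default; re.ASCII restricts it
# to [A-Za-z0-9_].
word = "caf\u00e9"  # "cafe" with an accented final e
unicode_words = re.findall(r"\w+", word)
ascii_words = re.findall(r"\w+", word, re.ASCII)
print(unicode_words == [word])   # True
print(ascii_words)               # ['caf']
```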

Start and end: ^, $

• Match python at the start of the string: ^python
• Match python at the end of the string: python$

ret = re.findall("^python","hello python")
print(ret)
ret1 = re.findall("^python","python123#")
print(ret1)
ret2 = re.findall("python$","hello python")
print(ret2)
ret3 = re.findall("^python$","hello python")
print(ret3)

Output:
[]
['python']
['python']
[]
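
With multi-line text the anchors behave differently depending on the re.MULTILINE flag, while \A from the shortcut table always means the start of the whole string. A minimal sketch with an assumed two-line string:

```python
import re

text = "python one\npython two"

# By default ^ matches only at the very start of the string.
print(re.findall("^python", text))                   # ['python']

# With re.MULTILINE, ^ also matches right after each newline.
print(re.findall("^python", text, re.MULTILINE))     # ['python', 'python']

# \A always means the start of the string, regardless of flags.
print(re.findall(r"\Apython", text, re.MULTILINE))   # ['python']

# $ matches at the end of the string, and also just before a trailing newline.
print(re.findall("two$", text + "\n"))               # ['two']
```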

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/langs/717981.html
