The Greatest Regex Trick Ever? A Condensed Python Experiment

Background

The Best Regex Trick
==========

The Greatest Regex Trick Ever

Clipped from The Best Regex Trick at 2022-04-16.

Working through it

The original article is dense and hard to read… Fortunately it comes with code, so let's go straight to the Python sample:

Base code

import re
subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')
# put Group 1 captures in a list
matches = [group for group in regex.findall(subject) if group]

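💡: In other words, search subject for Tarzan followed by digits (e.g. Tarzan11 and Tarzan22), but skip any occurrence that is wrapped in double quotes or enclosed in curly braces {...}.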

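The list comprehension puts the non-empty Group 1 captures in the list matches; findall yields an empty string for each match where one of the excluded alternatives matched instead.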
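To make the filtering step concrete, here is a small sketch that prints what findall returns before and after the empty strings are dropped. It reuses the subject and regex from the base code; the variable name raw is introduced here only for illustration.

```python
import re

subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')

# With exactly one capture group, findall returns the Group 1 text for every
# match, and an empty string when an excluded alternative matched instead.
raw = regex.findall(subject)
print(raw)      # ['', 'Tarzan11', 'Tarzan22', '']

# Dropping the empty strings leaves only the wanted matches.
matches = [group for group in raw if group]
print(matches)  # ['Tarzan11', 'Tarzan22']
```

The two empty strings correspond to "Tarzan12" (quoted) and {4 Tarzan34} (in braces): the regex did match them, but through an alternative that has no capture group.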

The six main tasks

Task I: Is there a match?


######## The six main tasks we're likely to have ########

# Task 1: Is there a match?
print("*** Is there a Match? ***")
if matches:
    print("Yes")
else:
    print("No")

Task II: How many matches are there?


# Task 2: How many matches are there?
print("\n" + "*** Number of Matches ***")
print(len(matches))

Task III: What is the first match?

# Task 3: What is the first match?
print("\n" + "*** First Match ***")
if matches:
    print(matches[0])
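If only the first match is needed, building the whole list is not strictly necessary. The following is a minimal sketch, not part of the original article, that assumes the same subject and regex and uses finditer with next() to stop at the first match where Group 1 participated.

```python
import re

subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')

# finditer yields match objects lazily; the generator skips matches where
# Group 1 did not participate (group(1) is None), and next() stops at the
# first one that remains. The second argument is the default if none is found.
first = next(
    (m.group(1) for m in regex.finditer(subject) if m.group(1) is not None),
    None,
)
print(first)  # Tarzan11
```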

Task IV: What are all the matches?

# Task 4: What are all the matches?
print("\n" + "*** Matches ***")
for match in matches:
    print(match)

Task V: Replace

# Task 5: Replace the matches
def myreplacement(m):
    if m.group(1):
        return "Superman"
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print("\n" + "*** Replacements ***")
print(replaced)

Task VI: Split

# Task 6: Split
# Start by replacing by something distinctive,
# as in Step 5. Then split.
splits = replaced.split('Superman')
print("\n" + "*** Splits ***")
for split in splits:
    print(split)
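Splitting on a placeholder such as 'Superman' gives wrong pieces if the subject already contains that word. As a hedged alternative, the sketch below splits directly at the positions of the Group 1 matches. The helper name split_on_group1 is hypothetical and does not come from the original article.

```python
import re

subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')


def split_on_group1(pattern, text):
    """Split text at every match where Group 1 participated (hypothetical helper)."""
    pieces = []
    last = 0  # index just past the previous Group 1 match
    for m in pattern.finditer(text):
        if m.group(1) is not None:
            pieces.append(text[last:m.start(1)])
            last = m.end(1)
    pieces.append(text[last:])  # trailing piece after the final match
    return pieces


splits = split_on_group1(regex, subject)
for piece in splits:
    print(piece)
```

For this subject the pieces are the same three as with the placeholder approach, but the result no longer depends on the placeholder being absent from the input.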

Output

*** Is there a Match? ***
Yes

*** Number of Matches ***
2

*** First Match ***
Tarzan11

*** Matches ***
Tarzan11
Tarzan22

*** Replacements ***
Jane"" ""Tarzan12"" Superman@Superman {4 Tarzan34}

*** Splits ***
Jane"" ""Tarzan12"" 
@
 {4 Tarzan34}

Takeaway:

Roughly, the idea is: regex = re.compile(r'unwanted|(wanted)')

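💡: re.compile(r'exclusions without parentheses|(wanted match in parentheses)')
Note ⚠️: this is not a regex bug. It relies on standard behavior: at each position the alternatives are tried from left to right, so the unwanted contexts are consumed by the earlier alternatives and only the last alternative sets Group 1. Because the technique depends on code inspecting Group 1 after each match, it works in programming languages but not in the search box of a text editor such as EditPad Pro or Notepad++.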
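The pattern can be wrapped in a small reusable function. This is only a sketch under stated assumptions: the name find_outside and its parameters are hypothetical, and a named group (wanted) is used in place of Group 1 so that capture groups inside the excluded contexts do not shift the numbering.

```python
import re


def find_outside(wanted, excluded_contexts, text):
    """Return matches of `wanted` that are not consumed by an excluded context.

    Hypothetical helper: excluded contexts go first and are not captured;
    the wanted pattern goes last, inside the named group 'wanted'.
    """
    alternatives = [f'(?:{ctx})' for ctx in excluded_contexts]
    alternatives.append(f'(?P<wanted>{wanted})')
    pattern = re.compile('|'.join(alternatives))
    return [
        m.group('wanted')
        for m in pattern.finditer(text)
        if m.group('wanted') is not None
    ]


subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
found = find_outside(r'Tarzan\d+', [r'{[^}]+}', r'"Tarzan\d+"'], subject)
print(found)  # ['Tarzan11', 'Tarzan22']
```

The order of the alternatives matters: the excluded contexts must come before the wanted pattern, otherwise the wanted pattern would match first at positions where both could start.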

Original URL: http://outofmemory.cn/langs/716250.html