The Best Regex Trick
==========
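The Greatest Regex Trick Ever
Clipped from "The Best Regex Trick" on 2022-04-16.
Pondering and tinkering
The article itself is dense and hard to read. Fortunately it includes code, so let's go straight to the Python example: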
Base code
import re
subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')
# Put the Group 1 captures in a list, dropping the empty strings
# that findall returns when an excluded alternative matched
matches = [group for group in re.findall(regex, subject) if group]
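To see why the `if group` filter is there, it may help to look at what `findall` returns before filtering. The following is a minimal sketch that reuses the subject and pattern from the base code; the name `raw` is introduced here purely for illustration.

```python
import re

subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')

# With exactly one capture group, findall returns the Group 1 text of
# every match. When an excluded alternative matched instead, Group 1 did
# not participate, so that entry is an empty string.
raw = regex.findall(subject)
print(raw)      # ['', 'Tarzan11', 'Tarzan22', '']

# Dropping the empty strings leaves only the wanted matches.
matches = [group for group in raw if group]
print(matches)  # ['Tarzan11', 'Tarzan22']
```

The two empty strings correspond to `"Tarzan12"` and `{4 Tarzan34}`: the pattern did match them, but through the alternatives that have no capture group.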
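The six tasks
Task I: Is there a match?
- 💡: The goal is to find "Tarzan" followed by digits in subject (for example Tarzan11 and Tarzan22), while skipping any occurrence that is wrapped in double quotes or enclosed in {...}.
- The list comprehension in the base code collects the Group 1 captures into the list matches.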
######## The six main tasks we're likely to have ########
# Task 1: Is there a match?
print("*** Is there a Match? ***")
if len(matches) > 0:
    print("Yes")
else:
    print("No")
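If the only question is whether a match exists, building the full list is not strictly necessary. As a hedged alternative (not part of the original article's code), the sketch below uses `finditer` with `any()`, which stops scanning at the first match whose Group 1 is set. The name `has_match` is an illustrative choice.

```python
import re

subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')

# finditer yields match objects lazily. m.group(1) is None when an
# excluded alternative matched, so any() skips those and stops at the
# first match where Group 1 actually captured something.
has_match = any(m.group(1) for m in regex.finditer(subject))
print("Yes" if has_match else "No")
```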
Task II: How many matches are there?
# Task 2: How many matches are there?
print("\n" + "*** Number of Matches ***")
print(len(matches))
Task III: What is the first match?
# Task 3: What is the first match?
print("\n" + "*** First Match ***")
if len(matches) > 0:
    print(matches[0])
Task IV: What are all the matches?
# Task 4: What are all the matches?
print("\n" + "*** Matches ***")
if len(matches) > 0:
    for match in matches:
        print(match)
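Task V: Replace
# Task 5: Replace the matches
def myreplacement(m):
    # Group 1 is set only for the matches we want to replace
    if m.group(1):
        return "Superman"
    # Otherwise an excluded alternative matched: put the text back unchanged
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print("\n" + "*** Replacements ***")
print(replaced)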
Task VI: Split
# Task 6: Split
# Start by replacing the matches with something distinctive,
# as in Task 5. Then split on that marker.
splits = replaced.split('Superman')
print("\n" + "*** Splits ***")
for split in splits:
    print(split)
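The replace-then-split approach assumes the marker ("Superman") never occurs in the text itself. As a sketch of one way around that assumption (this is not from the original article), the pieces can be cut directly from the match positions of Group 1. The names `splits` and `last_end` are illustrative.

```python
import re

subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
regex = re.compile(r'{[^}]+}|"Tarzan\d+"|(Tarzan\d+)')

splits = []
last_end = 0
for m in regex.finditer(subject):
    # Only matches captured by Group 1 act as delimiters;
    # matches of the excluded alternatives stay inside the pieces.
    if m.group(1):
        splits.append(subject[last_end:m.start(1)])
        last_end = m.end(1)
# Keep the text that follows the last delimiter
splits.append(subject[last_end:])

print(splits)
# ['Jane"" ""Tarzan12"" ', '@', ' {4 Tarzan34}']
```

For this subject the result is the same three pieces that the marker-based split produces.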
Output
*** Is there a Match? ***
Yes
*** Number of Matches ***
2
*** First Match ***
Tarzan11
*** Matches ***
Tarzan11
Tarzan22
*** Replacements ***
Jane"" ""Tarzan12"" Superman@Superman {4 Tarzan34}
*** Splits ***
Jane"" ""Tarzan12""
@
{4 Tarzan34}
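Takeaway:
Roughly, the idea is: regex = re.compile(r'exclude|(match)')
- 💡: re.compile('exclusions without parentheses|(wanted match in parentheses)')
- Note ⚠️: This is less a regex bug than a side effect of how alternation works: the excluded alternatives consume the unwanted text, and the code then keeps only the matches where Group 1 is set. Because it relies on inspecting Group 1, it works in programming languages but not in the search box of a text editor such as EditPad Pro or Notepad++.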
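The pattern shape summarized above can be wrapped in a small reusable function. The sketch below is only an illustration under stated assumptions: `find_outside` and the group name `keep` are hypothetical names invented here, and the target pattern is assumed not to define its own group called `keep`.

```python
import re

def find_outside(target, excluded, text):
    """Return matches of `target` that no `excluded` pattern consumed first.

    Hypothetical helper: builds 'excl_1|excl_2|...|(target)' and keeps
    only the matches where the capture group took part in the match.
    """
    # Excluded contexts go first, each in a non-capturing group;
    # the wanted pattern goes last, in a named capture group.
    alternatives = [f'(?:{p})' for p in excluded]
    alternatives.append(f'(?P<keep>{target})')
    pattern = re.compile('|'.join(alternatives))
    return [m.group('keep') for m in pattern.finditer(text) if m.group('keep')]

subject = 'Jane"" ""Tarzan12"" Tarzan11@Tarzan22 {4 Tarzan34}'
result = find_outside(r'Tarzan\d+', [r'{[^}]+}', r'"Tarzan\d+"'], subject)
print(result)  # ['Tarzan11', 'Tarzan22']
```

A named group is used instead of Group 1 so that capture groups inside the excluded patterns do not shift the numbering.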