Regular expressions for accent-insensitive replacement in Python

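unidecode is often mentioned for removing accents in Python, but it does more than that: it converts '°' to 'deg', which may not be the desired output.

unicodedata, from the standard library, appears to have enough functionality to remove accents.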
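As a minimal sketch of that difference, the standard-library approach below drops only combining marks (Unicode category 'Mn') after decomposition, so a symbol such as '°' passes through unchanged. The function name `remove_accents` mirrors the one used later in this post.

```python
import unicodedata

def remove_accents(s):
    """Return s without combining marks (Unicode category 'Mn')."""
    # NFD splits 'é' into 'e' plus a combining acute accent.
    decomposed = unicodedata.normalize('NFD', s)
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')

print(remove_accents('80° café'))
# 80° cafe
```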

With any pattern

This method should work with any pattern and any text.

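You can temporarily remove the accents from both the text and the regex pattern. The match information from re.finditer() (start and end indices) can then be used to modify the original, accented text.

Note that the matches must be processed in reverse order, from last to first, so that a replacement does not shift the indices of the matches still to be processed.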
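To see why the order matters, here is a small illustrative sketch (the sample text and the helper `replace_spans` are made up for this demonstration). The spans are computed once against the unmodified text; applying them front to back corrupts the result as soon as a replacement changes the length.

```python
import re

text = "a bb a bb"
# Match spans are computed once, against the unmodified text.
spans = [m.span() for m in re.finditer(r'bb', text)]  # [(2, 4), (7, 9)]

def replace_spans(s, spans, repl):
    # Apply each (start, end) span to s in the order given.
    for start, end in spans:
        s = s[:start] + repl + s[end:]
    return s

forward = replace_spans(text, spans, 'X')         # second span is stale by then
backward = replace_spans(text, spans[::-1], 'X')  # earlier spans stay valid

print(forward)
# a X a bX
print(backward)
# a X a X
```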

import re
import unicodedata

original_text = "I'm drinking a 80° café in a cafe with Chloë, François Déporte and Francois Deporte."
accented_pattern = r'a café|François Déporte'

def remove_accents(s):
    # Decompose each character (NFD), then drop the combining marks (category 'Mn').
    return ''.join(c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn')

print(remove_accents('äöüßéèiìììíàáç'))
# aoußeeiiiiiaac

# Match the unaccented pattern against the unaccented text.
pattern = re.compile(remove_accents(accented_pattern))
modified_text = original_text
matches = list(re.finditer(pattern, remove_accents(original_text)))

# Apply the replacements to the original text, last match first.
for match in matches[::-1]:
    modified_text = modified_text[:match.start()] + 'X' + modified_text[match.end():]

print(modified_text)
# I'm drinking a 80° café in X with Chloë, X and X.
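The index arithmetic relies on an assumption the post does not spell out: the accent-stripped text must have the same length as the original, otherwise the match offsets no longer line up. That holds for precomposed (NFC) Latin text like the sample, but not for text that carries accents as separate combining marks. The sketch below wraps the same steps in a hypothetical helper, `sub_ignoring_accents` (a name invented here), which normalizes the text to NFC first and raises an error if the lengths still differ.

```python
import re
import unicodedata

def remove_accents(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def sub_ignoring_accents(accented_pattern, repl, text):
    """Replace accent-insensitive matches of accented_pattern in text with repl.

    Hypothetical helper; repl is a plain string, not a regex template.
    """
    # Compose characters first so each accented letter is one code point.
    text = unicodedata.normalize('NFC', text)
    stripped = remove_accents(text)
    if len(stripped) != len(text):
        raise ValueError('accent removal changed the text length; offsets would not line up')
    pattern = re.compile(remove_accents(accented_pattern))
    # Walk the matches from last to first so earlier offsets stay valid.
    for match in reversed(list(pattern.finditer(stripped))):
        text = text[:match.start()] + repl + text[match.end():]
    return text

# The first 'cafe' is written with a separate combining accent (U+0301).
print(sub_ignoring_accents('café', 'X', 'A cafe\u0301 and a cafe'))
# A X and a X
```

Note that the returned string is NFC-normalized, which may differ byte for byte from the input even where nothing was replaced.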

If the pattern is a word or a set of words

You can:

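- Remove the accents from the pattern words and store them in a set for fast lookup.
- Find every word in your text with `\w+`.
- Remove the accents from each word:
  - If it matches an entry in the set, replace it with `X`.
  - If it does not match, leave it unchanged.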

import re
from unidecode import unidecode

original_text = "I'm drinking a café in a cafe with Chloë."

def remove_accents(string):
    return unidecode(string)

accented_words = ['café', 'français']

# Unaccented forms of the target words, stored in a set for fast lookup.
words_to_remove = set(remove_accents(word) for word in accented_words)

def remove_words(matchobj):
    # Called by re.sub for every word; returns the replacement text.
    word = matchobj.group(0)
    if remove_accents(word) in words_to_remove:
        return 'X'
    else:
        return word

print(re.sub(r'\w+', remove_words, original_text))
# I'm drinking a X in a X with Chloë.
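If pulling in unidecode is undesirable (for instance because of the '°' to 'deg' conversion mentioned at the top), the same word-set approach seems to work with `unicodedata` alone. The following is a sketch under that assumption; the wrapper `replace_words` and its `repl` parameter are illustrative names, not part of any library.

```python
import re
import unicodedata

def remove_accents(s):
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c) != 'Mn')

def replace_words(text, accented_words, repl='X'):
    """Replace every word of text whose unaccented form is in accented_words."""
    targets = {remove_accents(word) for word in accented_words}

    def substitute(matchobj):
        word = matchobj.group(0)
        return repl if remove_accents(word) in targets else word

    return re.sub(r'\w+', substitute, text)

print(replace_words("I'm drinking a café in a cafe with Chloë.", ['café', 'français']))
# I'm drinking a X in a X with Chloë.
```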

Original article: http://outofmemory.cn/zaji/5663651.html
