If you need to preserve case, you could use a dictionary instead. Case-fold the keys, then extract the values into a set:
```python
set({v.casefold(): v for v in l}.values())
```
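Because later entries in a dict comprehension overwrite earlier ones under the same key, the expression above keeps the *last* spelling seen for each case-folded word. If you would rather keep the *first* spelling, and get a list in first-seen order, a minimal sketch using `dict.setdefault` could look like this (the function name `dedupe_keep_first` is hypothetical, and it assumes Python 3.7+ so that dicts preserve insertion order):

```python
def dedupe_keep_first(values):
    """Case-insensitively deduplicate *values*, keeping the first spelling seen.

    Relies on dicts preserving insertion order (guaranteed since Python 3.7),
    so the result keeps the order in which each word first appeared.
    """
    seen = {}
    for v in values:
        # setdefault only stores v if this folded key is not present yet
        seen.setdefault(v.casefold(), v)
    return list(seen.values())


l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']
print(dedupe_keep_first(l))  # ['#Trending', '#Yax']
```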
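The `str.casefold()` method uses the Unicode case folding rules to normalize strings for case-insensitive comparison. This matters especially for non-ASCII alphabets and for text containing ligatures. For example, the German sharp S, ß, is normalized to ss, and the long s, ſ, from the same language, is normalized to s: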
```python
>>> print(s := 'Waſſerſchloß', s.lower(), s.casefold(), sep=" - ")
Waſſerſchloß - waſſerſchloß - wasserschloss
```
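To see why this matters for deduplication specifically, here is a small illustrative example (the word list is made up): `lower()` leaves ß alone, so two spellings of the same word survive as separate set entries, while `casefold()` collapses them.

```python
words = ["Straße", "STRASSE", "strasse"]

# lower() leaves ß untouched, so "straße" and "strasse" stay distinct
print({w.lower() for w in words})
# casefold() maps ß to "ss", so all three spellings collapse to one entry
print({w.casefold() for w in words})  # {'strasse'}
```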
You could wrap this in a class.
If you don't care about preserving case, just use a set comprehension:
```python
{v.casefold() for v in l}
```
Note that Python 2 has no `str.casefold()` method; use `str.lower()` there instead.
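If the same code has to run on both versions, one option is to pick the folding function once at import time. This is a sketch that assumes the values are native `str` objects; the names `fold` and `unique_folded` are illustrative only:

```python
# Use str.casefold where it exists (Python 3.3+) and fall back to
# str.lower on interpreters that lack it, such as Python 2.
fold = getattr(str, "casefold", str.lower)


def unique_folded(values):
    """Return the set of case-folded values (original case is not preserved)."""
    return {fold(v) for v in values}


print(unique_folded(['#Trending', '#TrendinG', '#Yax', '#YAX']))
```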
Demo:
```python
>>> l = ['#Trending', '#Trending', '#TrendinG', '#Yax', '#YAX', '#Yax']
>>> set({v.casefold(): v for v in l}.values())
{'#Yax', '#TrendinG'}
>>> {v.lower() for v in l}
{'#trending', '#yax'}
```
Wrapping the first approach in a class would look like this:
```python
try:
    # Python 3
    from collections.abc import MutableSet
except ImportError:
    # Python 2
    from collections import MutableSet


class CasePreservingSet(MutableSet):
    """String set that preserves case but tests for containment by case-folded value

    E.g. 'Foo' in CasePreservingSet(['FOO']) is True. Preserves case of *last*
    inserted variant.

    """
    def __init__(self, *args):
        # Maps case-folded key -> most recently added original spelling
        self._values = {}
        if len(args) > 1:
            # str.format instead of an f-string, so this also parses on Python 2
            raise TypeError(
                "{} expected at most 1 argument, got {}".format(
                    type(self).__name__, len(args)))
        values = args[0] if args else ()
        try:
            self._fold = str.casefold  # Python 3
        except AttributeError:
            self._fold = str.lower  # Python 2
        for v in values:
            self.add(v)

    def __repr__(self):
        return '<{}{} at {:x}>'.format(
            type(self).__name__, tuple(self._values.values()), id(self))

    def __contains__(self, value):
        return self._fold(value) in self._values

    def __iter__(self):
        try:
            # Python 2
            return self._values.itervalues()
        except AttributeError:
            # Python 3
            return iter(self._values.values())

    def __len__(self):
        return len(self._values)

    def add(self, value):
        # Overwrites any earlier spelling stored under the same folded key
        self._values[self._fold(value)] = value

    def discard(self, value):
        try:
            del self._values[self._fold(value)]
        except KeyError:
            pass
```
Usage demo:
```python
>>> cps = CasePreservingSet(l)
>>> cps
<CasePreservingSet('#TrendinG', '#Yax') at 1047ba290>
>>> '#treNdinG' in cps
True
```