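Python has no single built-in data structure that does everything you want, but it is fairly easy to combine the ones it does have to reach your goal, and to do so fairly efficiently.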
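For example, suppose your input is the following data in a comma-separated-value file named employees.csv, with the field names defined by the first line: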
```
name,age,weight,height
Bob Barker,25,175,6ft 2in
Ted Kingston,28,163,5ft 10in
Mary Manson,27,140,5ft 6in
Sue Sommers,27,132,5ft 8in
Alice Toklas,24,124,5ft 6in
```
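The working code below shows how to read this data into a list of records, and how to automatically create separate look-up tables for finding the records associated with the values contained in each record's fields.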
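The records are instances of a class created by `namedtuple`, which is very memory-efficient because each record lacks the `__dict__` attribute that class instances normally have. Using them also makes it possible to access each field by name with dot syntax, for example `record.fieldname`.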
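A minimal sketch of both properties, using a hand-built `Person` type with the same four fields as employees.csv (the variable name `bob` is just for illustration). The missing `__dict__` holds on current Python 3 (3.5.1 and later); older versions exposed a `__dict__` property on named tuples.

```python
from collections import namedtuple

# A record type built from a list of field names, as the Database class
# does with the CSV header row.
Person = namedtuple('Person', ['name', 'age', 'weight', 'height'])

bob = Person('Bob Barker', '25', '175', '6ft 2in')

# Fields are accessible by name with dot syntax...
print(bob.name)     # Bob Barker
print(bob.height)   # 6ft 2in

# ...and, on Python 3.5.1+, instances carry no per-instance __dict__.
print(hasattr(bob, '__dict__'))  # False
```

Because a named tuple is still a tuple, `bob` also compares equal to the plain tuple of its four values.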
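The look-up tables are `defaultdict(list)` instances, which provide dictionary-like O(1) look-up times on average and also allow multiple values to be associated with each key. The look-up key is the field value being sought, and the data associated with it is a list of the integer indices of the `Person` records, stored in the database's `records` list, that have that value, so these lists will all be relatively small.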
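To make the shape of one look-up table concrete, here is a small sketch that indexes a three-record subset of the sample data by its `age` field. The name `age_index` is hypothetical; the class below builds the equivalent table inside `retrieve()`.

```python
from collections import defaultdict, namedtuple

Person = namedtuple('Person', ['name', 'age', 'weight', 'height'])
records = [
    Person('Mary Manson', '27', '140', '5ft 6in'),
    Person('Sue Sommers', '27', '132', '5ft 8in'),
    Person('Alice Toklas', '24', '124', '5ft 6in'),
]

# Map each distinct 'age' value to the list of indices of the records
# that hold it. defaultdict(list) creates the empty list on first use.
age_index = defaultdict(list)
for i, rec in enumerate(records):
    age_index[rec.age].append(i)

print(dict(age_index))  # {'27': [0, 1], '24': [2]}

# A look-up is an average O(1) dictionary access; .get() avoids inserting
# an empty entry for a value that is not present.
matches = [records[i] for i in age_index.get('27', [])]
print([p.name for p in matches])  # ['Mary Manson', 'Sue Sommers']
print(age_index.get('99', []))    # []
```

Storing indices rather than copies of the records keeps each table small, since every entry is just a list of integers.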
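Note that the class's code is completely data-driven: it contains no hard-coded field names, which are all taken from the first row of the csv input file when it is read. Of course, when using an instance, every `retrieve()` method call must supply a valid field name as its keyword argument.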
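The data-driven part boils down to the following sketch. It assumes an in-memory `io.StringIO` string (`csv_text`, a hypothetical stand-in holding the header plus two rows of employees.csv) in place of the real file, so it runs on its own under Python 3.

```python
import csv
import io
from collections import namedtuple

# In-memory stand-in for employees.csv (header plus two data rows).
csv_text = (
    "name,age,weight,height\n"
    "Bob Barker,25,175,6ft 2in\n"
    "Ted Kingston,28,163,5ft 10in\n"
)

reader = csv.reader(io.StringIO(csv_text))
fields = next(reader)                   # the header row supplies the field names
Record = namedtuple('Person', fields)   # no field names are hard-coded here
records = [Record(*row) for row in reader]

print(Record._fields)      # ('name', 'age', 'weight', 'height')
print(records[1].weight)   # 163 (csv values stay strings)

# A misspelled name such as 'hight' is simply not among the fields.
print('hight' in fields)   # False
```

Note that every value, including ages and weights, is read as a string, which is why the queries pass values such as `age='27'`.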
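Update
Modified so that the look-up tables (one per field, covering every unique value of that field) are no longer all built when the data file is first read. Instead, the `retrieve()` method now creates them "lazily", only when one is needed (and saves/caches the result for future use). Also modified to work in Python 2.7+ (including 3.x).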
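The lazy, cached construction can be isolated in a stripped-down, module-level sketch of the same idea. The `lookup()` function and the `build_count` counter are hypothetical names; the counter exists only to make visible when a table actually gets built.

```python
from collections import defaultdict, namedtuple

Person = namedtuple('Person', ['name', 'age'])
records = [Person('Mary Manson', '27'), Person('Sue Sommers', '27'),
           Person('Alice Toklas', '24')]

lookup_tables = {}  # field name -> {field value: [record indices]}
build_count = 0     # hypothetical counter: how many tables have been built

def lookup(field, value):
    """Return a tuple of matching records, building the field's table on first use."""
    global build_count
    if field not in lookup_tables:  # not built yet: build it once and cache it
        build_count += 1
        table = lookup_tables[field] = defaultdict(list)
        for index, record in enumerate(records):
            table[getattr(record, field)].append(index)
    return tuple(records[i] for i in lookup_tables[field].get(value, []))

lookup('age', '27')
lookup('age', '24')            # reuses the cached 'age' table
print(build_count)             # 1
lookup('name', 'Sue Sommers')  # first 'name' query builds a second table
print(build_count)             # 2
print(sorted(lookup_tables))   # ['age', 'name']
```

Fields that are never queried never get a table, so the up-front cost of reading the file is just building the record list.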
```python
from collections import defaultdict, namedtuple
import csv


class Database(object):
    def __init__(self, csv_filename, recordname):
        # Read data from csv format file into a list of namedtuples.
        with open(csv_filename, 'r') as inputfile:
            csv_reader = csv.reader(inputfile, delimiter=',')
            self.fields = next(csv_reader)  # Read header row.
            self.Record = namedtuple(recordname, self.fields)
            # Skip blank lines, which csv.reader returns as empty lists.
            self.records = [self.Record(*row) for row in csv_reader if row]
        self.valid_fieldnames = set(self.fields)
        # Create an empty table of lookup tables for each field name that maps
        # each unique field value to a list of record-list indices of the ones
        # that contain it. Tables are filled in lazily by retrieve().
        self.lookup_tables = {}

    def retrieve(self, **kwargs):
        """ Fetch a tuple of the records whose field, named by the single
            keyword argument, has the value supplied (the tuple is empty
            if there aren't any).
        """
        if len(kwargs) != 1:
            raise ValueError(
                'Exactly one fieldname keyword argument required for retrieve function '
                '(%s specified)' % ', '.join([repr(k) for k in kwargs.keys()]))
        field, value = kwargs.popitem()  # Keyword arg's name and value.
        if field not in self.valid_fieldnames:
            raise ValueError('keyword arg "%s" is an invalid fieldname' % field)
        if field not in self.lookup_tables:  # Need to create a lookup table?
            lookup_table = self.lookup_tables[field] = defaultdict(list)
            for index, record in enumerate(self.records):
                field_value = getattr(record, field)
                lookup_table[field_value].append(index)
        # Return (possibly empty) sequence of matching records.
        return tuple(self.records[index]
                     for index in self.lookup_tables[field].get(value, []))


if __name__ == '__main__':
    empdb = Database('employees.csv', 'Person')
    print("retrieve(name='Ted Kingston'): {}".format(empdb.retrieve(name='Ted Kingston')))
    print("retrieve(age='27'): {}".format(empdb.retrieve(age='27')))
    print("retrieve(weight='150'): {}".format(empdb.retrieve(weight='150')))
    try:
        print("retrieve(hight='5ft 6in'): {}".format(empdb.retrieve(hight='5ft 6in')))
    except ValueError as e:
        print("ValueError('{}') raised as expected".format(e))
    else:
        raise type('NoExceptionError', (Exception,), {})(
            'No exception raised from "retrieve(hight=\'5ft 6in\')" call.')
```
Output:
```
retrieve(name='Ted Kingston'): (Person(name='Ted Kingston', age='28', weight='163', height='5ft 10in'),)
retrieve(age='27'): (Person(name='Mary Manson', age='27', weight='140', height='5ft 6in'), Person(name='Sue Sommers', age='27', weight='132', height='5ft 8in'))
retrieve(weight='150'): ()
ValueError('keyword arg "hight" is an invalid fieldname') raised as expected
```