您可以遍历工作表:
def UseOpenpyxl(file_name): wb = openpyxl.load_workbook(file_name, read_only=True) sheet = wb.active rows = sheet.rows first_row = [cell.value for cell in next(rows)] data = [] for row in rows: record = {} for key, cell in zip(first_row, row): if cell.data_type == 's': record[key] = cell.value.strip() else: record[key] = cell.value data.append(record) return data
这应该扩展到大文件。如果列表
data太大,则可能需要对结果进行分块 。
现在,openpyxl版本的时间大约是xlrd版本的两倍:
%timeit xlrd_results = UseXlrd('foo.xlsx')1 loops, best of 3: 3.38 s per loop%timeit openpyxl_results = UseOpenpyxl('foo.xlsx')1 loops, best of 3: 6.87 s per loop
请注意,xlrd和openpyxl可能会稍微不同地解释什么是整数和什么是浮点数。对于我的测试数据,我需要添加
float()以使输出具有可比性:
def UseOpenpyxl(file_name): wb = openpyxl.load_workbook(file_name, read_only=True) sheet = wb.active rows = sheet.rows first_row = [float(cell.value) for cell in next(rows)] data = [] for row in rows: record = {} for key, cell in zip(first_row, row): if cell.data_type == 's': record[key] = cell.value.strip() else: record[key] = float(cell.value) data.append(record) return data
现在,这两个版本为我的测试数据提供了相同的结果:
>>> xlrd_results == openpyxl_resultsTrue
欢迎分享,转载请注明来源:内存溢出
评论列表(0条)