I agree with @aix, multiprocessing is definitely the way to go. No matter how many parallel processes you run, you will still be I/O bound -- you can only read from disk so fast. But it is easy to get at least some speedup.
Consider the following (input/ is a directory containing .txt files from Project Gutenberg):
import os.path
from multiprocessing import Pool
import sys
import time

def process_file(name):
    ''' Process one file: count number of lines and words '''
    linecount = 0
    wordcount = 0
    with open(name, 'r') as inp:
        for line in inp:
            linecount += 1
            wordcount += len(line.split(' '))
    return name, linecount, wordcount

def process_files_parallel(arg, dirname, names):
    ''' Process each file in parallel via Pool.map() '''
    pool = Pool()
    results = pool.map(process_file,
                       [os.path.join(dirname, name) for name in names])

def process_files(arg, dirname, names):
    ''' Process each file via map() '''
    results = map(process_file,
                  [os.path.join(dirname, name) for name in names])

if __name__ == '__main__':
    start = time.time()
    os.path.walk('input/', process_files, None)
    print "process_files()", time.time() - start

    start = time.time()
    os.path.walk('input/', process_files_parallel, None)
    print "process_files_parallel()", time.time() - start
When I run this on my dual-core machine, there is a noticeable (though not 2x) speedup:
$ python process_files.py
process_files() 1.71218085289
process_files_parallel() 1.28905105591
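Note that the listing above is Python 2 only (os.path.walk and the print statement no longer exist in Python 3). A minimal Python 3 sketch of the same comparison, assuming the same input/ directory of .txt files, could collect the paths with os.walk and use the Pool as a context manager:

import os
import time
from multiprocessing import Pool

def process_file(name):
    '''Count lines and (roughly) words in one file.'''
    linecount = 0
    wordcount = 0
    with open(name, 'r', errors='ignore') as inp:
        for line in inp:
            linecount += 1
            wordcount += len(line.split())
    return name, linecount, wordcount

def all_txt_files(top):
    '''Collect the paths of all .txt files under top.'''
    paths = []
    for dirname, _, names in os.walk(top):
        paths.extend(os.path.join(dirname, n) for n in names if n.endswith('.txt'))
    return paths

if __name__ == '__main__':
    paths = all_txt_files('input/')

    start = time.time()
    serial = list(map(process_file, paths))
    print("map()", time.time() - start)

    start = time.time()
    with Pool() as pool:
        parallel = pool.map(process_file, paths)
    print("Pool.map()", time.time() - start)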
If the files are small enough to fit in memory, and you have a lot of processing to do that is not I/O bound, then you should see even better improvement.
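For example, a hypothetical CPU-heavier variant of process_file that reads the whole file into memory and builds a word-frequency table does much more computation per byte read, so handing it to Pool.map() gets closer to the ideal per-core speedup. A sketch (the function name and the cut-off of 10 most common words are my own choices, not from the original answer):

from collections import Counter

def process_file_cpu_heavy(name):
    '''Hypothetical CPU-heavier variant: read the whole file into memory,
    then build a word-frequency table instead of just counting words.'''
    with open(name, 'r') as inp:
        text = inp.read()
    words = [w.strip('.,;:!?"\'').lower() for w in text.split()]
    freq = Counter(w for w in words if w)
    return name, len(words), freq.most_common(10)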