However, comparing the storage used by the TFRecords produced by the two approaches: roughly 13 GB of image data turns into a 54 GB TFRecord file, so storing the raw decoded pixels via tobytes() is not recommended.
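A minimal sketch of the space-saving alternative implied above: write the already-compressed JPEG bytes into the TFRecord instead of a decoded array's tobytes(). The feature key `image_raw` and the function name are illustrative, not from the original post.

```python
import tensorflow as tf

def write_jpeg_tfrecord(image_paths, out_path):
    """Store compressed JPEG bytes, not raw decoded pixels.

    Writing the file's bytes as-is keeps the JPEG compression, so the
    TFRecord stays roughly the size of the source images; calling
    tobytes() on a decoded numpy array stores uncompressed pixels
    and inflates the file (e.g. 13 GB of images -> 54 GB).
    """
    with tf.io.TFRecordWriter(out_path) as writer:
        for path in image_paths:
            with open(path, "rb") as f:
                jpeg_bytes = f.read()  # still compressed
            example = tf.train.Example(features=tf.train.Features(feature={
                "image_raw": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
            }))
            writer.write(example.SerializeToString())
```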
When reading, tf.image.decode_jpeg can be used to decode the stored JPEG bytes.
Here you must add image_fg = tf.image.convert_image_dtype(image_fg, dtype=tf.float32); otherwise resizing fails because the image tensor's dtype is not float32.
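A hedged sketch of that read path using TF 2.x names; the feature key `image_raw`, the function name, and the 224x224 target size are assumptions for illustration.

```python
import tensorflow as tf

def parse_image(serialized):
    """Decode one serialized Example into a float32, resized image."""
    features = tf.io.parse_single_example(
        serialized,
        {"image_raw": tf.io.FixedLenFeature([], tf.string)})
    image = tf.image.decode_jpeg(features["image_raw"], channels=3)  # uint8
    # Convert before resizing: per the note above, resizing can fail
    # when the tensor's dtype is not float32.
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return tf.image.resize(image, [224, 224])
```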
At runtime TensorFlow reports:
Update 2019-06-01: I didn't expect a casually posted log to draw so many readers, so here is a more detailed write-up of the new API:
Previously, reading tfrecords files meant going through TensorFlow's queues, and after sess.run you still had to close the queue, which was a bit of a hassle. The newer official API is tf.data.TFRecordDataset, which reads the files directly into a dataset; building an iterator from the dataset gives you the tensor to run. Details below:
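The queue-free pipeline described above can be sketched like this; the helper name, buffer size, and batch size are placeholders, and `parse_fn` stands for your per-record parser.

```python
import tensorflow as tf

def make_tfrecord_dataset(filenames, parse_fn, batch_size=32):
    """Read TFRecords via tf.data: no queue to start or close.

    The dataset yields the tensors to run directly. In TF 1.x you would
    wrap it with tf.compat.v1.data.make_one_shot_iterator(...) and pass
    iterator.get_next() to sess.run; in TF 2.x you simply iterate.
    """
    dataset = tf.data.TFRecordDataset(filenames)
    dataset = dataset.map(parse_fn)
    dataset = dataset.shuffle(buffer_size=1000)
    return dataset.batch(batch_size)
```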
You can refer to my rewritten version of the BERT model.
The tf.data API supports a variety of file formats so that you can process large datasets that do not fit in memory. For example, the TFRecord file format is a simple record-oriented binary format that many TensorFlow applications use for training data. The tf.data.TFRecordDataset class enables you to stream over the contents of one or more TFRecord files as part of an input pipeline.
The filenames argument to the TFRecordDataset initializer can either be a string, a list of strings, or a tf.Tensor of strings. Therefore if you have two sets of files for training and validation purposes, you can create a factory method that produces the dataset, taking filenames as an input argument:
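The factory method the docs describe could be sketched as follows; the function name and the commented-out file names are placeholders.

```python
import tensorflow as tf

def dataset_factory(filenames):
    """Build an input dataset from TFRecord files.

    `filenames` may be a single string, a list of strings, or a
    tf.Tensor of strings, so the same factory serves both the
    training and the validation file sets. Chain .map/.shuffle/.batch
    onto the returned dataset as needed.
    """
    return tf.data.TFRecordDataset(filenames)

# train_ds = dataset_factory(["train-0.tfrecord", "train-1.tfrecord"])
# val_ds = dataset_factory("validation.tfrecord")
```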