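First, a look at my files:
Here train.csv is the index of the training files, with two columns: image and labels.
train_images holds all of the training images, and train is where the class folders we are about to generate will go; it is currently empty.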
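Before running the script, it can help to confirm that the dataset folder really has the layout described above. The following is a minimal sketch, assuming the root contains train.csv (with image and labels columns), a train_images folder, and an optional train output folder. The helper name check_layout is hypothetical and not part of any library.

```python
import os

import pandas as pd


def check_layout(root):
    """Hypothetical helper: return a list of problems with the expected layout."""
    problems = []
    csv_path = os.path.join(root, 'train.csv')
    if not os.path.isfile(csv_path):
        problems.append('missing train.csv')
    else:
        # nrows=0 reads only the header row, which is enough to check column names
        columns = pd.read_csv(csv_path, nrows=0).columns
        for col in ('image', 'labels'):
            if col not in columns:
                problems.append('train.csv has no column ' + repr(col))
    if not os.path.isdir(os.path.join(root, 'train_images')):
        problems.append('missing train_images folder')
    out_dir = os.path.join(root, 'train')
    # the output folder may be missing (it will be created) but should start out empty
    if os.path.isdir(out_dir) and os.listdir(out_dir):
        problems.append('train folder is not empty')
    return problems
```

For example, `print(check_layout(r'D:\data_test\plant-pathology-2021-fgvc8'))` should print an empty list when everything is in place.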
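The code is below. label_path is the training CSV file, o_pth is the dataset root directory, and new_pth is the directory where you want the class folders to be created.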
import os
import time
import shutil
import pandas as pd
# Raw strings (r'...') stop Python from reading the backslashes in Windows paths as escape sequences
label_path = r'D:\data_test\plant-pathology-2021-fgvc8\train.csv'
labels = pd.read_csv(label_path)
####### Adjust this to match your own train.csv.
# Mine has two columns: image (the image file name) and labels (its class).
# Build the image -> label lookup once instead of filtering the DataFrame for every file.
label_of = dict(zip(labels['image'], labels['labels']))
# copy each image into the folder for its class
since = time.time()
o_pth = r'D:\data_test\plant-pathology-2021-fgvc8'
data_dir = os.path.join(o_pth, 'train_images')
new_pth = r'D:\data_test\plant-pathology-2021-fgvc8\train'
for root, dirs, files in os.walk(data_dir):
    for file in files:
        image_name = file  # sometimes it needs to be split: file.split('.')[0]
        # get the class the image belongs to
        label = label_of.get(image_name)
        if label is None:
            print('No label in train.csv, skipped:', file)
            continue
        out_dir = os.path.join(new_pth, str(label))  # str() in case the labels are ints
        os.makedirs(out_dir, exist_ok=True)
        to_path = os.path.join(out_dir, file)
        from_path = os.path.join(root, file)  # root, not data_dir, so images in subfolders are found too
        shutil.copy(from_path, to_path)  # copy instead of shutil.move: a move that stalls halfway is hard to recover from
time_taken = time.time() - since
print('Time taken: {:.0f}m {:.0f}s'.format(time_taken // 60, time_taken % 60))
# Time taken: 3m 0s  (one run on my machine)
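The same steps can also be wrapped in a reusable function that reports how many images ended up in each class, which makes it easy to compare the result against the CSV. This is a minimal sketch using pathlib, assuming the same train.csv columns (image, labels) and that the image column holds full file names. The name sort_images_by_label is hypothetical.

```python
import shutil
from collections import Counter
from pathlib import Path

import pandas as pd


def sort_images_by_label(csv_path, image_dir, out_dir):
    """Hypothetical helper: copy each labeled image into out_dir/<label>/.

    Assumes the CSV has an 'image' column of file names and a 'labels' column.
    Returns (Counter of copied images per label, list of files with no label).
    """
    table = pd.read_csv(csv_path)
    label_of = dict(zip(table['image'], table['labels']))
    counts = Counter()
    unlabeled = []
    # rglob('*') also descends into subfolders, similar to os.walk
    for src in sorted(Path(image_dir).rglob('*')):
        if not src.is_file():
            continue
        label = label_of.get(src.name)
        if label is None:
            unlabeled.append(src.name)
            continue
        dest_dir = Path(out_dir) / str(label)
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, dest_dir / src.name)  # copy, so the originals stay untouched
        counts[str(label)] += 1
    return counts, unlabeled
```

After a run, the returned counts can be checked against `pd.read_csv(csv_path)['labels'].value_counts()`, and a non-empty unlabeled list points to files that are in train_images but missing from train.csv.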
I based this on another blogger's post and then modified it myself.