Mask-RCNN in Python: Enough to Make Even a Dog Cry

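Preparing the dataset:

Labeling:

Converting the JSON files into dataset folders:

Sorting the files in labelme_json:

Collecting the masks:

Mask-RCNN:

Requirements:

Converting the ipynb to py and checking the official model:

Training parameter settings:

Training and prediction: train_and_predict.py:

Changes to the train function:

Starting training:

Changes to the predict function:

A fun little story: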

Foreword: knowledge is not a tool for looking down on others. You and I only know that 1+1=2 because we stand on the shoulders of giants.

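Preparing the dataset:

Get your own dataset ready and create a folder named train that contains four subfolders: cv2_mask, json, labelme_json, and pic.

cv2_mask: the mask images (label.png) collected from the converted JSON folders

json: the dictionary-style JSON files produced by labeling

labelme_json: the folders produced by converting the JSON files

pic: the original images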
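The folder layout described above can be created in one go. This is a minimal sketch: the four subfolder names come from the article, while the function name make_train_layout and the choice of root folder are my own assumptions.

```python
from pathlib import Path

# Subfolder names taken from the article; the root path is up to you.
SUBFOLDERS = ("cv2_mask", "json", "labelme_json", "pic")


def make_train_layout(root):
    """Create <root>/train with the four subfolders and return their paths."""
    train_dir = Path(root) / "train"
    created = {}
    for name in SUBFOLDERS:
        folder = train_dir / name
        # exist_ok=True makes the call safe to repeat on an existing layout.
        folder.mkdir(parents=True, exist_ok=True)
        created[name] = folder
    return created
```

For the paths used later in this post, the call would be make_train_layout(r"E:\Mask_RCNN-master\wifi_coco").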

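Labeling:

Download labelme-main from GitHub.

Link: labelme-main

Once the environment works, run __main__.py to open the labeling window and annotate your images.

Labeling each image produces an XXXXX.json file; save these files into the json folder.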
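Before converting anything, it can help to confirm that each JSON file really contains the labels you expect. The sketch below only relies on the "shapes" list and its "label" field, which the conversion script in the next section also reads; the helper name count_labels is hypothetical.

```python
import json
from collections import Counter


def count_labels(json_path):
    """Return a Counter of the shape labels found in one labelme JSON file."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    # Every annotated polygon is one entry in "shapes" with a "label" string.
    return Counter(shape["label"] for shape in data.get("shapes", []))
```

A misspelled label shows up here as an extra key, which is cheaper to catch now than after training.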

Converting the JSON files into dataset folders:

Here is the code:

import argparse
import base64
import json
import os
import os.path as osp

import imgviz
import PIL.Image
import yaml
from labelme.logger import logger
from labelme import utils


def main():
    # Parse the command line once, outside the loop. "json_dir" is the folder
    # that holds the labelme .json files (the path passed in Method 1 below).
    parser = argparse.ArgumentParser()
    parser.add_argument("json_dir")
    parser.add_argument("-o", "--out", default=None)
    args = parser.parse_args()

    # Only pick up .json files, so output folders from an earlier run are skipped.
    list_path = [f for f in os.listdir(args.json_dir) if f.endswith(".json")]
    for i in range(0, len(list_path)):
        json_file = osp.join(args.json_dir, list_path[i])
        print(json_file)
        # Each JSON file gets its own output folder: "xxx.json" -> "xxx_json"
        out_name = osp.basename(json_file).replace(".", "_")
        if args.out is None:
            out_dir = osp.join(osp.dirname(json_file), out_name)
        else:
            out_dir = osp.join(args.out, out_name)
        if not osp.exists(out_dir):
            os.makedirs(out_dir)

        data = json.load(open(json_file))
        imageData = data.get("imageData")

        if not imageData:
            imagePath = os.path.join(os.path.dirname(json_file), data["imagePath"])
            with open(imagePath, "rb") as f:
                imageData = f.read()
                imageData = base64.b64encode(imageData).decode("utf-8")
        img = utils.img_b64_to_arr(imageData)

        label_name_to_value = {"_background_": 0}
        for shape in sorted(data["shapes"], key=lambda x: x["label"]):
            label_name = shape["label"]
            if label_name in label_name_to_value:
                label_value = label_name_to_value[label_name]
            else:
                label_value = len(label_name_to_value)
                label_name_to_value[label_name] = label_value
        lbl, _ = utils.shapes_to_label(
            img.shape, data["shapes"], label_name_to_value
        )

        label_names = [None] * (max(label_name_to_value.values()) + 1)
        for name, value in label_name_to_value.items():
            label_names[value] = name

        lbl_viz = imgviz.label2rgb(
            lbl, imgviz.asgray(img), label_names=label_names, loc="rb"
        )

        # Output files as written by labelme 3.20.0
        PIL.Image.fromarray(img).save(osp.join(out_dir, 'img.png'))
        utils.lblsave(osp.join(out_dir, 'label.png'), lbl)
        PIL.Image.fromarray(lbl_viz).save(osp.join(out_dir, 'label_viz.png'))

        with open(osp.join(out_dir, 'label_names.txt'), 'w') as f:
            for lbl_name in label_names:
                f.write(lbl_name + '\n')
        # The missing part: also write info.yaml, which the training code reads
        logger.warning('info.yaml is being replaced by label_names.txt')
        info = dict(label_names=label_names)
        with open(osp.join(out_dir, 'info.yaml'), 'w') as f:
            yaml.safe_dump(info, f, default_flow_style=False)

        logger.info('Saved to: {}'.format(out_dir))
if __name__ == "__main__":
    main()

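Method 1: on the command line, run: python json_to_dataset.py <folder containing the JSON files> -o <output folder>

Method 2: create test.txt, paste in the three lines below, change the extension to .bat, and double-click it inside the folder that holds the JSON files:

@echo off
for %%i in (*.json) do labelme_json_to_dataset "%%i"
pause

Finally, move the resulting folders into the labelme_json folder.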

Sorting the files in labelme_json and collecting the masks:

#! /usr/bin/env python
# coding=utf-8
import os
import shutil

# Raw strings keep the Windows backslashes from being read as escape sequences.
fpath_input = r"E:\Mask_RCNN-master\wifi_coco\train\labelme_json"
fpath_output = r"E:\Mask_RCNN-master\wifi_coco\train\cv2_mask"
for file in os.listdir(fpath_input):
    former = os.path.join(fpath_input, file)
    if not os.path.isdir(former):
        continue
    for inner in os.listdir(former):
        if os.path.splitext(inner)[0] == "label":
            oldname = os.path.join(former, inner)
            # Name the mask after its own folder ("xxx_json" -> "xxx.png") so it
            # matches the mask_path that load_shapes builds later. The original
            # loop copied each label.png to <prefix>_1.png ... <prefix>_128.png,
            # which loses the one-to-one link between an image and its mask.
            prefix = file[:-len("_json")] if file.endswith("_json") else file
            newname = os.path.join(fpath_output, prefix + ".png")
            shutil.copyfile(oldname, newname)

Note: the files in cv2_mask, json, labelme_json, and pic that belong to the same image must all share the same filename prefix.
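The prefix rule is easy to break by accident, so a quick consistency check may save a failed training run. This is a sketch under my own assumptions: the helper find_missing is hypothetical, masks are taken to be .png files, and converted folders are taken to end in _json, as in the scripts above.

```python
import os


def find_missing(train_dir):
    """Map each subfolder name to the image prefixes it is missing."""
    def names(sub, suffix):
        folder = os.path.join(train_dir, sub)
        # Keep only entries with the expected suffix and strip it off.
        return {n[:-len(suffix)] for n in os.listdir(folder) if n.endswith(suffix)}

    found = {
        "pic": {os.path.splitext(n)[0]
                for n in os.listdir(os.path.join(train_dir, "pic"))},
        "json": names("json", ".json"),
        "labelme_json": names("labelme_json", "_json"),
        "cv2_mask": names("cv2_mask", ".png"),
    }
    everything = set().union(*found.values())
    # A prefix seen anywhere should exist in every subfolder.
    return {sub: sorted(everything - have) for sub, have in found.items()}
```

An empty list for every key means the four folders agree; anything else names the prefixes that still need a file.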

Mask-RCNN requirements:

CUDA 10.0, cuDNN 7.6.5.32, Python 3.6, tensorflow-gpu==1.13.1, keras==2.2.4

Download link: Mask-RCNN
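Version mismatches are a common source of trouble with this stack, so it may be worth printing what is actually installed before going further. A small sketch: the EXPECTED table restates the versions listed above, and the helper names are my own.

```python
import importlib
import sys

# Versions listed in the requirements above.
EXPECTED = {"tensorflow": "1.13.1", "keras": "2.2.4"}


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or None if not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def report():
    """Return (name, expected, installed) rows for Python and each package."""
    rows = [("python", "3.6", "%d.%d" % sys.version_info[:2])]
    for name, wanted in EXPECTED.items():
        rows.append((name, wanted, installed_version(name)))
    return rows


if __name__ == "__main__":
    for name, wanted, found in report():
        print("%-12s expected %-8s installed %s" % (name, wanted, found))
```

This only reports versions; it does not check the CUDA or cuDNN installation.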

Converting the ipynb to py and checking the official model:

import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
import time
# Root directory of the project
ROOT_DIR = os.path.abspath(r"E:\Mask_RCNN-master/")

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
import coco


# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to trained weights file
# COCO_MODEL_PATH = os.path.join(MODEL_DIR, "logs/mask_rcnn_coco.h5")
COCO_MODEL_PATH = os.path.join(r"E:\Mask_RCNN-master\logs\mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
    print("cuiwei***********************")

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")

class InferenceConfig(coco.CocoConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()
config.display()


# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']
# Load a random image from the images folder
#file_names = next(os.walk(IMAGE_DIR))[2]
#image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))

# Run detection on a webcam video stream
# cap = cv2.VideoCapture(0)
#
# while(1):
#     # get a frame
#     ret, frame = cap.read()
#     # show a frame
#     start = time.perf_counter()
#     results = model.detect([frame], verbose=1)
#     r = results[0]
#     #cv2.imshow("capture", frame)
#     visualize.display_instances(frame, r['rois'], r['masks'], r['class_ids'],
#                             class_names, r['scores'])
#     end = time.perf_counter()
#     print(end-start)
#     if cv2.waitKey(1) & 0xFF == ord('q'):
#         break
#
# cap.release()
# cv2.destroyAllWindows()

image= cv2.imread(r"E:\Mask_RCNN-master\images691390_f9944f61b5_z.jpg")
# Run detection
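# time.clock() is deprecated and was removed in Python 3.8; perf_counter() replaces it
start = time.perf_counter()
results = model.detect([image], verbose=1)
end = time.perf_counter()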

print(end-start)
# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                           class_names, r['scores'])

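Download link for the mask_rcnn_coco.h5 weights: Releases · matterport/Mask_RCNN · GitHub

Checking the result:

Even with GPU acceleration, processing a single image takes about 4 seconds! Mask-RCNN is a two-stage model, so it is slower than a single-stage detector such as YOLO. The backbone here is ResNet-101; I plan to write a few posts later on how I understand the network structure.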
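A single timed call can be misleading, because the first call to a freshly loaded model may include one-off setup work. The sketch below averages several calls after an untimed warm-up; the helper name average_seconds and its defaults are my own choices, not part of Mask-RCNN.

```python
import time


def average_seconds(fn, repeats=3, warmup=1):
    """Call fn() `warmup` times untimed, then return the mean time of `repeats` calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats
```

With the script above it could be used as average_seconds(lambda: model.detect([image], verbose=0)). Whether the 4 seconds measured here shrinks after a warm-up is something I have not verified.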

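Training parameter settings:

The training parameters can be changed in config.py to suit your needs. The official project also gives tuning advice: official tutorial

The main settings you can change:

backbone: the backbone network, resnet50 or resnet101. ResNet is the pretrained model used for transfer learning. On a weaker machine, choose resnet50: it has fewer parameters and trains faster.

model.train(..., layers="heads", ...)  # Train only the head layers

model.train(..., layers="3+", ...)  # Train resnet stage 3 and up

model.train(..., layers="4+", ...)  # Train resnet stage 4 and up

model.train(..., layers="all", ...)  # Train all layers (most memory)

The layers argument selects which layers are trained; choose according to your needs.

IMAGE_MIN_DIM = 800

IMAGE_MAX_DIM = 1024  # Image size used for training. IMAGE_MAX_DIM is what finally decides the size; lower it on a weaker machine.

GPU_COUNT = 1

IMAGES_PER_GPU = 2  # Images per GPU. If you run out of GPU memory, change 2 to 1 (although a batch size of 1 does not help convergence).

TRAIN_ROIS_PER_IMAGE = 200  # Adjust to match your own dataset.

MAX_GT_INSTANCES = 100  # Maximum number of ground-truth instances used per image during training. (The cap on detections at inference time is the separate DETECTION_MAX_INSTANCES setting.)

Trained models are saved in the logs folder as .h5 files and can be loaded directly once training is done.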
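GPU_COUNT and IMAGES_PER_GPU together fix the batch size (batch size = GPU_COUNT * IMAGES_PER_GPU), which in turn decides how many steps are needed to see every image once. The helper names below are hypothetical and the code is plain arithmetic, independent of the library.

```python
import math


def batch_size(gpu_count, images_per_gpu):
    """Effective batch size: GPU_COUNT * IMAGES_PER_GPU."""
    return gpu_count * images_per_gpu


def steps_for_one_pass(num_images, gpu_count, images_per_gpu):
    """Steps needed for one full pass over num_images at that batch size."""
    return math.ceil(num_images / batch_size(gpu_count, images_per_gpu))
```

For a dataset of 128 images with GPU_COUNT = 1 and IMAGES_PER_GPU = 2, one full pass is 64 steps, which is a reasonable starting point when picking STEPS_PER_EPOCH.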

Training and prediction: train_and_predict.py:

import os
import sys
import random
import math
import re
import time
import numpy as np
import cv2
import matplotlib
import matplotlib.pyplot as plt

import yaml
from PIL import Image

from mrcnn.config import Config
from mrcnn import utils
from mrcnn import model as modellib

MODEL_DIR = r"E:\Mask_RCNN-master\logs"

iter_num = 0

# Local path to trained weights file
COCO_MODEL_PATH = r"E:\Mask_RCNN-master\weight\mask_rcnn_coco.h5"
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)


class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"

    # Train on 1 GPU and 4 images per GPU. We can put multiple images on each
    # GPU because the images are small. Batch size is 4 (GPUs * images/GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 4

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # background + 1 class

    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 256
    IMAGE_MAX_DIM = 256

    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels

    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 32

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 100

    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 5


class DrugDataset(utils.Dataset):
    # Get the number of instances (objects) in the image
    def get_obj_index(self, image):
        n = np.max(image)
        return n

    # Parse the yaml file produced from labelme to get the instance label of each mask layer
    def from_yaml_get_class(self, image_id):
        info = self.image_info[image_id]
        with open(info['yaml_path']) as f:
            # temp = yaml.load(f.read())
            temp = yaml.safe_load(f.read())
            labels = temp['label_names']
            del labels[0]
        return labels

    # Rewritten draw_mask
    def draw_mask(self, num_obj, mask, image, image_id):
        # print("draw_mask-->",image_id)
        # print("self.image_info",self.image_info)
        info = self.image_info[image_id]
        # print("info-->",info)
        # print("info[width]----->",info['width'],"-info[height]--->",info['height'])
        for index in range(num_obj):
            for i in range(info['width']):
                for j in range(info['height']):
                    # print("image_id-->",image_id,"-i--->",i,"-j--->",j)
                    # print("info[width]----->",info['width'],"-info[height]--->",info['height'])
                    at_pixel = image.getpixel((i, j))
                    if at_pixel == index + 1:
                        mask[j, i, index] = 1
        return mask

    # Rewritten load_shapes, which registers your own classes and
    # adds path, mask_path and yaml_path to self.image_info.
    # Example arguments:
    # dataset_root_path = "/tongue_dateset/"
    # img_floder = dataset_root_path + "rgb"
    # mask_floder = dataset_root_path + "mask"
    def load_shapes(self, count, img_floder, mask_floder, imglist, dataset_root_path):
        """Generate the requested number of synthetic images.
        count: number of images to generate.
        height, width: the size of the generated images.
        """
        # Add classes
        self.add_class("shapes", 1, "element")  # class 1: your own label name
        for i in range(count):
            # Read the image width and height

            filestr = imglist[i].split(".")[0]
            # print(imglist[i],"-->",cv_img.shape[1],"--->",cv_img.shape[0])
            # print("id-->", i, " imglist[", i, "]-->", imglist[i],"filestr-->",filestr)
            # filestr = filestr.split("_")[1]
            mask_path = mask_floder + "/" + filestr + ".png"
            yaml_path = dataset_root_path + "/labelme_json/" + filestr + "_json/info.yaml"
            print(dataset_root_path + "/labelme_json/" + filestr + "_json/img.png")
            cv_img = cv2.imread(dataset_root_path + "/labelme_json/" + filestr + "_json/img.png")

            self.add_image("shapes", image_id=i, path=img_floder + "/" + imglist[i],
                           width=cv_img.shape[1], height=cv_img.shape[0], mask_path=mask_path, yaml_path=yaml_path)

    # Rewritten load_mask
    def load_mask(self, image_id):
        """Generate instance masks for shapes of the given image ID.
        """
        global iter_num
        print("image_id", image_id)
        info = self.image_info[image_id]
        count = 1  # number of object
        img = Image.open(info['mask_path'])
        num_obj = self.get_obj_index(img)
        mask = np.zeros([info['height'], info['width'], num_obj], dtype=np.uint8)
        mask = self.draw_mask(num_obj, mask, img, image_id)
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count - 2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion

            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        labels = []
        labels = self.from_yaml_get_class(image_id)
        labels_form = []
        for i in range(len(labels)):
            if labels[i].find("element") != -1:
                # print "box"
                labels_form.append("element")
        class_ids = np.array([self.class_names.index(s) for s in labels_form])
        return mask, class_ids.astype(np.int32)


def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.

    Change the default size attribute to control the size
    of rendered images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size * cols, size * rows))
    return ax


def train_model():
    # Basic settings
    dataset_root_path = r"E:\Mask_RCNN-master\wifi_coco\train"
    img_floder = os.path.join(dataset_root_path, "pic")
    mask_floder = os.path.join(dataset_root_path, "cv2_mask")
    # yaml_floder = dataset_root_path
    imglist = os.listdir(img_floder)
    count = len(imglist)

    # Prepare the train and val datasets
    dataset_train = DrugDataset()
    dataset_train.load_shapes(count, img_floder, mask_floder, imglist, dataset_root_path)
    dataset_train.prepare()

    # print("dataset_train-->",dataset_train._image_ids)

    dataset_val = DrugDataset()
    dataset_val.load_shapes(7, img_floder, mask_floder, imglist, dataset_root_path)
    dataset_val.prepare()

    # Create models in training mode
    config = ShapesConfig()
    config.display()
    model = modellib.MaskRCNN(mode="training",
                              config=config,
                              model_dir=MODEL_DIR)

    # Which weights to start with?
    # For the first run set this to coco; once a trained model exists, change it to last
    init_with = "last"  # imagenet, coco, or last

    if init_with == "imagenet":
        model.load_weights(model.get_imagenet_weights(), by_name=True)
    elif init_with == "coco":
        # Load weights trained on MS COCO, but skip layers that
        # are different due to the different number of classes
        # See README for instructions to download the COCO weights
        model.load_weights(COCO_MODEL_PATH, by_name=True,
                           exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                    "mrcnn_bbox", "mrcnn_mask"])
    elif init_with == "last":
        # Load the last model you trained and continue training.
        # Without this branch, init_with = "last" would load no weights at all.
        checkpoint_file = model.find_last()
        model.load_weights(checkpoint_file, by_name=True)

    # Train the head branches
    # Passing layers="heads" freezes all layers except the head
    # layers. You can also pass a regular expression to select
    # which layers to train by name pattern.
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=10,
                layers='heads')

    # Fine tune all layers
    # Passing layers="all" trains all layers. You can also
    # pass a regular expression to select which layers to
    # train by name pattern.
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE / 10,
                epochs=30,
                layers="all")


class TongueConfig(ShapesConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1


def predict():
    import skimage.io
    from mrcnn import visualize

    # Create the model in inference mode
    config = TongueConfig()
    config.display()
    model = modellib.MaskRCNN(mode="inference", config=config,
                              model_dir=r"E:\Mask_RCNN-master\logs")
    model_path = model.find_last()

    # Load trained weights (fill in path to trained weights here)
    assert model_path != "", "Provide path to trained weights"
    print("Loading weights from ", model_path)
    model.load_weights(model_path, by_name=True)

    class_names = ['BG', 'element']

    # Load a random image from the images folder
    file_names = r'E:\Mask_RCNN-master\aaa\Pic_2022_04_11_100850_1.jpg'
    # next(os.walk(file_names))[2]
    # image = skimage.io.imread(os.path.join(IMAGE_DIR, random.choice(file_names)))
    # image = skimage.io.imread(file_names)
    image = cv2.imread(file_names)

    # Run detection
    results = model.detect([image], verbose=1)

    # Visualize results
    r = results[0]
    visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])


if __name__ == "__main__":
    train_model()
    # predict()

Both train and predict are kept in the same py file here so that later changes can be made in one place.
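The draw_mask method above visits every pixel once per object with getpixel, which gets slow on larger images. A NumPy comparison can build the same mask stack in one step. This is a sketch, assuming the mask image holds 0 for background and 1..N for the N instances, as get_obj_index implies; the function name is my own.

```python
import numpy as np


def label_to_instance_masks(label):
    """Turn an (H, W) label image with values 0..N into an (H, W, N) uint8 mask stack."""
    label = np.asarray(label)
    num_obj = int(label.max()) if label.size else 0
    ids = np.arange(1, num_obj + 1)
    # Broadcasting compares every pixel with every instance id at once:
    # layer k is 1 exactly where the label image equals k + 1.
    return (label[:, :, None] == ids).astype(np.uint8)
```

Inside load_mask this could stand in for the np.zeros and draw_mask pair, for example mask = label_to_instance_masks(Image.open(info['mask_path'])).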

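Changes to the train function:

NUM_CLASSES: 1 + (number of classes)

In label_names.txt we can see two entries: one is the background (_background_) and the other is your own label, so the 1 stands for the background.

IMAGE_MIN_DIM and IMAGE_MAX_DIM: as noted above, IMAGE_MAX_DIM is what decides the final size. It must be a multiple of 64 (the model requires a size that can be halved six times), for example 256, 320, 384, 448, 512.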
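As far as I can tell from the matterport code, the model checks that the final image height and width can each be halved six times without a remainder, which is the same as being a multiple of 64. The helpers below are hypothetical and only reproduce that arithmetic so a value can be checked before training starts.

```python
def is_valid_image_dim(dim, times=6):
    """True if `dim` can be halved `times` times without a remainder."""
    return dim > 0 and dim % (2 ** times) == 0


def nearest_valid_dims(dim, times=6):
    """Return the closest valid sizes at or below and at or above `dim`."""
    step = 2 ** times
    lower = (dim // step) * step
    upper = lower if lower == dim else lower + step
    return lower, upper
```

256 and 1024 pass this check, while 800 does not and sits between the valid sizes 768 and 832.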

load_shapes function: self.add_class("shapes", 1, "class name"), self.add_class("shapes", 2, "class name 2"), and so on.

The second for loop in load_mask:

# if labels[i].find("Blazer") != -1:
#     # print "box"
#     labels_form.append("Blazer")
# elif labels[i].find("Blouse") != -1:
#     # print "column"
#     labels_form.append("Blouse")
# elif labels[i].find("Coat") != -1:
#     # print "package"
#     labels_form.append("Coat")

Keep this consistent with load_shapes: write one branch per class.
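With more than a handful of classes the if/elif chain gets long. A loop over a list of class names does the same job; this is a sketch, with the three names taken from the commented example above and the function name chosen by me.

```python
CLASS_NAMES = ["Blazer", "Blouse", "Coat"]  # example names from the snippet above


def labels_to_classes(labels, class_names=CLASS_NAMES):
    """Map each yaml label to the first class name it contains, skipping unmatched labels."""
    matched = []
    for label in labels:
        for name in class_names:
            if label.find(name) != -1:
                matched.append(name)
                break  # first match wins, like the if/elif chain
    return matched
```

In load_mask, labels_form = labels_to_classes(labels) would replace the second for loop. As in the original chain, a label that matches no class is silently dropped, so the class list needs to cover every label used in the dataset.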

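Starting training:

Note: model.train is called with two values for layers here, heads and all. Many bloggers train with only heads or only all, and the results may not be very good; I suggest using both.

I trained on a discrete 2080 GPU and planned 20 epochs, but a single epoch took over 40 minutes (damn! With a dataset of just over a hundred images, one epoch takes around 50 minutes!!!! Enough to make a dog cry).

So, after holding out for 4 epochs, I gave up. To keep the walkthrough going, let's finish the process with the half-baked checkpoint 0004.h5.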
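One detail of the heads-then-all schedule is easy to miss: to my understanding, the epochs argument of model.train in the matterport implementation is a running total, not a per-call count. The sketch below, with a hypothetical helper name, works out how many epochs each stage actually runs.

```python
def epochs_per_stage(cumulative_targets, start_epoch=0):
    """Convert cumulative epoch targets into the number of epochs each stage runs."""
    runs = []
    done = start_epoch
    for target in cumulative_targets:
        # A stage only runs the epochs between what is done and its target.
        runs.append(max(0, target - done))
        done = max(done, target)
    return runs
```

For the script above (epochs=10 for heads, then epochs=30 for all) this gives 10 head epochs followed by 20 epochs on all layers. Resuming from the 0004 checkpoint would leave 6 head epochs to go.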

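Changes to the predict function:

model = modellib.MaskRCNN(mode="inference", config=config, model_dir=r"E:\Mask_RCNN-master\logs")

model_dir: set this to the directory that holds the trained weights, because the code automatically picks the most recent checkpoint under that directory.

Otherwise it raises a StopIteration exception.

There is another way around this: catch the exception so that the rest of the code can keep running.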
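Another option is to look for the checkpoint yourself and fail with a clear message when nothing is there. The helper below is hypothetical and is not the library's find_last: it simply returns the most recently modified .h5 file under a directory, or None.

```python
import os


def find_latest_weights(logs_dir):
    """Return the path of the most recently modified .h5 file under logs_dir, or None."""
    if not os.path.isdir(logs_dir):
        return None
    candidates = []
    for folder, _subdirs, files in os.walk(logs_dir):
        for name in files:
            if name.endswith(".h5"):
                candidates.append(os.path.join(folder, name))
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)
```

The returned path can be passed to model.load_weights(path, by_name=True). Note that it picks up any .h5 file, so a copy of mask_rcnn_coco.h5 stored in the same logs folder could be selected if it is the newest file.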

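class_names = ['BG', 'class name']: as before, BG is the default background, and the names after it must be changed to your own.

Pay attention to this line:

image = skimage.io.imread(file_name) should be changed to image = cv2.imread(file_name)

Otherwise you may get this error: operands could not be broadcast together with remapped shapes [original->remapped]: (3,2) and requested shape (2,2)

Oddly, if I switch the image path to one of the official images, the error goes away. Later I found a difference: skimage.io.imread and cv2.imread both return NumPy arrays, but cv2 stores the channels as BGR while skimage stores them as RGB.

The usual explanation: this is a historical leftover. The early developers on the hardware platforms of the day used BGR, and OpenCV kept the habit.

That still does not explain the error, though. BGR and RGB differ only in channel order; the array shapes are the same. Does anyone know what is going on?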
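One possible explanation, which the article does not confirm: skimage.io.imread returns a 2-D array for a grayscale file (and four channels for an image with alpha), while cv2.imread by default always returns three channels. A 2-D input would be consistent with a shape error raised while the image is padded. The sketch below, with a hypothetical helper name, normalizes any of these cases to a 3-channel RGB array.

```python
import numpy as np


def to_rgb3(image, channel_order="rgb"):
    """Return an (H, W, 3) RGB array from a grayscale, RGBA, RGB or BGR input."""
    image = np.asarray(image)
    if image.ndim == 2:
        # Grayscale: repeat the single channel three times.
        image = np.stack([image] * 3, axis=-1)
    elif image.ndim == 3 and image.shape[2] == 1:
        image = np.repeat(image, 3, axis=2)
    elif image.ndim == 3 and image.shape[2] == 4:
        # Drop the alpha channel.
        image = image[:, :, :3]
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("unsupported image shape: %r" % (image.shape,))
    if channel_order == "bgr":
        # Reverse the channel axis to go from BGR (OpenCV) to RGB.
        image = image[:, :, ::-1]
    return np.ascontiguousarray(image)
```

Printing image.shape right after skimage.io.imread would show whether this guess applies to your file. For an array from cv2.imread, to_rgb3(image, channel_order="bgr") also gives the RGB order that skimage would have produced.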

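There is also a vivid analogy about BGR versus RGB in: Why does OpenCV use BGR color format ? | LearnOpenCV

Results:

Hmm... it produces a usable result after only 4 epochs, better than I expected. For lack of time I will not train it any further.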

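Reference post: blogger Tom Hardy

A fun little story:

Once upon a time there was a rabbit named Freeloader. It loved reading articles every day but never once hit like, favorite, or share. Then one day it ended up as a dish found only in Sichuan.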

Sharing is welcome; when reposting, please credit the source: 内存溢出 (outofmemory.cn)

Original URL: http://outofmemory.cn/langs/727522.html
