In this post, we will show how to use a convolutional neural network model called Mask R-CNN (Region-based Convolutional Neural Network) for object detection and segmentation. With Mask R-CNN, we not only detect objects, we also obtain a grayscale or binary mask for each detected object.
Mask R-CNN was first introduced in November 2017 by Facebook's AI research team, using Python and Caffe2.
We will share OpenCV code for loading and using the model in both C++ and Python.
The minimum required version of OpenCV is 3.4.3.
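Before going further, it is worth confirming that your installation meets this requirement. Below is a quick sanity check, not part of the original post; it parses only the numeric part of the version string, since builds may carry suffixes like "-dev":

Python
import re
import cv2 as cv

# Parse the numeric part of the version string (e.g. "4.5.1" or "3.4.3-dev")
major, minor, patch = (int(x) for x in re.findall(r"\d+", cv.__version__)[:3])
assert (major, minor, patch) >= (3, 4, 3), "OpenCV 3.4.3 or newer is required"
print("OpenCV", cv.__version__, "OK")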
What is image segmentation?
In computer vision, the term "image segmentation", or simply "segmentation", refers to dividing an image into groups of pixels according to some criteria. You can group pixels by color, by texture, or by any other criterion you decide on. These groups are sometimes also called superpixels.
What is instance segmentation?
In instance segmentation, the goal is to detect specific objects in an image and create a mask around each object of interest. Instance segmentation can also be thought of as object detection where the output is a mask instead of just a bounding box. Unlike semantic segmentation, which tries to classify every pixel in the image, instance segmentation does not aim to label every pixel.
Below we see an example of instance segmentation of two sheep against a background of very similar color.
How does Mask R-CNN work?
Mask R-CNN is the result of a series of improvements over the original R-CNN paper (R. Girshick et al., CVPR 2014) for object detection. R-CNN generated region proposals based on selective search and then processed each proposed region, one at a time, using a convolutional network that output an object label and its bounding box.
Fast R-CNN (R. Girshick, ICCV 2015) made the R-CNN algorithm faster by processing all the proposed regions through the CNN at once, using an ROIPool layer.
Faster R-CNN (S. Ren et al., PAMI 2017) took this further by performing the region proposal step itself with a ConvNet called the Region Proposal Network (RPN). The RPN and the classification and bounding-box prediction networks both work on a common feature map, which makes inference faster. On a GPU, Faster R-CNN can run at about 5 fps.
Mask R-CNN (He et al., ICCV 2017) improves on Faster R-CNN by adding a mask prediction branch in parallel with the class label and bounding-box prediction branches, as shown in the figure below. It adds only a small overhead to the Faster R-CNN network, so it can still run at about 5 fps on a GPU.
The Mask R-CNN network has two major parts.
The first is the Region Proposal Network, which generates roughly 300 region proposals per image. During training, each of these proposals (ROIs) goes through the second part, the object detection and mask prediction network, as shown above. Note that because the mask prediction branch runs in parallel with the label and box prediction branches, the network predicts masks for all classes for each given ROI.
At inference time, the region proposals go through non-maximum suppression, and the mask prediction branch only processes the top 100 scoring detection boxes. So with 100 ROIs and 90 object classes, the mask prediction part of the network outputs a 4D tensor of size 100x90x15x15, where each mask is of size 15×15.
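To make that tensor layout concrete, here is a minimal sketch, not from the original post, of how a single mask would be pulled out of such an output and upscaled to its detection box; the detection index, class id, and box size are all hypothetical placeholders:

Python
import cv2 as cv
import numpy as np

masks = np.random.rand(100, 90, 15, 15).astype(np.float32)  # stand-in for the network output
i, classId = 0, 17                 # hypothetical detection index and predicted class
classMask = masks[i, classId]      # the 15x15 mask for that detection and class
boxW, boxH = 120, 80               # hypothetical bounding-box size in pixels
# Upscale the low-resolution mask to the box size, then threshold it to binary
objectMask = cv.resize(classMask, (boxW, boxH)) > 0.3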
Object detection and instance segmentation with Mask R-CNN (C++/Python)
Now let's see how to run Mask R-CNN using OpenCV.
Step 1 : Download the model
We download the TensorFlow model to the current working directory. Once the download completes, we extract the archive to obtain the model weights file frozen_inference_graph.pb.
wget http://download.tensorflow.org/models/object_detection/mask_rcnn_inception_v2_coco_2018_01_28.tar.gz
tar zxvf mask_rcnn_inception_v2_coco_2018_01_28.tar.gz
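If wget and tar are not available (for example on Windows), the same download and extraction can be done from Python. A minimal sketch, assuming the URL above is still live:

Python
import tarfile
import urllib.request

url = ("http://download.tensorflow.org/models/object_detection/"
       "mask_rcnn_inception_v2_coco_2018_01_28.tar.gz")
archive = "mask_rcnn_inception_v2_coco_2018_01_28.tar.gz"
urllib.request.urlretrieve(url, archive)  # download the model archive
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(".")  # extracts frozen_inference_graph.pb among other files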
Step 2 : Initialize the parameters
The Mask R-CNN algorithm outputs its predicted detections as bounding boxes. Each bounding box is associated with a confidence score. All boxes below the confidence threshold parameter are ignored for further processing.
Python
# Initialize the parameters
confThreshold = 0.5  # Confidence threshold
maskThreshold = 0.3  # Mask threshold
C++
// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float maskThreshold = 0.3; // Mask threshold
Step 3 : Load the model and classes
The file mscoco_labels.names contains all the object classes the model was trained on, and we read the class names from it. Then we read and load the colors.txt file, which contains the colors used to mask the objects.
Next, we load the network using these two files:
frozen_inference_graph.pb : the pre-trained weights.
mask_rcnn_inception_v2_coco_2018_01_28.pbtxt : a text graph file, tuned by the OpenCV DNN support group, so that the network can be loaded with OpenCV.
Here we set the DNN backend to OpenCV and the target to CPU. You can try setting the preferable target to cv.dnn.DNN_TARGET_OPENCL to run on a GPU; a minimal sketch of this switch follows the code blocks below. But keep in mind that the DNN module in the current OpenCV version has only been tested on Intel GPUs.
Python
# Load names of classes
classesFile = "mscoco_labels.names"
classes = None
with open(classesFile, 'rt') as f:
    classes = f.read().rstrip('\n').split('\n')

# Load the colors
colorsFile = "colors.txt"
with open(colorsFile, 'rt') as f:
    colorsStr = f.read().rstrip('\n').split('\n')
colors = []
for i in range(len(colorsStr)):
    rgb = colorsStr[i].split(' ')
    color = np.array([float(rgb[0]), float(rgb[1]), float(rgb[2])])
    colors.append(color)

# Give the textGraph and weight files for the model
textGraph = "./mask_rcnn_inception_v2_coco_2018_01_28.pbtxt"
modelWeights = "./mask_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb"

# Load the network
net = cv.dnn.readNetFromTensorflow(modelWeights, textGraph)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)
C++
// Load names of classes
string classesFile = "mscoco_labels.names";
ifstream ifs(classesFile.c_str());
string line;
while (getline(ifs, line)) classes.push_back(line);

// Load the colors
vector<Scalar> colors;
string colorsFile = "colors.txt";
ifstream colorFptr(colorsFile.c_str());
while (getline(colorFptr, line))
{
    char* pEnd;
    double r, g, b;
    r = strtod(line.c_str(), &pEnd);
    g = strtod(pEnd, &pEnd);  // advance past the green value before parsing blue
    b = strtod(pEnd, NULL);
    colors.push_back(Scalar(r, g, b, 255.0));
}

// Give the configuration and weight files for the model
String textGraph = "./mask_rcnn_inception_v2_coco_2018_01_28.pbtxt";
String modelWeights = "./mask_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb";

// Load the network
Net net = readNetFromTensorflow(modelWeights, textGraph);
net.setPreferableBackend(DNN_BACKEND_OPENCV);
net.setPreferableTarget(DNN_TARGET_CPU);
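As mentioned above, the target can be switched to OpenCL to try GPU execution. A minimal sketch, reusing the net object loaded above; whether it actually speeds things up depends on your hardware:

Python
# Try OpenCL execution instead of the CPU target set above
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv.dnn.DNN_TARGET_OPENCL)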
Step 4 : Read the input
In this step we open an image file, a video file, or the webcam as input. We also initialize the video writer used to save the output frames with the detected bounding boxes.
Python
outputFile = "mask_rcnn_out_py.avi" if (args.image): # Open the image file if not os.path.isfile(args.image): print("Input image file ", args.image, " doesn't exist") sys.exit(1) cap = cv.VideoCapture(args.image) outputFile = args.image[:-4]+'_mask_rcnn_out_py.jpg' elif (args.video): # Open the video file if not os.path.isfile(args.video): print("Input video file ", args.video, " doesn't exist") sys.exit(1) cap = cv.VideoCapture(args.video) outputFile = args.video[:-4]+'_mask_rcnn_out_py.avi' else: # Webcam input cap = cv.VideoCapture(0) # Get the video writer initialized to save the output video if (not args.image): vid_writer = cv.VideoWriter(outputFile, cv.VideoWriter_fourcc('M','J','P','G'), 28, (round(cap.get(cv.CAP_PROP_frame_WIDTH)),round(cap.get(cv.CAP_PROP_frame_HEIGHT))))
C++
outputFile = "mask_rcnn_out_cpp.avi"; if (parser.has("image")) { // Open the image file str = parser.get("image"); ifstream ifile(str); if (!ifile) throw("error"); cap.open(str); str.replace(str.end()-4, str.end(), "_mask_rcnn_out.jpg"); outputFile = str; } else if (parser.has("video")) { // Open the video file str = parser.get ("video"); ifstream ifile(str); if (!ifile) throw("error"); cap.open(str); str.replace(str.end()-4, str.end(), "_mask_rcnn_out.avi"); outputFile = str; } // Open the webcam else cap.open(parser.get ("device")); // Get the video writer initialized to save the output video if (!parser.has("image")) { video.open(outputFile, VideoWriter::fourcc('M','J','P','G'), 28, Size(cap.get(CAP_PROP_frame_WIDTH), cap.get(CAP_PROP_frame_HEIGHT))); }
Step 5 : Process each frame
The image input to a neural network needs to be in a specific format called a blob.
After a frame is read from the input image or video stream, it is passed through the blobFromImage function to convert it to the input blob format for the neural network. In this process the frame is kept at its original size, and the swapRB parameter is set to true, as sketched below.
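For reference, here is a minimal standalone sketch of that conversion, not from the original post; the image file name is a placeholder:

Python
import cv2 as cv

frame = cv.imread("bird.jpg")  # hypothetical input frame
# Keep the original size, no scaling or mean subtraction; swap BGR -> RGB
blob = cv.dnn.blobFromImage(frame, swapRB=True, crop=False)
print(blob.shape)  # a 4D NCHW blob: (1, 3, height, width)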
The blob is then passed into the network as its input, and a forward pass is run to obtain a list of predicted bounding boxes and object masks from the output layers named "detection_out_final" and "detection_masks". These boxes go through a post-processing step that filters out the ones with low confidence scores. We will go through the post-processing step in more detail in the next section. The inference time for each frame is printed at the top left, and the image with the final bounding boxes and corresponding masks is then saved to disk.
Python
while cv.waitKey(1) < 0:
    # Get frame from the video
    hasframe, frame = cap.read()

    # Stop the program if reached end of video
    if not hasframe:
        print("Done processing !!!")
        print("Output file is stored as ", outputFile)
        cv.waitKey(3000)
        break

    # Create a 4D blob from a frame.
    blob = cv.dnn.blobFromImage(frame, swapRB=True, crop=False)

    # Set the input to the network
    net.setInput(blob)

    # Run the forward pass to get output from the output layers
    boxes, masks = net.forward(['detection_out_final', 'detection_masks'])

    # Extract the bounding box and mask for each of the detected objects
    postprocess(boxes, masks)

    # Put efficiency information.
    t, _ = net.getPerfProfile()
    label = 'Mask-RCNN : Inference time: %.2f ms' % (t * 1000.0 / cv.getTickFrequency())
    cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0))

    # Write the frame with the detection boxes
    if (args.image):
        cv.imwrite(outputFile, frame.astype(np.uint8))
    else:
        vid_writer.write(frame.astype(np.uint8))

    cv.imshow(winName, frame)
C++
// Process frames.
while (waitKey(1) < 0)
{
    // Get frame from the video
    cap >> frame;

    // Stop the program if reached end of video
    if (frame.empty())
    {
        cout << "Done processing !!!" << endl;
        cout << "Output file is stored as " << outputFile << endl;
        waitKey(3000);
        break;
    }

    // Create a 4D blob from a frame.
    blobFromImage(frame, blob, 1.0, Size(frame.cols, frame.rows), Scalar(), true, false);

    // Sets the input to the network
    net.setInput(blob);

    // Runs the forward pass to get output from the output layers
    std::vector<String> outNames(2);
    outNames[0] = "detection_out_final";
    outNames[1] = "detection_masks";
    vector<Mat> outs;
    net.forward(outs, outNames);

    // Extract the bounding box and mask for each of the detected objects
    postprocess(frame, outs);

    // Put efficiency information. The function getPerfProfile returns the overall
    // time for inference (t) and the timings for each of the layers (in layersTimes)
    vector<double> layersTimes;
    double freq = getTickFrequency() / 1000;
    double t = net.getPerfProfile(layersTimes) / freq;
    string label = format("Mask-RCNN : Inference time for a frame : %.2f ms", t);
    putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0));

    // Write the frame with the detection boxes
    Mat detectedframe;
    frame.convertTo(detectedframe, CV_8U);
    if (parser.has("image")) imwrite(outputFile, detectedframe);
    else video.write(detectedframe);

    imshow(kWinName, frame);
}
Now let's look at a few of the post-processing functions used above in more detail.
Step 5a : Post-process the network's output
The masks object output by the network is four-dimensional: the first dimension is the number of detected bounding boxes in the frame, the second is the number of classes in the model, and the third and fourth give the mask shape (15×15 in our example).
If a box's confidence score is below the given threshold, the bounding box is dropped and not considered for further processing.
Python
# For each frame, extract the bounding box and mask for each detected object
def postprocess(boxes, masks):
    # Output size of masks is NxCxHxW where
    # N - number of detected boxes
    # C - number of classes (excluding background)
    # HxW - segmentation shape
    numClasses = masks.shape[1]
    numDetections = boxes.shape[2]

    frameH = frame.shape[0]
    frameW = frame.shape[1]

    for i in range(numDetections):
        box = boxes[0, 0, i]
        mask = masks[i]
        score = box[2]
        if score > confThreshold:
            classId = int(box[1])

            # Extract the bounding box
            left = int(frameW * box[3])
            top = int(frameH * box[4])
            right = int(frameW * box[5])
            bottom = int(frameH * box[6])

            left = max(0, min(left, frameW - 1))
            top = max(0, min(top, frameH - 1))
            right = max(0, min(right, frameW - 1))
            bottom = max(0, min(bottom, frameH - 1))

            # Extract the mask for the object
            classMask = mask[classId]

            # Draw bounding box, colorize and show the mask on the image
            drawBox(frame, classId, score, left, top, right, bottom, classMask)
C++
// For each frame, extract the bounding box and mask for each detected object
void postprocess(Mat& frame, const vector<Mat>& outs)
{
    Mat outDetections = outs[0];
    Mat outMasks = outs[1];

    // Output size of masks is NxCxHxW where
    // N - number of detected boxes
    // C - number of classes (excluding background)
    // HxW - segmentation shape
    const int numDetections = outDetections.size[2];
    const int numClasses = outMasks.size[1];

    outDetections = outDetections.reshape(1, outDetections.total() / 7);
    for (int i = 0; i < numDetections; ++i)
    {
        float score = outDetections.at<float>(i, 2);
        if (score > confThreshold)
        {
            // Extract the bounding box
            int classId = static_cast<int>(outDetections.at<float>(i, 1));
            int left = static_cast<int>(frame.cols * outDetections.at<float>(i, 3));
            int top = static_cast<int>(frame.rows * outDetections.at<float>(i, 4));
            int right = static_cast<int>(frame.cols * outDetections.at<float>(i, 5));
            int bottom = static_cast<int>(frame.rows * outDetections.at<float>(i, 6));

            left = max(0, min(left, frame.cols - 1));
            top = max(0, min(top, frame.rows - 1));
            right = max(0, min(right, frame.cols - 1));
            bottom = max(0, min(bottom, frame.rows - 1));
            Rect box = Rect(left, top, right - left + 1, bottom - top + 1);

            // Extract the mask for the object
            Mat objectMask(outMasks.size[2], outMasks.size[3], CV_32F, outMasks.ptr<float>(i, classId));

            // Draw bounding box, colorize and show the mask on the image
            drawBox(frame, classId, score, box, objectMask);
        }
    }
}
Step 5b : Draw the predicted boxes
Finally, we draw the boxes that survived the post-processing step on the input frame, along with their class labels and confidence scores. We also overlay the colored mask and its contour inside each bounding box. In this code we use the same color for all objects of the same class, but you could just as well color each instance differently.
Python
# Draw the predicted bounding box, colorize and show the mask on the image
def drawBox(frame, classId, conf, left, top, right, bottom, classMask):
    # Draw a bounding box.
    cv.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)

    # Print a label of class.
    label = '%.2f' % conf
    if classes:
        assert(classId < len(classes))
        label = '%s:%s' % (classes[classId], label)

    # Display the label at the top of the bounding box
    labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)
    top = max(top, labelSize[1])
    cv.rectangle(frame, (left, top - round(1.5*labelSize[1])), (left + round(1.5*labelSize[0]), top + baseLine), (255, 255, 255), cv.FILLED)
    cv.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.75, (0,0,0), 1)

    # Resize the mask, threshold, color and apply it on the image
    classMask = cv.resize(classMask, (right - left + 1, bottom - top + 1))
    mask = (classMask > maskThreshold)
    roi = frame[top:bottom+1, left:right+1][mask]

    color = colors[classId%len(colors)]
    # Comment the above line and uncomment the two lines below to generate different instance colors
    #colorIndex = random.randint(0, len(colors)-1)
    #color = colors[colorIndex]

    frame[top:bottom+1, left:right+1][mask] = ([0.3*color[0], 0.3*color[1], 0.3*color[2]] + 0.7 * roi).astype(np.uint8)

    # Draw the contours on the image (note: OpenCV 3.x returns three values here;
    # in OpenCV 4.x, findContours returns only contours and hierarchy)
    mask = mask.astype(np.uint8)
    im2, contours, hierarchy = cv.findContours(mask, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
    cv.drawContours(frame[top:bottom+1, left:right+1], contours, -1, color, 3, cv.LINE_8, hierarchy, 100)
C++
// Draw the predicted bounding box, colorize and show the mask on the image
void drawBox(Mat& frame, int classId, float conf, Rect box, Mat& objectMask)
{
    // Draw a rectangle displaying the bounding box
    rectangle(frame, Point(box.x, box.y), Point(box.x+box.width, box.y+box.height), Scalar(255, 178, 50), 3);

    // Get the label for the class name and its confidence
    string label = format("%.2f", conf);
    if (!classes.empty())
    {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ":" + label;
    }

    // Display the label at the top of the bounding box
    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    box.y = max(box.y, labelSize.height);
    rectangle(frame, Point(box.x, box.y - round(1.5*labelSize.height)), Point(box.x + round(1.5*labelSize.width), box.y + baseLine), Scalar(255, 255, 255), FILLED);
    putText(frame, label, Point(box.x, box.y), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0,0,0), 1);

    Scalar color = colors[classId%colors.size()];
    // Comment the above line and uncomment the two lines below to generate different instance colors
    //int colorInd = rand() % colors.size();
    //Scalar color = colors[colorInd];

    // Resize the mask, threshold, color and apply it on the image
    resize(objectMask, objectMask, Size(box.width, box.height));
    Mat mask = (objectMask > maskThreshold);
    Mat coloredRoi = (0.3 * color + 0.7 * frame(box));
    coloredRoi.convertTo(coloredRoi, CV_8UC3);

    // Draw the contours on the image
    vector<vector<Point>> contours;
    Mat hierarchy;
    mask.convertTo(mask, CV_8U);
    findContours(mask, contours, hierarchy, RETR_CCOMP, CHAIN_APPROX_SIMPLE);
    drawContours(coloredRoi, contours, -1, color, 5, LINE_8, hierarchy, 100);
    coloredRoi.copyTo(frame(box), mask);
}
Complete C++ code
// Copyright (C) 2018-2019, BigVision LLC (LearnOpenCV.com), All Rights Reserved.
// Author : Sunita Nayak
// Article : https://www.learnopencv.com/deep-learning-based-object-detection-and-instance-segmentation-using-mask-r-cnn-in-opencv-python-c/
// License: BSD-3-Clause-Attribution (Please read the license file.)

// Usage example: ./mask_rcnn.out --video=run.mp4
//                ./mask_rcnn.out --image=bird.jpg

#include <fstream>
#include <sstream>
#include <iostream>
#include <string.h>

#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

const char* keys =
"{help h usage ? | | Usage examples: \n\t\t./mask-rcnn.out --image=traffic.jpg \n\t\t./mask-rcnn.out --video=sample.mp4}"
"{image i        | | input image }"
"{video v        | | input video }"
"{device d       | cpu | device (cpu or gpu) }"
;

using namespace cv;
using namespace dnn;
using namespace std;

// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float maskThreshold = 0.3; // Mask threshold

vector<string> classes;
vector<Scalar> colors;

// Draw the predicted bounding box
void drawBox(Mat& frame, int classId, float conf, Rect box, Mat& objectMask);

// Postprocess the neural network's output for each frame
void postprocess(Mat& frame, const vector<Mat>& outs);

int main(int argc, char** argv)
{
    CommandLineParser parser(argc, argv, keys);
    parser.about("Use this script to run object detection and instance segmentation using Mask R-CNN in OpenCV.");
    if (parser.has("help"))
    {
        parser.printMessage();
        return 0;
    }

    // Load names of classes
    string classesFile = "mscoco_labels.names";
    ifstream ifs(classesFile.c_str());
    string line;
    while (getline(ifs, line)) classes.push_back(line);

    string device = parser.get<String>("device");

    // Load the colors
    string colorsFile = "colors.txt";
    ifstream colorFptr(colorsFile.c_str());
    while (getline(colorFptr, line))
    {
        char* pEnd;
        double r, g, b;
        r = strtod(line.c_str(), &pEnd);
        g = strtod(pEnd, &pEnd);  // advance past the green value before parsing blue
        b = strtod(pEnd, NULL);
        colors.push_back(Scalar(r, g, b, 255.0));
    }

    // Give the configuration and weight files for the model
    String textGraph = "/home/SMCV/einrj/my_projects/Cxx/DeepLearnCV/MaskRCNN/mask_rcnn_inception_v2_coco_2018_01_28.pbtxt";
    String modelWeights = "/home/SMCV/einrj/my_projects/Cxx/DeepLearnCV/MaskRCNN/mask_rcnn_inception/frozen_inference_graph.pb";

    // Load the network
    Net net = readNetFromTensorflow(modelWeights, textGraph);

    if (device == "cpu")
    {
        cout << "Using CPU device" << endl;
        net.setPreferableBackend(DNN_BACKEND_OPENCV);
        net.setPreferableTarget(DNN_TARGET_CPU);
    }
    else if (device == "gpu")
    {
        cout << "Using GPU device" << endl;
        net.setPreferableBackend(DNN_BACKEND_CUDA);
        net.setPreferableTarget(DNN_TARGET_CUDA);
    }

    // Open a video file or an image file or a camera stream.
    string str, outputFile;
    VideoCapture cap;
    VideoWriter video;
    Mat frame, blob;

    try
    {
        outputFile = "mask_rcnn_out_cpp.avi";
        if (parser.has("image"))
        {
            // Open the image file
            str = parser.get<String>("image");
            cout << "Image file input : " << str << endl;
            ifstream ifile(str);
            if (!ifile) throw("error");
            cap.open(str);
            str.replace(str.end()-4, str.end(), "_mask_rcnn_out.jpg");
            outputFile = str;
        }
        else if (parser.has("video"))
        {
            // Open the video file
            str = parser.get<String>("video");
            ifstream ifile(str);
            if (!ifile) throw("error");
            cap.open(str);
            str.replace(str.end()-4, str.end(), "_mask_rcnn_out.avi");
            outputFile = str;
        }
        // Open the default webcam
        else cap.open(0);
    }
    catch(...)
    {
        cout << "Could not open the input image/video stream" << endl;
        return 0;
    }

    // Get the video writer initialized to save the output video
    if (!parser.has("image"))
    {
        video.open(outputFile, VideoWriter::fourcc('M','J','P','G'), 28,
                   Size(cap.get(CAP_PROP_FRAME_WIDTH), cap.get(CAP_PROP_FRAME_HEIGHT)));
    }

    // Create a window
    static const string kWinName = "Deep learning object detection in OpenCV";
    namedWindow(kWinName, WINDOW_NORMAL);

    // Process frames.
    while (waitKey(1) < 0)
    {
        // Get frame from the video
        cap >> frame;

        // Stop the program if reached end of video
        if (frame.empty())
        {
            cout << "Done processing !!!" << endl;
            cout << "Output file is stored as " << outputFile << endl;
            waitKey(3000);
            break;
        }

        // Create a 4D blob from a frame.
        blobFromImage(frame, blob, 1.0, Size(frame.cols, frame.rows), Scalar(), true, false);
        //blobFromImage(frame, blob);

        // Sets the input to the network
        net.setInput(blob);

        // Runs the forward pass to get output from the output layers
        std::vector<String> outNames(2);
        outNames[0] = "detection_out_final";
        outNames[1] = "detection_masks";
        vector<Mat> outs;
        net.forward(outs, outNames);

        // Extract the bounding box and mask for each of the detected objects
        postprocess(frame, outs);

        // Put efficiency information. The function getPerfProfile returns the overall
        // time for inference (t) and the timings for each of the layers (in layersTimes)
        vector<double> layersTimes;
        double freq = getTickFrequency() / 1000;
        double t = net.getPerfProfile(layersTimes) / freq;
        string label = format("Mask-RCNN on 2.5 GHz Intel Core i7 CPU, Inference time for a frame : %0.0f ms", t);
        putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0));

        // Write the frame with the detection boxes
        Mat detectedframe;
        frame.convertTo(detectedframe, CV_8U);
        if (parser.has("image")) imwrite(outputFile, detectedframe);
        else video.write(detectedframe);

        imshow(kWinName, frame);
    }

    cap.release();
    if (!parser.has("image")) video.release();

    return 0;
}

// For each frame, extract the bounding box and mask for each detected object
void postprocess(Mat& frame, const vector<Mat>& outs)
{
    Mat outDetections = outs[0];
    Mat outMasks = outs[1];

    // Output size of masks is NxCxHxW where
    // N - number of detected boxes
    // C - number of classes (excluding background)
    // HxW - segmentation shape
    const int numDetections = outDetections.size[2];
    const int numClasses = outMasks.size[1];

    outDetections = outDetections.reshape(1, outDetections.total() / 7);
    for (int i = 0; i < numDetections; ++i)
    {
        float score = outDetections.at<float>(i, 2);
        if (score > confThreshold)
        {
            // Extract the bounding box
            int classId = static_cast<int>(outDetections.at<float>(i, 1));
            int left = static_cast<int>(frame.cols * outDetections.at<float>(i, 3));
            int top = static_cast<int>(frame.rows * outDetections.at<float>(i, 4));
            int right = static_cast<int>(frame.cols * outDetections.at<float>(i, 5));
            int bottom = static_cast<int>(frame.rows * outDetections.at<float>(i, 6));

            left = max(0, min(left, frame.cols - 1));
            top = max(0, min(top, frame.rows - 1));
            right = max(0, min(right, frame.cols - 1));
            bottom = max(0, min(bottom, frame.rows - 1));
            Rect box = Rect(left, top, right - left + 1, bottom - top + 1);

            // Extract the mask for the object
            Mat objectMask(outMasks.size[2], outMasks.size[3], CV_32F, outMasks.ptr<float>(i, classId));

            // Draw bounding box, colorize and show the mask on the image
            drawBox(frame, classId, score, box, objectMask);
        }
    }
}

// Draw the predicted bounding box, colorize and show the mask on the image
void drawBox(Mat& frame, int classId, float conf, Rect box, Mat& objectMask)
{
    // Draw a rectangle displaying the bounding box
    rectangle(frame, Point(box.x, box.y), Point(box.x+box.width, box.y+box.height), Scalar(255, 178, 50), 3);

    // Get the label for the class name and its confidence
    string label = format("%.2f", conf);
    if (!classes.empty())
    {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ":" + label;
    }

    // Display the label at the top of the bounding box
    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    box.y = max(box.y, labelSize.height);
    rectangle(frame, Point(box.x, box.y - round(1.5*labelSize.height)), Point(box.x + round(1.5*labelSize.width), box.y + baseLine), Scalar(255, 255, 255), FILLED);
    putText(frame, label, Point(box.x, box.y), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0,0,0), 1);

    Scalar color = colors[classId%colors.size()];

    // Resize the mask, threshold, color and apply it on the image
    resize(objectMask, objectMask, Size(box.width, box.height));
    Mat mask = (objectMask > maskThreshold);
    Mat coloredRoi = (0.3 * color + 0.7 * frame(box));
    coloredRoi.convertTo(coloredRoi, CV_8UC3);

    // Draw the contours on the image
    vector<vector<Point>> contours;
    Mat hierarchy;
    mask.convertTo(mask, CV_8U);
    findContours(mask, contours, hierarchy, RETR_CCOMP, CHAIN_APPROX_SIMPLE);
    drawContours(coloredRoi, contours, -1, color, 5, LINE_8, hierarchy, 100);
    coloredRoi.copyTo(frame(box), mask);
}
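On Linux, a program like this can typically be compiled with a single command, for example `g++ mask_rcnn.cpp -o mask_rcnn.out $(pkg-config --cflags --libs opencv4)`; this is a hedged example that assumes pkg-config can locate your OpenCV installation (use `opencv` instead of `opencv4` for 3.x builds).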
GitHub source code: https://github.com/yuanxinshui/DeepLearnCV/tree/main/Mask-RCNN
References
Mask R-CNN: K. He, G. Gkioxari, P. Dollár, R. Girshick, ICCV 2017 (arXiv:1703.06870)
Faster R-CNN: S. Ren, K. He, R. Girshick, J. Sun, PAMI 2017 (arXiv:1506.01497)
Fast R-CNN: R. Girshick, ICCV 2015 (arXiv:1504.08083)
R-CNN: R. Girshick, J. Donahue, T. Darrell, J. Malik, CVPR 2014 (arXiv:1311.2524)
https://learnopencv.com/deep-learning-based-object-detection-and-instance-segmentation-using-mask-rcnn-in-opencv-python-c/