Anchor Boxes and Bounding Boxes

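This post mainly follows Mu Li's tutorial. Anchor boxes and bounding boxes are among the most important tools in object detection: both are rectangles used to frame the objects in an image.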

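1. Bounding boxes

A bounding box marks the ground-truth position and extent of an object in an image. It can be written in two ways. The corner representation stores the coordinates of the top-left and bottom-right corners of the rectangle. The center representation stores the center of the rectangle together with its width and height. The code is as follows: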

import torch
from d2l import torch as d2l
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
img = mpimg.imread("./catdog.jpg")
plt.figure(figsize=(5,5))

# Two ways to describe a bounding box:
# 1. corner representation, 2. center representation
def box_corner_to_center(boxes):
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    cx = (x1 + x2) / 2
    cy = (y1 + y2) / 2
    w = x2 - x1
    h = y2 - y1
    boxes = torch.stack((cx, cy, w, h), axis=-1)
    return boxes

def box_center_to_corner(boxes):
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    x1 = cx - w / 2
    y1 = cy - h / 2
    x2 = cx + w / 2
    y2 = cy + h / 2
    boxes = torch.stack((x1, y1, x2, y2), axis=-1)
    return boxes

dog_bbox, cat_bbox = [60.0, 45.0, 378.0, 516.0], [400.0, 112.0, 655.0, 493.0]
boxes = torch.tensor((dog_bbox, cat_bbox))
box_center_to_corner(box_corner_to_center(boxes)) == boxes

The two representations can be converted into each other without loss.
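To make the conversion concrete, here is a minimal pure-Python sketch that applies the same formulas to the single dog box used in this post. The helper names `corner_to_center` and `center_to_corner` are illustrative and are not part of the tutorial code; they work on one tuple instead of a tensor of boxes.

```python
def corner_to_center(box):
    """Convert one (x1, y1, x2, y2) box to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def center_to_corner(box):
    """Convert one (cx, cy, w, h) box back to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


dog_bbox = (60.0, 45.0, 378.0, 516.0)
dog_center = corner_to_center(dog_bbox)
print(dog_center)                    # (219.0, 280.5, 318.0, 471.0)
print(center_to_corner(dog_center))  # (60.0, 45.0, 378.0, 516.0)
```

The round trip returns the original corners exactly, which is what the element-wise comparison in the tensor version checks for both boxes at once.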
The code for drawing the bounding boxes is as follows:

def bbox_to_rect(bbox, color):
    return plt.Rectangle(
        xy=(bbox[0], bbox[1]),
        width=bbox[2] - bbox[0],
        height=bbox[3] - bbox[1],
        fill=False,
        edgecolor=color,
        linewidth=2
    )
fig = plt.imshow(img)
fig.axes.add_patch(bbox_to_rect(dog_bbox, "blue"))
fig.axes.add_patch(bbox_to_rect(cat_bbox, "red"))

The result looks like this:

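2. Anchor boxes

Anchor boxes are a set of candidate boxes that an object detector samples over the image. A deep network then performs two tasks on them. The first is classification: deciding what object an anchor box contains. The second is regression: adjusting the anchor box so that it overlaps the bounding box as closely as possible.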
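This post does not say how the overlap between an anchor box and a bounding box is measured. One common choice is intersection over union (IoU), and the sketch below assumes that measure purely for illustration. The function name `iou` and the example anchor box are hypothetical; the ground-truth box is the dog box from section 1.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped at 0 when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (60.0, 45.0, 378.0, 516.0)   # dog box from section 1
anchor = (100.0, 100.0, 400.0, 500.0)       # hypothetical anchor box
print(round(iou(ground_truth, anchor), 3))
print(iou(ground_truth, ground_truth))      # 1.0 for a perfect match
```

Under this measure, a value near 1 means the anchor box almost coincides with the bounding box, and 0 means the two do not overlap at all.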

Anchor boxes are usually generated with every pixel as a center, so two parameters are needed: $sizes=\{s_1, s_2, s_3, \dots, s_n\}$ and $ratios=\{r_1, r_2, r_3, \dots, r_m\}$. They control the fraction of the image that an anchor box covers and the width-to-height ratio of the anchor box, respectively. To avoid too many combinations, only the following $n+m-1$ combinations are used:
$$\{(s_1, r_1), (s_2, r_1), (s_3, r_1), \dots, (s_n, r_1), (s_1, r_2), (s_1, r_3), \dots, (s_1, r_m)\}$$
Because of the way a convolutional neural network extracts features, the spatial size of the feature map generally changes from layer to layer, so the anchor width and height are computed as fractions of the current map size.
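The $n+m-1$ combinations can be enumerated in a few lines. This is a small sketch with an illustrative helper name, `anchor_shapes`, which is not part of the tutorial code. It lists the first size with every ratio and then the remaining sizes with the first ratio, which is the same set as above and matches the order in which the `multibox_prior` function later in this post builds its widths and heights.

```python
def anchor_shapes(sizes, ratios):
    """Return the n + m - 1 (size, ratio) pairs used per pixel."""
    # sizes[0] paired with every ratio, then the other sizes with ratios[0]
    return [(sizes[0], r) for r in ratios] + [(s, ratios[0]) for s in sizes[1:]]


shapes = anchor_shapes([0.75, 0.5, 0.25], [1, 2, 0.5])
print(shapes)       # [(0.75, 1), (0.75, 2), (0.75, 0.5), (0.5, 1), (0.25, 1)]
print(len(shapes))  # 5, that is 3 + 3 - 1
```

Using all pairs would give $n \times m = 9$ boxes per pixel here, so the reduced set nearly halves the number of anchor boxes.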

The computation is as follows:
Let $w$, $h$ be the actual width and height of the anchor box and $W$, $H$ the width and height of the image, and define $\frac{wh}{WH}=s^2$ and $\frac{w}{h}=r$.

Then $w=s\sqrt{WHr}$ and $h=s\sqrt{\frac{WH}{r}}$. Normalizing by the image size, that is, dividing $w$ by $W$ and $h$ by $H$, gives $w_0=s\sqrt{\frac{Hr}{W}}$ and $h_0=s\sqrt{\frac{W}{Hr}}$.
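As a quick numerical check of these formulas, the sketch below plugs in one size and one ratio and verifies that the results satisfy both defining equations. The image size of 800 by 600 and the function names are assumptions made up for this example only.

```python
import math


def anchor_size(s, r, W, H):
    """Anchor width and height in pixels for size s and ratio r."""
    return s * math.sqrt(W * H * r), s * math.sqrt(W * H / r)


def normalized_anchor_size(s, r, W, H):
    """Anchor width and height as fractions of the image width and height."""
    return s * math.sqrt(H * r / W), s * math.sqrt(W / (H * r))


W, H = 800, 600    # hypothetical image width and height
s, r = 0.5, 2.0    # one size and one ratio
w, h = anchor_size(s, r, W, H)
w0, h0 = normalized_anchor_size(s, r, W, H)

# The anchor covers s^2 of the image area and has aspect ratio r
print(math.isclose(w * h / (W * H), s ** 2), math.isclose(w / h, r))
# The normalized values are the pixel values divided by W and H
print(math.isclose(w0, w / W), math.isclose(h0, h / H))
```

Note that with this definition $s$ is the square root of the area fraction, so $s=0.5$ produces a box covering a quarter of the image.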
The code is as follows:

def multibox_prior(data, sizes, ratios):
    # data: [batch_size, channels, H, W]
    in_height, in_width = data.shape[-2:]
    device, num_sizes, num_ratios = data.device, len(sizes), len(ratios)
    boxes_per_pixel = (num_sizes + num_ratios - 1)
    size_tensor = torch.tensor(sizes, device=device)
    ratio_tensor = torch.tensor(ratios, device=device)

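    # To move each anchor center to the center of its pixel, add an offset.
    # A pixel has height 1 and width 1, so the center is shifted by 0.5.
    # All operations below use normalized coordinates.
    offset_h, offset_w = 0.5, 0.5
    steps_h = 1.0 / in_height  # scaled step along the y axis
    steps_w = 1.0 / in_width  # scaled step along the x axis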

    # Generate the center points of all anchor boxes
    center_h = (torch.arange(in_height, device=device) + offset_h) * steps_h
    center_w = (torch.arange(in_width, device=device) + offset_w) * steps_w
    # indexing='ij' (PyTorch >= 1.10) makes the row-major layout explicit
    shift_y, shift_x = torch.meshgrid(center_h, center_w, indexing='ij')
    shift_y, shift_x = shift_y.reshape(-1), shift_x.reshape(-1)
    # shift_x, shift_y: [in_height * in_width]

    # Generate "boxes_per_pixel" widths and heights, used later to build
    # the anchor corner coordinates (xmin, ymin, xmax, ymax)
    w = torch.cat((size_tensor[0] * torch.sqrt((in_height * ratio_tensor[:]) / in_width),
                   size_tensor[1:] * torch.sqrt((in_height * ratio_tensor[0]) / in_width)))

    h = torch.cat((size_tensor[0] * torch.sqrt(in_width / (in_height * ratio_tensor[:])),
                  size_tensor[1:] * torch.sqrt(in_width / (in_height * ratio_tensor[0]))))
    # w: [n + m - 1]
    # h: [n + m - 1]

    # Divide by 2 to get the half width and half height
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                        in_height * in_width, 1) / 2
    # anchor_manipulations: [(n + m - 1) * in_height * in_width, 4]
    # Every center point has "boxes_per_pixel" anchor boxes, so build a grid
    # of all anchor centers with each center repeated "boxes_per_pixel" times
    out_grid = torch.stack([shift_x, shift_y, shift_x, shift_y],
                dim=1).repeat_interleave(boxes_per_pixel, dim=0)
    # out_grid: [(n + m - 1) * in_height * in_width, 4]
    output = out_grid + anchor_manipulations
    return output.unsqueeze(0)

Next, read the image. In total, $H \times W \times (n + m - 1)$ anchor boxes are generated.

img = mpimg.imread("./catdog.jpg")
h, w = img.shape[:2]

print(h, w)
X = torch.rand(size=(1, 3, h, w))
Y = multibox_prior(X, sizes=[0.75, 0.5, 0.25], ratios=[1, 2, 0.5])
print(Y.shape)
boxes = Y.reshape(h, w, 5, 4)
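The reshape to `(h, w, 5, 4)` works because `multibox_prior` flattens the centers row by row and repeats each center `boxes_per_pixel` times, so all boxes of one pixel sit next to each other in the flat list. The sketch below illustrates that layout with plain integers. The helper `flat_anchor_index` and the tiny 4 by 6 map are assumptions for this example, not part of the tutorial code.

```python
def flat_anchor_index(i, j, k, width, boxes_per_pixel):
    """Flat position of the k-th anchor box centered on pixel (row i, column j)."""
    return (i * width + j) * boxes_per_pixel + k


height, width, boxes_per_pixel = 4, 6, 5   # hypothetical tiny feature map
total = height * width * boxes_per_pixel
print(total)  # 120

# Visiting (row, column, box) in nested order walks the flat list in order
indices = [flat_anchor_index(i, j, k, width, boxes_per_pixel)
           for i in range(height)
           for j in range(width)
           for k in range(boxes_per_pixel)]
print(indices == list(range(total)))  # True
```

This is why `boxes[i, j, k]` after the reshape is the k-th anchor box of the pixel in row `i` and column `j`.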

Finally, display the anchor boxes. Here only the anchor boxes centered on a single pixel are drawn:

def show_bboxes(axes, bboxes, labels=None, colors=None):
    """Show all bounding boxes."""
    def _make_list(obj, default_values=None):
        if obj is None:
            obj = default_values
        elif not isinstance(obj, (list, tuple)):
            obj = [obj]
        return obj

    labels = _make_list(labels)
    colors = _make_list(colors, ['b', 'g', 'r', 'm', 'c'])
    for i, bbox in enumerate(bboxes):
        color = colors[i % len(colors)]
        rect = bbox_to_rect(bbox.detach().numpy(), color)
        axes.add_patch(rect)
        if labels and len(labels) > i:
            text_color = 'k' if color == 'w' else 'w'
            axes.text(rect.xy[0], rect.xy[1], labels[i],
                      va='center', ha='center', fontsize=9, color=text_color,
                      bbox=dict(facecolor=color, lw=0))
plt.figure(figsize=(5, 5))
bbox_scale = torch.tensor((w, h, w, h))
fig = plt.imshow(img)
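# Labels follow the order produced by multibox_prior:
# sizes[0] with every ratio first, then the remaining sizes with ratios[0]
show_bboxes(fig.axes, boxes[250, 250, :, :] * bbox_scale,
            ['s=0.75, r=1', 's=0.75, r=2', 's=0.75, r=0.5', 's=0.5, r=1',
             's=0.25, r=1'])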

The result looks like this:

Original article: http://outofmemory.cn/langs/759790.html
