Introduction to Object Detection

What Is Object Detection

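Object detection is the task of locating and recognizing multiple objects in an image. Beyond classifying each object, a detector must also output a bounding box for every object it finds.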

The YOLO Algorithm

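YOLO (You Only Look Once) treats object detection as a single regression problem: one forward pass of a convolutional network maps the whole image directly to bounding-box coordinates and class probabilities, which makes it extremely fast. The image is divided into an S×S grid, and each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities: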

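import torch
import torch.nn as nn

# Conv -> BatchNorm -> LeakyReLU(0.1): the basic unit of YOLO-style backbones
class YOLOBlock(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # 3x3 convolution with padding=1 keeps the spatial size unchanged;
        # the conv bias is redundant because BatchNorm adds its own shift
        self.conv = nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.leaky_relu = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.leaky_relu(self.bn(self.conv(x)))

def parse_predictions(predictions, grid_size=7, num_boxes=2, num_classes=20):
    # Expects the flat head output of shape (batch, S*S*(B*5 + C)),
    # e.g. 7*7*30 = 1470 values for S=7, B=2, C=20 (PASCAL VOC).
    # Each cell holds B boxes of (x, y, w, h, confidence) plus C class scores.
    batch_size = predictions.size(0)
    return predictions.view(batch_size, grid_size, grid_size,
                            num_boxes * 5 + num_classes)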
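Reshaping alone does not yet give usable boxes; the grid still has to be decoded into image coordinates. The sketch below assumes a per-cell layout of [x, y, w, h, confidence] repeated B times followed by C class probabilities, with x, y as offsets inside the cell and w, h as fractions of the image. The original YOLOv1 implementation orders its outputs differently and predicts the square root of w and h, so treat this layout, the 448-pixel input size, and the helper name `decode_yolo_grid` as assumptions.

```python
import torch


def decode_yolo_grid(pred, num_boxes=2, num_classes=20, image_size=448):
    """Hypothetical decoder for a (N, S, S, B*5 + C) YOLOv1-style grid.

    Assumed per-cell layout: B x [x, y, w, h, confidence], then C class probabilities.
    x, y are offsets within the cell in [0, 1]; w, h are fractions of the image size.
    """
    n, s = pred.shape[0], pred.shape[1]
    boxes = pred[..., : num_boxes * 5].reshape(n, s, s, num_boxes, 5)
    class_probs = pred[..., num_boxes * 5 : num_boxes * 5 + num_classes]  # (N, S, S, C)

    # Row (y) and column (x) index of every cell, shape (S, S)
    ys, xs = torch.meshgrid(torch.arange(s), torch.arange(s), indexing="ij")
    cell_size = image_size / s

    # Cell-relative centers and image-relative sizes -> pixels, shape (N, S, S, B)
    cx = (xs[None, :, :, None] + boxes[..., 0]) * cell_size
    cy = (ys[None, :, :, None] + boxes[..., 1]) * cell_size
    w = boxes[..., 2] * image_size
    h = boxes[..., 3] * image_size

    # Corner format (x1, y1, x2, y2), shape (N, S, S, B, 4)
    corners = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

    # Class-specific score = box confidence * class probability, shape (N, S, S, B, C)
    scores = boxes[..., 4:5] * class_probs[..., None, :]
    return corners, scores
```

Multiplying each box's confidence by its cell's class probabilities mirrors how YOLOv1 scores boxes at test time; a score threshold is then typically applied to discard weak boxes.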

Anchor Boxes

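Anchor boxes are predefined bounding-box templates of different sizes placed at every feature-map location; an anchor-based detector predicts offsets relative to these templates rather than raw coordinates: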

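def generate_anchors(feature_map_size, anchor_sizes, image_size):
    # Returns a (len(anchor_sizes) * F * F, 4) tensor of square anchors in
    # center format (cx, cy, w, h), measured in input-image pixels
    stride = image_size / feature_map_size  # pixels per feature-map cell
    anchors = []
    for size in anchor_sizes:
        for y in range(feature_map_size):
            for x in range(feature_map_size):
                # Anchor centered on the middle of cell (x, y)
                cx = (x + 0.5) * stride
                cy = (y + 0.5) * stride
                anchors.append([cx, cy, size, size])
    return torch.tensor(anchors)

# 13x13 feature map of a 416x416 input (stride 32), 5 sizes -> 5 * 13 * 13 = 845 anchors
anchor_sizes = [32, 64, 128, 256, 512]
anchors = generate_anchors(13, anchor_sizes, 416)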
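The loop above produces only square anchors. Detectors such as Faster R-CNN also vary the aspect ratio at each location (3 scales and 3 ratios, i.e. 9 anchors per position, in the original paper). Below is a vectorized sketch of that extension; it keeps each anchor's area equal to size² and defines the ratio as w / h. The function name `generate_anchors_with_ratios` is hypothetical, and its output is ordered cell by cell rather than size by size.

```python
import torch


def generate_anchors_with_ratios(feature_map_size, anchor_sizes, aspect_ratios, image_size):
    """Hypothetical helper: (F*F*K, 4) anchors in (cx, cy, w, h), K = sizes x ratios."""
    stride = image_size / feature_map_size
    centers = (torch.arange(feature_map_size, dtype=torch.float32) + 0.5) * stride
    # cy varies along rows, cx along columns; flatten to one row per cell
    cy, cx = torch.meshgrid(centers, centers, indexing="ij")
    cx = cx.reshape(-1, 1)  # (F*F, 1)
    cy = cy.reshape(-1, 1)

    # Every (size, ratio) pair; ratio = w / h, area kept at size**2
    s, r = torch.meshgrid(
        torch.tensor(anchor_sizes, dtype=torch.float32),
        torch.tensor(aspect_ratios, dtype=torch.float32),
        indexing="ij",
    )
    w = (s * torch.sqrt(r)).reshape(1, -1)  # (1, K)
    h = (s / torch.sqrt(r)).reshape(1, -1)

    # Broadcast cell centers against anchor shapes -> (F*F, K) each
    cx, cy, w, h = torch.broadcast_tensors(cx, cy, w, h)
    return torch.stack([cx, cy, w, h], dim=-1).reshape(-1, 4)
```

With `aspect_ratios=[1.0]` this yields the same 845 anchors as `generate_anchors(13, anchor_sizes, 416)`, just in a different order; replacing the Python loops with broadcasting keeps generation fast on large feature maps.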

Computing IoU

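IoU (Intersection over Union) measures how much two bounding boxes overlap: the area of their intersection divided by the area of their union, ranging from 0 (disjoint) to 1 (identical):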

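def calculate_iou(box1, box2):
    # Boxes are in corner format (x1, y1, x2, y2); center-format anchors
    # (cx, cy, w, h) must be converted first. Indexing with `...` lets
    # tensors of shape (..., 4) broadcast against each other.

    # Corners of the intersection rectangle
    x1 = torch.max(box1[..., 0], box2[..., 0])
    y1 = torch.max(box1[..., 1], box2[..., 1])
    x2 = torch.min(box1[..., 2], box2[..., 2])
    y2 = torch.min(box1[..., 3], box2[..., 3])

    # clamp(min=0) gives zero area when the boxes do not overlap
    intersection = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    box1_area = (box1[..., 2] - box1[..., 0]) * (box1[..., 3] - box1[..., 1])
    box2_area = (box2[..., 2] - box2[..., 0]) * (box2[..., 3] - box2[..., 1])

    union = box1_area + box2_area - intersection

    # 1e-6 guards against division by zero for degenerate boxes
    return intersection / (union + 1e-6)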
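IoU is also how anchor-based detectors decide which anchor is responsible for a ground-truth box during training. A minimal sketch, assuming the simplest rule of picking the single highest-IoU anchor for each ground-truth box (real detectors add IoU thresholds and may assign several anchors). The helpers `cxcywh_to_xyxy`, `pairwise_iou`, and `match_anchors` are hypothetical; anchors are converted from center to corner format first, and inserting `None` axes turns the IoU into a (G, A) matrix.

```python
import torch


def cxcywh_to_xyxy(boxes):
    # (cx, cy, w, h) -> (x1, y1, x2, y2)
    cx, cy, w, h = boxes.unbind(-1)
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)


def pairwise_iou(boxes1, boxes2):
    # boxes1: (G, 4), boxes2: (A, 4), both xyxy; returns a (G, A) IoU matrix
    b1, b2 = boxes1[:, None, :], boxes2[None, :, :]
    w = (torch.min(b1[..., 2], b2[..., 2]) - torch.max(b1[..., 0], b2[..., 0])).clamp(min=0)
    h = (torch.min(b1[..., 3], b2[..., 3]) - torch.max(b1[..., 1], b2[..., 1])).clamp(min=0)
    inter = w * h
    area1 = (b1[..., 2] - b1[..., 0]) * (b1[..., 3] - b1[..., 1])
    area2 = (b2[..., 2] - b2[..., 0]) * (b2[..., 3] - b2[..., 1])
    return inter / (area1 + area2 - inter + 1e-6)


def match_anchors(gt_boxes_xyxy, anchors_cxcywh):
    # For each ground-truth box: index and IoU of its best-overlapping anchor
    ious = pairwise_iou(gt_boxes_xyxy, cxcywh_to_xyxy(anchors_cxcywh))
    best_iou, best_idx = ious.max(dim=1)
    return best_idx, best_iou
```

If torchvision is available, `torchvision.ops.box_convert` and `torchvision.ops.box_iou` cover the conversion and pairwise-IoU steps.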

mAP Evaluation

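mAP (mean Average Precision) is the primary evaluation metric for object detection. For each class, AP is the area under the precision-recall curve obtained by ranking that class's detections by confidence; mAP is the mean of AP over all classes. PASCAL VOC counts a detection as correct at IoU ≥ 0.5, while COCO additionally averages over IoU thresholds from 0.5 to 0.95: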

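def calculate_ap(precisions, recalls):
    # All-point interpolated AP (PASCAL VOC 2010+ style).
    # precisions, recalls: 1-D tensors for one class, ordered by descending confidence
    recalls = torch.cat([torch.tensor([0.0]), recalls])
    precisions = torch.cat([torch.tensor([1.0]), precisions])

    # Precision envelope: scan right to left so each point takes the highest
    # precision achieved at any equal or greater recall
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = torch.max(precisions[i], precisions[i + 1])

    # Sum rectangle areas wherever recall increases
    ap = 0.0
    for i in range(1, len(recalls)):
        if recalls[i] != recalls[i - 1]:
            ap += (recalls[i] - recalls[i - 1]) * precisions[i]

    return ap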
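`calculate_ap` expects precision and recall arrays that trace one class's precision-recall curve. A minimal sketch of how such arrays are usually built, assuming every detection has already been marked as a true positive (matched to a still-unmatched ground-truth box at IoU ≥ 0.5) or a false positive. `detection_pr_curve` is a hypothetical helper, not a library function.

```python
import torch


def detection_pr_curve(scores, is_true_positive, num_ground_truth):
    # scores: (D,) confidences of one class's detections over the whole dataset
    # is_true_positive: (D,) bool flags, precomputed by IoU matching
    # num_ground_truth: number of ground-truth boxes of that class
    order = torch.argsort(scores, descending=True)  # rank by confidence
    tp = is_true_positive[order].float()
    fp = 1.0 - tp
    tp_cum = torch.cumsum(tp, dim=0)
    fp_cum = torch.cumsum(fp, dim=0)
    recalls = tp_cum / num_ground_truth      # fraction of objects found so far
    precisions = tp_cum / (tp_cum + fp_cum)  # fraction of detections that are correct
    return precisions, recalls
```

Feeding these arrays to `calculate_ap` gives one class's AP; mAP is the mean of those values over all classes.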

Summary

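Object detection is a core computer-vision task. YOLO, a single-stage detector, delivers real-time detection, while two-stage detectors such as Faster R-CNN have traditionally been more accurate at the cost of speed.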