探索 YOLO v3 源码 - 完结篇预测

时间：2020-12-19 11:34:20 阅读：1 评论：0 收藏：0 [点我收藏+]

标签：公众 self hsv 就是 res print lib 处理 redirect

YOLO，即You Only Look Once的缩写，是一个基于卷积神经网络（CNN）的物体检测算法。而YOLO v3是YOLO的第3个版本，即YOLO、YOLO 9000、YOLO v3，检测效果，更准更强。

YOLO v3的更多细节，可以参考YOLO的官网。

技术图片

YOLO是一句美国的俗语，You Only Live Once，你只能活一次，即人生苦短，及时行乐。

本文主要分享，如何实现YOLO v3的算法细节，Keras框架。这是第6篇，检测图片中的物体，使用训练完成的模型，通过框置信度与类别置信度的乘积，筛选最优的检测框。本系列一共6篇，已完结，这是一个完整版：）

本文的GitHub源码：https://github.com/SpikeKing/keras-yolo3-detection

已更新：

第1篇训练：

https://mp.weixin.qq.com/s/T9LshbXoervdJDBuP564dQ

第2篇模型：

https://mp.weixin.qq.com/s/N79S9Qf1OgKsQ0VU5QvuHg

第3篇网络：

https://mp.weixin.qq.com/s/hC4P7iRGv5JSvvPe-ri_8g

第4篇真值：

https://mp.weixin.qq.com/s/5Sj7QadfVvx-5W9Cr4d3Yw

第5篇 Loss：

https://mp.weixin.qq.com/s/4L9E4WGSh0hzlD303036bQ

欢迎关注，微信公众号深度算法（ID: DeepAlgorithm），了解更多深度技术！

1. 检测函数

使用已经训练完成的YOLO v3模型，检测图片中的物体，其中：

创建YOLO类的实例yolo；
使用Image.open()加载图像image；
调用yolo.detect_image()检测图像image；
关闭yolo的session；
显示检测完成的图像r_image；

实现：

def detect_img_for_test(): yolo = YOLO() img_path = ‘./dataset/img.jpg‘ image = Image.open(img_path) r_image = yolo.detect_image(image) yolo.close_session() r_image.show()

输出：

技术图片

2. YOLO参数

YOLO类的初始化参数：

anchors_path：anchor box的配置文件，9个宽高组合；
model_path：已训练完成的模型，支持重新训练的模型；
classes_path：类别文件，与模型文件匹配；
score：置信度的阈值，删除小于阈值的候选框；
iou：候选框的IoU阈值，删除同类别中大于阈值的候选框；
class_names：类别列表，读取classes_path；
anchors：anchor box列表，读取anchors_path；
model_image_size：模型所检测图像的尺寸，输入图像都需要按此填充；
colors：通过HSV色域，生成随机颜色集合，数量等于类别数class_names；
boxes、scores、classes：检测的核心输出，函数generate()所生成，是模型的输出封装。

实现：

self.anchors_path = ‘configs/yolo_anchors.txt‘ # Anchors self.model_path = ‘model_data/yolo_weights.h5‘ # 模型文件 self.classes_path = ‘configs/coco_classes.txt‘ # 类别文件 self.score = 0.20 self.iou = 0.20 self.class_names = self._get_class() # 获取类别 self.anchors = self._get_anchors() # 获取anchor self.sess = K.get_session() self.model_image_size = (416, 416) # fixed size or (None, None), hw self.colors = self.__get_colors(self.class_names) self.boxes, self.scores, self.classes = self.generate()

在__get_colors()中：

将HSV的第0位H值，按1等分，其余SV值，均为1，生成一组HSV列表；
调用hsv_to_rgb，将HSV色域转换为RGB色域；
0~1的RGB值乘以255，转换为完整的颜色值，(0~255)；
随机shuffle颜色列表；

实现：

@staticmethod def __get_colors(names): # 不同的框，不同的颜色 hsv_tuples = [(float(x) / len(names), 1., 1.) for x in range(len(names))] # 不同颜色 colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors)) # RGB np.random.seed(10101) np.random.shuffle(colors) np.random.seed(None) return colors

选择HSV划分，而不是RGB的原因是，HSV的颜色值偏移更好，画出的框，颜色更容易区分。

3. 输出封装

boxes、scores、classes是在模型的基础上，继续封装，由函数generate()所生成，其中：

boxes：框的四个点坐标，(top, left, bottom, right)；
scores：框的类别置信度，融合框置信度和类别置信度；
classes：框的类别；

在函数generate()中，设置参数：

num_anchors：anchor box的总数，一般是9个；
num_classes：类别总数，如COCO是80个类；
yolo_model：由yolo_body所创建的模型，调用load_weights加载参数；

实现：

num_anchors = len(self.anchors) # anchors的数量 num_classes = len(self.class_names) # 类别数 self.yolo_model = yolo_body(Input(shape=(416, 416, 3)), 3, num_classes) self.yolo_model.load_weights(model_path) # 加载模型参数

接着，设置input_image_shape为placeholder，即TF中的参数变量。在yolo_eval中：

继续封装yolo_model的输出output；
anchors，anchor box列表；
类别class_names的总数len()；
输入图片的可选尺寸，input_image_shape，即(416, 416)；
score_threshold，框的整体置信度阈值score；
iou_threshold，同类别框的IoU阈值iou；
返回，框的坐标boxes，框的类别置信度scores，框的类别classes；

实现：

self.input_image_shape = K.placeholder(shape=(2,)) boxes, scores, classes = yolo_eval( self.yolo_model.output, self.anchors, len(self.class_names), self.input_image_shape, score_threshold=self.score, iou_threshold=self.iou) return boxes, scores, classes

输出的scores值，都会大于score_threshold，小于的在yolo_eval()中已被删除。

4. YOLO评估

在函数yolo_eval()中，完成预测逻辑的封装，其中输入：

yolo_outputs：YOLO模型的输出，3个尺度的列表，即13-26-52，最后1维是预测值，由255=3x(5+80)组成，3是每层的anchor数，5是4个框值xywh和1个框中含有物体的置信度，80是COCO的类别数；
anchors：9个anchor box的值；
num_classes：类别个数，COCO是80个类别；
image_shape：placeholder类型的TF参数，默认(416, 416)；
max_boxes：图中最大的检测框数，20个；
score_threshold：框置信度阈值，小于阈值的框被删除，需要的框较多，则调低阈值，需要的框较少，则调高阈值；
iou_threshold：同类别框的IoU阈值，大于阈值的重叠框被删除，重叠物体较多，则调高阈值，重叠物体较少，则调低阈值；

其中，yolo_outputs格式，如下：

[(?, 13, 13, 255), (?, 26, 26, 255), (?, 52, 52, 255)]

其中，anchors列表，如下：

[(10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198), (373,326)]

实现：

boxes, scores, classes = yolo_eval( self.yolo_model.output, self.anchors, len(self.class_names), self.input_image_shape, score_threshold=self.score, iou_threshold=self.iou) def yolo_eval(yolo_outputs, anchors, num_classes, image_shape, max_boxes=20, score_threshold=.6, iou_threshold=.5):

接着，处理参数：

num_layers，输出特征图的层数，3层；
anchor_mask，将anchors划分为3个层，第1层13x13是678，第2层26x26是345，第3层52x52是012；
input_shape：输入图像的尺寸，也就是第0个特征图的尺寸乘以32，即13x32=416，这与Darknet的网络结构有关。

num_layers = len(yolo_outputs) anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] if num_layers == 3 else [[3, 4, 5], [1, 2, 3]] # default setting input_shape = K.shape(yolo_outputs[0])[1:3] * 32

特征图越大，13->52，检测的物体越小，需要的anchors越小，所以anchors列表以倒序赋值。

接着，在YOLO的第l层输出yolo_outputs中，调用yolo_boxes_and_scores()，提取框_boxes和置信度_box_scores，将3个层的框数据放入列表boxes和box_scores，再拼接concatenate展平，输出的数据就是所有的框和置信度。

其中，输出的boxes和box_scores的格式，如下：

boxes: (?, 4) # ?是框数 box_scores: (?, 80)

实现：

boxes = [] box_scores = [] for l in range(num_layers): _boxes, _box_scores = yolo_boxes_and_scores( yolo_outputs[l], anchors[anchor_mask[l]], num_classes, input_shape, image_shape) boxes.append(_boxes) box_scores.append(_box_scores) boxes = K.concatenate(boxes, axis=0) box_scores = K.concatenate(box_scores, axis=0)

concatenate的作用是：将多个层的数据展平，因为框已经还原为真实坐标，不同尺度没有差异。

在函数yolo_boxes_and_scores()中：

yolo_head的输出：box_xy是box的中心坐标，(0~1)相对位置；box_wh是box的宽高，(0~1)相对值；box_confidence是框中物体置信度；box_class_probs是类别置信度；
yolo_correct_boxes，将box_xy和box_wh的(0~1)相对值，转换为真实坐标，输出boxes是(y_min,x_min,y_max,x_max)的值；
reshape，将不同网格的值展平为框的列表，即(?,13,13,3,4)->(?,4)；
box_scores是框置信度与类别置信度的乘积，再reshape展平，(?,80)；
返回框boxes和框置信度box_scores。

实现：

def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape): ‘‘‘Process Conv layer output‘‘‘ box_xy, box_wh, box_confidence, box_class_probs = yolo_head( feats, anchors, num_classes, input_shape) boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape) boxes = K.reshape(boxes, [-1, 4]) box_scores = box_confidence * box_class_probs box_scores = K.reshape(box_scores, [-1, num_classes]) return boxes, box_scores

接着：

mask，过滤小于置信度阈值的框，只保留大于置信度的框，mask掩码；
max_boxes_tensor，每张图片的最大检测框数，max_boxes是20；

实现：

mask = box_scores >= score_threshold max_boxes_tensor = K.constant(max_boxes, dtype=‘int32‘)

接着：

通过掩码mask和类别c，筛选框class_boxes和置信度class_box_scores；
通过NMS，非极大值抑制，筛选出框boxes的NMS索引nms_index；
根据索引，选择gather输出的框class_boxes和置信class_box_scores度，再生成类别信息classes；
将多个类别的数据组合，生成最终的检测数据框，并返回。

实现：

boxes_ = [] scores_ = [] classes_ = [] for c in range(num_classes): class_boxes = tf.boolean_mask(boxes, mask[:, c]) class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c]) nms_index = tf.image.non_max_suppression( class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold) class_boxes = K.gather(class_boxes, nms_index) class_box_scores = K.gather(class_box_scores, nms_index) classes = K.ones_like(class_box_scores, ‘int32‘) * c boxes_.append(class_boxes) scores_.append(class_box_scores) classes_.append(classes) boxes_ = K.concatenate(boxes_, axis=0) scores_ = K.concatenate(scores_, axis=0) classes_ = K.concatenate(classes_, axis=0)

输出格式：

boxes_: (?, 4) scores_: (?,) classes_: (?,)

5. 检测方法

第1步，图像处理：

将图像等比例转换为检测尺寸，检测尺寸需要是32的倍数，周围进行填充；
将图片增加1维，符合输入参数格式；

if self.model_image_size != (None, None): # 416x416, 416=32*13，必须为32的倍数，最小尺度是除以32 assert self.model_image_size[0] % 32 == 0, ‘Multiples of 32 required‘ assert self.model_image_size[1] % 32 == 0, ‘Multiples of 32 required‘ boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size))) # 填充图像 else: new_image_size = (image.width - (image.width % 32), image.height - (image.height % 32)) boxed_image = letterbox_image(image, new_image_size) image_data = np.array(boxed_image, dtype=‘float32‘) print(‘detector size {}‘.format(image_data.shape)) image_data /= 255. # 转换0~1 image_data = np.expand_dims(image_data, 0) # 添加批次维度，将图片增加1维

第2步，feed数据，图像，图像尺寸；

out_boxes, out_scores, out_classes = self.sess.run( [self.boxes, self.scores, self.classes], feed_dict={ self.yolo_model.input: image_data, self.input_image_shape: [image.size[1], image.size[0]], K.learning_phase(): 0 })

第3步，绘制边框，自动设置边框宽度，绘制边框和类别文字，使用Pillow绘图库。

font = ImageFont.truetype(font=‘font/FiraMono-Medium.otf‘, size=np.floor(3e-2 * image.size[1] + 0.5).astype(‘int32‘)) # 字体 thickness = (image.size[0] + image.size[1]) // 512 # 厚度 for i, c in reversed(list(enumerate(out_classes))): predicted_class = self.class_names[c] # 类别 box = out_boxes[i] # 框 score = out_scores[i] # 执行度 label = ‘{} {:.2f}‘.format(predicted_class, score) # 标签 draw = ImageDraw.Draw(image) # 画图 label_size = draw.textsize(label, font) # 标签文字 top, left, bottom, right = box top = max(0, np.floor(top + 0.5).astype(‘int32‘)) left = max(0, np.floor(left + 0.5).astype(‘int32‘)) bottom = min(image.size[1], np.floor(bottom + 0.5).astype(‘int32‘)) right = min(image.size[0], np.floor(right + 0.5).astype(‘int32‘)) print(label, (left, top), (right, bottom)) # 边框 if top - label_size[1] >= 0: # 标签文字 text_origin = np.array([left, top - label_size[1]]) else: text_origin = np.array([left, top + 1]) # My kingdom for a good redistributable image drawing library. for i in range(thickness): # 画框 draw.rectangle( [left + i, top + i, right - i, bottom - i], outline=self.colors[c]) draw.rectangle( # 文字背景 [tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c]) draw.text(text_origin, label, fill=(0, 0, 0), font=font) # 文案 del draw

补充

1. concatenate

concatenate将相同维度的数据元素连接到一起。

实现：

from keras import backend as K sess = K.get_session() a = K.constant([[2, 4], [1, 2]]) b = K.constant([[3, 2], [5, 6]]) c = [a, b] c = K.concatenate(c, axis=0) print(sess.run(c)) """ [[2. 4.] [1. 2.] [3. 2.] [5. 6.]] """

2. gather

gather以索引选择列表元素。

实现：

from keras import backend as K sess = K.get_session() a = K.constant([[2, 4], [1, 2], [5, 6]]) b = K.gather(a, [1, 2]) print(sess.run(b)) """ [[1. 2.] [5. 6.]] """

OK, that‘s all! Enjoy it!

长按或扫描二维码，欢迎关注，微信公众号深度算法（ID: DeepAlgorithm），了解更多深度技术！

探索 YOLO v3 源码 - 完结篇预测

标签：公众 self hsv 就是 res print lib 处理 redirect

原文地址：https://www.cnblogs.com/shuimuqingyang/p/14132285.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行

探索 YOLO v3 源码 - 完结篇 预测

探索 YOLO v3 源码 - 完结篇预测