
Progress and Prospects of Object Detection Technology Based on Deep Learning

Time: 2018-08-03 01:14:23


Viola-Jones face detector


  • One of the most successful examples of object detection in computer vision is the Viola-Jones face detector, which appeared around 2000 and made face detection a more mature technique than general object detection. The method is based on a sliding window: a fixed-size window slides across the input image, and each window is sent to a classifier that decides whether it contains a face. The window size is fixed, but faces vary in size, so the input image is rescaled to several sizes so that a face of any size matches the window at some scale. One obvious problem with the sliding-window approach is that there are too many positions to check.
  • Deciding whether a window is a face is a binary classification problem. Around 2000, an AdaBoost classifier was used for it. The input to the classifier is a set of Haar features, very simple features that appear as small black and white blocks laid over the image. A Haar feature is the sum of the pixel values in the black region minus the sum of the pixel values in the white region. The black and white blocks can have different sizes and relative positions, which gives many different Haar features. An AdaBoost classifier is a strong classifier composed of multiple weak classifiers, and the Viola-Jones detector is a cascade of multiple AdaBoost classifiers. The cascade plays an important role in acceleration, since most non-face windows can be rejected by the early stages without running the later ones.
  • Once face detection technology matured around 2000, practical applications appeared, such as face-priority focusing in digital cameras: when taking a picture, the camera automatically detects faces and adjusts the focus according to their position.
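The sliding-window scan and the two-block Haar feature described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not in the original text: the 24-pixel window, the stride of 4, the rescaling factor of 1.25, and the function names are all hypothetical choices, and the image is a plain list of rows rather than a real image type.

```python
def haar_two_rect(window):
    """Two-block Haar feature: sum of the left (black) half minus the right (white) half."""
    half = len(window[0]) // 2
    black = sum(sum(row[:half]) for row in window)
    white = sum(sum(row[half:2 * half]) for row in window)
    return black - white


def sliding_windows(image, win=24, stride=4):
    """Yield (x, y, window) for every fixed-size window position in a 2-D list."""
    h, w = len(image), len(image[0])
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, [row[x:x + win] for row in image[y:y + win]]


def count_windows(height, width, win=24, stride=4, factor=1.25):
    """Count the windows visited when the image is repeatedly shrunk by `factor`
    until it no longer fits a single window (assumed rescaling scheme)."""
    total, scale = 0, 1.0
    while height / scale >= win and width / scale >= win:
        h, w = int(height / scale), int(width / scale)
        total += ((h - win) // stride + 1) * ((w - win) // stride + 1)
        scale *= factor
    return total
```

Calling `count_windows(480, 640)` gives a feel for how many windows a single image produces, which is why the cascade's early rejection matters. Each window yielded by `sliding_windows` would be passed to the classifier, with values such as `haar_two_rect(window)` as its input features.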

Deformable part model

  • After the Viola-Jones face detector, another important method appeared in 2009: the deformable part model (DPM). For face detection, a face can be roughly regarded as a rigid body that usually does not deform much; for example, the position of the mouth relative to the nose changes little. Other objects are different. People can raise their arms and lift their legs, so the human body undergoes very large non-rigid deformations, and DPM handles these better by modeling the parts. At first, people tried to detect pedestrians with Haar features and an AdaBoost classifier, but the results were not good. DPM models different parts, such as the head, arms, and knees, and then classifies using both the local parts and the whole object, which works much better. DPM is relatively complex and relatively slow at detection, but it achieved some success in face, pedestrian, and vehicle detection, and methods to speed it up were introduced later. Modeling parts is a good idea, but DPM has been overshadowed by deep learning, which brought a great improvement in detection accuracy, so many people who studied DPM quickly moved to deep learning.

R-CNN series

  • Object detection methods based on deep learning fall into two groups: one is the R-CNN series, and the other combines traditional methods with deep learning.

R-CNN is based on a very simple idea. For an input image, a method such as selective search first identifies about 2000 windows that are most likely to contain objects, with the goal that these 2000 windows achieve a very high recall of the objects to be detected. A CNN then extracts features from each of the 2000 windows and classifies it. Running a CNN on 2000 regions is very slow: even at only 0.5 seconds per window, 2000 windows take 1000 seconds. SPP-net was proposed to speed this up by running the CNN once on the whole image instead of on each window separately. A small difficulty is that the 2000 candidate windows all have different sizes. To solve this problem, SPP-net uses spatial pyramid pooling, which gives windows of different sizes features of the same dimension. This avoids computing the convolutions for each candidate window, but it is still not fast enough: detecting one image still takes several seconds.
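To make the fixed-dimension property concrete, here is a minimal sketch of spatial pyramid pooling on a single-channel feature map stored as a list of rows. The pyramid levels (1, 2, 4), the use of max pooling, and the function name are assumptions made for illustration, not the configuration of SPP-net itself. The two constants at the top restate the timing arithmetic from the paragraph above.

```python
WINDOWS = 2000
SECONDS_PER_WINDOW = 0.5
TOTAL_SECONDS = WINDOWS * SECONDS_PER_WINDOW  # 1000.0 seconds per image


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into a vector of fixed length.

    For each level n the map is divided into an n x n grid of bins and the
    maximum of each bin is kept, so the output always has sum(n * n) values
    no matter how large the input window is.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            y0 = (i * h) // n
            y1 = max(y0 + 1, ((i + 1) * h + n - 1) // n)  # ceiling division
            for j in range(n):
                x0 = (j * w) // n
                x1 = max(x0 + 1, ((j + 1) * w + n - 1) // n)
                pooled.append(max(feature_map[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled
```

With these levels, a 5x7 window and a 13x9 window both pool to 1 + 4 + 16 = 21 values, so the layers that follow can accept candidate windows of any size.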

  • Fast R-CNN borrows from SPP-net: it computes the convolutions once over the entire image and then uses ROI pooling to obtain fixed-size features, converting a window of any size to, for example, 7x7. Fast R-CNN also introduces an important strategy: besides classifying each window, it regresses the object's bounding box to make the detection box more accurate. As mentioned earlier, candidate windows have a very high recall, but their locations may not be very accurate. For example, a person's box may miss an arm or a leg; regression can then calibrate the detection box and refine the initial position. Fast R-CNN puts classification and regression together in a multi-task learning approach.
  • Faster R-CNN brings a bigger change than Fast R-CNN: the step that generates candidate windows is also done by a deep network, which shares its convolutional layers with the Fast R-CNN classification network. The network that produces the candidate windows is called the region proposal network (RPN), and it is the core of Faster R-CNN. RPN replaces the very slow selective search, and it usually needs fewer candidate windows, about 300 being enough, which makes the subsequent classification faster. To detect objects of various shapes, RPN introduces anchor boxes. Specifically, on the feature map output by the last convolutional layer, RPN first applies a 3x3 convolution to obtain a feature vector at each position, and then regresses 9 windows of different sizes and aspect ratios from that feature vector. If the feature map is 40x60, there are more than 20 thousand windows in total (40 x 60 x 9 = 21,600). These windows are sorted by confidence, and the top 300 are taken as candidate windows for the final classification. By replacing selective search with RPN, sharing convolutional layers, and reducing the number of candidate windows, Faster R-CNN is significantly faster and can reach 5 fps on a GPU.
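The anchor-box count above can be checked with a short sketch that enumerates 9 anchors at every position of a 40x60 feature map and keeps the 300 highest-scoring ones. The stride of 16 pixels, the three scales (128, 256, 512), and the three aspect ratios (0.5, 1, 2) are assumed values that the text does not give, and the function names are hypothetical. A real RPN also regresses box offsets and typically removes overlapping boxes before selecting proposals; both steps are omitted here.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors in image coordinates,
    len(scales) * len(ratios) of them per feature-map position."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in image coordinates (assumed stride).
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = height / width
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


def top_k_proposals(anchors, scores, k=300):
    """Sort anchors by score, highest first, and keep the first k."""
    order = sorted(range(len(anchors)), key=lambda i: scores[i], reverse=True)
    return [anchors[i] for i in order[:k]]
```

For a 40x60 feature map, `generate_anchors(40, 60)` returns 40 x 60 x 9 = 21,600 boxes, and `top_k_proposals` reduces them to the 300 candidates passed on to the final classification.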

Unfamiliar words

Word [pronunciation] Meaning
execution [eksi‘kjution] 执行
resume 恢复
resume the script execution 恢复脚本执行
pedestrian [pe‘destrien] 行人
vehicle [vi:ekl] 车辆
deformable [di‘fo:mebel] 可变形的
graphics [‘graefiks] 图像
transactions 处理
graphics transactions 图像处理
spatial [‘speishel] 空间的
pyramid [‘piremid] 金字塔


Original article: https://www.cnblogs.com/hugeng007/p/9410837.html
