Traditional video surveillance systems [6] are mainly designed for offline analysis of recorded video streams and depend heavily on human operators for processing. Hence, they cannot support uninterrupted, real-time video surveillance tasks. Recently proposed smart surveillance systems, in contrast, minimize human intervention in video stream processing, object detection, and abnormal behavior analysis by using various deep learning and machine learning algorithms; for example, the collected video frames can be automatically processed in the cloud to detect and report unusual events [20]. A general framework of a smart surveillance system involves tasks at three levels:
- Level 1: the low level conducts information extraction, such as feature detection and object tracking;
- Level 2: the intermediate level is in charge of pattern recognition, such as action recognition and behavior understanding; and
- Level 3: the high level is about decision making, such as abnormal event detection.
As the first step of any video surveillance application, object detection and classification are essential for subsequent object tracking tasks. Compared to generic moving object detection, distinguishing human beings in video frames is more challenging because of occlusion and variations in appearance, illumination, and background. While a large amount of work has been conducted in the surveillance area, only a few studies focus on human object detection and tracking.
The Haar-like rectangle feature, which encodes the intensity contrast between neighboring regions, is well suited for face detection [19]. However, the intensity contrast between regions of a human body depends on the appearance of the person's clothing, which varies so widely that the Haar-like feature is not a discriminative feature for human body detection [15]. The Scale-Invariant Feature Transform (SIFT) provides an alternative approach to human detection by extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene [14]. Grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection [7], and the HOG+SVM algorithm [14] performs well in human detection. Compared with prevalent machine learning classifiers such as the Convolutional Neural Network (CNN) [12], SVM has lower classification complexity because it uses only a small subset of the training data, the support vectors, to construct the optimal hyperplane. The high generalization ability of support vector networks allows SVM to outperform AdaBoost [16] by using non-linear kernels that fit the data better. In this paper, we adopt HOG as the feature descriptor and SVM as the classifier for human body detection.
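To make the HOG+SVM pipeline above concrete, the sketch below uses OpenCV's built-in HOG descriptor together with its pre-trained linear SVM people detector (a publicly available model, not necessarily the classifier trained in this work). The winStride, padding, scale, and confidence-threshold values are illustrative assumptions.

```python
# Minimal HOG + linear-SVM pedestrian detection sketch using OpenCV.
# Parameters below are illustrative, not the configuration used in the paper.
import cv2

# HOG descriptor paired with OpenCV's pre-trained people-detection SVM weights
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people(frame):
    """Return a list of (x, y, w, h) boxes around detected human bodies."""
    boxes, weights = hog.detectMultiScale(
        frame,
        winStride=(8, 8),   # sliding-window step
        padding=(8, 8),     # padding around each window
        scale=1.05,         # image-pyramid scale factor
    )
    if len(boxes) == 0:
        return []
    # Keep only reasonably confident windows (threshold is illustrative).
    return [tuple(box) for box, score in zip(boxes, weights.ravel()) if score > 0.5]

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)          # any video source, e.g. an edge camera
    ok, frame = cap.read()
    if ok:
        for (x, y, w, h) in detect_people(frame):
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imwrite("detections.jpg", frame)
    cap.release()
```

Because the detector scans a sliding window over an image pyramid, a smaller scale step trades speed for recall, a trade-off that matters on resource-constrained edge devices.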
After distinguishing human beings from the background, an object tracking algorithm is applied to generate the trajectory of the target objects over time by computing their location in each frame of the video stream. The most popular tracking schemes today include point tracking, kernel tracking, and silhouette tracking [21]. The Tracking-Learning-Detection (TLD) framework is widely applied in modern object tracking [13]; it explicitly decomposes the long-term tracking task into tracking, learning, and detection. Boosting [8] and multiple instance learning (MIL) [2] are capable of online training, which makes the classifier adaptive while tracking the object; however, a lot of resources are consumed for updating. The Kernelized Correlation Filter (KCF) requires fewer resources while achieving a high tracking success rate [9], which makes it a preferable online tracking method for a real-time surveillance system.
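A KCF tracker is available in OpenCV's contrib modules; the minimal sketch below shows how a detected bounding box can seed an online tracker that is updated frame by frame. It assumes opencv-contrib-python is installed, and since the factory function lives either at cv2.TrackerKCF_create or under cv2.legacy depending on the OpenCV version, both are tried.

```python
# Minimal KCF tracking sketch with OpenCV (requires opencv-contrib-python).
import cv2

def create_kcf_tracker():
    # Factory location differs across OpenCV versions.
    if hasattr(cv2, "TrackerKCF_create"):
        return cv2.TrackerKCF_create()
    return cv2.legacy.TrackerKCF_create()

def track(video_path, init_box):
    """Track the object inside init_box = (x, y, w, h) across the video."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    tracker = create_kcf_tracker()
    tracker.init(frame, init_box)          # seed with a detected human box

    trajectory = [init_box]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, box = tracker.update(frame)    # correlation-filter update per frame
        if not ok:
            break                          # target lost; a re-detection step could restart it
        trajectory.append(tuple(int(v) for v in box))
    cap.release()
    return trajectory
```

In practice the initial box would come from the HOG+SVM detector above, and a periodic re-detection pass can recover the target after tracking failure.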
In most surveillance systems, the collected video streams are processed and analyzed in cloud centers where abundant computing resources are allocated. Consequently, all video records need to be transmitted to remote servers no matter whether they are processed online or offline. Researchers are concerned about the resulting delays, which are not tolerable in many delay-sensitive, mission-critical tasks [17]. Recently, an online, uninterrupted target tracking system has been proposed that leverages the fog computing paradigm to meet the requirements of real-time video processing and instant decision making [3], [4], [5]. Raw video streams generated by drones are merged on near-site fog computing devices, such as tablets or laptops. However, image transmission between the drone and the fog computing nodes still consumes a lot of bandwidth. This increases latency and the error rate, and poses challenges to the fast-response requirement.
Unlike the existing approaches, we use edge computing to detect and track human targets. All the generated raw video streams are first processed locally on edge devices instead of sending the raw video frames to remote fog nodes or a cloud center. Only the processed features and tracking information are transmitted to the fog or cloud for higher-level processing, such as pattern recognition and anomaly detection. This hierarchical approach enables quasi-real-time video frame processing by enhancing parallelism, and decreases network delay by reducing network bandwidth utilization and the communication workload.
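The edge-to-fog division of labor described above can be sketched as follows: the edge device performs detection and tracking locally and only forwards compact per-track metadata upstream, so bandwidth scales with the number of tracked objects rather than with the raw frame rate. The fog endpoint URL, the JSON schema, and the helper name publish_track_update below are hypothetical placeholders for illustration, not part of the system described here.

```python
# Sketch of the edge-side reporting step: send compact tracking metadata
# upstream instead of raw frames. The endpoint URL and JSON fields are
# hypothetical placeholders and must be adapted to a real fog/cloud service.
import json
import time
import urllib.request

FOG_ENDPOINT = "http://fog-node.local:8080/tracks"   # hypothetical fog node address

def publish_track_update(camera_id, track_id, box, ts):
    """Send one tracking record (a few hundred bytes) instead of a raw frame."""
    record = {
        "camera": camera_id,
        "track": track_id,
        "box": box,          # (x, y, w, h) in pixel coordinates
        "timestamp": ts,
    }
    req = urllib.request.Request(
        FOG_ENDPOINT,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=2)   # requires the endpoint to be reachable

if __name__ == "__main__":
    # After the edge device detects and tracks a person locally,
    # only this metadata leaves the device.
    publish_track_update("cam-01", track_id=7, box=(120, 64, 48, 96), ts=time.time())
```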
Original source: https://www.cnblogs.com/caihan/p/12245129.html