Tags: inpainting, end-to-end, objective, any, convolutional neural network, clear, classification, input, advantages
Unlike vanilla convolution, which treats every input pixel as a valid pixel, gated convolution builds on partial convolution. The difference is that gated convolution provides a learnable, dynamic feature selection mechanism for every channel at every spatial location (inside or outside masks, RGB channels or user-guidance channels).
To stabilize and speed up training, the authors propose a patch-based GAN loss called SN-PatchGAN, which applies a spectral-normalized discriminator on dense image patches.
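Other names for image inpainting: image completion, image hole-filling.
Definition of image inpainting: synthesizing alternative content in the missing regions of an image.
Uses of image inpainting: it can remove distracting objects or retouch unwanted regions in photos. It can also be extended to image/video un-cropping, rotation, stitching, re-targeting, re-composition, super-resolution, harmonization and many other tasks.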
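Categories of image inpainting:
[...] stationary textures). To address this limitation, partial convolution (PartialConv) was proposed, in which the convolution is masked and normalized so that it is conditioned only on valid pixels. A rule-based mask-update strategy then updates the valid locations for the next layer. Partial convolution classifies every location as either invalid or valid and multiplies the input of every layer by a 0-or-1 mask; this mask can be viewed as a single, non-learnable feature gating channel.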
However, this assumption has several limitations:
[...] partial convolution with a binary (0 or 1) mask cannot provide such information.
Mask update in gated convolution: the input feature is first used to compute gating values \(g = \sigma(w_g x)\) (\(\sigma\) is the sigmoid function, \(w_g\) is a learnable parameter). The final output is the element-wise product of the learned feature and the gating values, \(y = \phi(wx) \odot g\), where \(\phi\) can be any activation function.
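The gating formula above can be sketched in plain Python for a single-channel feature map. This is only an illustration under stated assumptions: the helper names (`conv2d`, `gated_conv2d`, `w_feat`, `w_gate`) are hypothetical, padding is omitted, and ELU is chosen arbitrarily for \(\phi\); it is not the authors' implementation.

```python
import math

def conv2d(x, w):
    """Valid (no padding) 2-D cross-correlation on a single-channel map."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(w[i][j] * x[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def elu(v):
    # Assumed choice for phi; the text says any activation can be used.
    return v if v > 0 else math.exp(v) - 1.0

def gated_conv2d(x, w_feat, w_gate, phi=elu):
    """y = phi(w * x) ⊙ sigmoid(w_g * x), computed element-wise."""
    feature = conv2d(x, w_feat)   # learned feature, w x
    gating = conv2d(x, w_gate)    # gating logits, w_g x
    return [[phi(f) * sigmoid(g) for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(feature, gating)]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w_feat = [[1, 0], [0, 1]]
w_gate = [[0, 0], [0, 0]]   # zero logits, so every gate equals 0.5
y = gated_conv2d(x, w_feat, w_gate)
```

Because the gate is a sigmoid, each output is the activated feature scaled by a soft value between 0 and 1, learned separately per location (and, in a multi-channel layer, per channel).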
Advantages over other methods:
Outline: 1. gated convolution and SN-PatchGAN; 2. the inpainting network; 3. the extension that allows optional user guidance.
Vanilla convolutions are ill-suited to image inpainting
The equation shows that for all spatial locations (y, x), the same filters are applied to produce the output in vanilla convolutional layers. This makes sense for tasks such as image classification and object detection, where all pixels of the input image are valid, to extract local features in a sliding-window fashion.
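The equation referred to here is not reproduced in these notes. For reference, a vanilla convolution is commonly written as follows, under assumed notation: input \(I\), filter \(W\) of size \(k_h \times k_w\), output \(O\), with \(k'_h = (k_h - 1)/2\) and \(k'_w = (k_w - 1)/2\).

```latex
O_{y,x} = \sum_{i=-k'_h}^{k'_h} \sum_{j=-k'_w}^{k'_w} W_{k'_h+i,\,k'_w+j} \cdot I_{y+i,\,x+j}
```

The filter \(W\) does not depend on \((y, x)\), which is exactly the property discussed: every location is processed identically, whether it lies inside or outside a hole.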
However, for image inpainting, the input is composed of both regions with valid pixels/features outside holes and invalid pixels/features (in shallow layers) or synthesized pixels/features (in deep layers) in masked regions. This causes ambiguity during training and leads to visual artifacts such as color discrepancy, blurriness and obvious edge responses during testing.
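Vanilla convolution is not suited to the inpainting task because the pixels inside the hole are invalid, so content inside the hole must be distinguished from content outside it. Partial conv does make this distinction, but it treats a region containing 1 valid pixel the same as a region containing 9 valid pixels, which is clearly unreasonable. Gated conv instead uses a convolution followed by a sigmoid so that the network learns this distinction itself.
Shortcomings of partial conv: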
(1) It heuristically classifies every spatial location as either valid or invalid. The mask in the next layer is set to one no matter how many valid pixels the filter covered in the previous layer (for example, 1 valid pixel and 9 valid pixels are treated the same when updating the current mask). In other words, as long as at least one valid pixel exists, the mask is set to 1, however many there are.
(2) It is incompatible with additional user inputs. We aim at a user-guided image inpainting system where users can optionally provide a sparse sketch inside the mask as conditional channels. In this situation, should these pixel locations be considered valid or invalid? How should the mask be updated for the next layer?
(3) For partial convolution, the invalid pixels progressively disappear in deep layers, gradually converting all mask values to ones. However, our study shows that if we allow the network to learn the optimal mask automatically, the network assigns soft mask values to every spatial location, even in deep layers.
(4) All channels in each layer share the same mask, which limits flexibility. Essentially, partial convolution can be viewed as un-learnable single-channel feature hard-gating.
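The rule-based mask update criticized in point (1) can be illustrated with a small sketch. It assumes a 3×3 window without padding and the "at least one valid pixel" rule described above; the function name `update_mask` is hypothetical.

```python
def update_mask(mask, k=3):
    """Hard mask update: output 1 wherever the k x k window holds any valid pixel."""
    out_h, out_w = len(mask) - k + 1, len(mask[0]) - k + 1
    return [[1 if any(mask[r + i][c + j]
                      for i in range(k) for j in range(k)) else 0
             for c in range(out_w)]
            for r in range(out_h)]

one_valid = [[0, 0, 0],
             [0, 1, 0],
             [0, 0, 0]]                       # 1 valid pixel in the window
all_valid = [[1] * 3 for _ in range(3)]       # 9 valid pixels in the window
no_valid = [[0] * 3 for _ in range(3)]        # nothing valid

print(update_mask(one_valid), update_mask(all_valid), update_mask(no_valid))
```

The first two windows produce the same updated mask even though one carries nine times as much valid evidence, which is the information a soft, learned gate can retain.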
Gated convolution learns a dynamic feature selection mechanism for each channel and each spatial location. Interestingly, visualization of intermediate gating values shows that it learns to select features not only according to background, mask and sketch, but also by considering semantic segmentation in some channels. Even in deep layers, gated convolution learns to highlight the masked regions and sketch information in separate channels to better generate inpainting results.
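Earlier inpainting networks, which targeted images with rectangular missing regions, introduced a local GAN to improve results.
Here, however, the missing regions can have arbitrary shapes. Drawing on global and local GANs, MarkovianGANs, perceptual loss and spectral-normalized GANs, the authors propose an effective GAN loss, namely SN-PatchGAN.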
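The SN-PatchGAN discriminator is a convolutional network. Its input is the image, the mask and the guidance channel, and its output is a 3-D feature of shape h×w×c, where h, w and c denote height, width and number of channels. As shown in Figure 3, six strided convolutions (kernel size 5, stride 2) are stacked to capture the feature statistics of Markovian patches. The GAN loss is then applied directly to every element of this feature map, which formulates h×w×c GANs that focus on different locations and different semantics (represented in different channels) of the input image.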
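To get a feel for the size of the h×w output map, the spatial size after the six stride-2 convolutions can be computed directly. This sketch assumes "same"-style padding, so each layer simply halves the size (rounding up) regardless of kernel size, and it assumes a 256×256 input purely for illustration; the function name is hypothetical.

```python
import math

def patchgan_output_size(height, width, num_layers=6, stride=2):
    """Spatial size after stacked strided convolutions with 'same' padding."""
    for _ in range(num_layers):
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
    return height, width

h, w = patchgan_output_size(256, 256)
print(h, w)  # 4 4
```

Each of the h×w positions, in each of the c channels, then receives its own real/fake score.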
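Notably, in the training setting the receptive field of each neuron in the output map can cover the entire input image, so a global discriminator is not needed.
The authors also adopt the recently proposed spectral normalization to further stabilize GAN training, using the default fast approximation algorithm for spectral normalization described in SN-GANs.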
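The fast approximation mentioned here is generally understood to be power iteration, which estimates the largest singular value of a weight matrix so the weights can be divided by it. The sketch below is a minimal pure-Python version under that assumption; the function names are hypothetical, and a practical implementation would typically run a single iteration per training step while reusing the vector `u` between steps.

```python
import math

def matvec(W, v):
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def mat_t_vec(W, u):
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]

def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]

def spectral_norm(W, n_iters=1, u=None):
    """Estimate the largest singular value of W by power iteration (n_iters >= 1)."""
    u = u or [1.0] * len(W)
    for _ in range(n_iters):
        v = l2_normalize(mat_t_vec(W, u))
        u = l2_normalize(matvec(W, v))
    sigma = sum(u_i * wv_i for u_i, wv_i in zip(u, matvec(W, v)))
    return sigma, u

def normalize_weight(W, n_iters=1):
    """Divide W by its estimated spectral norm."""
    sigma, _ = spectral_norm(W, n_iters)
    return [[w_ij / sigma for w_ij in row] for row in W]

W = [[3.0, 0.0],
     [0.0, 1.0]]
sigma, _ = spectral_norm(W, n_iters=50)
W_sn = normalize_weight(W, n_iters=50)
```

For this diagonal example the estimate converges to the largest diagonal entry, and the normalized matrix has spectral norm close to 1.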
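To discriminate whether the input is real or fake, the authors also use the hinge loss as the objective function.
Generator: \(\mathcal{L}_G = -\mathbb{E}_{z \sim P_z(z)}[D^{sn}(G(z))]\)
Discriminator: \(\mathcal{L}_{D^{sn}} = \mathbb{E}_{x \sim P_{data}(x)}[\mathrm{ReLU}(1 - D^{sn}(x))] + \mathbb{E}_{z \sim P_z(z)}[\mathrm{ReLU}(1 + D^{sn}(G(z)))]\)
Here \(D^{sn}\) denotes the spectral-normalized discriminator, \(x\) is a real image, and \(G\) is the image inpainting network that takes the incomplete image \(z\) as input.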
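The hinge objective named above is commonly written as a ReLU-based hinge on the discriminator scores. A minimal sketch assuming that standard form, with scores passed in as plain lists of floats (function names are hypothetical):

```python
def relu(v):
    return max(0.0, v)

def discriminator_hinge_loss(d_real, d_fake):
    """mean(ReLU(1 - D(x))) + mean(ReLU(1 + D(G(z))))."""
    real_term = sum(relu(1.0 - s) for s in d_real) / len(d_real)
    fake_term = sum(relu(1.0 + s) for s in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_hinge_loss(d_fake):
    """-mean(D(G(z))): the generator tries to raise the scores of its outputs."""
    return -sum(d_fake) / len(d_fake)

d_real = [2.0, 0.5]    # discriminator scores on real patches
d_fake = [-2.0, 0.0]   # discriminator scores on inpainted patches
print(discriminator_hinge_loss(d_real, d_fake))  # 0.75
print(generator_hinge_loss(d_fake))              # 1.0
```

Scores already beyond the margin (real above 1, fake below -1) contribute nothing to the discriminator loss, so only uncertain patches drive its updates.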
Perceptual loss is not used because similar patch-level information is already encoded in SN-PatchGAN.
Final objective: the pixel-wise \(\ell_1\) reconstruction loss plus the SN-PatchGAN loss, weighted 1:1.
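The 1:1 combination can be sketched as follows. This is an illustrative sketch only: it assumes flattened pixel lists, a mean-reduced \(\ell_1\) term, and a generator-side GAN term of the form "negative mean discriminator score"; the function and parameter names are hypothetical.

```python
def l1_reconstruction_loss(pred, target):
    """Mean absolute error over all pixels."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def generator_objective(pred, target, d_fake, l1_weight=1.0, gan_weight=1.0):
    """Pixel-wise l1 loss plus GAN loss, balanced 1:1 by default."""
    gan_term = -sum(d_fake) / len(d_fake)
    return l1_weight * l1_reconstruction_loss(pred, target) + gan_weight * gan_term

pred = [0.5, 1.0]       # predicted pixel values (flattened)
target = [0.0, 1.0]     # ground-truth pixel values
d_fake = [-1.0, 0.0]    # discriminator scores on the prediction
print(generator_objective(pred, target, d_fake))  # 0.75
```

With only two terms and a single fixed ratio, there is no loss-balancing hyperparameter to tune beyond the default.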
[...] a generative inpainting network with gated convolution layers and the SN-PatchGAN loss. [...] encoder-decoder network (PartialConv uses a U-Net-like structure).
[...] Places2, CelebA-HQ faces [...] 4.1M [...] 512×512. (Regardless of the size of the missing region.)
[...] mean \(\ell_1\) error and mean \(\ell_2\) error. [...] validation images of Places2 [...] both center rectangle mask and free-form mask.
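Partial Convolution produces noticeably worse results, mainly because of its un-learnable, rule-based mask update (un-learnable rule-based gating). A simple combination of the SN-PatchGAN loss and the pixel-wise \(\ell_1\) loss, with the default loss-balancing hyperparameter of 1:1, produces realistic inpainting results.
[...] free-form image inpainting system based on an end-to-end generative network with gated convolution, trained with pixel-wise \(\ell_1\) loss and SN-PatchGAN.
Free-Form Image Inpainting with Gated Convolution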
Original article: https://www.cnblogs.com/wenshinlee/p/12638518.html