
Introduction to CELP Coding

Posted: 2017-10-24 11:48:59

Speex is based on CELP, which stands for Code Excited Linear Prediction. This section attempts to introduce the principles behind CELP, so if you are already familiar with CELP, you can safely skip to section 7. The CELP technique is based on three ideas:

 

  1. The use of a linear prediction (LP) model to model the vocal tract
  2. The use of (adaptive and fixed) codebook entries as input (excitation) of the LP model
  3. The search performed in closed-loop in a "perceptually weighted domain"

This section describes the basic ideas behind CELP. Note that it's still incomplete.

 


Linear Prediction (LPC)

Linear prediction is at the base of many speech coding techniques, including CELP. The idea behind it is to predict the signal $x[n]$ using a linear combination of its past samples:

$$y[n] = \sum_{i=1}^{N} a_i x[n-i]$$

 

 

where $y[n]$ is the linear prediction of $x[n]$. The prediction error is thus given by:

$$e[n] = x[n] - y[n] = x[n] - \sum_{i=1}^{N} a_i x[n-i]$$
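As a minimal sketch of these two definitions (the function names are hypothetical, and samples before the start of the buffer are assumed to be zero), the prediction $y[n] = \sum_i a_i x[n-i]$ and the error $e[n] = x[n] - y[n]$ can be computed directly:

```python
def predict(x, a):
    """Linear prediction y[n] = sum_{i=1..N} a_i * x[n-i].

    a[i-1] holds a_i; samples before the start of x are assumed to be zero.
    """
    N = len(a)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(1, N + 1):
            if n - i >= 0:
                acc += a[i - 1] * x[n - i]
        y.append(acc)
    return y


def prediction_error(x, a):
    """Prediction error e[n] = x[n] - y[n]."""
    return [xn - yn for xn, yn in zip(x, predict(x, a))]


def quadratic_error(x, a):
    """E = sum_n e[n]^2, the quantity that LPC analysis minimizes."""
    return sum(en * en for en in prediction_error(x, a))


# A decaying exponential x[n] = 0.5^n is predicted exactly by a_1 = 0.5,
# apart from the first sample, which has no past to predict from.
x = [1.0, 0.5, 0.25, 0.125]
e = prediction_error(x, [0.5])
```

With this input only the first sample leaves a non-zero error, which is why a good predictor makes the residual much cheaper to encode than the signal itself.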

 

 

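The goal of the LPC analysis is to find the prediction coefficients $a_i$ that minimize the quadratic error function over a frame of $L$ samples:

$$E = \sum_{n=0}^{L-1} \left(e[n]\right)^2 = \sum_{n=0}^{L-1} \left(x[n] - \sum_{i=1}^{N} a_i x[n-i]\right)^2$$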

 

 

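That can be done by setting all derivatives $\partial E / \partial a_i$ to zero:

$$\frac{\partial E}{\partial a_i} = \frac{\partial}{\partial a_i} \sum_{n=0}^{L-1} \left(x[n] - \sum_{j=1}^{N} a_j x[n-j]\right)^2 = 0, \qquad i = 1, \ldots, N$$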
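As a worked step (a sketch, not part of the original text): write $k$ for the coefficient being differentiated, and assume the auto-correlation approximation $\sum_n x[n-i]\,x[n-k] \approx R(|i-k|)$ with $R(m) = \sum_n x[n]\,x[n-m]$. Setting the derivative to zero then gives one linear equation per coefficient:

```latex
\frac{\partial E}{\partial a_k}
  = -2 \sum_{n=0}^{L-1} x[n-k] \left( x[n] - \sum_{i=1}^{N} a_i x[n-i] \right) = 0
\;\Longrightarrow\;
\sum_{i=1}^{N} a_i \sum_{n=0}^{L-1} x[n-i]\, x[n-k] = \sum_{n=0}^{L-1} x[n]\, x[n-k]
\;\Longrightarrow\;
\sum_{i=1}^{N} a_i R(|i-k|) = R(k), \qquad k = 1, \ldots, N
```

These $N$ equations form a linear system whose matrix depends only on $|i-k|$, which is the Toeplitz structure that the Levinson-Durbin algorithm exploits.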

 

 

The $a_i$ filter coefficients are computed using the Levinson-Durbin algorithm, which starts from the auto-correlation $R(m)$ of the signal $x[n]$:

$$R(m) = \sum_{n=0}^{L-1} x[n] x[n-m]$$

 

 

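For an order $N$ filter, we have:

$$\mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(N-1) \\ R(1) & R(0) & \cdots & R(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(N-1) & R(N-2) & \cdots & R(0) \end{bmatrix}$$

$$\mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix}$$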

 

 

The filter coefficients $a_i$ are found by solving the system $\mathbf{R}\mathbf{a} = \mathbf{r}$. The Levinson-Durbin algorithm reduces the cost of solving this system to $\mathcal{O}(N^2)$ instead of $\mathcal{O}(N^3)$ by exploiting the fact that the matrix $\mathbf{R}$ is Toeplitz and Hermitian. It can also be proven that all the roots of $A(z)$ lie within the unit circle, which means that $1/A(z)$ is always stable. That holds in theory; in practice, because of finite precision, two techniques are commonly used to make sure the filter is stable. First, we multiply $R(0)$ by a number slightly above one (such as 1.0001), which is equivalent to adding noise to the signal. Second, we can apply a window to the auto-correlation, which is equivalent to filtering in the frequency domain and reduces sharp resonances.
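The recursion can be sketched in a few lines of plain Python. This is an illustrative version, not the Speex implementation: the function names, the Gaussian shape of the lag window and the `lag_sigma` value are assumptions made for the example. Only the 1.0001 factor comes from the text above.

```python
import math


def autocorrelation(x, order):
    """R(m) = sum_n x[n] * x[n-m] for m = 0..order, over one frame.

    Samples outside the frame are treated as zero.
    """
    L = len(x)
    return [sum(x[n] * x[n - m] for n in range(m, L)) for m in range(order + 1)]


def condition(R, noise_factor=1.0001, lag_sigma=0.01):
    """Lag-window the auto-correlation and slightly boost R(0).

    The Gaussian window and lag_sigma are illustrative, not codec constants.
    """
    out = [r * math.exp(-0.5 * (lag_sigma * m) ** 2) for m, r in enumerate(R)]
    out[0] *= noise_factor
    return out


def levinson_durbin(R, order):
    """Solve the Toeplitz system R a = r in O(order^2) operations.

    Returns (a, err): a[i-1] is a_i in y[n] = sum_i a_i x[n-i], and err is
    the remaining prediction error energy.
    """
    a = [0.0] * order
    err = R[0]
    for m in range(order):
        if err <= 0.0:
            break  # degenerate input (e.g. silence): keep remaining a_i at 0
        # Reflection coefficient for the step from order m to order m + 1.
        k = (R[m + 1] - sum(a[j] * R[m - j] for j in range(m))) / err
        prev = a[:m]
        for j in range(m):
            a[j] = prev[j] - k * prev[m - 1 - j]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


a, err = levinson_durbin([2.0, 1.0, 0.2], 2)
```

For the example input the result satisfies both rows of $\mathbf{R}\mathbf{a} = \mathbf{r}$: $2 a_1 + a_2 = 1$ and $a_1 + 2 a_2 = 0.2$. In a real analysis one would typically call `condition(autocorrelation(frame, N))` before the recursion.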

The linear prediction model represents each speech sample as a linear combination of past samples, plus an error signal called the excitation (or residual):

$$x[n] = \sum_{i=1}^{N} a_i x[n-i] + e[n]$$

 

 

In the z-domain, this can be expressed as

$$X(z) = \frac{1}{A(z)} E(z)$$

 

 

where $A(z)$ is defined as

$$A(z) = 1 - \sum_{i=1}^{N} a_i z^{-i}$$

 

 

We usually refer to $A(z)$ as the analysis filter and $1/A(z)$ as the synthesis filter. The whole process is called short-term prediction because it predicts the signal $x[n]$ using only the $N$ past samples, where $N$ is usually around 10.
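To make the analysis/synthesis pairing concrete, here is a small sketch assuming $A(z) = 1 - \sum_i a_i z^{-i}$ and zero filter state at the start of the buffer. The function names are hypothetical. Running a signal through the analysis filter and then through the synthesis filter should give back the original samples:

```python
def analysis_filter(x, a):
    """Apply A(z): e[n] = x[n] - sum_{i=1..N} a_i * x[n-i] (FIR)."""
    e = []
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc -= a[i - 1] * x[n - i]
        e.append(acc)
    return e


def synthesis_filter(e, a):
    """Apply 1/A(z): x[n] = e[n] + sum_{i=1..N} a_i * x[n-i] (IIR)."""
    x = []
    for n in range(len(e)):
        acc = e[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc += a[i - 1] * x[n - i]
        x.append(acc)
    return x


signal = [1.0, -0.5, 0.25, 0.75, -1.0]
coeffs = [0.9, -0.2]  # arbitrary illustrative coefficients a_1, a_2
residual = analysis_filter(signal, coeffs)
rebuilt = synthesis_filter(residual, coeffs)
```

The analysis filter only looks at past input samples, while the synthesis filter feeds back its own past output, which is why its stability depends on the roots of $A(z)$.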

Because LPC coefficients have very little robustness to quantization, they are converted to Line Spectral Pair (LSP) coefficients, which behave much better under quantization. In particular, it's easy to keep the filter stable.

 


Pitch Prediction

During voiced segments, the speech signal is periodic, so it is possible to take advantage of that property by approximating the excitation signal $e[n]$ by a gain times the past of the excitation:

$$e[n] \simeq p[n] = \beta e[n-T]$$

 

 

where $T$ is the pitch period and $\beta$ is the pitch gain. We call that long-term prediction since the excitation is predicted from $e[n-T]$ with $T \gg N$.

 

Innovation Codebook

The final excitation $e[n]$ will be the sum of the pitch prediction and an innovation signal $c[n]$ taken from a fixed codebook, hence the name Code Excited Linear Prediction. The final excitation is given by:

$$e[n] = p[n] + c[n] = \beta e[n-T] + c[n]$$

 

 

The quantization of $c[n]$ is where most of the bits in a CELP codec are allocated. It represents the information that couldn't be obtained either from linear prediction or pitch prediction. In the z-domain we can represent the final signal $X(z)$ as

$$X(z) = \frac{C(z)}{A(z)\left(1 - \beta z^{-T}\right)}$$
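The long-term part of this model, $e[n] = \beta e[n-T] + c[n]$, can be sketched as a short recursion. The function name is hypothetical, and the past excitation buffer is assumed to hold at least $T$ samples:

```python
def build_excitation(past_excitation, c, T, beta):
    """Return e[n] = beta * e[n-T] + c[n] for the current frame.

    past_excitation must contain at least T samples. Samples generated
    earlier in the same frame are reused when T is shorter than the frame,
    which matches the recursive filter 1 / (1 - beta * z^-T).
    """
    buf = list(past_excitation)
    start = len(buf)
    for n in range(len(c)):
        buf.append(beta * buf[start + n - T] + c[n])
    return buf[start:]


# A single innovation pulse is repeated every T samples, scaled by beta.
e = build_excitation([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], 2, 0.5)
```

Passing the resulting excitation through the synthesis filter $1/A(z)$ would then produce the output signal described by $X(z)$.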

 

 

 


Analysis-by-Synthesis and Error Weighting

Most (if not all) modern audio codecs attempt to "shape" the noise so that it appears mostly in the frequency regions where the ear cannot detect it. For example, the ear is more tolerant to noise in parts of the spectrum that are louder and vice versa. That's why instead of minimizing the simple quadratic error

$$E = \sum_{n} \left(x[n] - \overline{x}[n]\right)^2$$

 

 

where $\overline{x}[n]$ is the encoded signal, we minimize the error for the perceptually weighted signal

$$X_w(z) = W(z) X(z)$$

 

 

where $W(z)$ is the weighting filter, usually of the form

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (1)$$

 

with control parameters $\gamma_1 > \gamma_2$. If the noise is white in the perceptually weighted domain, then in the signal domain its spectral shape will be of the form

$$A_{noise}(z) = \frac{1}{W(z)} = \frac{A(z/\gamma_2)}{A(z/\gamma_1)}$$

 

 

If a filter $A(z)$ has (complex) poles at $p_i$ in the $z$-plane, the filter $A(z/\gamma)$ will have its poles at $p'_i = \gamma p_i$, making it a flatter version of $A(z)$.
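In terms of coefficients, replacing $z$ by $z/\gamma$ in $A(z) = 1 - \sum_i a_i z^{-i}$ simply scales each $a_i$ by $\gamma^i$. The sketch below uses that to apply $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$. The function names are hypothetical, and the $\gamma$ values in the example are chosen only to keep the arithmetic simple:

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_i becomes a_i * gamma^i."""
    return [ai * gamma ** (i + 1) for i, ai in enumerate(a)]


def weighting_filter(x, a, gamma1, gamma2):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) with zero initial state."""
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    y = []
    for n in range(len(x)):
        acc = x[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc -= num[i - 1] * x[n - i]  # zeros from A(z/gamma1)
                acc += den[i - 1] * y[n - i]  # poles from 1/A(z/gamma2)
        y.append(acc)
    return y


# Impulse response of W(z) for a first-order A(z) with a_1 = 0.5.
h = weighting_filter([1.0, 0.0, 0.0], [0.5], 1.0, 0.5)
```

When both parameters are equal, the numerator and denominator cancel and the filter leaves the signal unchanged, which is a quick sanity check on the implementation.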

Analysis-by-synthesis refers to the fact that when trying to find the best pitch parameters ($T$, $\beta$) and innovation signal $c[n]$, we do not work by making the excitation $e[n]$ as close as possible to the original one (which would be simpler), but apply the synthesis (and weighting) filter and try making $X_w(z)$ as close to the original as possible.
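A toy version of that closed-loop search is sketched below, under simplifying assumptions: the codebook is a short hypothetical list of vectors, the pitch contribution is ignored, and the plain synthesis filter $1/A(z)$ stands in for the combined synthesis and weighting filter. Each candidate is synthesized first, and the error is measured on the filter output rather than on the excitation:

```python
def synthesize(e, a):
    """Synthesis filter 1/A(z): x[n] = e[n] + sum_{i=1..N} a_i * x[n-i]."""
    x = []
    for n in range(len(e)):
        acc = e[n]
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                acc += a[i - 1] * x[n - i]
        x.append(acc)
    return x


def search_codebook(target, codebook, a):
    """Pick the entry and gain whose synthesized output is closest to target."""
    best_index, best_gain, best_err = -1, 0.0, float("inf")
    for index, c in enumerate(codebook):
        s = synthesize(c, a)
        energy = sum(v * v for v in s)
        if energy == 0.0:
            continue
        # Optimal gain for this entry in the least-squares sense.
        gain = sum(t * v for t, v in zip(target, s)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, s))
        if err < best_err:
            best_index, best_gain, best_err = index, gain, err
    return best_index, best_gain, best_err


codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
a = [0.5]
target = synthesize([0.0, 2.0, 0.0, 0.0], a)
index, gain, err = search_codebook(target, codebook, a)
```

Here the target was generated from a scaled copy of the second entry, so the search recovers that entry and its gain with zero error. With real speech the error is never zero and the search picks the least bad candidate.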

 

References:

1 Encyclopedia summary: https://zh.wikipedia.org/wiki/%E7%A0%81%E6%BF%80%E5%8A%B1%E7%BA%BF%E6%80%A7%E9%A2%84%E6%B5%8B
2 Detailed introduction: http://ntools.net/arc/Documents/speex/manual/node8.html


Original post: http://www.cnblogs.com/dylancao/p/7722155.html
