Speech recognition: background

Date: 2016-02-10 08:45:16


http://blog.csdn.net/zouxy09/article/details/9156785/

http://www.zhihu.com/question/20962240

Pipeline

1. Collect speech and silence (i.e., background ambient noise).

AudioCollector: records audio data and saves it as a raw audio file.

- If the volume of the recording is too high or too low, adjust the audio gain by tweaking mRecordingGain in MicrophoneRecorder.java.

2. Audacity: label the data.

3. AudioFeatureExtractor: reads the raw audio file from the collector and a label file from Audacity, and produces a "features.arff" file that Weka uses to train a speech detection classifier.

4. Once you have generated the speech detection classifier, build the actual signal processing pipeline around it, and add that pipeline to the audio collector or to your lab 2 app.
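Steps 2 and 3 can be sketched roughly as follows. This is not the actual AudioFeatureExtractor; it only assumes that the Audacity label track was exported as text (one "start<TAB>end<TAB>label" line per region, times in seconds) and that a single per-frame "energy" feature is written out. The function names, the feature name and the sample values are all made up for illustration.

```python
def parse_audacity_labels(text):
    """Parse an Audacity label export: 'start<TAB>end<TAB>label' per line."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        start, end, label = line.split("\t")[:3]
        segments.append((float(start), float(end), label.strip()))
    return segments


def label_at(segments, t, default="silence"):
    """Return the label of the region containing time t (in seconds)."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return default


def to_arff(rows, feature_names, classes, relation="speech_detection"):
    """Serialise (features, label) rows as ARFF text that Weka can load."""
    lines = ["@relation " + relation, ""]
    lines += ["@attribute %s numeric" % name for name in feature_names]
    lines.append("@attribute class {%s}" % ",".join(classes))
    lines += ["", "@data"]
    for features, label in rows:
        lines.append(",".join("%.4f" % v for v in features) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical data: two labelled regions and three frames with one feature each.
labels_txt = "0.000000\t0.500000\tsilence\n0.500000\t1.200000\tspeech\n"
segments = parse_audacity_labels(labels_txt)
frame_times = [0.1, 0.6, 1.5]       # frame centre times in seconds (assumed)
frame_energy = [0.01, 0.80, 0.02]   # made-up feature values
rows = [([e], label_at(segments, t)) for t, e in zip(frame_times, frame_energy)]
arff_text = to_arff(rows, ["energy"], ["speech", "silence"])
```

Frames that fall outside every labelled region default to "silence" here; whether that is the right choice depends on how the recording was labelled.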

The MFCC computation pipeline (Mel-frequency cepstral coefficients)

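1. Window: split the signal into 20-40 ms frames, over which the features are relatively stable.

2. Power spectrum: compute the power spectrum of each frame to identify the dominant frequencies in that frame.

3. Apply the Mel filterbank: the higher the frequency, the harder it is to tell neighbouring frequencies apart, so combine the frequency spectrum into bins that approximate how the human ear perceives sound.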

The first filter is very narrow and gives an indication of how much energy exists near 0 Hertz where human hearing is very sensitive to variations. As the frequencies get higher our filters get wider as we become less concerned about variations.

Convert from frequency (Hz) to the Mel scale. A commonly used conversion is m = 2595 * log10(1 + f / 700).

4. Logarithm of the Mel filterbank energies: perceived loudness is roughly logarithmic in energy.

5. DCT of the log filter bank: there are a lot of correlations between the log filterbank energies, and this step tries to extract the most useful and independent features.

Keep 13 MFCC coefficients per frame. Together with their first- and second-order differences (delta and delta-delta, 13 each), you get a 39-element acoustic vector, which is the core feature set used in speech processing algorithms.
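The five steps above can be sketched for a single frame in plain Python. This is a minimal illustration, not the pipeline used in the lab code: the Hamming window, the 26 triangular filters, the 2595/700 Mel constants and the unnormalised DCT-II are conventional choices assumed here, and the DFT is a naive O(N^2) loop kept only for readability.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Steps 1-2: Hamming-window one frame, then take |DFT|^2 / N for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2 / n)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Step 3: triangular filters whose centres are evenly spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):        # rising edge
            filt[k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling edge, 1.0 at the centre
            filt[k] = (right - k) / (right - centre)
        bank.append(filt)
    return bank


def dct2(values, num_coeffs):
    """Step 5: unnormalised DCT-II, keeping the first num_coeffs outputs."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]


def mfcc(frame, sample_rate, num_filters=26, num_coeffs=13):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # step 4, floored
    return dct2(log_energies, num_coeffs)


# A 25 ms frame of a 440 Hz tone sampled at 8 kHz (synthetic test signal).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(200)]
spec = power_spectrum(frame)
coeffs = mfcc(frame, sample_rate)
```

The low filters come out only a few bins wide and the high ones much wider, which is the narrowing-to-widening behaviour described in step 3.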

Other audible features: characteristics of sound, e.g. prosody

1. Pitch

2. Intensity

3. Temporal aspects

4. Voice quality (glottal waveform)

5. Spectrogram: describes the energy distribution across frequency bands (speaker dependent and emotion related)
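As a rough illustration of the first two features, pitch can be estimated from the strongest autocorrelation peak of a frame, and intensity from its RMS level. Both functions below are simple textbook estimators chosen for this sketch, not something the notes prescribe; the 50-500 Hz search range and the reference level of 1.0 are assumptions.

```python
import math


def rms_intensity_db(frame, ref=1.0):
    """Intensity as RMS level in decibels relative to `ref`."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, 1e-12) / ref)


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate F0 as the lag with the largest autocorrelation in [fmin, fmax]."""
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    # None means no positive correlation was found (e.g. a silent frame).
    return sample_rate / best_lag if best_lag else None


# Synthetic 50 ms frame: a 200 Hz sine with amplitude 1.0, sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]
pitch_hz = autocorr_pitch(frame, sample_rate)
level_db = rms_intensity_db(frame)
```

Real speech is far less clean than a sine wave, so practical pitch trackers add voicing decisions and smoothing across frames on top of an estimator like this.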

Speech processing

Recognise combinations of phonemes.

Use a Hidden Markov Model (HMM).

An HMM for each letter in the alphabet tries to look for a sequence of phonemes.
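To make the idea of "looking for a sequence of phonemes" concrete, here is a small Viterbi decoder for a discrete HMM. The three phoneme-like states, the observation symbols "A"/"B"/"C" (standing in for quantised acoustic frames) and every probability are invented for this sketch; a real recogniser would learn them from data and use continuous features such as MFCCs.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete HMM, computed in log space."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1])
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical left-to-right model for the phoneme sequence k -> ae -> t.
states = ["k", "ae", "t"]
start_p = {"k": 1.0, "ae": 0.0, "t": 0.0}
trans_p = {
    "k": {"k": 0.5, "ae": 0.5, "t": 0.0},
    "ae": {"k": 0.0, "ae": 0.6, "t": 0.4},
    "t": {"k": 0.0, "ae": 0.0, "t": 1.0},
}
emit_p = {
    "k": {"A": 0.8, "B": 0.1, "C": 0.1},
    "ae": {"A": 0.1, "B": 0.8, "C": 0.1},
    "t": {"A": 0.1, "B": 0.1, "C": 0.8},
}
path = viterbi(["A", "A", "B", "B", "C"], states, start_p, trans_p, emit_p)
```

Here the decoder assigns two frames to "k", two to "ae" and one to "t", which shows how an HMM absorbs the variable duration of each phoneme.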

Diagnosing mental illness

Gaussian Mixture Models (GMM): a clustering approach.

Each cluster can be described by an underlying Gaussian distribution.
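A minimal sketch of that idea: fit a one-dimensional, two-component GMM with the EM algorithm. The data points, the initial means and the fixed iteration count are made up for illustration; real acoustic features are multi-dimensional and would normally be handled by a library implementation.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, initial_means, iterations=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            probs = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(probs) or 1e-300
            resp.append([p / total for p in probs])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


# Made-up feature values forming two clusters, around 1.0 and around 5.0.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 4.9, 5.1]
weights, means, variances = fit_gmm_1d(data, initial_means=[0.0, 6.0])
```

Unlike hard clustering, each point keeps a responsibility for every component, so the model also yields a likelihood for new data.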

Monitoring Affect with a Mobile Phone

1. Cellphone monitoring of healthy subjects as part of a healthcare package

2. Subsidized "calling card" number for at-risk populations

3. Monitoring of human-computer speech interfaces and interpersonal speech for elders in assisted or independent care

4. Monitoring our stress in everyday life

5. Monitoring social interactions (or the lack of them)


Original source: http://www.cnblogs.com/Ruizhen/p/5185841.html
