
Speech Recognition: Background

Date: 2016-02-10 08:45:16


 Pipeline
1. Collect speech and silence (i.e., background ambient noise)
AudioCollector: records audio data and saves it as a raw audio file.
- If the recording volume is too high or too low, adjust the audio gain by tweaking mRecordingGain in MicrophoneRecorder.java.
 
2. Audacity: label the data.
 
3. AudioFeatureExtractor: reads the raw audio file from the collector and a label file from Audacity to produce a “features.arff” file, which Weka uses to generate a speech detection classifier.
 
4. Once you have generated your speech detection classifier, build the actual signal-processing pipeline around it, then add that pipeline to the audio collector or to your Lab 2 app.
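As a rough illustration of step 3, the sketch below pairs per-frame features with labels and writes them in ARFF form. It is not the actual AudioFeatureExtractor (a Java tool whose internals are not shown here). It assumes the label file is an Audacity label-track export (tab-separated start time, end time, and label per line); the single `energy` feature, the frame times, and all function names are hypothetical.

```python
def parse_audacity_labels(text):
    """Parse an Audacity label export: start<TAB>end<TAB>label per line."""
    segments = []
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        segments.append((float(parts[0]), float(parts[1]), parts[2]))
    return segments

def label_at(segments, t, default="silence"):
    """Return the label of the segment that contains time t (in seconds)."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return default

def write_arff(rows, feature_names, classes, relation="speech_detection"):
    """Render (features, label) rows as ARFF text that Weka can load."""
    lines = ["@relation " + relation, ""]
    lines += ["@attribute %s numeric" % name for name in feature_names]
    lines.append("@attribute class {%s}" % ",".join(classes))
    lines += ["", "@data"]
    for features, label in rows:
        lines.append(",".join("%.6f" % v for v in features) + "," + label)
    return "\n".join(lines) + "\n"

# Hypothetical data: two labelled regions and one made-up feature per frame
labels_text = "0.000000\t1.500000\tsilence\n1.500000\t3.000000\tspeech\n"
segments = parse_audacity_labels(labels_text)
frame_times = [0.5, 1.0, 2.0, 2.5]        # frame centres in seconds
frame_energy = [0.01, 0.02, 0.70, 0.65]   # made-up feature values
rows = [([e], label_at(segments, t)) for t, e in zip(frame_times, frame_energy)]
arff_text = write_arff(rows, ["energy"], ["silence", "speech"])
print(arff_text)
```

In a real pipeline the feature columns would be the acoustic features computed per frame rather than a single made-up energy value.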
 

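The MFCC computation pipeline (Mel-frequency cepstral coefficients)
1. Window: split the signal into 20-40 ms frames, over which the features are relatively stable.
2. Power spectrum: compute the power spectrum of each frame to identify the dominant frequencies in that frame.
3. Apply the Mel filterbank: the higher the frequency, the harder it is to tell nearby frequencies apart. Combine the frequency spectrum into bins in a way similar to how our ears perceive sound, i.e., model the human ear.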
The first filter is very narrow and gives an indication of how much energy exists near 0 Hertz where human hearing is very sensitive to variations. As the frequencies get higher our filters get wider as we become less concerned about variations.
Convert from frequency to the Mel scale.
4. Logarithm of the Mel filterbank energies: loudness is related logarithmically to energy.
5. DCT of the log filterbank energies: the log filterbank energies are highly correlated, and this step extracts the most useful, decorrelated features.
Keep 13 MFCC coefficients per frame. Together with their first- and second-order time derivatives (delta and delta-delta coefficients), you get a 39-element acoustic vector, the core feature set used in speech processing algorithms.
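The five steps above can be sketched for a single frame in plain Python. This is a minimal illustration, not the pipeline used in the lab code: the Hamming window, the 26-filter bank, the naive DFT, and the commonly used Hz-to-Mel formula 2595 * log10(1 + f / 700) are all assumptions, and the delta coefficients are not computed.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    filters = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):       # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling edge, peak of 1 at centre
            row[k] = (right - k) / (right - centre)
        filters.append(row)
    return filters

def dct2(values, num_coeffs):
    """Unnormalized DCT-II, keeping only the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(num_coeffs)]

def mfcc_frame(frame, sample_rate, num_filters=26, num_coeffs=13):
    spectrum = power_spectrum(frame)                                  # step 2
    filters = mel_filterbank(num_filters, len(frame), sample_rate)    # step 3
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in filters]
    log_energies = [math.log(e + 1e-10) for e in energies]            # step 4
    return dct2(log_energies, num_coeffs)                             # step 5

# Toy input: one 25 ms frame (step 1) of a 440 Hz tone sampled at 8 kHz
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(200)]
coeffs = mfcc_frame(frame, sample_rate)
print(len(coeffs))
```

The naive DFT keeps the sketch dependency-free; a real implementation would use an FFT and process overlapping frames.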
 
Other audible features: characteristics of sound, e.g., prosody
1. Pitch
2. Intensity
3. Temporal aspects
4. Voice quality (glottal waveform)
5. Spectrogram: describes the energy distribution across frequency bands (speaker dependent and emotion related)
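Two of these features, pitch and intensity, are simple enough to sketch directly. The example below is only one possible approach and is not taken from the notes: it estimates pitch by picking the autocorrelation peak within an assumed 75-400 Hz search range and reports intensity as the RMS level in dB relative to full scale. The function names and the synthetic 200 Hz test tone are hypothetical.

```python
import math

def rms_intensity_db(samples):
    """Root-mean-square level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))

def estimate_pitch(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Pick the autocorrelation peak inside the plausible pitch range."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Synthetic test signal: 100 ms of a 200 Hz tone at half amplitude
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(800)]
pitch_hz = estimate_pitch(tone, sample_rate)
level_db = rms_intensity_db(tone)
print(pitch_hz, round(level_db, 2))
```

Real speech is noisier than a pure tone, so practical pitch trackers add voicing detection and smoothing on top of this idea.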
 

Speech processing
Recognize combinations of phonemes.
Use a hidden Markov model (HMM).
An HMM for each letter of the alphabet looks for a sequence of phonemes.
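To make the idea of an HMM searching for a phoneme sequence concrete, here is a toy Viterbi decoder. Everything in it is an assumption for illustration: the three phoneme states, the left-to-right transition probabilities, and the discrete observation symbols "x", "y", "z" (standing in for quantized acoustic frames) are made up, and real recognizers use continuous acoustic features.

```python
import math

def safe_log(p):
    """Log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in s, previous state)
    best = [{s: (safe_log(start_p[s]) + safe_log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev = max(states,
                       key=lambda p: best[-1][p][0] + safe_log(trans_p[p][s]))
            score = (best[-1][prev][0] + safe_log(trans_p[prev][s])
                     + safe_log(emit_p[s][obs]))
            column[s] = (score, prev)
        best.append(column)
    # Trace back from the best final state
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical left-to-right model with phoneme states /k/ -> /ae/ -> /t/
states = ["k", "ae", "t"]
start_p = {"k": 1.0, "ae": 0.0, "t": 0.0}
trans_p = {
    "k": {"k": 0.5, "ae": 0.5, "t": 0.0},
    "ae": {"k": 0.0, "ae": 0.5, "t": 0.5},
    "t": {"k": 0.0, "ae": 0.0, "t": 1.0},
}
emit_p = {
    "k": {"x": 0.8, "y": 0.1, "z": 0.1},
    "ae": {"x": 0.1, "y": 0.8, "z": 0.1},
    "t": {"x": 0.1, "y": 0.1, "z": 0.8},
}
path = viterbi(["x", "x", "y", "y", "z"], states, start_p, trans_p, emit_p)
print(path)
```

Working in log-probabilities avoids numerical underflow on long observation sequences.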
 
Diagnosing mental illness
Gaussian Mixture Models (GMM): a clustering approach
Each cluster can be described by its own underlying Gaussian distribution.
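The clustering idea can be illustrated with a small expectation-maximization (EM) fit of a one-dimensional, two-component mixture. This is a sketch under assumptions, not the method used in any particular study: the synthetic data, the initial means, the fixed iteration count, and the function names are all hypothetical.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, means, iterations=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against collapse
    return weights, means, variances

# Synthetic data: two well-separated clusters centred near 0 and 6
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(6.0, 1.0) for _ in range(200)])
weights, means, variances = fit_gmm_1d(data, means=[min(data), max(data)])
print(weights, means, variances)
```

Each fitted component (weight, mean, variance) describes one cluster; with multi-dimensional acoustic features the same procedure uses mean vectors and covariance matrices.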
 
Monitoring Affect with a Mobile Phone
1. Cellphone monitoring of healthy subjects as part of a health-care package
2. Subsidized "calling-card" number for at-risk populations
3. Monitoring of human-computer speech interfaces and interpersonal speech for elders in assisted or independent care
4. Monitoring stress in our everyday lives
5. Monitoring social interactions (or the lack of them)

Original post: http://www.cnblogs.com/Ruizhen/p/5185841.html
