Single-Hidden-Layer Neural Network, i.e. a Multi-Layer Perceptron (Classifier)
Audio Preprocessing
Feature: PMSC (Principal Mel-Spectrum Components)
Original Data: 30 s, 22.05 kHz, mono, WAV
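As a rough illustration of the input format, the sketch below reads one clip into a float array with Python's standard `wave` module and NumPy. It assumes 16-bit signed PCM samples, which the notes do not state; the helper name `load_mono_wav` is hypothetical. At 22.05 kHz, a 30 s clip holds 30 × 22050 = 661,500 samples.

```python
import wave

import numpy as np


def load_mono_wav(path):
    """Read a mono WAV file into a float32 array scaled to [-1, 1).

    Assumes 16-bit signed PCM samples (an assumption; the notes only
    say "mono, WAV").
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected a mono file")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # WAV stores 16-bit PCM as little-endian signed integers.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    return samples, sample_rate


# Expected size of one clip from the data description: 30 s at 22.05 kHz.
CLIP_SECONDS = 30
SAMPLE_RATE = 22050
EXPECTED_SAMPLES = CLIP_SECONDS * SAMPLE_RATE  # 661,500 samples
```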
Process Steps:
DFT (spectral domain): we compute DFTs over windows of 1024 samples on audio at 22.05 kHz (i.e. roughly 46 ms) with a frame step of 512 samples (roughly 23 ms, so consecutive windows overlap by half).
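The framing step can be sketched in NumPy as below: overlapping 1024-sample windows with a 512-sample step, each turned into DFT magnitudes with `np.fft.rfft`. The Hann window is an assumption (the notes name no window function), and `magnitude_spectrogram` is a hypothetical helper name.

```python
import numpy as np

SAMPLE_RATE = 22050  # Hz
FRAME_LENGTH = 1024  # samples per DFT window (~46 ms)
HOP_LENGTH = 512     # frame step in samples (~23 ms)


def magnitude_spectrogram(signal, frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH):
    """Return |DFT| of overlapping frames, shape (n_frames, frame_length // 2 + 1)."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Index matrix: row i holds the sample indices covered by frame i.
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = signal[idx]
    # Hann window: an assumption, the notes do not name a window function.
    frames = frames * np.hanning(frame_length)
    # rfft keeps only the non-negative frequencies of a real-valued frame.
    return np.abs(np.fft.rfft(frames, axis=1))


window_ms = 1000 * FRAME_LENGTH / SAMPLE_RATE  # ~46.4 ms
hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE       # ~23.2 ms
```

For a full 30 s clip (661,500 samples) this yields 1290 frames of 513 frequency bins each.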
Mel-Compression: we run the spectral amplitudes through a set of 256 mel-scaled triangular filters to obtain a set of spectral energy bands.
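A minimal sketch of the mel compression, assuming the HTK-style mel formula `2595 * log10(1 + f / 700)` and filters spanning 0 Hz to the Nyquist frequency; neither choice is given in the notes. The helper names are hypothetical. Each triangular filter is evaluated at the DFT bin frequencies, and the band energies are a matrix product with the magnitude spectrogram.

```python
import numpy as np

SAMPLE_RATE = 22050
N_FFT = 1024
N_MELS = 256


def hz_to_mel(f):
    # HTK-style mel scale (an assumption; the notes do not give the formula).
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sr=SAMPLE_RATE, fmin=0.0, fmax=None):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1), sampled at DFT bin frequencies."""
    fmax = sr / 2 if fmax is None else fmax
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # n_mels + 2 edge points, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    left, center, right = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (bin_freqs - left) / (center - left)
    falling = (right - bin_freqs) / (right - center)
    # Peak of 1 at the center, linear slopes to 0 at the neighbouring edges.
    return np.maximum(0.0, np.minimum(rising, falling))


def mel_energies(magnitudes, filterbank):
    """Project (n_frames, n_bins) spectral amplitudes onto the mel bands."""
    return magnitudes @ filterbank.T
```

With 256 filters over 513 bins, the lowest filters are narrower than one DFT bin (about 21.5 Hz), so in this sketch some of them receive zero weight.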
Principal Component Analysis whitening (PCA whitening): we compute the principal components of a random sub-sample of the training set. Since each eigenvalue is the variance along its principal component, multiplying each component by the inverse square root of its eigenvalue yields features with unit variance.
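The whitening step can be sketched with NumPy's symmetric eigendecomposition `np.linalg.eigh`. The sub-sample size, the random seed, and the small `eps` added to the eigenvalues are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np


def fit_pca_whitening(features, n_subsample=10000, eps=1e-8, seed=0):
    """Estimate a PCA whitening transform from a random sub-sample of rows.

    features: (n_examples, n_dims) array, e.g. mel-band energies per frame.
    eps guards against division by (near-)zero eigenvalues; it is an
    assumption, not part of the original description.
    """
    rng = np.random.default_rng(seed)
    n = min(n_subsample, len(features))
    sample = features[rng.choice(len(features), size=n, replace=False)]
    mean = sample.mean(axis=0)
    cov = np.cov(sample - mean, rowvar=False)
    # eigh: the covariance is symmetric; columns of eigvecs are the components.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each principal component by 1 / sqrt(eigenvalue) -> unit variance.
    whitening = eigvecs / np.sqrt(eigvals + eps)
    return mean, whitening


def apply_pca_whitening(features, mean, whitening):
    """Center with the training mean, then project and rescale."""
    return (features - mean) @ whitening
```

On the data it was fitted on, the whitened features have (up to `eps`) an identity covariance matrix.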
Model
PFC (Pooled Features Classifier)
Pooling Operation: the model applies a given set of pooling functions (how many?) to the PMSC features and sends the pooled features to a classifier (an MLP with a hidden layer of 2000 units, sigmoid activation, L2 weight decay and a cross-entropy cost).
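A NumPy sketch of the pooling and the classifier's forward pass and cost. The notes leave the pooling set open, so mean, variance, max and min over non-overlapping windows of 64 frames are hypothetical choices. The output layer uses one sigmoid per class/tag with binary cross-entropy, which is also an assumption (a softmax output is the usual alternative for single-label classification). Training (gradients, updates) is not shown.

```python
import numpy as np

# Hypothetical choices: the notes leave the pooling set and window size open.
POOLING_FUNCTIONS = (np.mean, np.var, np.max, np.min)
WINDOW_FRAMES = 64  # frames per pooling window (assumed)


def pool_features(frames, window_frames=WINDOW_FRAMES, funcs=POOLING_FUNCTIONS):
    """Pool (n_frames, n_dims) PMSC frames into (n_windows, len(funcs) * n_dims)."""
    n_windows = len(frames) // window_frames
    windows = frames[: n_windows * window_frames].reshape(n_windows, window_frames, -1)
    # Each function reduces over the time axis; results are concatenated per window.
    return np.concatenate([f(windows, axis=1) for f in funcs], axis=1)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_mlp(n_in, n_out, n_hidden=2000, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.01, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.01, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }


def mlp_predict(params, x):
    """Sigmoid hidden layer (2000 units), then one sigmoid affinity per class/tag."""
    hidden = sigmoid(x @ params["W1"] + params["b1"])
    return sigmoid(hidden @ params["W2"] + params["b2"])


def mlp_cost(params, x, targets, weight_decay=1e-4):
    """Mean binary cross-entropy plus an L2 penalty on the weight matrices."""
    p = np.clip(mlp_predict(params, x), 1e-7, 1 - 1e-7)
    xent = -np.mean(np.sum(targets * np.log(p) + (1 - targets) * np.log(1 - p), axis=1))
    l2 = weight_decay * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
    return xent + l2
```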
Classify: each pooling window is treated as a training example for the classifier, and the classifier's predictions are averaged over all the windows of a given clip to obtain the final classification (what is the rule?).
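The clip-level aggregation can be sketched as a mean over the per-window predictions. Picking the class with the highest averaged affinity is an assumed decision rule, since the notes leave the rule open; both function names are hypothetical.

```python
import numpy as np


def clip_prediction(window_probs):
    """Average per-window affinities (n_windows, n_classes) into one clip-level vector."""
    return np.asarray(window_probs).mean(axis=0)


def clip_label(window_probs):
    # Assumed decision rule: pick the class with the highest averaged affinity.
    return int(np.argmax(clip_prediction(window_probs)))
```

For example, three windows with affinities `[0.2, 0.7, 0.1]`, `[0.6, 0.3, 0.1]` and `[0.1, 0.8, 0.1]` average to `[0.3, 0.6, 0.1]`, so the clip is assigned class 1.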
Tasks
Classification (train/test task): the MLP outputs an affinity prediction for each class (each pooling window is treated as a training example).
Tagging
Affinity: the affinity scores for a song are thus directly the output of the MLP.
Binary Classification: choose the threshold that optimizes the F1-score on the validation set.
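Threshold selection can be sketched as an exhaustive search over the observed validation affinities, keeping the cut-off with the best F1. This is shown for a single tag's affinities; whether one threshold is chosen per tag or shared across tags is not stated in the notes. The helper names are hypothetical.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """F1 for binary 0/1 arrays; defined as 0 when there are no true positives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(val_affinities, val_labels):
    """Return (threshold, F1) maximizing F1 on the validation set.

    Candidates are the observed affinity values themselves, so every
    distinct split of the validation examples is tried once.
    """
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(val_affinities):
        f1 = f1_score(val_labels, (val_affinities >= t).astype(int))
        if f1 > best_f1:  # strict ">" keeps the lowest threshold on ties
            best_t, best_f1 = float(t), f1
    return best_t, best_f1
```

For affinities `[0.1, 0.4, 0.35, 0.8, 0.9, 0.2]` with labels `[0, 0, 1, 1, 1, 0]`, the search picks 0.35 (three true positives, one false positive, F1 = 6/7).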
Tools
Theano: Theano is a numerical computation library for Python. In Theano, computations are expressed using a NumPy-like syntax and compiled to run efficiently on either CPU or GPU architectures.