Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
Review Comments
Posted on behalf of an anonymous ICML reviewer.
Summary:
The paper's main strengths are the generality of the proposed methods, their clear improvement over the state of the art, the clarity of the presentation, and the thoroughness of the experimental results.
Perhaps the greatest weakness is the degree of novelty: the only new model is the RNN-RBM, which is an incremental change to the RTRBM. However, some of the model combinations (such as RNN-NADE) appear to be novel, as is the application to music.
--------------------------------------------------------
Detailed Comments:
The paper addresses the problem of modelling polyphonic musical notes. More specifically, it aims to learn a predictive distribution over the musical notes at the next timestep, given those at previous timesteps. As well as being interesting in its own right, the task is compelling because it combines (relatively) high-dimensional, unconstrained density modelling with long-range sequence modelling. Unlike most existing approaches (which use domain-specific representations to reduce the output space), the proposed system allows any note to be emitted at any time, which suggests that it will transfer well to other kinds of high-dimensional sequences.
The paper also demonstrates that the predictive model can be used to improve the performance of polyphonic transcription from audio data - the musical equivalent of combining a language model with an acoustic model in speech recognition. The combination is performed in a somewhat ad hoc manner and requires the audio data to be preprocessed. However, it clearly outperforms the baseline HMM smoothing model.
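To make the flavour of that combination concrete, here is a minimal sketch (hypothetical names throughout, including the `sequence_model.predict` interface; the paper's actual procedure differs in detail) of smoothing frame-level acoustic note probabilities with a symbolic sequence model, analogous to language-model rescoring:

```python
import numpy as np

def smooth_transcription(acoustic_probs, sequence_model, alpha=0.5, eps=1e-8):
    """Greedy left-to-right decoding combining two probability estimates.

    acoustic_probs: (T, N) array of P(note on | audio frame).
    sequence_model: object whose predict(history) returns a length-N
                    array of note-on priors given the decoded history.
    alpha:          weight of the symbolic prior; alpha=0 reduces to
                    thresholding the raw acoustic output at 0.5.
    """
    T, N = acoustic_probs.shape
    piano_roll = np.zeros((T, N), dtype=bool)
    history = []
    for t in range(T):
        prior = sequence_model.predict(history)
        # Log-linear combination of acoustic and symbolic evidence.
        on = ((1 - alpha) * np.log(acoustic_probs[t] + eps)
              + alpha * np.log(prior + eps))
        off = ((1 - alpha) * np.log(1 - acoustic_probs[t] + eps)
               + alpha * np.log(1 - prior + eps))
        piano_roll[t] = on > off            # keep a note if "on" wins
        history.append(piano_roll[t])
    return piano_roll
```

This greedy, product-of-experts-style decoding is only one simple way to perform the combination; the ad hoc aspects mentioned above concern exactly such choices.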
The system is based on various flavours of recurrent neural network (RNN). This is attractive for several reasons: RNNs place no restrictions on the amount of previous context used; they can be easily adapted to different kinds of sequential data; and they allow the density modelling and the sequence modelling to be jointly optimised. As well as evaluating several RNN variants found in the literature, the paper introduces a new one: the RNN-RBM. Although this is only a slight modification of an existing architecture (the RTRBM), the change is clearly motivated and its advantages are demonstrated by the experiments.
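For concreteness, the generative step being discussed can be sketched roughly as follows (a minimal numpy illustration of the general RNN-RBM idea, with hypothetical parameter names and shapes; training, e.g. by contrastive divergence, is omitted, and the authors' implementation differs in detail):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_rbm_sample(params, T, n_gibbs=25):
    """Generate T binary frames from an RNN-RBM-style model.

    At each timestep the deterministic RNN state u sets the biases of a
    conditional RBM over the visible frame v_t, which is then sampled by
    block Gibbs sampling; v_t in turn drives the next RNN state.
    """
    W, Wuu, Wvu, Wuv, Wuh, bv, bh, bu, u0 = params
    u = u0
    frames = []
    for _ in range(T):
        bv_t = bv + Wuv @ u                        # time-dependent visible bias
        bh_t = bh + Wuh @ u                        # time-dependent hidden bias
        v = rng.random(bv.shape) < sigmoid(bv_t)   # initialise the visibles
        for _ in range(n_gibbs):                   # block Gibbs in the RBM
            h = rng.random(bh.shape) < sigmoid(W.T @ v + bh_t)
            v = rng.random(bv.shape) < sigmoid(W @ h + bv_t)
        frames.append(v.astype(float))
        u = np.tanh(bu + Wuu @ u + Wvu @ v)        # deterministic RNN update
    return np.array(frames)

# Tiny random initialisation just to make the sketch executable.
n_v, n_h, n_u = 88, 50, 30                         # e.g. 88 piano notes
params = (rng.normal(0, 0.01, (n_v, n_h)),         # W   (RBM weights)
          rng.normal(0, 0.01, (n_u, n_u)),         # Wuu (RNN recurrence)
          rng.normal(0, 0.01, (n_u, n_v)),         # Wvu (visible -> RNN)
          rng.normal(0, 0.01, (n_v, n_u)),         # Wuv (RNN -> visible bias)
          rng.normal(0, 0.01, (n_h, n_u)),         # Wuh (RNN -> hidden bias)
          np.zeros(n_v), np.zeros(n_h), np.zeros(n_u),  # bv, bh, bu
          np.zeros(n_u))                           # u0 (initial RNN state)
piano_roll = rnn_rbm_sample(params, T=64)
```

The key contrast with the RTRBM is that the recurrent state here is a separate deterministic chain rather than being tied to the RBM's hidden units, which is what makes the modification slight but useful.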
The experimental results in Section 6 are exceptionally thorough, comparing the six proposed variants of RNN models with seven common baselines across four different datasets. This makes it possible to tease out the relative pros and cons of the different architectures. I also liked the comparison with the non-temporal frame-level models, which clearly shows how much benefit the sequential models bring. The best proposed systems outperform the baselines on all the datasets, often by a large margin.
The paper is very clearly written, well organized, and commendably concise, given the large amount of material covered.
Suggestions:
Abstract:
'in a completely general representation' doesn't say much. Either remove this or reword it to be more specific.
', that is appropriate to discover temporal dependencies in such high-dimensional sequences with the help of Hessian-free optimization and pretraining techniques.' -> 'that is able to discover temporal dependencies in high-dimensional sequences.'
Introduction:
'designed for the multiple classification task' Does this mean 1-of-K classification? Please clarify.
Section 2:
Line 217: If possible, put the exact gradient into the paper to keep it self-contained.
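(Presumably the gradient in question is the standard RBM log-likelihood gradient; stating the usual form here as an assumption, since the paper's line numbering is not reproduced:

\[
\frac{\partial \log P(v)}{\partial W_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}},
\]

with analogous expressions for the visible and hidden biases; the intractable model expectation is what contrastive divergence approximates.)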
Section 4.1:
Is the cross entropy cost in equation (12) the objective function used for the RNN and RNN (HF) models in Section 6? Please clarify.
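(For reference, the cross entropy presumably meant here is the usual one for binary note vectors; this is written as an assumption about equation (12), not a quote from the paper:

\[
C = -\sum_{t}\sum_{j}\Big[\, v_j^{(t)} \log \hat v_j^{(t)} + \big(1 - v_j^{(t)}\big) \log\big(1 - \hat v_j^{(t)}\big) \Big],
\]

where \(v^{(t)}\) is the target piano-roll frame at time \(t\) and \(\hat v^{(t)}\) is the model's predicted vector of note-on probabilities.)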
Section 5:
What does Figure 3 add besides an obligatory receptive field picture? I think the space could be better used.
Section 6:
Line 560: What does 'transposed in a common tonality' mean?
Lines 592-598: This sentence isn't clear to me. Which datasets are complex and which are simple, and why?
Line 629: There's an additional 'n' in 'additionnal'.
Line 632: Why do you say RNN-NADE is more robust? RNN-RBM has the best log-loss on 2/4 datasets and the best accuracy on 3.
Table 1:
Put the best score in each column in bold. Maybe also reorder the table according to some overall performance measure... average log-loss?
How big was N for the note N-gram and N-gram models, and how many different N-grams were needed? I would like to see this recorded for each dataset.
Conclusions:
Make sure you take out the bold text for the final version.
Original link: http://icml.cc/discuss/2012/590.html
Source: http://www.cnblogs.com/huashiyiqike/p/3740195.html