Introduction to probabilistic language modeling
Let's think about the probability of a sentence. By the chain rule:
$P(w_1, w_2, w_3, w_4)$
$= P(w_1, w_2, w_3) * P(w_4 | w_1, w_2, w_3)$
$= P(w_1, w_2) * P(w_3 | w_1, w_2) * P(w_4 | w_1, w_2, w_3)$
$= P(w_1) * P(w_2 | w_1) * P(w_3 | w_1, w_2) * P(w_4 | w_1, w_2, w_3)$
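To make the chain rule concrete, here is a minimal Python sketch that simply multiplies a list of conditional probabilities; the numbers are made-up placeholders, since the long-history conditionals such as $P(w_4 | w_1, w_2, w_3)$ are exactly what we cannot estimate reliably from data, which motivates the Markov assumption below.

```python
# Hypothetical conditional probabilities for a 4-word sentence (illustrative only).
cond_probs = [
    0.1,   # P(w1)
    0.4,   # P(w2 | w1)
    0.5,   # P(w3 | w1, w2)
    0.3,   # P(w4 | w1, w2, w3)
]

# Chain rule: P(w1, w2, w3, w4) is the product of the conditionals above.
p_sentence = 1.0
for p in cond_probs:
    p_sentence *= p

print(p_sentence)  # 0.006
```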
Markov Assumption
Under the Markov assumption, each word depends only on the previous $k$ words:
$P(w_1, w_2, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i | w_{i-k}, \ldots, w_{i-1})$
For example, with $k = 1$:
$P(w_1, w_2, w_3, w_4)$
$= P(w_1, w_2, w_3) * P(w_4 | w_3)$
$= P(w_1, w_2) * P(w_3 | w_2) * P(w_4 | w_3)$
$= P(w_1) * P(w_2 | w_1) * P(w_3 | w_2) * P(w_4 | w_3)$
With this assumption, each conditional probability involves only a short history, so it is much easier to estimate from counts.
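Below is a minimal bigram sketch under this assumption ($k = 1$), using a tiny hand-made toy corpus and maximum-likelihood counts; the corpus, the `<s>` / `</s>` sentence-boundary markers, and the `bigram_prob` helper are all illustrative assumptions, and no smoothing is applied, so unseen bigrams simply get probability 0.

```python
from collections import Counter

# Tiny made-up corpus of tokenized sentences with boundary markers.
corpus = [
    ["<s>", "i", "like", "nlp", "</s>"],
    ["<s>", "i", "like", "deep", "learning", "</s>"],
    ["<s>", "nlp", "is", "fun", "</s>"],
]

# Count unigrams and bigrams over the whole corpus.
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))

def bigram_prob(sentence):
    """P(sentence) ~= product of P(w_i | w_{i-1}) using maximum-likelihood counts."""
    p = 1.0
    for prev, cur in zip(sentence, sentence[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(bigram_prob(["<s>", "i", "like", "nlp", "</s>"]))  # 2/3 * 1 * 1/2 * 1/2 ~= 0.167
```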
Depending on how many previous words we condition on, we get bigram (2-gram), trigram (3-gram), 4-gram, and in general N-gram models.
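The same counting idea extends to any $N$: just slide a window of length $N$ over each sentence. The sketch below, with an illustrative `ngram_counts` helper, shows the counting step; larger $N$ captures more context but makes the counts sparser.

```python
from collections import Counter

def ngram_counts(corpus, n):
    """Count all n-grams (as tuples of n tokens) in a tokenized corpus."""
    counts = Counter()
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            counts[tuple(sent[i:i + n])] += 1
    return counts

corpus = [["<s>", "i", "like", "nlp", "</s>"]]
print(ngram_counts(corpus, 3))  # trigram counts, e.g. ("<s>", "i", "like"): 1
```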
Original post: http://www.cnblogs.com/ZJUT-jiangnan/p/4409959.html