Tags: lda, topic model, python, topic
http://blog.csdn.net/pipisorry/article/details/45307369
Although LDA is a great algorithm for topic modelling, it still has some limitations, mainly due to the fact that it has only recently become popular and widely available.
One major limitation is perhaps given by its underlying unigram text model: LDA doesn't consider the mutual position of the words in the document. Documents like "Man, I love this can" and "I can love this man" are probably modelled the same way. It's also true that for longer documents, mismatched topics are less likely. To overcome this limitation, at the cost of almost squaring the complexity, you can use 2-grams (or N-grams) along with 1-grams.
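A minimal sketch of that idea, assuming gensim is installed: frequent 2-grams are detected with gensim's Phrases model and appended to the unigram bag-of-words before training LDA. The toy corpus and thresholds are illustrative only, not part of the original post.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.phrases import Phrases

# Toy corpus: already tokenised documents (real pipelines would also
# remove stop words, lemmatise, etc.).
docs = [
    ["man", "i", "love", "this", "can"],
    ["i", "can", "love", "this", "man"],
    ["topic", "models", "love", "text", "data"],
]

# Detect frequent 2-grams; min_count/threshold here are illustrative.
bigram = Phrases(docs, min_count=1, threshold=0.1)

# Keep the original unigrams and append the detected bigrams ("word_word").
docs_with_bigrams = [
    doc + [tok for tok in bigram[doc] if "_" in tok] for doc in docs
]

dictionary = Dictionary(docs_with_bigrams)
corpus = [dictionary.doc2bow(doc) for doc in docs_with_bigrams]

lda = LdaModel(corpus=corpus, id2word=dictionary,
               num_topics=2, passes=10, random_state=1)

for topic_id, words in lda.show_topics(num_topics=2, num_words=5, formatted=False):
    print(topic_id, [w for w, _ in words])
```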
Another weakness of LDA is in the composition of the topics: they overlap. In fact, you can find the same word in multiple topics (the example above, of the word "can", is obvious). The generated topics, therefore, are not independent and orthogonal like in a PCA-decomposed basis, for example. This implies that you must pay lots of attention while dealing with them (e.g. don't use cosine similarity).
For a more structured approach - especially if the topic composition is very misleading - you might consider the hierarchical variation of LDA, named H-LDA (or simply Hierarchical LDA). In H-LDA, topics are joined together in a hierarchy by using a Nested Chinese Restaurant Process (NCRP). This model is more complex than LDA, and the description is beyond the goal of this blog entry, but the sketch below gives an idea of the possible output. Don't forget that we're still in the probabilistic world: each node of the H-LDA tree is a topic distribution.
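A rough sketch only, assuming the third-party tomotopy package (not used in the original post), which ships an hLDA implementation based on the nested Chinese Restaurant Process; the corpus, depth and iteration count are illustrative.

```python
import tomotopy as tp

docs = [
    "man i love this can".split(),
    "i can love this man".split(),
    "topic models love text data".split(),
]

# A 3-level topic tree: root -> intermediate -> leaf topics.
mdl = tp.HLDAModel(depth=3)
for doc in docs:
    mdl.add_doc(doc)

mdl.train(100)  # run 100 sampling iterations

# Each live node of the tree is itself a topic, i.e. a word distribution.
for k in range(mdl.k):
    if mdl.is_live_topic(k):
        top_words = [w for w, _ in mdl.get_topic_words(k, top_n=5)]
        print("level", mdl.level(k), top_words)
```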
[http://engineering.intenthq.com/2015/02/automatic-topic-modelling-with-lda/]