
Pattern Discovery Basic Concepts

Posted: 2015-02-16 15:38:52

Tags: pattern

@(Pattern Discovery in Data Mining)[Pattern Discovery]
This article introduces the basic concepts of pattern mining.

Pattern: A set of items, subsequences, or substructures that occur
frequently together (or are strongly correlated) in a data set.

Motivation for pattern discovery in data:
* To find what a customer is likely to buy after purchasing one or more goods;
* To find which code segments are likely to contain copy-and-paste bugs;
* To find what kinds of events are likely to follow the posting of some news.

Typical questions include:
* What products were often purchased together?
* What are the subsequent purchases after buying an iPad?
* What word sequences likely form phrases in this corpus?
* …

In summary, pattern discovery is important for several reasons:
* It finds inherent regularities in a data set.
* It is the foundation for many essential data mining tasks:
    * Association, correlation, and causality analysis
    * Mining sequential and structural (e.g., sub-graph) patterns
    * Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
    * Classification: discriminative pattern-based analysis
    * Cluster analysis: pattern-based subspace clustering
* It has broad applications:
    * Market basket analysis, cross-marketing, catalog design, sale campaign analysis, Web log analysis, biological sequence analysis

TODO: elaborate on the applications listed above

Frequent Pattern and Association Rule

Itemset: A set of one or more items.
k-itemset: An itemset with k items, $X = \{x_1, \dots, x_k\}$.
(absolute) support (count) of X: The number of transactions in which the itemset X occurs.
(relative) support, s: The fraction of transactions that contain X (i.e., the probability that a transaction contains X).
frequent pattern: An itemset X is frequent if the support of X is no less than a minsup threshold (denoted as σ).
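
The definitions above can be made concrete with a few lines of Python. This is only a minimal sketch: the transaction database `TRANSACTIONS` and the function names are illustrative assumptions, not part of the original notes.

```python
# Hypothetical toy transaction database (illustrative data only).
TRANSACTIONS = [
    frozenset({"Beer", "Nuts", "Diaper"}),
    frozenset({"Beer", "Coffee", "Diaper"}),
    frozenset({"Beer", "Diaper", "Eggs"}),
    frozenset({"Nuts", "Eggs", "Milk"}),
    frozenset({"Nuts", "Coffee", "Diaper", "Eggs", "Milk"}),
]


def absolute_support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def relative_support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return absolute_support(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """True if the relative support is no less than the minsup threshold."""
    return relative_support(itemset, transactions) >= minsup


print(absolute_support({"Beer", "Diaper"}, TRANSACTIONS))  # 3
print(relative_support({"Beer", "Diaper"}, TRANSACTIONS))  # 0.6
print(is_frequent({"Nuts", "Eggs"}, TRANSACTIONS, 0.5))    # False
```

With σ = 50% on this toy data, {Beer, Diaper} is frequent (3 of 5 transactions) while {Nuts, Eggs} is not (2 of 5).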

association rule: $X \rightarrow Y\ (s, c)$
* support s: The probability that a transaction contains $X \cup Y$, i.e., both X and Y.
* confidence c: The conditional probability that a transaction containing X also contains Y.
* $c(X \rightarrow Y) = \mathrm{sup}(X \cup Y) / \mathrm{sup}(X)$
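
As a rough illustration of the confidence formula, the sketch below computes it from transaction counts (the ratio of counts equals the ratio of relative supports). The toy data and the names `count` and `confidence` are assumptions made for this example.

```python
# Hypothetical toy transaction database (illustrative data only).
TRANSACTIONS = [
    frozenset({"Beer", "Nuts", "Diaper"}),
    frozenset({"Beer", "Coffee", "Diaper"}),
    frozenset({"Beer", "Diaper", "Eggs"}),
    frozenset({"Nuts", "Eggs", "Milk"}),
    frozenset({"Nuts", "Coffee", "Diaper", "Eggs", "Milk"}),
]


def count(itemset, transactions):
    """Absolute support: transactions containing every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(x, y, transactions):
    """c(X -> Y) = sup(X u Y) / sup(X), computed from absolute counts."""
    x, y = frozenset(x), frozenset(y)
    x_count = count(x, transactions)
    if x_count == 0:
        raise ValueError("X never occurs, so the confidence is undefined")
    return count(x | y, transactions) / x_count


print(confidence({"Beer"}, {"Diaper"}, TRANSACTIONS))  # 1.0
print(confidence({"Diaper"}, {"Beer"}, TRANSACTIONS))  # 0.75
```

Note that confidence is not symmetric: every Beer transaction here also contains Diaper, but only three of the four Diaper transactions contain Beer.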

Association rule mining: Find all of the rules $X \rightarrow Y$ that meet both the minimum support and the minimum confidence thresholds.
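
To show what this task asks for, here is a brute-force sketch that enumerates every candidate itemset and every way of splitting it into X and Y. It is exponential in the number of items and is meant only to illustrate the definition, not to stand in for a practical algorithm such as Apriori. The data and the name `mine_rules` are assumptions.

```python
from itertools import combinations

# Hypothetical toy transaction database (illustrative data only).
TRANSACTIONS = [
    frozenset({"Beer", "Nuts", "Diaper"}),
    frozenset({"Beer", "Coffee", "Diaper"}),
    frozenset({"Beer", "Diaper", "Eggs"}),
    frozenset({"Nuts", "Eggs", "Milk"}),
    frozenset({"Nuts", "Coffee", "Diaper", "Eggs", "Milk"}),
]


def mine_rules(transactions, minsup, minconf):
    """Return (X, Y, support, confidence) for every rule X -> Y that
    meets both thresholds. Brute force over all itemsets."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            z = frozenset(combo)
            z_count = count(z)
            # Skip itemsets that never occur or are not frequent.
            if z_count == 0 or z_count / n < minsup:
                continue
            # Split the frequent itemset into non-empty X and Y.
            for r in range(1, k):
                for lhs in combinations(combo, r):
                    x = frozenset(lhs)
                    conf = z_count / count(x)
                    if conf >= minconf:
                        rules.append((x, z - x, z_count / n, conf))
    return rules


for x, y, s, c in mine_rules(TRANSACTIONS, minsup=0.5, minconf=0.5):
    print(sorted(x), "->", sorted(y), s, c)
```

On this toy data with minsup = 50% and minconf = 50%, the only rules found are Beer → Diaper (60%, 100%) and Diaper → Beer (60%, 75%).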

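Drawback of frequent patterns: there are too many of them. Every sub-pattern of a frequent pattern is itself frequent, so a single long frequent pattern implies exponentially many frequent sub-patterns.
So we need a compressed representation.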

Closed Pattern & Max Pattern

Closed patterns: A pattern (itemset) X is closed if X is frequent and there exists no super-pattern $Y \supset X$ with the same support as X.
* Closed patterns are a lossless compression of frequent patterns.
* They reduce the number of patterns without losing the support information!

Note: Here lossless means that, given the set of closed frequent patterns, we can not only find the set of max frequent patterns but also recover the set of all frequent patterns and their supports.
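
The recovery step can be sketched as follows: the support of any frequent itemset equals the largest support among the closed patterns that contain it, and an itemset contained in no closed pattern is not frequent. The closed-pattern table `CLOSED` below is a hypothetical example (it corresponds to the two transactions {a, b} and {a, b, c, d} with a minimum support count of 1), and `recover_support` is an illustrative name.

```python
# Hypothetical closed frequent patterns with their absolute supports.
CLOSED = {
    frozenset("ab"): 2,
    frozenset("abcd"): 1,
}


def recover_support(itemset, closed):
    """Support of `itemset` recovered from closed patterns only.

    Returns the maximum support over all closed supersets, or None
    when no closed pattern contains the itemset (it is not frequent).
    """
    items = frozenset(itemset)
    supports = [s for pattern, s in closed.items() if items <= pattern]
    return max(supports) if supports else None


print(recover_support("a", CLOSED))   # 2
print(recover_support("ac", CLOSED))  # 1
print(recover_support("e", CLOSED))   # None
```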

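Max-patterns: A pattern X is a max-pattern if X is frequent and there exists no frequent super-pattern $Y \supset X$.
* Max-patterns are a lossy compression! They tell us which itemsets are frequent, but not the actual support of each sub-pattern.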
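
Both definitions can be applied mechanically once the frequent itemsets and their supports are known. The sketch below does this by brute force on a hypothetical two-transaction database. The function names and data are assumptions for illustration, and the enumeration is only practical for very small item sets.

```python
from itertools import combinations

# Hypothetical toy database: two transactions over the items a, b, c, d.
TRANSACTIONS = [frozenset("ab"), frozenset("abcd")]


def frequent_itemsets(transactions, min_count):
    """Map each frequent itemset to its absolute support (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            c = sum(1 for t in transactions if itemset <= t)
            if c >= min_count:
                result[itemset] = c
    return result


def closed_patterns(freq):
    """Frequent itemsets with no proper superset of the same support."""
    return {x: s for x, s in freq.items()
            if not any(x < y and s == sy for y, sy in freq.items())}


def max_patterns(freq):
    """Frequent itemsets with no frequent proper superset."""
    return {x: s for x, s in freq.items()
            if not any(x < y for y in freq)}


freq = frequent_itemsets(TRANSACTIONS, min_count=1)
print(len(freq))                                       # 15
print({"".join(sorted(x)): s
       for x, s in closed_patterns(freq).items()})     # {'ab': 2, 'abcd': 1}
print({"".join(sorted(x)): s
       for x, s in max_patterns(freq).items()})        # {'abcd': 1}
```

Here 15 frequent patterns compress to two closed patterns, which still carry every support value, and to a single max-pattern, which no longer records that {a, b} has support 2.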

| Frequent pattern | Support | Closed pattern | Max-pattern |
| --- | --- | --- | --- |
| Beer, Nuts, Diaper | 10 | Y | N |
| Beer, Coffee, Diaper, Nuts | 20 | Y | Y |
| Beer, Diaper, Eggs | 30 | N | N |
| Beer, Nuts, Eggs, Milk | 40 | Y | N |
| Beer, Nuts, Diaper, Eggs, Milk | 30 | Y | Y |


Original post: http://blog.csdn.net/rk2900/article/details/43851743
