
Mining Diverse Patterns

Posted: 2015-02-18 21:03:06

Tags: data, association, pattern


@(Pattern Discovery in Data Mining)

Mining Multi-level Association Rules

[Figure not preserved]

The intuition for setting a hierarchical min_sup: level-reduced min-support (items at lower levels are expected to have lower support, so they are given lower thresholds)

Efficient mining: Shared multi-level mining (Use the lowest min-support to pass down the set of candidates)
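A minimal sketch of level-reduced min-support with shared multi-level mining, assuming a hypothetical two-level hierarchy (`HIERARCHY`) and hypothetical thresholds. An ancestor is passed down to the lower level whenever it meets the lowest threshold, so a child is not pruned just because its parent missed the stricter level-1 threshold. Only single items are counted; a real miner would also grow itemsets level by level.

```python
from collections import Counter

# Hypothetical two-level concept hierarchy: leaf item -> level-1 ancestor.
HIERARCHY = {
    "2% milk": "milk",
    "skim milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
}


def mine_two_levels(transactions, min_sup, shared=True):
    """Return frequent single items per level under level-reduced min-supports.

    min_sup maps level -> relative threshold, e.g. {1: 0.5, 2: 0.25}.
    With shared=True, an ancestor is passed down to level 2 if it meets the
    LOWEST threshold; with shared=False, only level-1 frequent ancestors are.
    """
    n = len(transactions)

    # Level 1: count each ancestor at most once per transaction.
    parent_counts = Counter()
    for t in transactions:
        parent_counts.update({HIERARCHY[item] for item in t})
    frequent = {1: {p for p, c in parent_counts.items() if c >= min_sup[1] * n}}

    # Decide which ancestors have their children examined at level 2.
    pass_threshold = min(min_sup.values()) if shared else min_sup[1]
    passed_down = {p for p, c in parent_counts.items() if c >= pass_threshold * n}

    # Level 2: count only children of passed-down ancestors.
    child_counts = Counter()
    for t in transactions:
        child_counts.update(item for item in set(t) if HIERARCHY[item] in passed_down)
    frequent[2] = {i for i, c in child_counts.items() if c >= min_sup[2] * n}
    return frequent


# Hypothetical data: 8 transactions.
transactions = [
    ["2% milk", "wheat bread"], ["2% milk", "wheat bread"], ["2% milk"],
    ["white bread"], ["wheat bread"], ["wheat bread"], ["white bread"],
    ["wheat bread"],
]
thresholds = {1: 0.5, 2: 0.25}
print(mine_two_levels(transactions, thresholds))
print(mine_two_levels(transactions, thresholds, shared=False))
```

On this toy data, "milk" (support 3/8) fails the level-1 threshold of 0.5, yet "2% milk" (3/8) is still reported at level 2 with `shared=True`; with `shared=False` it is pruned along with its parent.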

Redundancy Filtering in Mining Multi-Level Associations:
* Multi-level association mining may generate many redundant rules
* Redundancy filtering: some rules may be redundant due to “ancestor” relationships between items
* Example (suppose 2% milk accounts for about 1/4 of the milk sold, in gallons):
1. milk ⇒ wheat bread [support = 8%, confidence = 70%]
2. 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]

  • A rule is redundant if its support is close to the “expected” value derived from its “ancestor” rule and its confidence is similar to the ancestor's
  • Rule (1) is an ancestor of rule (2). The expected support of rule (2) is 8% × 1/4 = 2%, which matches its actual support, and 72% is close to 70%, so rule (2) is redundant and can be pruned.
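The redundancy test can be sketched as below. It assumes the child's expected support is the ancestor's support times the child's share (1/4 in the example) and uses hypothetical tolerances `sup_tol` and `conf_tol`, since the notes do not say how close "close" must be.

```python
def is_redundant(rule_sup, rule_conf, ancestor_sup, ancestor_conf, share,
                 sup_tol=0.1, conf_tol=0.05):
    """Check whether a descendant rule adds little beyond its ancestor rule.

    share:    fraction of the ancestor item accounted for by the descendant item
              (e.g. 2% milk is about 1/4 of the milk sold).
    sup_tol:  hypothetical relative tolerance on support.
    conf_tol: hypothetical absolute tolerance on confidence.
    """
    expected_sup = ancestor_sup * share
    support_close = abs(rule_sup - expected_sup) <= sup_tol * expected_sup
    confidence_close = abs(rule_conf - ancestor_conf) <= conf_tol
    return support_close and confidence_close


# Rule (1): milk => wheat bread [8%, 70%]; rule (2): 2% milk => wheat bread [2%, 72%].
print(is_redundant(0.02, 0.72, 0.08, 0.70, share=0.25))  # True
```

If rule (2) had a confidence of, say, 90%, or a support far from 2%, the check would fail and the rule would be kept as carrying new information.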

Customized Min-Supports for Different Kinds of Items
* So far, the same min-support threshold has been used for all items or itemsets in each association mining task
* In reality, some items (e.g., diamond, watch, …) are valuable but less frequent
* It is therefore necessary to have customized min-support settings for different kinds of items
* One method: use group-based “individualized” min-supports
  * E.g., {diamond, watch}: 0.05%; {bread, milk}: 5%; …
* How can such rules be mined efficiently?
  * Existing scalable mining algorithms can be easily extended to cover such cases
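A brute-force sketch of group-based min-supports on hypothetical data. It assumes one common convention: an itemset must reach the lowest threshold among its items. Counting every 1- and 2-itemset is only for illustration; a scalable miner would prune candidates instead. Under this convention the usual Apriori property no longer holds for every subset, which is one reason existing algorithms must be extended rather than reused unchanged.

```python
from collections import Counter
from itertools import combinations

# Hypothetical group-based min-supports (relative), echoing the example above.
GROUP_MIN_SUP = {"diamond": 0.0005, "watch": 0.0005, "bread": 0.05, "milk": 0.05}


def itemset_min_sup(itemset):
    # Assumed convention: an itemset needs the lowest threshold among its items.
    return min(GROUP_MIN_SUP[item] for item in itemset)


def frequent_itemsets(transactions, min_sup_of, max_size=2):
    """Brute-force count of all itemsets up to max_size; keep those meeting min_sup_of(itemset)."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {s: c / n for s, c in counts.items() if c >= min_sup_of(s) * n}


# Hypothetical data: 1,000 transactions.
transactions = (
    [["bread", "milk"]] * 60
    + [["bread"]] * 30
    + [["milk"]] * 909
    + [["diamond", "watch"]]
)

individualized = frequent_itemsets(transactions, itemset_min_sup)
uniform = frequent_itemsets(transactions, lambda itemset: 0.05)
print(("diamond", "watch") in individualized, ("diamond", "watch") in uniform)  # True False
```

With individualized thresholds, {diamond, watch} (support 0.1%) is kept; under a uniform 5% threshold it disappears, while {bread, milk} (6%) survives either way.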

Mining Multi-dimensional Associations

  • Single-dimensional rules (e.g., items are all in the “product” dimension)

    • buys(X, “milk”) ⇒ buys(X, “bread”)
  • Multi-dimensional rules (i.e., items in ≥ 2 dimensions or predicates)

    • Inter-dimension association rules (no repeated predicates)
      • age(X, “18-25”) ∧ occupation(X, “student”) ⇒ buys(X, “coke”)
    • Hybrid-dimension association rules (repeated predicates)
      • age(X, “18-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)
  • Attributes can be categorical or numerical
    • Categorical attributes (e.g., profession, product: no ordering among values): data cube for inter-dimension association
    • Quantitative attributes (numeric, with an implicit ordering among values): discretization, clustering, and gradient approaches
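The sketch below, on hypothetical customer records, turns each record into predicate items such as `("age", "18-25")` and `("buys", "coke")`, discretizing age into assumed static bins, then computes support and confidence for the inter-dimension and hybrid-dimension example rules above.

```python
# Hypothetical customer records: one row per customer.
records = [
    {"age": 19, "occupation": "student", "buys": ["coke", "popcorn"]},
    {"age": 23, "occupation": "student", "buys": ["coke"]},
    {"age": 24, "occupation": "engineer", "buys": ["popcorn"]},
    {"age": 41, "occupation": "teacher", "buys": ["coke"]},
]


def age_bin(age):
    # Static discretization with assumed bin boundaries.
    if 18 <= age <= 25:
        return "18-25"
    if 26 <= age <= 40:
        return "26-40"
    return "other"


def to_predicates(record):
    """Map a record to a set of (predicate, value) items."""
    preds = {("age", age_bin(record["age"])), ("occupation", record["occupation"])}
    preds |= {("buys", item) for item in record["buys"]}  # repeated predicate
    return preds


def rule_stats(records, antecedent, consequent):
    """Support and confidence of antecedent => consequent over the records."""
    rows = [to_predicates(r) for r in records]
    n_ante = sum(1 for p in rows if antecedent <= p)
    n_both = sum(1 for p in rows if (antecedent | consequent) <= p)
    support = n_both / len(rows)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


inter = rule_stats(records, {("age", "18-25"), ("occupation", "student")}, {("buys", "coke")})
hybrid = rule_stats(records, {("age", "18-25"), ("buys", "popcorn")}, {("buys", "coke")})
print(inter, hybrid)  # (0.5, 1.0) (0.25, 0.5)
```

The hybrid rule reuses the `buys` predicate on both sides, which is exactly what distinguishes it from the inter-dimension rule.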

Mining Quantitative Associations

[Figures not preserved]

Mining Negative Correlations

  • Rare Pattern vs. Negative Pattern
    [Figure not preserved]

  • Defining Negatively Correlated Patterns

  • Support-based definition
    [Figure not preserved]

  • Kulczynski measure-based definition
    [Figure not preserved]

  • Exercise
    [Figure not preserved]
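The slides with the two definitions were not preserved. As they are usually stated in this course (treat this as an assumption): (1) support-based: A and B are negatively correlated if both are frequent but s(A ∪ B) ≪ s(A) × s(B); (2) Kulczynski-based: A and B are negatively correlated if both are frequent but (P(A|B) + P(B|A))/2 < ε for a negative-pattern threshold ε. The sketch makes "≪" concrete with a hypothetical `ratio`, uses a hypothetical `epsilon`, and runs both tests on a hypothetical store where A and B are each sold 100 times but together only once, padded with transactions containing neither.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def kulczynski(transactions, a, b):
    """Kulc(A, B) = (P(A|B) + P(B|A)) / 2."""
    s_ab = support(transactions, a | b)
    return 0.5 * (s_ab / support(transactions, b) + s_ab / support(transactions, a))


def both_frequent(transactions, a, b, min_sup):
    return support(transactions, a) >= min_sup and support(transactions, b) >= min_sup


def negative_by_support(transactions, a, b, min_sup, ratio=0.1):
    # Hypothetical reading of "<<": s(A u B) below ratio * s(A) * s(B).
    if not both_frequent(transactions, a, b, min_sup):
        return False
    s_a, s_b = support(transactions, a), support(transactions, b)
    return support(transactions, a | b) < ratio * s_a * s_b


def negative_by_kulczynski(transactions, a, b, min_sup, epsilon=0.02):
    # epsilon is a hypothetical negative-pattern threshold.
    return both_frequent(transactions, a, b, min_sup) and kulczynski(transactions, a, b) < epsilon


def two_item_store(total):
    """A and B sold 100 times each, together only once; the rest are null transactions."""
    a_only, b_only, both = frozenset({"A"}), frozenset({"B"}), frozenset({"A", "B"})
    return [a_only] * 99 + [b_only] * 99 + [both] + [frozenset()] * (total - 199)


A, B = frozenset({"A"}), frozenset({"B"})
for total in (200, 100_000):
    db = two_item_store(total)
    print(total, negative_by_support(db, A, B, min_sup=0.0005),
          negative_by_kulczynski(db, A, B, min_sup=0.0005),
          round(kulczynski(db, A, B), 4))
```

The support-based verdict flips from negative (200 transactions) to not negative (100,000 transactions) only because null transactions were added, while the Kulczynski value stays at 0.01 in both cases. This null-invariance is the reason to prefer the Kulczynski-based definition.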

Mining Compressed Patterns

Given a table of patterns and their supports:
[Table not preserved]

Why mine compressed patterns? Because frequent-pattern mining yields too many scattered patterns, many of which are not very meaningful on their own.

P1 and P2 are similar in both itemsets and support, and P1 and P5 likewise have similar itemsets. But how can such similar patterns be compressed?

Compare how the usual concise representations handle this table:
* Closed patterns
  * P1, P2, P3, P4, P5 (no two have identical supports, so all are output)
  * Emphasizes support too much
  * There is no compression
* Max-patterns
  * P3: information loss
* Desired output (a good balance):
  * P2, P3, P4
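Closed and max patterns can be checked directly on a pattern table. The table below is hypothetical (the original was an image), but it has the same shape as the discussion: all supports differ, so every pattern is closed, while only the single largest pattern is maximal. Both checks only look at patterns inside the given table.

```python
# Hypothetical pattern table: name -> (itemset, support count).
PATTERNS = {
    "P1": (frozenset("abcd"), 1000),
    "P2": (frozenset("abcde"), 990),
    "P3": (frozenset("abcdef"), 500),
    "P4": (frozenset("bcdef"), 790),
    "P5": (frozenset("bcdf"), 800),
}


def closed_patterns(patterns):
    """Names of patterns with no proper super-pattern of equal support (in the table)."""
    return sorted(
        name for name, (items, sup) in patterns.items()
        if not any(items < other and sup == other_sup
                   for other, other_sup in patterns.values())
    )


def max_patterns(patterns):
    """Names of patterns with no proper super-pattern at all (in the table)."""
    return sorted(
        name for name, (items, _) in patterns.items()
        if not any(items < other for other, _ in patterns.values())
    )


print(closed_patterns(PATTERNS))  # ['P1', 'P2', 'P3', 'P4', 'P5']
print(max_patterns(PATTERNS))     # ['P3']
```

Closed mining returns everything (no compression), while max mining keeps only P3 and drops the supports of its sub-patterns (information loss).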

This motivates two compression methods:

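  1. Pattern distance measure

    $$\mathrm{Dist}(P_1, P_2) = 1 - \frac{|T(P_1) \cap T(P_2)|}{|T(P_1) \cup T(P_2)|}$$

    where T(P) is the set of transactions containing pattern P.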

    • δ-clustering: For each pattern P, find all patterns which can be expressed by P and whose distance to P is within δ (δ-cover)
    • All patterns in the cluster can be represented by P
    • Method for efficient, direct mining of compressed frequent patterns (e.g., Xin et al., VLDB’05)
  2. Redundancy-Aware Top-k Patterns
    [Figure not preserved]
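A sketch of δ-clustering as a simple greedy set cover over a hypothetical pattern table; the cited method may select representatives differently. It uses the fact that when Q ⊆ P, every transaction containing P also contains Q, so T(P) ⊆ T(Q) and Dist(P, Q) = 1 − sup(P)/sup(Q) can be computed from supports alone. Here P δ-covers Q when Q's itemset is contained in P's and Dist(P, Q) ≤ δ.

```python
# Hypothetical pattern table: name -> (itemset, support count).
PATTERNS = {
    "P1": (frozenset("abcd"), 1000),
    "P2": (frozenset("abcde"), 990),
    "P3": (frozenset("abcdef"), 500),
    "P4": (frozenset("bcdef"), 790),
    "P5": (frozenset("bcdf"), 800),
}


def pattern_distance(tids_1, tids_2):
    """Dist(P1, P2) = 1 - |T(P1) & T(P2)| / |T(P1) | T(P2)| on transaction-id sets."""
    union = tids_1 | tids_2
    return 1 - len(tids_1 & tids_2) / len(union) if union else 0.0


def nested_distance(sup_sub, sup_super):
    """Distance when the super-pattern contains the sub-pattern: 1 - sup(super) / sup(sub)."""
    return 1 - sup_super / sup_sub


def delta_cover_representatives(patterns, delta):
    # covers[p] = patterns that p can express within distance delta (always includes p).
    covers = {
        p: {q for q, (q_items, q_sup) in patterns.items()
            if q_items <= p_items and nested_distance(q_sup, p_sup) <= delta}
        for p, (p_items, p_sup) in patterns.items()
    }
    uncovered, chosen = set(patterns), []
    while uncovered:
        # Greedy set cover: pick the pattern covering the most uncovered patterns.
        best = max(covers, key=lambda p: len(covers[p] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return sorted(chosen)


print(round(pattern_distance(set(range(1000)), set(range(990))), 4))  # 0.01
print(delta_cover_representatives(PATTERNS, delta=0.05))  # ['P2', 'P3', 'P4']
```

With δ = 0.05, P2 covers P1 (distance 0.01) and P4 covers P5 (distance 0.0125), while P3 is too far from every other pattern, so the representatives are P2, P3, P4.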

Mining Colossal Patterns

[Figures not preserved]

Original article: http://blog.csdn.net/rk2900/article/details/43878111
