
Jaccard similarity coefficient: computing similarity

Date: 2015-08-08 22:45:01


Jaccard index

From Wikipedia, the free encyclopedia
 
 

The Jaccard index, also known as the Jaccard similarity coefficient (originally named coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$

(If A and B are both empty, we define J(A,B) = 1.)

$$0 \le J(A,B) \le 1$$
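As a minimal sketch of the definition above, the coefficient can be computed directly with Python's built-in set operations. The function name `jaccard` is an illustrative choice, not something taken from the source.

```python
def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two finite collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty: J(A, B) is defined as 1 by convention.
        return 1.0
    return len(a & b) / len(union)


print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 2 shared of 4 total: 0.5
print(jaccard(set(), set()))                      # 1.0 by convention
```

Identical sets give 1 and disjoint non-empty sets give 0, so the value always stays within the interval from 0 to 1.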

The MinHash scheme (min-wise independent permutations locality-sensitive hashing) may be used to efficiently compute an accurate estimate of the Jaccard similarity coefficient of pairs of sets, where each set is represented by a constant-sized signature derived from the minimum values of a hash function.
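The following is a rough sketch of the MinHash idea, not a production implementation. It assumes that seeded BLAKE2b digests are an acceptable stand-in for random permutations. The names `minhash_signature` and `estimate_jaccard` and the `num_hashes` parameter are illustrative. For each seed the signature keeps the minimum hash value over the set, and the fraction of positions where two signatures agree estimates the Jaccard coefficient.

```python
import hashlib


def _seeded_hash(seed, item):
    """64-bit hash of an item under a given seed (stand-in for a random permutation)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """Fixed-length signature: the minimum hash value of the set under each seed."""
    items = set(items)
    if not items:
        raise ValueError("MinHash signature of an empty set is undefined")
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


a = set(range(0, 100))
b = set(range(50, 150))
# The exact coefficient is 50 / 150; the estimate should land close to it.
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The signature length trades accuracy for space: with k hash functions the standard error of the estimate is roughly sqrt(J(1 - J) / k), independent of how large the sets are.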

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

$$d_J(A,B) = 1 - J(A,B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$

An alternative interpretation of the Jaccard distance is as the ratio of the size of the symmetric difference $A \triangle B = (A \cup B) - (A \cap B)$ to the size of the union.
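To make the equivalence concrete, the sketch below computes the Jaccard distance in three ways: as one minus the coefficient, as the difference of union and intersection sizes over the union size, and as the symmetric difference over the union. The function name is illustrative, and exact fractions are used so the three values can be compared without floating-point noise. Returning 0 for two empty sets is an assumption that follows from the convention J(A,B) = 1 for that case.

```python
from fractions import Fraction


def jaccard_distance_forms(a, b):
    """Return the Jaccard distance computed by three equivalent formulas."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: J = 1 for two empty sets, so the distance is 0.
        zero = Fraction(0)
        return zero, zero, zero
    inter = a & b
    complement_form = 1 - Fraction(len(inter), len(union))
    difference_form = Fraction(len(union) - len(inter), len(union))
    symmetric_form = Fraction(len(a ^ b), len(union))  # a ^ b is the symmetric difference
    return complement_form, difference_form, symmetric_form


print(jaccard_distance_forms({1, 2, 3}, {2, 3, 4}))
# (Fraction(1, 2), Fraction(1, 2), Fraction(1, 2))
```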

This distance is a metric on the collection of all finite sets.[1][2]
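A brute-force check over every subset of a small universe can illustrate the metric property, in particular the triangle inequality. It is an illustration on a finite example, not a proof. The helper names are hypothetical, and the distance between two empty sets is taken as 0, following the convention J(A,B) = 1 for that case.

```python
from fractions import Fraction
from itertools import combinations, product


def jaccard_distance(a, b):
    """Exact Jaccard distance; two empty sets are at distance 0 by convention."""
    union = a | b
    if not union:
        return Fraction(0)
    return Fraction(len(a ^ b), len(union))


def all_subsets(universe):
    """Every subset of the universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def triangle_violations(universe):
    """Triples (A, B, C) with d(A, C) > d(A, B) + d(B, C); expected to be empty."""
    subsets = all_subsets(universe)
    return [(a, b, c) for a, b, c in product(subsets, repeat=3)
            if jaccard_distance(a, c) > jaccard_distance(a, b) + jaccard_distance(b, c)]


print(len(triangle_violations(range(4))))  # 0 violations among 16**3 triples
```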

There is also a version of the Jaccard distance for measures, including probability measures. If $\mu$ is a measure on a measurable space $X$, then we define the Jaccard coefficient by $J_\mu(A,B) = \frac{\mu(A \cap B)}{\mu(A \cup B)}$, and the Jaccard distance by $d_\mu(A,B) = 1 - J_\mu(A,B) = \frac{\mu(A \triangle B)}{\mu(A \cup B)}$. Care must be taken if $\mu(A \cup B) = 0$ or $\mu(A \cup B) = \infty$, since these formulas are not well defined in that case.
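One simple instance of the measure version is a finite discrete measure, where each element carries a non-negative weight and the measure of a set is the sum of its weights. The sketch below assumes that setting. The `weights` dictionary and the function names are illustrative, and the code raises an error in the ill-defined cases instead of choosing a value.

```python
import math


def measure(weights, subset):
    """Discrete measure: sum of the weights of the elements in the subset."""
    return sum(weights[x] for x in subset)


def _checked_union_measure(weights, a, b):
    mu_union = measure(weights, a | b)
    if mu_union == 0 or math.isinf(mu_union):
        # The ratio is not well defined when the union has measure 0 or infinity.
        raise ValueError("measure of the union must be finite and non-zero")
    return mu_union


def jaccard_measure(weights, a, b):
    """J_mu(A, B) = mu(A ∩ B) / mu(A ∪ B)."""
    a, b = set(a), set(b)
    return measure(weights, a & b) / _checked_union_measure(weights, a, b)


def jaccard_measure_distance(weights, a, b):
    """d_mu(A, B) = mu(A △ B) / mu(A ∪ B)."""
    a, b = set(a), set(b)
    return measure(weights, a ^ b) / _checked_union_measure(weights, a, b)


weights = {"x": 1.0, "y": 2.0, "z": 3.0}
print(jaccard_measure(weights, {"x", "y"}, {"y", "z"}))           # 2 / 6
print(jaccard_measure_distance(weights, {"x", "y"}, {"y", "z"}))  # 4 / 6
```

With every weight equal to 1 this reduces to the ordinary set-based coefficient and distance.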


Original source: http://www.cnblogs.com/baiting/p/4713940.html
