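input: k, a set of n points
place k centroids at random locations (chosen at random)
repeat until the centroids stop moving:
--for each point i: assign it to the nearest centroid
--for each cluster j: move the centroid to the mean of the points assigned to it
(attributes must be numeric; categorical or ordinal attributes are not supported)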
1. k-means++ (uses an adaptive sampling scheme): slow but small error; random selection: extremely fast but large error
2. AFK-MC2: uses a Markov chain to improve on k-means++
paper: https://las.inf.ethz.ch/files/bachem16fast.pdf
Initial data points are the states of the Markov chain
a further data point is sampled to act as the candidate for the next state
a randomized decision determines whether the Markov chain transitions to the candidate or remains in the old state
repeat; the last state is returned as the initial cluster center
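The Markov-chain seeding described above can be sketched roughly as follows. This is a minimal illustration, not the authors' reference code: the function name `afkmc2_seed` and the parameters `chain_length` and `seed` are made up for this example, and it assumes the data points are not all identical (otherwise the proposal distribution is undefined).

```python
import numpy as np

def afkmc2_seed(X, k, chain_length=200, seed=None):
    """Pick k initial centers from X with a Metropolis-Hastings chain per center."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # First center: uniform at random.
    first = rng.integers(n)
    centers = [X[first]]

    # Proposal distribution q: half D^2-weighting w.r.t. the first center,
    # half uniform. It is computed once and reused for every chain.
    d2_first = np.sum((X - X[first]) ** 2, axis=1)
    q = 0.5 * d2_first / d2_first.sum() + 0.5 / n

    for _ in range(k - 1):
        C = np.asarray(centers)
        # Sample the start state and all candidates of the chain up front.
        cand = rng.choice(n, size=chain_length, p=q)
        # Squared distance of every sampled point to its nearest chosen center.
        diff = X[cand][:, None, :] - C[None, :, :]
        d2 = (diff ** 2).sum(axis=2).min(axis=1)
        u = rng.random(chain_length)

        cur = 0  # position (within cand) of the current state of the chain
        for j in range(1, chain_length):
            # Accept the candidate with probability
            # min(1, d2[j] * q[current] / (d2[cur] * q[candidate])).
            if d2[j] * q[cand[cur]] > u[j] * d2[cur] * q[cand[j]]:
                cur = j
        # The last state of the chain becomes the next center.
        centers.append(X[cand[cur]])

    return np.asarray(centers)
```

A longer chain brings the sampled centers closer to what k-means++ would produce, at the cost of more distance computations per center.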
Parameters: epsilon = 0  # threshold: minimum error used in the stop condition
history_centroids = []
Record the configuration: num_instances, num_features = dataset.shape
Initialize: prototypes = dataset[np.random.randint(0, num_instances, size=k)]
Previous centroids, an np.ndarray of k rows with num_features elements each: prototypes_old = np.zeros(prototypes.shape)
Cluster index of each instance: belongs_to = np.zeros((num_instances, 1))
4. Iterate:
while norm > epsilon:
    iteration += 1
    norm = dist_method(prototypes, prototypes_old)  # change between iterations, used for the stop test
    for index_in, instance in enumerate(dataset):
        dist_vec = np.zeros((k, 1))
        for index_prototype, prototype in enumerate(prototypes):
            dist_vec[index_prototype] = dist_method(prototype, instance)
        belongs_to[index_in, 0] = np.argmin(dist_vec)
    tmp_prototype = np.zeros((k, num_features))
    for ..... (cluster)
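For reference, a complete runnable version of the loop might look like the sketch below. It reuses the variable names from the notes, but several details are assumptions added for this example: Euclidean distance as `dist_method`, a `max_iter` cap, a `seed` parameter, sampling the initial prototypes without replacement, and leaving an empty cluster's centroid where it was.

```python
import numpy as np

def euclidean(a, b):
    # Euclidean (Frobenius for 2-D arrays) distance between two arrays.
    return np.linalg.norm(a - b)

def kmeans(dataset, k, epsilon=0.0, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    num_instances, num_features = dataset.shape

    # Start from k distinct data points chosen at random.
    start = rng.choice(num_instances, size=k, replace=False)
    prototypes = dataset[start].astype(float)
    history_centroids = [prototypes.copy()]
    belongs_to = np.zeros(num_instances, dtype=int)

    norm = np.inf
    iteration = 0
    while norm > epsilon and iteration < max_iter:
        iteration += 1

        # Assignment step: index of the nearest prototype for every instance.
        dists = np.linalg.norm(
            dataset[:, None, :] - prototypes[None, :, :], axis=2
        )
        belongs_to = dists.argmin(axis=1)

        # Update step: each prototype moves to the mean of its members.
        prototypes_old = prototypes.copy()
        for j in range(k):
            members = dataset[belongs_to == j]
            if len(members) > 0:  # keep the old centroid if the cluster is empty
                prototypes[j] = members.mean(axis=0)

        # How far the prototypes moved in this iteration (stop test).
        norm = euclidean(prototypes, prototypes_old)
        history_centroids.append(prototypes.copy())

    return prototypes, history_centroids, belongs_to
```

Calling `kmeans(data, k=2)` returns the final centroids, the centroid history (one entry per iteration plus the starting point), and the cluster index of each instance.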
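Sampling and approximation approaches: results are poor, and the clustering gets worse as k grows.
Smarter initial centroid selection (seeding), and exact accelerated algorithms such as the 'blacklist', 'Elkan's', and 'Hamerly's' algorithms:
Build a tree on the data, then iterate over all centroids and rule some of them out.
setup cost O(n log n) to build the tree; worst-case computation O(kn log n); memory O(k + n log n)
Compute the distances between centroids and weigh them against the point-to-centroid distances to reduce the number of distance computations.
no setup cost; worst case O(k^2 + kn); memory O(k^2 + kn)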
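The idea of comparing centroid-to-centroid distances with point-to-centroid distances rests on the triangle inequality: if d(c, c') >= 2 * d(x, c), then c' cannot be closer to x than c is, so d(x, c') need not be computed. The sketch below shows only this pruning test inside a single assignment pass. It is a simplified illustration, not the full Elkan or Hamerly algorithm (which also keep bounds across iterations), and the function name `assign_with_pruning` is made up for this example.

```python
import numpy as np

def assign_with_pruning(X, centroids):
    """Nearest-centroid assignment that skips distances ruled out by the
    triangle inequality. Returns the labels and the number of skipped
    point-to-centroid distance computations."""
    k = len(centroids)
    # Pairwise centroid-to-centroid distances, the O(k^2) part.
    cc = np.linalg.norm(
        centroids[:, None, :] - centroids[None, :, :], axis=2
    )

    labels = np.empty(len(X), dtype=int)
    skipped = 0
    for i, x in enumerate(X):
        best = 0
        best_d = np.linalg.norm(x - centroids[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(x, c_best), then
            # d(x, c_j) >= d(x, c_best): centroid j cannot win.
            if cc[best, j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = np.linalg.norm(x - centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels[i] = best
    return labels, skipped
```

The labels match a brute-force nearest-centroid search; the saving shows up in `skipped`, which grows when clusters are well separated.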
paper: http://www.ratml.org/pub/pdf/2016dual.pdf
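Visit each pair (a combination of a T node and a Q node) no more than once and compute a score for the combination.
If the score exceeds the bound or is infinite, the combination is pruned. Otherwise, compute the base case between each point of the T node and each point of the Q node, rather than a score between every pair of descendant points.
Recurse until only leaves remain, then call the base case.
Key point: dual-tree algorithm = space tree + pruning dual-tree traversal + BaseCase() and Score()
For further details, see the link.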
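To make the three ingredients concrete, here is a small sketch of a dual-tree traversal. It is not the k-means algorithm from the paper: it solves the simpler all-nearest-neighbors problem (for each point in Q, find the closest point in T), and it assumes a kd-tree-like space tree with bounding boxes in which only leaves hold points. All names (`Node`, `min_box_dist`, `dual_tree_nn`, `leaf_size`) are made up for this example.

```python
import numpy as np

class Node:
    """Space tree node: a bounding box over the points X[idx]."""
    def __init__(self, X, idx, leaf_size):
        self.idx = idx
        self.lo = X[idx].min(axis=0)
        self.hi = X[idx].max(axis=0)
        self.children = []
        if len(idx) > leaf_size:
            dim = np.argmax(self.hi - self.lo)        # split the widest dimension
            order = idx[np.argsort(X[idx, dim])]
            mid = len(order) // 2
            self.children = [Node(X, order[:mid], leaf_size),
                             Node(X, order[mid:], leaf_size)]

def min_box_dist(a, b):
    # Smallest possible distance between any point in box a and any in box b.
    gap = np.maximum(0.0, np.maximum(a.lo - b.hi, b.lo - a.hi))
    return np.linalg.norm(gap)

def dual_tree_nn(Q, T, leaf_size=8):
    best_d = np.full(len(Q), np.inf)   # best distance found so far per query
    best_i = np.full(len(Q), -1)       # index in T of that neighbor
    q_root = Node(Q, np.arange(len(Q)), leaf_size)
    t_root = Node(T, np.arange(len(T)), leaf_size)

    def score(qn, tn):
        # Bound: the worst current best distance among the query points in qn.
        bound = best_d[qn.idx].max()
        s = min_box_dist(qn, tn)
        return np.inf if s > bound else s   # infinite score means "prune"

    def base_case(qi, ti):
        d = np.linalg.norm(Q[qi] - T[ti])
        if d < best_d[qi]:
            best_d[qi] = d
            best_i[qi] = ti

    def traverse(qn, tn):
        if np.isinf(score(qn, tn)):
            return                          # no point in tn can improve qn
        if not qn.children and not tn.children:
            for qi in qn.idx:               # both leaves: run the base case
                for ti in tn.idx:
                    base_case(qi, ti)
            return
        pairs = [(q, t) for q in (qn.children or [qn])
                        for t in (tn.children or [tn])]
        pairs.sort(key=lambda p: min_box_dist(*p))   # closest pairs first
        for q, t in pairs:
            traverse(q, t)

    traverse(q_root, t_root)
    return best_i, best_d
```

Visiting the closest node pairs first tightens the bound early, so later combinations are more likely to be pruned.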
Original post: http://www.cnblogs.com/yumanman/p/7580049.html