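Generate a random multilabel classification problem.

For each sample, the generative process is:

- pick the number of labels: n ~ Poisson(n_labels)
- n times, choose a class c: c ~ Multinomial(theta)
- pick the document length: k ~ Poisson(length)
- k times, choose a word: w ~ Multinomial(theta_c)

In the above process, rejection sampling is used to make sure that n is never more than n_classes (and never zero when allow_unlabeled=False), and that the document length is never zero. Likewise, we reject classes which have already been chosen.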
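The sampling steps and the rejection sampling described above can be sketched for a single sample in plain NumPy. This is a minimal sketch, not scikit-learn's internal code: it assumes the class prior `p_c` and the per-class word distributions `p_w_c` are already given, and the helper `sample_document` is hypothetical. It also assumes that an unlabeled sample (n = 0, only possible when unlabeled samples are allowed) receives uniformly random words.

```python
import numpy as np


def sample_document(rng, p_c, p_w_c, n_labels=2, length=50, allow_unlabeled=True):
    """Draw one (word-count vector, label list) pair following the generative
    process above. Hypothetical helper for illustration, not sklearn's code."""
    n_features, n_classes = p_w_c.shape

    # n ~ Poisson(n_labels); reject n > n_classes, and reject n == 0
    # unless unlabeled samples are allowed.
    n = n_classes + 1
    while n > n_classes or (n == 0 and not allow_unlabeled):
        n = rng.poisson(n_labels)

    # Draw classes c ~ Multinomial(p_c) one at a time; a class that was
    # already chosen is rejected (adding it to the set changes nothing).
    labels = set()
    while len(labels) < n:
        labels.add(int(rng.choice(n_classes, p=p_c)))
    labels = sorted(labels)

    # Document length k ~ Poisson(length); reject k == 0.
    k = 0
    while k == 0:
        k = rng.poisson(length)

    if labels:
        # Words come from the renormalised sum of the chosen classes' P(w|c).
        p_w = p_w_c[:, labels].sum(axis=1)
        p_w = p_w / p_w.sum()
    else:
        # Assumption of this sketch: an unlabeled sample gets uniform noise words.
        p_w = np.full(n_features, 1.0 / n_features)

    counts = rng.multinomial(k, p_w)  # k draws of w, summarised as counts
    return counts, labels


# Usage: random priors for 5 classes over a 20-word vocabulary.
rng = np.random.default_rng(0)
p_c = rng.random(5)
p_c /= p_c.sum()
p_w_c = rng.random((20, 5))
p_w_c /= p_w_c.sum(axis=0)  # each column is a distribution over words

x, y = sample_document(rng, p_c, p_w_c, allow_unlabeled=False)
print(x.sum(), y)  # document length (> 0) and a non-empty sorted label list
```

Summing the chosen columns of `p_w_c` means a multi-label document mixes the vocabularies of all its classes, which is why multi-label points fall between the single-label clusters in the plot discussed below.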
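Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| n_samples | int, optional | 100 | Number of samples. |
| n_features | int, optional | 20 | Number of features. |
| n_classes | int, optional | 5 | Number of classes. |
| n_labels | int, optional | 2 | Average number of labels per sample (mean of the Poisson draw for n). |
| length | int, optional | 50 | Average sum of the features, i.e. the document length (mean of the Poisson draw for k). |
| allow_unlabeled | bool, optional | True | If True, some samples may not belong to any class. |
| sparse | bool, optional | False | If True, X is returned as a sparse matrix. |
| return_indicator | 'dense' \| 'sparse' \| False | 'dense' | Format of Y: dense or sparse binary indicator matrix; False returns a list of label lists. |
| return_distributions | bool, optional | False | If True, also return p_c and p_w_c. |
| random_state | int, RandomState instance or None, optional | None | Seed or generator used for reproducible output. |

Returns:

| Name | Type and shape | Description |
|---|---|---|
| X | array of shape [n_samples, n_features] | Generated samples (feature counts). |
| Y | array or sparse CSR matrix of shape [n_samples, n_classes] | Label sets (a list of label lists if return_indicator=False). |
| p_c | array, shape [n_classes] | Probability of each class being drawn; only returned if return_distributions=True. |
| p_w_c | array, shape [n_features, n_classes] | Probability of each feature being drawn given each class; only returned if return_distributions=True. |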
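A short usage sketch of the parameters and return values listed above. The argument values are illustrative choices, not recommendations; the shapes follow the Returns table.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import make_multilabel_classification

# Default sizes, plus the class/word distributions the data was drawn from.
X, Y, p_c, p_w_c = make_multilabel_classification(
    n_samples=100,
    n_features=20,
    n_classes=5,
    n_labels=2,
    allow_unlabeled=False,      # every sample gets at least one label
    return_distributions=True,  # also return p_c and p_w_c
    random_state=0,
)

print(X.shape, Y.shape, p_c.shape, p_w_c.shape)
# (100, 20) (100, 5) (5,) (20, 5)

# Rows of X hold feature counts; rows of Y are 0/1 label indicators.
print(Y.sum(axis=1).min())  # at least 1 because allow_unlabeled=False

# Sparse variants: X as a sparse matrix, Y as a sparse indicator matrix.
Xs, Ys = make_multilabel_classification(
    sparse=True, return_indicator="sparse", random_state=0
)
print(sp.issparse(Xs), sp.issparse(Ys))  # True True
```

`p_c` is a probability vector over classes and each column of `p_w_c` is a probability vector over features, so both can be checked by summing to 1.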
Tutorial from the official scikit-learn site:
"""
==============================================
Plot randomly generated multilabel dataset
==============================================
This example illustrates the `datasets.make_multilabel_classification` dataset generator. Each sample consists of counts of two features (about 50 in total on average), which are distributed differently in each of the three classes. Points are labeled as follows, where Y means the class is present:

    =====  =====  =====  ======
      1      2      3    Color
    =====  =====  =====  ======
      Y      N      N    Red
      N      Y      N    Blue
      N      N      Y    Yellow
      Y      Y      N    Purple
      Y      N      Y    Orange
      N      Y      Y    Green
      Y      Y      Y    Brown
    =====  =====  =====  ======

A star marks the expected sample for each class; its size reflects the probability of selecting that class label.

The left and right plots highlight the ``n_labels`` parameter: more of the samples in the right plot have 2 or 3 labels. Note that this two-dimensional example is very degenerate: generally the number of features would be much greater than the "document length", while here we have much larger documents than vocabulary. Similarly, with ``n_classes > n_features`` (here 3 > 2), it is much less likely that a feature distinguishes a particular class.
"""
sklearn notes: make_multilabel_classification, a method for generating multilabel datasets
Original post: http://www.cnblogs.com/openAI/p/7450158.html