scikit-learn: 4.5. Random Projection

Posted: 2015-07-26 14:19:28

Reference: http://scikit-learn.org/stable/modules/random_projection.html

The sklearn.random_projection module reduces the dimensionality of data by trading a controlled amount of accuracy for efficiency. It implements two types of unstructured random matrix: Gaussian random matrix and sparse random matrix.

Theoretical basis: the Johnson-Lindenstrauss lemma. Quoting Wikipedia, the lemma roughly says:

In mathematics, the Johnson-Lindenstrauss lemma is a result concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection.
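
Stated a little more formally (a sketch of the usual textbook form, not quoted from the reference above; the symbols $m$, $N$, $n$, $f$ and $\varepsilon$ are introduced here for illustration): for any $0 < \varepsilon < 1$ and any set $X$ of $m$ points in $\mathbb{R}^N$, there is a linear map $f : \mathbb{R}^N \to \mathbb{R}^n$ with $n = O(\log m / \varepsilon^2)$ such that every pair of points keeps its squared distance up to a factor of $1 \pm \varepsilon$:

```latex
(1 - \varepsilon)\,\lVert u - v \rVert^2
\;\le\; \lVert f(u) - f(v) \rVert^2
\;\le\; (1 + \varepsilon)\,\lVert u - v \rVert^2
\qquad \text{for all } u, v \in X .
```

Note that the target dimension $n$ depends on the number of points $m$ and on $\varepsilon$, but not on the original dimension $N$.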

Knowing only the number of samples, sklearn.random_projection.johnson_lindenstrauss_min_dim conservatively estimates the minimal size of the random subspace that guarantees a bounded distortion introduced by the random projection:

>>> from sklearn.random_projection import johnson_lindenstrauss_min_dim
>>> johnson_lindenstrauss_min_dim(n_samples=1e6, eps=0.5)
663
>>> johnson_lindenstrauss_min_dim(n_samples=1e6, eps=[0.5, 0.1, 0.01])
array([    663,   11841, 1112658])
>>> johnson_lindenstrauss_min_dim(n_samples=[1e4, 1e5, 1e6], eps=0.1)
array([ 7894,  9868, 11841])
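
The numbers above can be reproduced by hand. The sketch below assumes the bound that the scikit-learn documentation gives for this function, n_components >= 4 log(n_samples) / (eps^2 / 2 - eps^3 / 3), truncated to an integer. The helper name `jl_min_dim` is hypothetical and is not part of scikit-learn.

```python
import math


def jl_min_dim(n_samples, eps):
    """Hypothetical helper: conservative minimum number of components.

    Assumes the bound 4 * ln(n_samples) / (eps**2 / 2 - eps**3 / 3),
    truncated to an integer.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return int(4 * math.log(n_samples) / denominator)


# Same inputs as the doctest session above.
by_eps = [jl_min_dim(1e6, eps) for eps in (0.5, 0.1, 0.01)]
by_samples = [jl_min_dim(n, 0.1) for n in (1e4, 1e5, 1e6)]
print(by_eps)      # [663, 11841, 1112658]
print(by_samples)  # [7894, 9868, 11841]
```

The bound grows only logarithmically with the number of samples but roughly as 1 / eps^2 with the tolerated distortion, which is why tightening eps from 0.1 to 0.01 raises the required dimension by about two orders of magnitude.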

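In essence, it is nothing more than a mapping function. (The figure that accompanied this sentence in the original post is not available.)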

Gaussian random matrix:

The sklearn.random_projection.GaussianRandomProjection reduces the dimensionality by projecting the original input space on a randomly generated matrix whose components are drawn from the distribution N(0, 1/n_components).

>>> import numpy as np
>>> from sklearn import random_projection
>>> X = np.random.rand(100, 10000)
>>> transformer = random_projection.GaussianRandomProjection()
>>> X_new = transformer.fit_transform(X)
>>> X_new.shape
(100, 3947)
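
To see what such a projection does, here is a minimal NumPy sketch that builds the matrix by hand rather than through scikit-learn. The sizes, the seed and the helper `pairwise_sq_dists` are arbitrary choices made for this illustration; it is not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 20, 2000, 500
X = rng.random((n_samples, n_features))

# Entries drawn from N(0, 1 / n_components): standard deviation 1 / sqrt(n_components).
R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(n_components, n_features))
X_new = X @ R.T  # shape (n_samples, n_components)


def pairwise_sq_dists(A):
    """Squared Euclidean distances between all rows of A (illustrative helper)."""
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(axis=-1)


# Compare each pair's squared distance after and before the projection.
iu = np.triu_indices(n_samples, k=1)
ratios = pairwise_sq_dists(X_new)[iu] / pairwise_sq_dists(X)[iu]
print(X_new.shape, ratios.min(), ratios.max())
```

With the variance 1/n_components, squared norms are preserved in expectation, so the printed ratios should cluster around 1. How tightly they cluster depends on n_components.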

Sparse random matrix:

The sklearn.random_projection.SparseRandomProjection reduces the dimensionality by projecting the original input space using a sparse random matrix. It gives embedding quality similar to a dense Gaussian random projection matrix, while using less memory and computing the projection faster.

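If we define s = 1 / density, the elements of the random matrix are drawn from

$$
\begin{cases}
-\sqrt{s / n_{\text{components}}} & \text{with probability } 1/(2s) \\
0 & \text{with probability } 1 - 1/s \\
+\sqrt{s / n_{\text{components}}} & \text{with probability } 1/(2s)
\end{cases}
$$

where $n_{\text{components}}$ is the size of the projected subspace. By default the density of non zero elements is set to the minimum density as recommended by Ping Li et al.: $1 / \sqrt{n_{\text{features}}}$.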

>>> import numpy as np
>>> from sklearn import random_projection
>>> X = np.random.rand(100,10000)
>>> transformer = random_projection.SparseRandomProjection()
>>> X_new = transformer.fit_transform(X)
>>> X_new.shape
(100, 3947)
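
The three-valued distribution described above can be sampled directly. The following is a rough NumPy sketch with arbitrarily chosen sizes and seed. It builds a dense array purely for illustration, whereas scikit-learn keeps the matrix in a sparse format, so this should not be read as the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_components = 10000, 200

density = 1.0 / np.sqrt(n_features)  # default minimum density: 1 / sqrt(n_features)
s = 1.0 / density
scale = np.sqrt(s / n_components)

# Each entry is -scale or +scale with probability 1/(2s) each, and 0 otherwise.
R = rng.choice(
    [-scale, 0.0, scale],
    size=(n_components, n_features),
    p=[1 / (2 * s), 1 - 1 / s, 1 / (2 * s)],
)

nonzero_fraction = np.count_nonzero(R) / R.size
print(R.shape, nonzero_fraction)

# Projecting works exactly as in the dense Gaussian case.
X = rng.random((100, n_features))
X_new = X @ R.T
print(X_new.shape)
```

Each entry has mean 0 and variance scale^2 / s = 1 / n_components, the same as in the Gaussian case, but here only about 1% of the entries are non-zero, which is where the memory and speed savings come from.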

Original post: http://blog.csdn.net/mmc2015/article/details/47067003
