
Machine Learning Notes - PRML Chapter 2.0: Prerequisite 2 - Singular Value Decomposition (SVD)




 
 


 

Christopher M. Bishop, PRML, Chapter 2 Probability Distributions

1. Vector Terminology

  • Orthogonality
    Two vectors $\mathbf{u}$ and $\mathbf{v}$ are said to be orthogonal to each other if their inner product equals zero, i.e.,
    $$\mathbf{u}^T \mathbf{v} = 0.$$
  • Normal Vector
    A normal vector (or unit vector) $\mathbf{u}$ is a vector of length 1, i.e.,

    $$\mathbf{u}^T \mathbf{u} = 1.$$

  • Orthonormal Vectors
    Vectors of unit length that are orthogonal to each other are said to be orthonormal.

2. Matrix Terminology

2.1 Orthogonal Matrix

A matrix $Q$ is orthogonal if

$$Q^T Q = Q Q^T = I,$$
where $I$ is the identity matrix.

 

2.2 Eigenvectors and Eigenvalues

An eigenvector is a nonzero vector that satisfies the equation

$$A\mathbf{u} = \lambda \mathbf{u},$$
where $A$ is a square matrix,

  • the scalar $\lambda$ is an eigenvalue, and
  • $\mathbf{u}$ is the eigenvector.

Eigenvalues and eigenvectors are also known as, respectively, characteristic roots(特征值) and characteristic vectors(特征向量), or latent roots and latent vectors.

THE KEY IDEAS [see Ref-7]:

  • $A\mathbf{x} = \lambda\mathbf{x}$ says that eigenvectors $\mathbf{x}$ keep the same direction when multiplied by $A$.
  • $A\mathbf{x} = \lambda\mathbf{x}$ also says that $\det(A - \lambda I) = 0$. This determines $n$ eigenvalues.
  • The eigenvalues of $A^2$ and $A^{-1}$ are $\lambda^2$ and $\lambda^{-1}$, respectively, with the same eigenvectors.
  • The sum of the $\lambda$'s equals the sum down the main diagonal of $A$ (the trace), i.e.,
    $$\sum_{i=1}^{n} \lambda_i = a_{11} + a_{22} + \cdots + a_{nn}.$$
  • The product of the $\lambda$'s equals the determinant, i.e.,
    $$\prod_{i=1}^{n} \lambda_i = \det(A).$$
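These facts are easy to check numerically; a minimal sketch in numpy (the 3×3 matrix below is an arbitrary example, not one from the text):

```python
import numpy as np

# An arbitrary symmetric 3x3 example matrix (not from the text), just to check the key facts.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)

# A x = lambda x for each eigenpair.
for lam, x in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ x, lam * x)

# Sum of eigenvalues equals the trace; product equals the determinant.
assert np.isclose(eigvals.sum(), np.trace(A))
assert np.isclose(eigvals.prod(), np.linalg.det(A))
```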

2.3 Understanding eigenvectors and eigenvalues in terms of transformation and the corresponding matrix [see Ref-9]

In linear algebra, an eigenvector or characteristic vector of a linear transformation $T$ from a vector space $V$ over a field $F$ into itself is a non-zero vector that does not change its direction when that linear transformation is applied to it. In other words, if $\mathbf{v}$ is a vector that is not the zero vector, then it is an eigenvector of a linear transformation $T$ if $T(\mathbf{v})$ is a scalar multiple of $\mathbf{v}$. This condition can be written as the mapping

$$T(\mathbf{v}) = \lambda \mathbf{v},$$
where $\lambda$ is a scalar in the field $F$, known as the eigenvalue or characteristic value associated with the eigenvector $\mathbf{v}$.

 

If the vector space $V$ is finite-dimensional, then the linear transformation $T$ can be represented as a square matrix $A$, and the vector $\mathbf{v}$ by a column vector, rendering the above mapping as a matrix multiplication on the left hand side and a scaling of the column vector on the right hand side in the equation

$$A\mathbf{v} = \lambda\mathbf{v}.$$

 

There is a correspondence between $n \times n$ square matrices and linear transformations from an $n$-dimensional vector space to itself. For this reason, it is equivalent to define eigenvalues and eigenvectors using either the language of matrices or the language of linear transformations.

Geometrically, an eigenvector corresponding to a real, nonzero eigenvalue points in a direction that is stretched by the transformation and the eigenvalue is the factor by which it is stretched. If the eigenvalue is negative, the direction is reversed.

This can be seen in the following figure, where the matrix $A$ acts by stretching the vector $\mathbf{x}$ without changing its direction, so $\mathbf{x}$ is an eigenvector of $A$.

[Figure: $A$ stretches the eigenvector $\mathbf{x}$ along its own direction.]

3. Singular Value Decomposition

3.1 Understanding of SVD

Singular value decomposition (SVD) can be looked at from three mutually compatible points of view.

  • 1) a method for transforming correlated variables into a set of uncorrelated ones that better expose the various relationships among the original data items.
  • 2) a method for identifying and ordering the dimensions along which data points exhibit the most variation.
  • 3) a method for data reduction, since once we have identified where the most variation is, it’s possible to find the best approximation of the original data points using fewer dimensions.

3.2 Statement of the SVD Theorem

SVD is based on a theorem from linear algebra which says that a rectangular matrix $A$ can be broken down into the product of three matrices:

  • an orthogonal matrix $U$ (i.e., $U^T U = I$);
  • a diagonal matrix $S$;
  • the transpose of an orthogonal matrix $V$ (i.e., $V^T V = I$).

The theorem is usually presented something like this:

$$A_{m \times n} = U_{m \times m}\, S_{m \times n}\, V_{n \times n}^T$$

 

  • assuming $m \ge n$ [see Ref-4 for this figure]: [figure omitted]

  • assuming $m < n$ [see Ref-4 for this figure]: [figure omitted]

  • The columns of $U$ and the columns of $V$ are called the left-singular vectors and right-singular vectors of $A$, respectively.

  • The columns of $U$ are orthonormal eigenvectors of $AA^T$.
    There is a brief proof. Let $U = [\mathbf{u}_1, \ldots, \mathbf{u}_m]$, where the column vector $\mathbf{u}_i \in \mathbb{R}^m$, for $i = 1, \ldots, m$, with $\mathbf{u}_i^T\mathbf{u}_j = \delta_{ij}$, and write the SVD as

    $$A = U S V^T. \tag{3.1}$$
    Firstly, to calculate the product $AA^T$:
    $$AA^T = (U S V^T)(U S V^T)^T = U S V^T V S^T U^T = U (S S^T) U^T. \tag{3.2}$$
    Multiplying (3.2) on the right by $U$ and using $U^T U = I$ gives
    $$(AA^T)\, U = U (S S^T). \tag{3.3}$$
    Since $S S^T = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_r^2, 0, \ldots, 0)$, reading (3.3) column by column (you can verify this by listing all the elements of the column vectors and applying the matrix product rule) gives
    $$AA^T \mathbf{u}_i = \sigma_i^2 \mathbf{u}_i. \tag{3.4}$$
    Therefore the columns of $U$ are orthonormal eigenvectors of $AA^T$, with eigenvalues $\sigma_i^2$.

     

  • Similarly, we can prove that the columns of $V$ are orthonormal eigenvectors of $A^TA$:

    $$A^TA\, \mathbf{v}_i = \sigma_i^2 \mathbf{v}_i. \tag{3.5}$$

     

  • $S$ is a diagonal matrix containing the square roots of the non-zero eigenvalues of both $AA^T$ and $A^TA$. A common convention is to list the singular values in descending order. In this case, the diagonal matrix $S$ is uniquely determined by $A$ (though not the matrices $U$ and $V$):

    $$S = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0), \tag{3.6}$$
    assuming $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, with $r = \mathrm{rank}(A)$, where the $\sigma_i$ are called the singular values of the matrix $A$.

     

  • $r$ is the rank of matrix $A$, i.e., $r = \mathrm{rank}(A) = \dim(\mathrm{range}(A))$, where $\mathrm{range}(A) = \{\mathbf{y} : \mathbf{y} = A\mathbf{x}\}$ means the range of $A$, that is, the set of possible linear combinations of the columns of $A$.

Some Conclusion and Simple Proof:

Let $U = [\mathbf{u}_1, \ldots, \mathbf{u}_m]$, where $\mathbf{u}_i \in \mathbb{R}^m$, for $i = 1, \ldots, m$; and $V = [\mathbf{v}_1, \ldots, \mathbf{v}_n]$, where $\mathbf{v}_j \in \mathbb{R}^n$, for $j = 1, \ldots, n$.

$$U^T U = I_m, \quad \text{i.e.,} \quad \mathbf{u}_i^T \mathbf{u}_j = \delta_{ij},$$
where $\delta_{ij}$ is the Kronecker delta.
Similarly, we have
$$V^T V = I_n, \quad \text{i.e.,} \quad \mathbf{v}_i^T \mathbf{v}_j = \delta_{ij}.$$
That is, the columns of $U$ and $V$ are orthonormal vectors, respectively.
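A quick numerical check of the statements above (a minimal sketch; the random 5×3 matrix is only an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # a generic 5x3 matrix

U, s, Vt = np.linalg.svd(A)              # full SVD: A = U S V^T
S = np.zeros_like(A)
S[:len(s), :len(s)] = np.diag(s)

assert np.allclose(A, U @ S @ Vt)        # A = U S V^T
assert np.allclose(U.T @ U, np.eye(5))   # columns of U are orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(3)) # columns of V are orthonormal

# Columns of V are eigenvectors of A^T A with eigenvalues sigma_i^2.
for sigma, v in zip(s, Vt):
    assert np.allclose(A.T @ A @ v, sigma**2 * v)
```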

 

3.3 An example of SVD:

 

The worked example proceeds as follows (a numerical sketch of the same steps is given after the list):

  • Calculate $U$ by finding the eigenvalues and corresponding orthonormal eigenvectors of $AA^T$.
  • Calculate $V$ by finding the eigenvalues and corresponding orthonormal eigenvectors of $A^TA$.
  • $S$ is the diagonal matrix whose entries are the square roots of the non-zero eigenvalues found above, listed in descending order.
  • The SVD result is $A = U S V^T$.
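The numerical values of the original worked example are not reproduced above, so the sketch below carries out the same steps on a small illustrative matrix (an assumed example, not necessarily the one used in the post): build $U$ from the eigenvectors of $AA^T$, $V$ from the eigenvectors of $A^TA$, and $S$ from the square roots of the shared non-zero eigenvalues.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])          # illustrative 2x3 matrix (an assumption)

# Eigen-decomposition of A A^T gives U and the eigenvalues sigma_i^2.
lam_u, U = np.linalg.eigh(A @ A.T)
order = np.argsort(lam_u)[::-1]           # sort eigenvalues in descending order
lam_u, U = lam_u[order], U[:, order]

# Eigen-decomposition of A^T A gives V (same non-zero eigenvalues).
lam_v, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam_v)[::-1]
V = V[:, order]

sigmas = np.sqrt(lam_u)                   # singular values
S = np.zeros_like(A)
S[:2, :2] = np.diag(sigmas)

# Fix possible sign mismatches between U and V columns: require A v_i = sigma_i u_i.
for i, sig in enumerate(sigmas):
    if not np.allclose(A @ V[:, i], sig * U[:, i]):
        V[:, i] = -V[:, i]

assert np.allclose(A, U @ S @ V.T)        # reproduces A = U S V^T
```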

3.4 Intuitive Interpretations of SVD [see Ref-5]

1) Points in d-dimension Space:

To gain insight into the SVD, treat the rows of an $n \times d$ matrix $A$ as $n$ points in a $d$-dimensional space (here we use $n \times d$ instead of $m \times n$, since this notation is commonly used for $n$ points in $d$ dimensions).

 

$$A = U S V^T \tag{3.7}$$
is equivalent to
$$A V = U S, \tag{3.8}$$
where the inner product $\mathbf{a}_i^T \mathbf{v}_j$ (the $(i,j)$ entry of $AV$) means the projection of point $\mathbf{a}_i$ (represented by a column vector, i.e., the $i$-th row of matrix $A$) onto the line along which $\mathbf{v}_j$ is a unit vector.

 

2) The Best Least Squares Fit Problem:

Consider the problem of finding the best $k$-dimensional subspace with respect to a set of points. Here "best" means minimizing the sum of the squares of the perpendicular distances of the points to the subspace. We begin with a special case of the problem where the subspace is 1-dimensional, a line through the origin. We will see later that the best-fitting $k$-dimensional subspace can be found by $k$ applications of the best-fitting-line algorithm (i.e., applying the 1-dimensional line fitting $k$ times yields the best-fitting $k$-dimensional subspace). Finding the best-fitting line through the origin with respect to a set of points $\{\mathbf{x}_i \mid 1 \le i \le n\}$ in the plane means minimizing the sum of the squared distances of the points to the line. Here distance is measured perpendicular to the line (the corresponding problem is called the best least squares fit); distance is more often measured vertically in the $y$ direction, which gives the ordinary least squares fit problem.

Returning to the best least squares fit problem, consider projecting a point $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ onto a line through the origin. Then, based on the following figure [figure of the point, its projection onto the line, and the perpendicular distance omitted],
we can get

$$x_{i1}^2 + x_{i2}^2 + \cdots + x_{id}^2 = (\text{length of projection})^2 + (\text{distance of point to line})^2. \tag{3.9}$$

 

From (3.9) and the observation that $\sum_{i=1}^{n} (x_{i1}^2 + x_{i2}^2 + \cdots + x_{id}^2)$ is a constant (i.e., independent of the line), we get the equivalence

$$\min \sum_{i=1}^{n} (\text{distance of } \mathbf{x}_i \text{ to line})^2 \iff \max \sum_{i=1}^{n} (\text{length of projection of } \mathbf{x}_i)^2. \tag{3.10}$$
So minimizing the sum of the squares of the distances is equivalent to maximizing the sum of the squares of the lengths of the projections onto the line. This conclusion helps to introduce the subsequent definition of singular vectors.

 

3) Singular Vectors and Singular Values:

  • Singular Vectors: Consider the rows of $A$ as $n$ points in a $d$-dimensional space. Consider the best fit line through the origin. Let $\mathbf{v}$ be a unit vector along this line. The length of the projection of $\mathbf{a}_i$ (i.e., the $i$-th row of $A$) onto $\mathbf{v}$ is $|\mathbf{a}_i \cdot \mathbf{v}|$. From this we see that the sum of squared lengths of the projections is $|A\mathbf{v}|^2$. The best fit line is the one maximizing $|A\mathbf{v}|^2$ and hence minimizing the sum of the squared distances of the points to the line.
  • The First Singular Vector: With this in mind, define the first singular vector $\mathbf{v}_1$ of $A$, which is a column vector, as the best fit line through the origin for the $n$ points in $d$-space that are the rows of $A$. Thus

    $$\mathbf{v}_1 = \arg\max_{|\mathbf{v}| = 1} |A\mathbf{v}|. \tag{3.11}$$

  • The First Singular Value: The value $\sigma_1 = \sigma_1(A) = |A\mathbf{v}_1|$ is called the first singular value of $A$. Note that $\sigma_1^2$ is the sum of the squares of the projections of the points onto the line determined by $\mathbf{v}_1$.

  • The Second Singular Vector: The second singular vector $\mathbf{v}_2$ is defined by the best fit line perpendicular to $\mathbf{v}_1$:

    $$\mathbf{v}_2 = \arg\max_{\mathbf{v} \perp \mathbf{v}_1,\ |\mathbf{v}| = 1} |A\mathbf{v}|. \tag{3.12}$$

  • The Second Singular Value: The value $\sigma_2 = |A\mathbf{v}_2|$ is called the second singular value of $A$. Note that $\sigma_2^2$ is the sum of the squares of the projections of the points onto the line determined by $\mathbf{v}_2$.

  • The Third Singular Vector: The third singular vector $\mathbf{v}_3$ is defined similarly by
    $$\mathbf{v}_3 = \arg\max_{\mathbf{v} \perp \mathbf{v}_1, \mathbf{v}_2,\ |\mathbf{v}| = 1} |A\mathbf{v}|. \tag{3.13}$$
  • The process stops when we have found $\mathbf{v}_1, \ldots, \mathbf{v}_r$ as singular vectors and
    $$\max_{\mathbf{v} \perp \mathbf{v}_1, \ldots, \mathbf{v}_r,\ |\mathbf{v}| = 1} |A\mathbf{v}| = 0, \tag{3.14}$$
    where $r = \mathrm{rank}(A) \le \min\{n, d\}$, i.e., there exist at most $r$ such linearly independent singular vectors.
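A rough numerical illustration of this greedy definition (assumed synthetic data and a crude angle sweep instead of a proper optimizer): the direction maximizing $|A\mathbf{v}|$ coincides with the first right singular vector returned by numpy, and the maximum equals $\sigma_1^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
# n = 200 points in d = 2 dimensions, roughly along a line (the rows of A).
A = np.outer(rng.standard_normal(200), [3.0, 1.0]) + 0.1 * rng.standard_normal((200, 2))

# Crude search for the unit vector v maximizing |A v| (the best-fit line through the origin).
angles = np.linspace(0.0, np.pi, 10_000)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
proj_sq = ((A @ candidates.T) ** 2).sum(axis=0)   # sum of squared projections for each candidate
v1_search = candidates[proj_sq.argmax()]

# Compare with the first right singular vector from the SVD.
_, s, Vt = np.linalg.svd(A, full_matrices=False)
v1_svd = Vt[0]

print(v1_search, v1_svd)                  # same direction up to sign
print(proj_sq.max(), s[0] ** 2)           # sigma_1^2 equals the maximal sum of squared projections
```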

4) The Frobenius norm of A:

Consider one row, say $\mathbf{a}_j$, of matrix $A$. Since $\mathbf{v}_1, \ldots, \mathbf{v}_r$ span the space of all rows of $A$, $\mathbf{a}_j \cdot \mathbf{v} = 0$ for all $\mathbf{v}$ perpendicular to $\mathbf{v}_1, \ldots, \mathbf{v}_r$. Thus, for each row $\mathbf{a}_j$, $\sum_{i=1}^{r} (\mathbf{a}_j \cdot \mathbf{v}_i)^2 = |\mathbf{a}_j|^2$. Summing over all rows,

$$\sum_{j=1}^{n} |\mathbf{a}_j|^2 = \sum_{j=1}^{n} \sum_{i=1}^{r} (\mathbf{a}_j \cdot \mathbf{v}_i)^2 = \sum_{i=1}^{r} \sum_{j=1}^{n} (\mathbf{a}_j \cdot \mathbf{v}_i)^2 = \sum_{i=1}^{r} |A\mathbf{v}_i|^2 = \sum_{i=1}^{r} \sigma_i^2(A). \tag{3.15}$$
But
$$\sum_{j=1}^{n} |\mathbf{a}_j|^2 = \sum_{j=1}^{n} \sum_{k=1}^{d} a_{jk}^2, \tag{3.16}$$
that is, the sum of squares of all the entries of $A$. Thus, the sum of squares of the singular values of $A$ is indeed the square of the "whole content of $A$", i.e., the sum of squares of all the entries. There is an important norm associated with this quantity, the Frobenius norm of $A$, denoted by $\|A\|_F$, defined as
$$\|A\|_F = \sqrt{\sum_{j,k} a_{jk}^2}. \tag{3.17}$$
This is shown in the following lemma:

Lemma: For any matrix $A$, the sum of squares of the singular values equals the square of the Frobenius norm, that is,

$$\sum_{i} \sigma_i^2(A) = \|A\|_F^2.$$
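A short check of the lemma with numpy (random matrix as an assumed example):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))

s = np.linalg.svd(A, compute_uv=False)

frob_sq = (A ** 2).sum()                       # sum of squares of all entries
assert np.isclose(frob_sq, (s ** 2).sum())     # equals the sum of squared singular values
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt((s ** 2).sum()))
```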

 

3.5 Intuitive Interpretations of SVD [see Ref-6]

[Figure: the SVD of a real $2 \times 2$ shearing matrix $M$ acting on the unit disc, from Ref-6.]

1) The image shows:

  • Upper Left: The unit disc with the two canonical unit vectors.
  • Upper Right: The unit disc transformed with $M$, with the singular values $\sigma_1$ and $\sigma_2$ indicated.
  • Lower Left: The action of $V^*$ on the unit disc. This is just a rotation. Here $V^*$ means the conjugate transpose of $V$.
  • Lower Right: The action of $\Sigma V^*$ on the unit disc. $\Sigma$ scales vertically and horizontally.
    In this special case, the singular values are $\varphi$ and $1/\varphi$, where $\varphi$ is the Golden ratio, i.e.,
    $$\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618.$$
    $V^*$ is a (counter-clockwise) rotation by an angle $\alpha$ where $\alpha$ satisfies $\tan\alpha = -\varphi$. $U$ is a rotation by an angle $\beta$ with $\tan\beta = \varphi - 1$.

2) Singular values as semiaxes of an ellipse or ellipsoid:

As shown in the figure, the singular values can be interpreted as the semiaxes of an ellipse in 2D. This concept can be generalized to $n$-dimensional Euclidean space, with the singular values of any $n \times n$ square matrix being viewed as the semiaxes of an $n$-dimensional ellipsoid. See below for further details.

3) The columns of U and V are orthonormal bases:

Since $U$ and $V^*$ are unitary, the columns of each of them form a set of orthonormal vectors, which can be regarded as basis vectors. The matrix $M$ maps the basis vector $\mathbf{v}_i$ to the stretched unit vector $\sigma_i \mathbf{u}_i$. By the definition of a unitary matrix, the same is true for their conjugate transposes $U^*$ and $V$, except the geometric interpretation of the singular values as stretches is lost. In short, the columns of $U$, $U^*$, $V$ and $V^*$ are orthonormal bases.

4. Expansion of eigenvalues and eigenvectors [see Ref-8]

Problem - PRML Exercise 2.19:

Show that a real, symmetric matrix $\Sigma$ satisfying the eigenvector equation $\Sigma \mathbf{u}_i = \lambda_i \mathbf{u}_i$ can be expressed as an expansion in its eigenvalues and eigenvectors of the following form

$$\Sigma = \sum_{i=1}^{D} \lambda_i \mathbf{u}_i \mathbf{u}_i^T, \tag{4.1}$$
and similarly, the inverse $\Sigma^{-1}$ can be expressed as
$$\Sigma^{-1} = \sum_{i=1}^{D} \frac{1}{\lambda_i} \mathbf{u}_i \mathbf{u}_i^T. \tag{4.2}$$

 

Solution:

1) Lemma 4-1: A real symmetric matrix is orthogonally similar to a diagonal matrix. That is, if $\Sigma$ is a real symmetric square matrix, then there exists an orthogonal matrix $U$ (i.e., $U^TU = UU^T = I$) such that

$$U^T \Sigma U = \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_D), \tag{4.3}$$
or, due to $U^T = U^{-1}$, equivalent equations include
$$\Sigma U = U \Lambda \tag{4.4}$$
and
$$\Sigma = U \Lambda U^T. \tag{4.5}$$

 

2) Lemma 4-2: Matrices $A$ and $B$ are identical if and only if $A\mathbf{y} = B\mathbf{y}$ for all vectors $\mathbf{y}$. That is,

$$A = B \iff A\mathbf{y} = B\mathbf{y} \quad \text{for all } \mathbf{y}. \tag{4.6}$$

 

3) Proof:

The proof of (4.1) and (4.2) uses (4.5) and (4.6). For any column vector $\mathbf{y}$,
we have

$$\Sigma \mathbf{y} = U \Lambda U^T \mathbf{y} = \sum_{i=1}^{D} \lambda_i \mathbf{u}_i (\mathbf{u}_i^T \mathbf{y}). \tag{4.7}$$

Since the inner product $\mathbf{u}_i^T \mathbf{y}$ in (4.7) is a scalar, and $\lambda_i$ is also a scalar, we can change the order of the terms,

$$\Sigma \mathbf{y} = \left( \sum_{i=1}^{D} \lambda_i \mathbf{u}_i \mathbf{u}_i^T \right) \mathbf{y}. \tag{4.8}$$
Thus, applying Lemma 4-2 shown in (4.6) to (4.8), we can prove (4.1).

 

Since $\Sigma = U \Lambda U^T$, inverting both sides gives $\Sigma^{-1} = (U \Lambda U^T)^{-1} = U \Lambda^{-1} U^T$. Applying the above result to $\Sigma^{-1}$, and noting that $\Lambda^{-1}$ is just the diagonal matrix of the inverses of the diagonal elements of $\Lambda$, we have proved (4.2).
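A numerical check of (4.1) and (4.2) on a small symmetric positive-definite matrix (an assumed example, not taken from PRML):

```python
import numpy as np

# A real, symmetric, positive-definite matrix (so that the inverse exists).
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

lam, U = np.linalg.eigh(Sigma)                 # Sigma u_i = lambda_i u_i, orthonormal u_i

# Expansion (4.1): Sigma = sum_i lambda_i u_i u_i^T
Sigma_rebuilt = sum(l * np.outer(u, u) for l, u in zip(lam, U.T))
assert np.allclose(Sigma, Sigma_rebuilt)

# Expansion (4.2): Sigma^{-1} = sum_i (1/lambda_i) u_i u_i^T
Sigma_inv = sum((1.0 / l) * np.outer(u, u) for l, u in zip(lam, U.T))
assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))
```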

5. Best Rank k Approximation using SVD [see Ref-5]

Let $A$ be an $n \times d$ matrix and think of the rows of $A$ as $n$ points in $d$-dimensional space. There are two important matrix norms, the Frobenius norm, denoted $\|A\|_F$, and the 2-norm, denoted $\|A\|_2$.

  • The 2-norm of the matrix $A$ is given by
    $$\|A\|_2 = \max_{|\mathbf{v}| = 1} |A\mathbf{v}|,$$
    and thus equals the largest singular value of the matrix. That is, the 2-norm is the square root of the sum of squared distances to the origin along the direction that maximizes this quantity.
  • The Frobenius norm of $A$ is the square root of the sum of the squared distances of the points to the origin, shown in (3.17).

Let $A$ be an $n \times d$ matrix of rank $r$ and let

$$A = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^T \tag{5.1}$$
be the SVD of $A$. For $k \in \{1, 2, \ldots, r\}$, let
$$A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T \tag{5.2}$$
be the sum truncated after $k$ terms. It is clear that $A_k$ has rank $k$. Furthermore, $A_k$ is the best rank-$k$ approximation to $A$ when the error is measured in either the 2-norm or the Frobenius norm (see Theorem 5.2 and Theorem 5.3).
Without proof, we give the following theorems (if interested, please check Lemma 1.6, Theorem 1.7, Theorem 1.8, and Theorem 1.9 on pages 9-10 of Ref-5).

 

Theorem 5.1:

The rows of matrix $A_k$ are the projections of the rows of $A$ onto the subspace $V_k$ spanned by the first $k$ singular vectors of $A$.

Theorem 5.2:

Let $A$ be an $n \times d$ matrix. For any matrix $B$ of rank at most $k$, it holds that

$$\|A - A_k\|_2 \le \|A - B\|_2.$$

 

Theorem 5.3:

Let $A$ be an $n \times d$ matrix. For any matrix $B$ of rank at most $k$, it holds that

$$\|A - A_k\|_F \le \|A - B\|_F.$$

 

Theorem 5.4:

Let $A$ be an $n \times d$ matrix. For $A_k$ in (5.2), it holds that

$$\|A - A_k\|_2 = \sigma_{k+1}.$$
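A sketch of these statements with numpy (the random matrix is an assumed example): the truncated sum $A_k$ of (5.2) has 2-norm error $\sigma_{k+1}$ and Frobenius error $\sqrt{\sum_{i>k} \sigma_i^2}$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A_k: the SVD sum truncated after k terms, as in (5.2).
A_k = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(k))

# Theorem 5.4: the 2-norm error equals sigma_{k+1}.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])

# The Frobenius error equals the square root of the sum of the discarded sigma_i^2 (cf. Theorem 5.3).
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt((s[k:] ** 2).sum()))
```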

 

6. The Geometry of Linear Transformations [see Ref-3]

6.1 Matrix and Transformation

Let us begin by looking at some simple matrices, namely those with two rows and two columns. Our first example is the diagonal matrix

$$M = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$$

 

Geometrically, we may think of a matrix like this as taking a point $(x, y)$ in the plane and transforming it into another point using matrix multiplication:

$$\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3x \\ y \end{bmatrix}$$

 

The effect of this transformation is shown below: the plane is horizontally stretched by a factor of 3, while there is no vertical change.

[Figure: the grid before and after the horizontal stretch.]

Now let's look at

$$M = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

The four vertices of the red square shown in the following figure, $(0,0)$, $(1,0)$, $(1,1)$, $(0,1)$, are transformed into $(0,0)$, $(2,1)$, $(3,3)$, $(1,2)$, respectively, which produces this effect:

[Figure: the axis-aligned unit square and its image under $M$.]

It is not so clear how to describe simply the geometric effect of the transformation. However, let's rotate our grid through a $45^\circ$ angle and see what happens. The four vertices of the rotated red square are now transformed so that the square is stretched by a factor of 3 along one diagonal direction and left unchanged along the other, which produces this effect:

[Figure: the $45^\circ$-rotated square and its image under $M$.]

We see now that this new grid is transformed in the same way that the original grid was transformed by the diagonal matrix: the grid is stretched by a factor of 3 in one direction.

This is a very special situation due to the fact that the matrix $M$ is symmetric, i.e., $M^T = M$. If we have a symmetric $2 \times 2$ matrix, it turns out that we may always rotate the grid in the domain so that the matrix acts by stretching and perhaps reflecting in the two directions. In other words, symmetric matrices behave like diagonal matrices.

Conclusion:

The figures above address the following questions, given the $2 \times 2$ symmetric matrix $M$, i.e.,

$$M = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

  • How should we place the coordinate grid (in other words, how should we choose the position and orientation of a unit square in the coordinate system, given that such a square can be represented by two mutually perpendicular unit vectors $\mathbf{v}_1$ and $\mathbf{v}_2$), so that when the transformation represented by the symmetric matrix $M$ is applied to it, the square only undergoes a pure stretch or compression along the directions $\mathbf{v}_1$ and $\mathbf{v}_2$? This connects to the eigenvectors and eigenvalues of the matrix, discussed next. That is,
    $$M \mathbf{v}_i = \lambda_i \mathbf{v}_i \tag{6.1}$$
    says that after the eigenvector $\mathbf{v}_i$ is transformed by the matrix $M$, the new vector is parallel to the original one (pointing in either the same or the opposite direction); only its length has changed.
  • How do we find such $\mathbf{v}_1$ and $\mathbf{v}_2$? When $M$ is a symmetric matrix (of course, a symmetric matrix is a special case; more general matrices are discussed below), such $\mathbf{v}_1$ and $\mathbf{v}_2$ are exactly the two eigenvectors of the symmetric matrix $M$. That is, solving $M\mathbf{v} = \lambda\mathbf{v}$ gives the eigenvalues and eigenvectors
    $$\lambda_1 = 3, \ \mathbf{v}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix}; \qquad \lambda_2 = 1, \ \mathbf{v}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix}-1\\1\end{bmatrix},$$
    which accords with the $45^\circ$ rotation of the red square shown above.
  • For such a special symmetric matrix $M$, its SVD reduces to Lemma 4-1: a real symmetric matrix is orthogonally similar to a diagonal matrix. This can be seen as a special case of the SVD, i.e.:
    • For a general matrix $M$, there exist orthogonal matrices $U$ and $V$ (i.e., $U^TU = I$, $V^TV = I$) such that
      $$M = U S V^T. \tag{6.2}$$
      In (6.2), $U$ consists of the eigenvectors of $MM^T$, $V$ consists of the eigenvectors of $M^TM$, and the diagonal matrix $S$ consists of the positive square roots of the eigenvalues of $MM^T$ (or $M^TM$).
    • When $M$ is a real symmetric matrix, there exists an orthogonal matrix $U$ (i.e., $U^TU = I$) such that
      $$M = U \Lambda U^T. \tag{6.3}$$
      In (6.3), $U$ consists of the eigenvectors of the symmetric matrix $M$, and the diagonal matrix $\Lambda$ consists of the eigenvalues of $M$. Of course, we can also proceed via the method introduced above, i.e., take $U$ to be the eigenvectors of $MM^T$ and the diagonal matrix to be the positive square roots of the eigenvalues of $MM^T$. The two methods are equivalent and consistent.

6.2 The Geometry of Eigenvectors and Eigenvalues

Said with more mathematical precision, given a symmetric matrix $M$, we may find a set of orthogonal vectors $\mathbf{v}_i$ so that $M\mathbf{v}_i$ is a scalar multiple of $\mathbf{v}_i$; that is,

$$M\mathbf{v}_i = \lambda_i \mathbf{v}_i,$$
where $\lambda_i$ is a scalar.

 

Geometrically, this means that the vectors $\mathbf{v}_i$ are simply stretched and/or reflected (i.e., their direction is reversed by 180°) when multiplied by $M$. Because of this property, we call

  • Eigenvectors: the vectors $\mathbf{v}_i$ the eigenvectors of $M$;
  • Eigenvalues: the scalars $\lambda_i$ the eigenvalues.

An important fact, which is easily verified, is that eigenvectors of a symmetric matrix corresponding to different eigenvalues are orthogonal. If we use the eigenvectors of a symmetric matrix to align the grid, the matrix stretches and/or reflects the grid in the same way that it does the eigenvectors.

The geometric description we gave for this linear transformation is a simple one: the grid is simply stretched in one direction. For more general matrices, we will ask if we can find an orthogonal grid that is transformed into another orthogonal grid. Let’s consider a final example using a matrix that is not symmetric:

$$M = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

 

This matrix produces the geometric effect known as a shear, shown as

[Figure: the grid and its image under the shear.]

It’s easy to find one family of eigenvectors along the horizontal axis. However, our figure above shows that these eigenvectors cannot be used to create an orthogonal grid that is transformed into another orthogonal grid.

  • Nonetheless, let's see what happens when we rotate the grid first by $30^\circ$, shown as
    [Figure: the $30^\circ$-rotated grid and its image.] Notice that the angle at the origin formed by the red parallelogram on the right has increased.
  • Let's next rotate the grid by $60^\circ$.
    [Figure: the $60^\circ$-rotated grid and its image.] It appears that the grid on the right is now almost orthogonal.
  • In fact, by rotating the grid in the domain by an angle of roughly $58.28^\circ$, both grids are now orthogonal.

[Figure: the grid rotated by roughly $58.28^\circ$ and its orthogonal image.]

How do we calculate this angle of roughly $58.28^\circ$?

Solution:

Based on the discussion in (6.2), the columns of $V$ are the eigenvectors of $M^TM$, which results in

$$M^T M = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix},$$
where
$$M = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$
We can get $\lambda_1 = \frac{3+\sqrt{5}}{2} \approx 2.618$ and $\lambda_2 = \frac{3-\sqrt{5}}{2} \approx 0.382$; the corresponding eigenvectors are
$$\mathbf{v}_1 \propto \begin{bmatrix} 1 \\ \frac{1+\sqrt{5}}{2} \end{bmatrix}, \qquad \mathbf{v}_2 \propto \begin{bmatrix} 1 \\ \frac{1-\sqrt{5}}{2} \end{bmatrix},$$
where the directions of $\mathbf{v}_1$ and $\mathbf{v}_2$ make angles of roughly $58.28^\circ$ and $-31.72^\circ$ with the horizontal axis (you can check this with, e.g., the Matlab function `eig`).
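The same angle can be reproduced numerically; a small sketch in numpy (used here in place of the Matlab call mentioned above):

```python
import numpy as np

M = np.array([[1.0, 1.0],
              [0.0, 1.0]])                  # the shear matrix discussed above

lam, V = np.linalg.eigh(M.T @ M)            # eigenvectors of M^T M give the directions v_1, v_2
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

angles = np.degrees(np.arctan2(V[1, :], V[0, :])) % 180.0

print(lam)          # approx [2.618, 0.382]
print(np.sqrt(lam)) # singular values approx [1.618, 0.618] (golden ratio and its reciprocal)
print(angles)       # v_1 points along roughly 58.28 degrees (directions reported modulo 180)
```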

 

6.3 The singular value decomposition

This is the geometric essence of the singular value decomposition for $2 \times 2$ matrices:

for any $2 \times 2$ matrix, we may find an orthogonal grid that is transformed into another orthogonal grid. We will express this fact using vectors:

  • with an appropriate choice of orthogonal unit vectors $\mathbf{v}_1$ and $\mathbf{v}_2$, the vectors $M\mathbf{v}_1$ and $M\mathbf{v}_2$ are orthogonal.

[Figure: the orthogonal vectors $\mathbf{v}_1$, $\mathbf{v}_2$ and their orthogonal images $M\mathbf{v}_1$, $M\mathbf{v}_2$.]

We will use $\mathbf{u}_1$ and $\mathbf{u}_2$ to denote unit vectors in the direction of $M\mathbf{v}_1$ and $M\mathbf{v}_2$. The lengths of $M\mathbf{v}_1$ and $M\mathbf{v}_2$, denoted by $\sigma_1$ and $\sigma_2$, describe the amount that the grid is stretched in those particular directions. These numbers are called the singular values of $M$. (In this case, the singular values are the golden ratio and its reciprocal, but that is not so important here.)

[Figure: the unit vectors $\mathbf{u}_1$, $\mathbf{u}_2$ and the stretch factors $\sigma_1$, $\sigma_2$.]

We therefore have

$$M\mathbf{v}_1 = \sigma_1 \mathbf{u}_1$$
$$M\mathbf{v}_2 = \sigma_2 \mathbf{u}_2$$

 

We may now give a simple description of how the matrix $M$ treats a general vector $\mathbf{x}$. Since the vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal unit vectors, we have

$$\mathbf{x} = (\mathbf{v}_1 \cdot \mathbf{x})\,\mathbf{v}_1 + (\mathbf{v}_2 \cdot \mathbf{x})\,\mathbf{v}_2$$

 

This means that

$$M\mathbf{x} = (\mathbf{v}_1 \cdot \mathbf{x})\,M\mathbf{v}_1 + (\mathbf{v}_2 \cdot \mathbf{x})\,M\mathbf{v}_2 = (\mathbf{v}_1 \cdot \mathbf{x})\,\sigma_1\mathbf{u}_1 + (\mathbf{v}_2 \cdot \mathbf{x})\,\sigma_2\mathbf{u}_2$$

 

Remember that the inner (dot) product may be computed using the vector transpose,

$$\mathbf{v} \cdot \mathbf{x} = \mathbf{v}^T \mathbf{x},$$
which leads to
$$M\mathbf{x} = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T \mathbf{x} + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T \mathbf{x}, \qquad \text{i.e.,} \qquad M = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T.$$

 

This is usually expressed by writing

$$M = U \Sigma V^T,$$
where $U$ is a matrix whose columns are the vectors $\mathbf{u}_1$ and $\mathbf{u}_2$, $\Sigma$ is a diagonal matrix whose entries are $\sigma_1$ and $\sigma_2$, and $V$ is a matrix whose columns are $\mathbf{v}_1$ and $\mathbf{v}_2$.

 

This shows how to decompose the matrix $M$ into the product of three matrices:

  • $V$ describes an orthonormal basis in the domain, and
  • $U$ describes an orthonormal basis in the co-domain, and
  • $\Sigma$ describes how much the vectors in $V$ are stretched to give the vectors in $U$.

6.4 How do we find the singular decomposition?

The power of the singular value decomposition lies in the fact that we may find it for any matrix. How do we do it? Let's look at our earlier example and add the unit circle in the domain. Its image will be an ellipse whose major and minor axes define the orthogonal grid in the co-domain.

[Figure: the unit circle in the domain and its image ellipse in the co-domain.]

Notice that the major and minor axes are defined by $M\mathbf{v}_1$ and $M\mathbf{v}_2$. These vectors therefore are the longest and shortest vectors among all the images of vectors on the unit circle.

[Figure: $M\mathbf{v}_1$ and $M\mathbf{v}_2$ as the semiaxes of the ellipse.]

In other words, the function $|M\mathbf{x}|$ on the unit circle has a maximum at $\mathbf{v}_1$ and a minimum at $\mathbf{v}_2$. This reduces the problem to a rather standard calculus problem in which we wish to optimize a function over the unit circle. It turns out that the critical points of this function occur at the eigenvectors of the matrix $M^TM$. Since this matrix is symmetric (since it is obvious that $(M^TM)^T = M^TM$), eigenvectors corresponding to different eigenvalues will be orthogonal. This gives the family of vectors $\mathbf{v}_i$.

The singular values are then given by $\sigma_i = |M\mathbf{v}_i|$, and the vectors $\mathbf{u}_i$ are obtained as unit vectors in the direction of $M\mathbf{v}_i$.

But why are the vectors $\mathbf{u}_i$ orthogonal? To explain this, we will assume that $\sigma_i$ and $\sigma_j$ are distinct singular values. We have

$$M\mathbf{v}_i = \sigma_i \mathbf{u}_i, \qquad M\mathbf{v}_j = \sigma_j \mathbf{u}_j.$$
Let's begin by looking at the expression $M\mathbf{v}_i \cdot M\mathbf{v}_j$ and assuming, for convenience, that the singular values are non-zero.

  • On one hand, this expression is zero because the vectors $\mathbf{v}_i$ and $\mathbf{v}_j$, which are required to be eigenvectors of the symmetric matrix $M^TM$, are orthogonal to one another, i.e.,
    $$M\mathbf{v}_i \cdot M\mathbf{v}_j = \mathbf{v}_i^T M^T M \mathbf{v}_j = \lambda_j\, \mathbf{v}_i^T \mathbf{v}_j = 0.$$
    Therefore,
    $$M\mathbf{v}_i \cdot M\mathbf{v}_j = 0.$$
  • On the other hand, we have
    $$M\mathbf{v}_i \cdot M\mathbf{v}_j = \sigma_i \sigma_j \, \mathbf{u}_i \cdot \mathbf{u}_j = 0.$$
    Therefore, $\mathbf{u}_i$ and $\mathbf{u}_j$ are orthogonal, so we have found an orthogonal set of vectors $\mathbf{v}_i$ that is transformed into another orthogonal set $\mathbf{u}_i$. The singular values describe the amount of stretching in the different directions.

In practice, this is not the procedure used to find the singular value decomposition of a matrix since it is not particularly efficient or well-behaved numerically.
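For completeness, here is a direct sketch of that construction (`numpy.linalg.svd` would normally be preferred, precisely because this route is not numerically ideal):

```python
import numpy as np

def svd_via_gram(M, tol=1e-12):
    """Build an SVD of M from the eigenvectors of M^T M: v_i, sigma_i = |M v_i|, u_i = M v_i / sigma_i."""
    lam, V = np.linalg.eigh(M.T @ M)
    order = np.argsort(lam)[::-1]                 # descending eigenvalues
    V = V[:, order]
    sigmas = np.sqrt(np.clip(lam[order], 0.0, None))
    cols_u = [M @ V[:, i] / s for i, s in enumerate(sigmas) if s > tol]
    U = np.stack(cols_u, axis=1)                  # the u_i come out orthonormal automatically
    return U, sigmas[sigmas > tol], V

M = np.array([[1.0, 1.0],
              [0.0, 1.0]])
U, s, V = svd_via_gram(M)
print(s)                                          # approx [1.618, 0.618]
assert np.allclose(M, U @ np.diag(s) @ V[:, :len(s)].T)
```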

6.5 Another example

Let's now look at a singular matrix, such as

$$M = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}.$$

We can get the eigenvalues of $M^TM$ as $\lambda_1 = 10$ and $\lambda_2 = 0$; the corresponding eigenvectors are

$$\mathbf{v}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix}1\\1\end{bmatrix}, \qquad \mathbf{v}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix}-1\\1\end{bmatrix},$$
where the directions of $\mathbf{v}_1$ and $\mathbf{v}_2$ are $45^\circ$ and $135^\circ$ (again, you can check this with, e.g., the Matlab function `eig`).

 

The geometric effect of this matrix is the following:

[Figure: the grid is collapsed onto a single line.]

In this case, the second singular value is zero so that we may write:

$$M = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T.$$

 

In other words, if some of the singular values are zero, the corresponding terms do not appear in the decomposition of $M$. In this way, we see that the rank of $M$, which is the dimension of the image of the linear transformation, is equal to the number of non-zero singular values.

6.6 SVD Application 1 – Data compression

Singular value decompositions can be used to represent data efficiently. Suppose, for instance, that we wish to transmit the following image, which consists of an array of $25 \times 15$ black or white pixels.

[Figure: the $25 \times 15$ black-and-white image.]

Since there are only three types of columns in this image, as shown below, it should be possible to represent the data in a more compact form.

[Figure: the three distinct column types.]

We will represent the image as a $25 \times 15$ matrix $M$ in which each entry is either a 0, representing a black pixel, or 1, representing white. As such, there are 375 entries in the matrix. If we perform a singular value decomposition on $M$, we find there are only three non-zero singular values,
$$\sigma_1 = 14.72, \qquad \sigma_2 = 5.22, \qquad \sigma_3 = 3.31.$$
Therefore, the matrix $M$ may be represented as

$$M = \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \sigma_3 \mathbf{u}_3 \mathbf{v}_3^T.$$

 

This means that we have three vectors $\mathbf{v}_i$, each of which has 15 entries, three vectors $\mathbf{u}_i$, each of which has 25 entries, and three singular values $\sigma_i$. This implies that we may represent the matrix using only $3 \times (15 + 25 + 1) = 123$ numbers rather than the 375 that appear in the matrix. In this way, the singular value decomposition discovers the redundancy in the matrix and provides a format for eliminating it.

Why are there only three non-zero singular values? Remember that the number of non-zero singular values equals the rank of the matrix. In this case, we see that there are three linearly independent columns in the matrix, which means that $\mathrm{rank}(M) = 3$.
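A small sketch of this idea (the 0/1 pattern below is an assumed stand-in, not the actual image from Ref-3): build a $25 \times 15$ matrix with only three distinct, linearly independent column types, confirm it has exactly three non-zero singular values, and compare the storage costs.

```python
import numpy as np

# A 25x15 black/white "image" with three independent column types (illustrative only).
col_a = np.r_[np.ones(5), np.zeros(15), np.ones(5)]
col_b = np.r_[np.zeros(5), np.ones(15), np.zeros(5)]
col_c = np.r_[np.ones(10), np.zeros(10), np.ones(5)]
M = np.stack([col_a] * 5 + [col_b] * 5 + [col_c] * 5, axis=1)   # shape (25, 15), 375 entries

s = np.linalg.svd(M, compute_uv=False)
k = int(np.sum(s > 1e-10))
print(k)                           # 3 non-zero singular values, since rank(M) = 3

# Storage: k * (25 + 15 + 1) numbers instead of 25 * 15 = 375.
print(k * (25 + 15 + 1), M.size)
```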

6.7 SVD Application 2 – Noise reduction

The previous example showed how we can exploit a situation where many singular values are zero. Typically speaking, the large singular values point to where the interesting information is. For example, imagine we have used a scanner to enter this image into our computer. However, our scanner introduces some imperfections (usually called “noise“) in the image.

[Figure: the scanned image with noise.]

We may proceed in the same way: represent the data using a $25 \times 15$ matrix and perform a singular value decomposition. We find the following singular values:

$$\sigma_1 = 14.15, \quad \sigma_2 = 4.67, \quad \sigma_3 = 3.00, \quad \sigma_4 = 0.21, \quad \ldots, \quad \sigma_{15} = 0.05.$$

 

Clearly, the first three singular values are the most important, so we will assume that the others are due to the noise in the image and make the approximation

$$M \approx \sigma_1 \mathbf{u}_1 \mathbf{v}_1^T + \sigma_2 \mathbf{u}_2 \mathbf{v}_2^T + \sigma_3 \mathbf{u}_3 \mathbf{v}_3^T.$$
This leads to the following improved image.

[Figure: the denoised (rank-3) image.]
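The same truncation step can be sketched numerically (the clean matrix and noise level are assumptions for illustration): add small noise to a rank-3 matrix, then keep only the three dominant singular triplets.

```python
import numpy as np

rng = np.random.default_rng(4)

# A rank-3 "clean" 0/1 image (same construction as above) plus small scanner noise.
clean = np.stack([np.r_[np.ones(5), np.zeros(15), np.ones(5)]] * 5
                 + [np.r_[np.zeros(5), np.ones(15), np.zeros(5)]] * 5
                 + [np.r_[np.ones(10), np.zeros(10), np.ones(5)]] * 5, axis=1)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
print(s[:5])                                       # the first three values dominate the rest

denoised = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(3))
print(np.abs(denoised - clean).max())              # much closer to the clean image than `noisy`
```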

6.8 SVD Application 3 – Data analysis

Noise also arises anytime we collect data: no matter how good the instruments are, measurements will always have some error in them. If we remember the theme that large singular values point to important features in a matrix, it seems natural to use a singular value decomposition to study data once it is collected. As an example, suppose that we collect some data as shown below:

[Figure: a scatter plot of the collected data.]

We may take the data and put it into a matrix, with one column per data point, and perform a singular value decomposition. We find the singular values

$$\sigma_1 = 6.04, \qquad \sigma_2 = 0.22.$$

 

With one singular value so much larger than the other, it may be safe to assume that the small value of $\sigma_2$ is due to noise in the data and that this singular value would ideally be zero. In that case, the matrix would have rank one, meaning that all the data lies on the line defined by $\mathbf{u}_1$.

[Figure: the data points together with the line defined by $\mathbf{u}_1$.]
This brief example points to the beginnings of a field known as principal component analysis (PCA), a set of techniques that uses singular values to detect dependencies and redundancies in data.

In a similar way, singular value decompositions can be used to detect groupings in data, which explains why singular value decompositions are being used in attempts to improve Netflix’s movie recommendation system. Ratings of movies you have watched allow a program to sort you into a group of others whose ratings are similar to yours. Recommendations may be made by choosing movies that others in your group have rated highly.

8. Reference

[1]: Kirk Baker, Singular Value Decomposition Tutorial, https://www.ling.ohio-state.edu/~kbaker/pubs/Singular_Value_Decomposition_Tutorial.pdf;
[2]: Singular Value Decomposition (SVD) tutorial, http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm;
[3]: We Recommend a Singular Value Decomposition, http://www.ams.org/samplings/feature-column/fcarc-svd;
[4]: Computation of the Singular Value Decomposition, http://www.cs.utexas.edu/users/inderjit/public_papers/HLA_SVD.pdf;
[5]: CMU, SVD Tutorial, https://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/book-chapter-4.pdf.
[6]: Wiki: Singular value decomposition, https://en.wikipedia.org/wiki/Singular_value_decomposition.
[7]: Chapter 6 Eigenvalues and Eigenvectors, http://math.mit.edu/~gs/linearalgebra/ila0601.pdf.
[8]: Expressing a matrix as an expansion of its eigenvalues, http://math.stackexchange.com/questions/331826/expressing-a-matrix-as-an-expansion-of-its-eigenvalues.
[9]: Wiki: Eigenvalues and eigenvectors, https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors.

 



Original post: http://www.cnblogs.com/glory-of-family/p/5645554.html
