I would like to refine two points that I think are important:
I'll be assuming your data matrix is an m×d matrix organized such that rows are data samples (m samples) and columns are features (d features).
The first point is that SVD performs low-rank matrix approximation.
Your input to SVD is a number k (smaller than m and d), and the SVD procedure will return a set of k vectors of d dimensions (which can be organized in a k×d matrix) and a set of k coefficients for each data sample (there are m data samples, so these can be organized in an m×k matrix), such that for each sample, the linear combination of its k coefficients with the k vectors best reconstructs that sample (in the Euclidean-distance sense). This holds for all data samples simultaneously.
So in a sense, the SVD procedure finds the optimal k vectors that together span a subspace in which most of the data samples lie (up to a small reconstruction error).
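As a minimal sketch of this (assuming NumPy, with a hypothetical random data matrix X), the k basis vectors and per-sample coefficients can be read off a truncated SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, k = 100, 10, 3
X = rng.normal(size=(m, d))      # m samples, d features (hypothetical data)

# Full SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

basis = Vt[:k]                   # k vectors of d dimensions (k x d)
coeffs = U[:, :k] * s[:k]        # k coefficients per sample (m x k)

# Each sample is approximated by the linear combination of its k
# coefficients with the k basis vectors; by the Eckart-Young theorem
# this is the best rank-k approximation in the Euclidean/Frobenius sense.
X_approx = coeffs @ basis        # m x d, rank at most k
```

Keeping only the top k singular triplets is what makes this a compression: the m×d matrix is summarized by a k×d basis plus an m×k coefficient matrix.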
PCA on the other hand is:
1) subtract the mean sample from each row of the data matrix.
2) perform SVD on the resulting matrix.
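The two steps above can be sketched in a few lines (again assuming NumPy and a hypothetical data matrix X):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))     # hypothetical data: 50 samples, 5 features

# 1) subtract the mean sample from each row
X_centered = X - X.mean(axis=0)

# 2) perform SVD on the resulting matrix
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

principal_axes = Vt              # rows are the principal directions
scores = U * s                   # coordinates of each sample in that basis
```

Libraries such as scikit-learn implement PCA essentially this way internally (centering followed by an SVD), though the exact conventions for signs and scaling can differ.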
So, the second point is that PCA gives you as output the subspace that spans the deviations from the mean data sample, while SVD gives you a subspace that spans the data samples themselves (or, equivalently, the deviations from zero). Note that these two subspaces are usually NOT the same; they coincide only when the mean data sample is zero.
In order to understand a little better why they are not the same, let's consider a data set where all feature values for all data samples lie in the range 999-1001, and each feature's mean is 1000.
From the SVD point of view, the main way in which these samples deviate from zero is along the vector (1,1,1,...,1).
From the PCA point of view, on the other hand, the main way in which these data samples deviate from the mean data sample depends on the precise distribution of the data around the mean data sample...
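This example is easy to check numerically. The sketch below (assuming NumPy; the data is synthetic, uniform in 999-1001) compares the top direction found by raw SVD with the top direction found by PCA:

```python
import numpy as np

rng = np.random.default_rng(2)
m, d = 1000, 4
# Synthetic data: every value near 1000, each feature's mean ~1000
X = 1000 + rng.uniform(-1, 1, size=(m, d))

# SVD on the raw data: top right-singular vector
_, _, Vt_svd = np.linalg.svd(X, full_matrices=False)
v_svd = Vt_svd[0]

# PCA: center first, then SVD
Xc = X - X.mean(axis=0)
_, _, Vt_pca = np.linalg.svd(Xc, full_matrices=False)
v_pca = Vt_pca[0]

ones_dir = np.ones(d) / np.sqrt(d)
print(abs(v_svd @ ones_dir))   # essentially 1: SVD's top direction is (1,...,1)
print(abs(v_pca @ ones_dir))   # typically much smaller: PCA's depends on the noise
```

The raw-SVD direction is pinned to (1,1,...,1) by the large common offset of 1000, while the PCA direction is determined entirely by the small fluctuations around the mean.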
In short, we can think of SVD as "something that compactly summarizes the main ways in which my data deviates from zero" and PCA as "something that compactly summarizes the main ways in which my data deviates from the mean data sample".