The MATLAB documentation for the pca function is not very intuitive. Many people simply want the dimensionality-reduced data, but from the official documentation it is hard to tell at a glance which output argument actually holds the reduced features. This post therefore records how to use MATLAB's built-in pca function to reduce the dimensionality of data.
PS. This post does not explain the theory behind PCA; it only records how to use MATLAB's pca function.
Input: X is an n-by-d sample matrix, where n is the number of samples and d is the feature dimension.
Outputs:
(1) coeff holds the principal component coefficients, i.e. the eigenvectors of the sample covariance matrix.
(2) score holds the principal components, i.e. the projection of the samples X onto the low-dimensional space, which is exactly the dimensionality-reduced data we want.
Note: score has the same dimensions as the original X; to reduce to k dimensions, simply take the first k columns of score.
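MATLAB code cannot be run here, so as an illustrative sketch the same computation can be reproduced in Python with NumPy: center the data, then take the SVD, which mirrors pca's default behavior. The function name pca_like and all of the data below are made up for the example.

```python
import numpy as np

def pca_like(X):
    """Mimic the core of MATLAB's pca: center the columns, then use SVD.

    Returns (coeff, score): coeff's columns are the principal component
    coefficients (eigenvectors of the sample covariance matrix), score is
    the projection of X onto the principal component space.
    """
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coeff = Vt.T                                     # d-by-d loadings
    score = Xc @ coeff                               # n-by-d scores
    return coeff, score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))    # made-up data: n = 100 samples, d = 5 features
coeff, score = pca_like(X)

k = 2
X_reduced = score[:, :k]         # keep the first k columns -> k-dim data
print(X_reduced.shape)           # (100, 2)
```

Because SVD returns singular values in descending order, the columns of score are already sorted by decreasing variance, so truncating to the first k columns keeps the most informative directions.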
In addition, score can be computed from coeff directly. The steps are:
(1) Compute the mean vector of X along the feature dimension (each column of X is one feature, so the mean is taken over the rows).
(2) Multiply the centered X (X minus the mean vector) by coeff to obtain score.
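The two steps can be sketched in NumPy as follows (the data is made up; the variable names test and res follow the text, and the reference coeff/score pair is recomputed via centering plus SVD as MATLAB's pca does by default):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))         # made-up sample data: n = 50, d = 4

# Reference result: centering + SVD, as MATLAB's pca does by default.
mu = X.mean(axis=0)                  # step (1): mean of each column (feature)
Xc = X - mu
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coeff = Vt.T
score = Xc @ coeff

# Step (2): recompute the projection from coeff and compare with score.
test = (X - mu) @ coeff
res = np.linalg.norm(test - score)
print(res)                           # essentially zero
```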
After running it, you can see that res is extremely small, which confirms that test and score are essentially identical.
1,4: MATLAB does have help documentation. I don't understand what you mean by "de-centering"; the PCA result is expressed in the array's own dimensions. The help documentation follows, please read it carefully:
coeff = pca(X) returns the principal component coefficients, also known as loadings, for the n-by-p data matrix X. Rows of X correspond to observations and columns correspond to variables. The coefficient matrix is p-by-p. Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance. By default, pca centers the data and uses the singular value decomposition (SVD) algorithm.
coeff = pca(X,Name,Value) returns any of the output arguments in the previous syntaxes using additional options for computation and handling of special data types, specified by one or more Name,Value pair arguments.
For example, you can specify the number of principal components pca returns or an algorithm other than SVD to use.
[coeff,score,latent] = pca(___) also returns the principal component scores in score and the principal component variances in latent. You can use any of the input arguments in the previous syntaxes.
Principal component scores are the representations of X in the principal component space. Rows of score correspond to observations, and columns correspond to components.
The principal component variances are the eigenvalues of the covariance matrix of X.
[coeff,score,latent,tsquared] = pca(___) also returns the Hotelling's T-squared statistic for each observation in X.
[coeff,score,latent,tsquared,explained,mu] = pca(___) also returns explained, the percentage of the total variance explained by each principal component and mu, the estimated mean of each variable in X.
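The relationships the documentation describes for latent, explained, and mu can be checked with a NumPy sketch (made-up data; latent is derived from the singular values here, which is equivalent to what pca computes):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 3))         # made-up data: 80 observations, 3 variables

mu = X.mean(axis=0)                  # mu: estimated mean of each variable
Xc = X - mu
s = np.linalg.svd(Xc, compute_uv=False)
latent = s**2 / (X.shape[0] - 1)     # component variances from singular values

# latent equals the eigenvalues of the covariance matrix of X (descending).
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]
print(np.allclose(latent, eigvals))  # True

# explained: percentage of total variance per component; sums to 100.
explained = 100 * latent / latent.sum()
print(explained)
```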
2. The difference between PCA and SVD is that they decompose the matrix in different ways. I suggest you look at the Wikipedia articles on SVD and PCA; the formulas there are laid out clearly.
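The commenter's point can be illustrated with a NumPy sketch (made-up data): eigendecomposition of the covariance matrix and SVD of the centered data matrix recover the same principal directions, up to a sign flip per column.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(70, 3))
Xc = X - X.mean(axis=0)

# PCA route: eigenvectors of the sample covariance matrix, variance-descending.
w, V = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(w)[::-1]
V = V[:, order]

# SVD route: right singular vectors of the centered data matrix.
Vt = np.linalg.svd(Xc, full_matrices=False)[2]

# The two sets of directions agree up to a sign flip per column.
print(np.allclose(np.abs(V), np.abs(Vt.T)))  # True
```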