How do I compute FactorAnalysis scores with Python (scikit-learn)?


I need to perform exploratory factor analysis and compute a score for each observation using Python, assuming there is only one latent factor. sklearn.decomposition.FactorAnalysis() seems to be the way to go, but unfortunately the documentation and the example (I could not find any other examples) are not clear enough for me to work out how to get the job done.

I have the following test file with 41 observations of 29 variables (test.csv):

49.6,34917,24325.4,305,101350,98678,254.8,276.9,47.5,1,3,5.6,3.59,11.9,97.5,97.6,8,10,100,96.93,610.1,1718.22,6.7,28,5275.8,14667,11114.4,775,75002,74677,30,109,9.1,6.5,3.01,8.2,1558,2063.17,5.5,64,52.3,9372.5,8035.4,4.6,8111,8200,8.01,130,1.2,5,3.33,6.09,97.9,67.3,342.3,99.96,18.3,53,1457.27,4.8,47.10,13198.0,13266.4,1.1,708,695,6.1,80,0.4,4,3.1,97.8,45,82.7,99.68,4.5,13.8,31.97,2466.7,2900.6,19.7,5358,5335,10.1,23,0.5,2,3.14,97.3,97.2,9,74.5,98.2,99.64,79.8,54,1367.89,6.4,12,42.40,2999.4,2218.2,0.80,2045,2100,8.9,1.5,2.82,8.6,97.4,47.2,323.8,99.996,13.6,24,1249.67,2.7,30.59,4120.8,5314.5,0.54,14680,13688,14.9,117,2.94,3.4,97.7,11.8,872.6,9.3,52,1251.67,14,20.72,2067.7,2364,367,298,7.2,60,2.5,2.97,10.5,74.7,186.8,99.13,57,1800.45,21.14,2751.9,3066.8,3.5,1429,1498,7.7,1.6,2.86,76.7,240.1,99.93,1259.97,15,31.29,4802.6,5026.1,7859,7789,1.9,98,34,297.5,99.95,1306.44,8.5,40.40,639.0,660.3,1.3,25,0.1,94.2,4.3,50,1565.44,19.2,40.26,430.7,608.1,33,7,6,76.5,98.31,1490.08,44.99,2141.2,2357.6,3.60,339,320,8.1,0.2,5.9,58.1,206.3,99.58,13.2,95,1122.92,14.2,20.36,1453.7,1362.2,3.50,796,785,3.7,98.1,91.4,214.6,99.74,7.5,1751.98,11.5,1657.5,2421.1,2.8,722,690,11,37.4,404.2,99.98,10.9,35,1772.33,10.2,31.14,5635.2,5649.6,2681,2530,5.4,20,0.3,50.1,384.7,99.02,11.6,27,1306.08,16,20.6,1055.9,1487.9,69,65,63,137.9,5.1,48,1595.06,40.08,795.3,1174.7,1.40,85,76,2.2,39.3,149.3,98.27,1903.9,20.90,2514.0,2644.4,2.6,1173,1104,43,0.8,58.7,170.5,80.29,1292.72,20.27,870.4,949.7,1.8,252,240,31,64.5,6.6,29,1483.18,30.41,1295.1,2052.3,2.60,2248,2135,6.0,71.1,261.3,91.86,21,1221.71,9.4,41.10,3544.2,4268.9,2.1,735,730,1.7,317.2,99.62,9.8,46,1271.63,30.22,899.3,888.2,1.80,220,218,3.6,22.5,70.79,10.6,32,1508.02,40.24,1712.8,1735.5,1.30,41,3.28,16.6,720.2,1324.46,20.2,558.4,631.9,60.7,99.38,1535.08,20.21,599.9,1029,70,85.7,48.6,221.2,40,1381.44,25.6,20.10,131.3,190.6,2.9,58.9,189.4,6.9,42,1525.58,17.4,30.44,3881.4,5067.3,0.9,2732,2500,11.2,2.67,14.5,1326.2,99.06,1120.54,10.3,20.18,1024.8,1651.3,1.01,358,
345,15.9,790.2,1531.04,30.46,682.9,784.2,103,166.3,44,1373.6,13.5,20.12,370.4,420.0,1.10,2.57,51.6,120,99.85,1297.94,30.03,552.4,555.1,49,33.6,594.5,3.2,1184.34,30.21,1256.5,2434.8,1265,1138,6.3,20.1,881,99.1,3.9,1265.93,7.8,30.09,320.6,745.7,37,49.2,376.4,39,1285.11,30.08,452.7,570.9,18,4.7,0.6,2.45,97.1,19.9,1103.8,22,1562.61,21.9,30.13,967.9,947.2,74,4.0,1.4,30.1,503.1,99.999,55,1269.33,20.07,495.0,570.3,3.62,13,29.8,430.5,99.7,4.9,1461.79,14.6,20.17,681.9,537.4,113,98.3,74.3,1290.16,30.05,639.7,898.2,0.40,3.0,1221.1,1372,40.65,2067.8,2084.2,2.50,414,398,7.3,0.7,2.16,60.1,146.3,10.4,1059.68,7.4,804.4,1416.4,3.30,579,602,4.2,2492.3,95.4,1345.76,2

Using code I wrote based on the official example and this post, I got strange results. Code:

from sklearn import decomposition, preprocessing
from sklearn.model_selection import cross_val_score  # sklearn.cross_validation in old releases
import numpy as np

data = np.genfromtxt('test.csv', delimiter=',')

def compute_scores(X):
    n_components = np.arange(0, len(X), 1)
    X = preprocessing.scale(X)  # data normalisation attempt
    pca = decomposition.PCA()
    fa = decomposition.FactorAnalysis(n_components=1)
    pca_scores, fa_scores = [], []
    for n in n_components:
        pca.n_components = n
        fa.n_components = n
        # pca_scores.append(np.mean(cross_val_score(pca, X)))  # if I attempt to compute pca_scores I get the error.
        fa_scores.append(np.mean(cross_val_score(fa, X)))
    print(pca_scores, fa_scores)

compute_scores(data)

Code output:

[],[-947738125363.77405,-947738145459.86035,-947738159924.70471,-947738174662.89746,-947738206142.62854,-947738179314.44739,-947738220921.50684,-947738223447.3678,-947738277298.33545,-947738383772.58606,-947738415104.84912,-947738406361.44482,-947738394379.30359,-947738456528.69275,-947738501001.14319,-947738991338.98291,-947739381280.06506,-947739389033.33557,-947739434992.48047,-947739549511.2655,-947739355699.70959,-947739879828.51514,-947739898216.39099,-947739905804.71033,-947739902618.47791,-947738564594.54639,-948816122907.87366,-947744046601.55029,-947738624937.61292,-947738625325.73486,-947738626111.14441,-947738624973.92188,-947738625200.06946,-947738625568.65027,-947738625528.69666,-947738625359.41992,-947738624906.67529,-947738625652.12439,-947739509002.01868,-947738625426.81946,-947738625380.45837]
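A note on what these numbers are. When no `scoring` argument is given, `cross_val_score` calls the estimator's own `score` method, which for `FactorAnalysis` is the average log-likelihood of the held-out samples. That is a model-selection metric with one value per fold, not a factor score per observation. Here is a minimal sketch of this on synthetic data; the array `X`, the 3-fold split and the 5 variables are made up for illustration and are not test.csv:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.RandomState(0)
# Hypothetical data: 41 observations of 5 variables driven by one latent factor.
latent = rng.normal(size=(41, 1))
loadings = rng.normal(size=(1, 5))
X = latent @ loadings + 0.3 * rng.normal(size=(41, 5))

fa = FactorAnalysis(n_components=1)
fold_scores = cross_val_score(fa, X, cv=3)  # one value per fold

# Reproduce the first fold by hand: fit on the training part,
# then take the mean log-likelihood of the held-out part.
train_idx, test_idx = next(iter(KFold(n_splits=3).split(X)))
manual = FactorAnalysis(n_components=1).fit(X[train_idx]).score(X[test_idx])

print(fold_scores.shape, np.isclose(manual, fold_scores[0]))
```

So averaging `cross_val_score` over a range of `n_components` helps choose how many factors to keep, but it cannot yield the per-observation scores that factanal reports.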

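This result is far from what I expected. Here is R code for the same task on the same data. Its output looks right (the results are close to the output of an IBM program that can perform FA):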

data <- read.csv("test.csv", header = F)
col_names <- names(data)
drops <- c()
for (name in col_names) {
  st_dev <- sd(data[, name], na.rm = T)
  if (st_dev == 0) {
    drops <- c(drops, name)
  }
}
da_nal <- data[, !(names(data) %in% drops)]
factanal(na.omit(da_nal), factors = 1, scores = 'regression')$scores

The output of this code is:

       Factor1
1   4.89102190
2   3.65004187
3   0.14628700
4  -0.20255897
5  -0.01565570
6  -0.16438863
7   0.40835986
8  -0.25823984
9  -0.20813064
10  0.09390067
11 -0.28891296
12 -0.28882753
13 -0.26624358
14 -0.25202275
15 -0.25181326
16 -0.15653679
17 -0.28702281
18 -0.28865654
19 -0.23251509
20 -0.28066125
21 -0.18714387
22 -0.24969113
23 -0.28302552
24 -0.28712610
25 -0.29196529
26 -0.28659988
27 -0.29502523
28 -0.15802910
29 -0.27440118
30 -0.29083667
31 -0.29548220
32 -0.29461059
33 -0.23594859
34 -0.29654336
35 -0.29759659
36 -0.29085001
37 -0.29539071
38 -0.29234303
39 -0.29702103
40 -0.27595130
41 -0.27184361

So I would like to get similar results in Python (I know I won't get exactly the same numbers), but I don't know how.
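For a fair comparison, the Python side should see the same input as factanal. The R script drops zero-variance columns and then rows with missing values before fitting. A rough NumPy equivalent of that preprocessing is sketched below; the helper name `drop_constant_columns_and_nan_rows` and the toy array `demo` are hypothetical, introduced only for illustration:

```python
import numpy as np

def drop_constant_columns_and_nan_rows(data):
    """Mirror the R preprocessing: drop zero-variance columns, then rows with NaN."""
    # Sample standard deviation per column, ignoring NaN (like sd(..., na.rm = T)).
    st_dev = np.nanstd(data, axis=0, ddof=1)
    kept = data[:, st_dev != 0]
    # Rough equivalent of na.omit(): keep only rows without missing values.
    return kept[~np.isnan(kept).any(axis=1)]

# Hypothetical toy input: the middle column is constant, one row has a NaN.
demo = np.array([[1.0, 5.0, 2.0],
                 [2.0, 5.0, np.nan],
                 [3.0, 5.0, 4.0],
                 [4.0, 5.0, 8.0]])
clean = drop_constant_columns_and_nan_rows(demo)
print(clean)
```

Dropping constant columns matters before standardising, because `preprocessing.scale` cannot meaningfully rescale a column whose variance is zero.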

Solution

It seems I have figured out how to get the scores.

from sklearn import decomposition, preprocessing
import numpy as np

data = np.genfromtxt('rangir_test.csv', delimiter=',')
data = data[~np.isnan(data).any(axis=1)]  # drop rows with missing values
data_normal = preprocessing.scale(data)
fa = decomposition.FactorAnalysis(n_components=1)
fa.fit(data_normal)
for score in fa.score_samples(data_normal):
    print(score)

Unfortunately, the output (see below) is very different from that of factanal(). Any advice on decomposition.FactorAnalysis() would be appreciated.

Scikit-learn score output:

-69.8587183816
-116.353511148
-24.1529840248
-36.5366398005
-7.87165586175
-24.9012815104
-23.9148486368
-10.047780535
-4.03376369723
-7.07428842783
-7.44222705099
-6.25705487929
-13.2313513762
-13.3253819521
-9.23993173528
-7.141616656
-5.57915693405
-6.82400483045
-15.0906961724
-3.37447211233
-5.41032267015
-5.75224753811
-19.7230390792
-6.75268922909
-4.04911793705
-10.6062761691
-3.17417070498
-9.95916350005
-3.25893428094
-3.88566777358
-3.30908856716
-3.58141292341
-3.90778368669
-4.01462493538
-11.6683969455
-5.30068548445
-24.3400870389
-7.66035331181
-13.8321672858
-8.93461397086
-17.4068326999
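One likely source of the mismatch is that `FactorAnalysis.score_samples` returns the log-likelihood of each sample under the fitted model, which would explain why every value is negative. The estimated latent values come from `fa.transform`, which returns the posterior mean of the factor for each observation. That is probably the closer analogue of factanal's regression scores. Sign and scale may still differ from R, since the two implementations fit the model differently. The sketch below uses synthetic data; the loadings, the noise level and the variable names are assumptions for illustration, not values from test.csv:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import scale

rng = np.random.RandomState(0)
# Hypothetical data: 41 observations of 5 variables driven by one latent factor.
latent = rng.normal(size=(41, 1))
loadings = np.array([[0.9, 0.8, 0.7, 0.6, 0.5]])
X = scale(latent @ loadings + 0.3 * rng.normal(size=(41, 5)))

fa = FactorAnalysis(n_components=1, random_state=0).fit(X)

log_likelihoods = fa.score_samples(X)  # per-sample log-likelihood, shape (41,)
factor_scores = fa.transform(X)[:, 0]  # per-sample latent estimate, shape (41,)

# The recovered scores should track the true latent factor up to sign.
corr = np.corrcoef(factor_scores, latent[:, 0])[0, 1]
print(abs(corr))
```

In the answer's code, replacing `fa.score_samples(data_normal)` with `fa.transform(data_normal)[:, 0]` would be the corresponding change. Whether the result then matches factanal closely on this dataset is not something the sketch can confirm.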
Source: https://outofmemory.cn/langs/1197347.html
