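I am computing the Spearman correlation for a matrix. I found that scipy.stats.spearmanr gives different results for matrix input and for two-array input. The results also differ from pandas.DataFrame.corr.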
    from scipy.stats import spearmanr  # scipy 1.0.1
    import pandas as pd                # 0.22.0
    import numpy as np

    # Data
    # NOTE: columns "C" and "E" are shorter than the others in this copy of the
    # question; the missing values were lost, so they must be restored before
    # pd.DataFrame will accept the dict.
    X = pd.DataFrame({"A": [-0.4, 1, 12, 78, 84, 26, 0],
                      "B": [-0.4, 3.3, 54, 87, 25, np.nan, 1.2],
                      "C": [np.nan, 56, 143, 11, np.nan],
                      "D": [0, -9.3, 23, 72, -2, -0.3, -0.4],
                      "E": [78, -1, -11, 323]})

    matrix_rho_scipy = spearmanr(X, nan_policy='omit', axis=0)[0]
    matrix_rho_pandas = X.corr('spearman')

    print(matrix_rho_scipy == matrix_rho_pandas.values)  # All False except diagonal
    print(spearmanr(X['A'], X['B'], nan_policy='omit', axis=0)[0])  # 0.8839285714285714 from scipy 1.0.1
    print(spearmanr(X['A'], X['B'], nan_policy='omit', axis=0)[0])  # 0.8829187134416477 from scipy 1.1.0
    print(matrix_rho_scipy[0, 1])          # 0.8263621207201486
    print(matrix_rho_pandas.values[0, 1])  # 0.8829187134416477
Later I found that the rho from pandas is the same as the rho from R.
    # NOTE: the vectors below are truncated in this copy of the question.
    X = data.frame(A=c(-0.4,0), B=c(-0.4,NaN,1.2), C=c(NaN,NaN), D=c(0,-0.4), E=c(78,323))
    cor.test(X$A, X$B, method='spearman', exact=FALSE, na.action="na.omit")  # 0.8829187
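However, the corr method in pandas cannot be used on large tables (others have reported the same problem; in my case the size is 16,000).

Thanks to a test by Warren Weckesser, I found that the two-array result from SciPy 1.1.0 (but not 1.0.1) matches the results from pandas and R.

Any suggestions or comments are welcome. Thank you.

I am using Python 3.6.2 (Anaconda) on Mac OS 10.10.5.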
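Best answer

When the input is a single array and an axis is given, scipy.stats.spearmanr does not appear to handle nan values as expected. Here is a script that compares several ways of computing pairwise Spearman rank correlations:

    import numpy as np
    import pandas as pd
    from scipy.stats import spearmanr

    # NOTE: rows 2 and 3 are shorter than row 1 in this copy of the script; the
    # missing values were lost, so the array must be completed before this runs.
    x = np.array([[np.nan, 3.0, 4.0, 5.0, 5.1, 6.0, 9.2],
                  [5.0, 4.1, 4.8, 4.9, 4.1],
                  [0.5, 7.1, 3.8, 8.0, 7.6]])

    # Whole-array call: one variable per row (axis=1).
    r = spearmanr(x, nan_policy='omit', axis=1)[0]
    print("spearmanr, array:           %11.7f %11.7f %11.7f" % (r[0, 1], r[0, 2], r[1, 2]))

    # One call per pair of rows.
    r01 = spearmanr(x[0], x[1], nan_policy='omit')[0]
    r02 = spearmanr(x[0], x[2], nan_policy='omit')[0]
    r12 = spearmanr(x[1], x[2], nan_policy='omit')[0]
    print("spearmanr, individual:      %11.7f %11.7f %11.7f" % (r01, r02, r12))

    # pandas expects one variable per column, hence the transpose.
    df = pd.DataFrame(x.T)
    c = df.corr('spearman')
    print("Pandas df.corr('spearman'): %11.7f %11.7f %11.7f" % (c[0][1], c[0][2], c[1][2]))

    print("R cor.test:                   0.2051957   0.4857143  -0.4707919")
    print('  (method="spearman", continuity=FALSE)')

    """
    # R code:
    > x0 = c(NA,3,4,5,9.2)
    > x1 = c(5.0,NA,4.1)
    > x2 = c(0.5,7.6)
    > cor.test(x0, x1, method="spearman", continuity=FALSE)
    > cor.test(x0, x2, method="spearman", continuity=FALSE)
    > cor.test(x1, x2, method="spearman", continuity=FALSE)
    """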
Output:

    spearmanr, array:            -0.0727393  -0.0714286  -0.4728054
    spearmanr, individual:        0.2051957   0.4857143  -0.4707919
    Pandas df.corr('spearman'):   0.2051957   0.4857143  -0.4707919
    R cor.test:                   0.2051957   0.4857143  -0.4707919
      (method="spearman", continuity=FALSE)
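The three matching rows in that output are consistent with a "pairwise complete" definition of Spearman's rho: drop every observation where either variable is nan, rank what remains, then take the Pearson correlation of the ranks. The following is only a pure-Python sketch of that definition, not the actual implementation in SciPy, pandas, or R. The function names and the sample data are made up for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def pearson(u, v):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


def spearman_pairwise_complete(x, y):
    """Drop pairs containing a nan first, then correlate the ranks."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical sample data (not the data from the question or the answer).
nan = float("nan")
x0 = [1.0, 3.0, 2.0, 4.0, nan]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
rho_01 = spearman_pairwise_complete(x0, x1)
print(rho_01)  # approximately 0.8
```

The order of the two steps matters: ranking first and dropping nan entries afterwards would leave gaps in the ranks and can give a different coefficient.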
My recommendation is not to call scipy.stats.spearmanr in the form spearmanr(x, nan_policy='omit', axis=<whatever>). Use the corr() method of a Pandas DataFrame, or use a loop to compute the values pairwise with spearmanr(x0, x1, nan_policy='omit').
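The loop-based option could look roughly like the sketch below. It assumes one variable per column and NaN as the missing-value marker. The helper name `pairwise_spearman` and the sample data are hypothetical; only `spearmanr` itself comes from SciPy.

```python
import numpy as np
from scipy.stats import spearmanr


def pairwise_spearman(data):
    """Spearman correlation matrix built from one two-array call per pair.

    data: 2-D array-like, one variable per column; NaN marks missing values.
    """
    data = np.asarray(data, dtype=float)
    n_vars = data.shape[1]
    rho = np.eye(n_vars)  # diagonal is fixed at 1.0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            # Two-array form with nan_policy='omit', as recommended above.
            r = spearmanr(data[:, i], data[:, j], nan_policy='omit')[0]
            rho[i, j] = rho[j, i] = r
    return rho


# Hypothetical sample data (not the data from the question or the answer).
sample = np.array([[1.0, 1.0, 2.0],
                   [3.0, 2.0, 1.0],
                   [2.0, 3.0, 4.0],
                   [4.0, 4.0, 3.0],
                   [np.nan, 5.0, 5.0]])
rho = pairwise_spearman(sample)
print(rho)
```

The number of spearmanr calls grows quadratically with the number of variables, so for a very wide table this loop may be slow. It trades speed for agreement with the pairwise results.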