再谈MK趋势检验之泰尔-森估算_python

通常在做线性倾斜估计时，多用最小二乘法去拟合。但利用pymannkendall包进行趋势检验时，默认采用的是泰尔-森估算，而并非最小二乘法。当数据存在异常值时，二者差异较大，记录在此，分享给有需要的同学避坑。
泰尔－森估算（Theil–Sen estimator）是非参数统计中一种拟合直线的稳健模型，对异常值不敏感，它比非鲁棒简单线性回归明显更准确。
下面就举个栗子说明一下，具体详见代码注释。

# -*- encoding: utf-8 -*-
'''
@File    :   mk.py
@Time    :   2022/03/23 10:52:13
@Author  :   HMX 
@Version :   1.0
@Contact :   [email protected]
'''
# here put the import lib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymannkendall as mk
from matplotlib import rcParams, ticker
from matplotlib.ticker import MultipleLocator
from sklearn.linear_model import LinearRegression


def cm2inch(x, y):
    return x/2.54,y/2.54


rcParams['font.sans-serif']=['SimHei']
# 读取数据
fn = r'D:\公众号\no.9.py\MK_trend_data.txt'
df = pd.read_csv(fn,header = 0,index_col=0)
print(df)

# 计算趋势
# 最小二乘法
model = LinearRegression()
x = np.arange(2003,2021).reshape(-1,1)
y = df.dd.values
model.fit(x,y)
print(model.coef_[0])

# 泰尔-森
mk_k = mk.original_test(df.dd).slope
mk_b = mk.original_test(df.dd).intercept
print(mk_k)


fig = plt.figure(figsize=(cm2inch(8,6)))
ax1 = plt.gca()
ax1.plot(df.index,df.dd,color = 'k',marker = '.')
ax1.plot(x,model.predict(x),'--',color = 'tab:blue',label = '最小二乘法')
ax1.plot(x,(x-2003)*mk_k+mk_b,':',color = 'tab:orange',label = '泰尔-森')
ax1.set_yticks(np.arange(0,4001,1000))
ax1.set_xticks(np.arange(2003,2021,5))
ax1.set_xlim(2002,2021)
ax1.set_ylim(0,4000)
xminorticks= MultipleLocator(1)
ax1.xaxis.set_minor_locator(xminorticks)
ax1.tick_params(axis='both', which='both', direction='in',
            color='black', labelcolor='black', labelsize=10)
formatter = ticker.ScalarFormatter(useMathText=True)
formatter.set_scientific(True) 
formatter.set_powerlimits((0,0)) 
ax1.yaxis.set_major_formatter(formatter)
ax1.legend()
plt.tight_layout()
plt.savefig(r'D:\公众号\no.9.py\MK——trend.png',dpi = 600)
plt.show()

结果如下：

当然由于数据的问题，这个栗子不够直观，但也能看出二者的区别。现只取前8年来做趋势分析比较。由下图可见二者的差距显著，值得注意。

# 绘制前8年
dfy1 = df.loc[:2010]
print(dfy1.dd)
x1 = np.arange(2003,2011).reshape(-1,1)
y1 = dfy1.dd.values
model1 = LinearRegression()
model1.fit(x1,y1)
mk1_k = mk.original_test(dfy1.dd).slope
mk1_b = mk.original_test(dfy1.dd).intercept
ax1.plot(x1,model1.predict(x1),'--',color = 'tab:blue',label = '最小二乘法')
ax1.plot(x1,(x1-2003)*mk1_k+mk1_b,':',color = 'tab:orange',label = '泰尔-森')