Python: sliding-window mean that ignores missing data


Here is a convolution-based approach using np.convolve:

    mask = np.isnan(data)
    K = np.ones(win_size, dtype=int)
    out = np.convolve(np.where(mask, 0, data), K) / np.convolve(~mask, K)

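Note that for win_size = 3 this adds one extra element on each side: in its default 'full' mode, np.convolve returns len(data) + win_size - 1 values.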
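To make the trimming explicit for window sizes other than 3, here is a minimal sketch of the same sum-over-count idea. It assumes an odd win_size and a centered window; the function name nan_moving_mean is hypothetical and does not appear in the original answer. It also replaces the plain division with np.divide and a where mask, so all-NaN windows yield NaN without a RuntimeWarning.

```python
import numpy as np

def nan_moving_mean(data, win_size):
    """Centered moving mean that skips NaNs (sketch; assumes odd win_size)."""
    data = np.asarray(data, dtype=float)
    mask = np.isnan(data)
    kernel = np.ones(win_size)
    # Sum of valid values and count of valid values in each window.
    sums = np.convolve(np.where(mask, 0.0, data), kernel)
    counts = np.convolve((~mask).astype(float), kernel)
    # 'full' mode yields len(data) + win_size - 1 values; drop the extras.
    pad = (win_size - 1) // 2
    sums = sums[pad:len(sums) - pad]
    counts = counts[pad:len(counts) - pad]
    # Windows with no valid values stay NaN; no division by zero happens.
    out = np.full(data.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

example = np.array([1.0, np.nan, 3.0, 4.0, np.nan, np.nan, np.nan, 8.0])
result = nan_moving_mean(example, 3)
print(result)
```

At the edges the window is simply truncated, which matches the behaviour of the convolution with implicit zero padding.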

If you are working with 2D data, SciPy's 2D convolution can be used in the same way.
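As a rough illustration of the 2D case, the sketch below applies the same sum-over-count trick with scipy.signal.convolve2d. The choice of that particular function, the helper name nan_moving_mean_2d, and the zero-filled borders are assumptions made here for illustration; the original text only mentions SciPy's 2D convolution in general.

```python
import numpy as np
from scipy.signal import convolve2d

def nan_moving_mean_2d(data, win_shape):
    """2D windowed mean that skips NaNs (sketch; assumes odd window sides)."""
    data = np.asarray(data, dtype=float)
    mask = np.isnan(data)
    kernel = np.ones(win_shape)
    # mode="same" keeps the input shape; with the default zero fill, cells
    # outside the array add nothing to either the sum or the count.
    sums = convolve2d(np.where(mask, 0.0, data), kernel, mode="same")
    counts = convolve2d((~mask).astype(float), kernel, mode="same")
    # Windows with no valid values stay NaN.
    out = np.full(data.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0.5)
    return out

grid = np.array([[1.0, 2.0, np.nan],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])
smoothed = nan_moving_mean_2d(grid, (3, 3))
print(smoothed)
```

Here the center cell averages the seven valid values of the whole grid, while the corner cells average only the valid values among their four in-bounds neighbours.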

Approaches:

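    import numpy as np

    def original_app(data, win_size):
        # Loop-based reference: mean of the finite values in each centered window
        result = np.zeros(data.size)
        for count in range(data.size):
            # Integer division keeps the slice bounds valid in Python 3
            start = max(count - (win_size - 1) // 2, 0)
            stop = min(count + (win_size + 1) // 2, data.size)
            part_data = data[start:stop]
            mask = np.isfinite(part_data)
            if np.sum(mask) != 0:
                result[count] = np.sum(part_data[mask]) / np.sum(mask)
            else:
                result[count] = np.nan  # no valid values in this window
        return result

    def numpy_app(data, win_size):
        mask = np.isnan(data)
        K = np.ones(win_size, dtype=int)
        # Windowed sum of valid values divided by windowed count of valid values
        out = np.convolve(np.where(mask, 0, data), K) / np.convolve(~mask, K)
        return out[1:-1]  # Slice out the one-extra elems on sides (win_size = 3)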

Sample run:

    In [118]: #Construct sample data
         ...: n = 50
         ...: n_miss = 20
         ...: win_size = 3
         ...: data= np.random.random(50)
         ...: data[np.random.randint(0,n-1, n_miss)] = np.nan
         ...:

    In [119]: original_app(data, win_size = 3)
    Out[119]:
    array([ 0.88356487,  0.86829731,  0.85249541,  0.83776219,         nan,
                   nan,  0.61054015,  0.63111926,  0.63111926,  0.65169837,
            0.1857301 ,  0.58335324,  0.42088104,  0.5384565 ,  0.31027752,
            0.40768907,  0.3478563 ,  0.34089655,  0.55462903,  0.71784816,
            0.93195716,         nan,  0.41635575,  0.52211653,  0.65053379,
            0.76762282,  0.72888574,  0.35250449,  0.35250449,  0.14500637,
            0.06997668,  0.22582318,  0.18621848,  0.36320784,  0.19926647,
            0.24506199,  0.09983572,  0.47595439,  0.79792941,  0.5982114 ,
            0.42389375,  0.28944089,  0.36246113,  0.48088139,  0.71105449,
            0.60234163,  0.40012839,  0.45100475,  0.41768466,  0.41768466])

    In [120]: numpy_app(data, win_size = 3)
    __main__:36: RuntimeWarning: invalid value encountered in divide
    Out[120]:
    array([ 0.88356487,  0.86829731,  0.85249541,  0.83776219,         nan,
                   nan,  0.61054015,  0.63111926,  0.63111926,  0.65169837,
            0.1857301 ,  0.58335324,  0.42088104,  0.5384565 ,  0.31027752,
            0.40768907,  0.3478563 ,  0.34089655,  0.55462903,  0.71784816,
            0.93195716,         nan,  0.41635575,  0.52211653,  0.65053379,
            0.76762282,  0.72888574,  0.35250449,  0.35250449,  0.14500637,
            0.06997668,  0.22582318,  0.18621848,  0.36320784,  0.19926647,
            0.24506199,  0.09983572,  0.47595439,  0.79792941,  0.5982114 ,
            0.42389375,  0.28944089,  0.36246113,  0.48088139,  0.71105449,
            0.60234163,  0.40012839,  0.45100475,  0.41768466,  0.41768466])

Runtime test:

    In [122]: #Construct sample data
         ...: n = 50000
         ...: n_miss = 20000
         ...: win_size = 3
         ...: data= np.random.random(n)
         ...: data[np.random.randint(0,n-1, n_miss)] = np.nan
         ...:

    In [123]: %timeit original_app(data, win_size = 3)
    1 loops, best of 3: 1.51 s per loop

    In [124]: %timeit numpy_app(data, win_size = 3)
    1000 loops, best of 3: 1.09 ms per loop

    In [125]: import pandas as pd

    # @jdehesa's pandas solution
    In [126]: %timeit pd.Series(data).rolling(window=3, min_periods=1).mean()
    100 loops, best of 3: 3.34 ms per loop
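One point worth keeping in mind when comparing these timings: pandas' rolling uses a trailing window by default, whereas the convolution result above is centered. The sketch below, which assumes win_size = 3 and uses a seeded generator and variable names chosen here for illustration, passes center=True so that the two results can be compared element by element.

```python
import numpy as np
import pandas as pd

# Sample data with missing values (seeded so the run is repeatable).
rng = np.random.default_rng(0)
data = rng.random(50)
data[rng.integers(0, 50, 20)] = np.nan

win_size = 3
mask = np.isnan(data)
kernel = np.ones(win_size)

# Convolution approach; 0/0 for all-NaN windows is expected, so silence it.
with np.errstate(invalid="ignore"):
    conv_mean = (np.convolve(np.where(mask, 0.0, data), kernel)
                 / np.convolve((~mask).astype(float), kernel))[1:-1]

# center=True aligns the pandas window with the centered convolution window;
# min_periods=1 lets a window with at least one valid value produce a mean.
pandas_mean = (pd.Series(data)
               .rolling(window=win_size, min_periods=1, center=True)
               .mean()
               .to_numpy())

print(np.allclose(conv_mean, pandas_mean, equal_nan=True))
```

Without center=True, the pandas output would be shifted by one position relative to the convolution result for this window size.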


Source: 内存溢出. Original URL: http://outofmemory.cn/zaji/4937999.html
