Here's a convolution-based approach using np.convolve -
    mask = np.isnan(data)
    K = np.ones(win_size, dtype=int)
    out = np.convolve(np.where(mask, 0, data), K) / np.convolve(~mask, K)
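Please note that this leaves one extra element on either side for win_size = 3. In general, np.convolve in its default 'full' mode returns data.size + win_size - 1 elements.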
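To make the note about the extra elements concrete, here is a minimal sketch that trims the 'full' convolution output back to the input length. The helper name `nan_moving_mean` is hypothetical (not part of the original answer), and the trimming assumes an odd `win_size` so that the window is centered on each element.

```python
import numpy as np

def nan_moving_mean(data, win_size):
    """Centered moving mean that ignores NaNs (hypothetical helper, odd win_size assumed)."""
    data = np.asarray(data, dtype=float)
    mask = np.isnan(data)
    K = np.ones(win_size, dtype=int)
    # Sum of the non-NaN values in each window (NaNs replaced by 0)
    sums = np.convolve(np.where(mask, 0.0, data), K)
    # Number of non-NaN values in each window
    counts = np.convolve((~mask).astype(int), K)
    # Windows with no valid values give 0/0; silence the warning and keep the NaN
    with np.errstate(invalid="ignore", divide="ignore"):
        out = sums / counts
    # 'full' mode adds (win_size - 1) elements in total; keep the centered part
    half = (win_size - 1) // 2
    return out[half:half + data.size]

data = np.array([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 5.0])
full_len = np.convolve(data, np.ones(3)).size  # 9, i.e. data.size + win_size - 1
res = nan_moving_mean(data, 3)                 # same length as data
```

A window that contains only NaNs produces NaN in the result, which matches the behaviour of the loop-based version.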
If you are working with 2D data, we could use Scipy's 2D convolution.
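As a rough sketch of how the same idea might carry over to 2D, the snippet below uses `scipy.signal.convolve2d` with `mode="same"`, so no trimming is needed. The function name `nan_moving_mean_2d` and the default 3x3 window are assumptions made for illustration. They do not come from the original answer.

```python
import numpy as np
from scipy.signal import convolve2d

def nan_moving_mean_2d(data, win_shape=(3, 3)):
    """NaN-ignoring moving mean over a 2D window (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    mask = np.isnan(data)
    K = np.ones(win_shape, dtype=int)
    # mode="same" keeps the output the same shape as the input;
    # the default zero-filled boundary adds nothing to sums or counts
    sums = convolve2d(np.where(mask, 0.0, data), K, mode="same")
    counts = convolve2d((~mask).astype(int), K, mode="same")
    # All-NaN windows give 0/0; silence the warning and keep the NaN
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts

grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])
out2d = nan_moving_mean_2d(grid)
```

At the edges the window is effectively truncated, because the zero padding contributes to neither the sum nor the count.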
Approaches -
    import numpy as np

    def original_app(data, win_size):
        # Compute the mean over a centered window, skipping non-finite values
        result = np.zeros(data.size)
        for count in range(data.size):
            # Integer division keeps the slice bounds valid as indices
            part_data = data[max(count - (win_size - 1) // 2, 0):
                             min(count + (win_size + 1) // 2, data.size)]
            mask = np.isfinite(part_data)
            if np.sum(mask) != 0:
                result[count] = np.sum(part_data[mask]) / np.sum(mask)
            else:
                result[count] = np.nan
        return result

    def numpy_app(data, win_size):
        mask = np.isnan(data)
        K = np.ones(win_size, dtype=int)
        out = np.convolve(np.where(mask, 0, data), K) / np.convolve(~mask, K)
        return out[1:-1]  # Slice out the one-extra elems on sides (win_size = 3)
Sample run -
    In [118]: # Construct sample data
         ...: n = 50
         ...: n_miss = 20
         ...: win_size = 3
         ...: data= np.random.random(50)
         ...: data[np.random.randint(0,n-1, n_miss)] = np.nan
         ...:

    In [119]: original_app(data, win_size = 3)
    Out[119]:
    array([ 0.88356487,  0.86829731,  0.85249541,  0.83776219,         nan,
                   nan,  0.61054015,  0.63111926,  0.63111926,  0.65169837,
            0.1857301 ,  0.58335324,  0.42088104,  0.5384565 ,  0.31027752,
            0.40768907,  0.3478563 ,  0.34089655,  0.55462903,  0.71784816,
            0.93195716,         nan,  0.41635575,  0.52211653,  0.65053379,
            0.76762282,  0.72888574,  0.35250449,  0.35250449,  0.14500637,
            0.06997668,  0.22582318,  0.18621848,  0.36320784,  0.19926647,
            0.24506199,  0.09983572,  0.47595439,  0.79792941,  0.5982114 ,
            0.42389375,  0.28944089,  0.36246113,  0.48088139,  0.71105449,
            0.60234163,  0.40012839,  0.45100475,  0.41768466,  0.41768466])

    In [120]: numpy_app(data, win_size = 3)
    __main__:36: RuntimeWarning: invalid value encountered in divide
    Out[120]:
    array([ 0.88356487,  0.86829731,  0.85249541,  0.83776219,         nan,
                   nan,  0.61054015,  0.63111926,  0.63111926,  0.65169837,
            0.1857301 ,  0.58335324,  0.42088104,  0.5384565 ,  0.31027752,
            0.40768907,  0.3478563 ,  0.34089655,  0.55462903,  0.71784816,
            0.93195716,         nan,  0.41635575,  0.52211653,  0.65053379,
            0.76762282,  0.72888574,  0.35250449,  0.35250449,  0.14500637,
            0.06997668,  0.22582318,  0.18621848,  0.36320784,  0.19926647,
            0.24506199,  0.09983572,  0.47595439,  0.79792941,  0.5982114 ,
            0.42389375,  0.28944089,  0.36246113,  0.48088139,  0.71105449,
            0.60234163,  0.40012839,  0.45100475,  0.41768466,  0.41768466])
Runtime test -
    In [122]: # Construct sample data
         ...: n = 50000
         ...: n_miss = 20000
         ...: win_size = 3
         ...: data= np.random.random(n)
         ...: data[np.random.randint(0,n-1, n_miss)] = np.nan
         ...:

    In [123]: %timeit original_app(data, win_size = 3)
    1 loops, best of 3: 1.51 s per loop

    In [124]: %timeit numpy_app(data, win_size = 3)
    1000 loops, best of 3: 1.09 ms per loop

    In [125]: import pandas as pd

    # @jdehesa's pandas solution
    In [126]: %timeit pd.Series(data).rolling(window=3, min_periods=1).mean()
    100 loops, best of 3: 3.34 ms per loop