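Update: an (almost) fully vectorized version is below, in "new_function2"...
I'll add comments to explain things in a bit.
It gives a ~50x speedup, and an even larger speedup is possible if you're okay with the output being numpy arrays instead of lists. As is: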
```
In [86]: %timeit new_function2(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 1.15 s per loop
```
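Here is a minimal sketch of the "numpy arrays instead of lists" variant, assuming NumPy. `new_function2_arrays` is a hypothetical name, not part of the answer. It builds the same (offset, start index) grid with broadcasting and fancy indexing instead of `np.vstack`, and it returns four parallel arrays `(I, J, ret, vol)` rather than zipping them into Python tuples. Skipping that zip is presumably where the larger speedup mentioned above comes from.

```python
import numpy as np

def new_function2_arrays(close, volume, interval_length):
    """Hypothetical variant of new_function2: same filter, but returns
    four parallel NumPy arrays (I, J, ret, vol) instead of a list of tuples."""
    n = volume.size - interval_length              # number of start indices i
    ks = np.arange(1, interval_length)             # offsets k = j - i
    # Row k-1, column i holds the index j = i + k.
    J = ks[:, None] + np.arange(n)[None, :]
    I = np.broadcast_to(np.arange(n), J.shape)     # start index i for every cell
    vol = volume[J].cumsum(axis=0)                 # sum(volume[i+1:j+1])
    ret = close[J] / close[:n]                     # close[j] / close[i]
    mask = (ret > 1.0001) & (ret < 1.5) & (vol > 100)
    return I[mask], J[mask], ret[mask], vol[mask]
```

If tuples are needed later, `list(zip(*new_function2_arrays(close, volume, INTERVAL_LENGTH)))` turns the result back into the list form.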
You can replace the inner loop with a call to np.cumsum(). See the "new_function" function below. This gives a considerable speedup...
```
In [61]: %timeit new_function(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 15.7 s per loop
```
versus
```
In [62]: %timeit old_function(close, volume, INTERVAL_LENGTH)
1 loops, best of 3: 53.1 s per loop
```
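To see why the cumsum() replacement is valid, here is a small sketch on made-up numbers (the array values, `i` and `interval_length` are purely illustrative). For a fixed start index `i`, the inner loop recomputes `sum(volume[i+1:j+1])` from scratch for every `j`, while one `cumsum()` over `volume[i+1:i+interval_length]` yields all of those running totals in a single pass. Position `p` of the result corresponds to `j = i + 1 + p`.

```python
import numpy as np

volume = np.array([5.0, 3.0, 8.0, 1.0, 4.0, 7.0])
i, interval_length = 1, 4

# Inner-loop version: one slice-and-sum per j.
loop_vols = [volume[i + 1:j + 1].sum() for j in range(i + 1, i + interval_length)]

# cumsum version: every running total at once; index p maps to j = i + 1 + p.
cum_vols = volume[i + 1:i + interval_length].cumsum()

print(cum_vols.tolist())  # [8.0, 9.0, 13.0]
```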
It should be possible to vectorize the whole thing and avoid for loops entirely, though... Give me a minute, and I'll see what I can do...
```python
import numpy as np

ARRAY_LENGTH = 500000
INTERVAL_LENGTH = 15
close = np.arange(ARRAY_LENGTH, dtype=np.float64)
volume = np.arange(ARRAY_LENGTH, dtype=np.float64)

def old_function(close, volume, INTERVAL_LENGTH):
    results = []
    for i in range(len(close) - INTERVAL_LENGTH):
        for j in range(i+1, i+INTERVAL_LENGTH):
            ret = close[j] / close[i]
            vol = sum( volume[i+1:j+1] )
            if (ret > 1.0001) and (ret < 1.5) and (vol > 100):
                results.append( (i, j, ret, vol) )
    return results

def new_function(close, volume, INTERVAL_LENGTH):
    results = []
    for i in range(close.size - INTERVAL_LENGTH):
        # Running totals replace the inner loop: vol[p] == sum(volume[i+1:i+2+p])
        vol = volume[i+1:i+INTERVAL_LENGTH].cumsum()
        ret = close[i+1:i+INTERVAL_LENGTH] / close[i]
        mask = (ret > 1.0001) & (ret < 1.5) & (vol > 100)
        j = np.arange(i+1, i+INTERVAL_LENGTH)[mask]
        results.extend(zip(j.size * [i], j, ret[mask], vol[mask]))
    return results

def new_function2(close, volume, INTERVAL_LENGTH):
    vol, ret = [], []
    I, J = [], []
    # Row k-1 holds, for every start index i, the values at j = i + k.
    for k in range(1, INTERVAL_LENGTH):
        start = k
        end = volume.size - INTERVAL_LENGTH + k
        vol.append(volume[start:end])
        ret.append(close[start:end])
        J.append(np.arange(start, end))
        I.append(np.arange(volume.size - INTERVAL_LENGTH))
    vol = np.vstack(vol)
    ret = np.vstack(ret)
    J = np.vstack(J)
    I = np.vstack(I)
    # Cumulative sum down the rows gives sum(volume[i+1:j+1]) for every (k, i).
    vol = vol.cumsum(axis=0)
    ret = ret / close[:-INTERVAL_LENGTH]
    mask = (ret > 1.0001) & (ret < 1.5) & (vol > 100)
    vol = vol[mask]
    ret = ret[mask]
    I = I[mask]
    J = J[mask]
    return list(zip(I.flat, J.flat, ret.flat, vol.flat))

results = old_function(close, volume, INTERVAL_LENGTH)
results2 = new_function(close, volume, INTERVAL_LENGTH)
results3 = new_function2(close, volume, INTERVAL_LENGTH)

# Using sets to compare, as the output
# is in a different order than the original function's
print(set(results) == set(results2))
print(set(results) == set(results3))
```
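As an alternative to building the stacked arrays with a Python loop over offsets, as new_function2 does, here is a sketch using `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later). This is a swapped-in technique, not what the answer uses, and `windowed_function` is a hypothetical name. Row `i` of each window view is `arr[i+1:i+INTERVAL_LENGTH]`, so the whole computation becomes a couple of 2-D array operations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_function(close, volume, interval_length):
    """Hypothetical sketch: same (i, j, ret, vol) tuples as old_function,
    built from sliding-window views instead of a loop over offsets."""
    n = close.size - interval_length          # number of start indices i
    w = interval_length - 1                   # candidate j's per start index
    # Row i of each view is arr[i+1 : i+interval_length] (views, no copy).
    vol_win = sliding_window_view(volume[1:], w)[:n]
    close_win = sliding_window_view(close[1:], w)[:n]
    vol = vol_win.cumsum(axis=1)              # sum(volume[i+1:j+1])
    ret = close_win / close[:n, None]         # close[j] / close[i]
    mask = (ret > 1.0001) & (ret < 1.5) & (vol > 100)
    i_idx, p_idx = np.nonzero(mask)
    j_idx = i_idx + 1 + p_idx                 # column p corresponds to j = i + 1 + p
    return list(zip(i_idx.tolist(), j_idx.tolist(),
                    ret[mask].tolist(), vol[mask].tolist()))
```

Because `np.nonzero` walks the mask row by row, the tuples come out in the same order (by `i`, then `j`) as old_function produces them, so a plain list comparison should work without converting to sets.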