计算范德蒙德矩阵的有效方法_随笔

计算范德蒙德矩阵的有效方法

这里有更多方法，其中一些方法（在我的计算机上）比迄今为止发布的方法要快得多。

我认为最重要的发现是，它实际上在很大程度上取决于您想要多少个学位。幂运算（我相信对小整数指数特别适用）仅对较小的指数范围有意义。指数越多，基于乘法的方法就越好。

我想强调一个

multiply.accumulate

基于方法（

ma

），它类似于numpy的内置方法，但是速度更快（不是因为我跳过了check-

nnc

，numpy-no-checks证明了这一点）。除了最小的指数范围外，对于我来说，它实际上是最快的。

由于我不明白的原因，据我所知，numpy实现会执行三项 *** 作，这些 *** 作据我所知都是缓慢而不必要的：（1）它使基本向量有很多副本。（2）使它们不连续。（3）它是就地积累，我相信会进行缓冲。

我想提及的另一件事是，通过简单的预防措施，将int提升为较大的dtype（可以说是一个简单

out_e_1

的方法

ma

），对于小范围的int（基本上是的手动版本）最快的速度减慢了两倍以上

safe_e_1

。误称）。

称为基于广播的方法，

bc_*

其中

表示广播轴（b表示基数，e表示exp）“欺诈”表示结果不连续。

时间（3个最佳时间）：

rep=100 n_b=5000 n_e=4 b_tp=<class 'numpy.int32'> e_tp=<class 'numpy.int32'>vander     0.16699657 msbc_b       0.09595204 msbc_e       0.07959786 msma         0.10755240 msnnc        0.16459018 msout_e_1    0.02037535 msout_e_2    0.02656622 mssafe_e_1   0.04652272 mssafe_e_2   0.04081079 mscheat bc_e_cheat 0.04668466 msrep=100 n_b=5000 n_e=8 b_tp=<class 'numpy.int32'> e_tp=<class 'numpy.int32'>vander     0.25086462 msbc_b  apparently failedbc_e  apparently failedma         0.15843041 msnnc        0.24713077 msout_e_1          apparently failedout_e_2          apparently failedsafe_e_1   0.15970622 mssafe_e_2   0.19672418 msbc_e_cheat       apparently failedrep=100 n_b=5000 n_e=4 b_tp=<class 'float'> e_tp=<class 'numpy.int32'>vander     0.16225773 msbc_b       0.53315020 msbc_e       0.56200830 msma         0.07626799 msnnc        0.16059748 msout_e_1    0.03653416 msout_e_2    0.04043702 mssafe_e_1   0.04060494 mssafe_e_2   0.04104209 mscheat bc_e_cheat 0.52966076 msrep=100 n_b=5000 n_e=8 b_tp=<class 'float'> e_tp=<class 'numpy.int32'>vander     0.24542852 msbc_b       2.03353578 msbc_e       2.04281270 msma         0.11075758 msnnc        0.24212880 msout_e_1    0.14809043 msout_e_2    0.19261359 mssafe_e_1   0.15206112 mssafe_e_2   0.19308420 mscheat bc_e_cheat 1.99176601 ms

码：

import numpy as npimport typesfrom timeit import repeatprom={np.dtype(np.int32): np.dtype(np.int64), np.dtype(float): np.dtype(float)}def RI(k, N, dt, top=100):    return np.random.randint(0, top if top else N, (k, N)).astype(dt)def RA(k, N, dt, top=None):    return np.add.outer(np.zeros((k,), int), np.arange(N)%(top if top else N)).astype(dt)def RU(k, N, dt, top=100):    return (np.random.random((k, N))*(top if top else N)).astype(dt)def data(k, N_b, N_e, dt_b, dt_e, b_fun=RI, e_fun=RA):    b = list(b_fun(k, N_b, dt_b))    e = list(e_fun(k, N_e, dt_e))    return b, edef f_vander(b, e):    return np.vander(b, len(e), increasing=True)def f_bc_b(b, e):    return b[:, None]**edef f_bc_e(b, e):    return np.ascontiguousarray((b**e[:, None]).T)def f_ma(b, e):    out = np.empty((len(b), len(e)), prom[b.dtype])    out[:, 0] = 1    np.multiply.accumulate(np.broadcast_to(b, (len(e)-1, len(b))), axis=0, out=out[:, 1:].T)    return outdef f_nnc(b, e):    out = np.empty((len(b), len(e)), prom[b.dtype])    out[:, 0] = 1    out[:, 1:] = b[:, None]    np.multiply.accumulate(out[:, 1:], out=out[:, 1:], axis=1)    return outdef f_out_e_1(b, e):    out = np.empty((len(b), len(e)), b.dtype)    out[:, 0] = 1    out[:, 1] = b    out[:, 2] = c = b*b    for i in range(3, len(e)):        c*=b        out[:, i] = c    return outdef f_out_e_2(b, e):    out = np.empty((len(b), len(e)), b.dtype)    out[:, 0] = 1    out[:, 1] = b    out[:, 2] = b*b    for i in range(3, len(e)):        out[:, i] = out[:, i-1] * b    return outdef f_safe_e_1(b, e):    out = np.empty((len(b), len(e)), prom[b.dtype])    out[:, 0] = 1    out[:, 1] = b    out[:, 2] = c = (b*b).astype(prom[b.dtype])    for i in range(3, len(e)):        c*=b        out[:, i] = c    return outdef f_safe_e_2(b, e):    out = np.empty((len(b), len(e)), prom[b.dtype])    out[:, 0] = 1    out[:, 1] = b    out[:, 2] = b*b    for i in range(3, len(e)):        out[:, i] = out[:, i-1] * b    return outdef f_bc_e_cheat(b, e):    return (b**e[:, None]).Tfor params in [(100, 5000, 4, np.int32, np.int32),    (100, 5000, 8, np.int32, np.int32),    (100, 5000, 4, float, np.int32),    (100, 5000, 8, float, np.int32)]:    k = params[0]    dat = data(*params)    ref = f_vander(dat[0][0], dat[1][0])    print('rep={} n_b={} n_e={} b_tp={} e_tp={}'.format(*params))    for name, func in list(globals().items()):        if not name.startswith('f_') or not isinstance(func, types.FunctionType): continue        try: assert np.allclose(ref, func(dat[0][0], dat[1][0])) if not func(dat[0][0], dat[1][0]).flags.c_contiguous:     print('cheat', end=' ') print("{:16s}{:16.8f} ms".format(name[2:], np.min(repeat(     'f(b.pop(), e.pop())', setup='b, e = data(*p)', globals={'f':func, 'data':data, 'p':params}, number=k)) * 1000 / k))        except: print("{:16s} apparently failed".format(name[2:]))

欢迎分享，转载请注明来源：内存溢出

原文地址: https://outofmemory.cn/zaji/5647948.html

计算范德蒙德矩阵的有效方法

发表评论

评论列表（0条）