在Python中的两个列表数组中查找最近的项目_随笔

在Python中的两个列表/数组中查找最近的项目

编辑2

KDTree

如果您可以选择多个邻居来保证阵列中的每个项目都具有唯一的邻居，则使用的解决方案可以表现良好。使用以下代码：

def nearest_neighbors_kd_tree(x, y, k) :    x, y = map(np.asarray, (x, y))    tree =scipy.spatial.cKDTree(y[:, None])        ordered_neighbors = tree.query(x[:, None], k)[1]    nearest_neighbor = np.empty((len(x),), dtype=np.intp)    nearest_neighbor.fill(-1)    used_y = set()    for j, neigh_j in enumerate(ordered_neighbors) :        for k in neigh_j : if k not in used_y :     nearest_neighbor[j] = k     used_y.add(k)     break    return nearest_neighbor

和一个

n=1000

点样本，我得到：

In [9]: np.any(nearest_neighbors_kd_tree(x, y, 12) == -1)Out[9]: TrueIn [10]: np.any(nearest_neighbors_kd_tree(x, y, 13) == -1)Out[10]: False

因此，最佳值为

k=13

，然后时间为：

In [11]: %timeit nearest_neighbors_kd_tree(x, y, 13)100 loops, best of 3: 9.26 ms per loop

但在最坏的情况下，您可能需要

k=1000

，然后：

In [12]: %timeit nearest_neighbors_kd_tree(x, y, 1000)1 loops, best of 3: 424 ms per loop

这比其他选项要慢：

In [13]: %timeit nearest_neighbors(x, y)10 loops, best of 3: 60 ms per loopIn [14]: %timeit nearest_neighbors_sorted(x, y)10 loops, best of 3: 47.4 ms per loop

编辑在搜索之前对数组进行排序可以得到1000个以上的数组的数组：

def nearest_neighbors_sorted(x, y) :    x, y = map(np.asarray, (x, y))    y_idx = np.argsort(y)    y = y[y_idx]    nearest_neighbor = np.empty((len(x),), dtype=np.intp)    for j, xj in enumerate(x) :        idx = np.searchsorted(y, xj)        if idx == len(y) or idx != 0 and y[idx] - xj > xj - y[idx-1] : idx -= 1        nearest_neighbor[j] = y_idx[idx]        y = np.delete(y, idx)        y_idx = np.delete(y_idx, idx)    return nearest_neighbor

具有10000个元素的长数组：

In [2]: %timeit nearest_neighbors_sorted(x, y)1 loops, best of 3: 557 ms per loopIn [3]: %timeit nearest_neighbors(x, y)1 loops, best of 3: 1.53 s per loop

对于较小的阵列，其性能稍差一些。

如果只丢弃重复项，则必须遍历所有项以实现贪婪的最近邻居算法。考虑到这一点，这是我能想到的最快的方法：

def nearest_neighbors(x, y) :    x, y = map(np.asarray, (x, y))    y = y.copy()    y_idx = np.arange(len(y))    nearest_neighbor = np.empty((len(x),), dtype=np.intp)    for j, xj in enumerate(x) :        idx = np.argmin(np.abs(y - xj))        nearest_neighbor[j] = y_idx[idx]        y = np.delete(y, idx)        y_idx = np.delete(y_idx, idx)    return nearest_neighbor

现在有：

n = 1000x = np.random.rand(n)y = np.random.rand(2*n)

我得到：

In [11]: %timeit nearest_neighbors(x, y)10 loops, best of 3: 52.4 ms per loop

欢迎分享，转载请注明来源：内存溢出

原文地址: http://outofmemory.cn/zaji/5631899.html

在Python中的两个列表数组中查找最近的项目

发表评论

评论列表（0条）