You can use:
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
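As a quick sanity check, the function can be called on a single pair of points. The sketch below repeats the function so it runs on its own and uses the Berlin and Potsdam coordinates that appear in this answer; the variable name `d` is only illustrative.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # Convert decimal degrees to radians.
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    r = 6371  # Radius of earth in kilometers.
    return c * r

# Note the argument order: longitude first, then latitude.
d = haversine(13.41053, 52.52437, 13.06566, 52.39886)  # Berlin -> Potsdam
print(round(d, 1))  # about 27.2 km
```

Passing latitude before longitude is an easy mistake with this signature and gives a wrong distance without raising an error.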
First you need a cross join with merge, then remove the rows where city_x and city_y have the same value by boolean indexing:
import pandas as pd

df['tmp'] = 1
df = pd.merge(df,df,on='tmp')
df = df[df.city_x != df.city_y]
print (df)
    city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y
1   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566
2   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534
3  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053
5  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534
6  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053
7  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566
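The input frame is not shown in this answer, so the sketch below rebuilds it from the printed output (columns `city`, `lat`, `lng`). It also uses `how='cross'` instead of the helper `tmp` column; this assumes pandas 1.2 or newer and is an alternative, not the method used above. The name `pairs` is illustrative.

```python
import pandas as pd

# Sample data reconstructed from the printed output above.
df = pd.DataFrame({
    'city': ['Berlin', 'Potsdam', 'Hamburg'],
    'lat': [52.52437, 52.39886, 53.57532],
    'lng': [13.41053, 13.06566, 10.01534],
})

# Cross join every row with every row (assumes pandas >= 1.2).
# Overlapping column names get the default _x / _y suffixes.
pairs = pd.merge(df, df, how='cross')

# Drop the rows that pair a city with itself.
pairs = pairs[pairs.city_x != pairs.city_y]
print(pairs)
```

With three cities this leaves six ordered pairs, the same rows as the `tmp` approach but without the extra `tmp` column.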
Then apply the haversine function:
df['dist'] = df.apply(lambda row: haversine(row['lng_x'], row['lat_x'], row['lng_y'], row['lat_y']), axis=1)
Filter by distance:
df = df[df.dist < 500]
print (df)
    city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y        dist
1   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566   27.215704
2   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534  255.223782
3  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053   27.215704
5  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534  242.464120
6  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053  255.223782
7  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566  242.464120
And last create a list or get the size with groupby:
df1 = df.groupby('city_x')['city_y'].apply(list)
print (df1)
city_x
Berlin     [Potsdam, Hamburg]
Hamburg     [Berlin, Potsdam]
Potsdam     [Berlin, Hamburg]
Name: city_y, dtype: object

df2 = df.groupby('city_x')['city_y'].size()
print (df2)
city_x
Berlin     2
Hamburg    2
Potsdam    2
dtype: int64
It is also possible to use a numpy haversine solution, which works on whole columns at once instead of row by row:
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km

df['tmp'] = 1
df = pd.merge(df,df,on='tmp')
df = df[df.city_x != df.city_y]
#print (df)

df['dist'] = haversine_np(df['lng_x'],df['lat_x'],df['lng_y'],df['lat_y'])
print (df)
    city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y        dist
1   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566   27.198616
2   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534  255.063541
3  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053   27.198616
5  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534  242.311890
6  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053  255.063541
7  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566  242.311890
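The distances here differ slightly from the `apply` version because this function uses an earth radius of 6367 km rather than 6371 km. The sketch below makes the radius a parameter to show that; the `radius_km` argument is an addition for illustration and is not part of the original function.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6367):
    # radius_km is a hypothetical extra parameter, added for illustration.
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return radius_km * 2 * np.arcsin(np.sqrt(a))

# Berlin -> Potsdam and Berlin -> Hamburg, computed in one vectorized call.
lon1 = np.array([13.41053, 13.41053])
lat1 = np.array([52.52437, 52.52437])
lon2 = np.array([13.06566, 10.01534])
lat2 = np.array([52.39886, 53.57532])

dist_6367 = haversine_np(lon1, lat1, lon2, lat2)
dist_6371 = haversine_np(lon1, lat1, lon2, lat2, radius_km=6371)
print(dist_6367)
print(dist_6371)
```

With the same radius, both approaches should agree to floating-point precision, so the choice between them is mainly about speed on large frames.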