Pandas DataFrame: joining items within range based on their geographic coordinates (longitude and latitude)

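You can use a haversine function, which returns the great-circle distance between two points given in decimal degrees: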

    from math import radians, cos, sin, asin, sqrt

    def haversine(lon1, lat1, lon2, lat2):
        # convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        # haversine formula
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * asin(sqrt(a))
        r = 6371 # Radius of earth in kilometers. Use 3956 for miles
        return c * r
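As a quick sanity check, the function can be called on a single pair of points. The sketch below repeats the function so that it runs on its own, and uses the Berlin and Potsdam coordinates from the example data in this post. The optional `r` parameter is an addition made here for illustration and is not part of the original function.

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2, r=6371.0):
    """Great-circle distance between two points given in decimal degrees.

    r is the sphere radius in kilometers (hypothetical extra parameter;
    the default matches the constant used in the original function).
    """
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * r

# Note the argument order: longitude first, then latitude.
berlin = (13.41053, 52.52437)
potsdam = (13.06566, 52.39886)

d = haversine(*berlin, *potsdam)
print(round(d, 1))  # roughly 27.2 km, in line with the dist column shown later
```

Passing latitude before longitude is an easy mistake to make with this signature, and it produces plausible-looking but wrong distances.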

First, build a cross join with merge, then use boolean indexing to remove the rows where city_x and city_y hold the same value:

    df['tmp'] = 1
    df = pd.merge(df,df,on='tmp')
    df = df[df.city_x != df.city_y]
    print (df)
        city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y
    1   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566
    2   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534
    3  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053
    5  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534
    6  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053
    7  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566
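The temporary tmp key is one way to get a cross join. On pandas 1.2 or newer, merge also accepts how="cross", which gives the same pairing without the helper column. The sketch below is a minimal version under that assumption; the input frame is reconstructed from the printed output and may not match the original df exactly.

```python
import pandas as pd

# Input reconstructed from the printed output (an assumption about
# what the original df looked like).
df = pd.DataFrame({
    "city": ["Berlin", "Potsdam", "Hamburg"],
    "lat": [52.52437, 52.39886, 53.57532],
    "lng": [13.41053, 13.06566, 10.01534],
})

# how="cross" (pandas >= 1.2) pairs every row with every row,
# so no temporary key column is needed. Overlapping column names
# get the default _x / _y suffixes.
pairs = pd.merge(df, df, how="cross")

# Drop the self-pairs such as Berlin/Berlin.
pairs = pairs[pairs.city_x != pairs.city_y]
print(pairs)
```

Either way the result has n * (n - 1) rows, so the cross join grows quadratically with the number of cities.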

Then apply the haversine function to each row:

df['dist'] = df.apply(lambda row: haversine(row['lng_x'], row['lat_x'], row['lng_y'], row['lat_y']), axis=1)

Filter by distance (here, pairs less than 500 km apart):

    df = df[df.dist < 500]
    print (df)
        city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y        dist
    1   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566   27.215704
    2   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534  255.223782
    3  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053   27.215704
    5  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534  242.464120
    6  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053  255.223782
    7  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566  242.464120

Finally, use groupby to build a list of the cities in range, or to get their count with size:

    df1 = df.groupby('city_x')['city_y'].apply(list)
    print (df1)
    city_x
    Berlin     [Potsdam, Hamburg]
    Hamburg     [Berlin, Potsdam]
    Potsdam     [Berlin, Hamburg]
    Name: city_y, dtype: object

    df2 = df.groupby('city_x')['city_y'].size()
    print (df2)
    city_x
    Berlin     2
    Hamburg    2
    Potsdam    2
    dtype: int64
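One thing to watch with the groupby step: a city with no neighbor inside the cutoff has no rows left after filtering, so it is simply absent from the result. The sketch below illustrates this with a hypothetical 100 km cutoff (not used in the original answer). Going by the distances printed above, only Berlin and Potsdam are that close to each other, so Hamburg drops out unless it is put back with reindex.

```python
import pandas as pd

cities = ["Berlin", "Potsdam", "Hamburg"]

# Pairs that would remain with a hypothetical 100 km cutoff: going by the
# distances printed above, only Berlin and Potsdam are within that range.
near = pd.DataFrame({
    "city_x": ["Berlin", "Potsdam"],
    "city_y": ["Potsdam", "Berlin"],
})

counts = near.groupby("city_x")["city_y"].size()

# Hamburg has no row in `near`, so it is missing from `counts`;
# reindex puts it back with a count of 0.
counts = counts.reindex(cities, fill_value=0)
print(counts.to_dict())
```

The same reindex call works on the list-valued result if an empty default is acceptable for cities without neighbors.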

You can also use the NumPy haversine solution, which works on whole columns at once:

    import numpy as np
    import pandas as pd

    def haversine_np(lon1, lat1, lon2, lat2):
        """
        Calculate the great circle distance between two points
        on the earth (specified in decimal degrees)

        All args must be of equal length.
        """
        lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

        dlon = lon2 - lon1
        dlat = lat2 - lat1

        a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

        c = 2 * np.arcsin(np.sqrt(a))
        # radius is 6367 km here but 6371 km in haversine() above,
        # hence the slightly smaller distances in the output
        km = 6367 * c
        return km

    df['tmp'] = 1
    df = pd.merge(df,df,on='tmp')
    df = df[df.city_x != df.city_y]
    #print (df)

    # vectorized: whole columns are passed in, no row-wise apply needed
    df['dist'] = haversine_np(df['lng_x'],df['lat_x'],df['lng_y'],df['lat_y'])
    print (df)
        city_x     lat_x     lng_x  tmp   city_y     lat_y     lng_y        dist
    1   Berlin  52.52437  13.41053    1  Potsdam  52.39886  13.06566   27.198616
    2   Berlin  52.52437  13.41053    1  Hamburg  53.57532  10.01534  255.063541
    3  Potsdam  52.39886  13.06566    1   Berlin  52.52437  13.41053   27.198616
    5  Potsdam  52.39886  13.06566    1  Hamburg  53.57532  10.01534  242.311890
    6  Hamburg  53.57532  10.01534    1   Berlin  52.52437  13.41053  255.063541
    7  Hamburg  53.57532  10.01534    1  Potsdam  52.39886  13.06566  242.311890
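Because haversine_np only uses element-wise NumPy operations, it should also accept arrays of different but broadcastable shapes. The sketch below relies on that to compute the full pairwise distance matrix without a cross join. This is a swapped-in variant for illustration, not the method used in the answer, and the `r` parameter is a hypothetical addition defaulting to the 6367 km constant used above.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2, r=6367.0):
    """Vectorized great-circle distance; inputs in decimal degrees.

    r is a hypothetical extra parameter (sphere radius in km).
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return r * 2 * np.arcsin(np.sqrt(a))

# Order: Berlin, Potsdam, Hamburg (coordinates from the example data)
lng = np.array([13.41053, 13.06566, 10.01534])
lat = np.array([52.52437, 52.39886, 53.57532])

# A column vector against a row vector broadcasts to a 3x3 matrix,
# where dist[i, j] is the distance from city i to city j.
dist = haversine_np(lng[:, None], lat[:, None], lng[None, :], lat[None, :])

# Boolean mask of pairs in range, excluding the zero diagonal.
in_range = (dist < 500) & ~np.eye(len(lng), dtype=bool)
print(np.round(dist, 1))
print(in_range.sum(axis=1))  # number of neighbors per city
```

The matrix holds n * n values, the same quadratic cost as the cross join, but it avoids building the intermediate merged frame.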


Sharing is welcome; when reposting, please credit the source: 内存溢出

Original article: http://outofmemory.cn/zaji/5655403.html
