Suppose I have a dataframe df as follows:
index  company  url                  address
0      A        www.abc.contact.com  16D BayBerry Rd,New bedford,MA,02740,USA
1      A        www.abc.contact.com  MA,USA
2      A        www.abc.about.com    USA
3      B        www.pqr.com          New bedford,USA
4      B        www.pqr.com/about    MA,USA
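I want to remove every row from the dataframe whose address is a subset of another row's address, where both rows have the same company. For example, of the 5 rows above I want to keep only these two: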
index  company  url                  address
0      A        www.abc.contact.com  16D BayBerry Rd,USA
3      B        www.pqr.com          New bedford,USA
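Best answer: This may not be the optimal solution, but it works on this small dataframe. EDIT: added a check on the company name, assuming punctuation has already been removed.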
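import pandas as pd

# Sample data. The literal was garbled in the source; the rows here are rebuilt
# from the question's table, and row 3 is ASSUMED to be 'New bedford,MA,USA' so
# that row 4 is a subset of it, as the expected result implies.
df = pd.DataFrame({
    "company": ['A', 'A', 'A', 'B', 'B'],
    "address": ['16D BayBerry Rd,New bedford,MA,02740,USA', 'MA,USA', 'USA',
                'New bedford,MA,USA', 'MA,USA'],
})

# Split each address on commas and turn it into a set so "issubset" can be used later
addresses = list(df['address'].apply(lambda x: set(x.split(','))).values)
companies = list(df['company'].values)
rows_to_drop = []  # Positions of the rows to drop

# Iterate over every address
for i, (address, company) in enumerate(zip(addresses, companies)):
    # Iterate over the remaining addresses
    rem_addr = addresses[:i] + addresses[(i + 1):]
    rem_comp = companies[:i] + companies[(i + 1):]
    for other_addr, other_comp in zip(rem_addr, rem_comp):
        # If the address is a subset of another address of the same company, drop it
        if address.issubset(other_addr) and company == other_comp:
            rows_to_drop.append(i)
            break

# rows_to_drop holds positions, so map them to index labels before dropping
df = df.drop(df.index[rows_to_drop])
print(df)

Output:

  company                                   address
0       A  16D BayBerry Rd,New bedford,MA,02740,USA
3       B                        New bedford,MA,USA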
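The answer's loop compares every row against every other row in the frame, and two rows with identical addresses would both be dropped, since each is a subset of the other. The following is only a sketch of one possible variation, not part of the original answer: it compares rows within each company only, trims whitespace around the comma-separated parts, and keeps the first copy of an exact duplicate. The helper name `drop_subset_addresses` and its parameters are hypothetical, and the sample row 3 ('New bedford,MA,USA') is an assumption chosen so that row 4 is a subset of it.

```python
import pandas as pd


def drop_subset_addresses(df, group_col="company", addr_col="address"):
    """Drop rows whose address parts are a subset of another row's parts
    within the same group. Hypothetical helper, not from the original answer."""
    # Tokenize each address: split on commas and trim surrounding whitespace.
    tokens = df[addr_col].apply(
        lambda s: frozenset(part.strip() for part in s.split(","))
    )
    keep = pd.Series(True, index=df.index)
    # Only rows that share the same company are compared with each other.
    for _, group in tokens.groupby(df[group_col]):
        items = list(group.items())  # (index label, token set) pairs
        for pos, (label, addr) in enumerate(items):
            for other_pos, (_, other) in enumerate(items):
                if pos == other_pos:
                    continue
                # Proper subset: always drop. Exact duplicate: drop only the
                # later copy, so one representative survives.
                if addr < other or (addr == other and other_pos < pos):
                    keep.loc[label] = False
                    break
    return df[keep]


# Assumed sample data (row 3 is an assumption, see the note above).
df = pd.DataFrame({
    "company": ["A", "A", "A", "B", "B"],
    "address": [
        "16D BayBerry Rd,New bedford,MA,02740,USA",
        "MA, USA",  # extra space on purpose, handled by strip()
        "USA",
        "New bedford,MA,USA",
        "MA,USA",
    ],
})
result = drop_subset_addresses(df)
print(result.index.tolist())  # [0, 3]
```

This version assumes the index labels are unique. Like the original loop, it is still quadratic within each company, so it suits small frames such as this one.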