Selecting rows from a pandas DataFrame that contain certain values

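Introduction

At the heart of selecting rows, we need a 1D mask, or a pandas Series of boolean elements, whose length equals the length of df. Let's call it mask. Then, with df[mask], boolean indexing returns the selected rows of df.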

Here's our starting df:

    In [42]: df
    Out[42]:
            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear
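To follow along, the frame above can be rebuilt by hand. This is a minimal sketch: the constructor call is an assumption about how such a df might be created, and the hand-written mask is purely illustrative.

```python
import pandas as pd

# Rebuild the example frame shown above (index labels 1..4).
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# A hand-written mask: one boolean per row, in row order.
mask = [True, False, True, False]

# Boolean indexing keeps only the rows whose mask entry is True.
selected = df[mask]
print(selected)
```

The rest of the article is about computing such a mask from the data instead of writing it by hand.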

I. Match one string

If we only need to match one string, it is straightforward with elementwise equality:

    In [42]: df == 'banana'
    Out[42]:
           A      B      C
    1  False   True  False
    2  False  False  False
    3   True  False  False
    4  False  False  False

If we need to look for ANY one match in each row, use the .any method:

    In [43]: (df == 'banana').any(axis=1)
    Out[43]:
    1     True
    2    False
    3     True
    4    False
    dtype: bool

To select the corresponding rows:

    In [44]: df[(df == 'banana').any(axis=1)]
    Out[44]:
            A       B     C
    1   apple  banana  pear
    3  banana    pear  pear
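The single-string case can be wrapped in a small helper. This is a sketch only: rows_containing is a hypothetical name, not something pandas provides, and it uses DataFrame.eq, the method form of the == comparison.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)


def rows_containing(frame, value):
    """Return the rows of `frame` in which at least one cell equals `value`.

    Hypothetical helper for illustration only.
    """
    # Elementwise comparison, then collapse each row with any().
    mask = frame.eq(value).any(axis=1)
    return frame[mask]


result = rows_containing(df, "banana")
print(result)
```

A value that occurs nowhere in the frame simply yields an empty result, since every mask entry is False.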

II. Match multiple strings

1. Search for ANY match

Here's our starting df:

    In [42]: df
    Out[42]:
            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear

NumPy's np.isin works here (or use the pandas isin method, as described in other posts) to get all matches in df for a list of search strings. So, say we are looking for 'pear' or 'apple' in df:

    In [51]: np.isin(df, ['pear','apple'])
    Out[51]:
    array([[ True, False,  True],
           [ True,  True,  True],
           [False,  True,  True],
           [ True,  True,  True]])

    # ANY match along each row
    In [52]: np.isin(df, ['pear','apple']).any(axis=1)
    Out[52]: array([ True,  True,  True,  True])

    # Select corresponding rows with masking
    In [56]: df[np.isin(df, ['pear','apple']).any(axis=1)]
    Out[56]:
            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear
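For comparison, here is a sketch of the pandas-only alternative mentioned above, using DataFrame.isin. It returns a boolean DataFrame rather than a NumPy array, so the row mask keeps the frame's index. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# True wherever a cell equals any of the search strings.
cell_hits = df.isin(["pear", "apple"])

# ANY match along each row gives a boolean Series aligned with df's index.
mask = cell_hits.any(axis=1)
any_match = df[mask]

# A narrower search list shows that the mask really filters rows.
only_banana = df[df.isin(["banana"]).any(axis=1)]
print(only_banana)
```

With 'pear' and 'apple' every row of this frame matches, which is why the narrower 'banana' search is included.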

2. Search for ALL match

Here's our starting df again:

    In [42]: df
    Out[42]:
            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear

Now we are looking for rows that contain BOTH of, say, ['pear','apple']. We will make use of NumPy broadcasting:

    In [66]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1)
    Out[66]:
    array([[ True,  True],
           [ True,  True],
           [ True, False],
           [ True,  True]])

Our search list has 2 items, so we get a 2D mask with number of rows = len(df) and number of cols = number of search items. Thus, in the above result, the first col is for 'pear' and the second is for 'apple'.
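To see where that 2D mask comes from, it may help to look at the intermediate shapes. The sketch below uses explicit broadcasting with == rather than np.equal.outer, which should be equivalent here. Converting both sides to object arrays is an assumption made so that the comparison behaves the same across NumPy and pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

values = df.to_numpy(dtype=object)                  # shape (4, 3)
items = np.array(["pear", "apple"], dtype=object)   # shape (2,)

# Add a trailing axis so every cell is compared with every search item.
cube = values[:, :, None] == items                  # shape (4, 3, 2)

# Collapse the df-column axis: does the row contain the item anywhere?
per_item = cube.any(axis=1)                         # shape (4, 2)

print(cube.shape, per_item.shape)
```

Axis 0 indexes the rows of df, axis 1 its columns, and axis 2 the search items, so any(axis=1) leaves one column per search item.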

To make things concrete, let's get a mask for three items ['apple','banana', 'pear']:

    In [62]: np.equal.outer(df.to_numpy(copy=False), ['apple','banana', 'pear']).any(axis=1)
    Out[62]:
    array([[ True,  True,  True],
           [ True, False,  True],
           [False,  True,  True],
           [ True, False,  True]])

The columns of this mask are for 'apple', 'banana', 'pear' respectively.

Coming back to the case of 2 search items, we had earlier:

    In [66]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1)
    Out[66]:
    array([[ True,  True],
           [ True,  True],
           [ True, False],
           [ True,  True]])

Since we are looking for ALL matches in each row:

    In [67]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1).all(axis=1)
    Out[67]: array([ True,  True, False,  True])

Finally, select rows:

    In [70]: df[np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1).all(axis=1)]
    Out[70]:
           A       B      C
    1  apple  banana   pear
    2   pear    pear  apple
    4  apple   apple   pear
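The same ALL-match selection can also be sketched with pandas alone, by combining one any-match mask per search item. This is an alternative to the broadcasting approach above, not a restatement of it, and rows_containing_all is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)


def rows_containing_all(frame, items):
    """Return rows of `frame` that contain every value in `items`.

    Hypothetical helper for illustration only.
    """
    # Start with every row selected, then narrow down item by item.
    mask = pd.Series(True, index=frame.index)
    for item in items:
        # A row survives only if it contains this item in some column.
        mask &= frame.eq(item).any(axis=1)
    return frame[mask]


both = rows_containing_all(df, ["pear", "apple"])
all_three = rows_containing_all(df, ["apple", "banana", "pear"])
print(both)
```

The loop makes one pass over the frame per search item, so it trades the single broadcasted comparison for lower peak memory, which may matter on large frames.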

Original article: https://outofmemory.cn/zaji/5629779.html