Converting a Pandas DataFrame with commas and negative numbers to float

It seems you need to replace the commas (,) with empty strings:

print (df)
2016-10-31    2,144.78
2016-07-31    2,036.62
2016-04-30    1,916.60
2016-01-31    1,809.40
2015-10-31    1,711.97
2016-01-31    6,667.22
2015-01-31    5,373.59
2014-01-31    4,071.00
2013-01-31    3,050.20
2016-09-30       -0.06
2016-06-30       -1.88
2016-03-31
2015-12-31       -0.13
2015-09-30
2015-12-31       -0.14
2014-12-31        0.07
2013-12-31           0
2012-12-31           0
Name: val, dtype: object

print (pd.to_numeric(df.str.replace(',',''), errors='coerce'))
2016-10-31    2144.78
2016-07-31    2036.62
2016-04-30    1916.60
2016-01-31    1809.40
2015-10-31    1711.97
2016-01-31    6667.22
2015-01-31    5373.59
2014-01-31    4071.00
2013-01-31    3050.20
2016-09-30      -0.06
2016-06-30      -1.88
2016-03-31        NaN
2015-12-31      -0.13
2015-09-30        NaN
2015-12-31      -0.14
2014-12-31       0.07
2013-12-31       0.00
2012-12-31       0.00
Name: val, dtype: float64
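The same idea as a small self-contained sketch. The sample values and the variable names `s` and `cleaned` are made up for illustration, and `regex=False` is added only so the comma is treated as a literal character:

```python
import pandas as pd

# Hypothetical sample data: numbers stored as strings, with a thousands
# separator, a negative value, an empty string and a plain "0".
s = pd.Series(
    ["2,144.78", "-0.06", "", "0"],
    index=pd.to_datetime(["2016-10-31", "2016-09-30", "2016-03-31", "2013-12-31"]),
    name="val",
)

# Strip the commas, then parse; anything that cannot be parsed becomes NaN.
cleaned = pd.to_numeric(s.str.replace(",", "", regex=False), errors="coerce")
print(cleaned)
```

The negative sign needs no special handling: once the commas are gone, `pd.to_numeric` parses "-0.06" directly.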

EDIT:

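If the data was built with append, the dtype of the first df may be float and that of the second object. In that case you need to cast to str first, because the result contains mixed types: for example, the first rows are of type float and the last rows are strings: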


print (pd.to_numeric(df.astype(str).str.replace(',',''), errors='coerce'))
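A minimal sketch of why the cast matters, using a made-up mixed Series (the names `mixed`, `lost` and `kept` are illustrative only). On an object column, the `.str` methods return NaN for entries that are not strings, so the genuine floats would be dropped without `astype(str)`:

```python
import pandas as pd

# Hypothetical mixed Series: real floats followed by strings.
mixed = pd.Series([-0.06, 0.07, "4,071.00", "3,050.20"], name="val")

# Without the cast, .str.replace yields NaN for the float entries.
lost = pd.to_numeric(mixed.str.replace(",", "", regex=False), errors="coerce")

# Casting to str first keeps every value.
kept = pd.to_numeric(
    mixed.astype(str).str.replace(",", "", regex=False), errors="coerce"
)

print(lost)
print(kept)
```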

You can also check the types with:

print (df.apply(type))
2016-09-30    <class 'float'>
2016-06-30    <class 'float'>
2015-12-31    <class 'float'>
2014-12-31    <class 'float'>
2014-01-31      <class 'str'>
2013-01-31      <class 'str'>
2016-09-30      <class 'str'>
2016-06-30      <class 'str'>
2016-03-31      <class 'str'>
2015-12-31      <class 'str'>
2015-09-30      <class 'str'>
2015-12-31      <class 'str'>
2014-12-31      <class 'str'>
2013-12-31      <class 'str'>
2012-12-31      <class 'str'>
Name: val, dtype: object
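For a long column, a per-row listing is hard to scan. One possible variation, not part of the original answer, is to count how many values there are of each Python type. The data and the names `mixed` and `type_counts` are hypothetical:

```python
import pandas as pd

# Hypothetical mixed Series: two floats and three strings.
mixed = pd.Series([-0.06, 0.07, "4,071.00", "3,050.20", "0"], name="val")

# Map each value to the name of its Python type, then count the names.
type_counts = mixed.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

More than one entry in the result indicates a mixed column that needs the `astype(str)` cast before cleaning.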

EDIT1:

If you need to apply the solution to all columns of the DataFrame, use apply:

df1 = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce'))
print (df1)
            Revenue  Other, Net
Date
2016-09-30    24.73       -0.06
2016-06-30    18.73       -1.88
2016-03-31    17.56         NaN
2015-12-31    29.14       -0.13
2015-09-30    22.67         NaN
2015-12-31    95.85       -0.14
2014-12-31    84.58        0.07
2013-12-31    58.33        0.00
2012-12-31    29.63        0.00
2016-09-30   243.91       -0.80
2016-06-30   230.77       -1.12
2016-03-31   216.58        1.32
2015-12-31   206.23       -0.05
2015-09-30   192.82       -0.34
2015-12-31   741.15       -1.37
2014-12-31   556.28       -1.90
2013-12-31   414.51       -1.48
2012-12-31   308.82        0.10
2016-10-31  2144.78       41.98
2016-07-31  2036.62       35.00
2016-04-30  1916.60      -11.66
2016-01-31  1809.40       27.09
2015-10-31  1711.97       -3.44
2016-01-31  6667.22       14.13
2015-01-31  5373.59      -18.69
2014-01-31  4071.00       -4.87
2013-01-31  3050.20       -5.70

print(df1.dtypes)
Revenue       float64
Other, Net    float64
dtype: object
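A compact, runnable version of the whole-frame conversion. The helper name `to_float` and the sample frame are assumptions made for this sketch, and the lambda is pulled out into a named function only for readability:

```python
import pandas as pd


def to_float(col):
    """Hypothetical helper: strip thousands separators and parse as numbers."""
    as_text = col.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(as_text, errors="coerce")


# Hypothetical frame mixing strings, real floats and an empty string.
df = pd.DataFrame(
    {
        "Revenue": ["24.73", "2,144.78", 95.85],
        "Other, Net": [-0.06, "", "41.98"],
    }
)

# apply() calls the helper once per column.
df1 = df.apply(to_float)
print(df1)
print(df1.dtypes)
```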

But if you need to convert only some columns of the DataFrame, select a subset and use apply:

cols = ['Revenue', ...]
df[cols] = df[cols].apply(lambda x: pd.to_numeric(x.astype(str)
                                                   .str.replace(',',''), errors='coerce'))
print (df)
            Revenue Other, Net
Date
2016-09-30    24.73      -0.06
2016-06-30    18.73      -1.88
2016-03-31    17.56
2015-12-31    29.14      -0.13
2015-09-30    22.67
2015-12-31    95.85      -0.14
2014-12-31    84.58       0.07
2013-12-31    58.33          0
2012-12-31    29.63          0
2016-09-30   243.91       -0.8
2016-06-30   230.77      -1.12
2016-03-31   216.58       1.32
2015-12-31   206.23      -0.05
2015-09-30   192.82      -0.34
2015-12-31   741.15      -1.37
2014-12-31   556.28       -1.9
2013-12-31   414.51      -1.48
2012-12-31   308.82        0.1
2016-10-31  2144.78      41.98
2016-07-31  2036.62         35
2016-04-30  1916.60     -11.66
2016-01-31  1809.40      27.09
2015-10-31  1711.97      -3.44
2016-01-31  6667.22      14.13
2015-01-31  5373.59     -18.69
2014-01-31  4071.00      -4.87
2013-01-31  3050.20       -5.7

print(df.dtypes)
Revenue       float64
Other, Net     object
dtype: object
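A small sketch of the subset variant, with made-up data and a one-element `cols` list chosen purely for illustration. Only the listed columns are converted; the others keep their original string values:

```python
import pandas as pd

# Hypothetical frame where both columns hold strings.
df = pd.DataFrame(
    {
        "Revenue": ["24.73", "2,144.78", "6,667.22"],
        "Other, Net": ["-0.06", "", "14.13"],
    }
)

cols = ["Revenue"]  # hypothetical choice of columns to convert

# Convert only the selected columns and write them back in place.
df[cols] = df[cols].apply(
    lambda x: pd.to_numeric(
        x.astype(str).str.replace(",", "", regex=False), errors="coerce"
    )
)

print(df)
print(df.dtypes)
```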

EDIT2:

Solution for your bonus question:

df = pd.DataFrame({'A':['q','e','r'],
                   'B':['4','5','q'],
                   'C':[7,8,9.0],
                   'D':['1,000','3','50,000'],
                   'E':['5','3','6'],
                   'F':['w','e','r']})
print (df)
   A  B    C       D  E  F
0  q  4  7.0   1,000  5  w
1  e  5  8.0       3  3  e
2  r  q  9.0  50,000  6  r

# first apply the original solution
df1 = df.apply(lambda x: pd.to_numeric(x.astype(str).str.replace(',',''), errors='coerce'))
print (df1)
    A    B    C      D  E   F
0 NaN  4.0  7.0   1000  5 NaN
1 NaN  5.0  8.0      3  3 NaN
2 NaN  NaN  9.0  50000  6 NaN

# mask of the columns where every value is NaN, i.e. the string columns
mask = df1.isnull().all()
print (mask)
A     True
B    False
C    False
D    False
E    False
F     True
dtype: bool

# fill the all-NaN columns back in from the original strings
df1.loc[:, mask] = df1.loc[:, mask].combine_first(df)
print (df1)
   A    B    C      D  E  F
0  q  4.0  7.0   1000  5  w
1  e  5.0  8.0      3  3  e
2  r  NaN  9.0  50000  6  r
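An alternative sketch that reaches the same result without writing values back through `loc`. This is a swapped-in variation, not the approach shown above: each column is converted on its own, and the original is kept whenever nothing in it parses as a number. The helper name `convert_if_numeric` is hypothetical:

```python
import pandas as pd


def convert_if_numeric(col):
    """Hypothetical helper: return a numeric version of col, or col itself
    when none of its values can be parsed as a number."""
    converted = pd.to_numeric(
        col.astype(str).str.replace(",", "", regex=False), errors="coerce"
    )
    # All-NaN means no value parsed, so treat the column as text.
    return col if converted.isna().all() else converted


df = pd.DataFrame({'A': ['q', 'e', 'r'],
                   'B': ['4', '5', 'q'],
                   'C': [7, 8, 9.0],
                   'D': ['1,000', '3', '50,000'],
                   'E': ['5', '3', '6'],
                   'F': ['w', 'e', 'r']})

# Build the result column by column so each column keeps its own dtype.
df1 = pd.DataFrame({name: convert_if_numeric(df[name]) for name in df.columns})
print(df1)
print(df1.dtypes)
```

One caveat of this sketch: a column that holds only missing or empty values is also left unconverted, since it is indistinguishable from a text column under the all-NaN test.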


Original address: http://outofmemory.cn/zaji/5645556.html
