Pandas and Unicode

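It seems your round trip IS producing some unicode. I'm not sure why, but it is easy to fix. You cannot store unicode in an HDFStore Table in Python 2 (it works correctly in Python 3). If you need to, you can store it in Fixed format instead (it will be pickled). See here.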

In [33]: df = pd.read_json(s)

In [25]: df
Out[25]: 
  args                date            host kwargs     operation  status   thingy      time
0   [] 2013-12-02 00:33:59  yy38.segm1.org     {}       x_gbinf    -101  a13yy38  0.000801
1   [] 2013-12-02 00:33:59  kyy1.segm1.org     {}     x_initobj       1  a19kyy1  0.003244
2   [] 2013-12-02 00:34:00  yy10.segm1.org     {}  x_gobjParams    -101  a14yy10  0.002247
3   [] 2013-12-02 00:34:00  yy24.segm1.org     {}        gtfull    -101  a14yy24  0.002787
4   [] 2013-12-02 00:34:00  yy24.segm1.org     {}       x_gbinf    -101  a14yy24  0.001067
5   [] 2013-12-02 00:34:00  yy34.segm1.org     {}       gxyzinf    -101  a12yy34  0.002652
6   [] 2013-12-02 00:34:00  yy15.segm1.org     {}     deletemfg       1  a15yy15  0.004371
7   [] 2013-12-02 00:34:00  yy15.segm1.org     {}       gxyzinf    -101  a15yy15  0.000602

[8 rows x 8 columns]

In [26]: df.dtypes
Out[26]: 
args                 object
date         datetime64[ns]
host                 object
kwargs               object
operation            object
status                int64
thingy               object
time                float64
dtype: object

Infer the actual type of the object-dtyped Series. They come out as unicode only if at least one string in the column is unicode (otherwise they are inferred as string).

In [27]: df.apply(lambda x: pd.lib.infer_dtype(x.values))
Out[27]: 
args            unicode
date         datetime64
host            unicode
kwargs          unicode
operation       unicode
status          integer
thingy          unicode
time           floating
dtype: object
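The session above uses `pd.lib.infer_dtype`, which belongs to the older pandas of that era. As a minimal sketch, assuming a recent pandas on Python 3, the same inspection can be done with the public `pandas.api.types.infer_dtype`. The sample frame below is hypothetical and only loosely modelled on the data above. On Python 3 every `str` is unicode, so string columns are reported as `string` rather than `unicode`.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical sample data, loosely modelled on the frame shown above.
df = pd.DataFrame({
    "host": pd.Series(["yy38.segm1.org", "kyy1.segm1.org"], dtype=object),
    "date": pd.to_datetime(["2013-12-02 00:33:59", "2013-12-02 00:34:00"]),
    "status": [-101, 1],
    "time": [0.000801, 0.003244],
})

# infer_dtype inspects the values themselves, so it can tell what an
# object-dtyped column really holds.
inferred = pd.Series({col: infer_dtype(df[col]) for col in df.columns})
print(inferred)
```

Comparing `inferred` with `df.dtypes` shows which `object` columns actually hold strings and which hold something else.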

Here is a way to "fix" it:

In [28]: types = df.apply(lambda x: pd.lib.infer_dtype(x.values))

In [29]: types[types=='unicode']
Out[29]: 
args         unicode
host         unicode
kwargs       unicode
operation    unicode
thingy       unicode
dtype: object

In [30]: for col in types[types=='unicode'].index:
   ....:     df[col] = df[col].astype(str)
   ....:

It looks the same:

In [31]: df
Out[31]: 
  args                date            host kwargs     operation  status   thingy      time
0   [] 2013-12-02 00:33:59  yy38.segm1.org     {}       x_gbinf    -101  a13yy38  0.000801
1   [] 2013-12-02 00:33:59  kyy1.segm1.org     {}     x_initobj       1  a19kyy1  0.003244
2   [] 2013-12-02 00:34:00  yy10.segm1.org     {}  x_gobjParams    -101  a14yy10  0.002247
3   [] 2013-12-02 00:34:00  yy24.segm1.org     {}        gtfull    -101  a14yy24  0.002787
4   [] 2013-12-02 00:34:00  yy24.segm1.org     {}       x_gbinf    -101  a14yy24  0.001067
5   [] 2013-12-02 00:34:00  yy34.segm1.org     {}       gxyzinf    -101  a12yy34  0.002652
6   [] 2013-12-02 00:34:00  yy15.segm1.org     {}     deletemfg       1  a15yy15  0.004371
7   [] 2013-12-02 00:34:00  yy15.segm1.org     {}       gxyzinf    -101  a15yy15  0.000602

[8 rows x 8 columns]

But now it infers correctly.

In [32]: df.apply(lambda x: pd.lib.infer_dtype(x.values))
Out[32]: 
args             string
date         datetime64
host             string
kwargs           string
operation        string
status          integer
thingy           string
time           floating
dtype: object
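The column-by-column conversion can be wrapped in a small helper. This is only a sketch under stated assumptions: it uses a recent pandas on Python 3, where the inferred kind `unicode` no longer appears, so it also treats `mixed` columns (for example lists or dicts read back from JSON) as candidates for conversion. The function name `stringify_columns`, its `kinds` parameter, and the sample frame are hypothetical and not part of the original answer.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def stringify_columns(df, kinds=("unicode", "mixed")):
    """Return a copy of df where columns of the given inferred kinds are cast to str."""
    out = df.copy()
    for col in out.columns:
        # Only touch columns whose values infer as one of the requested kinds.
        if infer_dtype(out[col]) in kinds:
            out[col] = out[col].astype(str)
    return out


# Hypothetical frame: "args" holds lists and "kwargs" holds dicts.
raw = pd.DataFrame({
    "args": [[], []],
    "kwargs": [{}, {}],
    "status": [-101, 1],
})

fixed = stringify_columns(raw)
print(pd.Series({col: infer_dtype(fixed[col]) for col in fixed.columns}))
```

Returning a copy leaves the input frame untouched, so the result can be compared with the original before anything is written to a store.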



Original article: http://outofmemory.cn/zaji/5647762.html
