Efficiently create sparse pivot tables in pandas?

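The answer @khammel posted earlier was useful, but unfortunately it no longer works because of changes in pandas and Python. The following should produce the same output: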

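```
import pandas as pd
from scipy.sparse import csr_matrix
from pandas.api.types import CategoricalDtype

# `frame` is the question's DataFrame, with columns person, thing and count
person_c = CategoricalDtype(sorted(frame.person.unique()), ordered=True)
thing_c = CategoricalDtype(sorted(frame.thing.unique()), ordered=True)

# Category codes give each person a row index and each thing a column index
row = frame.person.astype(person_c).cat.codes
col = frame.thing.astype(thing_c).cat.codes
sparse_matrix = csr_matrix((frame["count"], (row, col)),
                           shape=(person_c.categories.size, thing_c.categories.size))
```
```
>>> sparse_matrix
<3x4 sparse matrix of type '<class 'numpy.int64'>'
	with 6 stored elements in Compressed Sparse Row format>
>>> sparse_matrix.todense()
matrix([[0, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 0, 1, 0]], dtype=int64)
```
```
# The original answer called pd.SparseDataFrame(sparse_matrix, index=..., columns=...,
# default_fill_value=0). SparseDataFrame was removed in pandas 1.0; the sparse
# accessor below builds the equivalent frame, with 0 as the fill value.
dfs = pd.DataFrame.sparse.from_spmatrix(sparse_matrix,
                                        index=person_c.categories,
                                        columns=thing_c.categories)
```
```
>>> dfs
     a  b  c  d
him  0  1  0  1
me   1  0  0  1
you  1  0  1  0
```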
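
To check the approach end to end, here is a self-contained sketch. The small `frame` is hypothetical, built only so that the result reproduces the table shown above, and the helper name `sparse_pivot` is made up for illustration. For comparison it also builds the same table densely with `pivot_table(aggfunc="sum")`; that dense version is only practical when the full table fits in memory.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype
from scipy.sparse import csr_matrix

# Hypothetical input, chosen only so that the result matches the output above
frame = pd.DataFrame({
    "person": ["me", "me", "you", "you", "him", "him"],
    "thing": ["a", "d", "a", "c", "b", "d"],
    "count": [1, 1, 1, 1, 1, 1],
})


def sparse_pivot(df, index, columns, values):
    """Hypothetical helper: sparse index-by-columns table of summed values."""
    row_c = CategoricalDtype(sorted(df[index].unique()), ordered=True)
    col_c = CategoricalDtype(sorted(df[columns].unique()), ordered=True)
    row = df[index].astype(row_c).cat.codes
    col = df[columns].astype(col_c).cat.codes
    # csr_matrix sums entries that share the same (row, col) pair
    matrix = csr_matrix((df[values], (row, col)),
                        shape=(row_c.categories.size, col_c.categories.size))
    return pd.DataFrame.sparse.from_spmatrix(
        matrix, index=row_c.categories, columns=col_c.categories)


result = sparse_pivot(frame, "person", "thing", "count")

# Dense reference: the same table built with pivot_table
dense = frame.pivot_table(index="person", columns="thing", values="count",
                          aggfunc="sum", fill_value=0)
print(result)
```

Both versions sort the row and column labels, so the two tables line up cell for cell; only the sparse one avoids materialising the zeros.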

The main changes are:

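- `.astype()` no longer accepts a categorical conversion with the categories passed as extra keyword arguments (as in `astype('category', categories=...)`). You have to create a `CategoricalDtype` object instead.
- `sort()` no longer works here; the built-in `sorted()` is used instead.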
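
The dtype change can be seen in isolation. In this minimal sketch (the `people` Series is hypothetical), a `CategoricalDtype` built from `sorted()` fixes the category order up front, so `.cat.codes` gives stable integer positions, and a value outside the categories gets code `-1`.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

people = pd.Series(["you", "me", "him", "me"])

# Built-in sorted() orders the unique labels alphabetically
person_c = CategoricalDtype(sorted(people.unique()), ordered=True)

# Pass the dtype object to astype() instead of extra keyword arguments
codes = people.astype(person_c).cat.codes
print(list(person_c.categories))  # ['him', 'me', 'you']
print(codes.tolist())             # [2, 1, 0, 1]

# A value missing from the categories becomes NaN, with code -1
print(pd.Series(["them"]).astype(person_c).cat.codes.tolist())  # [-1]
```

In the answer's code the categories are taken from the same column being converted, so the `-1` case cannot occur there.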

The other changes are more superficial:

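- The categories' size is used instead of the length of a separate Series of unique values, just because I didn't want to create another object unnecessarily.
- The data input to `csr_matrix` (`frame["count"]`) does not need to be a list object.
- pandas `SparseDataFrame` now accepts scipy.sparse objects directly. (`SparseDataFrame` was later removed in pandas 1.0; `pd.DataFrame.sparse.from_spmatrix` is its current replacement.)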
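
The last two points can be checked together in a minimal sketch with made-up data. A pandas Series is passed straight to `csr_matrix`, and because scipy sums entries that share the same `(row, col)` pair, repeated person/thing rows are added up the way `pivot_table(aggfunc="sum")` would add them. The result is then wrapped with `pd.DataFrame.sparse.from_spmatrix`, which replaces `SparseDataFrame` in current pandas.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical long-format data; the pair (row 0, col 0) appears twice
data = pd.Series([1, 2, 5])
row = pd.Series([0, 0, 1])
col = pd.Series([0, 0, 2])

# Series can be passed directly; no .tolist() is needed.
# Duplicate (row, col) pairs are summed into a single stored value.
m = csr_matrix((data, (row, col)), shape=(2, 3))
print(m.toarray().tolist())  # [[3, 0, 0], [0, 0, 5]]

# Wrap the scipy matrix in a DataFrame whose columns are sparse, fill value 0
dfs = pd.DataFrame.sparse.from_spmatrix(m, index=["me", "you"],
                                        columns=["a", "b", "c"])
print(dfs)
```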

Original URL: http://outofmemory.cn/zaji/5643985.html