I think this is what you want:
    import numpy as np

    data = np.array([[ 4057,  8, 1374],
                     [ 4057,  9,  759],
                     [ 4057, 11,   96],
                     [89205, 16,  146],
                     [89205, 17,  154],
                     [89205, 18,  244]])

    # Unique row/column labels, plus the position of every entry among them
    rows, row_pos = np.unique(data[:, 0], return_inverse=True)
    cols, col_pos = np.unique(data[:, 1], return_inverse=True)

    # Start from zeros and scatter the values into place
    pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
    pivot_table[row_pos, col_pos] = data[:, 2]

    >>> pivot_table
    array([[1374,  759,   96,    0,    0,    0],
           [   0,    0,    0,  146,  154,  244]])
    >>> rows
    array([ 4057, 89205])
    >>> cols
    array([ 8,  9, 11, 16, 17, 18])
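This approach has some limitations. The main one is that if the same row/column combination appears more than once in the input, the values are not added together; only one of them is kept (possibly the last). If you want them all summed, you can, although it is a little convoluted, abuse scipy's sparse module: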
    data = np.array([[ 4057,  8, 1374],
                     [ 4057,  9,  759],
                     [ 4057, 11,   96],
                     [89205, 16,  146],
                     [89205, 17,  154],
                     [89205, 18,  244],
                     [ 4057, 11,    4]])

    rows, row_pos = np.unique(data[:, 0], return_inverse=True)
    cols, col_pos = np.unique(data[:, 1], return_inverse=True)

    pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)
    pivot_table[row_pos, col_pos] = data[:, 2]

    >>> pivot_table # the element at [0, 2] should be 100!!!
    array([[1374,  759,    4,    0,    0,    0],
           [   0,    0,    0,  146,  154,  244]])

    import scipy.sparse as sps

    # Converting a COO matrix to a dense array sums duplicate (row, col) entries.
    # .toarray() does the same job as the older .A shorthand.
    pivot_table = sps.coo_matrix((data[:, 2], (row_pos, col_pos)),
                                 shape=(len(rows), len(cols))).toarray()

    >>> pivot_table # now repeated elements are added together
    array([[1374,  759,  100,    0,    0,    0],
           [   0,    0,    0,  146,  154,  244]])
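If you would rather not pull in scipy just for this, a NumPy-only alternative might be `np.add.at`, which performs an unbuffered in-place add so that repeated index pairs accumulate instead of overwriting each other. This is not part of the approach above, just a minimal sketch of a different technique on the same sample data:

```python
import numpy as np

# Same sample data as above, including the repeated (4057, 11) entry.
data = np.array([[ 4057,  8, 1374],
                 [ 4057,  9,  759],
                 [ 4057, 11,   96],
                 [89205, 16,  146],
                 [89205, 17,  154],
                 [89205, 18,  244],
                 [ 4057, 11,    4]])

rows, row_pos = np.unique(data[:, 0], return_inverse=True)
cols, col_pos = np.unique(data[:, 1], return_inverse=True)

pivot_table = np.zeros((len(rows), len(cols)), dtype=data.dtype)

# Unbuffered add: every (row, col) pair contributes, even when it repeats.
# A plain `pivot_table[row_pos, col_pos] += data[:, 2]` would NOT accumulate
# the repeated pair, because fancy-indexed in-place ops are buffered.
np.add.at(pivot_table, (row_pos, col_pos), data[:, 2])

print(pivot_table)
```

With the data above, the element at [0, 2] comes out as 100 (96 + 4), matching the sparse-matrix result. For very large inputs it may be worth timing both options, since `np.add.at` has a reputation for being slow in some NumPy versions.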