Cythonize a list of all splits of a string

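Cython often won't help you much with this kind of problem. Your code uses slicing, which ends up running at about the same speed as pure Python (which is actually pretty good).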
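The answer refers to the asker's code without showing it. The following is a minimal pure-Python sketch of that kind of slicing-based function. The name `split` and the convention of one `(text[:i+1], text[i+1:])` pair per character are assumptions, not necessarily the original question's exact code.

```python
def split(text):
    # Assumed form of the code being optimized: one (prefix, suffix) pair
    # per position, each half produced by slicing the input.
    return [(text[:i + 1], text[i + 1:]) for i in range(len(text))]


if __name__ == "__main__":
    print(split(b"abc"))  # [(b'a', b'bc'), (b'ab', b'c'), (b'abc', b'')]
```

Each pair costs two slices, and each slice copies its bytes into a new object, so the total number of bytes copied grows quadratically with the length of the string. That copying happens inside CPython whether or not the surrounding loop is compiled.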

With a 100-character byte string (`b'0'*100`) and 10000 iterations in `timeit`, I get:

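- Your code as written: 0.37 s
- Your code as written, but compiled with Cython: 0.21 s
- Your code with the line `cdef int i` added, compiled with Cython: 0.20 s (a very small but reproducible improvement; it matters more for longer strings)
- Your `cdef int i` plus the argument typed as `bytes text`: 0.28 s (i.e. worse)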
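- Using the Python C API directly gives the best speed (see the code below): 0.11 s

For convenience I chose to do this mostly in Cython (but calling the API functions myself); you could write very similar code directly in C, with a bit more manual error checking. I've written it for the Python 3 API and assumed you're using bytes objects (i.e. `PyBytes` rather than `PyString`), so you'll need to change it somewhat if you're using Python 2, or Unicode strings with Python 3.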
```
from cpython cimport *

cdef extern from "Python.h":
    # This isn't included in the cpython definitions
    # using PyObject* rather than object lets us control refcounting
    PyObject* Py_BuildValue(const char*, ...) except NULL

def split(text):
    cdef Py_ssize_t l, i
    cdef char* s
    # Cython automatically checks the return value and raises an error if
    # these fail. This provides a type-check on text
    PyBytes_AsStringAndSize(text, &s, &l)
    output = PyList_New(l)
    for i in range(l):
        # PyList_SET_ITEM steals a reference
        # the casting is necessary to ensure that Cython doesn't
        # decref the result of Py_BuildValue
        # NOTE (reconstruction, assumption): the end of this call was lost in
        # the source page. These arguments build the pair
        # (text[:i+1], text[i+1:]) as two bytes objects ("y#" = char* + length).
        PyList_SET_ITEM(output, i,
                        <object>Py_BuildValue(b"y#y#", s, i + 1, s + i + 1, l - (i + 1)))
    return output
```
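If you don't want to go all the way to the C API, a version that preallocates the list (`output = [None]*len(text)`) and fills it with a for loop rather than a list comprehension is more efficient than your original version: 0.18 s.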
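A plain-Python sketch of that preallocated version follows. The function name `split_prealloc` is hypothetical, and the `(text[:i+1], text[i+1:])` split convention is assumed. When compiled with Cython, this is the same source you would feed to `cythonize`; adding a typed loop index such as `cdef Py_ssize_t i` is optional.

```python
def split_prealloc(text):
    # Hypothetical name. Preallocate the output list, then fill it by index
    # in a plain for loop instead of using a list comprehension.
    n = len(text)
    output = [None] * n
    for i in range(n):
        output[i] = (text[:i + 1], text[i + 1:])
    return output


if __name__ == "__main__":
    print(split_prealloc(b"abc"))  # [(b'a', b'bc'), (b'ab', b'c'), (b'abc', b'')]
```

Preallocating means the list never has to be resized while it is filled. The slices themselves still dominate the cost, which is why the gain is modest.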

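In summary, just compiling with Cython gets you a decent speed-up (a little under 2x), and setting a few types helps slightly. Traditionally, that is about all you can really get out of Cython for this kind of code. For the fastest speed you basically need to use the Python C API directly. That gets you a little under 4x, which I think is pretty good.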
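To reproduce a comparison like the timings above, a minimal `timeit` harness might look like this. It uses the same setup described earlier (a 100-byte string, 10000 iterations). The function names `split`, `split_prealloc` and `benchmark` are illustrative, and the split convention is assumed.

```python
import timeit


def split(text):
    # Slicing-based list comprehension (assumed form of the original code).
    return [(text[:i + 1], text[i + 1:]) for i in range(len(text))]


def split_prealloc(text):
    # Same result, built in a preallocated list with a plain for loop.
    n = len(text)
    output = [None] * n
    for i in range(n):
        output[i] = (text[:i + 1], text[i + 1:])
    return output


def benchmark(func, text=b"0" * 100, number=10000):
    # Total wall-clock seconds for `number` calls of func(text).
    return timeit.timeit(lambda: func(text), number=number)


if __name__ == "__main__":
    for f in (split, split_prealloc):
        print(f"{f.__name__}: {benchmark(f):.3f} s")
```

The absolute numbers depend on the machine and Python version. A function from a compiled Cython module can be passed to `benchmark` in the same way, so all variants are timed under identical conditions.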

Source: 内存溢出, https://outofmemory.cn/zaji/5648121.html
