- Problem description
- Hardware and software platform
- The error
- The final fix
- Summary
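I have recently been debugging the pytracking code and wanted to run the ATOM and DiMP trackers it contains. While compiling its PrPooling (PreciseRoIPooling) module, I hit an inexplicable build error that took me two full days to resolve. At one point I even considered replacing PrPooling with RoIAlign. Eventually I found the cause and fixed the build. If you have hit the same error and don't want to read the analysis, skip straight to the Summary.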
I debugged pytracking on Ubuntu 18.04 with a GTX 1660 Ti graphics card.
The development environment was torch 1.2.0, torchvision 0.4.0, CUDA 10.0 and gcc 7.5.0. I chose torch 1.2.0 and CUDA 10.0 because a pytracking issue had said these were the best versions to use when setting up pytracking.
When compiling PrPooling, I kept getting the following error:
Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/_prroi_pooling/build.ninja...
Building extension module _prroi_pooling...
[1/2] c++ -MMD -MF prroi_pooling_gpu.o.d -DTORCH_EXTENSION_NAME=_prroi_pooling -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/TH -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /home/zhihao/anaconda3/envs/pytracking/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c -o prroi_pooling_gpu.o
FAILED: prroi_pooling_gpu.o
c++ -MMD -MF prroi_pooling_gpu.o.d -DTORCH_EXTENSION_NAME=_prroi_pooling -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/TH -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /home/zhihao/anaconda3/envs/pytracking/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c -o prroi_pooling_gpu.o
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c: In function ‘at::Tensor prroi_pooling_forward_cuda(const at::Tensor&, const at::Tensor&, int, int, float)’:
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:37:35: error: expected primary-expression before ‘float’
stream, features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:37:59: error: expected primary-expression before ‘float’
stream, features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:37:85: error: expected primary-expression before ‘float’
stream, features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c: In function ‘at::Tensor prroi_pooling_backward_cuda(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float)’:
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:81:27: error: expected primary-expression before ‘float’
features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(), output_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:81:51: error: expected primary-expression before ‘float’
features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(), output_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:81:77: error: expected primary-expression before ‘float’
features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(), output_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:81:108: error: expected primary-expression before ‘float’
features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(), output_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:82:32: error: expected primary-expression before ‘float’
features_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c: In function ‘at::Tensor prroi_pooling_coor_backward_cuda(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float)’:
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:112:27: error: expected primary-expression before ‘float’
features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(), output_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:112:51: error: expected primary-expression before ‘float’
features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(), output_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:112:77: error: expected primary-expression before ‘float’
features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(), output_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:112:108: error: expected primary-expression before ‘float’
features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(), output_diff.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:113:28: error: expected primary-expression before ‘float’
coor_diff.data_ptr<float>(),
^~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 960, in _build_extension_module
check=True)
File "/home/zhihao/anaconda3/envs/pytracking/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/zhihao/anaconda3/envs/pytracking/lib/python3.7/code.py", line 90, in runcode
exec(code, self.locals)
File "", line 1, in <module>
File "/home/zhihao/Download/pycharm-community-2022.1/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/home/zhihao/Download/pycharm-community-2022.1/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/zhihao/code/PreciseRoIPooling/pytorch/tests/test_prroi_pooling2d.py", line 65, in <module>
test.test_forward()
File "/home/zhihao/code/PreciseRoIPooling/pytorch/tests/test_prroi_pooling2d.py", line 36, in test_forward
out = pool(features, rois)
File "/home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/prroi_pool.py", line 28, in forward
return prroi_pool2d(features, rois, self.pooled_height, self.pooled_width, self.spatial_scale)
File "/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/functional.py", line 44, in forward
_prroi_pooling = _import_prroi_pooling()
File "/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/functional.py", line 33, in _import_prroi_pooling
verbose=True
File "/home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 658, in load
is_python_module)
File "/home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 827, in _jit_compile
with_cuda=with_cuda)
File "/home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 880, in _write_ninja_file_and_build
_build_extension_module(name, build_directory, verbose)
File "/home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 973, in _build_extension_module
raise RuntimeError(message)
RuntimeError: Error building extension '_prroi_pooling'
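At first I assumed the torch version was the problem. I installed torch 1.1.0, torch 1.3.0 and torch 1.4.0 in turn, but none of them fixed it.
Next I read the article "pytracking跟踪算法的配置(ubuntu版本)" (setting up the pytracking trackers on Ubuntu), which said to install pytorch and torchvision with conda. I reinstalled them with conda, but the error remained.
Finally I even added the CUDA path to `~/.bashrc`, and the error still appeared.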
Then I read the error output more carefully and found that it consisted mainly of errors like these:
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:37:35: error: expected primary-expression before ‘float’
stream, features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(),
^~~~~
/home/zhihao/code/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c:37:59: error: expected primary-expression before ‘float’
stream, features.data_ptr<float>(), rois.data_ptr<float>(), output.data_ptr<float>(),
^~~~~
The key error is `error: expected primary-expression before ‘float’`. This made me wonder whether the problem lay in the `.c` file `pytorch/prroi_pool/src/prroi_pooling_gpu.c` itself.
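Because the log is long and repetitive, it helps to collapse it into its distinct error messages before guessing at causes. The following is a minimal Python sketch of that triage step. The function name `summarize_errors` and the regular expression are my own illustration and are not part of pytracking or PyTorch. The sketch assumes the gcc-style `path:line:column: error: message` format seen in the log above.

```python
import re
from collections import Counter

# gcc/nvcc-style diagnostic: "path:line:column: error: message"
# (illustrative pattern; other compilers may format errors differently)
ERROR_RE = re.compile(r"^(?P<path>[^:\s]+):(?P<line>\d+):(?P<col>\d+): error: (?P<msg>.+)$")


def summarize_errors(log_text):
    """Return a Counter mapping each distinct error message to its number of occurrences."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            counts[match.group("msg")] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "prroi_pooling_gpu.c:37:35: error: expected primary-expression before ‘float’\n"
        "stream, features.data_ptr<float>(), rois.data_ptr<float>(),\n"
        "prroi_pooling_gpu.c:81:27: error: expected primary-expression before ‘float’\n"
        "ninja: build stopped: subcommand failed.\n"
    )
    # Print each distinct message with its count, most frequent first
    for message, n in summarize_errors(sample).most_common():
        print(n, message)
```

Applied to the full log above, all 14 errors collapse into this single message. That makes it clear the problem is one construct, `data_ptr<float>()`, rather than a broken toolchain.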
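Later I came across an article by 赖子, "3090(30系显卡)编译prpooling出错的解决办法" (fixing PrPooling build errors on the RTX 3090 and other 30-series cards). His fix for the PrPooling build error was mainly to replace every `data` with `data_ptr`.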
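When I then looked at the definitions in `pytorch/prroi_pool/src/prroi_pooling_gpu.c`, I found that it used `data_ptr` throughout. That gave me an idea: what if I changed every `data_ptr` to `data`? I made that change, so that `features.data_ptr<float>()` became `features.data<float>()`, and reran the code. This time it ran successfully.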
Using /tmp/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /tmp/torch_extensions/_prroi_pooling/build.ninja...
Building extension module _prroi_pooling...
[1/3] /usr/local/cuda-10.0/bin/nvcc -DTORCH_EXTENSION_NAME=_prroi_pooling -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/TH -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /home/zhihao/anaconda3/envs/pytracking/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -std=c++11 -c /home/zhihao/code/pytracking/ltr/external/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu_impl.cu -o prroi_pooling_gpu_impl.cuda.o
[2/3] c++ -MMD -MF prroi_pooling_gpu.o.d -DTORCH_EXTENSION_NAME=_prroi_pooling -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/TH -isystem /home/zhihao/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/THC -isystem /usr/local/cuda-10.0/include -isystem /home/zhihao/anaconda3/envs/pytracking/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++11 -c /home/zhihao/code/pytracking/ltr/external/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c -o prroi_pooling_gpu.o
[3/3] c++ prroi_pooling_gpu.o prroi_pooling_gpu_impl.cuda.o -shared -L/usr/local/cuda-10.0/lib64 -lcudart -o _prroi_pooling.so
Loading extension module _prroi_pooling...
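Summary
The fix for the error above is to change every `data_ptr` in `pytorch/prroi_pool/src/prroi_pooling_gpu.c` to `data`. For example, `features.data_ptr<float>()` becomes `features.data<float>()`.
The message "expected primary-expression before ‘float’" most likely means that the installed PyTorch headers do not declare `data_ptr` as a member template. In that case g++ reads `features.data_ptr<float>()` as a less-than comparison and finds the type `float` where it expects an expression. This is the reverse of the change in the 3090 article, which replaced `data` with `data_ptr`. Which of the two accessors compiles therefore depends on the PyTorch version you build against.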
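If you need to apply or undo this change more than once, for example after re-cloning the repository, you can script the edit. Below is a minimal Python sketch; `patch_source` and `patch_file` are hypothetical helper names and are not part of pytracking.

The sketch only rewrites the templated form `data_ptr<...>` that the error log flags, and leaves any non-templated `data_ptr()` call alone. It also assumes the accessor is always written immediately followed by `<`, as in the file quoted above. Passing `reverse=True` performs the opposite replacement, from `data` to `data_ptr`, as described in the 3090 article.

```python
import re
import shutil
from pathlib import Path

# Templated accessor calls, e.g. features.data_ptr<float>() or features.data<float>().
# \b makes sure only whole identifiers are matched.
_FORWARD = (re.compile(r"\bdata_ptr<"), "data<")   # data_ptr<T>() -> data<T>()  (this post)
_REVERSE = (re.compile(r"\bdata<"), "data_ptr<")   # data<T>() -> data_ptr<T>()  (3090 article)


def patch_source(text, reverse=False):
    """Return (patched_text, number_of_replacements) for one source string."""
    pattern, replacement = _REVERSE if reverse else _FORWARD
    return pattern.subn(replacement, text)


def patch_file(path, reverse=False):
    """Patch a file in place, keeping a "<name>.bak" copy; return the replacement count."""
    path = Path(path)
    patched, count = patch_source(path.read_text(), reverse=reverse)
    if count:
        # Keep the untouched original next to the patched file
        shutil.copyfile(path, path.with_name(path.name + ".bak"))
        path.write_text(patched)
    return count
```

To use it, call `patch_file("pytorch/prroi_pool/src/prroi_pooling_gpu.c")` from the PreciseRoIPooling checkout. Inside pytracking, this file sits under `ltr/external/PreciseRoIPooling/`, as the successful build log shows. Limiting the regex to `data_ptr<` is a deliberate choice: it changes only the calls the compiler complained about.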
Feel free to share; if you repost, please credit the source: 内存溢出