Traceback (most recent call last):
  File "train_pairwise.py", line 238, in <module>
    do_train()
  File "train_pairwise.py", line 116, in do_train
    paddle.distributed.init_parallel_env()
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/distributed/parallel.py", line 196, in init_parallel_env
    parallel_helper._init_parallel_ctx()
  File "/root/anaconda3/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel_helper.py", line 42, in _init_parallel_ctx
    __parallel_ctx__clz__.init()
OSError: (External) Nccl error, unhandled cuda error (at /paddle/paddle/fluid/platform/collective_helper.cc:100)

My CUDA version is 10.2, and my Paddle version is 2.1.3.

Solution
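Install the NCCL 2.5.6 packages built for CUDA 10.2, locate the installed library, symlink it as libnccl.so, and put the symlink's directory on LD_LIBRARY_PATH:

apt-get install libnccl2=2.5.6-1+cuda10.2 libnccl-dev=2.5.6-1+cuda10.2
find / -name "libnccl.so*"
ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2.5.6 /usr/local/bin/libnccl.so
export LD_LIBRARY_PATH=/usr/local/bin/:$LD_LIBRARY_PATH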
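To confirm that the pinned package is the one the dynamic loader actually resolves, you can load libnccl.so and ask it for its version. The Python sketch below does this with ctypes and NCCL's ncclGetVersion C function. The helper names (parse_deb_version, decode_nccl_version, loaded_nccl_version) are hypothetical. The sketch also assumes that the symlinked libnccl.so is on LD_LIBRARY_PATH and that the variable was exported before Python started.

```python
"""Check which NCCL build the dynamic loader picks up for "libnccl.so".

Assumptions (not from the original post): the symlink is reachable as
"libnccl.so" through LD_LIBRARY_PATH, and the library exports
ncclGetVersion(int *version). Helper names are hypothetical.
"""
import ctypes
import re


def parse_deb_version(deb_version):
    """Split a Debian package version such as '2.5.6-1+cuda10.2'
    into (nccl_version, cuda_version) strings."""
    m = re.fullmatch(r"(\d+\.\d+\.\d+)-\d+\+cuda(\d+\.\d+)", deb_version)
    if m is None:
        raise ValueError(f"unrecognized version string: {deb_version!r}")
    return m.group(1), m.group(2)


def decode_nccl_version(code):
    """Turn the integer reported by ncclGetVersion into 'major.minor.patch'.

    NCCL 2.9 and later encode major*10000 + minor*100 + patch;
    older releases (such as 2.5.6) encode major*1000 + minor*100 + patch.
    """
    if code >= 10000:
        major, rest = divmod(code, 10000)
    else:
        major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def loaded_nccl_version(name="libnccl.so"):
    """Return the version of the library that dlopen resolves for `name`,
    or None if it cannot be loaded or queried.

    glibc reads LD_LIBRARY_PATH when the process starts, so changing
    os.environ from inside Python does not affect this lookup.
    """
    try:
        lib = ctypes.CDLL(name)
        get_version = lib.ncclGetVersion
    except (OSError, AttributeError):
        return None
    version = ctypes.c_int(0)
    if get_version(ctypes.byref(version)) != 0:  # 0 is ncclSuccess
        return None
    return decode_nccl_version(version.value)


if __name__ == "__main__":
    expected_nccl, expected_cuda = parse_deb_version("2.5.6-1+cuda10.2")
    found = loaded_nccl_version()
    if found is None:
        print("libnccl.so could not be loaded; check LD_LIBRARY_PATH")
    elif found != expected_nccl:
        print(f"loaded NCCL {found}, expected {expected_nccl} "
              f"(CUDA {expected_cuda} build)")
    else:
        print(f"NCCL {found} loaded, matching the CUDA {expected_cuda} package")
```

Run the script in the same shell where LD_LIBRARY_PATH was exported. If it reports a version other than 2.5.6, the loader is finding a different libnccl.so earlier in its search path.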
References

[1] OSError: (External) Nccl error, unhandled cuda error (at /paddle/paddle/fluid/platform/collective_helper.cc:100). https://issueexplorer.com/issue/PaddlePaddle/PaddleDetection/4139
Sharing is welcome; please credit the source when reposting: 内存溢出