System version: CentOS 7.9.2009
Kernel version: Linux localhost.localdomain 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Prerequisites: disable Nouveau;
install kernel-devel (or kernel-source and kernel-headers);
install binutils, which provides /usr/bin/ld.
1. Install dependencies
yum -y install epel-release
yum -y install gcc binutils wget
yum -y install kernel-devel
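Before moving on, it can help to confirm that the build tools really ended up on the PATH. The following is a minimal bash sketch, not part of the original steps: `missing_tools` is a hypothetical helper name, and it only checks that each command can be found, not that its version is suitable.

```shell
# Print every command from the argument list that cannot be found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: gcc, ld (from binutils) and wget are what step 1 installs.
# missing_tools gcc ld wget
```

No output means all listed tools were found.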
2. Disable Nouveau
2.1. Check whether Nouveau is loaded
lsmod | grep nouveau
Note: no output means Nouveau is already disabled, and the remaining steps in this section can be skipped.
2.2. Edit the configuration
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
2.3. Back up the initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
2.4. Rebuild it with dracut
dracut /boot/initramfs-$(uname -r).img $(uname -r)
2.5. Reboot the system
reboot
2.6. Check that Nouveau is disabled
lsmod | grep nouveau
Note: no output means Nouveau was disabled successfully.
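The check and the configuration change in this section can be wrapped in two small bash functions. This is only a sketch: `nouveau_loaded` and `write_blacklist` are hypothetical names, and the check matches the module name in the first column of lsmod output rather than grepping the whole line.

```shell
# Return success (0) if lsmod-style text on stdin lists the nouveau module.
nouveau_loaded() {
    # lsmod prints the module name in the first column
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Write the two blacklist lines from step 2.2 to the given file.
write_blacklist() {
    local conf=$1
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' > "$conf"
}

# Usage on the real system (as root):
# lsmod | nouveau_loaded && write_blacklist /etc/modprobe.d/blacklist.conf
```

The initramfs backup, the dracut rebuild and the reboot still have to follow, as in steps 2.3 to 2.5.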
3. Detect the driver
3.1. Install the ELRepo repository
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
or
yum -y install https://www.elrepo.org/elrepo-release-7.0-4.el7.elrepo.noarch.rpm
3.2. Install nvidia-detect
yum -y install nvidia-detect
3.3. Detect the required driver
nvidia-detect -v
Probing for supported NVIDIA devices...
[10de:1b06] NVIDIA Corporation GP102 [GeForce GTX 1080 Ti]
This device requires the current 510.60.02 NVIDIA driver kmod-nvidia
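To reuse the detected version in later commands, the number can be pulled out of the nvidia-detect output. This is a sketch under one assumption: the output uses the "requires the current <version> NVIDIA driver" wording shown above. `required_driver_version` is a hypothetical helper name.

```shell
# Extract the driver version from nvidia-detect output read on stdin.
# Assumes the "requires the current <version> NVIDIA driver" wording.
required_driver_version() {
    sed -n 's/.*requires the current \([0-9][0-9.]*\) NVIDIA driver.*/\1/p'
}

# Usage: nvidia-detect -v | required_driver_version
```

If the wording differs on another nvidia-detect release, the function prints nothing rather than a wrong value.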
4. Install the driver
4.1. Download the driver
wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/510.68.02/NVIDIA-Linux-x86_64-510.68.02.run
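Note: if the version detected on your machine differs from mine, substitute your version number in the URL and file name.
Suggestion: download the installer from the NVIDIA website onto a USB drive and copy it to the server.
Note: the NVIDIA site offers the newest release, which also supports cards that need an earlier one, so I installed 510.68.02 rather than 510.60.02.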
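Since the version number appears twice in the download URL, a small helper keeps the two in sync when substituting another version. This is a sketch: `driver_url` is a hypothetical name, the URL layout is copied from the wget command in step 4.1, and whether every version is published under that layout is an assumption.

```shell
# Build the installer URL for a given driver version.
# The layout is taken from the wget command in step 4.1; other versions
# following the same layout is an assumption.
driver_url() {
    local version=$1
    echo "https://us.download.nvidia.cn/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run"
}

# Usage: wget "$(driver_url 510.68.02)"
```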
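4.2. Make the installer executable
chmod +x NVIDIA-Linux-x86_64-510.68.02.run
Running the installer while the X server is active reports an error, so stop the X service first.
# Check whether the display manager is gdm (there are two kinds; this server uses gdm)
systemctl --all|grep gdm
whereis gdm
systemctl stop gdm.service
# Install the driver (step 4.3), then start gdm again
systemctl start gdm.service
4.3. Install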
sh ./NVIDIA-Linux-x86_64-510.68.02.run -s
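The stop, install, start order from steps 4.2 and 4.3 can be kept together in one function so gdm is not left stopped by accident. This is a sketch assuming gdm is the display manager, as on this server; `run` and `install_driver` are hypothetical names, and `DRY_RUN=1` only prints the commands instead of executing them.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop gdm, run the installer silently, then start gdm again.
# gdm is started again even if the installer fails; the installer's
# exit status is returned to the caller.
install_driver() {
    local installer=$1 status
    run systemctl stop gdm.service || return 1
    run sh "$installer" -s
    status=$?
    run systemctl start gdm.service
    return "$status"
}

# Usage (as root): install_driver ./NVIDIA-Linux-x86_64-510.68.02.run
# Preview only:    DRY_RUN=1; install_driver ./NVIDIA-Linux-x86_64-510.68.02.run
```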
4.4. Check the GPU information
nvidia-smi
Note: if GPU information is printed, the driver was installed successfully.
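In addition, I installed the following alongside the driver:
python 3.9.11
pytorch 1.11.0
tensorflow-gpu 2.7.0
transformers 4.18.0
cuda 11.3
cudnn 8.2.0
These were all recent versions at the time of writing, and a trial run succeeded.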
nvidia-smi output on this server:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 510.68.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 49%   82C    P2   246W / 250W |   8944MiB / 11264MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     10400      G   /usr/bin/X                         84MiB |
|    0   N/A  N/A     23147      G   /usr/bin/gnome-shell               84MiB |
|    0   N/A  N/A     29312      C   python                           8771MiB |
+-----------------------------------------------------------------------------+
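The memory figures in the table above (8944MiB of 11264MiB) can also be read in a script-friendly form through nvidia-smi's query mode. The sketch below turns that CSV line into a percentage; `mem_used_percent` is a hypothetical helper name, and it truncates to an integer rather than rounding.

```shell
# Read "used, total" (MiB, no units) on stdin and print the used
# percentage, truncated to an integer.
mem_used_percent() {
    awk -F', *' '{ printf "%d\n", $1 * 100 / $2 }'
}

# Usage on a machine with the driver installed:
# nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | mem_used_percent
```

For the values shown above, `echo '8944, 11264' | mem_used_percent` prints 79.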
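5. Uninstall the driver
5.1. Run the uninstaller
nvidia-uninstall
5.2. Clean up the DKMS modules
dkms remove -m nvidia -v 510.68.02 --all
Note: this requires dkms ("yum -y install dkms") and applies only if the driver module was registered with DKMS; "dkms status" lists the registered module names and versions.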
6. Common errors
6.1. The installer reports: "ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option."
Solution: install the kernel development package
yum -y install epel-release
yum -y install kernel-devel
Compare the kernel versions
rpm -qa |grep kernel
uname -r
Install the driver (replace the path with the kernel-devel directory present on your system)
./NVIDIA-Linux-x86_64-510.68.02.run --kernel-source-path=/usr/src/kernels/3.10.0-1160.42.2.el7.x86_64 -k $(uname -r)
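This error typically shows up when the installed kernel-devel version differs from the running kernel, as with the 3.10.0-1160.42.2 path above against a 3.10.0-1160 kernel. The version comparison can be sketched as a directory check; `has_matching_sources` is a hypothetical helper name, and /usr/src/kernels as the default location is taken from the command above.

```shell
# Succeed if kernel sources for the given release exist under src_root.
has_matching_sources() {
    local release=$1 src_root=${2:-/usr/src/kernels}
    [ -d "$src_root/$release" ]
}

# Usage:
# if has_matching_sources "$(uname -r)"; then
#     echo "kernel-devel matches the running kernel"
# else
#     echo "no sources for $(uname -r); available:"
#     ls /usr/src/kernels
# fi
```

When the check fails, one option besides passing the source path by hand is to install the matching package with yum -y install "kernel-devel-$(uname -r)", assuming the repository still carries that build.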