Lab GPU Operation Log, 2021-10-20

After the 10/13 repair: operation log, first run

I am using the user account xxxy2080_10.

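First, I opened the server from my own computer with VNC Viewer and found that only part of the screen was displayed; when I enlarged the window, that partial view was simply scaled up with it. In short, I could not see the whole screen. The fix is to start vncviewer with a parameter.

VNC Viewer must first be added to PATH. Then run: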

vncviewer --FullScreen=1

Then follow the tutorial.

Download Python and Anaconda3
# Create a screen session
screen -S python

# I saw that some of these files were already in the Home folder, so not everything needs to be downloaded again
# Download Python 3.7.10
wget -c https://www.python.org/ftp/python/3.7.10/Python-3.7.10.tar.xz

# Download Anaconda3
wget -c https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
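Install Python
# The consecutive lines below extract and install Python
tar vxf Python-3.7.10.tar.xz
cd Python-3.7.10/
# Note: get the value for --prefix below by running pwd
./configure --prefix=/xxxy2080hppc/xxxy2080_10/Python-3.7.10
make && make altinstall

# Add the environment variable
vim ~/.bash_profile
# To write the text below: press i (insert mode), append it to the PATH line, press Esc (command mode), then type :wq (save and quit)
:/xxxy2080hppc/xxxy2080_10/Python-3.7.10/bin
# Make the environment variable take effect
source ~/.bash_profile
python3.7 -V    # Python 3.7.10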
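
The vim steps above can also be done without an editor. The following is a minimal sketch, not the tutorial's method: `add_to_path` is a hypothetical helper name, and it appends a separate `export PATH` line rather than editing the existing one. The demo writes to a scratch file so the real ~/.bash_profile is not touched.

```shell
# Hypothetical helper: append a directory to PATH through a profile file,
# skipping the write if the exact line is already present.
add_to_path() {
    dir="$1"
    profile="${2:-$HOME/.bash_profile}"
    line="export PATH=\"\$PATH:$dir\""
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -Fxq -- "$line" "$profile" 2>/dev/null; then
        printf '%s\n' "$line" >> "$profile"
    fi
}

# Demo against a scratch file (assumption: the real target would be ~/.bash_profile).
demo_profile="$(mktemp)"
add_to_path /xxxy2080hppc/xxxy2080_10/Python-3.7.10/bin "$demo_profile"
add_to_path /xxxy2080hppc/xxxy2080_10/Python-3.7.10/bin "$demo_profile"   # second call adds nothing
cat "$demo_profile"
```

Calling the helper without a second argument would write to ~/.bash_profile; `source ~/.bash_profile` is still needed afterwards for the current shell to pick up the change.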
PyCharm

I saw that PyCharm was already in the system PATH, so I tried it.

# First, PATH contains the following:
PATH=$PATH:$HOME/.local/bin:$HOME/bin:/usr/local/cuda-10.2/bin:/usr/local/TensorRT-7.1.3.4/lib:/xxxy2080hppc/xxxy2080_10/pycharm-community-2021.2.2/bin/:/xxxy2080hppc/xxxy2080_10/Python-3.7.10/bin

The command given in the tutorial is:
cd pycharm-community-2021.2.2/bin/ 
sh pycharm.sh 
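Install Anaconda
# First make the Anaconda3-2021.05-Linux-x86_64.sh script executable.
chmod u+x Anaconda3-2021.05-Linux-x86_64.sh

./Anaconda3-2021.05-Linux-x86_64.sh
This gave the message "Permission denied".
Go to where the file is stored (mine is in Home), right-click it, open Properties > Permissions, and change the last two entries to read and write.
Run the command above again and it works.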
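
A possible alternative to changing permissions in the file manager: a shell script passed to an interpreter only needs to be readable, not executable. The sketch below demonstrates this with a stand-in script created in a temp file (an assumption for illustration; it is not the Anaconda installer itself).

```shell
# Create a stand-in script with the execute bit removed.
installer="$(mktemp)"
printf '#!/bin/sh\necho "installer ran"\n' > "$installer"
chmod a-x "$installer"

# Run it directly if it is executable, otherwise hand it to the interpreter.
if [ -x "$installer" ]; then
    result="$("$installer")"
else
    result="$(sh "$installer")"
fi
echo "$result"
```

Applied to this log, that would mean running `bash Anaconda3-2021.05-Linux-x86_64.sh` instead of `./Anaconda3-2021.05-Linux-x86_64.sh`.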

Please answer 'yes' or 'no':'
>>> yes

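# The next prompt appears to set the installation directory
/share/nishome/20070104_5/anaconda3
Error: mkdir: cannot create directory ‘/share’: Permission denied

# The step above tries to install under the shared directory, but I have no permission to create a folder there, hence the error. The fix is to accept the default location shown in the prompt:
[/xxxy2080hppc/xxxy2080_10/anaconda3] >>> (just press Enter)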
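
To avoid the mkdir error before it happens, one could check whether a candidate install location is writable. This is a rough sketch under assumptions: `first_writable_prefix` is a hypothetical helper, and the demo uses a scratch directory rather than the real paths on the server.

```shell
# Hypothetical helper: print the first candidate path whose nearest
# existing ancestor is a directory the current user can write to.
first_writable_prefix() {
    for candidate in "$@"; do
        ancestor="$candidate"
        # Walk upwards until something that exists is found.
        while [ ! -e "$ancestor" ]; do
            ancestor="$(dirname "$ancestor")"
        done
        if [ -d "$ancestor" ] && [ -w "$ancestor" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Demo with a scratch directory standing in for the user's own directory.
scratch="$(mktemp -d)"
chosen="$(first_writable_prefix "$scratch/anaconda3")"
echo "install prefix: $chosen"
```

On the server, the candidates would be the shared path and the per-user path from the log. The Anaconda installer also accepts `-b -p <prefix>` for a non-interactive install into a chosen prefix, which may be worth trying once a writable location is known.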

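(Exploratory; this step did not succeed.) To obtain permission I referred to the post below. I recommend reading not only the post but also its comments. Be careful!!!

Fix for "Permission denied" (use 750 permissions with caution)

# Obtain permission
sudo chmod -R 750 share
# This displays the following:
We trust you have received the usual lecture from the local System Administrator. It usually boils down to these three things:
#1) Respect the privacy of others.
#2) Think before you type.
#3) With great power comes great responsibility.
# Next it asks for a password, and then it did not succeed. It reported:
xxxy2080_10 is not in the sudoers file.  This incident will be reported.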

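Add the Anaconda environment variable:

# Add the environment variable
vim ~/.bash_profile
# To write the text below: press i (insert mode), append it to the PATH line, press Esc (command mode), then type :wq (save and quit)
:/xxxy2080hppc/xxxy2080_10/anaconda3/bin
# Make the environment variable take effect
source ~/.bash_profile

# Be sure to run the following command after installing conda; otherwise VNC shows a black screen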
(base) [xxxy2080_10@xxxy2080 ~]$ conda config --set auto_activate_base false
(base) [xxxy2080_10@xxxy2080 ~]$ anaconda -V
anaconda Command line client (version 1.7.2)

At this point, Python 3.7.10, Anaconda 3, and PyCharm are all installed!

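Install tf_1
# You may need to add a package source first (the Tsinghua mirror); follow the tutorial for this

# Install TensorFlow 1.15
conda create -n tf_1 python=3.7.10 tensorflow-gpu=1.15.0

After installation, test whether the tf_1 environment works: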

(base) [xxxy2080_10@xxxy2080 ~]$ conda env list
# conda environments:
#
base                  *  /xxxy2080hppc/xxxy2080_10/anaconda3
tf_1                     /xxxy2080hppc/xxxy2080_10/anaconda3/envs/tf_1

(base) [xxxy2080_10@xxxy2080 ~]$ conda activate tf_1
(tf_1) [xxxy2080_10@xxxy2080 ~]$ python
Python 3.7.10 (default, Jun  4 2021, 14:48:32) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'1.15.0'
>>> tf.test.is_gpu_available()
2021-10-19 15:53:58.720085: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2021-10-19 15:53:58.749006: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2021-10-19 15:53:58.760656: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a5ff66e410 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-10-19 15:53:58.760754: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-10-19 15:53:58.763249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-10-19 15:53:59.777727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:18:00.0
2021-10-19 15:53:59.779281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:3b:00.0
2021-10-19 15:53:59.780800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 2 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:86:00.0
2021-10-19 15:53:59.782295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 3 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:af:00.0
2021-10-19 15:53:59.782762: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-10-19 15:53:59.784861: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-10-19 15:53:59.787020: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-10-19 15:53:59.787481: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-10-19 15:53:59.789899: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-10-19 15:53:59.791726: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-10-19 15:53:59.797119: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-10-19 15:53:59.807201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3
2021-10-19 15:53:59.807265: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-10-19 15:53:59.814790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-10-19 15:53:59.814839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1 2 3 
2021-10-19 15:53:59.814892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N N N N 
2021-10-19 15:53:59.814906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   N N N N 
2021-10-19 15:53:59.814938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 2:   N N N N 
2021-10-19 15:53:59.814971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 3:   N N N N 
2021-10-19 15:53:59.822682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 10312 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5)
2021-10-19 15:53:59.826405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:1 with 10312 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2021-10-19 15:53:59.829359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:2 with 10312 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5)
2021-10-19 15:53:59.832894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:3 with 10312 MB memory) -> physical GPU (device: 3, name: GeForce RTX 2080 Ti, pci bus id: 0000:af:00.0, compute capability: 7.5)
2021-10-19 15:53:59.836824: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a60163e770 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-10-19 15:53:59.836863: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-10-19 15:53:59.836878: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-10-19 15:53:59.836892: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-10-19 15:53:59.836906: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): GeForce RTX 2080 Ti, Compute Capability 7.5
True

At this point, tf 1.15.0 is installed successfully.

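File transfer

After creating a new session in XFTP, configure it as follows:

Connection name and remote host: ask me and I will tell you

Account and password in the last two fields: use your server user account and password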

View configuration information

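Before the server repair: operation log, login

Log in over SSH from cmd

Enter yes:

Enter the password:

(Optional) Check versions

Check the NVIDIA version:

Check the GPU status: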
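
The commands for these two checks did not survive in this log. A common way to get both the driver version and per-GPU status is `nvidia-smi`; the sketch below is an assumption about what was run, not a record of it. It falls back to a message when the tool is not installed.

```shell
# Report driver version and per-GPU memory use if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    gpu_report="$(nvidia-smi --query-gpu=index,name,driver_version,memory.used,memory.total --format=csv 2>&1 || true)"
else
    gpu_report="nvidia-smi not found on this machine"
fi
printf '%s\n' "$gpu_report"
```

Running plain `nvidia-smi` with no arguments prints the usual summary table, including running processes.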

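Copy the local conda environment to the lab GPU server:

References: "Migrating a conda environment to another machine"
"Anaconda tutorial + fixing problems encountered when copying an environment over directly" (see the end of that post)

Export the local environment

On the local machine, first activate the environment in use.

Then export the record of packages installed with conda.

Then export the record of packages installed with pip.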
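
The export commands themselves are missing from this log. The following is a sketch of one common way to do it, under assumptions: the environment name `tf_1` and the scratch output directory are placeholders, and `-n` is used instead of activating the environment first. It does nothing if conda is not installed.

```shell
env_name="tf_1"                 # assumption: name of the environment to export
export_dir="$(mktemp -d)"       # assumption: scratch directory for the exported files

if command -v conda >/dev/null 2>&1; then
    # Record the packages conda knows about.
    conda env export -n "$env_name" > "$export_dir/environment.yml" \
        || echo "conda export failed for $env_name" >&2
    # Record the packages as pip sees them inside that environment.
    conda run -n "$env_name" pip freeze > "$export_dir/requirements.txt" \
        || echo "pip freeze failed for $env_name" >&2
    export_status="attempted"
else
    export_status="skipped"
    echo "conda not found; nothing exported" >&2
fi
echo "export $export_status, files in $export_dir"
```

On the receiving machine, `conda env create -f environment.yml` would typically recreate the environment from the exported file, though builds pinned for one platform may not resolve on another.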

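Transfer the exported environment to the server

Deploy the environment on the server. (Optional) Check: a problem came up, the disk ran out of space.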
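
Since deployment failed for lack of disk space, checking free space beforehand may save a failed attempt. A minimal sketch, assuming environments are created under the home directory; the 5 GiB threshold is an arbitrary illustrative number, not a measured requirement.

```shell
target_dir="${HOME:-/}"              # assumption: environments live under the home directory
[ -d "$target_dir" ] || target_dir="/"
required_kb=$((5 * 1024 * 1024))     # assumption: 5 GiB of headroom, chosen for illustration

# df -P prints one POSIX-format record per filesystem; -k reports 1024-byte blocks.
avail_kb="$(df -Pk "$target_dir" | awk 'NR==2 {print $4}')"

if [ "$avail_kb" -ge "$required_kb" ]; then
    space_ok="yes"
else
    space_ok="no"
fi
echo "available: ${avail_kb} KB, enough for the threshold: $space_ok"
```

To see what is using the space, `du -sh` on individual directories (for example the anaconda3 folder) gives per-directory totals.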

By chance I found that PyTorch is usable as-is.

Original article: https://outofmemory.cn/zaji/4682632.html
