2021-07-12 Setting up a deep learning environment on a GeForce RTX 3090: CUDA 11.1 + TensorFlow 2.5.0 + Python 3.8.3

Posted: 2023-08-24 16:41:36 · Tags: tensorflow, deep learning, pytorch

The environment I successfully configured in this post has been exported to:
//download.csdn.net/download/Julse/20687132?spm=1001.2014.3001.5501

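Contents

Installation details
  Install tensorflow-gpu 2.5.0
  Install keras
  Install cuDNN
Problem 1: Testing whether TensorFlow is installed successfully
Problem 2: tensorflow vs. tensorflow-gpu
Problem 3: None of conda's channels has tensorflow-gpu=2.5.0, but pip does
Problem 4: If TensorFlow is the GPU version, does Keras also need a GPU version?
Problem 5: TensorFlow 2.5 and Keras 2.4.3 may be incompatible
Problem 6: cuDNN error
Installing other CUDA versions
Unresolved questions
Other problems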
Setting up the environment on a GeForce RTX 3090 ran into many problems. The versions that finally worked (tested myself) are:

tensorflow-gpu 2.5.0
cudnn 8.1.0.77
python 3.8.3
cuda 11.1

The version compatibility table I referred to (shown as a screenshot in the original post) is on this page:
//www.tensorflow.org/install/source
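To compare your own environment against the combination above, a small script helps. This is a minimal sketch: WORKING_COMBO and find_mismatches are hypothetical names, the versions are the ones reported in this post (not an official table), and you fill in the installed values yourself (for example from tf.__version__, nvcc --version and conda list).

```python
# Versions this post reports as working together on an RTX 3090 (not an official table).
WORKING_COMBO = {
    "tensorflow-gpu": "2.5.0",
    "cudnn": "8.1.0.77",
    "python": "3.8.3",
    "cuda": "11.1",
}


def find_mismatches(installed, expected=WORKING_COMBO):
    """Return (component, expected, installed) for every component that differs.

    `installed` maps the same component names to version strings; a component
    missing from `installed` is reported with installed=None.
    """
    problems = []
    for name, want in expected.items():
        have = installed.get(name)
        if have != want:
            problems.append((name, want, have))
    return problems


if __name__ == "__main__":
    mine = {"tensorflow-gpu": "2.5.0", "cudnn": "8.0.5", "python": "3.8.3", "cuda": "11.1"}
    for name, want, have in find_mismatches(mine):
        print(f"{name}: expected {want}, found {have}")
```

The comparison is deliberately strict string equality; with the sample values the script prints a single line, "cudnn: expected 8.1.0.77, found 8.0.5".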

Installation details

Install tensorflow-gpu 2.5.0

conda activate <env_name>
pip install tensorflow-gpu==2.5.0  # conda install tensorflow-gpu=2.5.0 cannot find this version
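Check whether the GPU is picked up. If /device:GPU:0 appears, you can move on to the next step.
>>> import tensorflow as tf
>>> tf.__version__
'2.5.0'
>>> tf.test.gpu_device_name()

The output contains:
'/device:GPU:0'
and the "skip gpu..." message no longer appears.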


2021-11-21 09:11:11.576578: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-11-21 09:11:11.586861: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-11-21 09:11:12.301775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: 
pciBusID: 0000:3b:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-11-21 09:11:12.302507: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties: 
pciBusID: 0000:5e:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-11-21 09:11:12.303282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 2 with properties: 
pciBusID: 0000:b1:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-11-21 09:11:12.303954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 3 with properties: 
pciBusID: 0000:d9:00.0 name: GeForce RTX 3090 computeCapability: 8.6
coreClock: 1.695GHz coreCount: 82 deviceMemorySize: 23.70GiB deviceMemoryBandwidth: 871.81GiB/s
2021-11-21 09:11:12.304010: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-11-21 09:11:12.322885: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-11-21 09:11:12.322999: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-11-21 09:11:12.337252: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-11-21 09:11:12.342694: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-11-21 09:11:12.348923: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-11-21 09:11:12.354146: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-11-21 09:11:12.356096: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-11-21 09:11:12.362607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0, 1, 2, 3
2021-11-21 09:11:12.363212: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-11-21 09:11:17.044150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-11-21 09:11:17.044213: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0 1 2 3 
2021-11-21 09:11:17.044240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N N N N 
2021-11-21 09:11:17.044245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 1:   N N N N 
2021-11-21 09:11:17.044249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 2:   N N N N 
2021-11-21 09:11:17.044254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 3:   N N N N 
2021-11-21 09:11:17.050196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:0 with 3793 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:3b:00.0, compute capability: 8.6)
2021-11-21 09:11:17.053391: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:1 with 3665 MB memory) -> physical GPU (device: 1, name: GeForce RTX 3090, pci bus id: 0000:5e:00.0, compute capability: 8.6)
2021-11-21 09:11:17.054353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:2 with 3663 MB memory) -> physical GPU (device: 2, name: GeForce RTX 3090, pci bus id: 0000:b1:00.0, compute capability: 8.6)
2021-11-21 09:11:17.055315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/device:GPU:3 with 3771 MB memory) -> physical GPU (device: 3, name: GeForce RTX 3090, pci bus id: 0000:d9:00.0, compute capability: 8.6)
'/device:GPU:0'
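The log is long, but the most useful lines are the "Created TensorFlow device" ones near the end: they show which GPUs TensorFlow actually created devices on and how much memory each device was given (here roughly 3.7 GB per card, although each card reports 23.70 GiB). The sketch below assumes the exact log format shown above; parse_created_devices is a hypothetical helper, not part of TensorFlow.

```python
import re

# Matches TensorFlow's "Created TensorFlow device" INFO line in the format shown above.
DEVICE_LINE = re.compile(
    r"Created TensorFlow device \((/device:GPU:\d+) with (\d+) MB memory\) -> "
    r"physical GPU \(device: (\d+), name: ([^,]+), pci bus id: ([0-9A-Fa-f:.]+)"
)


def parse_created_devices(log_text):
    """Return one dict per GPU that TensorFlow created a device for."""
    devices = []
    for m in DEVICE_LINE.finditer(log_text):
        devices.append({
            "tf_device": m.group(1),
            "memory_mb": int(m.group(2)),
            "index": int(m.group(3)),
            "name": m.group(4),
            "pci_bus_id": m.group(5),
        })
    return devices


if __name__ == "__main__":
    sample = (
        "... Created TensorFlow device (/device:GPU:0 with 3793 MB memory) -> "
        "physical GPU (device: 0, name: GeForce RTX 3090, pci bus id: 0000:3b:00.0, "
        "compute capability: 8.6)"
    )
    print(parse_created_devices(sample))
```

Feeding it the whole log above would return four entries, one per RTX 3090, which is a quick way to confirm every card was picked up.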

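Install keras

pip install keras
With the conda virtual environment activated, TensorFlow was installed with pip, so install Keras with pip as well; otherwise conda installs another copy of TensorFlow and the two conflict. If the code is changed to use tensorflow.keras instead of keras, the standalone keras package is no longer used and does not need to be installed.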
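To check whether an environment has ended up with more than one TensorFlow package (the conflict described above), you can list installed distributions from Python. A sketch under two assumptions: the names in TF_PACKAGES are the ones worth flagging, and conda-installed packages also leave Python metadata that importlib.metadata can see (check with conda list if in doubt). tensorflow_distributions is a hypothetical helper.

```python
from importlib import metadata

# Package names that each provide the same top-level "tensorflow" module.
TF_PACKAGES = {"tensorflow", "tensorflow-gpu", "tensorflow-cpu"}


def _normalize(name):
    # Distribution names are case-insensitive and treat "_" like "-".
    return name.lower().replace("_", "-")


def tensorflow_distributions(names=None):
    """Return the TensorFlow packages found, one entry per installed copy.

    By default the current environment is scanned; pass `names` to check a list.
    More than one entry means several TensorFlow builds compete for one import.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    return sorted(_normalize(n) for n in names if n and _normalize(n) in TF_PACKAGES)


if __name__ == "__main__":
    found = tensorflow_distributions()
    print(found)
    if len(found) > 1:
        print("More than one TensorFlow package is installed; remove the extra ones.")
```

Dependencies such as tensorflow-estimator are intentionally not flagged, since they are expected alongside a single TensorFlow install.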

Install cuDNN

conda install -c nvidia cudnn=8.1.0
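To confirm which cuDNN version is actually on disk, you can read the version macros from cuDNN's cudnn_version.h header. This is a sketch: the header location depends on how cuDNN was installed (a system-wide install often puts it under /usr/local/cuda/include, and a conda package may not ship it at all), so the path is an assumption and the hypothetical function only parses text you pass in.

```python
import re


def cudnn_version_from_header(text):
    """Extract (major, minor, patch) from the contents of cudnn_version.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Only plain "#define KEY <number>" lines count; derived macros are skipped.
        m = re.search(rf"#define\s+{key}\s+(\d+)", text)
        if m is None:
            raise ValueError(f"{key} not found in header")
        parts.append(int(m.group(1)))
    return tuple(parts)
```

For example, cudnn_version_from_header(open("/usr/local/cuda/include/cudnn_version.h").read()) with a path adjusted to your own install.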

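Problem 1: Testing whether TensorFlow is installed successfully

Some blog posts say this message can simply be ignored, but in my tests the GPU could not be used, which means the setup was not correct.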
import tensorflow as tf
tf.test.gpu_device_name()
Error message:
I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags

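To investigate, I first looked up what oneDNN is:
//01.org/onednn
It turned out that TensorFlow simply had not been installed correctly and had to be uninstalled and reinstalled. (The oneDNN line itself is an INFO-level "I" message; it also appears in the successful log above.)

The article below suggests ignoring it, but that only stops the message from being shown; the GPU still cannot be used:
//blog.csdn.net/qq_39096123/article/details/100575784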
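For reference, one common way to "ignore" the message is to raise TensorFlow's C++ log threshold with the TF_CPP_MIN_LOG_LEVEL environment variable (whether the linked article uses exactly this is an assumption). It only filters what is printed; it does not change whether TensorFlow can use the GPU.

```python
import os

# TF_CPP_MIN_LOG_LEVEL filters TensorFlow's C++ log messages:
#   "0" shows everything, "1" hides INFO (I) lines such as the oneDNN notice,
#   "2" also hides WARNING, "3" also hides ERROR.
# It has to be set before TensorFlow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

# import tensorflow as tf
# The oneDNN line is gone now, but tf.test.gpu_device_name() returns
# exactly what it returned before: hiding logs fixes nothing.
```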

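Problem 2: tensorflow vs. tensorflow-gpu

The official website mentions that the CPU and GPU packages were separate only in early releases, so I assumed that installing tensorflow 2.5 directly would be enough. In practice, with a TensorFlow build compiled for CPU only, the GPU setup could not succeed.

Following //www.jianshu.com/p/e772b880b4d2, check whether TensorFlow can use the GPU:
tf.config.list_physical_devices('GPU')
An empty list means no GPU was found.
tf.test.is_built_with_cuda()

If the installed TensorFlow was not built with CUDA, it cannot use the GPU; install tensorflow-gpu instead.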
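The two checks above can be combined into a short diagnostic. This is a sketch: diagnose and check_tensorflow_gpu are hypothetical helpers, and the advice strings simply restate this post's conclusions. diagnose takes the results of tf.test.is_built_with_cuda() and tf.config.list_physical_devices('GPU') as arguments, so it can be tried without a GPU.

```python
import importlib.util


def diagnose(built_with_cuda, gpus):
    """Map the two checks to a next step.

    built_with_cuda: result of tf.test.is_built_with_cuda()
    gpus: result of tf.config.list_physical_devices('GPU')
    """
    if not built_with_cuda:
        return "CPU-only TensorFlow build: install tensorflow-gpu"
    if not gpus:
        return "CUDA build but no GPU found: check the driver, CUDA and cuDNN versions"
    return f"OK: {len(gpus)} GPU(s) visible"


def check_tensorflow_gpu():
    """Run both checks on the installed TensorFlow; None if TensorFlow is missing."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    return diagnose(tf.test.is_built_with_cuda(), tf.config.list_physical_devices("GPU"))


if __name__ == "__main__":
    print(check_tensorflow_gpu())
```

On the four-GPU machine in this post, a working setup would be expected to report "OK: 4 GPU(s) visible".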

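Problem 3: None of conda's channels has tensorflow-gpu=2.5.0, but pip does

Versions at the time of writing:
conda 4.9.2
pip 21.1.3 from /home/username/miniconda3/envs/envnames/lib/python3.8/site-packages/pip (python 3.8)

When installing with pip, the matching CUDA release is not installed automatically. Install the TensorFlow wheel that matches your Python version: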

pip install //storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-2.4.0-cp38-cp38-manylinux2010_x86_64.whl
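The cp38-cp38 part of the wheel name must match the Python interpreter (Python 3.8 here). Below is a hypothetical helper that builds the URL for a given Python version, assuming the URL pattern of the 2.4.0 link above; other TensorFlow releases may use a different manylinux tag, so check that the file actually exists before relying on it.

```python
import sys

# Pattern taken from the tensorflow_gpu 2.4.0 link above; an assumption for other versions.
BASE_URL = "https://storage.googleapis.com/tensorflow/linux/gpu"


def tf_gpu_wheel_url(tf_version, py=None):
    """Build a tensorflow_gpu wheel URL whose tags match a Python (major, minor)."""
    major, minor = py if py is not None else sys.version_info[:2]
    tag = f"cp{major}{minor}"
    # Python 3.7 and older use an extra "m" in the ABI tag (cp37-cp37m); 3.8+ does not.
    abi = tag + ("m" if (major, minor) < (3, 8) else "")
    return f"{BASE_URL}/tensorflow_gpu-{tf_version}-{tag}-{abi}-manylinux2010_x86_64.whl"


if __name__ == "__main__":
    # Prints the pip install command for the running interpreter.
    print("pip install " + tf_gpu_wheel_url("2.4.0"))
```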

After installation you will see: tensorflow-gpu 2.4.0

The GPU still could not be used.

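Problem 4: If TensorFlow is the GPU version, does Keras also need a GPU version?

keras-gpu
I tried installing keras-gpu with conda.

After installation, conda automatically updated TensorFlow; that is, installing keras-gpu directly also pulls in the matching tensorflow-gpu.
However, in the Python console TensorFlow no longer worked, probably because pip had installed one TensorFlow and conda installed another.

In addition, the keras-gpu installed this way cannot be imported with import keras, so it does not work with my current program; I abandoned this approach.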

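Problem 5: TensorFlow 2.5 and Keras 2.4.3 may be incompatible

Running the code raised an error, and the error came from keras.

The relationship between keras and tf.keras

Workaround: replace every keras import in the code with tensorflow.keras.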
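If many files need this change, a regex rewrite can handle the common cases. A minimal sketch, not a full refactoring tool: to_tf_keras is a hypothetical helper that only handles bare "import keras" lines and "from keras... import ..." lines; forms such as "import keras as k" or "import keras.backend as K" are left untouched.

```python
import re

# "from keras.x import y" / "from keras import y"  ->  "from tensorflow.keras..."
_FROM_KERAS = re.compile(r"^([ \t]*)from keras(\.|[ \t])", re.MULTILINE)
# A bare "import keras" line  ->  "from tensorflow import keras"
_IMPORT_KERAS = re.compile(r"^([ \t]*)import keras[ \t]*$", re.MULTILINE)


def to_tf_keras(source):
    """Point standalone-Keras imports at tf.keras; other lines are left as they are."""
    source = _FROM_KERAS.sub(r"\1from tensorflow.keras\2", source)
    return _IMPORT_KERAS.sub(r"\1from tensorflow import keras", source)
```

Because "from tensorflow import keras" still binds the name keras, code such as keras.layers.Dense keeps working after the rewrite, and running the function twice changes nothing further.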

Problem 6: cuDNN error

Failed to get convolution algorithm. 
This is probably because cuDNN failed to initialize, 
so try looking to see if a warning log message was printed above.

Short summary:
Loaded runtime CuDNN library: 8.0.5 
but source was compiled with: 8.1.0.  

CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
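The rule in the message can be written down directly, which makes it easy to see why 8.0.5 fails against 8.1.0 while 8.1.0.77 works. A sketch: cudnn_compatible is a hypothetical helper that compares only the major and minor numbers, as the message describes.

```python
def cudnn_compatible(runtime, compiled):
    """Apply the rule quoted in the error message.

    The runtime cuDNN must have the same major version as the one TensorFlow was
    compiled with, and an equal or higher minor version.
    """
    r_major, r_minor = (int(x) for x in runtime.split(".")[:2])
    c_major, c_minor = (int(x) for x in compiled.split(".")[:2])
    return r_major == c_major and r_minor >= c_minor
```

cudnn_compatible("8.0.5", "8.1.0") returns False, matching the error above, while cudnn_compatible("8.1.0.77", "8.1.0") returns True.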

Installing cuDNN 8.1.0 fixes this, but pip cannot install it:
pip install cudnn
ERROR: Could not find a version that satisfies the requirement cudnn (from versions: none)
ERROR: No matching distribution found for cudnn

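conda does find cuDNN packages, but the version it picks by default does not meet the requirement.

In the end, the correct cuDNN version had to be installed from the nvidia channel with the following command:
//anaconda.org/nvidia/cudnn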

conda install -c nvidia cudnn

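Installing other CUDA versions

Configuring multiple CUDA and cuDNN versions on a server (different Linux accounts using different CUDA and cuDNN versions):
//www.cnblogs.com/sddai/p/10278005.html

Download link:
//developer.nvidia.com/cuda-toolkit-archive

Download CUDA from the official site, extract it, and configure the environment variables: append the following lines to the end of .bashrc in your home directory, then run source ~/.bashrc.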

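# Put the per-user CUDA first so it takes precedence over any system-wide CUDA
export LD_LIBRARY_PATH=/home/user/cuda/lib64:$LD_LIBRARY_PATH
export PATH=/home/user/cuda/bin:$PATH
# CUDA_HOME is a single directory, not a colon-separated list
export CUDA_HOME=/home/user/cuda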
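Appending to PATH and LD_LIBRARY_PATH only works if no other CUDA appears earlier on those paths; putting the per-user directories first makes them take precedence. The sketch below shows that ordering in Python, for example to launch a subprocess against a specific CUDA; cuda_env is a hypothetical helper and /home/user/cuda is the example path from above.

```python
import os


def cuda_env(cuda_root, environ=None):
    """Return PATH, LD_LIBRARY_PATH and CUDA_HOME values for a per-user CUDA install.

    The CUDA directories are placed first so they win over any CUDA that is
    already on the search paths (for example a system-wide /usr/local/cuda).
    """
    environ = dict(os.environ if environ is None else environ)

    def prepend(var, path):
        old = environ.get(var, "")
        return path + (os.pathsep + old if old else "")

    return {
        "PATH": prepend("PATH", os.path.join(cuda_root, "bin")),
        "LD_LIBRARY_PATH": prepend("LD_LIBRARY_PATH", os.path.join(cuda_root, "lib64")),
        # CUDA_HOME is a single directory, not a search path.
        "CUDA_HOME": cuda_root,
    }
```

For example, subprocess.run(["nvcc", "--version"], env={**os.environ, **cuda_env("/home/user/cuda")}) would run the nvcc from that install, if it exists.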

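Unresolved questions

What is the difference between an all-N matrix and one with some Y entries (the Device interconnect matrix in the log above), and does it affect model training?

My earlier understanding was that Y means the two GPUs can communicate with each other directly. Currently every entry is N, yet a single program can still use multiple GPUs, so Y vs. N has made no visible difference so far.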
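To compare this matrix across machines or runs, it can be extracted from the log. The Y/N cells are usually read as whether TensorFlow could enable peer-to-peer (direct GPU-to-GPU) access between the two devices; treat that reading as an assumption. parse_edge_matrix is a hypothetical helper that assumes the log format shown above.

```python
import re

# One matrix row, e.g. "...gpu_device.cc:1277] 0:   N N N N "
ROW = re.compile(r"\]\s+(\d+):\s+([YN](?:\s+[YN])*)\s*$")


def parse_edge_matrix(log_text):
    """Return {(i, j): True/False} for every Y/N cell in the interconnect matrix."""
    edges = {}
    for line in log_text.splitlines():
        m = ROW.search(line)
        if m is None:
            continue  # header row ("0 1 2 3") and all other log lines
        i = int(m.group(1))
        for j, cell in enumerate(m.group(2).split()):
            edges[(i, j)] = cell == "Y"
    return edges
```

For the log in this post, any(parse_edge_matrix(log).values()) is False, since every cell is N.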

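Other problems

When installing tensorflow-gpu==2.4, the package could not be found. I located it on the site below, opened the file details and copied its source_url:
//anaconda.org/anaconda/tensorflow-gpu/files

conda install <source_url>

This only appears to install tensorflow-gpu: the installation reports success, but after reinstalling, import raised an error.

Downloading the file showed that it contains only some basic metadata and no actual content.

This shortcut does not work.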