
cuDNN v3 Sample Demo

Date: 2015-08-06 11:11:35

Tags: cudnn   cuda   cnn   nvidia   mnist

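First, download the cuDNN v3 library package and the sample package from the NVIDIA website, as marked by the red box in the figure below:

[Figure: NVIDIA download page, with the cuDNN v3 library and sample packages boxed in red]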

cuDNN v3 requires CUDA 7.0 or later to be installed first; that step is not covered here.

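Copy the two downloaded tgz packages to some path on the target machine that already has CUDA 7.0 installed (here, /home/yongke.zyk/local_install) and extract them. This yields three subdirectories: include/, lib64/ and samples/. include/ holds the header cudnn.h, which must be included at compile time. lib64/ holds the static library libcudnn_static.a, the shared library libcudnn.so.7.0.58, and its symbolic links libcudnn.so and libcudnn.so.7.0; these are added at link time. samples/ holds mnistCUDNN, a small sample program that uses cuDNN to perform simple digit recognition.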


Change lines 41-46 of mnistCUDNN/Makefile to:

else
CUDA_PATH = /usr/local/cuda
CUDA_LIB_PATH = $(CUDA_PATH)/$(CUDA_LIBSUBDIR)
CUDNN_LIB_PATH = /home/yongke.zyk/local_install/lib64
CUDNN_INCLUDE_PATH = /home/yongke.zyk/local_install/include
endif

Run make to compile and link the mnistCUDNN executable. Then run it; the output is as follows:

$ ./mnistCUDNN
cudnnGetVersion() : 3002 , CUDNN_VERSION from cudnn.h : 3002 (3.0.02)
Host compiler version : GCC 4.4.6
There are 2 CUDA capable devices on your machine :
device 0 : sms 24  Capabilities 5.2, SmClock 1076.0 Mhz, MemSize (Mb) 12287, MemClock 3505.0 Mhz, Ecc=0, boardGroupID=0
device 1 : sms 24  Capabilities 5.2, SmClock 1076.0 Mhz, MemSize (Mb) 12287, MemClock 3505.0 Mhz, Ecc=0, boardGroupID=1
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.045056 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.055328 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.060416 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.142336 time requiring 207360 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999819 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.036896 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.039936 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.078880 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.131104 time requiring 207360 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000709 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!
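
In the output above, each "Resulting weights from Softmax" line lists ten scores, one per digit 0-9, and "Result of classification: 1 3 5" is simply the index of the largest score for each image. A minimal sketch of that last step follows; predictedDigit is a hypothetical helper written for illustration and is not taken from the mnistCUDNN source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper (not part of the mnistCUDNN sample): returns the index
// of the largest softmax score, i.e. the predicted digit for a 10-class output.
std::size_t predictedDigit(const std::vector<float>& scores) {
    assert(!scores.empty());  // an empty score vector has no maximum
    return static_cast<std::size_t>(
        std::distance(scores.begin(),
                      std::max_element(scores.begin(), scores.end())));
}
```

Feeding it the single-precision scores printed for data/one_28x28.pgm (largest entry 0.9999399 at position 1) yields 1, matching the first digit of the reported classification.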

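The cudnnFindConvolutionForwardAlgorithm section of the output lists a status, a time and a memory requirement for each algorithm. In the single-precision run, Algo 0 has the smallest measured time (0.045056) even though cudnnGetConvolutionForwardAlgorithm reported Algo 1, presumably because the latter picks an algorithm without timing it. The sketch below shows one way such a result list could be reduced to a single choice under a workspace limit. AlgoResult and fastestAlgo are hypothetical names for illustration, not cuDNN types or functions, and the fields mirror only what the log prints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record mirroring one "^^^^" line of the log above.
struct AlgoResult {
    int algo;            // algorithm number as printed in the log
    bool succeeded;      // true for CUDNN_STATUS_SUCCESS lines
    float time;          // measured time as printed (unit not stated in the log)
    std::size_t memory;  // required memory as printed
};

// Returns the number of the fastest successful algorithm whose memory
// requirement fits within workspaceLimit, or -1 if none qualifies.
int fastestAlgo(const std::vector<AlgoResult>& results,
                std::size_t workspaceLimit) {
    int best = -1;
    float bestTime = 0.0f;
    for (std::size_t i = 0; i < results.size(); ++i) {
        const AlgoResult& r = results[i];
        if (!r.succeeded || r.memory > workspaceLimit) {
            continue;  // skip unsupported or too memory-hungry algorithms
        }
        if (best < 0 || r.time < bestTime) {
            best = r.algo;
            bestTime = r.time;
        }
    }
    return best;
}
```

With the five single-precision entries from the log, this picks Algo 0 for any workspace limit, since Algo 0 is both the fastest and needs no extra memory.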

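The first line of the program output prints the version as the integer 3002 next to the readable form 3.0.02. A minimal sketch of that decoding is shown here, assuming the integer is encoded as major * 1000 + minor * 100 + patch level, which is consistent with the printed pair. CudnnVersion and decodeCudnnVersion are hypothetical names for illustration, not part of cuDNN.

```cpp
#include <cassert>

// Hypothetical holder for the three parts of a cuDNN version number.
struct CudnnVersion {
    int majorPart;
    int minorPart;
    int patchLevel;
};

// Splits a version integer, assumed to be encoded as
// major * 1000 + minor * 100 + patch level, into its parts.
CudnnVersion decodeCudnnVersion(int version) {
    assert(version >= 0);  // negative values are not valid versions
    CudnnVersion v;
    v.majorPart = version / 1000;
    v.minorPart = (version % 1000) / 100;
    v.patchLevel = version % 100;
    return v;
}
```

For the value 3002 from the log this gives major 3, minor 0 and patch level 2. Comparing the decoded runtime value with the one from the header is a quick way to spot a mismatch between the cudnn.h used at compile time and the libcudnn.so loaded at run time.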

Copyright notice: This is an original article by the author and may not be reproduced without the author's permission.


Original article: http://blog.csdn.net/kkk584520/article/details/47312261
