Tags: cudnn cuda cnn nvidia mnist
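First, download the cuDNN v3 installation package and the sample code from the NVIDIA website (marked with a red box in the original post's screenshot).
cuDNN v3 requires CUDA 7.0 or later to be installed first; that step is not covered here.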
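Copy the two downloaded tgz packages to a directory on the target machine where CUDA 7.0 is already installed (I use /home/yongke.zyk/local_install) and extract them. This yields three subdirectories: include/, lib64/ and samples/. include/ holds the cudnn.h header, which must be included at compile time. lib64/ holds the cuDNN static library libcudnn_static.a, the shared library libcudnn.so.7.0.58 and its symbolic links libcudnn.so and libcudnn.so.7.0, which are added when linking the program. samples/ holds mnistCUDNN, a simple example that uses cuDNN to perform basic digit recognition.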
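Before touching the Makefile, it can help to confirm that the extraction produced the layout described above. The following is only a minimal sketch: the file names are the ones listed in this post, while the helper name missing_cudnn_files is hypothetical, and I assume samples/mnistCUDNN is a directory directly under the install root.

```python
import os

# Entries named in the post; adjust if your package version differs.
EXPECTED = [
    os.path.join("include", "cudnn.h"),
    os.path.join("lib64", "libcudnn.so"),
    os.path.join("lib64", "libcudnn_static.a"),
    os.path.join("samples", "mnistCUDNN"),  # assumed location of the sample
]


def missing_cudnn_files(root):
    """Return the expected entries that do not exist under root (hypothetical helper)."""
    return [rel for rel in EXPECTED
            if not os.path.exists(os.path.join(root, rel))]
```

Calling missing_cudnn_files("/home/yongke.zyk/local_install") should return an empty list when everything was unpacked as described; any entry it returns points at a file that is absent or a symbolic link whose target is missing.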
Change lines 41-46 of mnistCUDNN/Makefile to:
    else
        CUDA_PATH = /usr/local/cuda
        CUDA_LIB_PATH = $(CUDA_PATH)/$(CUDA_LIBSUBDIR)
        CUDNN_LIB_PATH = /home/yongke.zyk/local_install/lib64
        CUDNN_INCLUDE_PATH = /home/yongke.zyk/local_install/include
    endif
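The two cuDNN variables matter because cuDNN was unpacked outside the CUDA install tree. As a rough illustration of what they amount to, the sketch below builds the usual -I, -L and -l flags from the two paths. The function name cudnn_build_flags is hypothetical, and how the sample's Makefile actually consumes these variables is an assumption on my part.

```python
def cudnn_build_flags(include_path, lib_path):
    """Return (compile_flags, link_flags) for a cuDNN install in a custom location."""
    compile_flags = ["-I" + include_path]       # lets the compiler find cudnn.h
    link_flags = ["-L" + lib_path, "-lcudnn"]   # lets the linker find libcudnn
    return compile_flags, link_flags


compile_flags, link_flags = cudnn_build_flags(
    "/home/yongke.zyk/local_install/include",
    "/home/yongke.zyk/local_install/lib64",
)
print(" ".join(compile_flags))
print(" ".join(link_flags))
```

If the program is linked against the shared library rather than libcudnn_static.a, the dynamic loader also has to locate libcudnn.so at run time, for example by adding the lib64 directory to LD_LIBRARY_PATH.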
Build the sample with make, then run it:

    $ ./mnistCUDNN
    cudnnGetVersion() : 3002 , CUDNN_VERSION from cudnn.h : 3002 (3.0.02)
    Host compiler version : GCC 4.4.6
    There are 2 CUDA capable devices on your machine :
    device 0 : sms 24 Capabilities 5.2, SmClock 1076.0 Mhz, MemSize (Mb) 12287, MemClock 3505.0 Mhz, Ecc=0, boardGroupID=0
    device 1 : sms 24 Capabilities 5.2, SmClock 1076.0 Mhz, MemSize (Mb) 12287, MemClock 3505.0 Mhz, Ecc=0, boardGroupID=1
    Using device 0

    Testing single precision
    Loading image data/one_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm ...
    Fastest algorithm is Algo 1
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.045056 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.055328 time requiring 3464 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.060416 time requiring 57600 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.142336 time requiring 207360 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
    Loading image data/three_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
    Loading image data/five_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999819 0.0000154 0.0000000 0.0000012 0.0000006

    Result of classification: 1 3 5

    Test passed!

    Testing half precision (math in single precision)
    Loading image data/one_28x28.pgm
    Performing forward propagation ...
    Testing cudnnGetConvolutionForwardAlgorithm ...
    Fastest algorithm is Algo 1
    Testing cudnnFindConvolutionForwardAlgorithm ...
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.036896 time requiring 0 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.039936 time requiring 3464 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.078880 time requiring 28800 memory
    ^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.131104 time requiring 207360 memory
    ^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
    Resulting weights from Softmax:
    0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
    Loading image data/three_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000709 0.0000000 0.0000000 0.0000000 0.0000000
    Loading image data/five_28x28.pgm
    Performing forward propagation ...
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

    Result of classification: 1 3 5

    Test passed!
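A few numbers in this log are easier to read with a little arithmetic. The sketch below uses only values copied from the single-precision part of the output. It assumes that CUDNN_VERSION is encoded as major * 1000 + minor * 100 + patch, which is consistent with 3002 being printed as 3.0.02, and that the classification result is simply the index of the largest softmax weight. The helper names are my own.

```python
def decode_cudnn_version(version):
    """Split CUDNN_VERSION into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def argmax(values):
    """Index of the largest element in a list."""
    return max(range(len(values)), key=lambda i: values[i])


# Softmax rows for one_28x28, three_28x28 and five_28x28 (single precision).
softmax_rows = [
    [0.0000000, 0.9999399, 0.0000000, 0.0000000, 0.0000561,
     0.0000000, 0.0000012, 0.0000017, 0.0000010, 0.0000000],
    [0.0000000, 0.0000000, 0.0000000, 0.9999288, 0.0000000,
     0.0000711, 0.0000000, 0.0000000, 0.0000000, 0.0000000],
    [0.0000000, 0.0000008, 0.0000000, 0.0000002, 0.0000000,
     0.9999819, 0.0000154, 0.0000000, 0.0000012, 0.0000006],
]

# Times reported by cudnnFindConvolutionForwardAlgorithm for the supported algorithms.
measured_times = {0: 0.045056, 1: 0.055328, 2: 0.060416, 4: 0.142336}

print(decode_cudnn_version(3002))                  # (3, 0, 2)
print([argmax(row) for row in softmax_rows])       # [1, 3, 5]
print(min(measured_times, key=measured_times.get)) # 0
```

Note the last line: in this run cudnnGetConvolutionForwardAlgorithm reported Algo 1 as fastest, while the times measured by cudnnFindConvolutionForwardAlgorithm put Algo 0 first, so the two calls do not necessarily agree.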
Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.
Original article: http://blog.csdn.net/kkk584520/article/details/47312261