Tags: time mat data block let fine gen std enum
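● A simple program from the book that computes pi. Its main point is calling a user-defined function from inside an OpenACC parallel loop.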
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <openacc.h>

    #define N 100

    #pragma acc routine seq
    float ff(const float x)
    {
        return 4.0f / (1.0f + x * x);
    }

    int main()
    {
        const float h = 1.0f / N;
        float sumf = 0, result;

    #pragma acc parallel loop reduction(+:sumf)
        for (int i = 0; i < N; i++)
            sumf += ff(h * (i - 0.5f)); // with i starting at 0, the first sample is at x = -h/2

        result = h * sumf;
        printf("\nN = %d, myPi = %f, diff = %e\n", N, result, result / 3.141592653589793238 - 1);
        //getchar();
        return 0;
    }
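A note on accuracy: the loop above runs i from 0 to N - 1 but samples at h * (i - 0.5), so it evaluates the integrand at x = -h/2 and never at the midpoint of the last subinterval. That swaps a sample of roughly 4.0 for one of roughly 2.01, which adds about 0.02 to the estimate and is consistent with the reported myPi = 3.161500. The following is a minimal sketch of the midpoint rule with the sample point moved to h * (i + 0.5). The helper name midpoint_pi is hypothetical and does not appear in the original, and the _OPENACC guards are an assumption added so the same file also builds with a compiler that has no OpenACC support.

```c
#include <stdio.h>

/* Integrand: the integral of 4 / (1 + x^2) over [0, 1] equals pi. */
#ifdef _OPENACC
#pragma acc routine seq
#endif
float ff(const float x)
{
    return 4.0f / (1.0f + x * x);
}

/* Hypothetical helper (not in the original post):
   midpoint-rule estimate of pi using n subintervals of [0, 1]. */
float midpoint_pi(const int n)
{
    const float h = 1.0f / n;
    float sumf = 0.0f;

#ifdef _OPENACC
#pragma acc parallel loop reduction(+:sumf)
#endif
    for (int i = 0; i < n; i++)
        sumf += ff(h * (i + 0.5f));   /* midpoint of [i*h, (i+1)*h] */

    return h * sumf;
}
```

Calling midpoint_pi(100) from main and printing the result should give a value much closer to pi than the original loop does. For this integrand the midpoint rule's own error at N = 100 is on the order of 1e-5, with single-precision rounding on top of that.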
● Output
    D:\Code\OpenACC\OpenACCProject\OpenACCProject>pgcc main.c -acc -Minfo -o main_acc.exe
    ff:
         10, Generating acc routine seq
             Generating Tesla code
         11, FMA (fused multiply-add) instruction(s) generated
    main:
         19, Accelerator kernel generated
             Generating Tesla code
             20, #pragma acc loop gang, vector(100) /* blockIdx.x threadIdx.x */
                 Generating reduction(+:sumf)
         19, Generating implicit copy(sumf)

    D:\Code\OpenACC\OpenACCProject\OpenACCProject>main_acc.exe
    launch CUDA kernel file=D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c function=main line=19 device=0 threadid=1 num_gangs=1 num_workers=1 vector_length=100 grid=1 block=100 shared memory=1024
    launch CUDA kernel file=D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c function=main line=19 device=0 threadid=1 num_gangs=1 num_workers=1 vector_length=256 grid=1 block=256 shared memory=1024

    N = 100, myPi = 3.161500, diff = 6.336546e-03

    PGI: "acc_shutdown" not detected, performance results might be incomplete.
     Please add the call "acc_shutdown(acc_device_nvidia)" to the end of your application to ensure that the performance results are complete.

    Accelerator Kernel Timing data
    D:\Code\OpenACC\OpenACCProject\OpenACCProject\main.c
      main  NVIDIA  devicenum=0
        time(us): 11
        19: compute region reached 1 time
            19: kernel launched 1 time
                grid: [1]  block: [100]
                elapsed time(us): total=1000 max=1000 min=1000 avg=1000
            19: reduction kernel launched 1 time
                grid: [1]  block: [256]
                 device time(us): total=0 max=0 min=0 avg=0
        19: data region reached 2 times
            19: data copyin transfers: 1
                 device time(us): total=4 max=4 min=4 avg=4
            23: data copyout transfers: 1
                 device time(us): total=7 max=7 min=7 avg=7
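The PGI message in the output asks for a call to acc_shutdown(acc_device_nvidia) at the end of the program so that the timing data is complete. A minimal sketch of how that might be wired in is shown below. The wrapper name shutdown_accelerator is hypothetical and not part of the original post, and the _OPENACC guard is an assumption so the file still builds without OpenACC support. The call itself is the one named in the PGI message.

```c
#include <stdio.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical wrapper (not in the original post): shuts the NVIDIA
   device down when built with OpenACC and does nothing otherwise.
   Returns 1 if acc_shutdown was called, 0 if not. */
int shutdown_accelerator(void)
{
#ifdef _OPENACC
    acc_shutdown(acc_device_nvidia);   /* the call suggested by the PGI runtime message */
    return 1;
#else
    return 0;
#endif
}
```

In the program above, this would be called just before return 0 in main, after the result has been printed.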
Original source: https://www.cnblogs.com/cuancuancuanhao/p/9419429.html