CUDA Programming in Practice

Posted: 2015-06-07 17:32:23

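Last night I worked until midnight and finally took my first step from C++ to CUDA. After testing, I am fairly confident the port really works. I am thrilled, but nobody around me understands any of this, so there is nowhere to celebrate...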

Enough chatter. Here is the basic process. (Chinese-language material on this topic is really scarce.)

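Source code: a fancy piece of software that works on gene sequences, open-sourced by developers abroad. As far as I remember my advisor's explanation, the project converts between gene data formats. Knowing that does not help much, though. My advisor's point was that we do not need to understand what the program is for; we only need to get the syntax and the functions right and move the CPU computation onto the GPU. Our goal: make the code run three times faster!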

Target code: the successfully ported CUDA code.

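The source is a C++ program built on Boost, and the code is fairly hard to read. According to my advisor, the program creates a large number of instance objects and pushes them onto a stack, then takes them out a batch at a time for processing, with the batch size set by the number of CPUs the user specifies. If all of those instances could be handed to the GPU instead, the gain in throughput is easy to imagine. Just thinking about it is exciting.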
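To make the batch-at-a-time scheme described above concrete, here is a minimal host-side sketch in plain C++. Everything in it is an assumption for illustration: `Task`, `process`, and `run_in_batches` are hypothetical names, the squaring is a placeholder for the real per-instance work, and the batch is processed sequentially where the real program would presumably hand each instance to a CPU worker.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one instance object of the original program.
struct Task {
    int input;
    int result;
};

// Placeholder for the real per-instance computation (assumption: squaring).
inline void process(Task &t) { t.result = t.input * t.input; }

// Treats `stack` as a LIFO: pops up to `batch_size` tasks at a time,
// processes them, and appends them to `done`.
// Returns the number of batches that were needed.
std::size_t run_in_batches(std::vector<Task> &stack,
                           std::size_t batch_size,
                           std::vector<Task> &done)
{
    if (batch_size == 0) return 0;  // nothing sensible to do

    std::size_t batches = 0;
    while (!stack.empty()) {
        std::size_t n = std::min(batch_size, stack.size());
        for (std::size_t i = 0; i < n; ++i) {
            Task t = stack.back();  // take the top of the stack
            stack.pop_back();
            process(t);             // in the real program: one CPU worker per task
            done.push_back(t);
        }
        ++batches;
    }
    return batches;
}
```

With five tasks and a batch size of two, the loop runs three batches. The idea behind the port is that the inner loop is the part a single kernel launch could replace, with the batch size no longer tied to the CPU count.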

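However, endless searching on Baidu turned up no Chinese-language examples. The whole point here is to put objects onto the GPU, and there was simply no example of that. In the end, with my advisor's help, we found a sample which confirms that CUDA does support object instances. Below is the official NVIDIA sample my advisor gave us: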

#include <thrust/random.h>
#include <thrust/device_vector.h>
#include <helper_cuda.h>

////////////////////////////////////////////////////////////////////////////////
// A structure of 2D points (structure of arrays).
////////////////////////////////////////////////////////////////////////////////
class Points
{
        float *m_x;
        float *m_y;

    public:
        // Constructor.
        __host__ __device__ Points() : m_x(NULL), m_y(NULL) {}

        // Constructor.
        __host__ __device__ Points(float *x, float *y) : m_x(x), m_y(y) {}

        // Get a point.
        __host__ __device__ __forceinline__ float2 get_point(int idx) const
        {
            return make_float2(m_x[idx], m_y[idx]);
        }

        // Set a point.
        __host__ __device__ __forceinline__ void set_point(int idx, const float2 &p)
        {
            m_x[idx] = p.x;
            m_y[idx] = p.y;
        }

        // Set the pointers.
        __host__ __device__ __forceinline__ void set(float *x, float *y)
        {
            m_x = x;
            m_y = y;
        }
};

////////////////////////////////////////////////////////////////////////////////
// A 2D bounding box
////////////////////////////////////////////////////////////////////////////////
class Bounding_box
{
        // Extreme points of the bounding box.
        float2 m_p_min;
        float2 m_p_max;

    public:
        // Constructor. Create a unit box.
        __host__ __device__ Bounding_box()
        {
            m_p_min = make_float2(0.0f, 0.0f);
            m_p_max = make_float2(1.0f, 1.0f);
        }

        // Compute the center of the bounding-box.
        __host__ __device__ void compute_center(float2 &center) const
        {
            center.x = 0.5f * (m_p_min.x + m_p_max.x);
            center.y = 0.5f * (m_p_min.y + m_p_max.y);
        }

        // The points of the box.
        __host__ __device__ __forceinline__ const float2 &get_max() const
        {
            return m_p_max;
        }

        __host__ __device__ __forceinline__ const float2 &get_min() const
        {
            return m_p_min;
        }

        // Does a box contain a point.
        __host__ __device__ bool contains(const float2 &p) const
        {
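[... the listing is truncated here in the original post; it resumes near the end of the sample's cdpQuadtree() function ...]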
    host_points.set(thrust::raw_pointer_cast(&x_h[0]), thrust::raw_pointer_cast(&y_h[0]));

    // Copy nodes to CPU.
    Quadtree_node *host_nodes = new Quadtree_node[max_nodes];
    checkCudaErrors(cudaMemcpy(host_nodes, nodes, max_nodes *sizeof(Quadtree_node), cudaMemcpyDeviceToHost));

    // Validate the results.
    bool ok = check_quadtree(host_nodes, 0, num_points, &host_points, params);
    std::cout << "Results: " << (ok ? "OK" : "FAILED") << std::endl;

    // Free CPU memory.
    delete[] host_nodes;

    // Free memory.
    checkCudaErrors(cudaFree(nodes));
    checkCudaErrors(cudaFree(points));

    return ok;
}

////////////////////////////////////////////////////////////////////////////////
// Main entry point.
////////////////////////////////////////////////////////////////////////////////
int main(int argc, char **argv)
{
    // Find/set the device.
    // The test requires an architecture SM35 or greater (CDP capable).
    int cuda_device = findCudaDevice(argc, (const char **)argv);
    cudaDeviceProp deviceProps;
    checkCudaErrors(cudaGetDeviceProperties(&deviceProps, cuda_device));
    int cdpCapable = (deviceProps.major == 3 && deviceProps.minor >= 5) || deviceProps.major >= 4;

    printf("GPU device %s has compute capabilities (SM %d.%d)\n", deviceProps.name, deviceProps.major, deviceProps.minor);

    if (!cdpCapable)
    {
        std::cerr << "cdpQuadTree requires SM 3.5 or higher to use CUDA Dynamic Parallelism. Exiting...\n" << std::endl;
        exit(EXIT_WAIVED);
    }

    bool ok = cdpQuadtree(deviceProps.warpSize);

    // cudaDeviceReset causes the driver to clean up all state. While
    // not mandatory in normal operation, it is good practice. It is also
    // needed to ensure correct operation when the application is being
    // profiled. Calling cudaDeviceReset causes all profile data to be
    // flushed before the application exits
    cudaDeviceReset();

    return (ok ? EXIT_SUCCESS : EXIT_FAILURE);
}

By imitating the sample above, we completed the very first step. Thanks to my advisor, and thanks to NVIDIA!
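The pattern borrowed from the sample can be boiled down to a few lines: a small class that only holds raw pointers, with its members marked `__host__ __device__`, so the object can be passed by value into a kernel. The sketch below is not code from our port. `Scaler`, `scale_kernel`, and `scale_all` are hypothetical names, CUDA error checking is left out for brevity, and the qualifiers are defined away when the file is built with a plain C++ compiler so the same logic can be tried on the host.

```cpp
#include <vector>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#else
// Plain C++ build: turn the CUDA qualifiers into no-ops.
#define __host__
#define __device__
#define __global__
#endif

// Hypothetical class modelled on Points from the sample: it stores only a
// raw pointer and a scalar, so copying it by value into a kernel is cheap.
class Scaler
{
        float *m_data;
        float m_factor;

    public:
        __host__ __device__ Scaler(float *data, float factor)
            : m_data(data), m_factor(factor) {}

        // Scale one element in place.
        __host__ __device__ void apply(int idx) const
        {
            m_data[idx] *= m_factor;
        }
};

#ifdef __CUDACC__
// The object arrives by value; its pointer must refer to device memory.
__global__ void scale_kernel(Scaler s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) s.apply(i);
}
#endif

// Multiplies every element of `values` by `factor`, in place.
void scale_all(std::vector<float> &values, float factor)
{
    int n = static_cast<int>(values.size());
    if (n == 0) return;

#ifdef __CUDACC__
    // GPU path (error checking omitted in this sketch).
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    Scaler s(d_data, factor);
    scale_kernel<<<(n + 255) / 256, 256>>>(s, n);

    cudaMemcpy(values.data(), d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
#else
    // Host fallback: same object, same member function, ordinary loop.
    Scaler s(values.data(), factor);
    for (int i = 0; i < n; ++i) s.apply(i);
#endif
}
```

Calling `scale_all(v, 2.0f)` on a vector holding 1, 2, 3 doubles each element. The main thing to keep in mind is that only the object itself is copied into the kernel; the memory its pointers refer to has to be allocated on the device and copied separately, which is what the `Points` class in the sample relies on as well.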


Tags: c++ cuda objects boost NVIDIA

Original article: http://blog.csdn.net/u012839187/article/details/46401773
