Next, fetch the upstream changes. As long as you have committed your local changes, merge conflicts should be minimal:
[~]$ cd ~/cs40/examples
[examples]$ git fetch upstream
[examples]$ git merge upstream/master
[examples]$ git push
Open up a terminal and navigate to the w08-cuda-pt1 directory.
In a second terminal, navigate to the build folder and run make -j8 to build the w08-cuda-pt1 examples.
[~]$ cd ~/cs40/examples/build/
[build]$ make -j8
[build]$ cd w08-cuda-pt1
[w08-cuda-pt1]$
In CUDA, we make the notions of data I/O and computation more general. We will allocate GPU memory with cudaMalloc and copy data between the CPU and GPU with cudaMemcpy. CUDA uses the terms host and device to refer to the CPU and GPU, respectively.
To run a program on the GPU (device), we write a special function called a kernel. A kernel is just a normal C function with a few restrictions and additions: it is marked with the __global__ qualifier, it must have a void return type, and it is called from the host with a special launch syntax.
__global__ is not a valid C keyword or token: while CUDA looks a lot like C, it adds some extra features and keywords. This pattern should be familiar by now. GLSL is a C-like shading language for OpenGL that adds some extra keywords and features, and Qt adds extensions to C++ to handle signals and slots. Just as Qt code is compiled by a Meta Object Compiler (MOC) and GLSL shaders are compiled and linked by special OpenGL functions, CUDA code is compiled with a special compiler, nvcc, in /usr/local/cuda/bin.
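As a sketch, a minimal kernel marked with __global__ might look like the following (the name add and its parameters are chosen here just for illustration):

```cuda
/* A kernel is marked with the __global__ qualifier, must return void,
   and is compiled by nvcc rather than a plain C compiler. */
__global__ void add(int a, int b, int *c) {
    /* runs on the GPU; c must point to device memory */
    *c = a + b;
}
```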
First, since kernels must have a void return type, how can we ever get output from the GPU? We can use a pointer that points to a buffer of memory and write our output to that buffer. We have to be a little careful though, since there are two memory spaces: CPU/host memory and GPU/device memory. Since the kernel runs on the GPU, this pointer should point to device memory. In fact, any pointer passed to a kernel should be a pointer to device memory.
We can allocate memory on the GPU using the function cudaMalloc. We pass cudaMalloc the address of a pointer dev_c and the requested size. The function then allocates space on the GPU and writes the address of that GPU memory into dev_c. We now have a GPU address stored in a pointer variable on the CPU. You can pass this pointer value around, including passing it to a kernel, but you cannot dereference this pointer on the CPU because it refers to a GPU address.
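A sketch of this allocation pattern, using the dev_c name from above:

```cuda
int *dev_c;  /* will hold a device (GPU) address */

/* allocate space for one int on the GPU; cudaMalloc writes
   the device address into dev_c */
cudaMalloc((void **)&dev_c, sizeof(int));

/* dev_c now stores a GPU address: we can pass its value to a
   kernel, but must not dereference it on the CPU */
```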
Calling a kernel
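Assuming the add kernel and dev_c pointer sketched earlier, a launch with one block of one thread might look like this:

```cuda
/* a kernel is launched with the <<<blocks, threads>>> syntax;
   the arguments a=2, b=7 are passed by value, while dev_c is a
   pointer to device memory where the kernel writes its result */
add<<<1, 1>>>(2, 7, dev_c);
```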
cudaMemcpy
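Continuing the sketch above, cudaMemcpy copies the kernel's result from device memory back into a host variable; the last argument names the direction of the copy:

```cuda
int c;  /* host variable to receive the result */

/* copy sizeof(int) bytes from device memory (dev_c) into
   host memory (&c) */
cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost);

/* free the GPU allocation when we are done with it */
cudaFree(dev_c);
```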
[~]$ cd /usr/local/cuda-7.5-samples/NVIDIA_CUDA-7.5_Samples/bin/x86_64/linux/release