Introduction
In this lab you will practice writing 2D CUDA kernels that generate fractal images, in particular the Julia set. By experimenting with the kernel's grid and block dimensions, you will see how these parameters affect performance on multiple GPU configurations.
Project Goals:
- Designing and implementing a CUDA program: kernel functions, memory copies, and grid, block, and thread layout.
- Using an OpenGL library to visualize CUDA computation.
- Practicing debugging CUDA programs.
Getting Started
We will start with the julia.cu demo provided with the CUDAVis library. For the moment, we are more interested in timing a static image, so we will modify the animate_julia function to use a single seed value and to not update the ticks for animating. A replacement function is given below with changes indicated.
    static void animate_julia(uchar3 *devPtr, void *my_data) {
      my_cuda_data *data = (my_cuda_data *)my_data;
      // One block per pixel: a size x size grid of single-thread blocks.
      dim3 blocks(data->size, data->size);
      float im = data->im; //CHANGED
      float re = data->re; //CHANGED
      GPUTimer timer;
      timer.start();
      julia_kernel<<<blocks, 1>>>(devPtr, data->size, re, im);
      //USE NEWLINES INSTEAD OF \r
      printf("Frame generation time: %7.2f ms\n", timer.elapsed());
      //REMOVED A LINE HERE
    }
As written, this kernel assigns an entire CUDA block to each pixel. Is this optimal, or can it be improved? Your assignment is to modify the CUDA kernel julia_kernel, or to add additional CUDA kernels, so that the computation leverages both CUDA threads and blocks. Your kernels should not assume that the total number of CUDA compute elements (threads/blocks) matches the number of pixels, as the initial example does. You should handle the cases where some blocks or threads need to process multiple pixels, as well as cases where some threads in a block must sit idle while other threads are working.
After making modifications to your kernels, modify your animate_julia function to call your kernel appropriately.
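One common way to meet these requirements is a 2D grid-stride loop, in which each thread starts at its global (x, y) index and then steps by the total number of threads in each dimension. The sketch below is a minimal illustration under stated assumptions, not the expected solution. The names stride_kernel, shade_pixel, and launch_stride_kernel are hypothetical, the 16x16 and 8x8 launch shape is an arbitrary starting point, and the image is assumed to be stored row-major with size pixels per row. shade_pixel stands in for whatever per-pixel Julia computation your julia_kernel already performs.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the per-pixel Julia computation. It only encodes
// the pixel coordinates, so the result is easy to check by eye.
__host__ __device__ inline uchar3 shade_pixel(int x, int y) {
    return make_uchar3((unsigned char)(x % 256), (unsigned char)(y % 256), 0);
}

// 2D grid-stride loop. Works for any grid and block shape:
//  - fewer threads than pixels: each thread loops over several pixels
//  - more threads than pixels: threads that start out of range skip the loop
__global__ void stride_kernel(uchar3 *pixels, int size) {
    int x0 = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index in x
    int y0 = blockIdx.y * blockDim.y + threadIdx.y;  // global thread index in y
    int stride_x = gridDim.x * blockDim.x;           // total threads in x
    int stride_y = gridDim.y * blockDim.y;           // total threads in y

    for (int y = y0; y < size; y += stride_y) {
        for (int x = x0; x < size; x += stride_x) {
            // Assumption: row-major layout with `size` pixels per row.
            pixels[y * size + x] = shade_pixel(x, y);
        }
    }
}

// Hypothetical host-side launch; the shape below is only a starting point
// for the experiments.
void launch_stride_kernel(uchar3 *devPtr, int size) {
    dim3 threads(16, 16);  // 256 threads per block
    dim3 blocks(8, 8);     // 128x128 threads in total
    stride_kernel<<<blocks, threads>>>(devPtr, size);
}
```

With this launch shape and size = 512, each thread handles 16 pixels. With size = 100, any thread whose starting x or y index is 100 or more never enters the loop and simply idles.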
Experiments
Run experiments with your code to answer the following questions:
- What block/thread configuration offered the best performance?
- If you have multiple GPU models available, does the best configuration depend on the model? If so, how?
- For this application, would it be better to use a grid of 16x16 blocks or a single block of 16x16 threads? Explain.
- For this application, would it be better to use a grid of 512x512 blocks or a single block of 512x512 threads? Explain.
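A small timing harness can make these comparisons repeatable. The sketch below is one possible setup, assuming CUDA events for timing rather than the GPUTimer class used in the demo. The names ceil_div, placeholder_kernel, time_launch, and sweep are hypothetical, and placeholder_kernel is a trivial workload that you would replace with your own Julia kernel. It also prints the device's per-block thread limit and reports any launch the runtime rejects, which may help when reasoning about very large blocks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Smallest number of blocks of width `block` needed to cover `n` pixels.
__host__ __device__ inline int ceil_div(int n, int block) {
    return (n + block - 1) / block;
}

// Placeholder workload (hypothetical): one pixel per thread, with a bounds
// check so that threads past the image edge stay idle.
__global__ void placeholder_kernel(uchar3 *pixels, int size) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < size && y < size) {
        pixels[y * size + x] = make_uchar3(x % 256, y % 256, 0);
    }
}

// Times one launch with CUDA events. Returns the elapsed time in
// milliseconds, or -1 if the runtime rejected the launch configuration.
float time_launch(uchar3 *dev, int size, dim3 blocks, dim3 threads) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    placeholder_kernel<<<blocks, threads>>>(dev, size);
    cudaError_t err = cudaGetLastError();  // catches invalid configurations
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);            // wait for the kernel to finish

    float ms = -1.0f;
    if (err == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Tries square blocks of width 1, 2, 4, ..., 64 on device 0.
void sweep(int size) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: max threads per block = %d\n", prop.name, prop.maxThreadsPerBlock);

    uchar3 *dev = nullptr;
    cudaMalloc(&dev, (size_t)size * size * sizeof(uchar3));
    for (int b = 1; b <= 64; b *= 2) {
        dim3 threads(b, b);
        dim3 blocks(ceil_div(size, b), ceil_div(size, b));
        float ms = time_launch(dev, size, blocks, threads);
        if (ms < 0.0f) {
            printf("%2dx%-2d threads per block: launch rejected\n", b, b);
        } else {
            printf("%2dx%-2d threads per block: %7.3f ms\n", b, b, ms);
        }
    }
    cudaFree(dev);
}
```

Calling sweep(512) from main prints one line per block shape. The first timed launch usually includes one-time startup cost, so consider running the sweep twice and comparing only the second pass.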
Extensions
Once you have completed the basic modifications and answered the questions above, feel free to modify the program to explore extensions that interest you. Below are some possibilities.
- Turn the animation computation back on, as in the original demo.
- Make a new animation.
- Come up with a better color scheme.
- Generate a new fractal pattern.