CS40: CUDA Fractal Generation

Introduction

This lab gives you practice writing 2D CUDA kernels that generate fractal images, in particular the Julia set. By experimenting with the grid and block dimensions of the kernel, you will see how these parameters affect performance on multiple GPU configurations.

Project Goals:

Designing and implementing a CUDA program: kernel functions, memory copies, and grid, block, and thread layout.
Using an OpenGL library to visualize CUDA computation.
Practicing debugging CUDA programs.

Getting Started

We will start with the julia.cu demo provided with the CUDAVis library. For the moment, we are more interested in timing a static image, so we will modify the animate_julia function to use a single seed value and to stop updating the animation ticks. A replacement function is given below, with the changes marked in comments.

static void animate_julia(uchar3 *devPtr, void *my_data) {

  my_cuda_data *data = (my_cuda_data *)my_data;
  dim3 blocks(data->size, data->size);

  float im = data->im; //CHANGED
  float re = data->re; //CHANGED
  GPUTimer timer;
  timer.start();
  julia_kernel<<<blocks, 1>>>(devPtr, data->size, re, im); // one block per pixel, one thread per block
  //USE NEWLINES INSTEAD OF \r
  printf("Frame generation time: %7.2f ms\n", timer.elapsed());
  //REMOVED A LINE HERE
}
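The handout does not reproduce julia_kernel itself. For reference, the sketch below shows what a one-block-per-pixel Julia kernel commonly looks like. The names julia_iterations and julia_one_per_block, the escape radius of 2, the 1.5 scale factor, the 200-iteration cap, and the red-only coloring are all assumptions for illustration, not necessarily what julia.cu does; compare against the actual kernel in the demo.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: count iterations of z = z^2 + c before |z| > 2,
// capped at max_iter. Marked __host__ __device__ so it can be tested on the CPU.
__host__ __device__ int julia_iterations(float zr, float zi, float cr, float ci,
                                         int max_iter) {
  for (int i = 0; i < max_iter; ++i) {
    if (zr * zr + zi * zi > 4.0f) return i;  // |z|^2 > 4 means the orbit escapes
    float next_r = zr * zr - zi * zi + cr;   // real part of z^2 + c
    zi = 2.0f * zr * zi + ci;                // imaginary part of z^2 + c
    zr = next_r;
  }
  return max_iter;  // never escaped: treat the point as inside the set
}

// Hypothetical one-block-per-pixel kernel, launched as
//   julia_one_per_block<<<dim3(size, size), 1>>>(img, size, re, im);
// Each block holds a single thread, so the pixel comes entirely from blockIdx.
__global__ void julia_one_per_block(uchar3 *img, int size, float re, float im) {
  int x = blockIdx.x;
  int y = blockIdx.y;
  float half = size / 2.0f;
  // Map the pixel to the square [-1.5, 1.5] x [-1.5, 1.5] in the complex plane.
  float zr = 1.5f * (x - half) / half;
  float zi = 1.5f * (y - half) / half;
  int n = julia_iterations(zr, zi, re, im, 200);
  unsigned char v = (unsigned char)(255 * n / 200);
  img[x + y * size] = make_uchar3(v, 0, 0);  // red channel encodes the iteration count
}
```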

As written, this code assigns an entire CUDA block to each pixel. Is this optimal, or can it be improved? Your assignment is to modify the CUDA kernel julia_kernel, or add new CUDA kernels, so that the computation uses both CUDA threads and blocks. Unlike the initial example, your kernels should not assume that the total number of CUDA compute elements (threads/blocks) matches the number of pixels. Handle the case where some blocks or threads must process multiple pixels, as well as the case where some threads in a block must sit idle while other threads are working.
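One common way to decouple the launch shape from the image size is a 2D grid-stride loop. The sketch below is a minimal illustration, not the required solution: fill_grid_stride and gradient_color are hypothetical names, and a simple color gradient stands in for the per-pixel Julia computation, which you would substitute yourself.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-pixel shading used only to illustrate the loop structure;
// replace it with the Julia computation in your own kernel.
__host__ __device__ uchar3 gradient_color(int x, int y, int size) {
  return make_uchar3((unsigned char)(255 * x / size),
                     (unsigned char)(255 * y / size), 0);
}

// 2D grid-stride loop: each thread starts at its global (x, y) index and then
// advances by the total number of threads launched in that dimension.
// - If the grid is smaller than the image, a thread visits several pixels.
// - If the grid is larger, threads whose start index is >= size skip the loop
//   entirely and stay idle.
__global__ void fill_grid_stride(uchar3 *img, int size) {
  int stride_x = blockDim.x * gridDim.x;  // threads launched along x
  int stride_y = blockDim.y * gridDim.y;  // threads launched along y
  for (int y = blockIdx.y * blockDim.y + threadIdx.y; y < size; y += stride_y) {
    for (int x = blockIdx.x * blockDim.x + threadIdx.x; x < size; x += stride_x) {
      img[x + y * size] = gradient_color(x, y, size);
    }
  }
}
```

Because the loop bounds depend only on size and the launch shape, the same kernel is correct for any grid and block dimensions. For example, launching it with dim3(4, 4) blocks of dim3(16, 16) threads on a 512x512 image gives each thread 64 pixels (8 in each dimension, spaced 64 apart).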

After making modifications to your kernels, modify your animate_julia function to call your kernel appropriately.
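When calling your kernel from animate_julia, you need grid dimensions that match your chosen block shape. The helper below is a hypothetical sketch that uses ceiling division so that a partial tile at the image edge still gets threads. With a grid-stride kernel a smaller grid is also valid, so treat this as one reasonable starting point for your experiments rather than the required choice; the kernel name in the usage comment is a placeholder.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: the smallest grid giving at least one thread per pixel
// of a size x size image, rounding up so partial tiles at the edges are covered.
dim3 grid_for(int size, dim3 block) {
  unsigned int n = (unsigned int)size;
  unsigned int gx = (n + block.x - 1) / block.x;  // ceil(size / block.x)
  unsigned int gy = (n + block.y - 1) / block.y;  // ceil(size / block.y)
  return dim3(gx, gy);                            // z defaults to 1
}

// Possible use inside animate_julia (my_julia_kernel is a placeholder name):
//   dim3 threads(16, 16);
//   my_julia_kernel<<<grid_for(data->size, threads), threads>>>(devPtr, data->size, re, im);
//   cudaError_t err = cudaGetLastError();  // reports an invalid launch configuration
//   if (err != cudaSuccess) printf("launch failed: %s\n", cudaGetErrorString(err));
```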

Experiments

Run experiments with your code to answer the following questions:

What block/thread configuration offered the best performance?
If you have multiple GPU models available, does the best configuration depend on the model? If so, how?
For this application, would it be better to use a grid of 16x16 blocks or a single block of 16x16 threads? Explain.
For this application, would it be better to use a grid of 512x512 blocks or a single block of 512x512 threads? Explain.

Extensions

Once you have completed the basic modifications and answered the questions above, feel free to modify the program to explore extensions that interest you. Below are some possibilities.

Turn the animation computation back on, as in the original demo.
Make a new animation.
Come up with a better color scheme.
Generate a new fractal pattern.
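For the color-scheme extension, one option is to map the normalized iteration count t = n / max_iter through smooth polynomials in t and (1 - t) instead of writing a single channel. The function below is a hypothetical sketch: its name, signature, and coefficients are illustrative choices, and it assumes your kernel already computes an iteration count n for each pixel.

```cuda
#include <cuda_runtime.h>

// Hypothetical palette: each channel is a polynomial in t and (1 - t) that
// stays within [0, 1], so the scaled result always fits in an unsigned char.
// Both t = 0 (escaped immediately) and t = 1 (never escaped) map to black.
__host__ __device__ uchar3 smooth_palette(int n, int max_iter) {
  float t = (float)n / (float)max_iter;
  float s = 1.0f - t;
  float r = 9.0f * s * t * t * t * 255.0f;
  float g = 15.0f * s * s * t * t * 255.0f;
  float b = 8.5f * s * s * s * t * 255.0f;
  return make_uchar3((unsigned char)r, (unsigned char)g, (unsigned char)b);
}
```

Inside a kernel you would write img[offset] = smooth_palette(n, max_iter); after computing n for that pixel.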