    #include <qtViewer.h>
    #include "gradientVis.h"

    int main(int argc, char *argv[]) {
        QTViewer viewer(argc, argv, 10, 10, "QtCPU");
        int width = 50;
        int height = 50;
        DataVis* vis = new GradientVis(width, height);
        viewer.setAnimation(vis);
        return viewer.run();
    }
    QTViewer(int argc, char* argv[], int w, int h, QString title="Demo");

The constructor requires the command-line arguments, which Qt needs even though this application typically does not use them. The next two parameters specify the desired width and height of the application window. Note: these parameters are currently ignored, because this feature is not yet implemented. An optional fifth string parameter specifies the title of the animation window.
Aside from the constructor, QTViewer has only two other public methods. The setAnimation(DataVis* vis) method connects a user-defined animation to the viewer. The viewer repeatedly calls the update() method of this animation to visualize the demo.
The run() method enters the primary graphical user interface (GUI) event loop of the viewer and starts running the animation. This method returns only when the application quits.
The primary role of the GradientVis class, or any demo class, is to implement the update() method. Our implementation adds one member variable, m_ticks, to keep track of how many times the update() method has been called. The resulting header file, gradientVis.h, is quite small:
    #pragma once
    #include <dataVisCPU.h>

    /* A single threaded gradient visualization */
    class GradientVis: public DataVisCPU {
    public:
        GradientVis(int w, int h);
        virtual ~GradientVis();
        void update();

    private:
        int m_ticks;
    };
The implementation of the GradientVis class is provided in gradientVis.cpp. The constructor simply needs to initialize the base class and m_ticks.
    GradientVis::GradientVis(int w, int h) :
        DataVisCPU(w,h), m_ticks(0){
        /* do nothing */
    };

There is no dynamic memory allocation in the derived class, so the destructor can be empty.
GradientVis::~GradientVis(){ /* do nothing */ }
All the real work is in the update() method, shown below:
    void GradientVis::update() {
        int off;
        unsigned char val;
        int c;
        for(int r=0; r<m_height; r++){
            for(c=0; c<m_width; c++){
                off = r*m_width+c;
                val = (unsigned char) (128. * r / m_height);
                val = (val+m_ticks)%128;
                m_image.buffer[off].r=val;
                m_image.buffer[off].g=0;
                m_image.buffer[off].b=128-val;
            }
        }
        m_ticks += 1;
    }

The GradientVis class and its underlying base class DataVisCPU contain a color buffer whose size is specified during construction. In our example above in main.cpp, this was set to 50 by 50 pixels. These dimensions are exposed by the protected member variables m_height and m_width in the DataVis base class and can be used by our update() method. Additionally, each DataVis object stores an ImageBuffer object, m_image, a simple struct containing the dimensions of the color buffer and the buffer itself, which is an array of m_height*m_width color3 objects stored in row-major order. Row 0 is the bottom of the image. The individual red, green, and blue components of each pixel can be accessed through the r, g, and b members of the color3 struct. Each component is an unsigned char in the range 0 to 255.
Our initial demo uses the row number to compute the color and then rotates the color through the rows using the m_ticks variable to animate the scene.
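To make the row-major layout and the tick-based rotation concrete, here is a minimal standalone sketch that does not depend on QtVis. The names Color3Sketch and makeGradient are hypothetical stand-ins invented for this illustration; only the arithmetic is taken from the update() method above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library's color3 struct (r, g, b bytes).
struct Color3Sketch {
    unsigned char r, g, b;
};

// Fill a width x height row-major buffer with the red-to-blue gradient
// computed in GradientVis::update(), with the shade rotated by `ticks`.
std::vector<Color3Sketch> makeGradient(int width, int height, int ticks) {
    assert(width > 0 && height > 0 && ticks >= 0);
    std::vector<Color3Sketch> buffer(static_cast<std::size_t>(width) * height);
    for (int r = 0; r < height; r++) {
        // Base shade depends only on the row: 0 at row 0, below 128 at the top row.
        unsigned char val = static_cast<unsigned char>(128.0 * r / height);
        val = static_cast<unsigned char>((val + ticks) % 128);
        for (int c = 0; c < width; c++) {
            int off = r * width + c;  // row-major offset of pixel (r, c)
            buffer[off].r = val;
            buffer[off].g = 0;
            buffer[off].b = static_cast<unsigned char>(128 - val);
        }
    }
    return buffer;
}
```

For a 50 by 50 image at tick 0, row 25 gets a red value of 128 * 25 / 50 = 64 and a blue value of 64. At tick 100 the same row gets (64 + 100) % 128 = 36 red and 92 blue, which is how the gradient appears to scroll.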
For this step, we are assuming you have built and installed the QtVis library. If this is the case, compiling a new demo is fairly easy. We just ask CMake to find the QtVis library, then compile our source and link against the library and all of its dependencies. The demos file CMakeLists.txt has a more complicated example for supporting multiple demos and multiple, optional, parallel frameworks. For this simple demo, the following CMakeLists.txt should suffice.
    cmake_minimum_required(VERSION 3.11)
    project(qtdemos LANGUAGES CXX)

    find_package(QtVis REQUIRED)

    #Add user generated files here and name of executable
    add_executable(cpuDemo main.cpp gradientVis.cpp)

    #Link executable to vis library
    target_link_libraries(cpuDemo QtVis::qtvis)
The first few lines are boilerplate. If you are not using CUDA, you may be able to use a slightly earlier version of CMake, but it would probably need to be at least version 3.5. You can change the project name to anything you would like.
The primary step needed for your demo is an add_executable(...) step that specifies the desired name of the compiled executable and the list of source files that need to be compiled, in our case, main.cpp and gradientVis.cpp. This command will generate the needed commands to compile your source.
The final step is to link your compiled source to third party libraries. Again, we specify the name of the executable (matching the line above) and then the list of libraries. By including the QtVis::qtvis name, this will include the qtvis library and all its dependencies, so we do not need to list them explicitly.
openMPVis.h follows the same setup as gradientVis.h in inheriting from DataVisCPU and adding an m_ticks member variable. The constructor and destructor implementation in openMPVis.cpp also follow the same pattern. The only difference in update() is the addition of the line
    #pragma omp parallel for private(c, off, val)

in front of the outer for loop. This line instructs the compiler to parallelize the loop over rows. By default, OpenMP makes a private copy of only the loop variable for each thread; variables declared outside the loop, such as c, off, and val, are shared. In this example each thread needs its own copy of c, off, and val, so we add the clause private(c, off, val). As a fun example of how this library can be used as a debugging tool, try leaving the private(...) clause off and see how the output changes.
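An alternative to the private(...) clause, sketched below, is to declare the temporaries inside the loop body, since variables declared inside a parallel loop are private to each thread. This is a standalone illustration rather than the code in openMPVis.cpp, and the function name shadeRows is hypothetical. If the file is compiled without OpenMP enabled, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compute one shade per pixel, parallelizing over rows.
// val, c and off are declared inside the parallel loop, so each thread
// gets its own copy without needing a private(...) clause.
std::vector<unsigned char> shadeRows(int width, int height, int ticks) {
    assert(width > 0 && height > 0 && ticks >= 0);
    std::vector<unsigned char> shades(static_cast<std::size_t>(width) * height);
    unsigned char* out = shades.data();  // shared pointer; threads write disjoint rows
#pragma omp parallel for
    for (int r = 0; r < height; r++) {
        unsigned char val = static_cast<unsigned char>(128.0 * r / height);
        val = static_cast<unsigned char>((val + ticks) % 128);
        for (int c = 0; c < width; c++) {
            int off = r * width + c;
            out[off] = val;
        }
    }
    return shades;
}
```

Because each row is written by exactly one iteration, the result is the same whether the loop runs on one thread or many.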
If QtVis was compiled with OpenMP support, the CMakeLists.txt file is similar to the example above. We simply need to change the name of the executable and the source files.
    #...header same as single threaded example...

    add_executable(openMPDemo openMPDemo.cpp openMPVis.cpp)
    target_link_libraries(openMPDemo QtVis::qtvis)
    DataVis* vis = new PThreadVis(2, width, height);

In this demo, each thread will create its own vertical red-to-blue gradient in a portion of the overall image buffer, so you can see how many threads are running when you execute the demo.
As we move to the pthreadVis.h header file, we begin to see a little more complexity than in the previous examples. We still inherit from DataVisCPU, but there are more member variables to maintain. For each thread, we keep track of its handle in the pthread_t* m_threads array, and any information needed by each thread is stored in a separate threadInfo* m_tinfo array. We also keep track of the total number of threads as int m_numThreads. To coordinate the writing of a single buffer by all threads, we use a pthread_barrier_t m_barrier.
The pthread_barrier_t type is not provided by OSX implementations of pthreads, but the QtVis library provides it if you want to use the library on OSX. Our complete header file pthreadVis.h for Linux and OSX is shown below:
    #pragma once

    #include <dataVisCPU.h>
    #include <pthread.h>

    #ifdef __APPLE__
    #include <osx/pthread_barrier.h>
    #endif

    typedef struct {
      int nThreads;
      int id;
      ImageBuffer* img;
      pthread_barrier_t* barrier;
    } threadInfo;

    /* A PThreads Demo */
    class PThreadVis: public DataVisCPU {
    public:
      PThreadVis(int numThreads, int w, int h);
      virtual ~PThreadVis();
      void update();

    private:
      int m_numThreads;
      pthread_t* m_threads;
      threadInfo* m_tinfo;
      pthread_barrier_t m_barrier;
    };

Since each thread will need access to the barrier and the image buffer, we make these elements part of the threadInfo and show how they are populated below.
Since the m_threads and m_tinfo arrays are allocated dynamically in the constructor, we delete them in the destructor.
    PThreadVis::PThreadVis(int numThreads, int w, int h) :
      DataVisCPU(w,h), m_numThreads(numThreads),
      m_threads(nullptr), m_tinfo(nullptr) {
      int i;
      m_threads = new pthread_t[m_numThreads];
      m_tinfo = new threadInfo[m_numThreads];
      m_tinfo[0].nThreads=m_numThreads;
      m_tinfo[0].img=&m_image;
      m_tinfo[0].barrier=&m_barrier;
      pthread_barrier_init(&m_barrier, nullptr, m_numThreads+1);
      for(i=0;i<m_numThreads;i++){
        m_tinfo[i]=m_tinfo[0];
        m_tinfo[i].id=i;
        pthread_create(&m_threads[i], nullptr, threadUpdate,
                       (void*)&m_tinfo[i]);
      }
    };

    PThreadVis::~PThreadVis(){
      pthread_barrier_wait(&m_barrier);
      pthread_barrier_destroy(&m_barrier);
      delete [] m_threads; m_threads=nullptr;
      delete [] m_tinfo; m_tinfo=nullptr;
    }
Once created, each thread independently executes the threadUpdate function described below.
threadUpdate receives the information it needs through the void* info argument, which was originally a threadInfo pointer in the pthread_create call in the constructor. We begin by casting the pointer back to a threadInfo and extracting the needed information. We use the image size, the total number of threads, and the current thread id to compute which rows of the image this thread should process. This information is stored in the variables rowstart and rowstop.
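The row partition can be tried on its own with a small standalone sketch. The RowRange struct and the rowsForThread function are hypothetical names used only for this illustration; the arithmetic assumes a ceiling division of the rows among the threads, with the last range clipped to the image height.

```cpp
#include <cassert>

// Half-open range of rows [start, stop) assigned to one thread.
struct RowRange {
    int start;
    int stop;
};

// Divide `height` rows among `nThreads` threads, rounding the band size up,
// then clip the end of the range so it never passes the top of the image.
RowRange rowsForThread(int height, int nThreads, int id) {
    assert(height > 0 && nThreads > 0 && id >= 0 && id < nThreads);
    int maxrows = height / nThreads;
    if (height % nThreads > 0) {
        maxrows++;  // round up so every row is covered
    }
    RowRange range;
    range.start = maxrows * id;
    range.stop = range.start + maxrows;
    if (range.stop > height) {
        range.stop = height;
    }
    return range;
}
```

With 50 rows and 3 threads, maxrows is 17, so the threads handle rows [0, 17), [17, 34), and [34, 50). The last thread gets one row fewer than the others.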
    void *threadUpdate(void* info){
      threadInfo* tinfo = (threadInfo*) info;
      int off;
      int w,h;
      int rowstart, rowstop, maxrows;
      int ticks = 0;
      unsigned char val;

      w= tinfo->img->width;
      h= tinfo->img->height;
      maxrows = h/tinfo->nThreads;
      if(h%tinfo->nThreads > 0) { maxrows++; }
      rowstart=maxrows*tinfo->id;
      rowstop=rowstart+maxrows;
      if(rowstop > h) { rowstop = h; }

      /* do stuff */

We now update only the rows between rowstart and rowstop, similar to the approach used in the two previous examples.
    /* update subimage */
    for(int r=rowstart; r<rowstop; r++){
      for(int c=0; c<w; c++){
        off = r*w+c;
        /* use the row offset within this thread's band so the
           shade stays between 0 and 127 for any thread count */
        val = (unsigned char) (128. * (r-rowstart) / maxrows);
        val = (val+ticks)%128;
        tinfo->img->buffer[off].r=val;
        tinfo->img->buffer[off].g=0;
        tinfo->img->buffer[off].b=128-val;
      }
    }

After completing the update of this subimage, the thread must wait until the other threads have completed their updates and update() is called again to refresh the entire image. We can coordinate most of this with a pthread barrier in our threadUpdate function by adding a barrier wait inside a while loop as follows:
    while(true){
      /* update subimage as above */

      /* wait until next update time */
      pthread_barrier_wait(tinfo->barrier);

      /* increment local tick counter */
      ticks++;
    }

This completes the threadUpdate implementation, but how do we coordinate the writing of subimages by other threads with the primary QTViewer update loop? If we are careful, we can use the same barrier as shown below.
    pthread_barrier_init(&m_barrier, nullptr, m_numThreads+1);
    for(...){ ... }

The barrier is initialized to expect m_numThreads+1 participants. Since only the m_numThreads worker threads call wait in threadUpdate(...), we add one more wait call inside our implementation of update() to sync the update loop with the worker threads. This is the only line we need in update(), as threadUpdate(...) does all the image writing.
void PThreadVis::update() { pthread_barrier_wait(&m_barrier); }
We can clean up the barrier in the destructor with
    pthread_barrier_wait(&m_barrier); /* one last wait sync with workers */
    pthread_barrier_destroy(&m_barrier);
While considerably more complicated than the single threaded or OpenMP examples, the pthreads example outline can be readily modified for other pthreads demos by changing the threadInfo struct and the threadUpdate(...) function for your particular application. The use of the barrier for proper synchronization should be a common pattern for a wide range of applications.
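The barrier pattern can be exercised outside of QtVis with a short sketch. This is an illustration under stated assumptions, not the library's code: WorkerArgs, frameWorker, and runFrames are hypothetical names, the workers run a fixed number of frames instead of looping forever so that they can be joined, and it relies on a platform that provides pthread_barrier_t (for example Linux). The calling thread plays the role of update().

```cpp
#include <cassert>
#include <pthread.h>
#include <vector>

// Hypothetical per-thread arguments, in the spirit of threadInfo.
struct WorkerArgs {
    int id;
    int frames;
    int* framesDone;             // one counter per worker
    pthread_barrier_t* barrier;  // shared by the workers and the caller
};

void* frameWorker(void* p) {
    WorkerArgs* args = static_cast<WorkerArgs*>(p);
    for (int f = 0; f < args->frames; f++) {
        args->framesDone[args->id] += 1;      // stand-in for writing this thread's rows
        pthread_barrier_wait(args->barrier);  // wait for the other workers and the caller
    }
    return nullptr;
}

// Run `frames` synchronized steps with `numThreads` workers and return
// how many frames each worker completed.
std::vector<int> runFrames(int numThreads, int frames) {
    assert(numThreads > 0 && frames >= 0);
    std::vector<int> framesDone(numThreads, 0);
    std::vector<pthread_t> threads(numThreads);
    std::vector<WorkerArgs> args(numThreads);
    pthread_barrier_t barrier;
    // numThreads workers plus the calling thread take part in every wait.
    pthread_barrier_init(&barrier, nullptr, numThreads + 1);
    for (int i = 0; i < numThreads; i++) {
        args[i] = WorkerArgs{i, frames, framesDone.data(), &barrier};
        pthread_create(&threads[i], nullptr, frameWorker, &args[i]);
    }
    for (int f = 0; f < frames; f++) {
        pthread_barrier_wait(&barrier);  // plays the role of update()
    }
    for (int i = 0; i < numThreads; i++) {
        pthread_join(threads[i], nullptr);
    }
    pthread_barrier_destroy(&barrier);
    return framesDone;
}
```

Calling runFrames(3, 5) returns three counters that all equal 5: no worker can start frame f+1 until every worker and the caller have reached the barrier for frame f.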
If QtVis was compiled with pthreads support, the CMakeLists.txt file is similar to the examples above. We simply need to change the name of the executable and the source files.
    #... header same as the examples above ...

    add_executable(threadDemo threadDemo.cpp pthreadVis.cpp)
    target_link_libraries(threadDemo QtVis::qtvis)
Additionally, because CUDA code is compiled with separate compiler options, we occasionally need to be careful when combining CUDA code with other complex C++ code like that found in Qt. For this reason, we introduce an abstract class, Animator, that contains one pure virtual method, virtual void update(ImageBuffer* img). Instead of writing a new class that derives directly from a DataVis class, as in the previous examples, CUDA users write a class that derives from Animator and connect it to a concrete DataVisCUDA instance.
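The shape of this design can be sketched in plain C++ without CUDA or Qt. All names below (ImageSketch, AnimatorSketch, VisSketch, FillAnimator) are hypothetical stand-ins invented for this illustration; the real Animator and DataVisCUDA classes may differ in their details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ImageBuffer: dimensions plus a flat pixel array.
struct ImageSketch {
    int width;
    int height;
    std::vector<unsigned char> pixels;
};

// Plays the role of Animator: one pure virtual update method.
class AnimatorSketch {
public:
    virtual ~AnimatorSketch() {}
    virtual void update(ImageSketch* img) = 0;
};

// Plays the role of the concrete visualization class: it owns the image,
// borrows the animator, and forwards each update to it.
class VisSketch {
public:
    VisSketch(int w, int h) : m_animator(nullptr) {
        assert(w > 0 && h > 0);
        m_image.width = w;
        m_image.height = h;
        m_image.pixels.assign(static_cast<std::size_t>(w) * h, 0);
    }
    void setAnimator(AnimatorSketch* a) { m_animator = a; }
    void update() {
        if (m_animator) { m_animator->update(&m_image); }
    }
    const ImageSketch& image() const { return m_image; }

private:
    ImageSketch m_image;
    AnimatorSketch* m_animator;  // not owned; the caller cleans it up
};

// A user-written animation: sets every pixel to the current tick count.
class FillAnimator : public AnimatorSketch {
public:
    FillAnimator() : m_ticks(0) {}
    void update(ImageSketch* img) override {
        for (std::size_t i = 0; i < img->pixels.size(); i++) {
            img->pixels[i] = static_cast<unsigned char>(m_ticks % 256);
        }
        m_ticks++;
    }

private:
    int m_ticks;
};
```

The user-written class only sees the image buffer it is handed, so it does not need to include any viewer or Qt headers.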
The QTViewer class deletes the DataVis object upon exiting, but neither QTViewer nor DataVisCUDA deletes the Animator instance, so the user is responsible for this final cleanup. The new main() program is shown below for a ripple animation we will describe next.
    #include "rippleKernel.h"
    #include <qtViewer.h>
    #include <dataVisCUDA.h> /* for CUDA applications */

    int main(int argc, char *argv[]) {
        QTViewer viewer(argc, argv, 10, 10, "QtCUDA");
        int width = 800;
        int height = 800;

        /* Note: DataVisCUDA instead of DataVis or DataVisCPU */
        DataVisCUDA* vis = new DataVisCUDA(width,height);

        /* Note: create animator and connect to DataVisCUDA object */
        Animator* kern = new RippleKernel();
        vis->setAnimator(kern);

        /* Same as other demos */
        viewer.setAnimation(vis);
        int res = viewer.run();

        /* Cleanup */
        /* The viewer only returns once the OpenGL context has been
           destroyed. Currently, the viewer deletes the vis object on
           the user's behalf, since attempting to delete it after the
           OpenGL context has been destroyed causes problems. For CUDA
           users, however, the Animator is not automatically destroyed
           and must be cleaned up manually. */
        delete kern; kern=nullptr;
        return res;
    }
    #pragma once

    #include <animator.h>
    #include <cuda.h>

    class RippleKernel: public Animator {
    public:
        RippleKernel(): m_ticks(0){ };
        ~RippleKernel(){ /* do nothing */};

        void update(ImageBuffer* img);

    private:
        int m_ticks; /* number of timesteps in animation */
    };

The only thing left to do is provide an implementation of update(ImageBuffer* img) in rippleKernel.cu that uses CUDA to populate the color buffer.
Given a pointer to the color buffer in GPU memory, the dimensions of the image, and the number of ticks, the CUDA kernel can execute in parallel with the following code (Example originally from section 5.2.2 of CUDA by Example by Sanders and Kandrot):
    __global__ void kernel(color3 *ptr, int w, int h, int ticks) {
        // map from threadIdx/BlockIdx to pixel position
        int x = threadIdx.x + blockIdx.x * blockDim.x;
        int y = threadIdx.y + blockIdx.y * blockDim.y;
        int offset = x + y * w;

        // compute distance from center of image
        float fx = x - w / 2;
        float fy = y - h / 2;
        float d = sqrtf(fx * fx + fy * fy);

        // use distance to modulate grey value intensity
        unsigned char grey = (unsigned char)(128.0f + 127.0f *
            cos(d / 10.0f - ticks / 7.0f) / (d / 10.0f + 1.0f));

        if(x<w && y<h){
            ptr[offset].r = grey;
            ptr[offset].g = grey;
            ptr[offset].b = grey;
        }
    }
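To check the per-pixel formula without a GPU, the same arithmetic can be written as an ordinary C++ function. This is a hedged CPU reference sketch for a single pixel; the name rippleGrey is hypothetical and is not part of QtVis or of the original example.

```cpp
#include <cassert>
#include <cmath>

// CPU version of the per-pixel grey value computed in the kernel,
// evaluated for one pixel (x, y) of a w x h image at a given tick.
unsigned char rippleGrey(int x, int y, int w, int h, int ticks) {
    assert(w > 0 && h > 0 && x >= 0 && x < w && y >= 0 && y < h);
    // Distance from the center of the image.
    float fx = static_cast<float>(x - w / 2);
    float fy = static_cast<float>(y - h / 2);
    float d = std::sqrt(fx * fx + fy * fy);
    // A cosine wave that travels outward with ticks and decays with distance.
    // Its magnitude never exceeds 1, so the result stays between 1 and 255.
    float wave = std::cos(d / 10.0f - ticks / 7.0f) / (d / 10.0f + 1.0f);
    return static_cast<unsigned char>(128.0f + 127.0f * wave);
}
```

At the center of an 800 by 800 image with ticks equal to 0, the distance is 0 and the cosine is 1, so the grey value is 128 + 127 = 255. Pixels at the same distance from the center get the same value, which produces the circular ripples.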
The implementation of update(ImageBuffer* img) simply needs to unpack the necessary information from img and call the kernel on each update. The DataVisCUDA class automatically allocates the color buffer on the GPU and stores the GPU pointer in img->buffer. The image dimensions are also part of the ImageBuffer struct. A full implementation of update is below:
    void RippleKernel::update(ImageBuffer* img) {
        int tdim = 8; // number of threads in x/y direction per block
        int w = img->width;
        int h = img->height;

        /* set up grid dimension */
        dim3 blocks((w+(tdim-1)) / tdim, (h+(tdim-1)) / tdim);

        /* set up block dimension */
        dim3 threads_block(tdim, tdim);

        /* call the CUDA kernel with grid dimension */
        kernel<<<blocks, threads_block>>>(img->buffer, w, h, m_ticks);

        /* step size controls speed of animation */
        m_ticks += 2;
    }
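The grid dimension above is an integer ceiling division, which can be checked on the host with a small sketch. The function names blocksFor and idleThreads are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

// Number of blocks of `tdim` threads needed to cover `n` pixels along one axis.
int blocksFor(int n, int tdim) {
    assert(n > 0 && tdim > 0);
    return (n + (tdim - 1)) / tdim;  // integer ceiling of n / tdim
}

// Threads launched along one axis that fall outside the image.
int idleThreads(int n, int tdim) {
    return blocksFor(n, tdim) * tdim - n;
}
```

For the 800 by 800 image in this demo, 800 / 8 gives exactly 100 blocks per axis with no idle threads. For a 50 pixel wide image, the same formula gives 7 blocks, or 56 threads, so 6 threads per row of blocks land outside the image. This is why the kernel only writes to the buffer when x<w and y<h.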
    project(qtdemos LANGUAGES CXX CUDA)

Now we can compile CUDA code simply by specifying source files with a .cu extension. Since this CUDA code depends on some includes and libraries that are part of QtVis, we note this dependency with the target_link_libraries command.
    add_library(cudademos rippleKernel.cu)
    target_link_libraries(cudademos QtVis::qtviscuda)

This creates a small library, cudademos, for our CUDA code. The final step is to add our main executable as in the previous examples and link it to our new library along with the primary QtVis library.
    add_executable(cudaDemo cudaDemo.cpp)
    target_link_libraries(cudaDemo cudademos QtVis::qtvis)

If you know you have the QtVis library compiled with CUDA support, your CMakeLists.txt can be simply:
    cmake_minimum_required(VERSION 3.11)
    project(qtdemos LANGUAGES CXX CUDA)

    find_package(QtVis REQUIRED)

    add_library(cudademos rippleKernel.cu)
    target_link_libraries(cudademos QtVis::qtviscuda)

    add_executable(cudaDemo cudaDemo.cpp)
    target_link_libraries(cudaDemo cudademos QtVis::qtvis)
    VERBOSE=1 make

then do some stuff... TODO