John Pormann, Ph.D. jbp1@duke.edu
Overview
- Basic Introduction
- Intro to the Operational Model
- Simple Example
  - Memory Allocation and Transfer
  - GPU-Function Launch
- Grids of Blocks of Threads
- GPU Programming Issues
- Performance Issues/Hints
CUDA and NVIDIA
CUDA is an NVIDIA product and only runs on NVIDIA GPUs
- AMD/ATI graphics chips will NOT run CUDA
- Older NVIDIA GPUs may not run CUDA either
- *Some* laptops may be capable of running CUDA
  - Not sure what this will do to battery life
- All current and future display drivers from NVIDIA will include support for CUDA
  - You don't need to download anything else to run a CUDA program
- To see if your GPU is CUDA-enabled, go to:
  - http://www.nvidia.com/object/cuda_learn_products.html
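You can also ask the CUDA runtime itself which devices it sees. A minimal sketch using the standard device-query calls (compile with nvcc; the file name is just an illustration):

```
// check_cuda.cu -- list any CUDA-capable devices (a minimal sketch)
#include <stdio.h>
#include <cuda_runtime.h>

int main( void ) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount( &count );
    if( (err != cudaSuccess) || (count == 0) ) {
        printf( "No CUDA-enabled GPU found\n" );
        return( 1 );
    }
    for( int i = 0; i < count; i++ ) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties( &prop, i );
        printf( "Device %d: %s, compute capability %d.%d\n",
                i, prop.name, prop.major, prop.minor );
    }
    return( 0 );
}
```

If this prints at least one device, your machine can run CUDA programs.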
Why GPU programming?
Parallelism
- CPUs recently moved to dual- and quad-core chips
- The current G100 GPU has 240 cores

Memory bandwidth
- CPU (DDR-400) memory can go 3.2GB/sec
- GPU memory system can go 141.7GB/sec

Speed
- CPUs can reach 20GFLOPS (per core)
- GPUs can reach 933GFLOPS (single-precision or integer)
- ... 78GFLOPS (double-precision)

Cost
- $400-1000
Yesterday’s Announcement
NVIDIA recently held their annual developer conference and released info on the next generation of GPUs ... "Fermi"
- 3B transistors, 40nm
- 512 compute elements
- 8x increase in DP performance (~700GFLOPS)
- GDDR5 memory (230GB/sec)
- ECC memory
- L1 and L2 cache memory ("configurable"?)
Operational Model
CUDA assumes a heterogeneous architecture -- both CPUs and GPUs -- with separate memory pools
- CPUs are "masters" and GPUs are the "workers"
  - CPUs launch computations onto the GPU
  - CPUs can be used for other computations as well
  - GPUs have limited communication back to the CPU
- The CPU must initiate data transfers to the GPU memory
  - Synchronous Xfer -- the CPU waits for the xfer to complete
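The CPU-as-master model above can be sketched in host code: the CPU allocates GPU memory, pushes data across, and (with a synchronous copy) blocks until each transfer finishes. A minimal sketch with hypothetical variable names, using the standard `cudaMalloc`/`cudaMemcpy` calls:

```
// sync_xfer.cu -- CPU-initiated synchronous transfers (a minimal sketch)
#include <stdio.h>
#include <cuda_runtime.h>

#define N 1024

int main( void ) {
    float host_data[N], result[N];
    for( int i = 0; i < N; i++ ) host_data[i] = (float)i;

    /* the CPU allocates space in the GPU's separate memory pool */
    float* dev_data;
    cudaMalloc( (void**)&dev_data, N*sizeof(float) );

    /* synchronous xfer: the CPU waits here until the copy completes */
    cudaMemcpy( dev_data, host_data, N*sizeof(float),
                cudaMemcpyHostToDevice );

    /* ... CPU would launch GPU computations here ... */

    /* copy results back; this also blocks the CPU */
    cudaMemcpy( result, dev_data, N*sizeof(float),
                cudaMemcpyDeviceToHost );

    cudaFree( dev_data );
    return( 0 );
}
```

Note that the GPU never initiates any of these transfers; every allocation and copy is driven from the CPU side.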