In this talk, I will cover three topics: 1) the NVIDIA Tesla Architecture, 2) the CUDA programming language, and 3) recent work on N-Body simulation, concluding with examples of similarly accelerated computing for broader audiences. The Tesla Architecture supports both graphics and non-graphics computation, using an array of custom processors on a single chip. Its programming model is neither SIMD nor MIMD but somewhere in between, which lets us exploit the advantages of each. The current high-end part has 240 processors running at 1.3 to 1.6 GHz; with dual-issue capability, its peak performance exceeds 1 TFLOP. CUDA is the C programming language with a few extensions for programming Tesla chips, including thread launch and termination, synchronization, data sharing, and atomic operations.
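The peak figure follows from simple arithmetic: processors × clock × floating-point operations per clock. A minimal sketch, assuming (this is not stated in the talk) that "dual issue" means each processor retires one multiply-add (2 flops) plus one separate multiply (1 flop) per clock; `peak_gflops` is a hypothetical helper.

```c
/* Hypothetical helper: peak single-precision rate in GFLOPS.
 * ASSUMPTION: flops_per_clock = 3 models a dual-issued
 * multiply-add (2 flops) plus multiply (1 flop) each clock. */
static double peak_gflops(int processors, double clock_ghz, int flops_per_clock)
{
    /* GHz = 1e9 clocks/s, so the product is already in GFLOPS */
    return processors * clock_ghz * flops_per_clock;
}
```

Under that assumption, 240 processors at 1.6 GHz give about 1152 GFLOPS and at 1.3 GHz about 936 GFLOPS. Without dual issue (2 flops/clock), the 1.6 GHz figure drops to 768 GFLOPS, which shows why dual issue matters for reaching 1 TFLOP.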
In a collaborative effort with Jan Prins (UNC CS) and Mark Harris (NVIDIA), we have written an N-Body simulator in CUDA that runs on Tesla hardware. It sustains over 400 GFLOPS, simulating 30k interacting bodies at nearly 30 steps/second. This is substantially faster than any conventional CPU, because the core of the computation relies on 1/sqrt(x), which the GPU implements as an optimized instruction since graphics (and physics) requires it for normalizing vectors. I'll conclude with thoughts about the availability of accelerated computing.
Lars Nyland is a senior architect in the compute group at NVIDIA, where he designs, develops, and tests architectural features to support non-traditional uses of graphics processors. Prior to joining NVIDIA, Lars was an associate professor of computer science at the Colorado School of Mines in Golden, Colorado. He ran the Thunder Graphics Lab, where demanding computational applications were coupled with immersive, 3D graphics. Before that, he was on the research faculty at UNC Chapel Hill, where he was a member of the high-performance computing and image-based rendering groups. Notable achievements include the development of the DeltaSphere scene digitizer and its use at Monticello to provide an immersive experience for visitors to the New Orleans Museum of Art's Jefferson and Napoleon exhibit. He also spent considerable time studying N-Body algorithms, parallelizing N-Body algorithms for Molecular Dynamics, and researching parallel programming languages. Lars earned his PhD at Duke University in 1991 under the direction of John Reif, exploring high-level parallel programming languages.