
CUDA is cool

10 September 2008 by Thad Scheer

SPHERE OF INFLUENCE, INC. software studios and services

I must admit, high-performance computing (HPC) on commodity hardware is one of the coolest shows in town right now. Hopefully I can get our lead investigator, Dwight Corely, to blog about this.  In the meantime, I’ll offer this summary:

We all know that the silicon on most high-end 3D graphics cards is, in effect, a massively parallel computer. For years researchers have used graphics APIs such as OpenGL and Direct3D to tap this raw compute power for general-purpose number crunching. That approach provides access to insanely fast memory buses and dozens (if not hundreds) of parallel arithmetic and logic cores that yield gigaflops of juice. The problem is that graphics APIs are frightening to learn and use: you have to disguise your data as textures and your algorithm as pixel shaders.

Everything has changed!

There are several players in the commodity high-performance computing space now, most notably NVIDIA, ATI/AMD, and IBM. However, the most exciting work is happening on NVIDIA silicon.

There are two major developments you should be aware of:

1)    It’s not just about graphics cards anymore; the hardware itself is finally dedicated to HPC. NVIDIA, for example, has a product line called Tesla, which pairs a high-performance memory bus with a gigantic array of parallel ALU cores. A Tesla board has no display outputs and doesn’t drive a monitor; it exists purely to deliver massive compute power.

2)    API accessibility has arrived! NVIDIA developed the visionary CUDA platform: a small set of extensions to the C language, plus runtime libraries, for accessing HPC horsepower without going through a graphics API. Because CUDA code is called from ordinary C, it opens the door for easy integration into C++, Java, and .NET applications, as well as plug-ins for MATLAB and other software.

These two refinements have opened so many doors that it’s safe to say commodity high-performance supercomputing is truly available to the masses.
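To give a feel for what “API accessibility” means in practice, here is a minimal sketch of the canonical first CUDA program, element-wise vector addition, written against the CUDA runtime API. The kernel syntax (`__global__`, `<<<blocks, threads>>>`) and the `cudaMalloc`/`cudaMemcpy`/`cudaFree` calls are standard CUDA; the host wrapper name `vecAddOnGpu` and its 0/-1 return convention are hypothetical, chosen only for illustration.

```cuda
#include <cuda_runtime.h>

// Kernel: each GPU thread computes exactly one element, c[i] = a[i] + b[i].
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // the last block may be only partly used
        c[i] = a[i] + b[i];
}

// Hypothetical host wrapper: copy inputs to the GPU, launch, copy the result back.
// Returns 0 on success and -1 if any CUDA call fails.
int vecAddOnGpu(const float *a, const float *b, float *c, int n)
{
    if (n <= 0)
        return 0;  // nothing to do; a zero-block launch would be an error

    size_t bytes = (size_t)n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;

    // && short-circuits, so we stop at the first failing call.
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess
           && cudaMalloc(&dB, bytes) == cudaSuccess
           && cudaMalloc(&dC, bytes) == cudaSuccess
           && cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threadsPerBlock = 256;
        const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
        vecAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);
        // The blocking device-to-host copy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dA);  // freeing a null pointer is a harmless no-op
    cudaFree(dB);
    cudaFree(dC);
    return ok ? 0 : -1;
}
```

A caller fills ordinary host arrays and calls `vecAddOnGpu(a, b, c, n)`; the calling code never needs to know a GPU is involved, which is part of why wrapping CUDA for other languages and plug-ins is practical.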

Anyone with experience doing algorithm work knows that shaving a few µs off a doubly nested loop can have a dramatic effect on system performance. With CUDA we see performance improvements that make that kind of optimization seem irrelevant: not just 5x or 10x better overall performance, but 50x or 100x when compared to an Intel Core 2 Quad. The latest generation of silicon can crank out almost 1 TFLOPS (single-precision peak) of number-crunching ecstasy. How well that power gets utilized is a function of two things: 1) how you parallelize the solution; 2) how the data sets align to bus and memory boundaries (for example, whether neighboring threads read neighboring memory addresses). There’s still a lot of engineering required, but the basic software abstractions and development tools are there.
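The two factors above can be illustrated with a minimal sketch that turns a doubly nested CPU loop into a CUDA kernel, assuming row-major `float` matrices; the operation (`y = alpha * x + y`) and the function names `scaleAddCpu`, `scaleAddKernel`, and `scaleAddGpu` are hypothetical. Parallelizing means the loops disappear: each thread computes one `(row, column)` element. Memory alignment shows up in the thread-to-data mapping: `threadIdx.x` indexes the column, so consecutive threads in a warp access consecutive addresses and the hardware can coalesce them into a few wide memory transactions.

```cuda
#include <cuda_runtime.h>

// CPU reference: the classic doubly nested loop over a row-major matrix.
void scaleAddCpu(const float *x, float *y, float alpha, int rows, int cols)
{
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            y[r * cols + c] = alpha * x[r * cols + c] + y[r * cols + c];
}

// GPU version: no loops; each thread handles one (r, c) element.
// threadIdx.x maps to the column, so threads in a warp touch consecutive
// addresses within one row and their loads/stores can be coalesced.
__global__ void scaleAddKernel(const float *x, float *y, float alpha, int rows, int cols)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    if (r < rows && c < cols)  // edge blocks may extend past the matrix
        y[r * cols + c] = alpha * x[r * cols + c] + y[r * cols + c];
}

// Hypothetical host wrapper. Returns 0 on success, -1 on any CUDA error.
int scaleAddGpu(const float *x, float *y, float alpha, int rows, int cols)
{
    if (rows <= 0 || cols <= 0)
        return 0;

    size_t bytes = (size_t)rows * cols * sizeof(float);
    float *dX = nullptr, *dY = nullptr;
    bool ok = cudaMalloc(&dX, bytes) == cudaSuccess
           && cudaMalloc(&dY, bytes) == cudaSuccess
           && cudaMemcpy(dX, x, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(dY, y, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        dim3 block(32, 8);  // 32 threads along x: one warp spans 32 adjacent columns
        dim3 grid((cols + block.x - 1) / block.x,   // round up in both dimensions
                  (rows + block.y - 1) / block.y);
        scaleAddKernel<<<grid, block>>>(dX, dY, alpha, rows, cols);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(y, dY, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(dX);
    cudaFree(dY);
    return ok ? 0 : -1;
}
```

Swapping the mapping (so that `threadIdx.x` walks down a column) computes the same result but makes each warp touch addresses `cols` floats apart, which typically wastes much of the available memory bandwidth. For matrices whose row width is not a convenient multiple, the runtime’s `cudaMallocPitch` can pad each row to an aligned pitch; that is left out here to keep the sketch short.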

You can use CUDA on an ordinary CUDA-capable NVIDIA graphics card or on a dedicated Tesla device. Tesla devices range from PCIe cards to rack-mount servers. It’s very evolved!
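Because GeForce cards and Tesla boards present the same CUDA interface, a program can simply ask the runtime what hardware is installed. Below is a minimal sketch using `cudaGetDeviceCount` and `cudaGetDeviceProperties` from the CUDA runtime API; the function name `listCudaDevices` is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints every CUDA-capable device the runtime can see
// (a graphics card and a Tesla board are reported the same way) and returns
// how many there are. Returns 0 if there is no driver or no device.
int listCudaDevices(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return 0;  // no CUDA driver or no CUDA-capable device

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;  // skip a device we cannot query
        printf("Device %d: %s\n", dev, prop.name);
        printf("  compute capability : %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors    : %d\n", prop.multiProcessorCount);
        printf("  global memory      : %zu MB\n", prop.totalGlobalMem / (1024 * 1024));
        printf("  max threads/block  : %d\n", prop.maxThreadsPerBlock);
    }
    return count;
}
```

The same kernels run on whichever device is selected (device 0 by default, or another one chosen with `cudaSetDevice`); what changes from one card to another is mainly the number of multiprocessors and the amount of memory, and therefore the throughput.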

CUDA/Tesla makes it practical to consider higher-quality algorithms that might have been discarded because they ran too slowly for production. It also enables aspects of test-driven development that were previously out of reach for developers. For example, on one project the system acceptance tests require computationally massive algorithms just to verify regression results. This means our continuous integration environment has to run them in overnight batches on big-iron hardware rather than on each check-in. With CUDA and Tesla, however, we can rack supercomputer power cheaply and let software developers run full system regression tests while they debug their code, even before check-in. Once implemented, this will be a major fail-fast improvement in our Agile projects and will let us incorporate system regression tests into continuous integration.

I apologize if this post reads like an NVIDIA pitch; we don’t have any partnership with NVIDIA…we just like their stuff…a lot!!!