Nvidia Fermi Tesla GPU Insights

Nvidia ain’t holding back in the graphics race. Last week, AMD knocked Nvidia into second place by introducing the ATI Radeon HD 5970, billed as the “World’s Fastest Graphics Card”. Nvidia didn’t show anything on the typical consumer side, but for supercomputers (HPC) it gave AMD a big hit.

At this year’s SC09 supercomputing trade show in Portland, Oregon, Nvidia unveiled its plans for its next-generation chip, ‘Fermi’. Overall, GPU co-processors aimed at personal supercomputers and massive clusters were the stars of the show. This is something much more powerful than AMD’s Fusion.

The video card versions of the Fermi chips will be called the GeForce 300 M line. Nvidia is keeping the Tesla brand for its next-generation GPU co-processors for workstations and servers. The Fermi chips will be sold under the Tesla 20 brand.

Last month, El Reg (The Register) described the architecture:

The SM design contains 32 basic CUDA cores (four times as many as found in previous generations of SM), each comprising one integer and one floating-point maths unit. It is able to schedule two groups of 32 threads (a group Nvidia calls a “warp”) at once. The networked cores connect to 64KB of shared L1 cache, also used by four Special Function Units (SFUs) which handle complex maths formulae such as sines and cosines.
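To make the warp arithmetic concrete, here is a minimal Python sketch (not Nvidia code) of how a block’s threads fall into 32-thread warps. It assumes a one-dimensional thread block, so a thread’s linear index is simply its `threadIdx.x`; the helper names `warp_and_lane` and `warps_per_block` are hypothetical.

```python
WARP_SIZE = 32  # threads per warp, as described above


def warp_and_lane(thread_index: int) -> tuple[int, int]:
    """Return (warp number, lane within the warp) for a linear thread index."""
    return divmod(thread_index, WARP_SIZE)


def warps_per_block(threads_per_block: int) -> int:
    """Warps needed for a block; a partly filled warp still counts as one."""
    return -(-threads_per_block // WARP_SIZE)  # ceiling division


if __name__ == "__main__":
    for tid in (0, 31, 32, 100):
        warp, lane = warp_and_lane(tid)
        print(f"thread {tid:3d} -> warp {warp}, lane {lane}")
    print("warps in a 256-thread block:", warps_per_block(256))
    print("warps in a 100-thread block:", warps_per_block(100))
```

A 100-thread block still occupies four warps, with the last one only partly filled (lanes 0 to 3), which is why CUDA block sizes are usually chosen as multiples of 32.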

Figure: One of the 32-core Streaming Multiprocessors

Flavors and Specs:

According to Andy Keane, general manager of Tesla supercomputing at Nvidia, the Tesla 20 cards will come in two flavors. Nvidia will sell co-processor systems that plug right into HPC clusters and link to servers through PCI-Express 2.0 links, at around 130 watts. Keane bristles at anyone who claims that the fully burdened heat budget of a server (not just the microprocessor, but also its memory controller if it is not integrated, its chipset, and its memory) will be any lower.

With the Fermi family of GPUs, Nvidia is adding L1 and L2 caches to the co-processors and is putting ECC memory scrubbing on internal GDDR5 video memory on the card as well as accesses to external server memory. This ECC support, as it turns out, is as important as anything else in the chip if you want to sell GPUs to nuke labs.

The Fermi chip has 512 cores, which is more than twice the cores of the first Tesla GPUs. The Fermis bundle 32 cores together into a streaming multiprocessor that has 64 KB of shared L1 cache. All 512 cores have access to a shared 768 KB L2 cache, and they support the IEEE 754-2008 double precision floating point standard.
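The per-SM and whole-chip numbers fit together with a little arithmetic. This Python sketch uses only the figures quoted above (512 cores, 32 cores and 64 KB of shared L1 per SM, 768 KB of shared L2); the function name `fermi_layout` is illustrative.

```python
# Figures quoted in the article for the full Fermi chip.
TOTAL_CORES = 512
CORES_PER_SM = 32
L1_PER_SM_KB = 64   # shared L1 per streaming multiprocessor
L2_SHARED_KB = 768  # L2 shared by all SMs


def fermi_layout(total_cores=TOTAL_CORES, cores_per_sm=CORES_PER_SM,
                 l1_per_sm_kb=L1_PER_SM_KB, l2_kb=L2_SHARED_KB):
    """Derive the SM count and aggregate on-chip cache sizes from per-SM figures."""
    if total_cores % cores_per_sm:
        raise ValueError("cores must divide evenly into SMs")
    sm_count = total_cores // cores_per_sm
    total_l1_kb = sm_count * l1_per_sm_kb
    return {
        "sm_count": sm_count,
        "total_l1_kb": total_l1_kb,
        "l1_plus_l2_kb": total_l1_kb + l2_kb,
    }


if __name__ == "__main__":
    print(fermi_layout())
```

It works out to 16 streaming multiprocessors and 1,024 KB of L1 spread across the chip, on top of the 768 KB L2.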

In theory, the Fermi chip can address up to 1 TB of memory, but the Tesla C2050 GPU co-processor ships with 3 GB of GDDR5 memory, delivers 520 gigaflops of double-precision floating point performance, and costs $2,499. The Tesla C2070 has 6 GB of GDDR5 memory, is rated at 630 gigaflops, and costs $3,999. The smaller unit offers the best bang for the buck at $4.81 per gigaflops, compared with $6.35 per gigaflops for the faster GPU.
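The price-per-gigaflops figures are simply list price divided by the double-precision rating. A quick Python check using the prices and ratings quoted above (the helper name is illustrative):

```python
# List prices and double-precision ratings quoted above.
CARDS = {
    "Tesla C2050": {"price_usd": 2499, "dp_gflops": 520},
    "Tesla C2070": {"price_usd": 3999, "dp_gflops": 630},
}


def dollars_per_gflops(price_usd: float, gflops: float) -> float:
    """Cost of one gigaflops of peak double-precision throughput."""
    return price_usd / gflops


if __name__ == "__main__":
    for name, spec in CARDS.items():
        cost = dollars_per_gflops(spec["price_usd"], spec["dp_gflops"])
        print(f"{name}: ${cost:.2f} per gigaflops")
```

This prints $4.81 for the C2050 and $6.35 for the C2070, matching the figures above.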

At the high end, a 1U appliance with four of the faster C2070 GPUs delivers 2.52 teraflops of double-precision floating point performance and costs $18,995, or $7.54 per gigaflops.
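The appliance’s 2.52 teraflops is four times the C2070’s 630-gigaflops peak. This sketch reproduces that and the cost per gigaflops; note that it adds up peak ratings (an assumption of perfect scaling), and sustained performance on a real workload will differ. The names are illustrative.

```python
CARD_DP_GFLOPS = 630        # one Tesla C2070, as quoted above
CARDS_PER_APPLIANCE = 4
APPLIANCE_PRICE_USD = 18995


def appliance_summary(cards=CARDS_PER_APPLIANCE,
                      gflops_per_card=CARD_DP_GFLOPS,
                      price_usd=APPLIANCE_PRICE_USD):
    """Return (aggregate peak DP teraflops, dollars per gigaflops)."""
    total_gflops = cards * gflops_per_card  # assumes peak ratings simply add
    return total_gflops / 1000, price_usd / total_gflops


if __name__ == "__main__":
    tflops, cost = appliance_summary()
    print(f"{tflops:.2f} teraflops, ${cost:.2f} per gigaflops")
```

The per-gigaflops cost ends up higher than either standalone card because the appliance price covers the enclosure as well as the four GPUs.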

GPU and Programming

There are two secret weapons in the Fermi GPUs that are going to get HPC freaks thinking about using GPUs.

First, they will now support C++ through Nvidia’s compiler, not just C.

Second, InfiniBand chip maker Mellanox and Nvidia have worked together on a set of new InfiniBand and Tesla drivers that streamline the movement of data from the InfiniBand ports to the CPU’s main memory, and then down through the PCI-Express bus to the GPU card.

The way it works now: data comes in over InfiniBand, works its way into main memory and is copied; before it is moved down to the GPU, it is copied again, and that copy is what gets moved. The driver changes allow the data that lands in main memory to be moved down to the GPU in one fell swoop. The improvement? They have been able to demonstrate a 30 percent speedup.
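A toy Python model can make the difference in the copy path concrete. It assumes, as a simplification, that the old path adds exactly one host-side copy between the buffer the InfiniBand driver fills and the buffer the GPU driver sends; the stage names and functions are hypothetical, no real InfiniBand or CUDA driver API is involved, and it counts bytes moved rather than timing, so it says nothing about the 30 percent figure.

```python
# Toy model of the host-side data path (hypothetical stages, no real driver APIs).


def legacy_path(nbytes: int) -> list[tuple[str, int]]:
    """Before the driver change: the received buffer is copied on the host,
    and that copy is what is sent down to the GPU."""
    return [
        ("InfiniBand -> main memory (network buffer)", nbytes),
        ("network buffer -> separate GPU staging buffer", nbytes),
        ("GPU staging buffer -> GPU over PCI-Express", nbytes),
    ]


def shared_buffer_path(nbytes: int) -> list[tuple[str, int]]:
    """After the driver change: the buffer the data landed in goes straight to the GPU."""
    return [
        ("InfiniBand -> main memory (shared buffer)", nbytes),
        ("shared buffer -> GPU over PCI-Express", nbytes),
    ]


def bytes_moved(path: list[tuple[str, int]]) -> int:
    """Total bytes pushed through all stages of a path."""
    return sum(size for _, size in path)


if __name__ == "__main__":
    n = 64 * 1024 * 1024  # a 64 MiB message
    for name, path in (("legacy", legacy_path(n)), ("shared", shared_buffer_path(n))):
        print(f"{name}: {len(path)} transfers, {bytes_moved(path) // 2**20} MiB moved")
```

Under these assumptions the shared-buffer path drops one full pass over the data in host memory (192 MiB down to 128 MiB for a 64 MiB message), which is where the savings come from.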

Finally, Nvidia this week released the beta of the CUDA Toolkit 3.0, which exploits the Fermi GPU’s features.

Availability

The Tesla 20 GPU co-processors and the appliances based on them will be available in the second quarter of 2010. GeForce graphics cards based on the same GPU chips are expected around the first quarter.
