Posts Tagged ‘comparison’

Quick Take: Nehalem/Istanbul Comparison at AnandTech

October 7, 2009

Johan De Gelas and crew present an interesting comparison of Dunnington, Shanghai, Istanbul and Nehalem in a new post at AnandTech this week. In the test line-up are the “top bin” parts from Intel and AMD in 4-core and 6-core incarnations:

  • Intel “Nehalem-EP” Xeon, X5570, 2.93GHz, 4-core, 8-thread
  • Intel “Dunnington” Xeon, X7460, 2.66GHz, 6-core, 6-thread
  • AMD “Shanghai” Opteron 2389/8389, 2.9GHz, 4-core, 4-thread
  • AMD “Istanbul” Opteron 2435/8435, 2.6GHz, 6-core, 6-thread

Most important for virtualization systems architects is how vCPU scheduling affects “measured” performance. The telling piece comes from the difference in results when vCPU scheduling is equalized:

AnandTech's Quad Sockets v. Dual Sockets Comparison. Oct 6, 2009.

When comparing the results, De Gelas hits on the I/O factor which chiefly separates VMmark from vAPUS:

The result is that VMmark with its huge number of VMs per server (up to 102 VMs!) places a lot of stress on the I/O systems. The reason for the Intel Xeon X5570’s crushing VMmark results cannot be explained by the processor architecture alone. One possible explanation may be that the VMDq (multiple queues and offloading of the virtual switch to the hardware) implementation of the Intel NICs is better than the Broadcom NICs that are typically found in the AMD based servers.

Johan De Gelas, AnandTech, Oct 2009

This is yet another issue that VMware architects struggle with in complex deployments. The latency in “Dunnington” is a huge contributor to its downfall and a key reason the Penryn-based architecture was a dead end. Combined with 8 additional threads in the 2P form factor, Nehalem delivers twice as many hardware execution contexts as Shanghai, resulting in significant efficiencies for Nehalem where small working data sets are involved.

When larger sets are used – as in vAPUS – Istanbul's additional cores allow it to close the gap to within the clock speed difference of Nehalem (about 12%). In contrast to VMmark, which implies a 3:2 advantage for Nehalem, the vAPUS results suggest a much narrower performance gap in more aggressive virtualization use cases.
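The arithmetic behind those two claims is easy to verify. A quick back-of-the-envelope sketch (our own figures derived from the published clock speeds and core counts, not AnandTech's data):

```python
# Back-of-the-envelope check of the figures above.
sockets = 2

# 2P Nehalem-EP X5570: 4 cores/socket, 2 threads/core (Hyper-Threading)
nehalem_contexts = sockets * 4 * 2   # 16 hardware execution contexts

# 2P Shanghai Opteron 2389: 4 cores/socket, 1 thread/core
shanghai_contexts = sockets * 4 * 1  # 8 hardware execution contexts

print(nehalem_contexts / shanghai_contexts)      # 2.0, "twice as many contexts"

# Clock-speed gap between the X5570 and Istanbul 2435
x5570_ghz, opteron_2435_ghz = 2.93, 2.6
print((x5570_ghz / opteron_2435_ghz - 1) * 100)  # ~12.7%, "about 12%"
```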

SOLORI’s Take: We differ with De Gelas on the reduction of vAPUS’s data set to accommodate the “cheaper” memory build of the Nehalem system. While this offers some advantages in testing, it also diminishes one of Opteron’s greatest strengths: access to cheap and abundant memory. Here we have the testing conundrum: fit the test around the competitors, or fit the competitors around the test. The former approach biases toward the “pure performance” aspect of the competitors, while the latter is more typical of use-case testing.

We do not construe this issue as intentional bias on AnandTech’s part; however, it is another vector to consider in the evaluation of the results. De Gelas delivers a report worth reading in its entirety, and we view it as a primer on the issues that will define the first half of 2010.

AMD’s New Opteron

April 23, 2009

AMD’s announcement yesterday came with some interesting technical tidbits about its new server platform strategy that will affect its competitiveness in the virtualization marketplace. I want to take a look at the two new server platforms, contrast them with what is available today, and see what that means for our AMD-based eco-systems in the months to come.

Initially, the introduction of more cores to the mix is good for virtualization, allowing us to scale more gracefully and confidently than hyper-threading does. While hyper-threading is reported to increase scheduling efficiency in vSphere, it is not effectively a core. Until Nehalem-EX is widely available and we can evaluate 4P hyper-threading performance in loaded virtual environments, I’m comfortable awarding hyper-threading a 5% performance bonus – all things being equal.
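One way to read that rule of thumb is to count a hyper-threaded core as 1.05 cores for sizing purposes rather than as 2. A minimal sketch of that reading (the host configurations and the flat 5% weighting are our assumptions, not vSphere scheduler math):

```python
# Sketch of the "5% bonus" rule of thumb: a hyper-threaded core counts
# as 1.05 "real core" equivalents when sizing, not as 2 cores.

def effective_cores(cores: int, hyperthreaded: bool, ht_bonus: float = 0.05) -> float:
    """Scheduling capacity in 'real core' equivalents."""
    return cores * (1 + ht_bonus) if hyperthreaded else float(cores)

# 2P Nehalem-EP (8 cores with HT) vs. 2P Istanbul (12 cores, no HT)
print(effective_cores(8, hyperthreaded=True))    # 8.4
print(effective_cores(12, hyperthreaded=False))  # 12.0
```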

AMD's Value Shift

What’s Coming?

That said, where is AMD going with Opteron in the near future and how will that affect Opteron-based eco-systems? At least one thing is clear: compatibility is assured and performance – at the same thermal footprint – will go up. So let’s look at the ramifications of the new models/sockets and compare them to our well-known 2000/8000 series to glimpse the future.

A fundamental shift away from DDR2 and towards DDR3 for the new sockets is a major difference. Like the Phenom II, Core i7 and Nehalem processors, the new Opteron will be a DDR3 specimen. Assuming DDR3 pricing continues to trend down and the promised memory bandwidth increase is realized with HT3/DCA2 on the new Opteron, DDR3 will deliver solid performance in 4000 and 6000 configurations.

Opteron 6000: Socket G34

From the announcement, G34 is analogous to the familiar 8000-series line with one glaring exception: no 8P on the road-map. In the 2010-2011 time frame, we’ll see 8-core, 12-core and 16-core variants, with a new platform being introduced in 2012. Meanwhile, the 6000-series will support four channels of “unbuffered” or “registered” DDR3 across up to 12 DIMMs per socket (3 banks by 4 channels). Assuming the 6000 supports DDR3-1600, a 4-channel design would yield theoretical memory bandwidth in the 40-50GB/sec range per socket (about twice Istanbul’s).
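That range follows directly from the channel math: DDR3 moves 8 bytes per transfer per channel, so the data rate times 32 bytes gives the per-socket peak. A quick sketch (the DDR3 speed grades are our assumption):

```python
# Theoretical peak memory bandwidth for the rumored G34 memory subsystem.
channels = 4
bytes_per_transfer = 8  # 64-bit DDR3 channel

for mt_per_sec in (1333, 1600):  # DDR3-1333 and DDR3-1600
    gb_per_sec = mt_per_sec * 1e6 * bytes_per_transfer * channels / 1e9
    print(f"DDR3-{mt_per_sec}: {gb_per_sec:.1f} GB/sec per socket")

# DDR3-1333: 42.7 GB/sec, DDR3-1600: 51.2 GB/sec -- the "40-50GB/sec range"
```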

AMD 2010-2013 Road-map

With a maximum module density of 16GB, a 12-DIMM by 4-socket system could theoretically contain 768GB of DDR3 memory. In 2011, that equates to 12GB/core in a 4-way, 64-core server. At 4:1 consolidation ratios for typical workloads, that’s 256 VM/host at 3GB/VM (4GB/VM with page sharing) and an average of 780MB/sec of memory bandwidth per VM. I think the math holds up pretty well against today’s computing norms and trends.
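Walking through that math explicitly (a sketch using the projections above; the 50GB/sec per-socket figure is the top of our estimated range, not a measured number):

```python
# The capacity math above, step by step.
dimm_gb, dimms_per_socket, sockets = 16, 12, 4
cores_per_socket, vms_per_core = 16, 4           # 2011-era 16-core G34 part, 4:1 ratio
socket_bw_gb = 50                                # top of the 40-50GB/sec range

total_gb = dimm_gb * dimms_per_socket * sockets  # 768 GB
total_cores = cores_per_socket * sockets         # 64 cores
print(total_gb / total_cores)                    # 12.0 GB/core

vms = total_cores * vms_per_core                 # 256 VMs per host
print(total_gb / vms)                            # 3.0 GB/VM
print(socket_bw_gb * sockets / vms * 1000)       # ~781 MB/sec of bandwidth per VM
```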