Not satisfied with a landmark VMmark score that crossed the 30-tile mark for the first time, HP’s performance team went back to the benches two weeks later and took another swing at the performance crown. The effort paid off: HP significantly outpaced its two-week-old record with a score of 53.73@35 tiles in the heavyweight, 48-core category.
Using the same 8-processor HP ProLiant DL785 G6 platform as in the previous run – complete with 2.8GHz AMD Opteron 8439 SE 6-core chips and 256GB DDR2/667 – the new score comes with significant performance bumps in the javaserver, mailserver and database results. The system configuration is otherwise the same as in the previous attempt, including the same ESX 4.0 build (164009). So what changed to add 5 tiles to the team’s run? It would appear that someone was unsatisfied with the storage configuration on the mailserver run.
Given that the tile ratio of the previous run ran about 6% higher than its 24-core counterpart, there may have been a small indication that untapped capacity was available. According to the run notes, the only reported change to the test configuration – aside from the addition of the 5 LUNs and 5 clients needed to support the 5 additional tiles – was a notation indicating that the “data drive and backup drive for all mailserver VMs” were repartitioned using AutoPart v1.6.
The change in performance numbers effectively reduces the virtualization cost of the system by 15% to about $257/VM – closing in on its 24-core sibling to within $10/VM and stretching out its lead over “Dunnington” rivals to about $85/VM. While virtualization is not the primary application for 8P systems, this demonstrates that 48-core virtualization is definitely viable.
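For readers who want to check the per-VM arithmetic, here is a minimal sketch. It assumes the VMmark 1.x convention of six workload VMs per tile and a previous run of 30 tiles (35 minus the 5 added). The system price is not quoted in this post, so `implied_system_cost` is a hypothetical figure back-calculated from the $257/VM number, and the function names are our own.

```python
# Rough check of the cost-per-VM figures discussed above.
# Assumption: VMmark 1.x tiles contain six workload VMs each.
VMS_PER_TILE = 6


def vm_count(tiles: int) -> int:
    """Number of workload VMs running at a given tile count."""
    return tiles * VMS_PER_TILE


def cost_per_vm(system_cost: float, tiles: int) -> float:
    """System cost spread evenly across every VM in the run."""
    return system_cost / vm_count(tiles)


new_tiles = 35  # this run
old_tiles = 30  # previous run (35 tiles minus the 5 added)

# Hypothetical: back-calculate a system price from ~$257/VM at 35 tiles.
implied_system_cost = 257 * vm_count(new_tiles)

old_cost = cost_per_vm(implied_system_cost, old_tiles)
new_cost = cost_per_vm(implied_system_cost, new_tiles)
reduction = 1 - new_cost / old_cost

print(f"VMs at {new_tiles} tiles: {vm_count(new_tiles)}")
print(f"Implied system cost: ${implied_system_cost:,.0f}")
print(f"Cost per VM: ${old_cost:.2f} -> ${new_cost:.2f} ({reduction:.1%} lower)")
```

If the system price is held constant, the reduction depends only on the tile counts (1 - 30/35, or a little over 14%), which is in the neighborhood of the 15% quoted above. The remaining gap presumably comes from rounding or pricing details not reproduced here.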
SOLORI’s Take: HP’s performance team has done a great job tuning its flagship AMD platform, demonstrating that platform performance is not just a matter of clock speed or core count but requires balanced tuning all around. This improvement in system tuning demonstrates an 18% increase in incremental scalability – approaching within 3% of the 12-core to 24-core scaling factor – making the 48-core platform a viable consideration in the virtualization use case.
In recent discussions with AMD about the SR5690 chipset applications for Socket-F, AMD reiterated that the mainstream focus for SR5690 has been Magny-Cours and the Q1/2010 launch. Given the close relationship between Istanbul and Magny-Cours – detailed nicely by Charlie Demerjian at Semi-Accurate – the bar is clearly fixed for 2P and 4P virtualization systems designed around these chips. Extrapolating from the similarities and improvements to I/O and memory bandwidth, we expect to see 2P VMmarks besting 32@23 and 4P scores over 54@39 from HP, AMD and Magny-Cours.
SOLORI’s 2nd Take: Intel has been plugging away with its Nehalem-EX for 8-way systems and – delivering 128 threads – promises to deliver some insane VMmarks. Assuming Intel’s EX scales as efficiently as AMD’s new Opterons have, extrapolations indicate performance for the 4P, 64-thread Nehalem-EX should fall between 41@29 and 44@31 given the current crop of speed and performance bins. Using the same methods, our calculus predicts an 8P, 128-thread EX system should deliver scores between 64@45 and 74@52.
With EX expected to clock at 2.66GHz with 140W TDP and AMD’s MCM-based Magny-Cours doing well to hit 130W ACP in the same speed bins, CIOs balancing power and performance considerations will need to break out the spreadsheets to determine the winners here. With both systems running 4-channel DDR3, there will be no power or price advantage given on either side to memory differences: relative price-performance and power consumption of the CPUs will be major factors. Assuming our extrapolations are correct, we’re looking at a slight edge to AMD in performance-per-watt in the 2P segment, and a significant advantage in the 4P segment.
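As a starting point for that spreadsheet, here is a rough sketch of the 4P comparison using only the extrapolated scores and per-socket power figures quoted above. It is a CPU-only estimate under loud assumptions: ACP and TDP are different vendor metrics and are treated here as if they were comparable, the AMD score is the 54 floor from our projection, and `score_per_cpu_watt` is our own helper rather than any standard metric.

```python
# CPU-only performance-per-watt estimate for the projected 4P systems.
# Assumption: AMD's ACP and Intel's TDP are treated as comparable
# per-socket power figures, which they are not strictly.


def score_per_cpu_watt(score: float, sockets: int, watts_per_socket: float) -> float:
    """VMmark score divided by the combined rated CPU power."""
    return score / (sockets * watts_per_socket)


# Projected 4P Magny-Cours: score of 54 (floor) at 130W ACP per socket.
amd_4p = score_per_cpu_watt(54, sockets=4, watts_per_socket=130)

# Projected 4P Nehalem-EX: score of 41 to 44 at 140W TDP per socket.
intel_4p_low = score_per_cpu_watt(41, sockets=4, watts_per_socket=140)
intel_4p_high = score_per_cpu_watt(44, sockets=4, watts_per_socket=140)

# AMD's relative advantage against the best and worst EX projections.
advantage_vs_best_ex = amd_4p / intel_4p_high - 1
advantage_vs_worst_ex = amd_4p / intel_4p_low - 1

print(f"AMD 4P:   {amd_4p:.4f} score per CPU watt")
print(f"Intel 4P: {intel_4p_low:.4f} to {intel_4p_high:.4f} score per CPU watt")
print(f"AMD advantage: {advantage_vs_best_ex:.0%} to {advantage_vs_worst_ex:.0%}")
```

Under these assumptions the 4P gap works out to roughly 32% to 42% in AMD's favor. Whole-system power draw (memory, chipset, storage) would narrow or widen that gap, so treat the result as a directional estimate only.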