November 3, 2009
Yesterday Jeff Bonwick (Sun) announced that deduplication is now officially part of ZFS – Sun’s Zettabyte File System that is at the heart of Sun’s Unified Storage platform and NexentaStor. In his post, Jeff touched on the major issues surrounding deduplication in ZFS:
Deduplication in ZFS is Block-level
ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system. Block-level dedup also maps naturally to ZFS’s 256-bit block checksums, which provide unique block signatures for all blocks in a storage pool as long as the checksum function is cryptographically strong (e.g. SHA256).
Deduplication in ZFS is Synchronous
ZFS assumes a highly multithreaded operating system (Solaris) and a hardware environment in which CPU cycles (GHz times cores times sockets) are proliferating much faster than I/O. This has been the general trend for the last twenty years, and the underlying physics suggests that it will continue.
Deduplication in ZFS is Per-Dataset
Like all zfs properties, the ‘dedup’ property follows the usual rules for ZFS dataset property inheritance. Thus, even though deduplication has pool-wide scope, you can opt in or opt out on a per-dataset basis. Most storage environments contain a mix of data that is mostly unique and data that is mostly replicated. ZFS deduplication is per-dataset, which means you can selectively enable dedup only where it is likely to help.
Deduplication in ZFS is based on a SHA256 Hash
Chunks of data — files, blocks, or byte ranges — are checksummed using some hash function that uniquely identifies data with very high probability. When using a secure hash like SHA256, the probability of a hash collision is about 2^-256 = 10^-77. For reference, this is 50 orders of magnitude less likely than an undetected, uncorrected ECC memory error on the most reliable hardware you can buy.
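The magnitude quoted above is easy to sanity-check. The short sketch below only evaluates 2^-256 in floating point and reports its base-10 exponent; it illustrates the scale of the figure and is not part of ZFS.

```python
import math

# Collision probability figure quoted for a 256-bit hash: 2^-256
p = 2.0 ** -256

# The base-10 logarithm gives the order of magnitude
exponent = math.log10(p)

print(f"2^-256 is roughly {p:.3e}")
print(f"log10(2^-256) is roughly {exponent:.2f}")
```

The exponent lands just below -77, which is where the "about 10^-77" shorthand comes from.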
Deduplication in ZFS can be Verified
[If you are paranoid about potential "hash collisions"] ZFS provides a ‘verify’ option that performs a full comparison of every incoming block with any alleged duplicate to ensure that they really are the same, and ZFS resolves the conflict if not.
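To make the block-level, hash-keyed idea concrete, here is a minimal sketch of a deduplicating block store in Python. It is a toy model, not ZFS code: the `DedupStore` class, the `split_blocks` helper and the 4096-byte block size are all hypothetical names and values chosen for illustration, and the way a collision is handled (raising an error) is an assumption, since the post does not describe how ZFS resolves the conflict.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one physical copy per unique SHA-256 digest."""

    def __init__(self, verify=False):
        self.verify = verify
        self.blocks = {}    # digest -> block bytes (stands in for a dedup table)
        self.refcount = {}  # digest -> number of logical references

    def write(self, block: bytes) -> bytes:
        digest = hashlib.sha256(block).digest()
        existing = self.blocks.get(digest)
        if existing is None:
            # First time this content is seen: store it once
            self.blocks[digest] = block
            self.refcount[digest] = 1
        else:
            if self.verify and existing != block:
                # Assumption: a real system would resolve this; the toy just fails
                raise ValueError("hash collision detected during verify")
            # Duplicate content: only bump the reference count
            self.refcount[digest] += 1
        return digest


def split_blocks(data: bytes, block_size: int = 4096):
    """Yield fixed-size blocks (hypothetical block size, for illustration only)."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


store = DedupStore(verify=True)
data = b"A" * 4096 * 3 + b"B" * 4096  # three identical blocks plus one unique block
for blk in split_blocks(data):
    store.write(blk)

logical_blocks = sum(store.refcount.values())
physical_blocks = len(store.blocks)
print(f"logical blocks: {logical_blocks}, physical blocks: {physical_blocks}")
```

With `verify=True` the store pays for a full byte comparison on every duplicate hit, which mirrors the trade-off described above: extra work per write in exchange for not relying on the hash alone.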
Deduplication in ZFS is Scalable
ZFS places no restrictions on your ability to dedup. You can dedup a petabyte if you’re so inclined. The performance of ZFS dedup will follow the obvious trajectory: it will be fastest when the DDTs (dedup tables) fit in memory, a little slower when they spill over into the L2ARC, and much slower when they have to be read from disk – but the point I want to emphasize here is that there are no limits in ZFS dedup. ZFS dedup scales to any capacity on any platform, even a laptop; it just goes faster as you give it more hardware.
Jeff Bonwick’s Blog, November 2, 2009
What does this mean for ZFS users? That depends on the application, but highly duplicated environments like virtualization stand to gain significant storage-related value from this small addition to ZFS. Considering the various ways virtualization administrators deal with virtual machine cloning, even the basic VMware template approach (not using linked-clones) will now result in significant storage savings. This restores parity between storage and compute in the virtualization stack.
What does it mean for ZFS-based storage vendors? More main memory and processor threads will be necessary to limit the impact on performance. With 6-core and 8-thread CPUs available in the mainstream, this problem is very easily resolved. Just as the L2ARC tables consume main memory, the DDTs will require an increase in main memory for larger datasets. Testing and configuration convergence will likely take 2-3 months once dedupe is mainstream.
When can we expect to see dedupe added to ZFS (i.e. OpenSolaris)? According to Jeff, “in roughly a month.”
Updated: 11/04/2009 – Link to Nexenta corrected. Was incorrectly linked to “nexent.com” – typo – now correctly linked to “http://www.nexenta.com”
October 9, 2009
This week Red Hat and Microsoft announced support for certain of each other’s operating systems as guests in their respective hypervisors: Kernel-based Virtual Machine (KVM) and Hyper-V. This comes on the heels of Red Hat’s Enterprise Linux 5.4 announcement last month.
KVM is Red Hat’s new hypervisor that leverages the Linux kernel to accelerate support for hardware and capabilities. It was Red Hat and AMD that first demonstrated live migration between AMD and Intel-based hypervisors using KVM late last year – then somewhat of a “Holy Grail” of hypervisor feats. With nearly a year of improvements and integration into their Red Hat Enterprise Linux and Fedora “free and open source” offerings, Red Hat is almost ready to strike out in a commercially viable way.
Microsoft now officially supports the following Red Hat guest operating systems in Hyper-V:
Red Hat Enterprise Linux 5.2, 5.3 and 5.4
Red Hat likewise officially supports the following Microsoft guest operating systems in KVM:
Windows Server 2003, 2008 and 2008 R2
The goal of the announcement and associated agreements between Red Hat and Microsoft was to enable a fully supported virtualization infrastructure for enterprises with Red Hat and Microsoft assets. As such, Microsoft and Red Hat are committed to supporting their respective products whether the hypervisor environment is all Red Hat, all Hyper-V or totally heterogeneous – mixing Red Hat KVM and Microsoft Hyper-V as necessary.
“With this announcement, Red Hat and Microsoft are ensuring their customers can resolve any issues related to Microsoft Windows on Red Hat Enterprise Virtualization, and Red Hat Enterprise Linux operating on Microsoft Hyper-V, regardless of whether the problem is related to the operating system or the virtualization implementation.”
- Red Hat press release, October 7, 2009
Many in the industry cite Red Hat’s adoption of KVM as a step backwards [from Xen] requiring the re-development of a significant amount of support code. However, Red Hat’s use of libvirt as a common management API has allowed the change to happen much more rapidly than critics’ assumptions had allowed. At Red Hat Summit 2009, key Red Hat officials were keen to point out just how tasty their “dog food” is:
Tim Burke, Red Hat’s vice president of engineering, said that Red Hat already runs much of its own infrastructure, including mail servers and file servers, on KVM, and is working hard to promote KVM with key original equipment manufacturer partners and vendors.
And Red Hat CTO Brian Stevens pointed out in his Summit keynote that with KVM inside the Linux kernel, Red Hat customers will no longer have to choose which applications to virtualize; virtualization will be everywhere and the tools to manage applications will be the same as those used to manage virtualized guests.
- Xen vs. KVM, by Pam Derringer, SearchDataCenter.com
For system integrators and virtual infrastructure practices, Red Hat’s play is creating opportunities for differentiation. With a focus on light-weight, high-performance, I/O-driven virtualization applications and no need to support years-old established processes that are dragging on Xen and VMware, KVM stands to leap-frog the competition in the short term.
SOLORI’s Take: This news is good for all Red Hat and Microsoft customers alike. Indeed, it shows that Microsoft realizes that its licenses are being sold into the enterprise whether or not they run on physical hardware. With 20+:1 consolidation ratios now common, that represents a 5:1 license to hardware sale for Microsoft, regardless of the hypervisor. With KVM’s demonstrated CPU agnostic migration capabilities, this opens the door to an even more diverse virtualization infrastructure than ever before.
On the Red Hat side, it demonstrates how rapidly Red Hat has matured its offering following the shift to KVM earlier this year. While KVM is new to Red Hat, it is not new to Linux or aggressive early adopters, having been added to the Linux kernel as of 2.6.20 back in February of 2007. With support already in active projects like ConVirt (VM life cycle management), OpenNebula (cloud administration tools), Ganeti, and Enomaly’s Elastic Computing Platform, the game of catch-up for Red Hat and KVM is very likely to be a short one.
October 7, 2009
Johan De Gelas and crew present an interesting comparison of Dunnington, Shanghai, Istanbul and Nehalem in a new post at AnandTech this week. In the test line-up are the “top bin” parts from Intel and AMD in 4-core and 6-core incarnations:
- Intel Nehalem-EP Xeon, X5570 2.93GHz, 4-core, 8-thread
- Intel “Dunnington” Xeon, X7460, 2.66GHz, 6-core, 6-thread
- AMD “Shanghai” Opteron 2389/8389, 2.9GHz, 4-core, 4-thread
- AMD “Istanbul” Opteron 2435/8435, 2.6GHz, 6-core, 6-thread
Most important for virtualization systems architects is how the vCPU scheduling affects “measured” performance. The telling piece comes from the difference in comparison results where vCPU scheduling is equalized:

AnandTech's Quad Sockets v. Dual Sockets Comparison. Oct 6, 2009.
When comparing the results, De Gelas hits on the I/O factor which chiefly separates VMmark from vAPUS:
The result is that VMmark with its huge number of VMs per server (up to 102 VMs!) places a lot of stress on the I/O systems. The reason for the Intel Xeon X5570’s crushing VMmark results cannot be explained by the processor architecture alone. One possible explanation may be that the VMDq (multiple queues and offloading of the virtual switch to the hardware) implementation of the Intel NICs is better than the Broadcom NICs that are typically found in the AMD based servers.
Johan De Gelas, AnandTech, Oct 2009
This is yet another issue that VMware architects struggle with in complex deployments. The latency in “Dunnington” is a huge contributor to its downfall and why the Penryn architecture was a dead-end. Combined with 8 additional threads in the 2P form factor, Nehalem delivers twice as many hardware execution contexts as Shanghai, resulting in significant efficiencies for Nehalem where small working data sets are involved.
When larger sets are used – as in vAPUS – the Istanbul’s additional cores allow it to close the gap to within the clock speed difference of Nehalem (about 12%). In contrast to VMmark which implies a 3:2 advantage to Nehalem, the vAPUS results suggest a closer performance gap in more aggressive virtualization use cases.
SOLORI’s Take: We differ with De Gelas on the reduction in vAPUS’ data set to accommodate the “cheaper” memory build of the Nehalem system. While this offers some advantages in testing, it also diminishes one of Opteron’s greatest strengths: access to cheap and abundant memory. Here we have the testing conundrum: fit the test around the competitors or the competitors around the test. The former approach presents a bias on the “pure performance” aspect of the competitors, while the latter is more typical of use-case testing.
We do not construe this issue as intentional bias on AnandTech’s part; however, it is another vector to consider in the evaluation of the results. De Gelas delivers a report worth reading in its entirety, and we view this as a primer to the issues that will define the first half of 2010.
October 2, 2009
SOLORI’s top blog posts of Q3/2009
- In-the-Lab: Full ESX Test Lab in a Box – 18%
- Part 1, Setup and Getting Started with ESXi
- Part 2, Selecting a Virtual Storage Appliance (VSA)
- Part 3, Building and Provisioning the VSA
- Part 4, Creating the Cluster-in-a-Box
- Part 5, Deploying vCenter, Update Manager, et al
- Installing FreeNAS to USB Flash: Easy as 1, 2, 3 - 17%
- Preview: Installing vSphere ESXi to Flash – 11%
- Installing ESXi on the Tyan Transport GT28 – 4%
- In-the-Lab: vSphere DPM, Quirky but Functional – 3%
SOLORI’s top search engine keywords for Q3/2009
- USB flash install - 5.6%
- FreeNAS - 4.7%
- ESXi - 1.7%
- Virtual SAN - 0.6%
- AMD – 0.4%
Summary and Comments
With about 17K visits this quarter, FreeNAS and ESXi related posts are clearly the most popular. We’ve seen a great deal of traffic generated by the ESX-on-ESX series, but the popular FreeNAS project comes a close second. Judging by the search engine results, nearly 6% of our traffic finds the SolutionOriented Blog while trying to locate tips on installing FreeNAS or ESXi to USB flash. We’ll take that as a hint for next quarter to deliver more information on alternative and open storage solutions that fit virtualization use cases: stay tuned.
September 28, 2009
In Part 4 of this series we created two vSphere virtual machines – one running ESX and one running ESXi – from a set of master images we can use for rapid deployment in case we want to expand the number of ESX servers in our lab. We showed you how to use NexentaStor to create snapshots of NFS and iSCSI volumes and create ZFS clone images from them. We then showed you how to stage the startup of the VSA and ESX hosts to “auto-start” the lab on boot-up.
In this segment, Part 5, we will create a VMware Virtual Center (vCenter) virtual machine and place the ESX and ESXi machines under management. Using this vCenter instance, we will complete the configuration of ESX and ESXi using some of the new features available in vCenter.
Part 5, Managing our ESX Cluster-in-a-Box
With our VSA and ESX servers purring along in the virtual lab, the only thing stopping us from moving forward with vMotion is the absence of a working vCenter to control the process. Once we have vCenter installed, we have 60 days to evaluate and test vSphere before the trial license expires.
Prepping for vCenter Server for vSphere
We are going to install Microsoft Windows Server 2003 STD for the vCenter Server operating system. We chose Server 2003 STD since we have limited CPU and memory resources to commit to the management of the lab and because our vCenter has no need of 64-bit resources in this use case.
Since one of our goals is to have a fully functional vMotion lab with reasonable performance, we want to create a vCenter virtual machine with at least the minimum requirements satisfied. In our 24GB lab server, we have committed 20GB to ESX, ESXi and the VSA (8GB, 8GB and 4GB, respectively). Our base ESXi instance consumes 2GB, leaving only 2GB for vCenter – or does it?
Memory Use in ESXi
VMware ESX (and ESXi) does a good job of conserving resources by limiting commitments for memory and CPU. This is not unlike any virtual memory capable system that puts a premium on “real” memory by moving less frequently used pages to disk. With a lot of idle virtual machines, this ability alone can create significant over-subscription possibilities for VMware; this is why it could be possible to run 32GB worth of VMs on a 16-24GB host.
Do we really want this memory paging to take place? The answer – for the consolidation use cases – is usually “yes.” This is because consolidation is born out of the need to aggregate underutilized systems in a more resource efficient way. Put another way, administrators tend to provision systems based on worst case versus average use, leaving 70-80% of those resources idle in off-peak times. Under ESX’s control those underutilized resources can be re-tasked to another VM without impacting the performance of either one.
On the other hand, our ESX and VSA virtual machines are not the typical use case. We intend to fully utilize their resources and let them determine how to share them in turn. Imagine a good number of virtual machines running on our virtualized ESX hosts: will they perform well with the added hardship of memory paging? Also, when we begin to use vMotion those CPU and memory resources will appear on BOTH virtualized ESX servers at the same time.
It is pretty clear that if all of our lab storage is committed to the VSA, we do not want to page its memory. Remember that any additional memory not in use by the SAN OS in our VSA is employed as ARC cache for ZFS to increase read performance. Paging memory that is assumed to be “high performance” by NexentaStor would result in poor storage throughput. The key to “recursive computing” is knowing how to anticipate resource bottlenecks and deploy around them.
This raises the question: how much memory is left after reserving 4GB for the VSA? To figure that out, let’s look at what NexentaStor uses at idle with 4GB provisioned:

NexentaStor's RAM footprint with 4GB provisioned, at idle.
As you can see, we have specified a 4GB reservation which appears as “4233 MB” of Host Memory consumed (4096MB+137MB). Looking at the “Active” memory we see that – at idle – the NexentaStor is using about 2GB of host RAM for OS and to support the couple of file systems mounted on the host ESXi server (recursively).
Additionally, we need to remember that each VM has a memory overhead to consider that increases with the vCPU count. For the four vCPU ESX/ESXi servers, the overhead is about 220MB each; the NexentaStor VSA consumes an additional 140MB with its two vCPU’s. Totaling-up the memory plus overhead identifies a commitment of at least 21,828MB of memory to run the VSA and both ESX guests – that leaves a little under 1.5GB for vCenter if we used a 100% reservation model.
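The budget above can be tallied in a few lines. The sketch below uses only the figures quoted in this post (24GB host, 2GB for the base ESXi instance, 8GB/8GB/4GB guests, and the per-VM overheads of roughly 220MB and 140MB); the variable names are ours. The itemized figures sum to somewhat less than the 21,828MB total quoted above, which presumably counts overhead not broken out here, but the remainder still comes out a little under 1.5GB.

```python
MB_PER_GB = 1024

HOST_RAM_MB = 24 * MB_PER_GB   # physical lab server
HOST_ESXI_MB = 2 * MB_PER_GB   # base ESXi instance on the host

# guest name -> (configured memory in MB, approximate per-VM overhead in MB)
guests = {
    "ESX":  (8 * MB_PER_GB, 220),
    "ESXi": (8 * MB_PER_GB, 220),
    "VSA":  (4 * MB_PER_GB, 140),
}

reserved_mb = sum(mem for mem, _ in guests.values())
overhead_mb = sum(ovh for _, ovh in guests.values())

# What is left for vCenter under a 100% reservation model
remaining_mb = HOST_RAM_MB - HOST_ESXI_MB - reserved_mb - overhead_mb

print(f"guest memory:  {reserved_mb} MB")
print(f"VM overhead:   {overhead_mb} MB")
print(f"left for vCenter: {remaining_mb} MB ({remaining_mb / MB_PER_GB:.2f} GB)")
```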
Memory Over Commitment
The same concerns about memory hold true for our ESX and ESXi hosts – albeit in a less obvious way. We obviously want to “reserve” the memory required by the VMM – about 2.8GB and 2GB for ESX and ESXi respectively. Additionally, we want to avoid over subscription of memory on the host ESXi instance – if at all possible – since it will already be busy running our virtual ESX and ESXi machines.
September 25, 2009
HP’s ProLiant BL490c G6 server blade now tops the VMware VMmark table for 8-core systems – just squeaking past rack servers from Lenovo and Dell with a score of 24.54@17 tiles: a new 8-core record. The half-height blade was equipped with two, quad-core Intel Xeon X5570 (Nehalem-EP, 130W TDP) and 96GB ECC Registered DDR3-1333 (12x 8GB, 2-DIMM/channel) memory.
In our follow-up, we found that HP’s on-line configuration tool does not allow for DDR3-1333 memory so we went to the street for a comparison. For starters, we examined the on-line price from HP with DDR3-1066 memory and the added QLogic QMH2462 Fiber Channel adapter ($750) and additional NC360m dual-port Gigabit Ethernet controller ($320) which came to a grand total of $28,280 for the blade (about $277/VM, not including Blade chassis or SAN storage).
Stripping memory from the build-out results in a $7,970 floor to the hardware, sans memory. Going to the street to find 8GB sticks with DDR3-1333 ratings and HP support yielded the Kingston KTH-PL313K3/24G kit (3x 8GB DIMMs) of which we would need three to complete the build-out. At $4,773 per kit, the completed system comes to $22,289 (about $218/VM, not including chassis or storage) which may do more to demonstrate Kingston’s value in the market place rather than HP’s penchant for “over-priced” memory.
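The per-VM figures above follow from simple arithmetic, sketched below. The prices are the ones quoted in this post; the six-VMs-per-tile constant is our assumption about the benchmark's tile size, chosen because 17 tiles at six VMs each reproduces the per-VM numbers quoted above.

```python
TILES = 17
VMS_PER_TILE = 6  # assumption: six VMs per tile, i.e. 102 VMs at 17 tiles
vm_count = TILES * VMS_PER_TILE

hp_configured_price = 28_280   # HP on-line price with DDR3-1066, FC and NIC options
base_price_no_memory = 7_970   # same build with the memory stripped out
kingston_kit_price = 4_773     # KTH-PL313K3/24G kit (3x 8GB DIMMs)
kits_needed = 3                # 3 kits x 24GB = 72GB... see note below on 96GB

# Street-priced build: bare blade plus three memory kits, as described above
street_price = base_price_no_memory + kits_needed * kingston_kit_price


def cost_per_vm(price: float, vms: int) -> float:
    """Hardware cost per VM, excluding chassis and SAN storage."""
    return price / vms


print(f"VMs at {TILES} tiles: {vm_count}")
print(f"HP configured: ${cost_per_vm(hp_configured_price, vm_count):.2f} per VM")
print(f"street build:  ${street_price} total, "
      f"${cost_per_vm(street_price, vm_count):.2f} per VM")
```

Note that the code only reproduces the post's own totals (three kits added to the $7,970 floor); it makes no claim about whether three 24GB kits reach the benchmarked 96GB.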
Now, the interesting disclosure from HP’s testing team is this:

Notes from HP's VMmark submission.
While this appears to boost memory performance significantly for HP’s latest run (compared to the 24.24@17 tiles score back in May, 2009) it does so at the risk of running the Nehalem-EP memory controller out of specification – essentially, driving the controller beyond the rated load. It is hard for us to imagine that this specific configuration would be vendor supported if used in a problematic customer installation.
SOLORI’s Take: Those of you following closely may be asking yourselves: “Why did HP choose to over-clock the memory controller in this run by pushing a 1066MHz, 2DPC limit to 1333MHz?” It would appear the answer is self-evident: the extra 6% was needed to put them over the Lenovo machine. This issue raises a new question about the VMmark validation process: “Should out of specification configurations be allowed in the general benchmark corpus?” It is our opinion that VMmark should represent off-the-shelf, fully-supported configurations only – not esoteric configuration tweaks and questionable over-clocking practices.
Should there be an “unlimited” category in the VMmark arena? Who knows? How many enterprises knowingly commit their mission critical data and processes to systems running over-clocked processors and over-driven memory controllers? No hands? That’s what we thought… Congratulations anyway to HP for clawing their way to the top of the VMmark 8-core heap…
September 21, 2009
The Channel Register is reporting on the launch of AMD’s motherboard chipsets which will drive new socket-F based Fiorano and Kroner platforms as well as the socket G34 and C32 based Maranello and San Marino platforms. The Register also points out that no tier one PC maker is announcing socket-F solutions based on the new chipsets today. However, motherboard and “barebones” maker Supermicro is also announcing new A+ server, blade and workstation variants using the new AMD SR5690 and SP5100 chipsets, enabling:
- GPU-optimized designs: Support up to four double-width GPUs along with two CPUs and up to 3 additional high-performance add-on cards.
- Up to 10 quad-processor (MP) or dual-processor (DP) Blades in a 7U enclosure: Industry-leading density and power efficiency with up to 240 processor cores and 640GB memory per 7U enclosure.
- 6Gb/s SAS 2.0 designs: Four-socket and two-socket server and workstation solutions with double the data throughput of previous generation storage architectures.
- Universal I/O designs: Provide flexible I/O customization and investment protection.
- QDR InfiniBand support option: Integrated QDR IB switch and UIO add-on card solution for maximum I/O performance.
- High memory capacity: 16 DIMM models with high capacity memory support to dramatically improve memory and virtualization performance.
- PCI-E 2.0 Slots plus Dual HT Links (HT3) to CPUs: Enhance motherboard I/O bandwidth and performance. Optimal for QDR IB card support.
- Onboard IPMI 2.0 support: Reduces remote management costs.
Eco-Systems based on Supermicro’s venerable AS2021M – based on the NVidia nForce Pro 3600 chipset – can now be augmented with the Supermicro AS2021A variant based on AMD’s SR5690/SP5100 pairing. Besides offering HT3.0 and on-board Winbond WPCM450 KVM/IP BMC module, the new iteration includes support for the SR5690’s IOMMU function (experimentally supported by VMware), 16 DDR2 800/667/533 DIMMs, and four PCI-E 2.0 slots – all in the same, familiar 2U chassis with eight 3.5″ hot-swap bays.
AMD’s John Fruehe outlines AMD’s market approach for the new chipsets in his “AMD at Work” blog today. Based on the same basic logic/silicon, the SR5690, SR5670 and SR5650 all deliver PCI-E 2.0 and HT3.0 but at differing levels of power consumption and PCI Express lanes to their respective platforms. Paired with appropriate “power and speed” Opteron variant, these platforms offer system designers, virtualization architects and HPC vendors greater control over price-performance and power-performance constraints that drive their respective environments.
AMD chose the occasion of the Embedded Systems Conference in Boston to announce its new chipset to the world. Citing performance-per-watt advantages that could enhance embedded systems in the telecom, storage and security markets, AMD’s press release highlighted three separate vendors with products ready to ship based on the new AMD chipsets.
September 14, 2009
As anticipated, global DRAM prices have continued their upward trend through September, 2009. We reported on August 4, 2009 about the DDR3 and DDR2 price increases that – coupled with a short-fall in DDR3 production – have caused a temporary shift of the consumer market towards DDR2-based designs.
Last week, the Inquirer also reported that DRAM prices were on the rise and that the trend will result in parity between DDR2 and DDR3 prices. MaximumPC ran the Inquirer’s story urging its readers to buy now as the tide rises on both fronts. DRAMeXchange is reporting a significant revenue gain to the major players in the DRAM market as a result of this well orchestrated ballet of supply and demand. The net result for consumers is higher prices across the board as the DDR2/DDR3 production cross-over point is reached.

SOLORI’s Take: DDR2 is a fading bargain in the server markets, and DIMM vendors like Kingston are working to maintain a stable source of DDR2 components through the end of 2009. Looking at our benchmark tracking components, we project 8GB DIMMs to average $565/DIMM by the end of 2009. In the new year, expect 8GB/DDR2 to hit $600/DIMM by the end of H2/2010 with lower pricing on 8GB/DDR3-1066 – in the $500/DIMM range (if supply can keep up with new system demands created by continued growth in the virtualization market).
Benchmark Server Memory Pricing

| DDR2 Series (1.8V) | Price Jun ‘09 | Price Sep ‘09 | DDR3 Series (1.5V) | Price Jun ‘09 | Price Sep ‘09 |
|---|---|---|---|---|---|
| 4GB 800MHz DDR2 ECC Reg with Parity CL6 DIMM Dual Rank, x4 (5.4W) | $100.00 | $117.00 (up 17%) | 4GB 1333MHz DDR3 ECC Reg w/Parity CL9 DIMM Dual Rank, x4 w/Therm Sen (3.96W) | $138.00 | $151.00 (up 10%) |
| 4GB 667MHz DDR2 ECC Reg with Parity CL5 DIMM Dual Rank, x4 (5.94W) | $80.00 | $103.00 (up 29%) | 4GB 1066MHz DDR3 ECC Reg w/Parity CL7 DIMM Dual Rank, x4 w/Therm Sen (5.09W) | $132.00 | $151.00 (up 15%) |
| 8GB 667MHz DDR2 ECC Reg with Parity CL5 DIMM Dual Rank, x4 (7.236W) | $396.00 | $433.00 (up 9%) | 8GB 1066MHz DDR3 ECC Reg w/Parity CL7 DIMM Dual Rank, x4 w/Therm Sen (6.36W) | $1035.00 | $917.00 (down 11.5%) |
SOLORI’s 2nd Take: Samsung has been driving the DRAM roller coaster in an effort to dominate the market. With Samsung’s 40-nm 2Gb DRAM production ramping by year end, the chip maker’s influence could create a disruptive position in the PC and server markets by driving 8GB/DDR3 prices into the sub-$250/DIMM range by 2H/2010. Meanwhile Hynix, the #2 market leader, chases with 40-nm 1Gb DDR3, giving Samsung the opportunity to repeat its 2008/2009 gambit in 2010 and making it increasingly hard for competitors to get a foot-hold in the DDR3 market.
Samsung has their eye on the future with 16GB and 32GB DIMMs already exhibited with 50-nm 2Gb parts claiming a 20% power savings over the current line of memory. With 40-nm 2Gb parts, Samsung is claiming up to 30% additional power savings. To put this into perspective, eight 32GB DIMMs would consume about 60% of the power consumed by 32 8GB DIMMs (requiring a 4P+ server). In a virtualization context, this is enough memory to enable 100 virtual machines with 2.5GB of memory each without over subscription. Realistically, we expect to see 16GB DDR3 DIMMs at $1,200/DIMM by 2H/2010 – if everything goes according to plan.
September 13, 2009
Andreas Galistel at NordicHardware posted an article showing a system running a pair of engineering samples of the Magny-Cours processor running at 3.0GHz. Undoubtedly these images were culled from a report “leaked” on XtremeSystems forums showing a “DINAR2″ motherboard with SR5690 chipset – in single and dual processor installation – running Magny-Cours at the more typical pre-release speed of 1.7GHz.
We know that Magny-Cours is essentially a MCM of Istanbul delivered in the rectangular socket G34 package. One thing illuminating about the two posts is the reported “reduction” in L3 cache from 12MB (6MB x 2 in MCM) to 10MB (2 x 5MB in MCM). Where did the additional cache go? That’s easy: since a 2P Magny-Cours installation is essentially a 4P Istanbul configuration, these processors have the new HT Assist feature enabled – giving 1MB of cache from each chip in the MCM to HT Assist.
“wPrime uses a recursive call of Newton’s method for estimating functions, with f(x)=x^2-k, where k is the number we’re sqrting, until Sgn(f(x)/f’(x)) does not equal that of the previous iteration, starting with an estimation of k/2. It then uses an iterative calling of the estimation method a set amount of times to increase the accuracy of the results. It then confirms that n(k)^2=k to ensure the calculation was correct. It repeats this for all numbers from 1 to the requested maximum.”
- wPrime site
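For readers unfamiliar with the method described in the quote, here is a minimal Python sketch of a Newton's method square root that follows the same outline: start at k/2, iterate until the sign of f(x)/f'(x) changes, run a fixed number of refinement steps, then check the result. This is our own illustration, not wPrime's code; the function names, the iteration cap, the refinement count and the tolerance are all assumptions.

```python
import math


def newton_sqrt(k: float, refine_steps: int = 5, max_steps: int = 200) -> float:
    """Estimate sqrt(k) with Newton's method on f(x) = x^2 - k, starting at k/2."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        return 0.0

    x = k / 2.0
    prev_sign = None
    for _ in range(max_steps):  # cap is a safety net, not part of the description
        step = (x * x - k) / (2.0 * x)       # f(x) / f'(x)
        sign = (step > 0) - (step < 0)
        if prev_sign is not None and sign != prev_sign:
            break                            # sign changed: estimate has crossed the root
        x -= step
        prev_sign = sign

    # Fixed number of extra iterations to tighten the estimate
    for _ in range(refine_steps):
        x -= (x * x - k) / (2.0 * x)
    return x


def confirms(k: float, root: float, rel_tol: float = 1e-9) -> bool:
    """Check that root^2 matches k within a tolerance (tolerance is an assumption)."""
    return math.isclose(root * root, k, rel_tol=rel_tol)


roots = {k: newton_sqrt(k) for k in range(1, 101)}
all_confirmed = all(confirms(k, r) for k, r in roots.items())
print(f"all results confirmed: {all_confirmed}")
```

Because every number from 1 to the requested maximum is independent of the others, work of this shape divides cleanly across threads, which is why core count matters so much in the results discussed here.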
Another thing intriguing about the XtremeSystems post in particular is the reported wPrime 32M and 1024M completion times. Compared to the hyper-threading-enabled 2P Xeon W5590 (130W TDP) running wPrime 32M at 3.33GHz (3.6GHz turbo) in 3.950 seconds, the 2P 3.0GHz Magny-Cours completed wPrime 32M in an unofficial 3.539 seconds – about 10% quicker while running a 10% slower clock. From the myopic lens of this result, it would appear AMD’s choice of “real cores” versus hyper-threading delivers its punch.
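The two "about 10%" figures above can be checked directly from the quoted times and clocks. The helper name below is ours; the inputs are the numbers reported in this post.

```python
def percent_less(value: float, baseline: float) -> float:
    """How much smaller value is than baseline, as a percentage of baseline."""
    return (baseline - value) / baseline * 100.0


# wPrime 32M completion times in seconds: 2P Xeon W5590 vs. 2P Magny-Cours sample
time_gain = percent_less(3.539, 3.950)

# Base clocks in GHz: 3.0GHz Magny-Cours sample vs. 3.33GHz W5590
clock_deficit = percent_less(3.0, 3.33)

print(f"Magny-Cours finished {time_gain:.1f}% sooner")
print(f"while clocked {clock_deficit:.1f}% lower")
```

The comparison uses the W5590's base clock; measured against the 3.6GHz turbo figure the clock gap would be wider.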
SOLORI’s Take: As a “reality check” we can compare the reigning quad-socket, quad-core Opteron 8393 SE result in wPrime 32M and wPrime 1024M at 3.90 and 89.52 seconds, respectively. Adjusted for clock and core count versus its Shanghai cousin, the Magny-Cours engineering samples – at 3.54 and 75.77 seconds, respectively – turned-in times about 10% slower than our calculus predicted. While still “record breaking” for 2P systems, we expected the Magny-Cours/Istanbul cores to out-perform Shanghai clock-per-clock – even at this stage of the game.
Due to the multi-threaded nature of the wPrime benchmark, it is likely that the HT Assist feature – enabled in a 2P Magny-Cours system by default – is the cause of the discrepancy. By reducing the available L3 cache by 1MB per die – 4MB of L3 cache total – HT Assist actually could be creating a slow-down. However, there are several things to remember here:
- These are engineering samples qualified for 1.7GHz operation
- Speed enhancements were performed with tools not yet adapted to Magny-Cours
- The author indicated a lack of control over AMD’s Cool ‘n Quiet technology which could have made “as tested” core clocks somewhat lower than what CPUz reported (at least during the extended tests)
- It is speculated that AMD will release Magny-Cours at 2.2GHz (top bin) upon release, making the 2.6+ GHz results non-typical
- The BIOS and related dependencies are likely still being “baked”
Looking at the more “typical” engineering sample speed tests posted on the XtremeSystems’ forum tracks with the 3.0GHz overclock results at a more “typical” clock speed of 2.6GHz for 2P Magny-Cours: 3.947 seconds and 79.625 seconds for wPrime 32M and 1024M, respectively. Even at that speed, the 24-core system is on par with the 2P Nehalem system clocked nearly a GHz faster. Oddly, Intel reports the W5590 as not supporting “turbo” or hyper-threading although it is clear that Intel’s marketing is incorrect based on actual testing.
Assuming Magny-Cours improves slightly on its way to market, we already know how 24-core Istanbul stacks-up against 16-thread Nehalem in VMmark and what that means for Nehalem-EP. This partly explains the marketing shift as Intel tries to position Nehalem-EP as destined for workstations instead of servers. Whether you consider this move a prelude to the coming Nehalem-EX v. Magny-Cours combat or an attempt to keep Intel’s server chip power average down by eliminating the 130W+ parts from the “server” list, Intel and AMD will each attempt to win the war before the first shot is fired. Either way, we see nothing that disrupts the price-performance and power-performance comparison models that dominate the server markets.
[Ed: The 10% difference is likely due to the fact that the author was unable to get "more than one core" clocked at 3.0GHz. Likewise, he was uncertain that all cores were reliably clocking at 2.6GHz for the longer wPrime tests. Again, this engineering sample was designed to run at 1.7GHz and was not likely "hand picked" to run at much higher clocks. He speculated that some form of dynamic core clocking linked to temperature was affecting clock stability - perhaps due to some AMD-P tweaks in Magny-Cours.]
Posted in AMD, Ethics and Technology, Intel, New Products, Quick Take, Servers | Tagged 8393 SE, AMD SR5690, DINAR2, istanbul, magny-cours, nehalem-ep, nehalem-ex, opteron, shangai, unofficial benchmarks, wprime, xeon | Comments Off
September 9, 2009
The new 1st runner-up spot for VMmark in the “8 core” category was taken yesterday by Dell’s R710 – just edging-out the previous second spot HP ProLiant BL490 G6 by 0.1% – a virtual dead heat. Equipped with a pair of Xeon X5570 ($1386/ea, bulk list) and 96GB registered DDR3/1066 (12×8GB), the 2U, rack mount R710 weighs-in with a tile ratio of 1.43 over 102 VMs:
- Dell R710 w/redundant high-output power supply, ($18,209)
- 2 x Intel Xeon X5570 Processors (included)
- 96GB ECC DDR3/1066 (12×8GB) (included)
- 2 x Broadcom NetXtreme II 5709 dual-port GigabitEthernet w/TOE (included)
- 1 x Intel PRO 1000VT quad-port GigabitEthernet (1x PCIe-x4 slot, $529)
- 3 x QLogic QLE2462 FC HBA (1x PCIe slot, $1,219/ea)
- 1 x LSI1078 SAS Controller (on-board)
- 8 x 15K SAS OS drive, RAID10 (included)
- Required ProSupport package ($2,164)
- Total as Configured: $24,559 ($241/VM, not including storage)
Three Dell/EMC CX3-40f arrays were used as the storage backing of the test. The storage system included 8GB cache, 2 enclosures and 15, 15K disks per array delivering 19 LUNs at about 300GB each. Intel’s Hyper-Threading and “Turbo Boost” were enabled for 8-thread, 3.33GHz core clocking as was VT; however embedded SATA and USB were disabled as is common practice.
At about $1,445/tile ($241/VM) the new “second dog” delivers its best at a 20% price premium over Lenovo’s “top dog” – although the non-standard OS drive configuration makes up half of the difference, with Dell’s mandatory support package making up the remainder. Using a simple RAID1 SAS and eliminating the support package would have dropped the cost to $20,421 – a dead heat with Lenovo at $182/VM.
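The R710's per-tile and per-VM figures can be reproduced from the line items in the configuration list. A minimal Python sketch, assuming six VMs per VMmark tile (so 102 VMs works out to 17 tiles); the dictionary and variable names are ours, for illustration only:

```python
# Priced line items for the Dell R710 as configured (included items cost $0).
r710_items = {
    "R710 base system": 18209,
    "Intel quad-port GigabitEthernet": 529,
    "QLogic QLE2462 FC HBA x3": 3 * 1219,
    "ProSupport package": 2164,
}

VMS_PER_TILE = 6  # assumption: VMmark tile size, not stated in the post
vms = 102
tiles = vms // VMS_PER_TILE

total = sum(r710_items.values())
cost_per_tile = total / tiles
cost_per_vm = total / vms

print(f"total: ${total:,}")
print(f"per tile: ${cost_per_tile:,.0f}")
print(f"per VM: ${cost_per_vm:,.0f}")
```

Storage is left out of the total, matching the "not including storage" note on the configured price.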
Comparing the Dell R710 to the 2P, 12-core benchmark HP DL385 G6 Istanbul system at 15.54@11 tiles:
- HP DL385 G6 ($5,840)
- 2 x AMD 2435 Istanbul Processors (included)
- 64GB ECC DDR2/667 (8×8GB) ($433/DIMM)
- 2 x Broadcom 5709 dual-port GigabitEthernet (on-board)
- 1 x Intel 82571EB dual-port GigabitEthernet (1x PCIe slot, $150/ea)
- 1 x QLogic QLE2462 FC HBA (1x PCIe slot, $1,219/ea)
- 1 x HP SAS Controller (on-board)
- 2 x SAS OS drive (included)
- $10,673/system total (versus $14,696 complete from HP)
Direct pricing shows Istanbul’s numbers at $1,336/tile ($223/VM) which is a 7.5% savings per-VM over the Dell R710. Going to the street – for memory only – changes the Istanbul picture to $970/tile ($162/VM) representing a 33% savings over the R710.
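The same arithmetic applied to the DL385 G6, as a rough Python sketch. It assumes six VMs per tile (66 VMs at 11 tiles) and uses the system totals quoted above; the helper names `per_tile_and_vm` and `savings_pct` are hypothetical:

```python
VMS_PER_TILE = 6  # assumption: VMmark tile size, not stated in the post

def per_tile_and_vm(total_cost, tiles):
    """Return (cost per tile, cost per VM) for a system price and tile count."""
    return total_cost / tiles, total_cost / (tiles * VMS_PER_TILE)

def savings_pct(baseline, candidate):
    """Percentage saved by the candidate relative to the baseline."""
    return (baseline - candidate) / baseline * 100

r710_tile, r710_vm = per_tile_and_vm(24559, 17)            # Dell R710, as configured
hp_direct_tile, hp_direct_vm = per_tile_and_vm(14696, 11)  # DL385 G6, complete from HP
hp_street_tile, hp_street_vm = per_tile_and_vm(10673, 11)  # DL385 G6, street-priced memory

direct_savings = savings_pct(r710_vm, hp_direct_vm)
street_savings = savings_pct(r710_vm, hp_street_vm)

print(f"direct: ${hp_direct_tile:,.0f}/tile, ${hp_direct_vm:,.0f}/VM, {direct_savings:.1f}% savings")
print(f"street: ${hp_street_tile:,.0f}/tile, ${hp_street_vm:,.0f}/VM, {street_savings:.1f}% savings")
```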
SOLORI’s Take: Istanbul continues to offer a 20-30% CAPEX value proposition against Nehalem in the virtualization use case – even without IOMMU and higher memory bandwidth promised in upcoming Magny-Cours. With the HE parts running around $500 per processor, the OPEX benefits are there for Istanbul too. It is difficult to understand why HP wants to charge $900/DIMM for 8GB PC-5300 sticks when they are available on the street for 50% less – that’s a 100% markup. Looking at what HP charges for 8GB DDR3/1066 – $1,700/DIMM – they are at least consistent. HP’s memory pricing practice makes one thing clear – customers are not buying large memory configurations from their system vendors…
By contrast, Dell appears to be happy to offer decent prices on 8GB DDR3/1066 with their R710 at approximately $837/DIMM – almost on par with street prices. Looking to see if this parity held up with Dell’s AMD offerings, we examined the prices offered with Dell’s R805: while – at $680/DIMM – Dell’s prices were significantly better than HP’s, they still exceeded the market by 50%. Still, we were able to configure a Dell R805 with AMD 2435’s for much less than the equivalent HP system:
- Dell R805 w/redundant power ($7,214)
- 2 x AMD 2435 Istanbul Processors (included)
- 64GB ECC DDR2/667 (8×8GB) ($433/ea, street)
- 4 x Broadcom 5708 GigabitEthernet (on-board)
- 1 x Intel PRO 1000PT dual-port GigabitEthernet (1x PCIe slot, included)
- 1 x QLogic QLE2462 FC HBA (1x PCIe slot, included)
- 1 x Dell PERC SAS Controller (on-board)
- 2 x SAS OS drive (included)
- $10,678/system total (versus $12,702 complete from Dell)
This offering from Dell should be able to deliver performance equivalent to HP’s DL385 G6 and similar savings/VM compared to the Nehalem-based R710. Even at the $12,702 price as delivered from Dell, the R805 represents a potential $192/VM price point – about $50/VM (25%) savings over the R710.
Posted in AMD, Ethics and Technology, Intel, Quick Take, Servers, VMWare, Virtualization | Tagged 2435, AMD, benchmark, ddr2, DDR3, Intel, istanbul, memory pricing bias, nehalem, vmmark, vmware, x5570 | 2 Comments »