The component of a hybrid system that gets the most attention is the flash layer. This makes sense, since the performance half of the hybrid promise relies on serving the large majority of data from this memory-based storage area. Hybrid storage vendors will therefore try to differentiate themselves on the accuracy of their caching, and there are several ways they can accomplish this.
First, they can provide excess flash capacity. The greater the amount of flash relative to the capacity of the hard disk storage area, the greater the likelihood that data will be served from cache. This is an expensive and potentially wasteful way to achieve a high cache hit rate, but it is arguably the easiest path to getting there.
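To make the relationship between flash size and hit rate concrete, here is a minimal Python sketch that replays a skewed, Zipf-like request stream through an LRU cache at several flash-to-disk ratios. LRU replacement, the Zipf-like access pattern, and the helper names `lru_hit_rate` and `skewed_workload` are illustrative assumptions, not any vendor's actual caching algorithm.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay a block request stream through an LRU cache of `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in requests:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(requests)


def skewed_workload(num_blocks, num_requests, skew=1.0, seed=42):
    """Zipf-like stream: block i is requested with weight 1 / (i + 1) ** skew."""
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) ** skew for i in range(num_blocks)]
    return rng.choices(range(num_blocks), weights=weights, k=num_requests)


if __name__ == "__main__":
    disk_blocks = 10_000  # hypothetical size of the hard disk area, in blocks
    requests = skewed_workload(disk_blocks, 200_000)
    for flash_pct in (1, 5, 10, 20):
        capacity = disk_blocks * flash_pct // 100
        rate = lru_hit_rate(requests, capacity)
        print(f"flash = {flash_pct:>2}% of disk -> hit rate {rate:.1%}")
```

Because LRU has the inclusion property (a larger LRU cache always holds everything a smaller one would), the hit rate for a fixed request stream can only rise as flash is added. How quickly it flattens out depends on how skewed the real workload is, which is why buying excess flash can be wasteful.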
Second, they can apply data reduction, such as deduplication, to the flash area so that it holds more data than its physical size. While the effective cache capacity will vary between environments, 3X the physical size is a safe estimate, and in many environments a 5X to 10X improvement is realistic. This means a 1TB flash area could potentially act like a 5TB one, substantially raising the likelihood of a cache hit. The data reduction would need to be performed inline, as data enters the flash area, so that flash capacity is always used efficiently.
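The following toy model, assuming block-level deduplication with SHA-256 fingerprints, shows how inline data reduction lets a flash area hold more logical data than its physical size. The class `InlineDedupCache` and its methods are hypothetical; eviction, compression, and metadata overhead are deliberately left out.

```python
import hashlib


class InlineDedupCache:
    """Toy flash area that deduplicates fixed-size blocks inline (hypothetical)."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks  # unique blocks the flash can hold
        self.store = {}     # fingerprint -> the single physical copy of a block
        self.refcount = {}  # fingerprint -> number of logical addresses sharing it
        self.index = {}     # logical address -> fingerprint

    def write(self, address, data):
        """Store `data` at `address`; return False if the flash area is full."""
        fp = hashlib.sha256(data).hexdigest()
        old = self.index.get(address)
        if old == fp:
            return True  # same contents already mapped at this address
        if fp not in self.store:
            if len(self.store) >= self.physical_blocks:
                return False  # a real cache would evict something here
            self.store[fp] = data
            self.refcount[fp] = 0
        self.refcount[fp] += 1  # duplicate data costs a reference, not a slot
        self.index[address] = fp
        if old is not None:
            self._release(old)
        return True

    def _release(self, fp):
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:  # last reference gone: free the physical slot
            del self.refcount[fp]
            del self.store[fp]

    def read(self, address):
        return self.store[self.index[address]]

    def effective_ratio(self):
        """Logical blocks stored per physical block used."""
        return len(self.index) / len(self.store) if self.store else 0.0


if __name__ == "__main__":
    cache = InlineDedupCache(physical_blocks=2)
    for addr in range(10):
        block = b"A" * 4096 if addr % 2 == 0 else b"B" * 4096
        cache.write(addr, block)
    print(f"10 logical blocks in {len(cache.store)} physical slots "
          f"({cache.effective_ratio():.0f}X effective capacity)")
```

Because duplicates are detected before a block is written, a repeated block costs only a reference-count increment rather than a physical slot. Doing the reduction after the fact would let duplicates occupy flash in the meantime, which is why the text stresses inline processing.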
Finally, hybrid storage vendors can improve the technology surrounding the flash implementation to increase cache performance. This goes beyond improving the cache algorithms themselves and looks at how the cache is implemented.
The Hard Disk Layer
The other layer of the hybrid storage system is the hard disk drives themselves. The goal with this layer is to make it as attractive as possible from a cost-per-GB perspective, which can be done in two ways. First, large capacity hard drives can be used to minimize the cost per GB. Today 2TB drives are commonplace, 3TB drives are being introduced, and it won't be long before 4TB drives are the standard. With each increase in capacity, the price per GB comes down substantially.
The second method to drive down the cost of this secondary storage is deduplication. Because large capacity drives start from a relatively low price per GB, the incremental savings from data reduction won't be as large as they have historically been. But because of the gross capacity savings, the benefit can still be significant.
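The diminishing but still meaningful return can be checked with simple arithmetic. This sketch assumes a fixed data reduction ratio and uses hypothetical prices purely for illustration; `dedup_savings` is an illustrative helper, not a published formula.

```python
def dedup_savings(raw_price_per_gb, logical_gb, reduction_ratio):
    """Return (effective $/GB, $ saved per logical GB, total $ saved).

    Assumes every logical GB shrinks by the same `reduction_ratio`.
    """
    effective_price = raw_price_per_gb / reduction_ratio
    saved_per_gb = raw_price_per_gb - effective_price
    return effective_price, saved_per_gb, saved_per_gb * logical_gb


if __name__ == "__main__":
    logical_gb = 100_000  # hypothetical: 100 TB of logical data
    ratio = 3.0           # hypothetical 3:1 reduction
    for raw in (0.50, 0.05):  # hypothetical $/GB, older vs. high-capacity drives
        eff, per_gb, total = dedup_savings(raw, logical_gb, ratio)
        print(f"raw ${raw:.2f}/GB -> effective ${eff:.3f}/GB, "
              f"saves ${per_gb:.3f}/GB, ${total:,.0f} in total")
```

A cheaper drive shrinks the per-GB saving in proportion to its price, yet multiplied across tens or hundreds of terabytes the total can still justify deduplicating the disk tier.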
Hybrid Storage System Futures
Of course, there is always demand for greater performance, and leveraging DRAM to handle more of the read and write caching responsibilities would be an ideal way to deliver it. While flash is significantly faster than hard disk, DRAM is significantly faster than flash, especially on writes. DRAM also doesn't have the endurance, or 'wear out', issues that flash does. The challenge is that, unlike flash, DRAM is volatile and won't survive a power failure, so using it to cache uncommitted writes is risky.
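The risk of caching uncommitted writes in DRAM can be illustrated with a small model contrasting write-through (persist before acknowledging) and write-back (acknowledge from DRAM, persist later). The classes `Backend` and `DramWriteCache` and the `power_failure` method are hypothetical stand-ins, not a real storage API.

```python
class Backend:
    """Stand-in for persistent storage (flash or HDD); survives power loss."""

    def __init__(self):
        self.blocks = {}


class DramWriteCache:
    """Hypothetical DRAM write cache in write-through or write-back mode."""

    def __init__(self, backend, write_back):
        self.backend = backend
        self.write_back = write_back
        self.dram = {}      # volatile copy of recently written blocks
        self.dirty = set()  # acknowledged writes not yet persisted

    def write(self, addr, data):
        self.dram[addr] = data
        if self.write_back:
            self.dirty.add(addr)              # acknowledged while only in DRAM
        else:
            self.backend.blocks[addr] = data  # persisted before acknowledging

    def flush(self):
        """Destage dirty blocks to persistent storage."""
        for addr in self.dirty:
            self.backend.blocks[addr] = self.dram[addr]
        self.dirty.clear()

    def power_failure(self):
        """Simulate losing DRAM contents; return addresses whose writes were lost."""
        lost = set(self.dirty)
        self.dram.clear()
        self.dirty.clear()
        return lost

    def read(self, addr):
        return self.dram.get(addr, self.backend.blocks.get(addr))


if __name__ == "__main__":
    for mode in (False, True):
        cache = DramWriteCache(Backend(), write_back=mode)
        cache.write(7, b"order #1001")
        lost = cache.power_failure()
        name = "write-back" if mode else "write-through"
        print(f"{name}: lost {sorted(lost)}, block 7 now reads {cache.read(7)!r}")
```

Write-through gives up much of DRAM's write speed advantage, because each write still waits on slower media; write-back keeps the speed but exposes acknowledged data to loss if power fails before `flush` runs.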
Creating a Faster HDD Tier
Even with a highly optimized flash area, at some point data will have to be read from and written to HDD. When that occurs, the latency difference between a DRAM-enhanced flash storage area and a high capacity disk drive is significant. Something will need to be done to close this gap and increase the performance of the HDD tier.
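A standard way to reason about this gap is the average access time, effective latency = h * t_cache + (1 - h) * t_disk, where h is the cache hit rate. The sketch below evaluates it with hypothetical round-number latencies; measured values for a real system should be substituted.

```python
def effective_latency(hit_rate, cache_latency_ms, disk_latency_ms):
    """Average access time when a fraction `hit_rate` of I/Os is served from cache."""
    return hit_rate * cache_latency_ms + (1 - hit_rate) * disk_latency_ms


if __name__ == "__main__":
    cache_ms = 0.1  # hypothetical DRAM-enhanced flash latency
    disk_ms = 8.0   # hypothetical high capacity HDD latency
    for h in (0.90, 0.95, 0.99):
        avg = effective_latency(h, cache_ms, disk_ms)
        miss_share = (1 - h) * disk_ms / avg  # portion of the average due to misses
        print(f"hit rate {h:.0%}: {avg:.3f} ms average, "
              f"{miss_share:.0%} of it from misses")
```

Even at a very high hit rate, the small fraction of misses accounts for a large share of the average, so speeding up the HDD tier pays off directly.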
The only truly effective method for improving hard drive performance is to have more of them: the more spindles there are, the more I/O requests can be serviced in parallel. The problem is that most hybrid systems have taken an appliance approach, which can limit capacity scaling options. This choice makes sense, as it keeps costs down and is simpler to implement, but it limits the ability to add hard drives to address performance.
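The spindle argument can be turned into a rough sizing estimate: cache misses become disk I/Os, and under the idealized assumption that I/O spreads evenly and scales linearly across drives, the number of drives needed is the miss load divided by per-drive IOPS. The function name and the example figures (20,000 front-end IOPS, 100 IOPS per drive) are hypothetical.

```python
def spindles_needed(front_end_iops, hit_rate_pct, iops_per_drive):
    """Drives required to absorb cache misses, assuming ideal linear scaling.

    Integer percentages keep the arithmetic exact (no float rounding surprises).
    """
    miss_iops_x100 = front_end_iops * (100 - hit_rate_pct)
    # Ceiling division: a partially loaded drive still counts as a whole drive.
    return -(-miss_iops_x100 // (100 * iops_per_drive))


if __name__ == "__main__":
    for hit in (90, 95, 99):
        print(f"{hit}% hit rate -> {spindles_needed(20_000, hit, 100)} drives")
```

An appliance with a fixed drive count effectively caps this number, so if the hit rate drops the miss load can quickly exceed what its spindles can service.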
Hybrid storage systems are likely to become the next wave of storage solutions that medium to large sized data centers use for most of their business critical workloads. Because these systems are built from the ground up to leverage flash, they can strike the right balance of price and performance. Not all hybrid storage systems are created equal, and this article should give some guidance on what to look for in this important market. Most important is to understand the vendor's cache optimization strategies and how severe the impact of a read or write that misses the cache will be.