Datacenter SSDs…why the disappointment?

November 13, 2019

What’s Changed?

   Increasingly, we are seeing the legacy SSD suppliers echo what Burlywood has been saying since it began: cloud workloads are complex, and one-size-fits-all SSDs do not address their requirements.  The “hero” specs (IOPS, throughput, and latency) touted by the legacy suppliers can no longer be relied on by themselves when choosing an SSD for a particular environment.  Real workloads do not consist of a single block size, or of purely random or purely sequential data.  Western Digital puts it this way:

“Workloads are no longer just about traditional enterprise applications or databases. With the advent of cloud storage and the proliferation of analytics, edge devices and machine learning, IT organizations have to optimize their applications and infrastructure for new types of workloads.”

What to do?   

     Solutions that have been offered typically involve evaluating the workload at the host, then implementing some type of host-based software to optimize the SSD performance, cost or endurance.  A couple of problems come to mind with this type of approach. 

     First, this approach raises the question: “Are we unfairly burdening the host, and so affecting something else?”  The assumption that host-based compute, memory and other resources are bottomless will eventually prove naïve.

     Second, and probably more important, is another question: “Is the workload I’m measuring (or assuming) the same workload my SSD is seeing?”  The answer is a resounding “No”.  Between the application and the physical SSD are numerous layers, each an opportunity to change the nature of the IO that comes from the application or is returned from the storage device.  System caches, file systems, and various kernel-mode or hardware drivers can all affect what the SSD sees.  Virtual machine implementations and other middleware can change workloads in unpredictable ways as well.  Coalescing small-block writes into larger-block writes and caching small-block random data are two examples of the effect the hardware and software stack can have on a workload.
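To make the coalescing effect concrete, here is a minimal sketch of a write-merging layer. It is a toy model, not any particular file system or driver: the `coalesce` function, the 128 KiB merge limit and the sample offsets are all assumptions chosen for illustration.

```python
from collections import Counter


def coalesce(writes, max_bytes=128 * 1024):
    """Merge writes that are adjacent on the device into larger writes.

    `writes` is a list of (offset, length) tuples in bytes, in issue order.
    A write is merged into the previous one only if it starts exactly where
    the previous one ended and the merged size stays within `max_bytes`
    (an assumed limit, chosen for illustration).
    """
    merged = []
    for offset, length in writes:
        if merged:
            prev_offset, prev_length = merged[-1]
            adjacent = prev_offset + prev_length == offset
            if adjacent and prev_length + length <= max_bytes:
                merged[-1] = (prev_offset, prev_length + length)
                continue
        merged.append((offset, length))
    return merged


# Application view: 64 sequential 4 KiB writes, then 4 scattered 4 KiB writes.
app_writes = [(i * 4096, 4096) for i in range(64)]
app_writes += [(10_000_000 + i * 1_000_000, 4096) for i in range(4)]

device_writes = coalesce(app_writes)

print(len(app_writes), "writes issued by the application")
print(len(device_writes), "writes reaching the device")
print(Counter(length for _, length in device_writes))
```

In this toy case the application issues 68 small writes, but the device sees only 6: two 128 KiB writes and four 4 KiB writes. The same bytes are written, yet both the IO count and the block-size mix measured at the host differ from what arrives at the SSD.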

The Storage Networking Industry Association (SNIA), within its Solid State Storage Initiative (SSSI), has begun a Workload Capture IO Program (WCIOP)…how’s that for a lot of acronyms?  While the workload capture methodology does not reach all the way down to the flash level, some of the results bear out the discussion above.  In one case, they report that 40,000 IOPS were seen at the file system level while only 1,700 IOPS were observed ahead of the SSDs.  So somewhere in the HW/SW stack, the workload was changed.  Wouldn’t it be interesting to see the workload the flash actually sees, and then optimize your SSDs to accommodate it?  Burlywood’s approach to tuning our TrueFlash software starts there.

       Burlywood has implemented a capability called TrueFlash Insight in each of the SSDs developed with TrueFlash software.  TrueFlash Insight captures every IO over hours, days or weeks.  The captured data is then analyzed to produce recommendations, so that optimized SSDs can be deployed.
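As a rough illustration of what analyzing a captured IO trace can involve, the sketch below summarizes a trace into a read/write mix, a block-size histogram and a sequential fraction. It does not represent the TrueFlash Insight format or analysis, which are not described here; the `(op, offset, length)` record layout, the `summarize` function and the sample data are all hypothetical.

```python
from collections import Counter


def summarize(trace):
    """Summarize a list of (op, offset, length) records.

    op is "R" or "W"; offset and length are in bytes. An IO is counted as
    sequential if it starts exactly where the previous IO of the same type
    ended (a simplifying assumption).
    """
    ops = Counter()
    sizes = Counter()
    sequential = 0
    next_offset = {}  # op -> offset at which a sequential IO would start
    for op, offset, length in trace:
        ops[op] += 1
        sizes[length] += 1
        if next_offset.get(op) == offset:
            sequential += 1
        next_offset[op] = offset + length
    total = len(trace)
    return {
        "total_ios": total,
        "write_fraction": ops["W"] / total if total else 0.0,
        "sequential_fraction": sequential / total if total else 0.0,
        "size_histogram": dict(sizes),
    }


# Made-up sample trace, for illustration only.
trace = [
    ("W", 0, 4096),
    ("W", 4096, 4096),
    ("W", 8192, 4096),
    ("R", 1_048_576, 65536),
    ("W", 500_000_000, 16384),
    ("R", 1_114_112, 65536),
    ("R", 42_000_000, 4096),
    ("W", 500_016_384, 16384),
]

summary = summarize(trace)
for key, value in summary.items():
    print(key, value)
```

Even this small sample mixes three block sizes, reads and writes, and sequential and random access, which is exactly the kind of profile a single “hero” number cannot describe.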

      Implementing these recommendations in TrueFlash software (which is then loaded on a Burlywood controller-enabled SSD) yields performance appropriate to the true workload and the endurance required for a particular environment.  These recommendations also often lead to hard cost reductions through the use of the appropriate grade or vendor of flash, less overprovisioning, or less DRAM on the SSD.

      Most of the elements of today’s datacenter architectures have been designed to accommodate evolving requirements… the fundamental storage components like SSDs.  In one of its blogs, Micron notes, “As hardware continues to progress in leaps and bounds, getting full utilization of your hardware with legacy software can be a challenge”.  Yet the response to this challenge seems to fall on the shoulders of other elements of the datacenter (host-based software or ???).  Burlywood’s response is to address utilization of the SSD inside the SSD.

What’s happening with my IO?

      So, you are encouraged to ask yourself the following:

  • As my data makes its way to the flash media, how do its characteristics change?
  • How often is data written to my flash?
  • How often is my data erased?
  • What background processes (e.g., garbage collection) are being run, and when?
  • What effect is overprovisioning having on my performance or SSD life?

And most importantly:

  • Am I paying too much for what I am getting out of my current SSD choice?
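Several of these questions can be put into numbers once the writes reaching the flash are known. The sketch below uses the commonly cited definitions of overprovisioning and write amplification, plus a simple lifetime estimate. All figures (capacities, P/E cycle rating, daily write volume) are made up for illustration, and the function names are hypothetical.

```python
def overprovisioning(physical_gb, user_gb):
    """Spare capacity as a fraction of user-visible capacity."""
    return (physical_gb - user_gb) / user_gb


def write_amplification(nand_gb_written, host_gb_written):
    """Data written to flash divided by data written by the host."""
    return nand_gb_written / host_gb_written


def lifetime_years(physical_gb, pe_cycles, host_gb_per_day, waf):
    """Rough endurance estimate, assuming ideal wear leveling.

    Total flash write budget = physical capacity * rated P/E cycles.
    Flash writes per day     = host writes per day * write amplification.
    """
    total_nand_gb = physical_gb * pe_cycles
    return total_nand_gb / (host_gb_per_day * waf) / 365


# Illustrative numbers only (assumptions, not measurements).
op = overprovisioning(physical_gb=1024, user_gb=960)
waf = write_amplification(nand_gb_written=3000, host_gb_written=1000)

years_now = lifetime_years(1024, pe_cycles=3000, host_gb_per_day=1000, waf=waf)
years_tuned = lifetime_years(1024, pe_cycles=3000, host_gb_per_day=1000, waf=1.5)

print(f"overprovisioning: {op:.1%}")
print(f"write amplification: {waf:.2f}")
print(f"estimated life at WAF {waf:.1f}: {years_now:.1f} years")
print(f"estimated life at WAF 1.5: {years_tuned:.1f} years")
```

Under these assumptions, halving write amplification doubles the estimated drive life, which is why knowing what actually reaches the flash bears directly on the cost question.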

Burlywood’s team is ready to help answer these questions.