* SSD levels
Initially SSDs stored a single bit per cell, distinguishing two voltage levels: the Single-Level Cell (SLC). Then vendors figured out how to store 4 levels (2 bits) and read them reliably: the Multi-Level Cell (MLC). Then came the Triple-Level Cell (TLC, 3 bits) and the Quad-Level Cell (QLC, 4 bits).
SLC vs. QLC:
Reliability: SLC is more reliable. You can erase SLC cells millions of times before they wear out; QLC cells can be erased only a few hundred times.
Example: a Samsung 8TB QLC drive. Assume each cell can be erased only 100 times. If the FTL spreads wear well, you'd have to overwrite the whole device 100 times before it wears out, i.e., write 8TB * 100 == 800TB.
Speed: SLC is faster. QLC has to work harder to distinguish b/t the levels, and also needs more ECC to correct mistakes.
Complexity: QLC requires more complex firmware, so more chances for bugs.
Cost: QLC is cheaper per gig than SLC. MLC and TLC fall in b/t SLC and QLC on all of the above.
* erasing or discarding blocks
HDDs are "write in place," so erasing a file doesn't require coordinating w/ the HDD (modulo "secure erase"). With SSDs, the host must tell the device when blocks are no longer used by any file, so the SSD can reclaim them as garbage later on. Otherwise those blocks remain valid indefinitely (a form of "memory leak"). For that reason, storage protocols gained a new command for this: ATA calls it TRIM, SCSI calls it UNMAP. The TRIM command informs the device that one or more blocks are no longer needed. Modern f/s had to be modified so that ops that delete objects issue TRIM commands through the device driver (see the discard sketch after the SMR notes).
* read disturb
If you read the same value over and over many times, it can "disturb" adjacent bits (bits may flip). Usually the SSD f/w handles this. Higher-level s/w may also do "scrubbing" (periodic reading and rewriting).
* Optimizations
HDDs behave best when written to sequentially and are terrible for random reads/writes. SSDs inherently scatter writes across internal flash pages and handle random I/O much faster than HDDs; sequential workloads on SSDs are faster still. Random writes to SSDs, however, lead to more fragmentation and more GC work, and GC can result in very high tail latencies (several seconds). Best is for workloads to write to SSDs sequentially.
* Special coding
Erasing an SSD bank turns all bits to 1s. To "write," we only need to turn some 1s into 0s. To turn 0s back into 1s, the bank must be re-erased. But any other cells that are still 1s can still be turned into 0s, which can encode additional information and allow for more effective capacity (see the programming-rule sketch below).
* SMR
SMR: Shingled Magnetic Recording. The read head is smaller than the write head. Writes to adjacent tracks (shingles) may eventually destroy old data, so it has to be copied before it's lost; but the old data can still be read before it's destroyed, b/c the read head is much narrower. SMR drives use "zones" that can only be appended to (see the zone sketch below). When a zone (several gigs) is full, live data has to be copied elsewhere -- just like SSD GC. Thus an indirection layer is needed: the Shingled Translation Layer (STL). Part of an SMR drive uses conventional (non-shingled) zones for storing m-d like the STL itself, and for caching new writes. SMR f/w is much more complex than that of Conventional Magnetic Recording (CMR) drives. SMR promised at least 4-10x more capacity in the same form factor, starting around the mid-2000s. SMR drives suffer from mechanical latencies as well as GC tail latencies, hence they're not getting much commercial use.
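A minimal sketch of what a file system's delete path ultimately triggers: telling the block device that a byte range is no longer in use. It uses the Linux BLKDISCARD ioctl; the device path /dev/sdX and the 1 MiB range are placeholders picked for illustration, and discarding a range on a real device destroys the data in it.

#include <fcntl.h>
#include <linux/fs.h>      /* BLKDISCARD */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    /* /dev/sdX is a placeholder -- use a scratch device you can wipe */
    int fd = open("/dev/sdX", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* tell the device that 1 MiB starting at offset 0 is unused:
     * range[0] = start in bytes, range[1] = length in bytes */
    uint64_t range[2] = { 0, 1ULL << 20 };
    if (ioctl(fd, BLKDISCARD, &range) < 0)
        perror("BLKDISCARD");   /* e.g., device doesn't support discard */

    close(fd);
    return 0;
}

In practice the f/s issues these automatically (e.g., the "discard" mount option on ext4/xfs) or in batches via the fstrim tool.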
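A toy illustration of the programming rule from the "Special coding" notes: erase sets every bit of a bank to 1, and programming can only clear bits, so new data can land on top of old data without an erase iff it never asks a 0 to become a 1 again. The function name programmable and the byte-sized "cells" are made up for this sketch; real devices program whole pages.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* true if `next` can be programmed over `cur` without an erase,
 * i.e., `next` only clears bits of `cur` (1 -> 0), never sets them */
static bool programmable(uint8_t cur, uint8_t next)
{
    return (cur & next) == next;
}

int main(void)
{
    uint8_t erased = 0xFF;                       /* freshly erased: all 1s */
    printf("%d\n", programmable(erased, 0xA5));  /* 1: any value fits      */
    printf("%d\n", programmable(0xA5, 0x25));    /* 1: only clears bits    */
    printf("%d\n", programmable(0xA5, 0xE5));    /* 0: would need an erase */
    return 0;
}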
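A toy model of the append-only zone interface mentioned in the SMR notes (the same write model flash erase blocks impose): data is appended at a write pointer, in-place overwrites are not allowed, and space is reclaimed only by resetting the whole zone after its live data has been copied elsewhere. The struct layout and the 1 MiB zone size are invented for illustration; real SMR zones are typically hundreds of MiB and are managed through the host's zoned block device support (e.g., the blkzone tool on Linux).

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ZONE_SIZE (1u << 20)   /* pretend a zone holds 1 MiB */

struct zone {
    uint8_t  data[ZONE_SIZE];  /* the shingled region */
    uint32_t wp;               /* write pointer: next append offset */
};

/* Append-only write: succeeds only at the write pointer, never in the
 * middle of the zone.  Returns -1 when the zone is full, at which point
 * live data must be garbage-collected into another zone. */
static int zone_append(struct zone *z, const void *buf, uint32_t len)
{
    if (len > ZONE_SIZE - z->wp)
        return -1;
    memcpy(z->data + z->wp, buf, len);
    z->wp += len;
    return 0;
}

/* Resetting a zone throws away its contents, like a flash erase. */
static void zone_reset(struct zone *z)
{
    z->wp = 0;
}

int main(void)
{
    static struct zone z;      /* static: too big for the stack */
    zone_reset(&z);
    zone_append(&z, "hello", 5);
    printf("write pointer after append: %u\n", z.wp);
    return 0;
}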
* long term storage
Project Silica: Microsoft, storing data in glass using a femtosecond laser. A WORM device (Write Once, Read Many).
DNA: jointly w/ Microsoft, but others too. Encoding data in DNA strands. See papers in the recent special issue of ACM Transactions on Storage (TOS).
* Next time
Network storage