* SSD levels
Initially SSDs stored a single bit per cell, distinguishing two voltage levels: the Single-Level Cell (SLC). Then vendors figured out how to store 4 levels (2 bits) and read them reliably: the Multi-Level Cell (MLC). Then came the Triple-Level Cell (TLC, 3 bits) and the Quad-Level Cell (QLC, 4 bits).
SLC vs. QLC:
Reliability: SLC is more reliable. You can erase SLC cells millions of times before they wear out; QLC cells can be erased only a few hundred times.
Example: a Samsung 8TB QLC drive. Assume each cell can be erased only 100 times. If the FTL spreads wear well, you'd have to overwrite the whole device 100 times before it wears out, i.e., write 8TB * 100 == 800TB.
Speed: SLC is faster. QLC has to work harder to distinguish b/t the levels, and also needs more ECC to correct mistakes.
Complexity: QLC requires more complex firmware, so more chances for bugs.
Cost: QLC is cheaper per gig than SLC. MLC and TLC fall in b/t SLC and QLC on all of the above.
* erasing or discarding blocks
HDDs are "write in place," so erasing a file doesn't require coordinating w/ the HDD (modulo "secure erase"). With SSDs, the host must tell the device when blocks are no longer used by any file, so the SSD can reclaim them as garbage later on. Otherwise those blocks remain valid indefinitely (a form of "memory leak"). For that reason, storage protocols gained a new command for this: ATA calls it TRIM, SCSI calls it UNMAP. The TRIM command informs the device that one or more blocks are no longer needed. Modern f/s had to be modified so that ops that delete objects issue TRIM commands through the device driver (see the discard sketch after the SMR notes).
* read disturb
If you read the same value over and over many times, it can "disturb" adjacent bits (bits may flip). Usually the SSD f/w handles this. Higher-level s/w may also do "scrubbing" (periodic reading and rewriting).
* Optimizations
HDDs behave best when written to sequentially and are terrible for random reads/writes. SSDs inherently scatter writes across internal flash pages and handle random I/O much faster than HDDs; sequential workloads on SSDs are faster still. Random writes to SSDs, however, lead to more fragmentation and more GC work, and GC can result in very high tail latencies (several seconds). Best is for workloads to write to SSDs sequentially.
* Special coding
Erasing an SSD bank turns all bits to 1s. To "write," we only need to turn some 1s into 0s. To turn 0s back into 1s, the bank must be re-erased. But any other cells that are still 1s can still be turned into 0s, which can encode additional information and allow for more effective capacity (see the programming-rule sketch below).
* SMR
SMR: Shingled Magnetic Recording. The read head is smaller than the write head. Writes to adjacent tracks (shingles) may eventually destroy old data, so it has to be copied before it's lost; but the old data can still be read before it's destroyed, b/c the read head is much narrower. SMR drives use "zones" that can only be appended to (see the zone sketch below). When a zone (several gigs) is full, live data has to be copied elsewhere -- just like SSD GC. Thus an indirection layer is needed: the Shingled Translation Layer (STL). Part of an SMR drive uses conventional (non-shingled) zones for storing m-d like the STL itself, and for caching new writes. SMR f/w is much more complex than that of Conventional Magnetic Recording (CMR) drives. SMR promised at least 4-10x more capacity in the same form factor, starting around the mid-2000s. SMR drives suffer from mechanical latencies as well as GC tail latencies, hence they're not getting much commercial use.
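A minimal sketch of what a file system's delete path ultimately triggers: telling the block device that a byte range is no longer in use. It uses the Linux BLKDISCARD ioctl; the device path /dev/sdX and the 1 MiB range are placeholders picked for illustration, and discarding a range on a real device destroys the data in it.

#include <fcntl.h>
#include <linux/fs.h>      /* BLKDISCARD */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    /* /dev/sdX is a placeholder -- use a scratch device you can wipe */
    int fd = open("/dev/sdX", O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* tell the device that 1 MiB starting at offset 0 is unused:
     * range[0] = start in bytes, range[1] = length in bytes */
    uint64_t range[2] = { 0, 1ULL << 20 };
    if (ioctl(fd, BLKDISCARD, &range) < 0)
        perror("BLKDISCARD");   /* e.g., device doesn't support discard */

    close(fd);
    return 0;
}

In practice the f/s issues these automatically (e.g., the "discard" mount option on ext4/xfs) or in batches via the fstrim tool.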
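A toy illustration of the programming rule from the "Special coding" notes: erase sets every bit of a bank to 1, and programming can only clear bits, so new data can land on top of old data without an erase iff it never asks a 0 to become a 1 again. The function name programmable and the byte-sized "cells" are made up for this sketch; real devices program whole pages.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* true if `next` can be programmed over `cur` without an erase,
 * i.e., `next` only clears bits of `cur` (1 -> 0), never sets them */
static bool programmable(uint8_t cur, uint8_t next)
{
    return (cur & next) == next;
}

int main(void)
{
    uint8_t erased = 0xFF;                       /* freshly erased: all 1s */
    printf("%d\n", programmable(erased, 0xA5));  /* 1: any value fits      */
    printf("%d\n", programmable(0xA5, 0x25));    /* 1: only clears bits    */
    printf("%d\n", programmable(0xA5, 0xE5));    /* 0: would need an erase */
    return 0;
}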
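A toy model of the append-only zone interface mentioned in the SMR notes (the same write model flash erase blocks impose): data is appended at a write pointer, in-place overwrites are not allowed, and space is reclaimed only by resetting the whole zone after its live data has been copied elsewhere. The struct layout and the 1 MiB zone size are invented for illustration; real SMR zones are typically hundreds of MiB and are managed through the host's zoned block device support (e.g., the blkzone tool on Linux).

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ZONE_SIZE (1u << 20)   /* pretend a zone holds 1 MiB */

struct zone {
    uint8_t  data[ZONE_SIZE];  /* the shingled region */
    uint32_t wp;               /* write pointer: next append offset */
};

/* Append-only write: succeeds only at the write pointer, never in the
 * middle of the zone.  Returns -1 when the zone is full, at which point
 * live data must be garbage-collected into another zone. */
static int zone_append(struct zone *z, const void *buf, uint32_t len)
{
    if (len > ZONE_SIZE - z->wp)
        return -1;
    memcpy(z->data + z->wp, buf, len);
    z->wp += len;
    return 0;
}

/* Resetting a zone throws away its contents, like a flash erase. */
static void zone_reset(struct zone *z)
{
    z->wp = 0;
}

int main(void)
{
    static struct zone z;      /* static: too big for the stack */
    zone_reset(&z);
    zone_append(&z, "hello", 5);
    printf("write pointer after append: %u\n", z.wp);
    return 0;
}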
* long term storage
Project Silica: Microsoft, storing data in glass using a femtosecond laser. A WORM device (Write Once, Read Many).
DNA: jointly w/ Microsoft, but others too. Encoding data in DNA strands. See papers in the recent special issue of ACM Transactions on Storage (TOS).
* Next time
Network storage