* SAS/SATA/SCSI control codes

Commands include "read" and "write". Control codes:

1. SMART

2. spin up/down
   Lots of s/w and research to spin down HDDs: saves energy, extends
   lifetime. But if the disk has to spin back up, you may wait as much as
   60s. Lots of predictive analytics work (pre-AI) to determine when to
   spin back up.

3. WRITE BARRIER: tells the HDD to flush all writes in the HDD's cache and
   return a message when all are committed to persistent media. Useful to
   ensure that all dirty data is committed and not lost.

   Normally, writes to disk leave dirty data in the HDD's cache. Firmware
   will pick up those dirty writes eventually and flush them to media.

   What happens on power failures? Assume there is no UPS or backup
   generator, or they've run out of power. Eventually no power is fed to
   internal components. Some systems/devices try to protect themselves
   from sudden power loss/brownouts with a battery (laptop) or a large
   capacitor (HDDs, etc.). If the HDD detects a power failure, it has to
   flush dirty buffers quickly, before the platters lose too much speed
   and the head can no longer be moved.

   As power drains in a computer system, the contents of DRAM tend to
   fluctuate. DRAM contents could get corrupted AS you're flushing them:
   worst case is writing corrupted data persistently.

   Write barriers are used by apps/OS to ensure that data up to time
   point T is flushed. Users/apps can use fsync/sync/fflush/etc. (fflush
   alone only moves stdio buffers into the OS; follow it with fsync to
   reach the disk). This tells the OS to flush its data all the way down
   to the disks, and the disks should flush theirs.

   Popular uses of write barriers:
   - databases, to store their transaction logs
   - file systems, to write their journal
   - apps that need to write important metadata

   If you overuse flushing commands, the system will be very slow.
   Lesson: there's always a tradeoff b/t performance and
   reliability/availability.

* failure modes in disks

1. sectors go bad; they can be remapped
2. electronics die, motors fail, or the actuator that moves the head fails
3. bits are lost over time, called "bitrot": use internal ECC to recover,
   or use external integrity means (e.g., RAID, TBD)
4. disks don't flush data: bad firmware, manufacturer issues
5. firmware bugs can result in a "lost write" or "misdirected write"

Consumer disks are cheaper and more likely to have such issues.
Enterprise drives are more expensive and less likely to have them.
Vendors who buy many disks and repackage them do a lot of internal
testing, "burn in", and hardening.

* scrubbing

Over time, voltages (in SSDs) and magnetic strength (in HDDs) can
degrade. The device has to determine the right threshold to count as '0'
vs. '1', but there's always an in-between level where it's unclear.
Rather than wait for too much degradation, we read the data periodically
and re-write it, a process called scrubbing.

Too much scrubbing can cause performance bottlenecks and increase wear
and tear, ironically shortening device lifetime. Professional systems
often set scrubbing to once a month (configurable). Scrubbing is not
built into devices: you need OS-level s/w to do that.

* secure deletion

Overwriting bits in an HDD can leave magnetic "echoes" of previous
values. With appropriate equipment (electron microscope), you may be
able to decipher those older bits and reconstruct (some) old data. Need
to wipe data before disposing of or selling any storage device.

Wrong approaches:

1. drag all your files to the trashbin: they're still there

2. empty the trashbin: better, but info still on disk

3. OS doesn't delete all file content, only the name and metadata. The
   rest of the info is still on the disk, somewhere, and can be retrieved
   with one of many file recovery tools or services. Note that
   reformatting the disk does NOT delete all data; it only overwrites
   some metadata to create a new structure for writing new files.

4. better: go overwrite the data in its physical location. This is
   called "secure deletion".
   - simple: write all zeros, e.g.
       dd if=/dev/zero of=/dev/sda bs=4k
   - better: write some "noise", random data
       dd if=/dev/random of=/dev/sda bs=4k
     (on older Linux kernels /dev/random can block; /dev/urandom is the
     usual non-blocking choice)
   - stronger: multiple passes with all zeros, then all 1s, then random,
     then alternating patterns such as 0x5a5a5a5a

There are Dept. of Defense (DoD) and NIST standards for multiple rounds
of secure erasing, with progressively stronger guarantees. Strongest is
a 7-pass sequence.

The secure erasing techniques above work for HDDs that "write in place",
not for SSDs! TBD.
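
The multi-pass overwrite described above (zeros, then ones, then random, then an alternating pattern) can be sketched in Python. This is only an illustration under stated assumptions: the function name overwrite_passes and the pass list are made up for these notes and do not reproduce any DoD or NIST procedure, and the demo runs on a throwaway temp file rather than a raw device. Overwriting a file through a file system also does not guarantee that the same physical blocks get hit (journaling and copy-on-write file systems, SSDs), so treat it as a picture of the idea, not a wiping tool.

```python
import os
import tempfile

# Hypothetical pass list for illustration: zeros, ones, random, 0x5a pattern.
# None stands for "fill with random bytes".
PASSES = [b"\x00", b"\xff", None, b"\x5a"]

def overwrite_passes(path, passes=PASSES, block_size=4096):
    """Overwrite an existing file in place, once per pass.

    Returns the number of bytes written during the final pass.
    """
    size = os.path.getsize(path)
    written = 0
    for fill in passes:
        # "r+b" opens for in-place writing without truncating the file.
        with open(path, "r+b") as f:
            remaining = size
            written = 0
            while remaining > 0:
                n = min(block_size, remaining)
                chunk = os.urandom(n) if fill is None else fill * n
                f.write(chunk)
                remaining -= n
                written += n
            f.flush()             # Python buffer -> OS
            os.fsync(f.fileno())  # ask the OS to push this pass to the device
    return written

# Demo on a scratch file (never aim a sketch like this at a real device).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"secret data" * 1000)
demo_path = tmp.name
demo_written = overwrite_passes(demo_path)
with open(demo_path, "rb") as f:
    demo_after = f.read()
os.remove(demo_path)
```

Each pass ends with an fsync so that one pass is pushed toward the device before the next one starts; without it, the OS cache could absorb several passes and write only the last.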
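
Back to write barriers: the way an app (e.g., a database writing its transaction log) asks for data to be flushed "all the way down" can be sketched as follows. The helper name durable_append and the log file name are hypothetical. Whether the flush really reaches persistent media depends on the OS, the file system, and the drive honoring the request, so this is a sketch of the call pattern, not a durability guarantee.

```python
import os
import tempfile

def durable_append(path, record):
    """Append a record, then ask the OS to flush it down to the device.

    Returns the file size after the append.
    """
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # user-space buffer -> OS page cache (like C fflush)
        os.fsync(f.fileno())  # OS page cache -> device; the device should flush too
    return os.path.getsize(path)

# Demo: two log records appended to a scratch log file.
with tempfile.TemporaryDirectory() as d:
    log_path = os.path.join(d, "txn.log")
    size_after_first = durable_append(log_path, b"BEGIN;\n")
    size_after_second = durable_append(log_path, b"COMMIT;\n")
    with open(log_path, "rb") as f:
        log_contents = f.read()
```

Calling fsync after every record is the slow-but-safe end of the performance vs. reliability tradeoff; batching several records per fsync moves toward the fast end.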