
Here at ExtremeTech, we've often discussed the difference between different types of NAND structures — vertical NAND versus planar, or multi-level cell (MLC) versus triple-level cell (TLC) and quad-level cell (QLC). Now, let's talk about the more basic question: How do SSDs work in the first place, and how do they compare with newer technologies, like Intel's non-volatile storage technology, Optane?

To understand how and why SSDs are different from spinning discs, we need to talk a little bit about hard drives. A hard drive stores data on a series of spinning magnetic disks called platters. There's an actuator arm with read/write heads attached to it. This arm positions the read-write heads over the correct area of the drive to read or write information.

Because the drive heads must align over an area of the disk in order to read or write data, and the disk is constantly spinning, there's a delay before data can be accessed. The drive may need to read from multiple locations in order to launch a program or load a file, which means it may have to wait for the platters to spin into the proper position multiple times before it can complete the command. If a drive is asleep or in a low-power state, it can take several seconds more for the disk to spin up to full power and begin operating.

From the very beginning, it was clear that hard drives couldn't possibly match the speeds at which CPUs operate. Latency in HDDs is measured in milliseconds, compared with nanoseconds for your typical CPU. One millisecond is 1,000,000 nanoseconds, and it typically takes a hard drive 10-15 milliseconds to find data on the drive and begin reading it. The hard drive industry introduced smaller platters, on-disk memory caches, and faster spindle speeds to counteract this trend, but there's only so fast drives can spin. Western Digital's 10,000 RPM VelociRaptor family is the fastest set of drives ever built for the consumer market, while some enterprise drives spun as quickly as 15,000 RPM. The problem is, even the fastest spinning drive with the largest caches and smallest platters is still achingly slow as far as your CPU is concerned.

How SSDs Are Different

"If I had asked people what they wanted, they would have said faster horses." — Henry Ford

Solid-state drives are called that specifically because they don't rely on moving parts or spinning disks. Instead, data is saved to a pool of NAND flash. NAND itself is made up of what are called floating gate transistors. Unlike the transistor designs used in DRAM, which must be refreshed multiple times per second, NAND flash is designed to retain its charge state even when not powered up. This makes NAND a type of non-volatile memory.

Flash cell structure

Image by Cyferz at Wikipedia, Creative Commons Attribution-Share Alike 3.0.

The diagram above shows a simple flash cell design. Electrons are stored in the floating gate, which then reads as charged "0" or not-charged "1." Yes, in NAND flash, a 0 means data is stored in a cell — it's the opposite of how we typically think of a zero or one. NAND flash is organized in a grid. The entire grid layout is referred to as a block, while the individual rows that make up the grid are called a page. Common page sizes are 2K, 4K, 8K, or 16K, with 128 to 256 pages per block. Block size therefore typically varies between 256KB and 4MB.
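The geometry is easy to check with a bit of arithmetic: block size is simply page size multiplied by pages per block. A quick sketch in Python, using the example values from the text:

```python
def block_size_kb(page_size_kb, pages_per_block):
    # A block is just a grid of pages, so its size is the product.
    return page_size_kb * pages_per_block

# Smallest common geometry: 2K pages, 128 pages per block.
print(block_size_kb(2, 128))    # 256 KB
# Largest common geometry: 16K pages, 256 pages per block.
print(block_size_kb(16, 256))   # 4096 KB, i.e. 4 MB
```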

One advantage of this system should be immediately obvious. Because SSDs have no moving parts, they can operate at speeds far above those of a typical HDD. The following chart shows the access latency for typical storage mediums, given in microseconds.

NAND is nowhere near as fast as main memory, but it's multiple orders of magnitude faster than a hard drive. While write latencies are significantly slower for NAND flash than read latencies, they still outstrip traditional spinning media.

There are two things to notice in the above chart. First, note how adding more bits per cell of NAND has a significant impact on the memory's performance. It's worse for writes than for reads — typical triple-level-cell (TLC) latency is 4x worse compared with single-level cell (SLC) NAND for reads, but 6x worse for writes. Erase latencies are also significantly impacted. The impact isn't proportional, either — TLC NAND is about twice as slow as MLC NAND, despite holding just 50% more data (three bits per cell, instead of two). This also holds for QLC drives, which store even more bits at varying voltage levels within the same cell.

The reason TLC NAND is slower than MLC or SLC has to do with how data moves in and out of the NAND cell. With SLC NAND, the controller only needs to know if the bit is a 0 or a 1. With MLC NAND, the cell may have four values — 00, 01, 10, or 11. With TLC NAND, the cell can have eight values, and QLC has 16. Reading the proper value out of the cell requires the memory controller to use a precise voltage to determine whether any particular cell is charged.
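The relationship between bits per cell and charge states is simple exponential growth — each extra bit doubles the number of voltage levels the controller must tell apart. A minimal sketch (the helper name is ours, purely for illustration):

```python
def charge_states(bits_per_cell):
    # Each additional bit per cell doubles the number of distinct
    # voltage levels the controller has to distinguish reliably.
    return 2 ** bits_per_cell

for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    print(f"{name}: {charge_states(bits)} states")  # 2, 4, 8, 16
```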

Reads, Writes, and Erasure

One of the functional limitations of SSDs is that while they can read and write data very quickly to an empty drive, overwriting data is much slower. This is because while SSDs read data at the page level (meaning from individual rows within the NAND memory grid) and can write at the page level, assuming surrounding cells are empty, they can only erase data at the block level. This is because the act of erasing NAND flash requires a high amount of voltage. While you can theoretically erase NAND at the page level, the amount of voltage required stresses the individual cells around the cells that are being re-written. Erasing data at the block level helps mitigate this problem.

The only way for an SSD to update an existing page is to copy the contents of the entire block into memory, erase the block, and then write the contents of the old block plus the updated page. If the drive is full and there are no empty pages available, the SSD must first scan for blocks that are marked for deletion but that haven't been deleted yet, erase them, and then write the data to the now-erased page. This is why SSDs can become slower as they age — a mostly-empty drive is full of blocks that can be written immediately, while a mostly-full drive is more likely to be forced through the entire program/erase sequence.
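The read-erase-rewrite dance described above can be sketched in a few lines. This is a toy model, not firmware — blocks are just lists of pages — but it shows why touching one page costs a whole block's worth of work:

```python
ERASED = None  # sentinel for an erased (writable) page

def update_page(block, page_index, new_data):
    """Toy model of an SSD updating a single page in place.

    NAND can't overwrite a programmed page, so the controller must:
    1. copy the entire block into memory,
    2. erase the whole block,
    3. program every old page again, plus the updated one.
    """
    snapshot = list(block)               # 1. read the whole block out
    for i in range(len(block)):          # 2. block-level erase
        block[i] = ERASED
    for i, page in enumerate(snapshot):  # 3. rewrite everything
        block[i] = new_data if i == page_index else page
    return block

block = ["A", "B", "C", "D"]
update_page(block, 1, "B'")
print(block)  # ['A', "B'", 'C', 'D'] — yet all four pages were rewritten
```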

If you've used SSDs, you've likely heard of something called "garbage collection." Garbage collection is a background process that allows a drive to mitigate the performance impact of the program/erase cycle by performing certain tasks when the drive is idle. The following image steps through the garbage collection process.

Garbage collection

Image courtesy of Wikipedia

Note in this example, the drive has taken advantage of the fact that it can write very quickly to empty pages by writing new values for the first four blocks (A'-D'). It's also written two new blocks, E and H. Blocks A-D are now marked as stale, meaning they contain data the drive has flagged as out-of-date. During an idle period, the SSD will move the fresh pages over to a new block, erase the old block, and mark it as free space. This means the next time the SSD needs to perform a write, it can write directly to the now-empty Block X, rather than performing the program/erase cycle.
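A minimal garbage-collection pass, mirroring the figure, might look like the toy model below (real controllers track all of this inside the flash translation layer; the data structures here are purely illustrative):

```python
def garbage_collect(blocks, free_block):
    """Toy garbage-collection pass.

    blocks: lists of (data, stale) pages; free_block: an empty list.
    Fresh (non-stale) pages are relocated into free_block, then each
    donor block is erased so it can accept new writes immediately.
    """
    for block in blocks:
        for data, stale in block:
            if not stale:
                free_block.append((data, False))  # move fresh pages over
        block.clear()                             # block-level erase

old = [[("A", True), ("A'", False)], [("B", True), ("B'", False)]]
new_block = []
garbage_collect(old, new_block)
print(new_block)  # [("A'", False), ("B'", False)]
print(old)        # [[], []] — both old blocks are free space again
```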

The next concept I want to discuss is TRIM. When you delete a file from Windows on a typical hard drive, the file isn't deleted immediately. Instead, the operating system tells the hard drive it can overwrite the physical area of the disk where that data was stored the next time it needs to perform a write. This is why it's possible to undelete files (and why deleting files in Windows doesn't typically clear much physical disk space until you empty the recycle bin). With a traditional HDD, the OS doesn't need to pay attention to where data is being written or what the relative state of the blocks or pages is. With an SSD, this matters.

The TRIM command allows the operating system to tell the SSD it can skip rewriting certain data the next time it performs a block erase. This lowers the total amount of data the drive writes and increases SSD longevity. Both reads and writes damage NAND flash, but writes do far more damage than reads. Fortunately, block-level longevity has not proven to be an issue in modern NAND flash. More information on SSD longevity, courtesy of the Tech Report, can be found here.

The last two concepts we want to talk about are wear leveling and write amplification. Because SSDs write data to pages but erase data in blocks, the amount of data being written to the drive is always larger than the actual update. If you make a change to a 4KB file, for instance, the entire block that 4K file sits within must be updated and rewritten. Depending on the number of pages per block and the size of the pages, you might end up writing 4MB worth of data to update a 4KB file. Garbage collection reduces the impact of write amplification, as does the TRIM command. Keeping a significant chunk of the drive free and/or manufacturer over-provisioning can also reduce the impact of write amplification.
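Write amplification is usually expressed as a factor: bytes physically written to the NAND divided by bytes the host actually asked to write. The 4KB-in-a-4MB-block example works out like this (sketch):

```python
def write_amplification(host_bytes, nand_bytes):
    # WAF = data physically written to flash / data the host requested.
    # A factor of 1.0 would mean no amplification at all.
    return nand_bytes / host_bytes

# Worst case from the text: updating a 4KB file forces a rewrite
# of the entire 4MB block it lives in.
waf = write_amplification(4 * 1024, 4 * 1024 * 1024)
print(waf)  # 1024.0 — a thousand-fold amplification
```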

Wear leveling refers to the practice of ensuring certain NAND blocks aren't written and erased more frequently than others. While wear leveling increases a drive's life expectancy and endurance by writing to the NAND evenly, it can actually increase write amplification. In order to distribute writes evenly across the disk, it's sometimes necessary to program and erase blocks even though their contents haven't actually changed. A good wear leveling algorithm seeks to balance these impacts.
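The simplest possible wear-leveling policy just steers each new write to the free block with the fewest erases so far. A sketch (the data structures are hypothetical; real algorithms also weigh data "hotness" and the cost of relocating static data):

```python
def pick_block(erase_counts, free_blocks):
    """Choose the free block with the lowest erase count so far.

    erase_counts: dict mapping block id -> number of erases.
    free_blocks: iterable of block ids currently eligible for writing.
    """
    return min(free_blocks, key=lambda b: erase_counts[b])

counts = {0: 120, 1: 45, 2: 118, 3: 46}
print(pick_block(counts, [0, 1, 2, 3]))  # 1 — the least-worn block
```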

The SSD Controller

It should be obvious by now that SSDs require much more sophisticated control mechanisms than hard drives do. That's not to diss magnetic media — I actually think HDDs deserve more respect than they are given. The mechanical challenges involved in balancing multiple read-write heads nanometers above platters that spin at 5,400 to 10,000 RPM are nothing to sneeze at. The fact that HDDs meet this challenge while pioneering new methods of recording to magnetic media, and eventually wind up selling drives at 3-5 cents per gigabyte, is simply incredible.

SSD controller

A typical SSD controller

SSD controllers, however, are in a class by themselves. They often have a DDR3 or DDR4 memory pool to help with managing the NAND itself. Many drives also incorporate single-level cell caches that act as buffers, increasing drive performance by dedicating fast NAND to read/write cycles. Because the NAND flash in an SSD is typically connected to the controller through a series of parallel memory channels, you can think of the drive controller as performing some of the same load-balancing work as a high-end storage array — SSDs don't deploy RAID internally, but wear leveling, garbage collection, and SLC cache management all have parallels in the big iron world.

Some drives also use data compression algorithms to reduce the total number of writes and improve the drive's lifespan. The SSD controller handles error correction, and the algorithms that correct for single-bit errors have become increasingly complex as time has passed.
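Modern controllers lean on heavyweight codes like BCH or LDPC, but the classic Hamming(7,4) code is enough to illustrate the core idea of single-bit correction: a few parity bits let the decoder compute exactly which bit flipped. This is purely illustrative — no real SSD ships this code:

```python
def hamming74_encode(d1, d2, d3, d4):
    # Three parity bits protect four data bits; the classic layout
    # over positions 1..7 is: p1 p2 d1 p3 d2 d3 d4.
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    c = list(codeword)
    p1, p2, d1, p3, d2, d3, d4 = c
    # Each syndrome bit re-checks one parity group; together the three
    # bits spell out the 1-based position of a flipped bit (0 = clean).
    syndrome = (p1 ^ d1 ^ d2 ^ d4) \
        | ((p2 ^ d1 ^ d3 ^ d4) << 1) \
        | ((p3 ^ d2 ^ d3 ^ d4) << 2)
    if syndrome:
        c[syndrome - 1] ^= 1  # correct the single-bit error in place
    return [c[2], c[4], c[5], c[6]]  # extract the data bits

code = hamming74_encode(1, 0, 1, 1)
code[3] ^= 1                    # simulate one flipped bit in storage
print(hamming74_decode(code))   # [1, 0, 1, 1] — data recovered intact
```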

Unfortunately, we can't go into too much detail on SSD controllers because companies lock down their various secret sauces. Much of NAND flash's performance is determined by the underlying controller, and companies aren't willing to lift the hood too far on how they do what they do, lest they hand a competitor an advantage.

Interfaces

In the beginning, SSDs used SATA ports, just like hard drives. In recent years, we've seen a shift to M.2 drives — very thin drives, several inches long, that slot directly into the motherboard (or, in a few cases, into a mounting bracket on a PCIe riser card). A Samsung 970 EVO Plus drive is shown below.


NVMe drives offer higher performance than traditional SATA drives because they support a faster interface. Conventional SSDs attached via SATA top out at ~550MB/s in terms of practical read/write speeds. M.2 NVMe drives are capable of substantially faster performance, into the 3.2GB/s range.

The Road Alee

NAND flash offers an enormous improvement over hard drives, but it isn't without its own drawbacks and challenges. Drive capacities and price-per-gigabyte are expected to continue to rise and fall respectively, but there's little chance SSDs will catch hard drives in price-per-gigabyte. Shrinking process nodes are a significant challenge for NAND flash — while most hardware improves as the node shrinks, NAND becomes more fragile. Data retention times and write performance are intrinsically lower for 20nm NAND than 40nm NAND, even if data density and total capacity are vastly improved. Thus far, we've seen drives with up to 96 layers in-market, and 128 layers seems plausible at this point. Overall, the shift to 3D NAND has helped improve density without shrinking process nodes or relying on planar scaling.

Thus far, SSD manufacturers have delivered better performance by offering faster data standards, more bandwidth, and more channels per controller — plus the SLC caches we mentioned earlier. Nonetheless, in the long run, it's assumed NAND will be replaced by something else.

What that something else will look like is still open for debate. Both magnetic RAM and phase change memory have presented themselves as candidates, though both technologies are still in early stages and must overcome significant challenges to actually compete as a replacement for NAND. Whether consumers would notice the difference is an open question. If you've upgraded from an HDD to an SSD and then upgraded to a faster SSD, you're likely aware the gap between HDDs and SSDs is much larger than the SSD-to-SSD gap, even when upgrading from a relatively modest drive. Improving access times from milliseconds to microseconds matters a great deal, but improving them from microseconds to nanoseconds might fall below what humans can really perceive in most cases.

Optane Retrenches in the Enterprise Market

From 2017 through early 2021, Intel offered its Optane memory as an alternative to NAND flash in the consumer market. In early 2021, the company announced it would no longer sell Optane drives in the consumer space, except for the H20 hybrid drive. The H20 combines QLC NAND with an Optane cache to boost overall performance while reducing drive cost. While the H20 is an interesting and unique product, it doesn't offer the same kind of top-end performance Optane SSDs did.

Optane will remain in-market in the enterprise server segment. While its reach is limited, it's still the closest thing to a challenger that NAND has. Optane SSDs don't use NAND — they're built using non-volatile memory believed to be implemented similarly to phase-change RAM — but they offer similar sequential performance to current NAND flash drives, albeit with better performance at low queue depths. Drive latency is also roughly half that of NAND flash (10 microseconds, versus 20), and endurance is vastly higher (30 full drive-writes per day, compared with 10 full drive-writes per day for a high-end Intel SSD).

Optane1

Intel Optane performance targets

Optane is available in several drive formats and as a direct replacement for DRAM. Some of Intel's high-end Xeon CPUs support multi-terabyte Optane deployments, mixing DRAM and Optane to provide a server with much more RAM than DRAM alone could, at the cost of higher access latencies.

One reason Optane has had trouble breaking through in the consumer space is that NAND prices fell dramatically in 2019 and stayed low through 2020, making it difficult for Intel to compete effectively.

Check out our ExtremeTech Explains series for more in-depth coverage of today's hottest tech topics.

Now Read:

  • The Worst CPUs Ever Built
  • Happy 40th Anniversary to the Original Intel 8086
  • The Myths of Moore'south Law