Why Digital Files Are Always (Slightly) Bigger Than You Expect

Computers like everything neat, consistent, and organized. People? Not so much. While our personal lives may be a bit disheveled, the underlying architecture of our computers is a model of organization.

Last week, I finally understood something about computer storage that I want to share with you.

Here’s the key idea: the smallest unit a computer uses to organize storage is a “block.” (While data is stored as bits, it is organized into blocks.) Each block can hold data from only one file. If a file’s data doesn’t fill its last block, the rest of that block is left empty.

This means that the space a file occupies on disk (SSD or HDD) is almost always slightly larger than the file itself.
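To make the arithmetic concrete, here is a minimal Python sketch (the function name `size_on_disk` is my own, hypothetical). It assumes a fixed 4 KB block, like APFS, and simply rounds a file’s size up to the next whole block; a real file system also stores metadata about each file, which this ignores.

```python
BLOCK_SIZE = 4 * 1024  # 4 KB, the APFS block size described in this article


def size_on_disk(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Round a file's size (in bytes) up to a whole number of blocks.

    Simplification: ignores file system metadata; a zero-byte file
    rounds to zero blocks here.
    """
    blocks = -(-file_size // block_size)  # ceiling division without floats
    return blocks * block_size


for size in (100, 4096, 5000, 10_000):
    on_disk = size_on_disk(size)
    print(f"{size:>6} bytes -> {on_disk:>6} bytes on disk ({on_disk - size} left empty)")
```

With 4 KB blocks, a 100-byte file occupies 4,096 bytes, a 5,000-byte file occupies 8,192 bytes, and only a file whose size is an exact multiple of 4,096 bytes leaves nothing empty.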

NOTE: Before you start to worry, there’s nothing wrong with this; it is just how computers work.

Let’s use this graphic to illustrate. Each gray square represents 1 KB of storage. (Remember, 1 KB equals 1,024 bytes, or 8,192 bits.)

With APFS (the file system used by all Macs with an internal SSD), the block size is 4 KB, represented in the graphic as a darker block of four squares.

This image illustrates four files stored on an APFS volume. By definition, every file starts at the beginning of a block.

NOTE: The lines are arbitrary; I used them only to illustrate different file sizes. Data can be stored in any block, in any location, and the blocks for a single file do not need to be next to each other. The disk directory tracks which blocks each file is stored in, and in which order.
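The directory’s job can be sketched as a toy model in Python. Everything here (the `ToyVolume` class, its `write` and `delete` methods, the eight-block volume) is hypothetical and far simpler than how APFS actually tracks blocks; it only shows that a file’s blocks can be scattered, and that the directory remembers their order.

```python
BLOCK_SIZE = 4 * 1024  # hypothetical 4 KB blocks, as on APFS


class ToyVolume:
    """A hypothetical, greatly simplified volume: a pool of numbered blocks
    plus a directory mapping each file name to its ordered list of blocks."""

    def __init__(self, total_blocks: int) -> None:
        self.free = list(range(total_blocks))  # block numbers not yet in use
        self.directory: dict[str, list[int]] = {}

    def write(self, name: str, size: int) -> list[int]:
        needed = -(-size // BLOCK_SIZE)  # whole blocks, rounded up
        if needed > len(self.free):
            raise OSError("volume full")
        # Grab any free blocks; they need not be next to each other.
        blocks = [self.free.pop() for _ in range(needed)]
        self.directory[name] = blocks  # the order here is the file's order
        return blocks

    def delete(self, name: str) -> None:
        # Return the file's blocks to the free pool for reuse.
        self.free.extend(self.directory.pop(name))


vol = ToyVolume(total_blocks=8)
vol.write("a.txt", 5000)   # needs 2 blocks
vol.write("b.txt", 100)    # needs 1 block
vol.delete("a.txt")        # frees a.txt's 2 blocks
vol.write("c.mov", 12000)  # needs 3 blocks
print(vol.directory)
```

After these steps, the directory lists `b.txt` in block 5 and `c.mov` in blocks 6, 7, and 4, in that order: the new file reuses the space freed by `a.txt` plus one more block that isn’t next to them.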

WHAT DOES IT ALL MEAN?

Well, first, there’s nothing nefarious about this. Computers need to organize their workspace in ways that enable them to function. But I’ve often puzzled over why a very small file, like a text document, takes a lot more space to store than its contents would suggest. The answer is “block size.”
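If you want to see this on your own Mac, Python’s standard `os.stat()` reports both numbers. This sketch assumes macOS or Linux, where `st_blocks` counts 512-byte units; the exact allocated size depends on the file system, so treat the printout as illustrative.

```python
import os
import tempfile

# Write a tiny text file, then compare its logical size with the space
# the file system actually allocated for it.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("Hello, blocks!")  # 14 bytes of content
    path = f.name

info = os.stat(path)
logical = info.st_size            # bytes of actual content
allocated = info.st_blocks * 512  # st_blocks counts 512-byte units (Unix only)
print(f"content: {logical} bytes, allocated on disk: {allocated} bytes")

os.remove(path)  # clean up the temporary file
```

On a typical APFS volume, the 14-byte file should report 4,096 bytes allocated: one full block.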

This insight came about when a reader asked me to defend a statement I had made about block sizes for RAID volumes, so I contacted Tim Standing, VP of Software Engineering at OWC, for clarification.

NOTE: If you are interested in learning more, this article covers the details of HFS+ volumes and RAIDs. It does not require a degree in engineering to read.

EXTRA CREDIT

HFS+ uses variable block sizes, depending upon the size of the volume. The block size starts at 4 KB and doubles as the volume gets bigger. For example, it changes to 8 KB for HFS+ volumes larger than 17.5 TB, then to 16 KB for volumes larger than 35 TB, and so on.
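The 17.5 TB and 35 TB thresholds follow from a simple limit. Assuming, as Apple’s HFS Plus volume format technical note (TN1150) describes, that HFS+ numbers its allocation blocks with 32-bit values, a volume can hold at most 2^32 blocks, so the largest volume a given block size can cover is 2^32 times that size. The sketch below (both function names are hypothetical) reproduces the thresholds; it models the limit, not the actual logic Disk Utility uses when it picks a block size.

```python
MAX_BLOCKS = 2 ** 32  # assumption: HFS+ allocation block numbers are 32-bit


def max_volume_bytes(block_size: int) -> int:
    """Largest volume (in bytes) that 2**32 blocks of this size can cover."""
    return MAX_BLOCKS * block_size


def min_block_size(volume_bytes: int, start: int = 4 * 1024) -> int:
    """Smallest block size, starting at 4 KB and doubling, that fits the volume."""
    block = start
    while max_volume_bytes(block) < volume_bytes:
        block *= 2  # the block size doubles as the volume grows
    return block


for kb in (4, 8, 16):
    limit_tb = max_volume_bytes(kb * 1024) / 1e12  # decimal terabytes
    print(f"{kb:>2} KB blocks -> volumes up to {limit_tb:.2f} TB")
```

The loop prints roughly 17.59 TB for 4 KB blocks, 35.18 TB for 8 KB, and 70.37 TB for 16 KB, which lines up with the thresholds above.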

RAIDs use block sizes that vary depending on the operating system, the RAID format, and the number of drives in the RAID.


