[ Updated Feb. 10, 2019, to bring this current with media today.]
A common question in my email is the one Jerry Thompson asks:
While I am interested in performance and speed between [Thunderbolt and USB 3], I find I am not completely understanding all I need to regarding RAID technology.
Or, as Craig McKenna writes:
[I recently bought] a 120 GB external SSD with Thunderbolt, I’m wondering how you would go about organizing my media.
I’ve spent a lot of time reviewing specific storage products. In this article, I want to take a step back and discuss storage performance in general.
A RAID (Redundant Array of Inexpensive Disks) is a collection of storage devices (hard drives or SSDs) that creates a pool of storage that is both very large and very fast. To the computer, and on your desktop, it looks like a single very big, very fast hard drive. Generally, a RAID stores all the drives in a single box with a single connection to the computer.
NOTE: A traditional hard drive is often called “spinning media” to distinguish it from SSDs, which don’t spin.
RAIDs can be controlled using software on your computer or hardware built into the RAID chassis. There are advantages to each. In the past, hardware controllers were the fastest option. Today, they are essentially tied for speed.
To get the best performance from a RAID, it should be attached to your Mac via Thunderbolt. Thunderbolt 2 RAIDs transfer data up to 1,400 MB/sec, while Thunderbolt 3 RAIDs transfer data up to 2,800 MB/sec. RAID performance depends upon the number of drives in the RAID. For the highest speed, use SSDs; for the lowest cost and greatest storage capacity, use spinning media.
RAIDs which are contained in servers are limited by the speed of the Ethernet connection. 1 Gb Ethernet transfers data up to 120 MB/sec. 10 Gb Ethernet transfers data up to 1,200 MB/sec, depending upon the number of drives in the server, the speed of your data switch and cabling, and how many other users are accessing the same server at the same time. In other words, server speed varies.
RAIDs are categorized into “levels,” which describe a combination of speed, redundancy, and price.
NOTE: “Redundancy” is defined as the ability to recover data in the event one or more hard drives die. This won’t protect you if you erase a file.
For the purposes of this example, let’s assume each of the RAIDs below uses 4 TB spinning media drives which transfer data at 150 MB/second. (In general, a single spinning media drive transfers data between 125 – 175 MB/sec, an SSD transfers data around 400 MB/sec, and the new NVMe solid state drives transfer data around 2,800 MB/sec. Faster performance costs more.)
RAID 0 – Fast, inexpensive, but no data redundancy. Requires a minimum of two drives inside the RAID enclosure. The more drives you add, the faster the performance, as performance and storage capacity are the sum of all drives in the RAID. However, if you lose one drive, you’ve lost ALL your data. Most often used when speed combined with low cost is paramount. In our example, a 2-drive RAID 0 would have 8 TB of storage and transfer data around 300 MB/sec.
RAID 1 – Complete data redundancy. Generally only uses two hard drives inside the RAID enclosure. Often called “mirroring,” each drive is a complete copy of the other. Most often used for backing up servers or when on-set for DIT media work. Has the speed and capacity of the slowest single drive in the system. In our example, a 2-drive RAID 1 would have 4 TB of storage and transfer data around 150 MB/sec.
RAID 3 – Medium-fast, data redundancy. Requires a minimum of three drives, as one drive is reserved solely for parity data. Should one drive die, your data is safe. This technology is no longer in common use, replaced by the faster performance of RAID 4 or 5 systems.
RAID 4 – Very fast, data redundancy. Similar to RAID 3, requires a minimum of three drives, as one drive is reserved solely for parity data. Should one drive die, your data is safe. This is the preferred RAID format for SSDs because of how the data is stored on the drives. When compared to a RAID 5, RAID 4 with SSDs is about 25% faster on reads. Performance is based on the number of drives in the system.
RAID 5 – Very fast, data redundancy. Requires a minimum of three drives and shares parity data across all drives. Most often found with four or more drives inside. If one drive dies, your data is safe. This is the preferred choice for RAIDs containing spinning media. These are used for both locally-attached storage and servers. Performance is based on the number of drives in the system.
RAID 6 – Fast, extra data redundancy. Requires a minimum of four drives. This version protects your data in the event two hard drives die at the same time. More expensive than RAID 5, but, generally, the same physical size. Like the RAID 5, this is most often used connected to just one computer. Not as fast as a RAID 5. Performance is based on the number of drives in the system.
RAID 10 (or 1+0) – VERY fast, totally redundant. Requires a minimum of four drives, and is created by combining two or more matched RAID 1 mirrored pairs into a RAID 0. (Mirroring two RAID 0s together, the reverse arrangement, is properly called RAID 0+1.) This provides the speed equivalent of a RAID 0, with the data redundancy of RAID 1. As RAIDs continue to drop in price, this can be a less-expensive way to create systems that rival the performance of a RAID 50. Performance is based on the number of drives in the system.
RAID 50 – VERY fast, data redundancy. Generally the domain of very large RAIDs, this format combines the speed of RAID 0 with the redundancy of RAID 5 by dividing the RAID into sections, where you can lose a drive in each section without losing data. These systems generally cost more than $10,000 and contain at least twelve drives. Generally used in network and server situations where multiple users need to access the same data.
RAID 60 – VERY fast, extra data redundancy. Generally the domain of very large RAIDs, this format combines the speed of RAID 0 with the redundancy of RAID 6 by dividing the RAID into sections, where you can lose two drives in each section without losing data. These systems generally cost more than $10,000 and contain at least twelve drives. Generally used in network and server situations where multiple users need to access the same data. Performance is based on the number of drives in the system.
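To make the capacity trade-offs between these levels concrete, here is a minimal Python sketch. It assumes identical drives and uses the usual rules of thumb (one drive’s worth of space lost to parity for RAID 3, 4 and 5, two for RAID 6, half for RAID 10). The function name usable_capacity_tb is hypothetical, invented for this illustration; treat the results as rough estimates, not vendor specifications.

```python
# Rough usable-capacity estimates for common RAID levels.
# Assumes every drive in the RAID is the same size.

MIN_DRIVES = {"0": 2, "1": 2, "3": 3, "4": 3, "5": 3, "6": 4, "10": 4}


def usable_capacity_tb(level, drives, drive_tb):
    """Return the approximate usable capacity, in TB, of a RAID."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")

    if level == "0":
        return drives * drive_tb        # every byte is usable, no redundancy
    if level == "1":
        return drive_tb                 # each drive holds the same copy
    if level in ("3", "4", "5"):
        return (drives - 1) * drive_tb  # one drive's worth holds parity
    if level == "6":
        return (drives - 2) * drive_tb  # two drives' worth hold parity
    # RAID 10: drives are paired into mirrors, so half the space is usable
    if drives % 2:
        raise ValueError("RAID 10 needs an even number of drives")
    return (drives // 2) * drive_tb


# The article's example drive: 4 TB spinning media.
for level, count in [("0", 2), ("1", 2), ("5", 4), ("6", 4), ("10", 4)]:
    print(f"RAID {level} with {count} drives: "
          f"{usable_capacity_tb(level, count, 4)} TB usable")
```

With the 4 TB example drives, the two-drive RAID 0 and RAID 1 results (8 TB and 4 TB) match the figures given above.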
NOTE: Drobo is a special case. In general, all RAIDs must use drives of the same size and speed. As well, all drives need to be installed at the time you first power up the system. Drobo, on the other hand, has invented a technology which allows you to add drives, or mix and match drives of different sizes, even after you’ve put the RAID into operation. While Drobo does not provide the fastest RAIDs, this flexibility can be a significant benefit.
SIDEBAR: HOW DATA REDUNDANCY WORKS
This is so cool… This works because all digital data is stored as either a 1 or a 0.
Imagine a 3D checkerboard — let’s make it 5 stories high. Look down on the top left square and count the number of checkers on that square for each of the top four layers.
If they total an odd number, put a checker on the same square on the bottom layer. If they total an even number, don’t put a checker on the same square on the bottom layer.
Now, remove the second layer with all its checkers, and put in a new, empty checkerboard to take its place. By counting the number of checkers on the remaining top three layers and comparing the total to the indicator on the bottom layer, you can exactly rebuild all the missing checkers on the second layer. For example, if the total of the other three layers is even, and there’s a checker on the bottom layer, add a checker to the new layer. If the total of the other three layers is odd, and there’s a checker on the bottom layer, don’t add a checker to the new layer.
This is exactly how RAID redundancy works, except each checkerboard represents a hard drive in the RAID. The calculation is the same no matter which drive failed: the RAID only needs to compare the totals on all the surviving hard disks with the total stored on the redundancy disk. This technique works whether you have three drives – the minimum – or twenty drives. The only difference is that more drives take longer to count.
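The checkerboard counting above is what programmers call an exclusive-or (XOR). Here is a minimal Python sketch of the idea, assuming four small “data drives” and one parity drive. The sample data and the function name parity are invented for this illustration; a real RAID controller does the same arithmetic on much larger blocks and handles many details this sketch ignores.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four "data drives", each holding four bytes (illustrative data only).
data = [b"EDIT", b"CLIP", b"REEL", b"TAKE"]

# The "bottom layer": one parity byte for each position across the drives.
parity_block = parity(data)

# Simulate losing the second drive, then rebuild it from everything else.
lost = 1
survivors = [block for i, block in enumerate(data) if i != lost]
rebuilt = parity(survivors + [parity_block])

print(rebuilt)  # prints b'CLIP'
```

The same function both creates the parity data and rebuilds a missing drive, which is why the count works the same way regardless of which drive is replaced.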
An SSD (Solid State Drive) is essentially flash memory that has been configured to act like a regular hard disk. You copy and move files around on it the same as on a hard disk. Unlike RAM, an SSD remembers your data when the power is turned off. Depending upon which version of the operating system you are using, SSD performance ranges from “so-so” to blinding. Later versions of the Mac operating system do a much better job supporting SSDs. In fact, the new APFS file system from Apple is specifically designed for SSD storage. (Here’s a link to learn more about APFS.)
The big benefit an SSD provides is speed. Its two big limitations are cost and limited storage size.
While you can put an SSD anywhere you can put a “normal” hard disk (which we often call “spinning media”), the best place to put an SSD is inside your computer as a replacement for your boot drive. SSDs in RAIDs are very fast, but also much more expensive than spinning media and they don’t hold as much.
If performance is critical, create a RAID using all SSDs. If storage capacity is most important, create a RAID using spinning media.
NOTE: SSDs do have a limitation, however: the memory cells tolerate only a certain number of writes before the unit starts to fail. While the overall longevity of SSDs is still being determined, for now, assume that you will need to replace an SSD sooner than a spinning media drive – probably after 3-4 years of normal use.
iCloud, and other Internet services like Dropbox and YouSendIt, are essentially file servers that store your files outside of your computer.
If we ignore issues like file security, these services are excellent for backing up data, sharing files between devices, and moving files between computer systems. However, they are not good for storing source media files for editing. It isn’t because they don’t store enough; just the opposite, these services can store a vast amount of data. The problem is that the connection speed – called the “data transfer rate” – between your computer and the Cloud is too slow. Video editing requires data transfer rates far beyond anything supplied by even the fastest DSL or cable modem.
Use the Cloud for sharing, but not for storing or editing source media files.
New Cloud-based services are appearing which allow media editing in the Cloud by using proxy files. These can be helpful to remote editors, but the challenge remains in how long it takes to upload media to the Cloud.
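To get a feel for why upload time is the sticking point, here is a minimal Python sketch of the arithmetic. The 100 GB project size and the 20 megabit/sec upload speed are illustrative assumptions, not measurements, and transfer_hours is a hypothetical helper written for this example. Real transfers are usually slower than the calculation suggests because of network overhead.

```python
def transfer_hours(size_gb, speed_mbps):
    """Hours needed to move size_gb gigabytes at speed_mbps megabits/sec."""
    megabits = size_gb * 1000 * 8  # 1 GB = 1,000 MB; 1 byte = 8 bits
    seconds = megabits / speed_mbps
    return seconds / 3600


# Assumed example: 100 GB of media over a 20 megabit/sec upload connection.
print(f"Cloud upload: {transfer_hours(100, 20):.1f} hours")

# The same media copied to a single local drive at 150 MB/sec
# (150 megabytes/sec = 1,200 megabits/sec).
print(f"Local copy:   {transfer_hours(100, 150 * 8) * 60:.1f} minutes")
```

Plug in your own media size and the upload speed your Internet provider actually delivers to see what to expect.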
WHAT IS THUNDERBOLT?
Thunderbolt is a method for connecting monitors and hard disks to your system. In this regard it is just like FireWire or USB – it’s a cable and communication protocol that moves data to and from your computer and storage.
The big benefit to Thunderbolt is that it is REALLY fast! More than 2 GB/sec of data transfer speed! However, in order for that speed to be realized, you need a REALLY fast RAID. A two-drive RAID 0 won’t begin to fill a Thunderbolt “pipe.”
Thunderbolt is how you connect your drive to your computer. The speed you get will depend upon the speed of the RAID you have attached. Here are some very general expectations for data transfer:
NOTE: A single drive connected via Thunderbolt will be only marginally faster than the same drive connected via FireWire. In order to see significant performance improvement, you’ll need to use a RAID that contains at least four hard disks.
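The relationship between drive count, drive speed and connection speed can be sketched in a few lines of Python. This is a simplified model, assuming RAID 0-style scaling where throughput is the sum of the drives, capped by whatever the connection can carry. The connection limits and the 150 MB/sec drive speed are the figures quoted earlier in this article; the function names are hypothetical, and real-world results will vary.

```python
import math

# Maximum transfer rates quoted in this article, in MB/sec.
CONNECTION_LIMIT = {
    "Thunderbolt 2": 1400,
    "Thunderbolt 3": 2800,
    "1 Gb Ethernet": 120,
    "10 Gb Ethernet": 1200,
}


def estimated_speed(drives, drive_speed, connection):
    """Combined drive speed, capped by what the connection can carry."""
    return min(drives * drive_speed, CONNECTION_LIMIT[connection])


def drives_to_fill(drive_speed, connection):
    """Smallest number of striped drives that saturates the connection."""
    return math.ceil(CONNECTION_LIMIT[connection] / drive_speed)


# A two-drive RAID 0 of 150 MB/sec spinning drives on Thunderbolt 3.
print(estimated_speed(2, 150, "Thunderbolt 3"))  # prints 300

# How many of those drives it would take to fill each Thunderbolt "pipe".
print(drives_to_fill(150, "Thunderbolt 2"))      # prints 10
print(drives_to_fill(150, "Thunderbolt 3"))      # prints 19
```

Under these assumptions, the slower of the two (the drives or the connection) always sets the speed you actually see.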
GETTING THE BEST PERFORMANCE
For best performance, I recommend purchasing your computer with an SSD as the boot drive. (Fusion Drives are a good alternative, where an SSD is combined with a spinning hard drive. This yields excellent performance with extended storage capacity.)
In general, media should not be stored on your boot drive. This means that only applications and the operating system are stored on the boot drive – along with other files that tend to be small, like email or word processing documents. If you have a large iTunes collection, or large iPhoto library, moving them to an external drive may allow better performance.
If I were setting up a new system, I would get a Mac with an SSD as the boot drive, and a Thunderbolt 3 RAID 5 for media and project files.
My current boot drive uses 148 GB to store all applications and operating system files. I have hundreds of apps which don’t take a lot of storage. So, you don’t need to get a gigantic SSD drive – 250 – 500 GB is more than sufficient. I recommend 500 GB, currently.
My media RAID, though, can’t be big enough. I currently have about 150 TB of storage spread across five RAIDs. I’ve learned that hard drives have two states: empty or full. Any new RAID will be as big as I can afford at the time.
This configuration provides a huge speed boost for the operating system and applications, while providing extremely fast access to huge amounts of media, with full redundancy in case of drive failure. This setup also offers a good balance between price and performance.
FOR MORE INFORMATION
Here is an article that explains hard disk and RAID performance and video formats in more detail. I highly recommend you read this article to understand the speeds you can expect from a storage device, how much space it takes to store media, and the data transfer rates of popular video codecs.