For the last several weeks, I’ve explored just how fast our storage actually is. I looked at single hard drives, servers and, now, SSDs and RAIDs. What I discovered is that storage “performance,” that is, the speed at which our storage transfers data, is surprisingly variable and complex.
NOTE: For example, this article describes the challenges in measuring storage performance using Blackmagic Design vs. AJA measurement tools. It is currently not possible to know with any assurance that the test results we see are accurate.
To learn more, I emailed a series of questions to Tim Standing, VP of Software Development at OWC. Here is our conversation.
At the start of his interview, Tim noted that when he discusses speed numbers in this conversation, he’s using AJA System Test with a 64 GB test file, file system cache disabled and 16 bit RGBA codec. He is using the full version of AJA System Test and not the Lite version available on Apple’s Mac App Store.
Follow-up comment from Larry: The AJA System Test (full version) is available from the AJA website. When I read Tim’s comment, I went back and re-ran all my tests using AJA System Test (full version). The results were mixed. I sent them to OWC, BMD and AJA for evaluation and all three companies are looking into this further.
Tim Responds: Unlike most video professionals, I tend to use Macs with smaller amounts of RAM (e.g. 8 GB). If I use a Mac with this size RAM, I know that there is no more than 4 – 6 GB available for file system cache. If I then run AJA System Test with a 64 GB test file, I know for certain that the file cannot reside entirely in the macOS file system cache, there just isn’t enough physical RAM.
In addition, many NVMe blades use DRAM caches or SLC flash as a type of cache (SLC flash is much faster than TLC). The size of these caches is usually 2 – 4 GB. So with a RAID 0 volume using 4 of these blades, the first 8 – 16 GB might be going into the cache on the blade. By using a 64 GB test file in AJA System Test, I can ensure that I am not just writing to the file system cache or the faster cache chips on the NVMe blades.
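To make the sizing argument concrete, here is a minimal Python sketch. The function name is hypothetical, and it assumes the worst case in which every cache absorbs data before any of it reaches the slower flash. It is an illustration of the arithmetic, not OWC’s test procedure.

```python
def cache_footprint_gb(fs_cache_gb: int, blades: int, blade_cache_gb: int) -> int:
    """Worst-case amount of test data that could land in caches, in GB."""
    return fs_cache_gb + blades * blade_cache_gb

# Upper figures from Tim's example: up to 6 GB of file system cache on an
# 8 GB Mac, and four blades with up to 4 GB of DRAM/SLC cache each.
footprint = cache_footprint_gb(fs_cache_gb=6, blades=4, blade_cache_gb=4)

test_file_gb = 64
uncached_gb = test_file_gb - footprint
print(f"cache footprint: {footprint} GB, uncached portion: {uncached_gb} GB")
```

With these figures, at most 22 GB of a 64 GB test file could be absorbed by caches, so at least 42 GB has to reach the flash itself.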
Larry replies: Tim, I plan to redo all my tests when I publish an update to my speed testing article. At that time, I’ll run it using the largest files that BMD and AJA support. However, BMD only supports up to 5 GB and AJA only supports up to 16 GB test files for ProRes media, which is the most appropriate format for most video editors.
I just did a quick test and, using AJA System Test (full), RAID 4 was about 16% faster than RAID 5 for both read and write.
Larry: The speeds measured by AJA System Test Lite and Blackmagic Disk Speed Test vary by HUNDREDS of MB/second. Which does OWC believe is accurate?
Tim: I have always avoided using Blackmagic Disk Speed Test. I think it is really great if you want to see whether your particular format will work on a given volume, but for reliable testing, it is too limited.
I find that it is much harder to get reproducible values from Blackmagic Disk Speed Test than from AJA System Test. It also lacks the graphing which I have found invaluable for determining if there is an architectural problem in the hardware or software of an OWC storage solution.
In addition, many times when we create a faster storage array we bump up against the maximum speed the tools can handle. AJA usually produces a new version in a couple of days, while Blackmagic seems to take a month or more. Our current maximum speed measured on macOS is with 2 Accelsior 8M2s in a 2019 Mac Pro. We see 17 GB/sec read and write on a 16-blade RAID 0 volume.
Follow-up comment from Larry: The Accelsior is a plug-in PCIe card that only supports the 2019 Mac Pro for Mac systems. All other Macs require a Thunderbolt solution, which is much slower.
OWC Thunderblade is a 4-blade (a “blade” is the SSD equivalent of a hard drive) NVMe SSD RAID. It can be formatted in a variety of configurations: JBOD, RAID 0, 1, 4, 5, and 1+0. It connects via Thunderbolt 3/4 and has an external power supply.
OWC Envoy Pro is a single NVMe SSD blade that connects via Thunderbolt 3/4 and is bus powered.
Larry: When measured by BMD, write speeds for a 4-drive RAID 4 are ONE-THIRD the speed of a 4-drive RAID 0. This is surprisingly slow! How come?
Tim: I don’t know about this. What is it with AJA System Test?
Follow-up comment from Larry: In additional testing after sending my questions to OWC, AJA System Test (full version) reported RAID 4 speeds at 1/3 the expected throughput for writes and 1/2 expected throughput for Reads. OWC is investigating the problem.
Larry continues: As well, I found BMD Disk Speed Test to deliver much more consistent results. Both BMD and AJA are investigating apparent testing errors in their programs. I’ll report back on what they say in a future article.
Tim: Our tests on a Mac Studio show 2.6 GB/sec read and 2.7 GB/sec write on RAID 0. On RAID 5, we see 2.0 GB/sec for both read and write.
Follow-up comment from Larry: However, from what I’ve been told, RAID 5 is optimized for HDDs, while RAID 4 is optimized for SSDs. RAID 5 requires reading and discarding parity data from all four blades, while RAID 4 simply ignores the single parity drive. Theoretically, RAID 5 is slower than RAID 4 simply due to all this data processing. Are you recommending RAID 5 formatting for Thunderblade?
Tim replies: It is true that reading from RAID 4 is faster than RAID 5 in some situations (when the sum of the throughput from each individual blade, or disk, is greater than the throughput of the wire they are connected on). This speed differential can occur with HDDs as well as SSDs as long as the above conditions are met (e.g. RAID 4 volumes are faster than RAID 5 with 4 HDDs connected over USB 3).
Tim continues: Many studios have data handling regulations, much like Netflix requirements for their productions. These requirements often require RAID 5 volumes, not because they are better than RAID 4 but because RAID 5 is what the studio’s engineering department has tested and is confident in.
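The read-speed condition Tim describes can be sketched as a simple model. This is a rough illustration under stated assumptions, not a description of how SoftRAID works: it assumes RAID 4 reads touch only the data blades, and that RAID 5 reads pull full stripes from every blade and then discard the parity portion, as Larry describes. The function names and the drive and wire speeds in the example are hypothetical.

```python
def raid4_read(blades: int, blade_mb_s: float, wire_mb_s: float) -> float:
    """RAID 4 model: only the data blades are read; the parity blade is idle."""
    return min((blades - 1) * blade_mb_s, wire_mb_s)

def raid5_read(blades: int, blade_mb_s: float, wire_mb_s: float) -> float:
    """RAID 5 model (assumption): all blades are read and the parity share,
    1/blades of the traffic, is discarded after it has crossed the wire."""
    raw = min(blades * blade_mb_s, wire_mb_s)
    return raw * (blades - 1) / blades

# Hypothetical example: four 200 MB/s drives behind a 400 MB/s connection.
print(raid4_read(4, 200, 400), raid5_read(4, 200, 400))

# Same drives behind a connection that is not the bottleneck.
print(raid4_read(4, 200, 5000), raid5_read(4, 200, 5000))
```

In this model the two layouts read at the same speed until the combined blade throughput exceeds what the wire can carry. Past that point, the parity data RAID 5 sends across the wire displaces useful data, and RAID 4 pulls ahead.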
Larry: Why is read speed so slow (2100 MB/s) compared to write (2800 MB/s) when configured as a RAID 0?
Tim: I don’t have an answer for this. It is not what my team has observed in our testing with production ThunderBlades. I will let our team search deeper to find the explanation for the performance you are observing.
Historically I associate slow read speeds with TRIM being disabled. I don’t know if that is the cause of what you are seeing or not.
Follow-up comment from Larry: When formatting drives using SoftRAID, TRIM is enabled by default. TRIM was on when I ran these tests.
Larry: Based on OWC’s experience, how often does an SSD fail? In other words, how much risk are we taking configuring this as RAID 0?
Tim: We have a server in the office which I finally stopped using when the SSD said 80,000 hours. The SSD was still perfectly fine. I always retire HDDs when they have reached 20,000 hours or more.
I would feel comfortable using an SSD for 3 – 4 years of heavy use before replacing it. This is based on anecdotal evidence not on any studies we have performed or data we have collected from customers.
[Also,] are you following the 3, 2, 1 backup plan? Do you have a reasonable backup policy for your work in progress? If the answer to both things is yes, then you are fine with RAID 0.
[Backups are important.] I was a volunteer firefighter for several years and saw several buildings reduced completely to ashes. It greatly affected how I think about data protection. A couple of times a year, I imagine that everything in one building disappears (either my home or my office) and I ask myself how much work could I lose. If it is more than 4 days, I start to worry.
Larry: What causes the speed variations displayed between multiple tests for the same configuration?
Tim: Each of the blades is a tiny little mini [computer] that controls which blocks of flash [memory] need to be erased and where to write new data which is coming to it from the computer. You might start writing data at just fractionally the wrong time and your write might have to wait a bit or your data might be fragmented enough to require reading from several different flash cells. These slight differences add up and can lead to variability between each test. It’s even more complicated than this because each NVMe blade actually works like a RAID 0 volume and stripes data across multiple chips each time you read or write to the single blade.
We have found that always starting with a freshly created volume and using AJA System Test really helps reproducibility. We also repeat all tests 5 times and use the average in our analysis. (All the numbers I quote here are the average of 5 independent tests.)
Follow-up comment from Larry: In the case of my tests, I started with a newly-formatted RAID drive and repeated each test five times.
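Averaging repeated runs, as both Tim and Larry describe, is easy to script. The sketch below uses only the Python standard library. The five speeds are hypothetical placeholders, not measurements from either of us, and reporting the spread alongside the average is a suggestion rather than part of OWC’s stated method.

```python
from statistics import mean, stdev

def summarize(runs_mb_s: list[float]) -> tuple[float, float]:
    """Return (average, sample standard deviation) for repeated test runs."""
    return mean(runs_mb_s), stdev(runs_mb_s)

# Hypothetical write speeds in MB/s from five runs; not measured data.
runs = [2650.0, 2710.0, 2690.0, 2730.0, 2720.0]

avg, spread = summarize(runs)
print(f"average {avg:.0f} MB/s, spread +/- {spread:.0f} MB/s")
```

A large spread relative to the average is a hint that the runs are not reproducible and that the average alone may be misleading.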
Larry: Thunderblade is a RAID composed of four NVMe SSDs. While NVMe provides blinding speeds, my tests show that this unit doesn’t fully saturate a Thunderbolt port using a 16″ M1 MacBook Pro. What speed limits does this device have?
Tim: The Thunderbolt spec says that the maximum transfer speed is 2.8 GB/sec for data transfer to storage. With RAID 0, the NVMe driver has to communicate with 4 individual NVMe devices, the SoftRAID driver has to determine which parts of each file go to which blade, and the APFS file system has to determine you have the correct permission to read or write the data you are requesting. With volumes on a Mac Studio, we see 2.7 GB/sec. On slower computers, software runs slower so the overhead of the file system, the NVMe driver, and SoftRAID will reduce this [transfer speed] somewhat.
Larry: I want to partition this into a single NVMe for Time Machine backups, then use the remaining 3 cards as a RAID 0 for data storage. Since both volumes use NVMe SSDs, will I get the same performance out of both volumes?
Tim: In the ThunderBlade each individual blade is limited to 1 GB/sec. This was done for thermal reasons to keep the generated heat down and allow it to run all day long flat out on a hot location shoot without overheating.
Your Time Machine volume will be limited to 1 GB/sec and your 3 drive RAID 0 volume should go at over 2 GB/sec.
Follow-up comment from Larry: Using AJA System Test (full version), write speeds are around 1500 MB/s, while read speeds are around 1700 MB/s. Given three striped drives running at 1 GB/s, I would expect speeds closer to 2,800 MB/s. Again, OWC is looking into this.
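The expectation in that follow-up comes from simple arithmetic, sketched below. The constants are the limits Tim cites in this interview; the function name is hypothetical, and the model ignores file system and driver overhead, so it gives a ceiling rather than a prediction.

```python
BLADE_LIMIT_MB_S = 1000        # per-blade cap Tim cites for the ThunderBlade
THUNDERBOLT_LIMIT_MB_S = 2800  # storage ceiling Tim cites for one Thunderbolt port

def raid0_ceiling_mb_s(blades: int) -> int:
    """Upper bound for a RAID 0 stripe: blades add up until the port saturates."""
    return min(blades * BLADE_LIMIT_MB_S, THUNDERBOLT_LIMIT_MB_S)

for blades in (1, 3, 4):
    print(f"{blades} blade(s): up to {raid0_ceiling_mb_s(blades)} MB/s")
```

By this estimate, a single-blade Time Machine volume tops out at 1,000 MB/s and a 3-blade stripe at 2,800 MB/s, so the measured 1,500 to 1,700 MB/s is only about 54 to 61 percent of the ceiling.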
Larry: How is the Thunderblade formatted: HFS+ or APFS?
Tim: The ThunderBlade is formatted by us during manufacturing as a RAID 0 APFS volume. If you attempt to format it as HFS+ at a later time with the SoftRAID application, you will see a warning dialog box suggesting that you want to use APFS and telling you why. APFS is really the future of file systems for macOS. It is much more resilient and takes much better advantage of the properties of flash memory storage.
Unfortunately, APFS has a huge performance problem with HDDs. I have been pushing the Apple file system team to fix this since the fall of 2017 when I first determined how bad HDD performance is with APFS. They still don’t have any interest in addressing the problem.
Larry: My Thunderblade currently holds four 2 TB NVMe cards. If I want to upgrade these later to, say, 4 TB when prices come down, can I do so? The case looks sealed.
Tim: ThunderBlade is not user serviceable. Its fanless design requires that we use a thermal pad to conduct heat away from the chips to the top part of the case, which acts like a giant heat sink. This allows us to have an entirely fanless design.
This thermal pad is a ~3mm thick piece of flexible material which is sticky on both sides, allowing it to make great thermal contact with both the top of the chips and the underside of the aluminum case. If you open the case and are too aggressive in separating the pad from the chips, you will inadvertently flex the blades.
The flash chips on the blades have an unbelievable number of layers (96 for the current generation of chips), allowing them to pack an incredible density of storage into a small space. So when these chips get flexed, something mechanical happens to them and they can no longer reliably hold data. They can read and write just fine; it’s just that they don’t always read back the same data that was written. More troubling is that the rate of data corruption is about 1 – 4 bytes out of each 100 GB written, at least in the examples I have seen.
It took us quite a while to track this problem down, but now we know what to look for. In every case when we encounter a ThunderBlade which corrupts data, it has always had its blades swapped, usually by the same individual in our office :^).
So if you want to swap blades, we recommend our Express 4M2, a small enclosure with no thermal pad which relies on a fan to keep it cool. You can swap blades all you want without having to deal with a thermal pad or the threat of bending the blades.
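Silent corruption of the kind Tim describes, where reads succeed but return different data than was written, can only be caught by comparing what comes back against what went in. The sketch below shows the general idea using Python’s standard library. It is not OWC’s internal test tool, and the file name, chunk size, and tiny data volume are illustrative assumptions. At 1 – 4 bad bytes per 100 GB, a real check would need to write far more data than this, and more than the file system cache can hold, so that reads actually come from the flash.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # 1 MiB per write; chosen only for illustration

def write_and_verify(path: str, chunks: int) -> bool:
    """Write random data, read it back, and compare SHA-256 digests."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(chunks):
            block = os.urandom(CHUNK)
            written.update(block)
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(CHUNK):
            read_back.update(block)

    return written.hexdigest() == read_back.hexdigest()

# Demo against a temporary directory; point the path at the volume under test.
with tempfile.TemporaryDirectory() as folder:
    ok = write_and_verify(os.path.join(folder, "verify.bin"), chunks=8)
    print("data intact" if ok else "corruption detected")
```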
Larry: What do we do if an NVMe card dies? Can this be repaired by the user or do we need to send the system back to OWC?
Tim: If it is in an Express 4M2, it is user serviceable. For ThunderBlades, Accelsior 4M2 and Accelsior 8M2s, it should be sent back to OWC. See my discussion above.
Larry: Why is SoftRAID necessary for this device?
Tim: SoftRAID is not required for the ThunderBlade. You can format it just fine as an AppleRAID stripe (RAID 0) volume. There are several potential problems you will run into however:
Follow-up comment from Larry: Again, to raise an earlier point, RAID 5 requires reading and discarding parity data from all four blades, while RAID 4 simply ignores the single parity drive. Theoretically, RAID 5 is slower than RAID 4 simply due to all this data processing. Is OWC recommending formatting SSD RAIDs as RAID 5?
Tim replies: We recommend RAID 4 for fault tolerant storage with NVMe blades for the reason you mentioned above.
Larry: The Thunderblade supports loop-through Thunderbolt connections for up to a total of six devices. If I combine multiple Thunderblades, do I get faster transfer speeds or greater storage capacity?
Tim: I know that both Apple and Intel say 6 devices on a chain, but I have yet to get that to work reliably. You might be able to get it to work with a Thunderbolt Hub or our Thunderbolt Dock, but I am still skeptical. I usually limit it to three devices (not including the Mac) on each of the Mac’s Thunderbolt ports.
As far as speed is concerned, the spec for Thunderbolt 3, with short passive cables (0.7 m or less) or active cables is a maximum of 2.8 GB/sec for [all] storage devices. In reality, I don’t think I’ve ever seen anything over 2.7 using AJA System Test to a RAID volume on a single Thunderbolt port. With a ThunderBlade, I often see 2.4 GB/sec (my test machine for driver development is a 2018 Mac mini).
If you have 2 ThunderBlades on one port, even if they are connected through a Thunderbolt Hub or Dock, you are sacrificing performance as the port limits you to that 2.8 GB/sec theoretical maximum. If you want more storage capacity however, you can go ahead and daisy chain 3 ThunderBlades together and create a much larger volume.
Remember that on Intel Macs, each pair of ports share a single Thunderbolt router chip so they share the same 2.8 GB/sec of performance. So if you make a RAID 0 volume with 2 ThunderBlades plugged into ports right next to each other, you will still see a maximum performance of 2.6 GB/sec or so.
With ARM based Macs (i.e. Apple silicon), each Thunderbolt port is truly independent. If you attach 2 ThunderBlades to adjacent ports, and create a RAID 0 volume, you should see greater than 4 GB/sec performance with AJA System Test. You can take this even further with a Mac Studio where you can connect 4 ThunderBlades, each to a different port and get much higher performance.
So if you really want high performance, you can attach each ThunderBlade to a different port on a Mac Studio. When you do this performance scales very well for RAID 0 volumes. On a Mac Studio, a 4 blade RAID 0 will read and write at 2.7 and 2.7 GB/sec. A 16-blade RAID 0, where each ThunderBlade is on a different port will give you 9.5 GB/sec reads and over 10 GB/sec writing.
The move to ARM-based Macs gets even more compelling with RAID 4 or 5 volumes. With Intel Macs, even the 2019 Mac Pros totally maxed out, we never see more than 1.7 GB/sec for writing to a RAID 5 volume, even with 4 ThunderBlades (16 blades) on separate ports. The performance differs very little between 4 blades and 16 blades for writing.
With ARM based Macs, the write performance for RAID 5 volumes scales much better with the number of blades. On a Mac Studio, a 4-blade RAID 5 volume reads and writes at 2 GB/sec, and a 16-blade one reads at 9 GB/sec and writes at 5 GB/sec.
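The port-sharing rules Tim lays out for RAID 0 can be captured in a small model. This is a sketch of theoretical ceilings only, built on the figures quoted in this interview (2.8 GB/sec per Thunderbolt bus, 1 GB/sec per blade, 4 blades per ThunderBlade). The function name and the idea of counting "buses" are illustrative assumptions: on Intel Macs a pair of adjacent ports counts as one bus, while on Apple silicon each port is its own bus.

```python
BUS_LIMIT_GB_S = 2.8           # storage ceiling per Thunderbolt bus, per Tim
THUNDERBLADE_LIMIT_GB_S = 4.0  # 4 blades x 1 GB/sec each, per Tim

def raid0_ceiling_gb_s(devices_per_bus: list[int]) -> float:
    """Upper bound for a RAID 0 volume spanning several ThunderBlades.

    devices_per_bus[i] is the number of ThunderBlades sharing bus i.
    """
    return sum(
        min(count * THUNDERBLADE_LIMIT_GB_S, BUS_LIMIT_GB_S)
        for count in devices_per_bus
        if count > 0
    )

print(f"Intel, 2 units on adjacent ports: {raid0_ceiling_gb_s([2]):.1f} GB/sec")
print(f"Apple silicon, 2 units, 2 ports:  {raid0_ceiling_gb_s([1, 1]):.1f} GB/sec")
print(f"Mac Studio, 4 units, 4 ports:     {raid0_ceiling_gb_s([1, 1, 1, 1]):.1f} GB/sec")
```

The ceilings work out to roughly 2.8, 5.6 and 11.2 GB/sec. Tim’s measured figures (about 2.6, more than 4, and 9.5 to 10 GB/sec) sit below each one, as you would expect once driver and file system overhead are included.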
Larry: Why is the 0.7m Thunderbolt cable so short? What limits cable length?
Tim: Thunderbolt cables come in 2 types, active and passive. Active cables correct for signal quality problems each time they are plugged in by tuning the characteristics of the plug to match the actual piece of wire in the cable. Each piece of wire behaves slightly differently. They look the same to you and me, but the signal travels just a little bit differently. This is sort of like the wood grain on each toothpick: the toothpicks are all the same dimension and shape, but the wood grain is slightly different.
In addition, the electrical characteristics of each socket are slightly different. So the plug on each end of an active cable tunes the characteristics of the cable so that it more accurately transmits the analog waveform which carries the digital 0s and 1s. This allows active cables to communicate at full speed, 40 Gb/sec, even if they are 2 m long.
Passive cables don’t do anything to tune the cable. Intel and Apple have determined that passive cables can only communicate at 40 Gb/sec (which allows 2.8 GB/sec transfers to storage devices) if they are 0.7 m or less in length. If they are longer than that, they must operate at the slower 20 Gb/sec speed instead, resulting in a maximum data throughput to storage devices of 1.4 GB/sec.
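Tim’s cable rules reduce to a short lookup, sketched here in Python. The function name is hypothetical. The sketch also assumes that any active cable runs at full speed, although Tim only mentions lengths up to 2 m.

```python
def max_storage_gb_s(active: bool, length_m: float) -> float:
    """Storage throughput ceiling for a Thunderbolt 3 cable, per Tim's figures."""
    if active or length_m <= 0.7:
        return 2.8  # full-speed link
    return 1.4      # longer passive cables fall back to the half-speed link

print(max_storage_gb_s(active=False, length_m=0.7))  # short passive cable
print(max_storage_gb_s(active=False, length_m=2.0))  # long passive cable
print(max_storage_gb_s(active=True, length_m=2.0))   # long active cable
```

The practical takeaway is that replacing a short passive cable with a longer passive one can halve the ceiling for your storage, while a longer active cable should not.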
Larry: If data transfer speed is more important than capacity, are we better off with a Thunderblade or a single NVMe device like the Envoy Pro?
Tim: I feel safer with a ThunderBlade than with an Envoy Pro for storage. It has better heat management and is designed to be really abused in terms of sustained data transfer for hours or days at a time. We are currently testing ThunderBlades which have been running 24/7 for 4+ weeks as part of a new driver qualification. This test has been running non-stop with RAID 0 and 5 volumes on 5 separate Macs. We have seen only 2 problems in over 4 weeks of testing. (Our tests run with multiple threads reading and writing data, and all data is checked for corruption after it is read.)
If portability, size of a device or not requiring a power cable are considerations, an Envoy Pro is an excellent choice.
Larry: What can we do to make the ThunderBlade as fast as possible?
Tim: Connect it to a Mac Studio. What’s important to remember here is the “soft” part of SoftRAID. It is software, and like all software, the more powerful your CPU, the better off you will be. I doubt you could get close to the Mac Studio numbers I listed above with a 13-inch MacBook Pro. I know I don’t with my 2018 Mac mini.
Follow-up comment from Larry: If buying a faster computer increases the speed of our storage, should we buy more CPU cores or GPU cores?
Tim replies: The SoftRAID driver does not make use of the GPU cores. The driver processes data in 4 – 16 MB chunks and the cost of setting up a GPU for such a small amount of data is too great to make it increase speed. There are other architectural issues which make [using GPUs] a less than ideal solution.
The things which affect SoftRAID driver write performance on RAID 4 and 5 volumes are exactly what you expect for performing integer calculations on many blocks of data since calculating parity data is similar to many simple addition operations.
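To see why parity is cheap integer work, here is a minimal sketch of XOR parity, the standard technique behind RAID 4 and RAID 5. It only illustrates the arithmetic and says nothing about how the SoftRAID driver is actually implemented. The block contents and function name are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data blade.
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is what the parity blade would store for this stripe.
parity = xor_blocks(data)

# Simulate losing the second block and rebuilding it from the rest plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Every write to a parity-protected volume requires this kind of calculation across the whole stripe, which is why CPU performance shows up directly in RAID 4 and 5 write speeds.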
So it’s no surprise that the characteristics which increase the speed of RAID 4 and 5 writes are:
Larry: Does the speed of our computer affect the speed of our storage if we are NOT using SoftRAID?
Tim replies: On the speed front, I would expect the speed of an Envoy Pro to be a little faster on a faster computer, but not massively so. I asked one of our testers to test with an Envoy Pro SX and an Electron on a 2018 Mac mini and a Mac Studio and get back to you. You have piqued my curiosity.
Larry: Are data transfer rates the same on Windows as on the Mac? If not, what are the differences?
Tim: The architectures of the Windows and Mac drivers are different. I believe the transfer rates, as measured by AJA System Test, are similar if you use a RAID 0 volume with a large stripe unit size. For RAID 4 or 5 volumes or small stripe unit sizes, the performance will be different.
We are having our QA team gather fresh data on transfers from ThunderBlades to both Mac and Windows. We will have that information ready on Monday.
One interesting thing to note is that our Accelsior 8M2 is actually a PCI Gen 4 card. We did this to future-proof the design. Even though Apple only ships a Mac with PCI Gen 3, the 2019 Mac Pro, they will be shipping a Mac with PCI Gen 4 in the near future. When they do, the Accelsior 8M2 will be able to take advantage of the faster performance.
On Windows, with RAID 0 volumes, we see almost double the performance when using an Accelsior 8M2 and going from a PCI Gen 3 computer to one with PCI Gen 4. I think the current speed record on a PCI Gen 4 Windows computer with a single Accelsior 8M2 is over 20 GB/sec.
Larry: The OWC website says the Thunderblade can attain speeds approaching 5,000 MB/second. What can we do to achieve that?
Tim: Connect it to a Mac Studio. What’s important [is that] you will need 2 or more ThunderBlades connected to separate Thunderbolt ports to get this performance.
Follow-up comment from Larry: It seems somewhat disingenuous to promote a transfer speed on your website that requires buying two Thunderblades and running them on the fastest available computer.
Tim replies: I’ll let someone in marketing answer this one, I just write the code.
Tim: I have spent the past 20 years, actually 22, working on the SoftRAID driver and I won’t be stopping anytime soon. I am always looking for ways to enhance the speed and reliability of our driver. I can’t think of many other pieces of Mac software which have been around in the same form for 2 decades and are still relevant. It is a testament to our commitment to continued testing, bug fixes and to making sure the driver is updated to work correctly with each new major release of macOS.
We have just finished a round of reliability testing with 5 Macs (both ARM and Intel), each with 8 NVMe or 8 HDDs. We tested both RAID 0 and RAID 5 volumes, with Apple’s internal test tools and our own internal tools, flat out 24/7 for 4 weeks. During that time, we uncovered 2 hangs, which are fixed in SoftRAID 7.0.1, one bug in Apple’s test tools and two in our internal ones, and a hang which occurred once. We are now testing to confirm the hang is fixed in SoftRAID 7.0.1.
Larry: Tim, thanks for taking the time to answer my questions. Performance is a complex subject and I appreciate your helping to educate us.
4 Responses to Tips to Maximize the Speed of SoftRAID and SSD RAIDs for Macs
Wow, this is a great article and very insightful. I tried using SoftRAID in the past and abandoned it but it was probably user error. We’ve had an 8TB Thunderblade for a few years now and it is hella fast. The tips and tech explanations, such as cable lengths and RAID 4 vs 5, will be really useful in the near future as we switch to Thunderblades and Mac Studios.
Eric:
Yay! I’m glad you liked it. I always learn a lot when talking with Tim.
Larry
What a superb article packed with information! Thank you so much. It is just what I needed because I have a Mac Studio Ultra with 48 cores and I have 2 4M2 Express enclosures (which I love!) and I was trying to decide if I should use my Softraid or just Apple Disc Utility raid but now I see the Softraid offers much more. The point about multithreading also sold me. You guys are great and so helpful.
Don:
I’m glad you like it. I’ll pass your kind words back to Tim as well. Happy to help.
Larry