[Updated Aug. 8, 2016, to more clearly explain the differences in size between ProRes and H.264.]
I get a lot of questions asking why ProRes files are so big compared to, say, camera-native H.264 files. While it is true that there is a difference in bit depth between the files (camera-native H.264 is typically 8-bit, while ProRes is 10-bit or more), this accounts for only a small portion of the difference. A much bigger reason is that the two files are compressed differently. And in this compression difference lies the explanation of why I-frame files are “more efficient” to edit than GOP files.
Each codec determines whether it uses I-frame or some form of GOP compression. This is built into the codec (or, for codecs like H.264 that support both, into the profile the camera or encoder uses), and it can’t be changed once a file is recorded. So understanding the difference can help us make wiser choices about the codecs we use.
GO BACK INTO HISTORY
Go back into the dim reaches of history, back to a time before cell phones, and you may remember a visual recording technology called “film.” (Those of you too young to remember a time before cell phones, ask your grandparents to describe it to you.)
As the illustration above indicates, film recorded each image complete and intact. In fact, if you held a piece of film up to the light, you could see a series of images extending along the film.
Film is the perfect example of recording I-frames: each image is complete. And, if we were to poke a hole in one image, that hole would not affect any of the frames around it.
Now, move forward in time to video tape. A video tape recorder also captured one complete image at a time, though we could no longer see the images by holding the tape up to a light, because they were recorded magnetically, not optically.
NOTE: I remember, when I first worked in broadcast television, that video tape recorders did not have the ability to edit. Instead, we had to cut the video tape with a razor blade and Scotch-tape the two pieces together. To determine where to cut the tape, we painted the back side of the tape with a purple solution using a fingernail brush. That purple solution would be attracted to the magnetic lines of force on the tape and, by reading the lines with a microscope, we could determine the start of a frame and, hence, the best place to make the cut.
For both film and video tape, each image was recorded intact, and no frame was affected by the frames around it.
SHIFT INTO DIGITAL
The good news about film and video tape was that each image was complete and at as high a quality as the recording medium could support. The problem, however, was that these recordings were huge.
NOTE: A one-hour broadcast television program required 4,800 linear feet of video tape when recorded on either 1″ helical or 2″ quad tape. The 2″ tapes alone weighed close to 30 pounds.
Most digital devices, until recently, didn’t have the bandwidth to record even standard-definition media at I-frame data rates. It wasn’t until hard disks that could attach directly to the camera became available that we had the bandwidth necessary to record uncompressed source images.
NOTE: Uncompressed SD video requires about 35 MB/second for record or playback. While “uncompressed” means many different things in HD, a round number for uncompressed 1080p HD is about 210 MB/second.
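Uncompressed data rates like these can be estimated with simple arithmetic: pixels per frame, times bits per pixel, times frames per second. Here is a minimal Python sketch of that calculation. The function name and the bits-per-pixel values are illustrative assumptions (for example, 16 bits per pixel for 8-bit 4:2:2 video, 20 for 10-bit 4:2:2). The result shifts a lot depending on the bit depth and chroma sampling you assume, which is one reason “uncompressed” means different things to different people, so this sketch is not meant to reproduce the exact round numbers above.

```python
# Rough uncompressed video data rate, in megabytes per second.
# A sketch: bits_per_pixel depends on bit depth and chroma sampling
# (assumed examples: 16 for 8-bit 4:2:2, 20 for 10-bit 4:2:2, 24 for 8-bit 4:4:4).


def uncompressed_rate_mb_per_s(width: int, height: int,
                               bits_per_pixel: float, fps: float) -> float:
    bytes_per_frame = width * height * bits_per_pixel / 8
    return bytes_per_frame * fps / 1_000_000  # 1 MB = 1,000,000 bytes here


if __name__ == "__main__":
    # NTSC SD frame, 8-bit 4:2:2
    print(f"SD:    {uncompressed_rate_mb_per_s(720, 486, 16, 29.97):.1f} MB/s")
    # 1080p HD frame, 10-bit 4:2:2
    print(f"1080p: {uncompressed_rate_mb_per_s(1920, 1080, 20, 29.97):.1f} MB/s")
```

Real files also carry overhead this sketch ignores, such as line padding, audio, and container data.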
To solve the problem of limited bandwidth, which exists both when recording in the camera and when distributing the finished video on the web, a new compression scheme was needed: GOP compression.
THINK ABOUT A CHESS MATCH
To understand GOP (Group of Pictures) compression, it helps to think about a chess match. A chess board always has 64 squares (8 columns and 8 rows). It always has the same pieces, which are always set up the same way on the chess board at the start.
If I wanted to show you a chess match between two grandmasters, I could take a picture of the board after every move. This is what I-frame recording does. It works perfectly, but it takes a lot of space to store all those images.
Or, I could take a picture of the board at the very start, so you can see how the pieces are set, then simply describe each individual move. This is what GOP compression does: it only tracks the changes from one frame to the next.
Because we know the starting position of every piece, we can follow the match precisely by following the changes, provided we follow it from the very beginning.
The problem with only indicating the changes is that if we join the match in the middle, we are totally lost because reading the changes does not provide the context we need to understand the entire picture. Unless we know the starting image, just reading changes is not sufficient to create the entire image.
To see the entire picture of the game, we need to go back to the beginning, then apply each of the changes until we are caught up.
GOP compression creates very small files, but if we join a file in the middle of a GOP, we can’t see the entire picture.
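The chess analogy maps directly onto a “starting picture plus changes” scheme. This minimal Python sketch uses a hypothetical four-piece board and a made-up move list: it stores one starting position plus a list of moves, and rebuilds the position at any point by replaying every move from the start. That replay is exactly the catch: the moves alone, without the starting position, tell you nothing.

```python
# Toy model of the chess analogy: one full "picture" (the starting board)
# plus a list of changes (moves). The board and moves are hypothetical.

START = {"e2": "P", "e7": "p", "g1": "N", "b8": "n"}  # square -> piece
MOVES = [("e2", "e4"), ("e7", "e5"), ("g1", "f3"), ("b8", "c6")]


def apply_move(board: dict[str, str], move: tuple[str, str]) -> dict[str, str]:
    """Return a new board with one piece moved (no legality checks)."""
    src, dst = move
    new_board = dict(board)
    new_board[dst] = new_board.pop(src)
    return new_board


def position_after(start: dict[str, str],
                   moves: list[tuple[str, str]], n: int) -> dict[str, str]:
    """Rebuild the board after n moves by replaying every move from the start."""
    board = dict(start)
    for move in moves[:n]:
        board = apply_move(board, move)
    return board


if __name__ == "__main__":
    # "I-frame" approach: store a full snapshot after every move.
    snapshots = [position_after(START, MOVES, n) for n in range(len(MOVES) + 1)]
    print(len(snapshots), "full boards stored")
    # "GOP" approach: store one board plus the moves, then replay to reach move 3.
    print(position_after(START, MOVES, 3))
```

The snapshot list grows by a full board per move, while the replay approach stores one board and a short list of moves, but must always begin from the starting position.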
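In simple terms, GOP compression divides a video clip into groups of frames. The length of each group varies by codec and settings; older MPEG-2 formats, for example, commonly used 15-frame GOPs for NTSC and 12-frame GOPs for PAL.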
NOTE: In this illustration, I’m using a 12-frame GOP just to make the illustration easier to read. GOPs come in several different lengths, but the structure is the same.
Each group starts with an I-frame, which, you recall, is a complete picture.
Next is a “B” frame, a bi-directionally predicted frame, which stores only the changes in the image, predicted from the reference frames on either side of it: the “I” frame before it and the next “P” frame after it.
NOTE: You could think of a B-frame as a word processing document: it simply describes how blocks of pixels have moved or changed relative to its reference frames. A B-frame, on its own, is not an image; it is a highly compressible list of pixel changes.
The next frame is also a “B” frame, again listing the changes relative to those same two reference frames.
Next comes a “P” frame, a “predicted” frame, which lists the changes since the previous “I” or “P” frame. “P” frames act as anchors that keep the chain of changes from drifting too far off track: the “B” frames on either side of them are predicted from them.
This pattern of “B” and “P” frames repeats until the end of the group is reached. Then a new group starts with a new I-frame, and the entire process repeats.
One complete image, the “I” frame, is followed by a lot of frames simply listing changes, “B” and “P” frames.
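The structure described above can be sketched in a few lines of Python. This toy model is an assumption, not any real encoder’s logic: it lays out one GOP in display order with a fixed number of “B” frames between anchors, and lists which frames each frame is predicted from. For simplicity, any trailing “B” frames are shown referencing the first frame of the next GOP (an index equal to the GOP length), as they would in an “open” GOP; real encoders make their own choices here.

```python
def gop_pattern(length: int, b_frames: int = 2) -> str:
    """Frame types for one GOP in display order, e.g. 'IBBPBBPBBPBB' for length 12."""
    types = []
    for i in range(length):
        if i == 0:
            types.append("I")          # every GOP opens with a complete picture
        elif i % (b_frames + 1) == 0:
            types.append("P")          # anchor, predicted from the previous anchor
        else:
            types.append("B")          # predicted from the anchors on either side
    return "".join(types)


def references(pattern: str) -> list[tuple[int, ...]]:
    """For each frame, the indices of the frames it is predicted from."""
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    refs: list[tuple[int, ...]] = []
    for i, t in enumerate(pattern):
        if t == "I":
            refs.append(())            # self-contained, no references
            continue
        prev = max(a for a in anchors if a < i)
        if t == "P":
            refs.append((prev,))
        else:
            later = [a for a in anchors if a > i]
            # No later anchor inside this GOP: point at the next GOP's I-frame.
            refs.append((prev, later[0] if later else len(pattern)))
    return refs


if __name__ == "__main__":
    pattern = gop_pattern(12)
    print(pattern)
    for i, (t, r) in enumerate(zip(pattern, references(pattern))):
        print(i, t, r)
```

Setting b_frames=0 produces an “I” frame followed only by “P” frames, a simpler GOP structure some codecs use.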
THE DIFFERENCE IN FILE SIZE
All versions of ProRes use I-frame compression, while the H.264 files most of us work with, from cameras and for the web, use GOP compression. (H.264 does define intra-only profiles, such as those used by AVC-Intra, but they are mostly found in professional broadcast cameras.) It is this difference that largely explains why H.264 files are so much smaller than ProRes.
Granted, other factors also play a role. For example, even though ProRes compresses each frame individually, different versions of ProRes apply different amounts of compression, which is why ProRes Proxy files are much smaller than ProRes 4444. My goal here is not to explain all the different levels of compression that can be applied to a video clip but, rather, the differences between I-frame and GOP compression.
And the file size differences can be dramatic. For example, in ProRes, a 5-second tripod shot at 30 fps where nothing moves in the frame generates 150 complete images. In principle, that same shot in H.264 needs only one complete image plus a small set of instructions that says “repeat this frame 150 times.” In practice, encoders start a new GOP, with a new I-frame, at regular intervals, so the savings are smaller; but at the extreme, the H.264 shot approaches 150 times smaller than the ProRes shot.
Or, in another example, take a 5-second shot where the camera is locked on a tripod and one person walks through the frame. ProRes again creates 150 complete images. In H.264, if the actor covers about 20% of each frame, only that changing 20% needs to be described in the “B” and “P” frames. The ProRes version stores 150 frames, while the H.264 version stores roughly the equivalent of 30 frames (20% of 150). Again, H.264 produces a much smaller file.
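The two examples above can be turned into a back-of-the-envelope model. This Python sketch measures size in units of “one complete frame” and assumes, hypothetically, that each “B” or “P” frame costs only the fraction of the picture that changed, and that H.264 starts a new GOP, with a fresh I-frame, every gop_length frames. Real encoders also compress each I-frame and spend extra bits on motion data, so treat the results as illustrations of the ratio, not predictions of actual file sizes.

```python
import math


def iframe_size(frames: int) -> float:
    """I-frame codec: every frame is stored complete (1.0 unit each)."""
    return float(frames)


def gop_size(frames: int, gop_length: int, changed_fraction: float) -> float:
    """GOP codec (toy model): one complete frame per GOP,
    plus only the changed part of every other frame."""
    i_frames = math.ceil(frames / gop_length)
    predicted_frames = frames - i_frames
    return i_frames + predicted_frames * changed_fraction


if __name__ == "__main__":
    frames = 5 * 30  # a 5-second shot at 30 fps
    for label, changed in [("static tripod shot", 0.0), ("actor walks through", 0.2)]:
        for gop_length in (frames, 15):
            size = gop_size(frames, gop_length, changed)
            ratio = iframe_size(frames) / size
            print(f"{label}, GOP of {gop_length}: {size:.1f} frames, {ratio:.1f}x smaller")
```

With a single GOP spanning the whole shot, the static case works out to 1 frame (150 times smaller) and the walking actor to about 31 frames, in line with the rough figures above. With a 15-frame GOP, the savings shrink to about 15 times and about 4 times, respectively.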
This is why it is difficult to predict how big an H.264 file will be before compression, unless the encoder is set to a constant bit rate. The file size changes based on the amount of movement between frames. If there is no movement (compressing a PowerPoint presentation filled with stationary slides, for example), the compressed file will be microscopic. If there is lots of movement (a dance recital shot with a hand-held camera), the file will be much, much bigger, because the “B” and “P” frames need to account for so much more change between frames.
THE PLAYBACK CHALLENGE
The way H.264 uses “change documents” for compression presents a significant challenge for playback. Like our chess match, if we start our video at the beginning, GOP compression is very efficient: the files are small, the changes are applied incrementally and in order, and we can clearly see the moving image as it evolves over time.
However, again like our chess match, if we join a GOP in the middle, say by moving our playhead into the middle of a clip, we can’t see any image until, behind the scenes, our editing software goes back to the nearest preceding “I” frame and reconstructs all the changes that occurred up to the position of the playhead.
With an I-frame video, no reconstruction is necessary, as each I-frame image is complete. Wherever the playhead stops, it can instantly display the complete image.
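The difference in seek cost can be sketched by following each frame’s references back to a complete picture. This Python toy model is an assumption, not how any particular NLE works: it uses the simple “I”/“B”/“P” structure described earlier, treats a string like 'IBBPBBPBBPBBI' as a continuous stream that starts with an “I” frame, and collects every frame that must be decoded before the frame under the playhead can be shown.

```python
def frames_needed(stream: str, target: int) -> set[int]:
    """Indices that must be decoded to display frame `target` in a toy I/B/P stream.
    Assumes the stream starts with an I-frame."""
    anchors = [i for i, t in enumerate(stream) if t in "IP"]

    def refs(i: int) -> list[int]:
        kind = stream[i]
        if kind == "I":
            return []                                  # complete picture, no dependencies
        prev = max(a for a in anchors if a < i)       # nearest earlier I or P frame
        if kind == "P":
            return [prev]
        later = [a for a in anchors if a > i]          # B-frames also need the next anchor
        return [prev] + later[:1]

    needed: set[int] = set()
    stack = [target]
    while stack:                                      # walk the dependency chain back to an I-frame
        i = stack.pop()
        if i not in needed:
            needed.add(i)
            stack.extend(refs(i))
    return needed


if __name__ == "__main__":
    print(len(frames_needed("IIIIIIIIIIIII", 10)))   # I-frame only: just the frame itself
    print(len(frames_needed("IBBPBBPBBPBBI", 10)))   # GOP: a chain of other frames too
```

In the I-frame-only stream, every seek touches exactly one frame; in the GOP stream, the cost depends on where the playhead lands within the group.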
Matters get much more complicated as we place GOP-compressed video on higher layers or tracks, cut clips based on content rather than on I-frames, and stack multicam clips on top of each other where the I-frames don’t all occur at the same time.
Behind the scenes, every time we move the playhead, the NLE needs to find the nearest I-frame, and apply the changes until it reaches the position of the playhead.
Worse, when we create an edit in the middle of a GOP, the GOP structure around the edit needs to be rebuilt, because a GOP MUST start with an I-frame; otherwise, the list of changes won’t make any sense.
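One common way editing and export tools handle a cut that lands mid-GOP is to re-encode only the frames from the cut point up to the next I-frame, so the new segment begins with a fresh I-frame; this is often called “smart rendering.” This minimal Python sketch assumes closed GOPs, where no frame refers across a GOP boundary, and the function name is hypothetical.

```python
def frames_to_reencode(stream: str, cut: int) -> range:
    """Frames that need re-encoding if a clip now starts at index `cut`
    (toy model, closed GOPs assumed)."""
    if stream[cut] == "I":
        return range(cut, cut)        # cut lands on an I-frame: the GOP is already intact
    next_i = stream.find("I", cut + 1)
    end = next_i if next_i != -1 else len(stream)
    # Re-encoding this span lets its first frame become a new I-frame;
    # everything from the next existing I-frame onward can be copied as-is.
    return range(cut, end)


if __name__ == "__main__":
    stream = "IBBPBBPBBPBB" + "IBBPBBPBBPBB"
    cut = 4
    span = frames_to_reencode(stream, cut)
    print(f"cut at {cut}: re-encode frames {span.start}-{span.stop - 1}, copy the rest")
```

The closer a cut falls to the start of a GOP, the more frames must be rebuilt; a cut right on an I-frame needs no re-encoding at all.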
This is why older computers, such as earlier Mac Pro models, have a very hard time playing H.264 video: they can’t decode GOP-compressed video fast enough to make editing feel smooth. There’s just too much math and not enough CPU horsepower.
GOP compression isn’t “bad,” any more than I-frame compression is “good.” As with most things in video, good or bad depends on what you are trying to do.
Without GOP compression, we couldn’t record video using a DSLR camera. Without GOP compression, YouTube videos wouldn’t exist.
But, for editing, I-frame video will generally have better image quality, edit more smoothly, export faster, and play back more easily on slower computers.