Last week, I interviewed Martin Baker, CEO of Digital Heaven, about their newest product: SpeedScriber. This week, I thought I’d take a more detailed look at the program.
NOTE: Listen to Martin’s interview here.
SpeedScriber is a macOS-based program that creates text transcripts from audio or video files faster than real-time. It integrates with Avid Media Composer, Adobe Premiere Pro and Apple Final Cut Pro X, and can also create stand-alone text files.
When you import or drag a media file into the SpeedScriber interface, it extracts the audio, sends it to a Cloud-based transcription service, then, shortly thereafter, displays text generated from that audio. Currently, the application supports American, British and Australian English.
Text can be edited as necessary, then printed, saved as a PDF, exported as an SRT caption file, or sent to Media Composer, Premiere Pro or Final Cut Pro X. When a new user signs up for the program, they are given a free 15-minute credit to experiment with creating transcripts. Additional minutes can be purchased using a variety of plans, ranging from $0.37 – $0.50 / minute (US).
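For readers who haven't worked with SRT, it is a plain-text caption format: a cue number, a start and end timecode, then the caption text, with a blank line between cues. The Python sketch below writes that layout from a list of timed segments. It only illustrates the file format; the `Segment` structure and the `to_srt` function are hypothetical names of my own, not anything SpeedScriber exposes, and this is not a description of how the application builds its export.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One timed line of transcript (hypothetical structure for illustration)."""
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timecode that SRT files use."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)
```

Calling `to_srt` on a list of segments returns a string that can be saved with an `.srt` extension.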
Developer: Digital Heaven
Price: SpeedScriber is free, along with 15 minutes of initial transcription
Charges are per program minute, ranging from $0.37 – $0.50 / minute (US).
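To get a feel for what a job might cost, here is a rough estimator in Python based on the rates quoted above. It is a sketch under two assumptions of mine that the pricing information does not confirm: that partial minutes are rounded up to a whole minute, and that any remaining free credit is applied before paid minutes. The function name `estimate_cost` is hypothetical.

```python
import math

FREE_MINUTES = 15   # initial free credit quoted in this review
RATE_LOW = 0.37     # US dollars per minute, lowest quoted rate
RATE_HIGH = 0.50    # US dollars per minute, highest quoted rate


def estimate_cost(duration_seconds: float, free_minutes_left: int = FREE_MINUTES):
    """Return a (low, high) cost estimate in US dollars for one file.

    Assumption: partial minutes round up, and free credit is used first.
    """
    minutes = math.ceil(duration_seconds / 60)
    billable = max(0, minutes - free_minutes_left)
    return round(billable * RATE_LOW, 2), round(billable * RATE_HIGH, 2)
```

For example, `estimate_cost(3600)` covers a one-hour file with the full free credit still available, and `estimate_cost(3600, free_minutes_left=0)` covers the same file once the credit is used up.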
Installation is done through the Mac App Store and, like all Mac apps, is painless. Upon starting the application, a Welcome Overview appears that explains the product and how to use it. I like this implementation a lot – and wish that other developers would emulate it.
Here’s the workflow:
Here, for example, I’ve imported a QuickTime movie from one of my webinars. It helps to tell the transcription server how many voices it is listening for. More voices are harder to transcribe. To save time, you can add one or more files to a batch to process simultaneously.
NOTE: The program currently supports three English dialects: American, British and Australian. These can be set during import, or set as a default using a preference setting, which is illustrated here.
When a job is ready to transcribe, click the Transcribe button.
Because you are charged based on the number of minutes in your source file(s), the system confirms you want to actually send the file.
The system immediately extracts the audio file from the media and uploads it. Because audio files are far smaller than video files, extracting the audio reduces upload time. The system then displays a “Processing” status in the right panel.
When transcription is done, the status changes to “Transcribed.” To view the transcript, double-click the name of the file.
NOTE: This step is not obvious. I was looking for either the transcript to be automatically displayed or a button that says “Click Me.”
Clicking the Transcribed button menu also allows you to export the file or delete it.
NOTE: According to Martin, files are retained for a few days after transcription, then deleted. If your company is especially concerned about file security, check the Digital Heaven website for other security options.
The Editorial screen is now displayed.
The first thing I did was change the Speaker’s name; because, um, why not?
Across the bottom are playback and editing controls, as well as a time display, the ability to change playback speed and a button to mark a Favorite range.
Press the space bar to play the video, on the right, which allows you to listen to the audio while the text highlights on the left. Text that is light has not been played back, while text that is dark has been reviewed.
NOTE: Click a word to display that portion of the video on the right.
To edit a word, select it, then either click the editing pencil or press the Return/Enter key. Red highlights the current word being played. Blue text can be edited.
While it would be nice to double-click a word to edit it, pressing the Enter key or clicking the Pencil isn’t really that hard; just a bit time-consuming. Adding punctuation can be done by selecting the word, then typing the punctuation; we don’t need to switch into editing mode for that.
Other keyboard shortcuts include:
Automated transcripts have several challenges: First, they don’t understand paragraphs. So everything comes back as a single block. Here, I selected where I want a paragraph to start (“Let’s”). Press Shift-Return, or Enter (on a full-size keyboard), or click the icon for the Return key in the control panel at the bottom to create a new paragraph.
Second, they don’t understand punctuation. While I was impressed at how well the system guessed where to put periods, it has no clue about commas or other punctuation. Here, for instance, I added the paragraph break and all commas. (Compare this paragraph to the source, seven screen shots earlier.)
One of the key reasons for creating a transcript is to find a specific section in a longer video.
Here, the application is excellent. Enter the text you want to find, select which speaker says it (or search across all of them). Instantly, relevant portions are displayed below the search box.
Press Enter and the system jumps to that part of the video and displays the relevant text in the transcript.
This was REALLY fast and easy to use.
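Conceptually, this kind of search is a text filter with an optional speaker filter on top. The Python sketch below shows the idea on a small in-memory transcript. It is purely illustrative: the `Line` structure and `search` function are hypothetical, and I have no information about how SpeedScriber implements its own search.

```python
from typing import List, NamedTuple, Optional


class Line(NamedTuple):
    """One transcript line (hypothetical structure for illustration)."""
    speaker: str
    start: float  # seconds into the video where the line begins
    text: str


def search(lines: List[Line], query: str, speaker: Optional[str] = None) -> List[Line]:
    """Return lines containing the query, optionally limited to one speaker.

    Matching ignores upper and lower case; speaker=None searches everyone.
    """
    needle = query.casefold()
    return [
        line
        for line in lines
        if needle in line.text.casefold()
        and (speaker is None or line.speaker == speaker)
    ]
```

Each match carries its `start` time, which is what lets a player jump straight to that point in the video.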
IS IT ACCURATE?
There is only one answer to this question: It Depends. A lot.
If audio quality is poor, noisy, or muffled, accuracy will diminish.
Also, I’ve learned after all my years interviewing people for the Digital Production Buzz that very, very few people speak good English. We speak in sentence fragments, with lots of “ums,” “ahs,” and “you-knows.” We start a thought, change our mind in mid-sentence and never stop talking. Listening to many speakers is an exercise in processing verbal whiplash.
So, an automated system can only be as accurate as the speaker.
As well, content changes based on context. For example, each of these sentences means something different, depending upon punctuation:
Yet, if we were speaking these lines, we would probably say each of them the same way, and expect the listener to understand what we were saying based upon context.
So, not including punctuation means that accuracy is diminished, but guessing at correct punctuation is really, REALLY hard!
On the other hand, if we are only using the transcript as a reference for editing, accuracy isn’t important. As long as we can quickly find the section we are looking for, we don’t need punctuation or accuracy.
On the other, other hand, any transcript should be reviewed for accuracy. Any transcriber, human or machine, will struggle with words they don’t know. Automated transcription has problems with proper names, acronyms, homophones (words that sound the same), and slurred speech.
For instance, look at this transcript. Are the errors here caused by my inability to speak clearly, the transcription software not understanding what I said, a lack of punctuation, or something else?
After listening to the audio, here’s the cleaned-up version of the text. There were only a few incorrect words. The bulk of my time was spent adding punctuation and appreciating that when I’m narrating a screen demo, I don’t speak in complete sentences.
So, is the system accurate? Yes, and No, and Maybe. Like I said, it depends…
SENDING TO FCP X
When you send a file to Final Cut Pro X, each sentence is sent as a separate clip. (Again, no media files are hurt in this process.)
When you display the Notes field inside FCP X, the text is included in the note.
Text in a Notes field is searchable. So, when I searched for “vignette,” it found the clip that contains that text, as well as displaying it in the Notes field for that clip.
This makes using transcripts inside FCP X REALLY useful.
NOTE: Both Premiere and Avid can import the text, but each video editor handles it differently. Visit the SpeedScriber support site to learn more about how to integrate transcripts with your video editor.
Suddenly, automated text transcripts are available from a number of vendors. But, SpeedScriber was the first. I am very impressed with its speed, overall accuracy and tight integration with key NLEs, while still supporting standalone transcripts.
If you are looking for ways to convert speech to text quickly, with reasonable accuracy, easy-to-use editing tools and multiple export options, SpeedScriber is an excellent choice.