First Look: SpeedScriber, from Digital Heaven

Posted by Larry

Last week, I interviewed Martin Baker, CEO of Digital Heaven, about their newest product: SpeedScriber. This week, I thought I’d take a more detailed look at the program.

NOTE: Listen to Martin’s interview here.

EXECUTIVE SUMMARY

SpeedScriber is a macOS-based program that creates text transcripts from audio or video files faster than real-time. It integrates with Avid Media Composer, Adobe Premiere Pro and Apple Final Cut Pro X, and can also create stand-alone text files.

When you import or drag a media file into the SpeedScriber interface, it extracts the audio, sends it to a Cloud-based transcription service, then, shortly thereafter, displays text generated from that audio. Currently, the application supports American, British and Australian English.

Text can be edited as necessary, then printed, saved as a PDF, exported as an SRT caption file, or sent to Media Composer, Premiere Pro or Final Cut Pro X. When new users sign up for the program, they are given a free 15-minute credit to experiment with creating transcripts. Additional minutes can be purchased using a variety of plans, ranging from $0.37 – $0.50 / minute (US).

Developer: Digital Heaven
Website: www.speedscriber.com
Price: SpeedScriber is free, along with 15 minutes of initial transcription
Charges are per program minute, ranging from $0.37 – $0.50 / minute (US).

INSTALLATION

Installation is done through the Mac App Store and, as with all Mac App Store apps, it is painless. Upon starting the application, a Welcome Overview appears that explains the product and how to use it. I like this implementation a lot – and wish that other developers would emulate it.

OPERATION

Here’s the workflow:

Here, for example, I’ve imported a QuickTime movie from one of my webinars. It helps to tell the transcription server how many voices it is listening for. More voices are harder to transcribe. To save time, you can add one or more files to a batch to process simultaneously.

NOTE: The program currently supports three English dialects: American, British and Australian. These can be set during import, or set as a default using a preference setting, as illustrated here.

When a job is ready to transcribe, click the Transcribe button.

Because you are charged based on the number of minutes in your source file(s), the system confirms you want to actually send the file.
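To get a rough feel for what a batch will cost before clicking Transcribe, here is a minimal Python sketch of the arithmetic. It uses the $0.37 to $0.50 per-minute range and the free 15-minute credit mentioned above. Everything else is an assumption for illustration: the function name is hypothetical, and I am assuming each file is rounded up to a whole minute and that the free credit is used up first. The actual billing rules may differ, so treat the result as a ballpark.

```python
import math

# Figures quoted in this article (US dollars, expressed in cents per minute).
RATE_LOW_CENTS = 37
RATE_HIGH_CENTS = 50
FREE_MINUTES = 15  # initial free credit for new users


def estimate_cost(durations_seconds, free_minutes=FREE_MINUTES):
    """Return (billable_minutes, low_estimate, high_estimate) for a batch.

    Assumption (not confirmed by the vendor): each file is rounded up to
    the next whole minute, and any remaining free credit is applied first.
    """
    total_minutes = sum(math.ceil(seconds / 60) for seconds in durations_seconds)
    billable = max(0, total_minutes - free_minutes)
    low = billable * RATE_LOW_CENTS / 100
    high = billable * RATE_HIGH_CENTS / 100
    return billable, low, high


if __name__ == "__main__":
    # A 30-minute webinar and a 45-minute interview, with the free credit unused.
    minutes, low, high = estimate_cost([30 * 60, 45 * 60])
    print(f"{minutes} billable minutes: ${low:.2f} to ${high:.2f}")
```

Once the free credit is gone, pass free_minutes=0 to estimate later batches.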

The system immediately extracts the audio file from the media and uploads it. Because audio files are far smaller than video files, extracting the audio reduces upload time. The system then displays a “Processing” status in the right panel.

When transcription is done, the status changes to “Transcribed.” To view the transcript, double-click the name of the file.

NOTE: This step is not obvious. I was looking for either the transcript to be automatically displayed or a button that says “Click Me.”

Clicking the Transcribed button menu also allows you to export the file or delete it.

NOTE: According to what Martin tells me, files are retained for a few days after transcription, then deleted. Companies paranoid about file security should check the SpeedScriber website for other security options.


The Editorial screen is now displayed.

The first thing I did was change the Speaker’s name; because, um, why not?

Across the bottom are playback and editing controls, as well as a time display, the ability to change playback speed and a button to mark a Favorite range.


Press the space bar to play the video, on the right, which allows you to listen to the audio while the text highlights on the left. Text that is light has not been played back, while text that is dark has been reviewed.


NOTE: Click a word to display that portion of the video on the right.

EDITING

To edit a word, select it, then either click the editing pencil or press the Return/Enter key. Red highlights the current word being played. Blue text can be edited.

While it would be nice to double-click a word to edit it, pressing the Enter key or clicking the Pencil isn’t really that hard; just a bit time-consuming. Adding punctuation can be done by selecting the word, then typing the punctuation; we don’t need to switch into editing mode for that.

Other keyboard shortcuts include:

Automated transcripts have several challenges. First, they don’t understand paragraphs, so everything comes back as a single block. Here, I selected where I want a paragraph to start (“Let’s”). Press Shift-Return, or Enter (on a full-size keyboard), or click the icon for the Return key in the control panel at the bottom to create a new paragraph.

Second, they don’t understand punctuation. While I was impressed at how well the system guessed where to put periods, it has no clue about commas or other punctuation. Here, for instance, I added the paragraph break and all commas. (Compare this paragraph to the source, seven screen shots earlier.)

SEARCH

One of the key reasons for creating a transcript is to find a specific section in a longer video.

Here, the application is excellent. Enter the text you want to find, select which speaker says it (or search across all of them). Instantly, relevant portions are displayed below the search box.


Press Enter and the system jumps to that part of the video and displays the relevant text in the transcript.

This was REALLY fast and easy to use.
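For readers curious why this kind of search can be so quick, here is a minimal Python sketch of the general idea: once a transcript is stored as timed text segments tagged with a speaker, finding a phrase is just a filter over those segments. This is not how SpeedScriber is implemented; the Segment structure, the search_transcript function and the sample lines are all hypothetical and exist only to illustrate the concept.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of transcript text (hypothetical structure)."""
    speaker: str
    start_seconds: float
    text: str


def search_transcript(segments, query, speaker=None):
    """Return segments whose text contains the query, ignoring case.

    If speaker is given, only that speaker's segments are considered;
    otherwise the search runs across all speakers.
    """
    needle = query.casefold()
    return [
        seg for seg in segments
        if needle in seg.text.casefold()
        and (speaker is None or seg.speaker == speaker)
    ]


# Made-up sample data, not taken from a real transcript.
SAMPLE = [
    Segment("Larry", 0.0, "Let's add a vignette to this clip."),
    Segment("Guest", 12.5, "The Vignette effect darkens the edges."),
    Segment("Larry", 30.0, "Next, we adjust the color."),
]

if __name__ == "__main__":
    for hit in search_transcript(SAMPLE, "vignette", speaker="Larry"):
        print(f"{hit.start_seconds:>6.1f}s  {hit.speaker}: {hit.text}")
```

Each hit carries its start time, which is what lets a player jump straight to that part of the video.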

IS IT ACCURATE?

There is only one answer to this question: It Depends. A lot.

If audio quality is poor, noisy, or muffled, accuracy will diminish.

Also, I’ve learned after all my years interviewing people for the Digital Production Buzz that very, very few people speak good English. We speak in sentence fragments, with lots of “ums,” “ahs,” and “you-knows.” We start a thought, change our mind in mid-sentence and never stop talking. Listening to many speakers is an exercise in processing verbal whiplash.

So, an automated system can only be as accurate as the speaker.

As well, content changes based on context. For example, each of these sentences means something different, depending upon punctuation:

Yet, if we were speaking these lines, we would probably say each of them the same way, and expect the listener to understand what we were saying based upon context.

So, not including punctuation means that accuracy is diminished, but guessing at correct punctuation is really, REALLY hard!

On the other hand, if we are only using the transcript as a reference for editing, perfect accuracy isn’t important. As long as we can quickly find the section we are looking for, we don’t need punctuation or word-for-word accuracy.

On the other, other hand, any transcript should be reviewed for accuracy. Any transcriber, human or machine, will struggle with words they don’t know. Automated transcription has problems with proper names, acronyms, homophones (words that sound the same), and slurred speech.

For instance, look at this transcript. Are the errors here caused by my inability to speak clearly, the transcription software not understanding what I said, a lack of punctuation, or something else?

After listening to the audio, here’s the cleaned-up version of the text. There were only a few incorrect words. The bulk of my time was spent adding punctuation and appreciating that when I’m narrating a screen demo, I don’t speak in complete sentences.

So, is the system accurate? Yes, and No, and Maybe. Like I said, it depends…

SENDING TO FCP X

When you send a file to Final Cut Pro X, each sentence is sent as a separate clip. (Again, no media files are hurt in this process.)

When you display the Notes field inside FCP X, the text is included in the note.

Text in a Notes field is searchable. So, when I searched for “vignette,” it found the clip that contains that text, as well as displaying it in the Notes field for that clip.

This makes using transcripts inside FCP X REALLY useful.

NOTE: Both Premiere and Avid can import the text, but each video editor handles it differently. Visit the SpeedScriber support site to learn more about how to integrate transcripts with your video editor.

SUMMARY

Suddenly, automated text transcripts are available from a number of vendors, but SpeedScriber was the first. I am very impressed with its speed, overall accuracy and tight integration with key NLEs, while still supporting standalone transcripts.

If you are looking for ways to convert speech to text quickly, with reasonable accuracy, easy-to-use editing tools and multiple export options, SpeedScriber is an excellent choice.


3 Responses to First Look: SpeedScriber, from Digital Heaven

  1. Jeffrey Cipin says:

    As the old saying goes, “Good, fast, cheap: Pick Two”. SpeedScriber is definitely faster and cheaper than human transcription. And, as someone who’s been using it since last November, I would say that while it sometimes could be better, it’s usually ok as a reference for editing. I wish it was a lot better at distinguishing one speaker from the next. But I’ve adjusted my expectations and I simply don’t bother spending time that I’m not being paid for going through the transcripts to clean them up any more than I have to. One workaround that I’ve been using for interviews is to say “next question” before I ask the next question. That gives me something to quickly search for when going through the transcript. And audio quality can have a noticeable effect on transcription accuracy. For instance, I suggest that leaving in the camera mic channel to help pick up the audio of an un-mic’d interviewer isn’t worth the trade-off of less accuracy in the subject’s transcription. I should mention that the app is continually being improved in terms of navigation and editing features. So overall, an extremely useful application. But not yet perfect 🙂

  2. Griffon Collie says:

    It sounds like a good tool for those on Premiere and FCPX but, as primarily an Avid editor, I’ll stick with ScriptSync and PhraseFind, which are very accurate and work on Windows and Mac systems.

  3. Loren Miller says:

    Great overview, Larry.

    First there was the word. Some of us actually edit precise rough cuts from transcripts, because reading is faster than hand-eye. One gets to the skinny sooner, melding separate parts of an interview seamlessly to build cogent thoughts. Color-coding then organizes good picks by topic, in Pages, Word, etc. and a “conform” to actual rough cut follows quickly. On interview-heavy shows, I’ve used this process successfully for over 30 years. Accurate transcripts are also required as part of deliverables for fact-based broadcast work.

    SpeedScriber gives a good head start on *verbatim* transcript editing in a manner I’ve not seen in other products, including the ability to correct easily and precisely in the app, as Larry has demonstrated here.

    From the finished export, one begins cutting ideas right down to the word, annotating for camera, and 90% of the time the first cuts are effective.

    Congratulations to Martin for addressing so many fundamental needs for those of us who use transcripts for more than searching video!
