Pop Up Podcasting


Automatic Transcription Services Compared: Which Should You Use?

Transcribing audio or video content into text is a handy way to speed up production, ease collaboration, and improve the SEO and accessibility of a finished podcast or video. Until recently, this was a slow and/or expensive manual process - but now a slew of automated cloud-based services claim to make transcription quick, easy, and cheap.

But is automated transcription any good? We ran the same audio through some of the most popular options to judge if any come close to good ol' human transcribers.

UPDATE! (Dec 2017)

Things are changing rapidly in the automated transcription world so we thought it was time for a few updates to this review:

  • Happy Scribe, which we originally panned, has made major improvements, so we've taken down our original review and replaced it with a new one (below).

  • Pop Up Archive closed its doors recently after being acquired by Apple - congrats to them!

  • Temi is a new offering from the folks at Rev. We haven't had a chance to review it yet, but it warrants a mention because Rev is such a big player in the transcription business and because at the time of this writing, Temi is offering an unlimited FREE trial, making it the best value we've seen (for now).

Testing Process

Each of the services listed below transcribed two audio files:

  1. Matty.mp3, an 8-minute excerpt of a fully produced podcast (Marion Kane’s Sittin’ in the Kitchen interview with Viceland star Matty Matheson). It includes two distinct voices, one male and one female.

  2. Birthday.wav, a 2-minute excerpt from a raw interview that includes three voices and some background noise.

Summary of Results

After uploading the test files to eight different transcription services, two things became clear: automated transcription isn't quite ready for prime time, and the available services vary a lot in terms of quality, features, and ease of use.

The leader of the pack is Rev, which isn't really automated transcription at all (see below for more on that). Following close behind Rev are services that share a common back end: Google's Speech API. Top performers Pop Up Archive, trint, Go Transcribe, and surprise winner Sonix all use it. They each apply their own post-processing to the text, so their output isn't identical, and each has its own features and interface quirks.

Then there's the bottom of the barrel: SpokenData isn't worth your time or money. VoiceBase might be worth a shot, but only because it offers such a generous free plan: 50 hours of audio transcribed.

Read more about the services below, and if you prefer to evaluate them for yourself, you can download the full transcripts they each generated, linked at the bottom of each section.


A Note About Errors

Automated transcription services produce interesting, odd, and sometimes hilarious or even offensive errors. So while these services are fine for internal use, you should always review their output carefully before publishing an automated transcript publicly. Below are some highlights of the kinds of misinterpretations only a computer could make.

The services that use Google's Speech API (Go Transcribe, trint, Pop Up Archive, and Sonix) all produced very similar errors:

  • The theme song lyrics in matty.mp3 were butchered beyond recognition.

  • "Matty" intermittently became "Mattie" or "Maddie"

  • "Clicks" became "CLECs" - oddly.

  • "F***ing house" became the much tamer "second house"

  • "Trish and Mac" were merged into a single name: "Tricia Mac"

  • "Fort Erie" intermittently became "fortiori"

  • And rather sweetly, "my parents" became "my friends"

These are significant problems, but nothing compared to some of the standout errors from the other services.

  • VoiceBase was nearly incomprehensible at times:

    • "pouring" became "horrifying"

    • "you had a heart attack" became "you had a hard"

    • "I still can't afford a f***ing house in Toronto" became the much more terrifying "I stuck in a f***in house in Tron"

    • Sadly, "We rent a house in Parkdale" became "we rent a house apart"

    • "Trish and Mac" became the defeatist publication "attrition mag"

    • "favourite restaurant" became "favor Russia"

    • "Fort Erie" became "forty area" - which I suppose it is.

  • SpokenData had way more errors than we could possibly list here:

    • "her birth" became "Herbert"

    • "head to toe" became the ominous "had to kill"

    • "it's ours" became "is Paris"

    • "Pudgy chef Matty Matheson has been called the John Belushi of the Toronto food scene, and not just because he's witty, brash, and talented" morphed into the bizarrely poetic "Party chef Matty Matheson has been cool to join blue. She also to run to food see and not just because he's witty brash untalented"

    • "Hugely popular Vice TV show" became "usually call feel of ice TV show"

    • "Early thirties" became "India of Lisa"

    • "Cool my jets" became "Coma jets"

    • Another Tron reference emerged when "Grimy underbelly of Toronto" turned into the "Crimea underbelly of Tron"

    • "Dark side" didn't seem so bad as the "dog side"

    • "house in Toronto" became "I was in Tron"

    • and many more...


The Reviews


The Gold Standard: Rev

Free Trial: No
Price per hour: US$75

Let's start with the best. Nothing quite compares to skilled human transcription. You can find humans interested in this kind of work via radio and podcasting groups, and great time-stamped transcripts run around $1/minute of audio.

But since we're looking at automating the transcription process, I wanted to highlight a company that's essentially the Uber of transcription. From the user's perspective, the service acts a lot like the automated transcription services below, but behind the scenes they're farming your audio out to human transcribers. I’ve used Rev on several projects and found it to be quick and reliable. That said, like Uber, workers for this kind of cloud-based micro-task service don't always get a great deal. If you can, hire transcribers directly.

Rev’s ease of use, speed, quality of transcription, and unique features make it stand out - but the cost can add up compared to truly automated options.

Positives:

  • Fast: Files under 30 minutes are guaranteed to be done within 12 hours. In my tests it was much faster than that, though - matty.mp3 took 30 minutes to finish and birthday.wav took 8 minutes.

  • Speaker Identification and Glossary: When you upload a file to Rev, they’ll ask you for speaker names and any tricky words that might appear in your file. A nice feature that many automated options fail to include.

  • Nice Interface: Rev’s web interface is slick and easy to use. Uploading multiple files and formats at once is easy, and your transcriptions are conveniently available via a dashboard (plus they email you the files).

  • Time Stamps: The inclusion of time stamps every 30 seconds makes finding that killer quote quick and easy.

Negatives:

  • Price: While still relatively inexpensive, Rev’s fees can add up if you’re transcribing a lot of raw tape. It’s US$1.25/minute for transcription with time stamps, so my 2 clips cost a total of US$13.75 (my audio totalled 10:01, and the system rounded that up to 11 minutes). There’s also no free trial.

  • Can Be Inconsistent: Despite my entering speaker names for birthday.wav, the transcriber labelled the speakers “Speaker 1”, “Speaker 2”, and so on. I’ve also encountered some inconsistency in transcription quality in the past. Fortunately, Rev lets you rate each transcription out of 5 stars; a rating below 4 stars means you won’t be paired with that transcriber again. I once received a refund for a poor-quality transcription (which still wasn’t that bad).
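Rev's per-minute billing is easy to work through by hand. The minimal Python sketch below assumes the combined audio length is rounded up to the next whole minute before the US$1.25/minute rate is applied, as happened with our two clips; `billed_minutes` and `transcription_cost` are hypothetical helpers for illustration, not part of any Rev API.

```python
from decimal import Decimal


def billed_minutes(total_seconds: int) -> int:
    """Round a duration up to the next whole minute (assumed billing rule)."""
    # Integer ceiling division avoids floating-point surprises.
    return (total_seconds + 59) // 60


def transcription_cost(total_seconds: int, rate_per_minute: Decimal) -> Decimal:
    """Cost of transcribing `total_seconds` of audio at a per-minute rate."""
    return billed_minutes(total_seconds) * rate_per_minute


RATE = Decimal("1.25")  # US$ per minute, with time stamps

# Our two clips totalled 10:01, which bills as 11 minutes.
print(billed_minutes(10 * 60 + 1))            # 11
print(transcription_cost(10 * 60 + 1, RATE))  # 13.75
print(RATE * 60)                              # 75.00, the hourly equivalent
```

Using `Decimal` rather than floats keeps the dollar amounts exact, which matters once you start totalling costs across many files.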

Accuracy:

Fantastic, nearly flawless. Speakers were named correctly (with the exception noted above), proper nouns were handled well, and the punctuation was great. The intro music lyrics on matty.mp3 were ignored, which is the right way to go I think. I didn’t select the “verbatim” option, so some repeated words and “you know”s were ignored. It would be nice to see laughter noted (this might be included with the verbatim option). I like the non-verbatim option for publishing online though, since those verbal tics sometimes work well in the audio, but can hinder readability.

The only significant error here: "invincible" became "invisible".

View Transcripts: birthday.wav / matty.mp3

Overall Grade: A

Highly recommended, if you can afford it.


Usable, But Only Just: Go Transcribe

Free Trial: 1 file, under 10 minutes
Price per hour after the trial: US$13.20 (and monthly plans reduce this further)

Positives:

  • Easy to Use: No credit card needed for the free trial. Upload was quick and easy. 

  • REALLY Fast: As with all the automated transcription services we tested, transcripts were ready in less time than it takes to listen to the audio.

  • Nice Online Editor: Another common feature among automatic transcription tools, this allows you to listen back to your audio while viewing or editing the text. The current word is highlighted in real time to help you follow along and spot errors. Go Transcribe even highlights words it's uncertain about in red.

  • Time Stamps: They aren't displayed in the online editor interface, but they're frequently inserted when you download the transcript.

Negatives:

  • No Speaker Names: Knowing who's talking at any given time is often critically important, and entering this by hand is a lot of extra work.

Accuracy:

Actually pretty good. Punctuation was non-existent, other than some seemingly random line breaks and periods. Proper nouns were hit or miss; names were sometimes correct, sometimes not (e.g. YouTube was capitalized properly, but the name of a restaurant wasn’t).

View Transcripts: birthday.wav / matty.mp3

Overall Grade: C

Passable, but lack of speaker names and lots of errors bring down the final grade.


Good, But Not Great: trint

Free Trial: 30 minutes free
Price per hour after the trial: US$15 (and monthly plans reduce this further)

Positives:

  • Easy to Use: No credit card needed for the free trial. Upload was quick and easy. Works well with lots of cloud storage options too.

  • REALLY Fast: As with all the automated transcription services we tested, transcripts were ready in less time than it takes to listen to the audio.

  • Online Editor + Interactive Download: Like Go Transcribe above, trint provides an interactive viewer/editor that visually marries the audio and text, allowing you to follow along and correct errors as you go. Taking it a step further, trint allows you to download a version of this editor that works offline.

  • Caption Formats: In addition to the ubiquitous Word file download option, trint provides SubRip and VTT download options. Great for captioning videos, including audiograms.

  • Decent Time Stamp Support: Good time stamp frequency, but not super consistent, sometimes over a minute between stamps.

Negatives:

  • Speaker Names Must Be Entered Manually: trint's editor includes the ability to enter speaker names, but it's a manual paragraph-by-paragraph affair. Previously used names are added to a drop-down menu for later use in the same transcript - but it's still pretty slow going.

Accuracy:

Good - a bit better than Go Transcribe. Still quite a few errors, though, and like other solutions, the punctuation is lacking. You'll notice a lot of the same errors as Go Transcribe: many of these transcription services use Google's Speech API as a first step, then do further processing and formatting from there.

View Transcripts: birthday.wav / matty.mp3

Overall Grade: C+

Speaker name support and the usual errors hold trint back - but it edges out Go Transcribe on accuracy and features.


The Public Radio Fav: Pop Up Archive [No Longer Available]

Free Trial: 1 hour
Price per hour after the trial: US$15 (and monthly plans reduce this further)

Pop Up Archive is used a lot in the public radio world, and its transcription features and accuracy are pretty good. The only major downside is a confusing interface that stems from the fact that it's not meant exclusively for transcription like others reviewed here.

Positives:

  • Nice Online Editor: Like several others we looked at, Pop Up Archive features a slick online editor that allows you to easily verify and correct transcripts.

  • Good (but not great) Speaker Identification: Pop Up Archive takes a unique approach to speaker identification. It does its best to ID speakers and assigns them alphanumeric codes: M1, M2, M3, etc. for male speakers and F1, F2, etc. for female speakers. Once transcription is complete, you can go in and assign real names to each code, so every "M1" becomes "Mike Smith", for example. In my tests the system identified more speakers than there actually were in the file, but it's still a huge time saver, and you can quickly assign "Mike Smith" to M1, M2, and M3 if it mistakenly thought those were different people.

  • Nice Time Stamps: They're frequently inserted and you can easily export your transcript with or without them.

Negatives:

  • Confusing Upload Process: I could select multiple files to upload at once, but it didn’t work - you have to do them one at a time (though there is a separate bulk uploader). "Collections" are confusing. The upload form has a lot of fields you might not need. But these issues all boil down to the fact that Pop Up Archive is designed to be more than an online transcription service. It's designed to help public radio producers and stations organize searchable archives.

Accuracy: 

Very similar to other Google Speech API services (so quite good).

View Transcripts: birthday.wav / matty.mp3

Overall Grade: B-

Good accuracy and speaker recognition, but a confusing interface keeps Pop Up Archive from ranking higher.


The Student Project: Happy Scribe

Free Trial: None; pay as you go from the start
Price per hour: ~US$6 (€0.09/min)

Update - December 13, 2017: We took the time to re-review Happy Scribe after one of the founders informed us of changes that might improve our initially negative impressions. Turns out they were right - we’re happy to report many of our initial complaints about Happy Scribe have been addressed. See our new review below.

Positives:

  • Follows Along: The text is highlighted as the audio is playing and shows time stamps.

  • Intro Music: The intro music and lyrics in our audio sample were totally ignored and transcription continued as soon as audible speech was recognized.

  • Super Simple Interface: Drag a file into the upload window, enter your credit card number, and you're done.

  • Price: Inexpensive, one of the cheapest we looked at.

  • Continuous Improvement: Several issues we ran into in our initial review have since been addressed. Time stamps and a dashboard listing all your transcripts have been added, for example.

Negatives:

  • No Speaker Names

  • No Bulk Uploads: Each file has to be individually uploaded and paid for. Not good for large projects.

  • Transcript Downloads: Only available as a text file.

Accuracy: 

Pretty much on par with others in this class.

View Transcripts: birthday.wav / matty.mp3

Overall Grade: C+


You Get What You Pay For: VoiceBase

Free Trial: 50 hours!
Price per hour after the trial: US$1.20

VoiceBase is the cheapest option reviewed here; unfortunately, you get what you pay for.

Positives:

  • Amazing Free Account: 50 Hours Free!

  • Easy Signup and Upload Process

Negatives:

  • No Time Stamps

  • Speaker Names: You can manually add speaker names line-by-line in the editor, but the interface is cumbersome.

Accuracy: 

Not great, and poor formatting makes it harder to read. Line breaks seem to be added at random.

View Transcripts: birthday.wav / matty.mp3

Overall Grade: D-

The 50-hour free account is tempting, but the accuracy is pretty bad and important features are missing.


Just Avoid It: SpokenData

Free Trial: 1 hour
Price per hour after the trial: €6

Positives:

  • Speed: Pretty much as quick as the others.

  • Serviceable Editor: It follows along as you play the audio.

Negatives:

  • No Bulk Uploading: Just one file at a time.

  • Strangely, Too Many Time Stamps: SpokenData provides time stamps every few words, in an unconventional format, making the transcripts much more cumbersome to work with.

  • No Speaker Names

  • Confusing Interface

Accuracy:

Pretty bad - lots of fun errors (see above).

View Transcripts: birthday.wav / matty.mp3

Overall Grade: F

In places the text barely bore any relation to the audio.


The New Kid: Sonix

Free Trial: TBD [currently in Beta]
Price per hour after the trial: TBD [currently in beta]

UPDATE: Sonix is out of beta with a 1-hour free trial and pricing details on their website.

Sonix is currently in beta - it's still being tested and perfected, and pricing isn't available yet. That said, it shows a lot of promise. We'll do a full review when it's out of beta.

Positives:

  • Great Interface: Uploading was easy and lots of file types are accepted. 

  • Great Editor: Among the best I've tried. It smoothly shows where you are in the transcript and allows you to jump to specific points easily.

  • Good (but not great) Speaker Naming: Similar to Pop Up Archive, Sonix identifies speakers generically, then lets you give them specific names quickly and easily. Sonix didn't correctly identify that there were 3 different speakers in birthday.wav however.

  • Time Stamps: They were frequent and well formatted in the downloaded transcript.

Negatives:

  • No Bulk File Uploads: Audio files need to be uploaded one at a time, but that process is pretty easy and efficient.

  • Speaker Identification Isn't Perfect: As mentioned above, Sonix didn't correctly identify that there were 3 different speakers in birthday.wav; it also struggled to switch between speakers at the right times in matty.mp3.

Accuracy: 

Sonix's accuracy is among the best we reviewed, if not the best.

View Transcripts: birthday.wav / matty.mp3

Overall Grade: B

Combine great accuracy with a simple interface, a great editor, and useful speaker identification, and we're nearing the functionality of human transcription. That said, there's still no substitute for the real thing. Even Sonix, the best automated service we reviewed, falls short in terms of punctuation and accuracy.

Have you tried automated transcription?
What did you think?
Let us know in the comments.