Testing speech transcription services and tools

Back in January 2019 I tested a bunch of speech transcription services in the hope of improving some of my life-logging activities. I ended up settling on Otter.ai, but ultimately I don’t use it much, since I prefer hand-written text for my current purposes. Still, I thought my testing notes might be useful to someone, even after this much time has passed (no doubt some of these services have improved, or other info like pricing has changed).

Maybe I’ll go through all this again some day and update. For now, here are my original test notes from 1/15/2019. Just keep in mind they are brief, non-comprehensive tests for a specific personal use case. Your results may vary.

A Note on Google

I presume all of Google’s speech-to-text tech derives from the same core. If that’s not the case it could change things, but proceeding on that assumption for now.

That presumably means that, in testing the GDocs voice typing, I have already tested their tech. It’s possible, however, that the “realtime” nature of voice typing means it’s using a faster but degraded, less accurate engine. In that case it may be worth testing this:

Also of note, several of the services below use Google’s speech API but can produce different results, apparently because they apply different post-processing to the text. Interesting. According to a review, these ones do: Go Transcribe, Trint, Pop Up Archive, and Sonix.

Sonix

  • According to one review, potentially more accurate than some others
  • $10/mo and then $6/hr for transcription on top of that
  • Nice UI, color grading for confidence
  • Works quickly to transcribe

Test Notes

Actual transcription quality is better than many, but still not fantastic. It does ignore filler words well, but has some surprising errors. Still, not bad, might be usable. Price is, hmm, OK-ish.

Otter AI

This is a very interesting app and service that kind of does the note-taking thing I want an app to do (converting audio notes to text). It’s cheap: 600 min/mo free, or 6,000 min/mo for $10/mo. It scored well on a blogger’s independent analysis.

Test Notes

Hmm. Promising but problematic. Text accuracy was pretty good, getting words like “cappuccino”, although missing “meringue”. Might be fixable with better enunciation. More problematically, it breaks things up into paragraphs way too easily, so you get a lot of breaks, which I don’t want.

Update: Otter has emerged as a pretty clear winner. Its accuracy is high, its transcription tools are good, its mobile and desktop versions are both good, and it offers the most free transcription time, which I have yet to go over in any given month. The only real issues with it are issues common to all of these services: they don’t put out log-ready text and need varying degrees of massaging to be useful. This will hopefully improve over time and/or I will find more effective ways to make use of them.
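To make the “massaging” idea concrete, here is a minimal sketch of the kind of cleanup that could be applied to an exported transcript: merging paragraphs that were split too eagerly and dropping “um”/“uh” fillers. The function name `tidy_transcript`, the `min_words` threshold, and the filler list are all hypothetical choices for illustration, not part of Otter or any other service mentioned here.

```python
import re

# Hypothetical filler list; extend it to suit your own speech habits.
FILLERS = re.compile(r"\b(?:um+|uh+)\b[,.]?\s*", flags=re.IGNORECASE)


def tidy_transcript(text: str, min_words: int = 40) -> str:
    """Drop filler words and merge paragraphs that follow a short one.

    A paragraph is appended to the previous one whenever the previous
    paragraph still has fewer than `min_words` words (an assumed threshold).
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    merged = []
    for p in paragraphs:
        p = FILLERS.sub("", p)
        p = " ".join(p.split())  # collapse leftover whitespace
        if not p:
            continue  # paragraph contained nothing but fillers
        if merged and len(merged[-1].split()) < min_words:
            merged[-1] = merged[-1] + " " + p
        else:
            merged.append(p)
    return "\n\n".join(merged)


sample = "I had a cappuccino.\n\nUm, then a meringue.\n\nIt was good."
print(tidy_transcript(sample))
```

Capitalization after a removed filler is left alone here, so some hand editing would still be needed.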

Speak AI

An Otter-like startup (it seems) doing speech-to-text as well as regular text input and automated video transcription, trying to link them all together. And then do sentiment and other textual analysis. One of the co-founders has an interesting journey and is into quantified self, and he thinks the tool can be useful for that, so something to watch perhaps:

Remains to be seen how accurate its speech recognition is vs. e.g. Otter.ai, etc. (untested)

Spext

  • Scored well on an independent test for accuracy
  • Includes feature to highlight “um” and other filler words, nice!
  • $10-20/hr, depending

Test Notes

Arguably the best transcription I got. It was broken up into larger paragraphs, left out “um” words, etc., and was accurate: it even got complex words like meringue! Impressive.

Descript

  • Desktop app, pretty much the only service that does that
  • Transcription is not the only use-case, it has an interesting “edit audio by editing text” thing, but I don’t need that
  • 15c/min pay as you go; $20/mo = 4hrs + $4 each additional hour
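For a rough sense of when the Descript subscription beats pay-as-you-go, the two price points above can be compared directly. This is only a sketch using the January 2019 prices from my notes; it assumes partial hours are billed proportionally, which I did not verify, and the function names are made up for illustration.

```python
def descript_payg(hours: float) -> float:
    """Pay-as-you-go cost in dollars at 15 cents per minute."""
    return 15 * 60 * hours / 100


def descript_monthly(hours: float) -> float:
    """Subscription cost: $20/mo includes 4 hours, then $4 per extra hour.

    Assumes extra time is billed proportionally (an unverified assumption).
    """
    extra_hours = max(0.0, hours - 4)
    return 20.0 + 4.0 * extra_hours


for h in (1, 2, 3, 4, 6):
    print(f"{h} hr: pay-as-you-go ${descript_payg(h):.2f}, "
          f"monthly ${descript_monthly(h):.2f}")
```

At $9/hr pay-as-you-go, the $20 plan starts to win somewhere past two hours of audio a month.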

Test Notes

Quick transcription, faster than realtime, like most of these. Near the top in transcription quality, but handles punctuation and paragraphs a bit oddly, which could be problematic. Can only edit in the desktop app too, hmm. Editing is decent, but it doesn’t highlight low confidence words either. So a bit of a mixed bag really.

Simon Says

  • $15/hr without a monthly plan, or $12.50/mo plus $7.50/hr
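The break-even point between the two Simon Says options falls out of simple arithmetic. A small sketch, using only the prices listed above and assuming time is billed proportionally (the variable names are just for illustration):

```python
HOURLY_NO_PLAN = 15.00    # $/hr without a monthly plan
MONTHLY_FEE = 12.50       # $/mo for the plan
HOURLY_WITH_PLAN = 7.50   # $/hr on the plan

# Solve 15.00 * h = 12.50 + 7.50 * h for h.
break_even_hours = MONTHLY_FEE / (HOURLY_NO_PLAN - HOURLY_WITH_PLAN)

print(f"Break-even at about {break_even_hours * 60:.0f} minutes of audio per month")
```

Below that amount of audio per month the no-plan rate is cheaper; above it the monthly plan is.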

Test Notes

Transcription started out good but got crappy in the 2nd half and missed “cappuccino”. Some OK stuff later. Mixed bag.

Trint

  • Surprisingly nice tools
  • I like the “mark reviewed” thing, and it’s the only one I’ve seen with a custom vocab editor (this later changed, and some services offer it with paid plans only)

Test Notes

Unfortunately one of the worst, weirdest transcriptions. Oh well, hah.

Speech Pal

  • An entirely automated service
  • 15c/minute without pre-payment; 11c/minute for 1 month prepay
  • 120m free to test
  • Transcription of 8 min ready in 10 min or less.

Test Notes

Not the greatest transcription or tools.

Cielo24

  • A web-based voice transcription service that is partnered with VoiceBase below
  • Automated as well as human transcription
  • Similar ~10 min or less to transcribe 8 min of audio.

Test Notes

OK, so really bad transcription. This one’s out.

Temi

  • Slightly better UI than some others

Test Notes

Transcription was OK, not amazing. Fair number of errors.

Happy Scribe

  • Another Google-based option, I believe

Untested

VoiceBase

  • The enterprise version of the Cielo24 app above. Need to be a dev, probably.
  • 50hrs free though
  • According to at least 1 review, they have a high(er) error rate.