I’ve worked with a range of A/V files, dating back as far as the mid-1980s. The early ones have to be converted to digital format, while more recent files are born digital. A level of sound quality that required professional-grade equipment a few years ago can now be achieved with a smartphone or a cheap recorder and clip-on mic, making it available to virtually everyone. Technology isn’t the only component of good audio, though; often the difference between gibberish and crisp clarity is your recording environment. Of course you always want good sound, but you really need it if you plan to transcribe, and here’s why: transcribing is extremely time-consuming. I’ve estimated that it takes my interns approximately five to six times the length of an interview to create a rough transcription, and that doesn’t include quality checking. Various types of transcription software can cut that time significantly, but only if your audio is comprehensible to them. So how do you simplify the process of turning audio into searchable text?
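To make the five-to-six-times estimate above concrete, here is a minimal Python sketch that turns an interview length into a rough range of transcription hours. The function name and the 90-minute interview are made up for illustration; only the 5x and 6x factors come from my own estimate, and quality checking is not included.

```python
def estimate_transcription_hours(interview_minutes, low_factor=5.0, high_factor=6.0):
    """Return a (low, high) range of hours for a rough transcription.

    The default factors reflect the informal 5x to 6x estimate in this post;
    adjust them to match your own transcribers and audio quality.
    """
    low = interview_minutes * low_factor / 60
    high = interview_minutes * high_factor / 60
    return low, high


# Hypothetical example: a single 90-minute interview.
low, high = estimate_transcription_hours(90)
print(f"Rough transcription: about {low:.1f} to {high:.1f} hours")
```

Multiply that by the number of interviews in a project and the time commitment adds up quickly.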
Recording the clearest sound possible is the place to start. Always do an audio test before you start the interview, making sure that all speakers can be heard clearly. Be aware of background noise; try to avoid recording outside, since there are sounds you can’t control, like traffic, birds, and wind. Also be aware of the ambient sound in your interior environment: things like heating and cooling units, pets, or people moving around. Sound meter apps, which are available for any modern smartphone, are a great way to check decibel levels. For more information about that, check out Doug Boyd’s Digital Audio Recording Levels and Oral History.
I know next to nothing about recording equipment, because my work takes place once files are transferred to a computer. However, I’ve done some informal tests and been able to achieve decent-quality audio using my less-than-stellar Windows phone (don’t judge me), the Mini Recorder Free app, and a cheap clip-on mic (3 for $6.50 on Amazon). I recorded an interview with my Grandma in her sunroom with that setup, and the results weren’t bad. Below is a clip from the interview, transcribed using Pop Up Archive’s basic automated transcription service. The automated transcription follows each timestamp, and the manual transcription appears in brackets:
0:00 starting at the first things you remember that sounds like I remember when
[Yeah, starting with the first things you remember, that sounds like a good – well, I remember when]
0:05 my brother Jack was born he was born at home
[My brother Jack was born. He was born at home]
0:10 he was born right around Easter
[and he was born right around Easter]
0:15 I’m not sure if it was born on Easter he may have been
[I’m not sure if he was born on Easter, he may have been]
By contrast, here is a sample of an interview done for one of Kenyon’s oral history projects, again using Pop Up Archive basic. This recording was done outside, the interviewee had a strong regional accent, and he used the filler word “like” throughout his speech:
0:50 government recommended
[When I grew up, my parents recommended]
0:55 South Chicago Learning Center services like no more than
[their child to come here for Penn Center purposes like, to know more of]
1:00 go to know everything so they’re like back in the day
[the culture and to know like, how everything was there like, back in the day]
1:05 who is the cast in 82 on the foundation of Tennis Center
[And there’s a camp and they teach you all the foundation of Penn Center]
[And there’s a camp and they teach you all the foundation of Penn Center]
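One rough way to put a number on the difference between the two samples above is word error rate (WER): the number of word-level insertions, deletions, and substitutions needed to turn the automated text into the manual text, divided by the number of words in the manual text. This is not something Pop Up Archive reports, and it is not how I evaluated these clips; it is just a minimal Python sketch, with function names of my own choosing, that ignores case and punctuation.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    text = text.lower().replace("’", "'")  # treat curly and straight apostrophes alike
    return re.findall(r"[a-z0-9']+", text)


def word_error_rate(manual, automated):
    """Word-level edit distance divided by the length of the manual transcript."""
    ref, hyp = words(manual), words(automated)
    if not ref:
        raise ValueError("manual transcript contains no words")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # word in manual text missing from automated text
            insertion = curr[j - 1] + 1  # extra word in automated text
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One line from each sample above: (manual, automated).
sunroom = ("My brother Jack was born. He was born at home",
           "my brother Jack was born he was born at home")
outdoors = ("When I grew up, my parents recommended",
            "government recommended")

for label, (manual, automated) in [("sunroom", sunroom), ("outdoors", outdoors)]:
    print(f"{label}: WER = {word_error_rate(manual, automated):.2f}")
```

A score of 0 means the automated line matches the manual one word for word; the closer the score gets to 1 (or beyond), the more of the line a person has to retype.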
The point is, good audio doesn’t need to be expensive or difficult, but it’s important. If you have a lot of ambient noise, skip the audio test, or use dated equipment, you will run into problems. While you can use audio editing software like Audacity to “clean up” your sound, this too can be time-consuming, and depending on your level of skill and the specific issues with the audio, it may or may not help.
The most difficult challenge for speech recognition software, however, is language itself. The software must be able to separate words from other noises, of course, which is why it’s so important to record audio in which voices are clearly distinguishable from other sounds. That’s tricky enough, but then let’s throw in the incredibly complex and fluid nature of language. It’s impossible to get an accurate count of the number of words in English, because there’s no consensus about what counts as a word. What about acronyms? Slang? Words from other languages that are in common usage? To further complicate the matter, there are differences in inflection and rhythms of speech; regional and social dialects can make it difficult for two individuals to understand each other, even when they are technically speaking the same language. All of these elements can decrease the accuracy of speech recognition software significantly, so before investing in a particular product, consider whether it’s appropriate for your project.
It’s also important to consider the role of transcription as it relates to the overarching goals of your project. Do you want the students to participate in transcription for pedagogical purposes? If not, will you do it yourself? Is it worth hiring someone? Do you need complete transcripts, or just an easy way to navigate your interviews? 3Play Media offers video captioning, transcription, and subtitling services, but the cost is substantial, starting at $3 per minute on a pay-as-you-go plan as of this writing. Pop Up Archive offers an hour of free basic transcription processing, and if you have an OHLA-funded project, you have unlimited access to this basic processing. Dragon is a program that is installed from disk or download on an individual workstation, and it costs a flat fee, which varies based on version and license type.
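For a sense of scale on per-minute pricing, here is a small Python sketch that totals the cost of a paid service for a set of interviews. The $3 rate is the starting price mentioned above, so the result is a floor rather than a quote; the function name and the interview lengths are hypothetical.

```python
def service_cost(interview_minutes, rate_per_minute=3.00):
    """Total cost in dollars for a list of interview lengths given in minutes."""
    return sum(interview_minutes) * rate_per_minute


# Hypothetical project: four interviews of varying length.
project = [45, 60, 90, 75]
total = service_cost(project)
print(f"{sum(project)} minutes of audio at $3.00 per minute: ${total:,.2f}")
```

Running the same numbers with your own interview lengths makes it easier to weigh a paid service against student or intern time.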
Another approach is to use the Oral History Metadata Synchronizer (OHMS), a free, open source program developed by the University of Kentucky’s Louie B. Nunn Center for Oral History. The main advantages OHMS offers the transcribing process are that you can “index” transcripts (i.e., create time-stamped titles that can take viewers right to the part of the interview they want to hear), and you can transcribe in the same window the A/V file is streaming in. The downside to OHMS, as is often the case with free, open source software, is that it requires some IT skills and support to fully utilize. I’ll discuss the setup and uses of OHMS in more detail in a later post.
Regardless of your choice of recording and transcribing tools, the foundation of successful transcription is always the initial recording quality. Once you’ve got the hang of that, make sure you’re taking into account the substantial time and/or financial investment full transcription takes. In my experience, it’s easy to underestimate. That makes it the most common place oral histories get bogged down, and the least fun place to be stuck.