This isn’t another ChatGPT story. OpenAI, best known for ChatGPT, has a vast portfolio of tools, from image generation to voice transcription, which provide a compelling business case – provided businesses find the square peg for the square hole.
At Clockwork, we work with various requests that are time-sensitive and, more importantly, privacy-focused. One of our recent tasks was to transcribe a batch of nearly 23 hours of market research focus group recordings. It arrived on a Tuesday, and it needed to be completed by the end of that week. Anyone who’s transcribed recordings to text before knows it’s a game of play/pause, write, ‘what did they say?’, rewind, and repeat. All this would have meant pulling staff away from important tasks to do, let’s face it, a menial one.
Given the privacy focus of this project and the sensitive nature of the recordings, online services such as Otter.ai and Gong were non-starters. Balancing privacy with accessibility is set to be one of the coming dilemmas of tech as AI improves and external privacy policies become more complex (for companies that even follow them). One of the most elegant solutions is to prevent privacy blunders entirely by never giving external parties access to sensitive data in the first place.
Enter OpenAI Whisper, the latest automatic speech recognition system from ChatGPT’s creators. The model can be downloaded and run locally, so source videos never have to touch someone else’s servers.
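As a rough illustration of what running the model locally can look like, here is a minimal sketch using the open-source openai-whisper Python package. It is not Clockwork's actual program: the folder name, the list of file extensions, the choice of the "base" model size and both function names are assumptions made for the example, and the package (plus ffmpeg) would need to be installed first.

```python
from pathlib import Path

# Assumed set of audio/video extensions for this example.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav"}


def find_recordings(folder):
    """Return the media files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    )


def transcribe_folder(folder, model_name="base"):
    """Transcribe every recording in `folder` on this machine.

    Writes one .txt file next to each recording and returns the paths written.
    """
    # Imported here so find_recordings() works without the package installed.
    import whisper  # pip install openai-whisper (also needs ffmpeg)

    # Model weights are downloaded on first use, then loaded from a local cache.
    model = whisper.load_model(model_name)

    written = []
    for recording in find_recordings(folder):
        # Inference runs locally; the recording itself is not uploaded anywhere.
        result = model.transcribe(str(recording))
        out_path = recording.with_suffix(".txt")
        out_path.write_text(result["text"].strip(), encoding="utf-8")
        written.append(out_path)
    return written


# Example call (hypothetical folder name):
# transcribe_folder("focus_groups")
```

Larger model sizes generally trade speed for accuracy, so the size is worth tuning to the hardware available.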
One of Whisper’s functions is transcription. A model is only as strong as how it’s used, so Clockwork data analysts used their Python knowledge and a modern computer to build a transcription program tailored to our needs. This program serves as the ‘transcription expert’ artificial resource, which transcribes quickly and with high accuracy. On an M1 Mac, an hour of fast-paced group conversation took about 15 minutes to transcribe. The industry standard for transcribing one hour of audio by hand is four hours, which is 16 times slower than the computer.
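To put those rates in the context of the full batch, the arithmetic can be sketched as follows. The only inputs are the figures quoted above; treating the 15-minute rate as constant across all 23 hours is an assumption, since real recordings will vary.

```python
# Figures taken from the article; the constant names are illustrative.
BATCH_HOURS = 23                 # "nearly 23 hours" of recordings
MACHINE_MIN_PER_AUDIO_HOUR = 15  # observed on an M1 Mac
HUMAN_HOURS_PER_AUDIO_HOUR = 4   # quoted industry standard

# Assumes the per-hour rates hold for the whole batch.
machine_hours = BATCH_HOURS * MACHINE_MIN_PER_AUDIO_HOUR / 60
human_hours = BATCH_HOURS * HUMAN_HOURS_PER_AUDIO_HOUR
speedup = human_hours / machine_hours

print(f"Machine: {machine_hours:.2f} h, human: {human_hours} h, speed-up: {speedup:.0f}x")
# prints "Machine: 5.75 h, human: 92 h, speed-up: 16x"
```

On these numbers the batch moves from roughly 92 staff hours to under six hours of unattended machine time.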
Being an international company, we faced another challenge: not everyone speaks English. Another of Whisper’s capabilities is AI-based translation, which enables businesses to truly understand clients without language barriers.
This proved extremely helpful with one transcription batch, which was in Brazilian Portuguese. Any Portuguese speaker will tell you that many automated online translation services translate word for word and miss the meaning behind what’s being said. In our experience (think Google Translate, at least at the time of writing), this kind of literal translation handles common phrases fine, but more nuanced sentences terribly.
Here’s where Whisper excels. The AI-based model works from the meaning of what is said, taking intonation, emphasis and context into account, which gives a far more accurate translation. When a first-language Portuguese speaker proofread the transcription and translation, they confirmed that the tool works, and that it works exceptionally well.
In the age of meaningful work, businesses need to balance grunt work with employee happiness. Any copywriter will tell you that they want to create great copy, not write down what they’ve heard in a recording. This is where Clockwork has taken the initiative: our people build specific tools, or ‘artificial resources’, on top of Whisper’s AI models to keep creativity high and ‘creativity-killers’ low.
The beauty behind creating tools through code is having creative carte blanche on how they work. In its initial version, our tool returned exactly what was said as one long paragraph. Useful, but not quite there yet. Using voice analysis, the second version separated the speakers into groups showing who said what, which added even more value to the process. To make the tool more accessible, the third version added time stamps so that it could create subtitle files tied to when each person spoke. In the final version, transcription files were produced in five popular languages: English, Spanish, German, Italian, and Portuguese. After roughly two hours of programming, we had an ‘artificial resource’ that also serves those who are hard of hearing and those who don’t speak English.
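The time-stamp step can be sketched as below: a minimal example that turns timed segments into SubRip (.srt) subtitle text. It assumes segments shaped like the ones Whisper returns (dictionaries with "start" and "end" in seconds and a "text" string). The optional "speaker" key is hypothetical, standing in for whatever the speaker-separation step produces, and the function names and sample lines are invented for illustration rather than taken from Clockwork's tool.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments with 'start', 'end', 'text'.

    A 'speaker' key (hypothetical, not part of Whisper's output) is
    prefixed to the subtitle line when present.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        if seg.get("speaker"):
            text = f"{seg['speaker']}: {text}"
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{text}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


# Invented sample data for illustration only.
example = [
    {"start": 0.0, "end": 2.5, "text": " Welcome, everyone.", "speaker": "Moderator"},
    {"start": 2.5, "end": 5.0, "text": " Thanks for having us."},
]
print(segments_to_srt(example))
```

Keeping the formatter separate from the transcription step means the same segments can be written out once per target language without re-running the model.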
The possibilities around using bespoke AI in business are growing rapidly. Privacy-focused, onsite transcription and translation services are just the start of making business more enjoyable, and Clockwork is always looking for more ways to work smarter.