Just after the UK launch of the Amazon Echo in the autumn of 2016, I wrote a blog post titled "Why it’s good to talk, trust, think and feel", in which I explored the origins of human speech and the potential for synthetic voices, and linked to WaveNet, the work of DeepMind. DeepMind has been part of Google since 2014 and is undoubtedly behind many of the impressive aspects of Duplex.
It’s funny: as an audio creative I’ve always been drawn to natural, emotive vocal delivery, trying to distill and replicate its impact in my own presentations, and yet when it comes to the production of ads we often remove the imperfections, the umms, ahhs and breaths, unless it’s dialogue of course. But why shouldn’t they remain in some single-voice announcement scenarios? If they need to be added to enable trust in the delivery of a synthetic voice, then perhaps we should be more forgiving of them in other circumstances.
The other noteworthy recent development in this area was the synthetic recreation of JFK’s voice to deliver the speech he never gave in Dallas in 1963, on the day he was assassinated. This was the work of the Edinburgh-based text-to-speech specialist CereProc.
There are some really interesting applications for this technology at A Million Ads, starting with simply testing how dynamic scripts might sound within our Studio in pre-production, right through to voicing huge lists of store locations, retargeted product catalogues or all known first names, and even entire campaigns. The key creative aspects of believable synthetic voices are the same ones we deal with when ensuring that dynamic campaigns using human voices sound indistinguishable from non-dynamic, broadcast-style ads, particularly making sure that dynamic edit points are compatible with the way we naturally merge sounds when we speak. Longer term, however, the idea of being able to synthetically sample and recreate people’s voices could have a profound effect on voice talent. I used Stuart, the CereProc synthetic voice that we considered the most believable, for this Nissan Leaf pitch demo, which highlights that lack of emotional engagement.