
Voice actors seeing an increasing threat from AI

Voice actors are the latest creatives to feel the heat from artificial intelligence (AI).

According to a Motherboard report this week, those providing their voices for content such as ads, gaming titles, and animations have been noticing how clients are increasingly asking them to sign contracts that hand over the rights to their voice so that AI can be used to create a synthetic version.


This, of course, leaves them in a very awkward position, for if they refuse to agree to such a clause, they could well lose the work. Acceptance of the clause, however, would likely result in an AI version of their voice handling future projects.

One voice actor expressed concerns to Motherboard that a client would be able to use the technology to “squeeze more performances out of me” without offering any extra compensation.

Another noted how, at the current time, if they’re in a recording booth and have an issue with a particular line, they can inform the director and find a solution there and then. But the AI technology means that sweeping edits, including the insertion of entirely new sentences, could take place later without the voice actor ever being told.

Contracts that give clients the right to synthesize an actor’s voice are now “very prevalent,” Tim Friedlander, president and founder of the National Association of Voice Actors (NAVA), told Motherboard.

Friedlander said the language in the contracts can be “confusing and ambiguous,” meaning the actor might sign away their rights without even realizing it.

Worryingly, clients are informing some actors that they won’t be considered for a job if they refuse to accept the clause.

The situation is deemed so serious for the industry that NAVA has issued advice for voice actors, telling them never to grant synthesis rights to a client and to contact their union or an attorney if they suspect the contract is trying to take their rights.

“Long story short, any contract that allows a producer to use your voice forever in all known media (and any new media developed in the future) across the universe is one we want to avoid,” NAVA says on its website.

With AI now gaining greater prominence and the technology improving all the time, it’s hard to see how many industries, voice acting among them, will escape its effects.

One proposed solution is to build into contracts a licensing system under which an actor is paid each time a synthesized version of their voice is used. But rates for such usage would almost certainly be low, making the arrangement an unlikely sell to those currently able to make a living from voice work.
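Mechanically, a per-use licensing scheme like the one described is simple to model: meter each render of the synthetic voice and multiply by an agreed rate. The sketch below is purely illustrative; the actor name and rate are invented, not drawn from any real contract:

```python
# Hypothetical per-use royalty meter for a synthesized voice.
# The rate and usage figures are invented for illustration only.
from dataclasses import dataclass

@dataclass
class VoiceLicense:
    actor: str
    rate_per_use: float  # payment owed each time the synthetic voice is rendered

    def royalties_due(self, uses: int) -> float:
        """Total owed for a billing period with the given number of uses."""
        return round(uses * self.rate_per_use, 2)

lic = VoiceLicense(actor="Jane Doe", rate_per_use=0.50)
print(lic.royalties_due(1200))  # prints 600.0
```

Even at volume, totals like this illustrate the article's point: per-use rates would need to be far higher than clients are likely to accept before they replace session income.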

Trevor Mogg
Contributing Editor
ChatGPT’s Advanced Voice Mode now has a ‘better personality’

If you find that ChatGPT’s Advanced Voice Mode is a little too keen to jump in when you’re engaged in a conversation, then you’ll be pleased to know that the latest update should bring an end to such unwanted interruptions.

OpenAI post-training researcher Manuka Stratta said in a video posted on Monday that the update gives Advanced Voice Mode a “better personality,” adding that the AI-powered tool will now “interrupt you much less ... [and] because it interrupts you less, you'll be able to have more time to gather your thoughts and not feel like you have to fill in all the gaps and silences all the time.”

Samsung might put AI smart glasses on the shelves this year

Samsung’s Project Moohan XR headset has grabbed the spotlight in recent months, and rightfully so. It serves as the flagship launch vehicle for a reinvigorated Android XR platform, with plenty of hype from Google’s own quarters.
But it seems Samsung has even more ambitious plans in place and is reportedly experimenting with form factors that go beyond the headset. According to Korea-based ET News, the company is working on a pair of smart glasses and aims to launch them by the end of this year.
Currently in development under the codename “HAEAN” (a machine-translated name), the smart glasses are reportedly in the final stages of development, with the internal hardware and feature set close to being locked down. The wearable will reportedly come equipped with camera sensors as well.

What to expect from Samsung’s smart glasses?
The latest leak doesn’t dig into specifics about the internal hardware, but another report from Samsung’s home market sheds some light on the possibilities. As per Maeil Business Newspaper, the Samsung smart glasses will feature a 12-megapixel camera based on Sony’s IMX681 CMOS image sensor.
The glasses are said to use a dual-chip architecture, similar to Apple’s Vision Pro headset. The main processor is touted to be Qualcomm’s Snapdragon AR1 platform, while a secondary processing hub is a chip supplied by NXP.
The onboard camera will open the door to vision-based capabilities such as scanning QR codes, gesture recognition, and facial identification. The smart glasses will reportedly tip the scales at 150 grams, while the battery size is claimed to be 155 mAh.

I tested the future of AI image generation. It’s astoundingly fast.

One of the core problems with AI is its notoriously high power and computing demand, especially for tasks such as media generation. On mobile phones, only a handful of pricey devices with powerful silicon can run such features natively. Even when implemented at scale in the cloud, it’s a pricey affair.
Nvidia may have quietly addressed that challenge in partnership with researchers at the Massachusetts Institute of Technology and Tsinghua University. The team created a hybrid AI image generation tool called HART (hybrid autoregressive transformer) that essentially combines two of the most widely used AI image creation techniques. The result is a blazing-fast tool with dramatically lower compute requirements.
To give you an idea of just how fast it is, I asked it to create an image of a parrot playing a bass guitar. It returned the following picture in about a second; I could barely follow the progress bar. When I gave the same prompt to Google’s Imagen 3 model in Gemini, it took roughly 9-10 seconds on a 200 Mbps internet connection.

A massive breakthrough
When AI images first started making waves, the diffusion technique was behind it all, powering products such as OpenAI’s Dall-E image generator, Google’s Imagen, and Stable Diffusion. This method can produce images with an extremely high level of detail. However, it is a multi-step approach to creating AI images, and as a result, it is slow and computationally expensive.
The second approach, which has recently gained popularity, is autoregressive models, which essentially work in the same fashion as chatbots, generating images using a pixel prediction technique. It is faster, but also more error-prone.
The team at MIT fused both methods into a single package called HART. It relies on an autoregressive model to predict compressed image assets as discrete tokens, while a small diffusion model handles the rest, compensating for the quality loss. The overall approach reduces the number of generation steps from over two dozen to eight.
The experts behind HART claim that it can “generate images that match or exceed the quality of state-of-the-art diffusion models, but do so about nine times faster.” HART pairs a roughly 700-million-parameter autoregressive model with a small 37-million-parameter diffusion model.
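The two-stage idea described above can be sketched in miniature. This is a toy stand-in, not HART's actual models or APIs: an "autoregressive" stage emits a coarse grid of discrete tokens, then a short "diffusion" loop of eight steps refines the decoded values, mirroring the step-count reduction the researchers describe:

```python
# Toy sketch of HART's two-stage structure (illustrative only; the real
# models, token vocabularies, and step counts come from trained networks).
import random

def autoregressive_stage(prompt: str, grid: int = 8) -> list[list[int]]:
    """Stand-in for the AR model: emit a coarse grid of discrete tokens."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    return [[random.randint(0, 1023) for _ in range(grid)] for _ in range(grid)]

def diffusion_refine(tokens: list[list[int]], steps: int = 8) -> list[list[float]]:
    """Stand-in for the small diffusion model: a few refinement passes
    over the decoded tokens (HART reportedly needs ~8 steps vs. 25+
    for a pure diffusion model)."""
    image = [[t / 1023.0 for t in row] for row in tokens]
    for _ in range(steps):
        # each step nudges values slightly, a toy analogue of denoising
        image = [[min(1.0, v * 0.9 + 0.05) for v in row] for row in image]
    return image

tokens = autoregressive_stage("a parrot playing a bass guitar")
image = diffusion_refine(tokens, steps=8)
print(len(image), len(image[0]))  # prints: 8 8
```

The design point the sketch captures is the division of labor: the expensive, sequential token prediction runs once, and the diffusion stage only has to clean up residual detail, which is why it can stay small and fast.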
