AI assistants will soon recognize and respond to the emotion in your voice

You know how people say that it’s not what you say, but how you say it, that matters? Very soon, that idea could become part of smart assistants such as Amazon’s Alexa or Apple’s Siri. At least, it could if these companies decide to use new technology developed by Affectiva, an artificial intelligence company that specializes in emotion tracking.

Affectiva’s work has previously focused on identifying emotion in images by observing the way that a person’s face changes when they express particular sentiments. Its latest technology builds on that premise with a cloud-based application programming interface (API) that can detect emotion in speech. Built with deep learning, the technology tracks changes in tone, volume, speed, and voice quality, and uses these cues to recognize emotions like anger, laughter, and arousal in recorded speech.

“The addition of Emotion AI for speech builds on Affectiva’s existing emotion recognition technology for facial expressions, making us the first AI company to allow for a person’s emotions to be measured across face and speech,” Rana el Kaliouby, co-founder and CEO of Affectiva, told Digital Trends. “This is all part of a larger vision that we have. People sense and express emotion in many different ways: Through facial expressions, voice, and gestures. We’ve set out to develop multi-modal Emotion AI that can detect emotion the way humans do from multiple communication channels. The launch of Emotion AI for speech takes us one step closer.”

Affectiva developed its speech emotion recognition system by collecting naturalistic speech data from a variety of sources, including commercially available databases. Human experts then labeled this data for the occurrence of what the company calls “emotion events.” These human-generated labels were used to train and validate the team’s deep learning models, so that over time the models learned how certain shifts in a person’s voice might indicate a particular emotion.

It’s smart stuff from a technology perspective but, like the best technology, it also has the potential to help users in practical ways. One application could be car navigation systems that hear a driver starting to experience road rage and react to prevent a rash driving decision. It could similarly allow automated assistants to change their approach when they hear anger or frustration from a user, or to learn which kinds of responses elicit the best reactions and repeat those strategies.

Luke Dormehl
Former Digital Trends Contributor