Meta made DALL-E for video, and it’s both creepy and amazing

Meta unveiled a crazy artificial intelligence model that lets users turn typed descriptions into video. The system is called Make-A-Video, and it is the latest in a trend of AI-generated content on the web.

The system accepts short descriptions like “a robot surfing a wave in the ocean” or “clown fish swimming through the coral reef” and dynamically generates a short GIF of the description. There are even three different styles of videos to choose from: surreal, realistic, and stylized.

An artist’s brush painting on a canvas close up

According to a Facebook post by Meta CEO Mark Zuckerberg, translating written text into video is much harder than translating it into images because video requires movement:

“It’s much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they’ll change over time. Make-A-Video solves this by adding a layer of unsupervised learning that enables the system to understand motion in the physical world and apply it to traditional text-to-image generation.”

A young couple walking in a heavy rain

Meta’s AI Research team wrote a paper describing how the system works and how it differs from current text-to-image (T2I) methods. Unlike other machine learning models, Meta’s text-to-video (T2V) method doesn’t use pre-defined text-video pairs. For example, it doesn’t pair “man walking” with a video of an actual man walking.

If you think this sounds a lot like DALL-E, the popular T2I application, you wouldn’t be far off. Other T2I applications have rolled out since DALL-E gained popularity. TikTok released a filter in August called AI Greenscreen that generates painting-style images based on the words you type.

A fluffy baby sloth with an orange knitted hat trying to figure out a laptop close up highly detailed studio lighting screen reflecting in its eye

AI-generated content has become quite buzzworthy within the last few years. Deepfake technology, which uses machine learning to replace one person’s face with another’s, is even used by visual effects studios for big-budget shows like The Mandalorian.

In July, The Times mistakenly reported on a Ukrainian woman in the midst of the Russia-Ukraine war. The problem is she wasn’t real.

AI probably isn’t a real threat, but projects like DALL-E and Make-A-Video are fun explorations into some of the interesting possibilities.

David Matthews
Former Digital Trends Contributor
David is a freelance journalist based just outside of Washington D.C. specializing in consumer technology and gaming. He has…