watch openAI’s sora make lifelike videos just from text and descriptions

OpenAI debuts its text-to-video model, Sora

 

OpenAI has debuted its new video generation model, Sora, which can create realistic AI videos from text prompts alone. In a recent interview with Bill Gates, reinstated OpenAI CEO Sam Altman spoke about the future of ChatGPT, which he hoped could one day also generate videos from text. That hope has now materialized in the form of Sora: the text-to-video AI model can generate videos up to a minute long while, as the OpenAI team claims, ‘maintaining visual quality and adherence to the user’s prompt.’

images and videos courtesy of OpenAI

 

 

OpenAI has released a series of samples from its new text-to-video model Sora. The text prompts need to be detailed so that the generated video can capture the visuals the user wants. So far, the text-to-video Sora can understand long instructions such as ‘The camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.’ 

prompt: a movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.

 

 

OpenAI has also tried text prompts such as ‘A close-up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand’ and ‘A Chinese Lunar New Year celebration video with Chinese Dragon.’ Sora executed both prompts, producing seconds-long clips that sustain a lifelike quality. OpenAI says that Sora uses a transformer architecture similar to its GPT models, which helps scale the performance and quality of the videos.

 

prompt: a litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in

 

 

Aside from generating AI videos from text, OpenAI’s Sora can also animate an existing static image into a moving video. OpenAI also says that Sora can take an existing video and extend it or fill in missing frames, and that it can generate entire videos all at once or extend generated videos to make them longer. ‘Sora is a diffusion model, which generates a video by starting off with one that looks like static noise and gradually transforms it by removing the noise over many steps,’ says OpenAI.
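The iterative denoising that OpenAI describes can be illustrated with a toy sketch. This is purely illustrative Python, not Sora’s actual code: the `toy_denoise_step` function is a hypothetical stand-in for the trained neural network that a real diffusion model would use to predict and remove noise at each step.

```python
import numpy as np

def toy_denoise_step(x, step, total_steps, target):
    # Stand-in "denoiser": a real diffusion model uses a trained
    # neural network here. We simply nudge the noisy sample a
    # little closer to a known clean target at each step.
    blend = 1.0 / (total_steps - step)
    return x + blend * (target - x)

rng = np.random.default_rng(0)
target = rng.random((4, 8, 8, 3))      # toy "video": 4 frames of 8x8 RGB
x = rng.standard_normal(target.shape)  # start from pure static-like noise

total_steps = 50
for step in range(total_steps):
    x = toy_denoise_step(x, step, total_steps, target)

# After many small denoising steps, the noise has been transformed
# into the clean "video".
print(float(np.abs(x - target).max()))
```

The key idea mirrored here is only the loop structure: generation begins from noise and is refined over many small steps, rather than produced in a single pass.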

 

prompt: step-printing scene of a person running, cinematic film shot in 35mm.

 

 

What’s the catch with OpenAI’s Sora?

 

Despite the buzz, the text-to-video Sora still has holes to fill. OpenAI acknowledges its model’s weaknesses, noting that Sora can struggle to simulate the physics of a complex scene or to understand specific instances of cause and effect. ‘For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark,’ says OpenAI. Sora can also mix up left and right, as seen in an AI-generated video of a man running on a treadmill in the wrong direction.

 

prompt: the camera rotates around a large stack of vintage televisions all showing different programs — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery.

 

 

Other notable glitches that OpenAI’s Sora can produce so far include the appearance of objects not mentioned in the text prompts, such as animals or people spawning spontaneously. In one of the sample videos, a basketball sets the hoop’s net aflame and explodes; then, all of a sudden, a new basketball appears out of nowhere from the sky and passes through the hoop’s ring like a ghost. Camera movement can also be tricky, leaving the generated AI video shaky or unstable.

 

prompt: a Chinese Lunar New Year celebration video with Chinese Dragon

 

 

As of publishing this story, OpenAI has granted access to its text-to-video model Sora only to a select group of visual artists, designers, and filmmakers, ‘to gain feedback on how to advance the model to be most helpful for creative professionals.’ Even though they can’t use it yet, fans of the company are already lining up to try the AI model themselves, while others weigh in on the potential risks that this generative model might entail.

prompt: an extreme close-up of an gray-haired man with a beard in his 60s

Users weigh in on the new AI text-to-video model

 

Some users are excited to play with OpenAI’s Sora and turn their ideas into reality, such as remaking an episode of their favorite TV show to give it the ending they prefer. Others note that if the new text-to-video model keeps improving at the current pace of AI development, people may no longer turn to stock footage services or commercial production houses, since they will be able to make such videos themselves.

 

prompt: the camera directly faces colorful buildings in burano italy. an adorable dalmation looks through a window on a building on the ground floor.

 

 

Worries about the future of the creative industry also surface, from jobs being replaced by Sora or the text-to-video models that may follow it, to the prospect of entire movies generated by Sora. Other users simply find OpenAI’s new text-to-video model fascinating, commenting on the quality of the generated videos and the model’s potential to improve further.

 

prompt: a close up view of a glass sphere that has a zen garden within it. there is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.

 

 

On X, Sam Altman even tested out some of the text prompts suggested by users and uploaded the results for everyone to see. The series includes a grandmother waving to her audience as she prepares homemade gnocchi, two golden retrievers hosting their own podcast on a mountain, a half-duck, half-dragon flying through the sunset with a hamster dressed in adventure gear on its back, and animals locked in a zoo chomping on pieces of jewelry.

prompt: a stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage

project info:

 

name: Sora

company: OpenAI
