Why OpenAI’s new video generator, Sora, is making a splash

Subject: Science and tech

Section: Awareness in IT and Computers

What is Sora:

Sora is the Japanese word for 'sky', an image evoking ‘limitless creative potential’.
OpenAI (CEO: Sam Altman) has created Sora, an artificial intelligence (AI) video-generation model that can create near-flawless videos up to one minute long from text prompts.
Sora is trained on videos and images of various durations, resolutions, and aspect ratios to generate crisp, clear, cinematic, high-resolution and photorealistic output.
This new diffusion-based AI model is built on a transformer architecture, similar to the large language models behind ChatGPT.

Comparison with others:

Sora’s achievement dwarfs Meta’s Make-A-Video and Google’s Lumiere text-to-video models.
Unlike the output from Meta’s, Google’s or other earlier AI video tools, Sora’s output is a studio-grade final product.

What can it do?

It can create images and videos that depict a given subject with near accuracy.
It can construct a video from an image and fill gaps in existing video clips.

What is a diffusion model?

Diffusion models are named after the physical process of diffusion, in which molecules move from zones of high concentration to zones of low concentration.
Diffusion models are used to generate high-quality images and videos.
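In machine learning, these models generate new data by reversing a diffusion process. During training, noise is gradually added to data, and the model learns to undo this, restoring the visual data to its original state by filtering out the noise step by step. To generate something new, the model starts from pure noise and removes it step by step until an image or video emerges.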
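To make the "add noise, then remove it" idea concrete, here is a minimal Python sketch assuming the standard formulation used by denoising diffusion models: a linear noise schedule, and a noisy sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. It is not Sora's code. The function names and schedule values are illustrative assumptions, and the trained neural network is replaced by the true noise so that only the arithmetic of the two directions is shown.

```python
import math
import random
from typing import List, Tuple


def make_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                    beta_end: float = 0.02) -> List[float]:
    """Cumulative signal-retention factors for a linear noise schedule (assumed values).

    beta_t is the fraction of variance replaced by noise at step t;
    alpha_bar_t = product of (1 - beta_s) for s <= t, shrinking toward 0 as t grows.
    """
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0: List[float], alpha_bar: float,
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward (noising) process: blend clean data with Gaussian noise in one jump."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * n
          for x, n in zip(x0, noise)]
    return xt, noise


def remove_noise(xt: List[float], predicted_noise: List[float],
                 alpha_bar: float) -> List[float]:
    """Reverse (denoising) estimate of the clean data, given a noise prediction."""
    return [(x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
            for x, n in zip(xt, predicted_noise)]


# Usage: a toy four-pixel "image" pushed halfway into noise and pulled back.
rng = random.Random(0)
pixels = [0.2, 0.5, 0.9, 0.1]
alpha_bars = make_alpha_bars(1000)
noisy, true_noise = add_noise(pixels, alpha_bars[500], rng)
# A trained network would *predict* the noise; here we cheat and reuse the
# true noise, so the reverse step recovers the pixels up to rounding error.
recovered = remove_noise(noisy, true_noise, alpha_bars[500])
print(noisy)
print(recovered)
```

In a real model the quality of the result depends entirely on how well the network predicts the noise, which is why these models need very large training sets.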

Working principle of Sora:

According to OpenAI, Sora works by “turning videos into patches by first compressing videos into a lower-dimensional latent space and subsequently decomposing the representation into spacetime patches.”
When a user sends a prompt to Sora, it generates a video as a collection of these compressed patches of visual data, which are then stitched together and decoded back into full frames.
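As a rough illustration of "spacetime patches", the Python sketch below cuts a small video, stored as frames x height x width values, into non-overlapping blocks that each span a few frames and a small square of pixels, and flattens every block into one vector. It assumes the video has already been compressed into a latent grid (the compression network is not shown), and the function name and patch sizes are hypothetical; OpenAI has not published Sora's patching code. Each flattened vector plays roughly the role that a token plays for a language model.

```python
from typing import List

# A single-channel video (or latent grid): frames x height x width.
Video = List[List[List[float]]]


def to_spacetime_patches(video: Video, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a (T, H, W) video into non-overlapping pt x ph x pw blocks.

    Each block covers pt consecutive frames and a ph x pw pixel region,
    and is flattened into one vector (a hypothetical "visual token").
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, T, pt):          # step through time
        for y0 in range(0, H, ph):      # then down the frame
            for x0 in range(0, W, pw):  # then across the frame
                patch = [video[t][y][x]
                         for t in range(t0, t0 + pt)
                         for y in range(y0, y0 + ph)
                         for x in range(x0, x0 + pw)]
                patches.append(patch)
    return patches


# Usage: 4 frames of 8 x 8 values, cut into 2-frame, 4 x 4-pixel patches.
demo = [[[float(t * 100 + y * 10 + x) for x in range(8)]
         for y in range(8)] for t in range(4)]
patches = to_spacetime_patches(demo, pt=2, ph=4, pw=4)
print(len(patches), len(patches[0]))  # 8 32
```

Because patches of a fixed size can be cut from videos of any duration, resolution or aspect ratio, a model working on such patches can, in principle, train on mixed footage without first resizing it all to one format.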

Patches vs Tokens:

Patches are to Sora what tokens are to ChatGPT.
Tokens unify diverse modalities of text, such as code, mathematics and various natural languages.
Similarly, patches unify different kinds of videos and images by compressing them into a common representation, a form of tokenisation for visual data.

How good is Sora’s output?

It is a top-class AI-based video generator, and its photorealistic output can stun at first look.
But a closer look reveals that there is work to be done on object tracking.
OpenAI claims Sora can handle occlusion (a computer-vision term for one object being partly or fully hidden from view by another, for example when two or more objects pass close to each other).
However, the text-to-video model does suffer from this limitation to an extent. For instance, in one of the clips shared by the Microsoft-backed company, people in the background disappear when the focus moves past a couple walking.
The AI model needs to understand space and time better.

Present status:

Sora is not available to the public yet. OpenAI plans to start sharing the model with third-party testers and use their feedback to improve it.
Some experts are of the view that more systemic glitches will surface as more people gain access to the tool.

Can occlusion be remedied?

While AI researchers are working to solve the object-tracking problem, some AI experts predict it will be hard to fix. They note that the fault stems from how the system constructs its representation of reality.
Sora’s weird physics glitches are in some ways akin to LLM “hallucinations”: artefacts of lossy compression and decompression.

In this view, more data alone will not solve the problem, because generative AI models do not truly understand, or operate according to, the physical laws of nature.

What about the training data?

OpenAI’s achievement with Sora is monumental, and it will disrupt the video-creation and gaming industries.
The video generation tool is believed to have been trained on data from game engines, movies, documentaries, YouTube videos, and possibly videos scraped from every corner of the web.
This data may include copyrighted work, but unless OpenAI discloses its sources, it will be hard to know.
Whether using unauthorised material scraped from the web to train AI violates copyright law is yet to be decided by the courts.
Tech firms claim they are protected by copyright law’s fair use doctrine and argue that lawsuits against them will stifle a growing AI industry.

What about misinformation?

Photorealistic video generation capabilities are worrying considering the burgeoning misuse of generative AI tools to spread misinformation.
To counter this, OpenAI has taken the red-teaming route (having experts deliberately probe the model for harmful outputs) ahead of its public launch.
Sora already has a filter that blocks prompts containing violent, sexual or hateful language, as well as requests for images of well-known personalities.
A second filter will check frames of generated videos and block content that violates the company’s safety guidelines.
OpenAI has also said Sora uses a fake-image detector developed for