Stability AI gets into the video-generating game

AI startups other than OpenAI are making progress this week, even as the turmoil at OpenAI dominates the headlines.

One such startup is Stability AI, which has introduced Stable Video Diffusion, an AI model that creates videos by animating existing images. The model, an extension of Stability’s Stable Diffusion text-to-image model, is one of the few video-generating models available as open source, or commercially at all.

However, Stable Video Diffusion is currently in a “research preview” stage. Users must agree to terms of use that outline its intended applications (e.g., educational or creative tools, design, and other artistic processes) and the uses it is not intended for (e.g., factual or true representations of people or events).

Given how earlier AI research previews have played out, including Stability’s own, there is concern that the model will begin circulating on the dark web and be misused. The absence of a built-in content filter adds to the worry about abuse, much as Stable Diffusion has been misused to create nonconsensual deepfake content.

Stable Video Diffusion comprises two models, SVD and SVD-XT. SVD transforms still images into 576×1024 videos with 14 frames, while SVD-XT increases the frame count to 24. Both models can generate videos at frame rates between three and 30 frames per second.
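To put those figures in perspective, a clip's length is simply its frame count divided by its frame rate. The short Python sketch below works through that arithmetic using only the numbers quoted above. The function and variable names are illustrative assumptions, not part of any Stability AI software.

```python
# Illustrative arithmetic only: these names are hypothetical and do not
# come from Stability AI's code. Figures are the ones quoted in the article.

FRAME_COUNTS = {"SVD": 14, "SVD-XT": 24}  # frames per generated clip
FPS_RANGE = (3, 30)                       # slowest and fastest frame rates


def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the playback length of a clip: frames divided by frame rate."""
    if num_frames <= 0 or fps <= 0:
        raise ValueError("num_frames and fps must be positive")
    return num_frames / fps


def duration_range(num_frames: int, fps_range=FPS_RANGE):
    """Return (shortest, longest) clip length over the given frame-rate range."""
    slowest_fps, fastest_fps = fps_range
    shortest = clip_duration_seconds(num_frames, fastest_fps)
    longest = clip_duration_seconds(num_frames, slowest_fps)
    return shortest, longest


if __name__ == "__main__":
    for model, frames in FRAME_COUNTS.items():
        shortest, longest = duration_range(frames)
        print(f"{model}: {frames} frames, {shortest:.2f}s to {longest:.2f}s")
```

On these numbers, a 14-frame clip lasts between roughly half a second and a little under five seconds, and a 24-frame clip between 0.8 and 8 seconds, depending on the frame rate chosen.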

The models were first trained on a dataset of millions of videos, then fine-tuned on a smaller set of hundreds of thousands to around a million clips. Where those videos came from is unclear, which raises questions about potential copyright issues and ethical challenges.

The models have limitations: they cannot generate videos without motion, be controlled by text, render text legibly, or consistently generate faces and people correctly. Even so, they produce high-quality four-second clips. Stability acknowledges these limitations and suggests possible adaptations, such as generating 360-degree views of objects.

Looking ahead, Stability plans to develop a range of models that build on and extend SVD and SVD-XT, along with a “text-to-video” tool that brings text prompting to the models on the web. The ultimate goal is commercialization, with applications envisioned in advertising, education, entertainment, and beyond.

Stability AI, which is facing financial challenges and hunting for executives to boost sales, recently raised $25 million through a convertible note. Despite the funding, the startup’s valuation remains at $1 billion, with reports indicating that it will seek a higher valuation in the coming months. The departure of key personnel adds to the challenges. One of them, Ed Newton-Rex, cited disagreements about copyright and the ethical use of copyrighted data in training AI models as the reason for leaving.