Skywork Unveils SkyReels-V2: Open-Source AI Video Model Delivering Unlimited-Length Generation

Date: 2025-06-10

In Brief

Skywork’s SkyReels-V2 open-source AI video model enables unlimited-length video generation via a browser, supporting diverse applications like story creation and multi-subject video synthesis.

Skywork, a platform specializing in AI workplace agents, announced that its AI video creation tool, SkyReels, has introduced SkyReels-V2, an open-source AI video model that can generate videos of unlimited length directly from a web browser at no cost. The model’s weights and inference code are publicly available on GitHub. SkyReels-V2 combines Multi-modal Large Language Models (MLLM), multi-stage pretraining, reinforcement learning, and a Diffusion Forcing framework to optimize overall performance. The model supports a range of practical applications, including story generation, image-to-video synthesis, camera direction, and consistent multi-subject video creation through the SkyReels-A2 system.

The Diffusion Forcing framework is what enables the generation of videos of unlimited duration. SkyReels-V2 supports both text-to-video (T2V) and image-to-video (I2V) generation, and it can run inference in synchronous and asynchronous modes. Example scripts demonstrating long video generation are available.
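As a rough illustration of how a fixed-size generation window can be extended to arbitrary length, the sketch below plans overlapping frame windows and builds a per-frame denoising schedule in which later frames lag behind earlier ones. It is not taken from the SkyReels-V2 code: the function names (`plan_windows`, `denoise_schedule`), their parameters, and the reading of a zero lag as the synchronous mode and a positive lag as the asynchronous mode are assumptions made for illustration.

```python
from typing import List, Tuple


def plan_windows(total_frames: int, window: int, overlap: int) -> List[Tuple[int, int]]:
    """Split a long video into overlapping [start, end) frame windows.

    Each window after the first reuses `overlap` frames from the previous
    window as conditioning (hypothetical scheme, for illustration only).
    """
    if total_frames <= 0 or window <= 0:
        raise ValueError("total_frames and window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    windows = []
    start = 0
    while True:
        end = min(start + window, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        start = end - overlap  # step back so consecutive windows share frames
    return windows


def denoise_schedule(num_frames: int, num_steps: int, lag: int) -> List[List[int]]:
    """Return, for each iteration, how many denoising steps each frame has completed.

    lag == 0: every frame advances together (assumed "synchronous" mode).
    lag > 0:  frame i starts i * lag iterations late (assumed "asynchronous" mode).
    """
    if num_frames <= 0 or num_steps <= 0 or lag < 0:
        raise ValueError("num_frames and num_steps must be positive, lag non-negative")
    total_iters = num_steps + lag * (num_frames - 1)
    return [
        [min(max(k - i * lag, 0), num_steps) for i in range(num_frames)]
        for k in range(1, total_iters + 1)
    ]


if __name__ == "__main__":
    print(plan_windows(total_frames=10, window=4, overlap=1))
    for row in denoise_schedule(num_frames=3, num_steps=2, lag=1):
        print(row)
```

With a positive lag, earlier frames finish denoising before later ones, so they can serve as cleaner context for the frames that follow. With a lag of zero, all frames share the same noise level at every iteration.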

A notable component of SkyReels-V2 is SkyCaptioner-V1, a video captioning model designed for data annotation. It is trained on captions produced by the base Qwen2.5-VL-72B-Instruct model and by additional sub-expert captioners, using a carefully curated, concept-balanced dataset of approximately two million videos to ensure annotation quality.

SkyCaptioner-V1 itself is fine-tuned from Qwen2.5-VL-7B-Instruct to improve domain-specific video captioning. On a test set of 1,000 samples, it achieves higher average accuracy than state-of-the-art baseline models and performs particularly well in shot-related fields.
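The comparison rests on accuracy measured per caption field and then averaged across fields. A minimal sketch of that arithmetic follows. The field names (`shot_type`, `camera_motion`) and the exact-match scoring rule are hypothetical, since the article does not describe the evaluation protocol.

```python
from typing import Dict, List


def field_accuracy(
    predictions: List[Dict[str, str]], references: List[Dict[str, str]]
) -> Dict[str, float]:
    """Compute exact-match accuracy for each caption field."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for pred, ref in zip(predictions, references):
        for field, expected in ref.items():
            total[field] = total.get(field, 0) + 1
            if pred.get(field) == expected:
                correct[field] = correct.get(field, 0) + 1
    return {field: correct.get(field, 0) / n for field, n in total.items()}


def average_accuracy(per_field: Dict[str, float]) -> float:
    """Unweighted mean of the per-field accuracies."""
    if not per_field:
        raise ValueError("per_field must not be empty")
    return sum(per_field.values()) / len(per_field)


if __name__ == "__main__":
    # Toy data with hypothetical field names, not real evaluation results.
    refs = [
        {"shot_type": "close-up", "camera_motion": "pan"},
        {"shot_type": "wide", "camera_motion": "static"},
    ]
    preds = [
        {"shot_type": "close-up", "camera_motion": "tilt"},
        {"shot_type": "wide", "camera_motion": "static"},
    ]
    scores = field_accuracy(preds, refs)
    print(scores, average_accuracy(scores))
```

Reporting per-field scores alongside the average makes it possible to see where a captioner is strong, such as the shot-related fields mentioned above, instead of relying on a single aggregate number.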

Drawing on earlier successes with large language models, the developers used reinforcement learning to improve generative video quality, targeting known limitations such as difficulty with large, deformable motions and occasional physical inconsistencies in generated videos.

To improve performance further, the developers ran two sequential stages of supervised fine-tuning (SFT), at 540p and 720p resolution respectively. The first SFT stage took place immediately after pretraining and before the reinforcement learning stage. It served as a concept-equilibrium trainer: it refined the foundation model’s pretraining outcomes, which used only 24 frames per second (fps) video data, and it simplified the architecture by removing the FPS embedding components.

What Is SkyReels?

SkyReels is a video creation platform powered by artificial intelligence that allows users to produce short films, animations, and videos by combining text prompts, images, and audio inputs. The platform provides a wide range of features including AI-generated characters, tools for storyboarding, lip-syncing capabilities, music composition, and video editing, all designed to streamline the content creation process. It also includes advanced AI models such as SkyReels-V1 and SkyReels-V2.

SkyReels-V1 is an open-source video foundation model focused on human-centered video production for short dramas, supporting both text-to-video and image-to-video generation while accurately rendering subtle facial expressions and delivering cinematic-quality visuals.

