{
  "task_id": "qaldrg3a9d9mfiw2tf",
  "user_id": 1,
  "version": "009719e7de9128f21878a3c96fe39663cc29c7d37103ca0b59f8a5d5b15ff73e",
  "error": null,
  "total_time": 30.8,
  "predict_time": 30.2,
  "logs": null,
  "output": [
    "https://vmodel.ai/data/model/vmodel/wan-2.1-i2v-480p/result.mp4"
  ],
  "status": "succeeded",
  "create_at": null,
  "input": {
    "prompt": "A woman is talking",
    "image": "https://vmodel.ai/data/model/vmodel/wan-2.1-i2v-480p/2.png",
    "max_area": "832x480",
    "fast_mode": "Balanced",
    "lora_scale": 1,
    "num_frames": 81,
    "sample_shift": 3,
    "sample_steps": 30,
    "frames_per_second": 16,
    "sample_guide_scale": 5
  }
}
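A response shaped like the example above can be consumed with a few lines of Python. The following is a minimal sketch, not a client for any particular API: the helper name `summarize_task` is hypothetical, the field names are taken from the example response, and treating every status other than "succeeded" as a failure is an assumption, since only that status appears in the example.

```python
import json

# Trimmed copy of the example response (same field names and values).
EXAMPLE_RESPONSE = """
{
  "task_id": "qaldrg3a9d9mfiw2tf",
  "error": null,
  "total_time": 30.8,
  "predict_time": 30.2,
  "output": ["https://vmodel.ai/data/model/vmodel/wan-2.1-i2v-480p/result.mp4"],
  "status": "succeeded",
  "input": {"num_frames": 81, "frames_per_second": 16, "max_area": "832x480"}
}
"""


def summarize_task(raw: str) -> dict:
    """Hypothetical helper: extract the video URL and timing overhead."""
    task = json.loads(raw)
    if task["status"] != "succeeded":
        # Assumption: any status other than "succeeded" is treated as a failure.
        raise RuntimeError(f"task {task['task_id']} did not succeed: {task.get('error')}")
    return {
        # "output" is a list; the example contains a single MP4 URL.
        "video_url": task["output"][0],
        # Time reported outside of prediction itself, in the same unit as the response.
        "overhead": round(task["total_time"] - task["predict_time"], 3),
    }


summary = summarize_task(EXAMPLE_RESPONSE)
print(summary["video_url"])
print(summary["overhead"])
```

Rounding the difference avoids floating-point noise when subtracting the two timing fields.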
Wan 2.1 Image-to-Video Model (480p Version)

An open, high-performance video foundation model designed for efficient 480p image-to-video generation, powered by the Wan 2.1 14B architecture.
wan2.1-i2v-14b-480p is part of the Wan 2.1 suite, a comprehensive and open family of large-scale generative video models. This variant is optimized for fast and efficient image-to-video synthesis at 480p resolution, balancing accessibility and quality.
State-of-the-Art Performance: Outperforms existing open-source models and some commercial models on image-to-video generation benchmarks.
Optimized for Consumer-Grade GPUs: Runs on devices with as little as 8.19 GB of VRAM. A 5-second 480p video can be generated in about 4 minutes on an RTX 4090, even without quantization or other acceleration techniques.
Multi-Task Foundation: Although this variant focuses on image-to-video (I2V), the Wan 2.1 family also supports other generation tasks.
Dual-Language Text Support: Generates clear, readable Chinese and English text in videos, which is useful for subtitles, signs, and in-video captions.
Powerful Video VAE Backbone: Built on the Wan-VAE encoder/decoder, which preserves temporal consistency and supports videos at 1080p and above with minimal quality loss.
Property | Value |
---|---|
Model Name | wan2.1-i2v-14b-480p |
Resolution | 480p |
Input Type | Static Image (PNG/JPG) |
Output | MP4 video (5s default) |
VRAM Requirement | ~8.2 GB |
Inference Speed | ~4 min per 5 s clip on RTX 4090 |
Language Support | Chinese and English in-video text generation |
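
The "5s default" output length in the table is consistent with the default inputs in the example request: 81 frames at 16 frames per second. A short sketch of that arithmetic follows; the function names are hypothetical, and reading `max_area` as WIDTHxHEIGHT is an assumption based on the example value "832x480".

```python
def clip_duration_seconds(num_frames: int, frames_per_second: int) -> float:
    """Playback length of a clip, given its frame count and frame rate."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return num_frames / frames_per_second


def parse_max_area(max_area: str) -> tuple[int, int]:
    """Split a 'WIDTHxHEIGHT' string into integers (assumed format)."""
    width, height = max_area.lower().split("x")
    return int(width), int(height)


# Values taken from the "input" object of the example request.
duration = clip_duration_seconds(num_frames=81, frames_per_second=16)
width, height = parse_max_area("832x480")

print(f"{duration} s at {width}x{height}")  # 5.0625 s at 832x480
```

With these defaults the clip runs just over five seconds, so changing either `num_frames` or `frames_per_second` changes the output length proportionally.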