vmodel/wan-2.1-i2v-480p
Experience accelerated image-to-video generation with wan-2.1-i2v-480p. Powered by the Wan 2.1 14B model suite, delivering efficient 480p video synthesis for research and production use.
Output: $0.2 / use or 5 uses / $1
Input
prompt * string
Prompt for video generation
image * image
Input image to start generating from
num_frames int
Number of video frames. 81 frames give the best results
max_area enum
Maximum area of generated image. The input image will shrink to fit these dimensions
frames_per_second int
Frames per second. Note that the pricing of this model is based on the video duration at 16 fps
fast_mode enum
Speed up generation with different levels of acceleration. Faster modes may degrade quality somewhat. The speedup is dependent on the content, so different videos may see different speedups.
sample_steps int
Number of generation steps. Fewer steps means faster generation, at the expense of output quality. 30 steps is sufficient for most prompts
sample_guide_scale int
Higher guide scale makes prompt adherence better, but can reduce variation
sample_shift int
Sample shift factor
lora_weights string
Load LoRA weights. Supports Replicate models in the format <owner>/<model-name> or <owner>/<model-name>/<version>, HuggingFace URLs in the format huggingface.co/<owner>/<model-name>, CivitAI URLs in the format civitai.com/models/<id>[/<model-name>], or arbitrary .safetensors URLs from the Internet. For example, 'fofr/flux-pixar-cars'
lora_scale int
Determines how strongly the main LoRA should be applied. Values between 0 and 1 give sensible results for base inference. For go_fast we apply a 1.5x multiplier to this value; we've generally seen good performance when scaling the base value by that amount. You may still need to experiment to find the best value for your particular LoRA.
seed int
Random seed. Leave blank to randomize the seed
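The parameters above can be assembled into a request body along the following lines. This is a minimal sketch, not an official client: `build_input`, `clip_duration_seconds`, and `SAMPLE_SETTINGS` are hypothetical names, and the setting values are copied from the sample request on this page rather than from any confirmed list of API defaults.

```python
# Values copied from the sample request shown on this page.
# They are NOT confirmed API defaults; treat them as a starting point.
SAMPLE_SETTINGS = {
    "max_area": "832x480",
    "fast_mode": "Balanced",
    "lora_scale": 1,
    "num_frames": 81,
    "sample_shift": 3,
    "sample_steps": 30,
    "frames_per_second": 16,
    "sample_guide_scale": 5,
}

# Optional parameters that the sample request leaves out.
OPTIONAL_KEYS = {"lora_weights", "seed"}


def build_input(prompt, image, **overrides):
    """Assemble an input dict (hypothetical helper, not part of any SDK)."""
    if not prompt or not image:
        raise ValueError("prompt and image are both required")
    unknown = set(overrides) - set(SAMPLE_SETTINGS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"prompt": prompt, "image": image, **SAMPLE_SETTINGS}
    payload.update(overrides)
    return payload


def clip_duration_seconds(num_frames, frames_per_second):
    """Playback length of the generated clip in seconds."""
    return num_frames / frames_per_second


payload = build_input(
    "A woman is talking",
    "https://vmodel.ai/data/model/vmodel/wan-2.1-i2v-480p/2.png",
    seed=42,
)
print(clip_duration_seconds(payload["num_frames"], payload["frames_per_second"]))
```

With 81 frames at 16 frames per second the clip lasts just over five seconds, which matches the 5-second default quoted in the README section.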
Output
{
  "task_id": "qaldrg3a9d9mfiw2tf",
  "user_id": 1,
  "version": "009719e7de9128f21878a3c96fe39663cc29c7d37103ca0b59f8a5d5b15ff73e",
  "error": null,
  "total_time": 30.8,
  "predict_time": 30.2,
  "logs": null,
  "output": [
    "https://vmodel.ai/data/model/vmodel/wan-2.1-i2v-480p/result.mp4"
  ],
  "status": "succeeded",
  "create_at": null,
  "input": {
    "prompt": "A woman is talking",
    "image": "https://vmodel.ai/data/model/vmodel/wan-2.1-i2v-480p/2.png",
    "max_area": "832x480",
    "fast_mode": "Balanced",
    "lora_scale": 1,
    "num_frames": 81,
    "sample_shift": 3,
    "sample_steps": 30,
    "frames_per_second": 16,
    "sample_guide_scale": 5
  }
}
Generated in: 30.2 seconds
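A response shaped like the sample above could be handled roughly as follows. This is a sketch under assumptions: `extract_video_urls` is a hypothetical helper, the page documents only the `"succeeded"` status, so every other status is treated here as "not finished or failed", and the embedded JSON is a trimmed copy of the sample.

```python
import json


def extract_video_urls(response_text):
    """Return the output URLs from a response shaped like the sample above."""
    result = json.loads(response_text)
    if result.get("error"):
        raise RuntimeError(f"task {result.get('task_id')} failed: {result['error']}")
    # Assumption: only "succeeded" means the output list is ready to use.
    if result.get("status") != "succeeded":
        raise RuntimeError(f"task not finished, status={result.get('status')!r}")
    return list(result.get("output") or [])


# Trimmed copy of the sample response from this page.
sample = """
{
  "task_id": "qaldrg3a9d9mfiw2tf",
  "error": null,
  "predict_time": 30.2,
  "output": [
    "https://vmodel.ai/data/model/vmodel/wan-2.1-i2v-480p/result.mp4"
  ],
  "status": "succeeded"
}
"""

urls = extract_video_urls(sample)
print(urls[0])
```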
Pricing
This model is priced based on a single task.
Output: $0.2 / use or 5 uses / $1
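For budgeting, the listed price works out as below. This is a minimal sketch that assumes a flat price per task, as stated in this section; the note under frames_per_second suggests video duration may also factor in, which this estimate ignores. `estimated_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Listed price in USD per use (equivalently, 5 uses per dollar).
PRICE_PER_USE = Decimal("0.2")


def estimated_cost(uses):
    """Estimated spend in USD for a number of tasks, assuming flat pricing."""
    if uses < 0:
        raise ValueError("uses must be non-negative")
    return PRICE_PER_USE * uses


print(estimated_cost(5))
```

`Decimal` is used instead of floats so that repeated additions of 0.2 do not accumulate rounding error.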
Readme

wan2.1-i2v-14b-480p

Wan 2.1 Image-to-Video Model (480p Version)

An open, high-performance video foundation model designed for efficient 480p image-to-video generation, powered by the Wan 2.1 14B architecture.


✨ Model Overview

wan2.1-i2v-14b-480p is part of the Wan 2.1 suite – a comprehensive and open family of large-scale generative video models. This variant is optimized for fast and efficient image-to-video synthesis at 480p resolution, balancing accessibility and quality.


🔧 Key Features

  • State-of-the-Art Performance: Outperforms existing open-source and some commercial models in image-to-video generation benchmarks.

  • Optimized for Consumer-Grade GPUs: Can run on devices with 8.19 GB VRAM. A 5-second 480p video can be generated in ~4 minutes on an RTX 4090, even without quantization or acceleration tricks.

  • Multi-Task Foundation: Although this variant focuses on image-to-video (I2V), the Wan 2.1 family also supports:

    • Text-to-Video (T2V)
    • Video Editing
    • Text-to-Image
    • Video-to-Audio

  • Dual-Language Text Support: Generates clear and readable Chinese and English text in videos, useful for subtitles, signs, and in-video captions.

  • Powerful Video VAE Backbone: Backed by the Wan-VAE encoder/decoder, enabling temporal consistency and support for 1080p+ videos with minimal loss.


📦 Use Cases

  • AI-assisted video creation for storytelling and social media
  • Game and animation prototyping
  • Educational content generation
  • Visual marketing material automation
  • Creative tools and generative applications

⚙️ Specifications

Property            Value
Model Name          wan2.1-i2v-14b-480p
Resolution          480p
Input Type          Static image (PNG/JPG)
Output              MP4 video (5 s default)
VRAM Requirement    ~8.2 GB
Inference Speed     ~4 min per 5-second clip on an RTX 4090
Language Support    Chinese and English text generation

📌 Limitations

  • This model is optimized for 480p output only
  • Not suitable for real-time generation
  • For research and creative use only – commercial deployment requires license verification