vmodel/talking-photo-sonic
The CVPR 2025 Sonic model generates photorealistic talking-face animation from a static portrait and a corresponding audio input.
Output: $0.3 / second or 3 seconds / $1
Input
image * image
Input image: the static portrait to animate.
audio * audio
Input audio file (WAV, MP3, etc.) for the voice.
dynamic_scale float
Controls movement intensity. Increase/decrease for more/less movement.
min_resolution int
Minimum image resolution for processing. Lower values use less memory but may reduce quality.
inference_steps int
Number of diffusion steps. Higher values may improve quality but take longer.
keep_resolution boolean
If true, the output video matches the original image resolution. Otherwise it uses min_resolution after cropping.
seed int
Random seed for reproducible results. Leave blank for a random seed.
disable_safety_checker boolean
Disable safety checker for generated images.
Note: The website version of this model always runs with safety checks enabled. For details, see VModel's platform safety guidelines.
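As a rough illustration of how the parameters above fit together, the following sketch assembles an input object in Python. The helper name `build_sonic_input` is hypothetical and not part of any VModel SDK, and the default values are simply those shown in the sample output on this page, not confirmed platform defaults. No network call is made; how the object is submitted depends on the platform's API, which this page does not describe.

```python
from typing import Any, Dict, Optional


def build_sonic_input(
    image: str,
    audio: str,
    dynamic_scale: float = 1.0,
    min_resolution: int = 512,
    inference_steps: int = 25,
    keep_resolution: bool = False,
    seed: Optional[int] = None,
    disable_safety_checker: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the listed parameters into one dict.

    Defaults mirror the sample output on this page (an assumption).
    """
    # image and audio are the two fields marked with * (required).
    if not image or not audio:
        raise ValueError("both image and audio are required")

    payload: Dict[str, Any] = {
        "image": image,
        "audio": audio,
        "dynamic_scale": dynamic_scale,
        "min_resolution": min_resolution,
        "inference_steps": inference_steps,
        "keep_resolution": keep_resolution,
        "disable_safety_checker": disable_safety_checker,
    }
    # Leaving seed out corresponds to "leave blank for a random seed".
    if seed is not None:
        payload["seed"] = seed
    return payload


example_input = build_sonic_input(
    image="https://raw.githubusercontent.com/jixiaozhong/Sonic/main/examples/image/anime1.png",
    audio="https://raw.githubusercontent.com/jixiaozhong/Sonic/main/examples/wav/talk_female_english_10s.MP3",
)
```

Passing an explicit `seed` adds it to the dict, which is what you would want when trying to reproduce a previous result.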
Output
{
  "task_id": "d9zzvghifs95q8fkfd",
  "user_id": 1,
  "version": "c6d80220ce71d8df04d5dbf2b189b70b9f4937aea6a030de12cb46951b24d134",
  "error": null,
  "total_time": 300,
  "predict_time": 300,
  "logs": null,
  "output": [
    "https://vmodel.ai/data/model/vmodel/talking-photo-sonic/output.mp4"
  ],
  "status": "succeeded",
  "create_at": 1746492954,
  "completed_at": 1746493015,
  "input": {
    "audio": "https://raw.githubusercontent.com/jixiaozhong/Sonic/main/examples/wav/talk_female_english_10s.MP3",
    "image": "https://raw.githubusercontent.com/jixiaozhong/Sonic/main/examples/image/anime1.png",
    "dynamic_scale": 1,
    "min_resolution": 512,
    "inference_steps": 25,
    "keep_resolution": false,
    "disable_safety_checker": false
  }
}
Generated in: 300 seconds
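A result object shaped like the sample above can be checked and unpacked with a few lines of Python. This is a minimal sketch: the function name `extract_video_urls` is hypothetical, and it assumes that a finished task always carries `status`, `error`, and `output` fields as in the sample, which this page does not guarantee.

```python
import json
from typing import Any, Dict, List


def extract_video_urls(task: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return output URLs from a succeeded task."""
    # Treat anything other than a clean "succeeded" status as a failure.
    if task.get("status") != "succeeded" or task.get("error") is not None:
        raise RuntimeError(
            f"task {task.get('task_id')} did not succeed: "
            f"status={task.get('status')!r}, error={task.get('error')!r}"
        )
    # "output" is a list of URLs in the sample; tolerate a missing field.
    output = task.get("output") or []
    return [url for url in output if isinstance(url, str)]


# Trimmed copy of the sample result shown on this page.
sample = json.loads("""
{
  "task_id": "d9zzvghifs95q8fkfd",
  "error": null,
  "output": [
    "https://vmodel.ai/data/model/vmodel/talking-photo-sonic/output.mp4"
  ],
  "status": "succeeded"
}
""")

urls = extract_video_urls(sample)
print(urls[0])
```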
Pricing
Model pricing for vmodel/talking-photo-sonic. Looking for volume pricing? Get in touch.
When using this model: $0.3000 per second of input video, or 3 seconds for $1.
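The per-second rate makes a quick cost estimate straightforward. The sketch below assumes billing is strictly linear in duration at $0.30 per second, with no minimum charge or rounding to whole seconds; the page does not state the billing granularity, so treat the result as an approximation. Note that at this rate $1 buys about 3.33 seconds, so "3 seconds for $1" is a rounded figure.

```python
from decimal import Decimal

# Rate quoted on this page; linear billing is an assumption.
PRICE_PER_SECOND = Decimal("0.30")


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimate the charge in USD for a clip of the given length."""
    duration = Decimal(str(duration_seconds))
    if duration < 0:
        raise ValueError("duration must be non-negative")
    # Round to whole cents for display.
    return (duration * PRICE_PER_SECOND).quantize(Decimal("0.01"))


# The sample input on this page uses a 10-second audio clip.
print(estimate_cost(10))
```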