{
  "task_id": "d9zzvghifs95q8fkfd",
  "user_id": 1,
  "version": "cf50350d63bbe4178e97bd144aaae86167255ac1d33b09a0662fb9c195ad6f55",
  "error": null,
  "total_time": 300,
  "predict_time": 300,
  "logs": null,
  "output": [
    "https://vmodel.ai/data/model/vmodel/talking-photo-sonic/output.mp4"
  ],
  "status": "succeeded",
  "create_at": 1746492954,
  "completed_at": 1746493015,
  "input": {
    "audio": "https://raw.githubusercontent.com/jixiaozhong/Sonic/main/examples/wav/talk_female_english_10s.MP3",
    "image": "https://raw.githubusercontent.com/jixiaozhong/Sonic/main/examples/image/anime1.png",
    "dynamic_scale": 1,
    "min_resolution": 512,
    "inference_steps": 25,
    "keep_resolution": false
  }
}
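A minimal Python sketch of reading a prediction record like the one above: it parses the JSON, checks `status`, and pulls out the output video URL along with a few of the input parameters. It assumes the record arrives as a JSON string with exactly the field names shown in the example, and that `create_at` and `completed_at` are Unix timestamps in seconds. The helper name `summarize_prediction` is hypothetical and is not part of any vmodel.ai client library.

```python
import json
from datetime import datetime, timezone

# Example prediction record, copied from the response shown above.
EXAMPLE_RESPONSE = """
{ "task_id": "d9zzvghifs95q8fkfd", "user_id": 1,
  "version": "cf50350d63bbe4178e97bd144aaae86167255ac1d33b09a0662fb9c195ad6f55",
  "error": null, "total_time": 300, "predict_time": 300, "logs": null,
  "output": [ "https://vmodel.ai/data/model/vmodel/talking-photo-sonic/output.mp4" ],
  "status": "succeeded", "create_at": 1746492954, "completed_at": 1746493015,
  "input": {
    "audio": "https://raw.githubusercontent.com/jixiaozhong/Sonic/main/examples/wav/talk_female_english_10s.MP3",
    "image": "https://raw.githubusercontent.com/jixiaozhong/Sonic/main/examples/image/anime1.png",
    "dynamic_scale": 1, "min_resolution": 512, "inference_steps": 25,
    "keep_resolution": false } }
"""


def summarize_prediction(raw: str) -> dict:
    """Hypothetical helper: extract the commonly needed fields from a record."""
    record = json.loads(raw)

    # Only a succeeded task is assumed to carry a usable output URL.
    if record.get("status") != "succeeded":
        raise RuntimeError(
            f"task {record.get('task_id')} ended with status "
            f"{record.get('status')!r}: {record.get('error')}"
        )

    # Assumption: both timestamps are Unix epoch seconds.
    created = datetime.fromtimestamp(record["create_at"], tz=timezone.utc)
    completed = datetime.fromtimestamp(record["completed_at"], tz=timezone.utc)

    params = record["input"]
    return {
        "task_id": record["task_id"],
        "video_url": record["output"][0],
        "created_utc": created.isoformat(),
        # Time between the two timestamps, not the reported total_time field.
        "elapsed_s": (completed - created).total_seconds(),
        "inference_steps": params["inference_steps"],
        "min_resolution": params["min_resolution"],
    }


if __name__ == "__main__":
    for key, value in summarize_prediction(EXAMPLE_RESPONSE).items():
        print(f"{key}: {value}")
```

Checking `status` before reading `output` means a failed or unfinished task produces a clear error message instead of a confusing lookup failure further down.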
Sonic is an audio-driven portrait animation model that goes beyond traditional lip-sync techniques. By drawing on global audio features, such as tone, rhythm, and emotional cues, it generates natural, expressive facial animation that includes subtle head movements. This helps avoid the stiff, “puppet-like” look common in older methods and produces more lifelike, engaging video.
Sonic integrates several advanced technologies: