vmodel/talking-photo-turbo
API to convert photos into realistic talking avatars in seconds.
Output: $0.04 / second or 25 seconds / $1
Input
avatar * image
Image URL
speech * audio
Audio URL
Output
{
  "task_id": "d9oo2z1s89lobg8oz5",
  "user_id": 1,
  "version": "11fee5368eda61d569f53f1b24ce1c53b06c867157cd833e9a0a97b66096f974",
  "error": null,
  "total_time": 37,
  "predict_time": 37,
  "logs": null,
  "output": [
    "https://vmodel.ai/data/model/vmodel/talking-photo-turbo/result.mp4"
  ],
  "status": "succeeded",
  "create_at": 1746492954,
  "completed_at": 1746493015,
  "input": {
    "avatar": "https://vmodel.ai/data/model/vmodel/talking-photo-turbo/demo.png",
    "speech": "https://vmodel.ai/data/model/vmodel/talking-photo-turbo/examples_wav_talk_male_law_10s.wav"
  }
}
Generated in: 37 seconds
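As a minimal sketch of how a client might read the response above, the snippet below checks the `status` field and returns the URLs listed under `output`. The helper name `extract_video_urls` is hypothetical, and `"succeeded"` is the only status value shown on this page, so any other value is simply treated as "not ready or failed".

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "task_id": "d9oo2z1s89lobg8oz5",
  "error": null,
  "output": [
    "https://vmodel.ai/data/model/vmodel/talking-photo-turbo/result.mp4"
  ],
  "status": "succeeded"
}
"""


def extract_video_urls(response_text: str) -> list:
    """Return the output URLs of a finished task (hypothetical helper).

    Raises RuntimeError for any status other than "succeeded", because
    that is the only status value documented on this page.
    """
    task = json.loads(response_text)
    if task.get("status") != "succeeded":
        raise RuntimeError(
            f"task {task.get('task_id')!r} has status {task.get('status')!r}, "
            f"error={task.get('error')!r}"
        )
    # "output" is a list of URLs in the sample; tolerate a missing or null field.
    return list(task.get("output") or [])


if __name__ == "__main__":
    for url in extract_video_urls(SAMPLE_RESPONSE):
        print(url)
```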
Pricing
This model is priced based on the length of the video.
Output: $0.04 / second or 25 seconds / $1
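The per-second rate makes cost estimation a single multiplication. The sketch below uses the $0.04 rate from this page; the function name `estimate_cost` is hypothetical, and whether partial seconds are rounded for billing is not stated here, so the input is used as given.

```python
from decimal import Decimal

# Rate taken from the pricing line above: $0.04 per second of output video.
PRICE_PER_SECOND = Decimal("0.04")


def estimate_cost(video_seconds) -> Decimal:
    """Estimate the price in USD for a video of the given length (hypothetical helper).

    Assumption: no rounding of partial seconds; the page does not say how
    fractional durations are billed.
    """
    seconds = Decimal(str(video_seconds))
    if seconds < 0:
        raise ValueError("video length cannot be negative")
    return PRICE_PER_SECOND * seconds


if __name__ == "__main__":
    print(estimate_cost(25))  # 25 seconds at $0.04 per second is $1
    print(estimate_cost(10))
```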
Readme

Talking Photo API

The Talking Photo API brings static portraits to life by combining a photo with an audio input to generate a realistic talking-avatar video. It is suited to virtual assistants, content creation, interactive entertainment, and personalized video messaging.


API Overview

The Talking Photo API generates a video with an animated face from a user-provided photo and audio file. It produces natural lip-sync and facial expressions at high visual quality with fast processing, and is intended for both real-time and batch use cases.


Core Features

  • Photo + audio input to generate talking videos
  • Natural facial animation with accurate lip-sync
  • High-quality video output with fast processing
  • Simple REST API integration

Typical Use Cases

  • Virtual characters or AI presenters
  • Video-enabled customer support bots
  • Education and training video content
  • Automated short videos for creators
  • Personalized greetings or avatar messages

Supported Formats

Image Input Formats:

  • JPEG (.jpg, .jpeg)
  • GIF (.gif)
  • BMP (.bmp)
  • TIFF (.tiff, .tif)
  • WebP (.webp)
  • HEIF/HEIC (.heif, .heic)
  • PNM (.pnm, .ppm, .pgm, .pbm)
  • ICO (.ico)
  • SVG (.svg)
  • XPM (.xpm)
  • PCX (.pcx)
  • TGA (.tga)

Audio Input Formats:

  • MP3 (.mp3)
  • AAC (.aac, .m4a)
  • FLAC (.flac)
  • OGG (.ogg)
  • WMA (.wma)
  • ALAC (.m4a, .alac)
  • AIFF (.aiff, .aif)
  • Opus (.opus)
  • AMR (.amr)
  • MIDI (.mid, .midi)
  • Speex (.spx)
  • PCM (.pcm, .raw)

Output Format:

  • Video: MP4

Access Methods

  • REST API for developers
  • Web interface for quick demos and testing

Authentication

All API requests require a valid API token, which is available after account registration.


How to Use

Web Interface (for non-developers)

  1. Upload a front-facing photo
  2. Upload an audio file
  3. Click "Generate" to preview and download the resulting video

API Integration (for developers)

Send a POST request containing the image URL (avatar) and the audio URL (speech). For complete parameter documentation and example code, refer to the developer documentation.
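The following is only a sketch of what such a POST request could look like; it builds the request without sending it. The field names `avatar` and `speech` and the `version` value come from the sample on this page. Everything else is an assumption: the endpoint URL is a placeholder, and the JSON body layout and the `Bearer` authorization scheme are guesses that must be checked against the developer documentation.

```python
import json
import urllib.request


def build_talking_photo_request(endpoint_url, api_token, avatar_url, speech_url, version=None):
    """Build (but do not send) a POST request for one photo and one audio file.

    Assumptions, not confirmed by this page: the body is JSON with an
    "input" object, and the token is passed as a Bearer header.
    """
    payload = {"input": {"avatar": avatar_url, "speech": speech_url}}
    if version is not None:
        payload["version"] = version
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_talking_photo_request(
        "https://api.example.invalid/create",  # placeholder, not the real endpoint
        "YOUR_API_TOKEN",
        "https://vmodel.ai/data/model/vmodel/talking-photo-turbo/demo.png",
        "https://vmodel.ai/data/model/vmodel/talking-photo-turbo/examples_wav_talk_male_law_10s.wav",
        version="11fee5368eda61d569f53f1b24ce1c53b06c867157cd833e9a0a97b66096f974",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To send it: urllib.request.urlopen(request), once the real endpoint is known.
```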


Tips for Best Results

  • Use high-resolution, front-facing portraits
  • Ensure the face is well-lit and unobstructed
  • Use clear, clean audio with minimal background noise
  • Avoid blurry, side-view, or heavily filtered images

Important Notes

  • Each request handles one photo and one audio file
  • A valid API token is required for all requests
  • Output videos are returned as a download link that remains valid for 24 hours
  • Avoid low-quality or occluded inputs for optimal animation quality
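Because the download link expires, a client may want to track when a result stops being retrievable. The sketch below assumes the 24-hour window starts at the task's `completed_at` value and that this value is a Unix timestamp in seconds; neither is stated explicitly on this page, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Link lifetime from the note above.
LINK_LIFETIME = timedelta(hours=24)


def link_expiry(completed_at: int) -> datetime:
    """Estimate when the download link expires.

    Assumption: the 24 hours are counted from `completed_at`, read as
    Unix seconds (UTC).
    """
    return datetime.fromtimestamp(completed_at, tz=timezone.utc) + LINK_LIFETIME


def link_is_probably_valid(completed_at: int, now: datetime = None) -> bool:
    """Return True while the estimated expiry time has not been reached."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < link_expiry(completed_at)


if __name__ == "__main__":
    completed_at = 1746493015  # value from the sample response
    print(link_expiry(completed_at).isoformat())
    print(link_is_probably_valid(completed_at))
```

In practice it is safer to download the video as soon as the task succeeds than to rely on an estimated expiry.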

API Usage Limits

  • API access is available to registered users only
  • Default rate limit: 300 requests per minute (approx. 5 QPS)
  • For enterprise-level access or higher throughput, please contact us
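To stay under the default limit, a client can throttle itself before sending. The sliding-window limiter below is one possible approach and not part of the API; the class name and its methods are hypothetical, and how the server actually counts requests (fixed window, sliding window, or otherwise) is not documented here.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests=300, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._sent = deque()        # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if none)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Register that a request is being sent now."""
        self._sent.append(self.clock())

    def acquire(self) -> None:
        """Block until a request is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            time.sleep(delay)
        self.record()


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()  # defaults match 300 requests per minute
    for _ in range(3):
        limiter.acquire()             # call this right before each API request
    print("pending wait:", limiter.wait_time())
```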

Contact

For technical support or business inquiries: [email protected]