VModel/whisper
Convert speech in audio to text.
Output: $0.01 per use, or 100 uses for $1
Input
audio * audio
The audio file to transcribe.
transcription enum
The format of the transcription output. Default: plain text
translate boolean
Whether to translate the speech to English. Default: false
language enum
Language spoken in the audio; specify 'auto' for automatic language detection.
temperature float
Temperature to use for sampling. Default: 0
patience float
Optional patience value to use in beam decoding, as in https://arxiv.org/abs/2204.05424; the default (1.0) is equivalent to conventional beam search.
suppress_tokens string
Comma-separated list of token IDs to suppress during sampling; '-1' will suppress most special characters except common punctuation. Default: -1
initial_prompt string
Optional text to provide as a prompt for the first window.
condition_on_previous_text boolean
If true, provide the previous output of the model as a prompt for the next window. Disabling this may make the text inconsistent across windows, but makes the model less prone to getting stuck in a failure loop. Default: true
temperature_increment_on_fallback float
Temperature to increase by when decoding fails to meet either of the thresholds below. Default: 0.2
compression_ratio_threshold float
If the gzip compression ratio is higher than this value, treat the decoding as failed. Default: 2.4
logprob_threshold float
If the average log probability is lower than this value, treat the decoding as failed. Default: -1
no_speech_threshold float
If the probability of the <|nospeech|> token is higher than this value AND the decoding has failed due to `logprob_threshold`, consider the segment as silence. Default: 0.6
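The parameters above can be assembled into an input payload like the following sketch. The audio URL is a placeholder, not a real file; only `audio` is required, and the remaining fields are shown with the defaults documented above.

```python
# Sketch of an input payload for vmodel/whisper. The audio URL is a
# placeholder (assumption); only "audio" is required, and every other
# field is set to its documented default.
payload = {
    "audio": "https://example.com/speech.mp3",  # required: file to transcribe
    "transcription": "plain text",              # output format
    "translate": False,                         # translate speech to English
    "language": "auto",                         # auto-detect spoken language
    "temperature": 0,                           # sampling temperature
    "patience": 1.0,                            # conventional beam search
    "suppress_tokens": "-1",                    # suppress most special tokens
    "initial_prompt": "",                       # prompt for the first window
    "condition_on_previous_text": True,
    "temperature_increment_on_fallback": 0.2,
    "compression_ratio_threshold": 2.4,
    "logprob_threshold": -1,
    "no_speech_threshold": 0.6,
}
```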
Output
{
  "task_id": "d9zzvghifs95q8fkfd",
  "user_id": 1,
  "version": "8099696689d249cf8b122d833c36ac3f75505c666a395ca40ef26f68e7d3d16e",
  "error": null,
  "total_time": 6.41,
  "predict_time": 6.41,
  "logs": null,
  "output": [
    "{\"detected_language\":\"chinese\",\"segments\":[{\"avg_logprob\":-0.13921591999766592,\"compression_ratio\":1.1081081081081081,\"end\":10.1,\"id\":0,\"no_speech_prob\":0.08148325234651566,\"seek\":0,\"start\":0,\"temperature\":0,\"text\":\"宝贝,欢迎收听凯书365页,也感谢你关注凯书讲故事的微信公众账号和APP软件。\",\"tokens\":[50365,2415,251,18464,251,11,28566,17699,18681,31022,6336,107,2930,99,11309,20,10178,113,11,6404,9709,11340,2166,28053,26432,6336,107,2930,99,39255,43045,6973,1546,39152,17665,13545,7384,245,18464,99,26987,12565,8749,17819,107,20485,1543,50870]},{\"avg_logprob\":-0.13921591999766592,\"compression_ratio\":1.1081081081081081,\"end\":22.400000000000002,\"id\":1,\"no_speech_prob\":0.08148325234651566,\"seek\":0,\"start\":11.040000000000001,\"temperature\":0,\"text\":\"今天凯书要给你讲一个成语故事,叫做《悲公蛇影》,这个故事发生在东汉年间。\",\"tokens\":[50917,12074,6336,107,2930,99,4275,23197,2166,39255,20182,11336,5233,255,43045,6973,11,19855,10907,9806,14696,110,13545,26145,229,16820,9782,11,15368,43045,6973,28926,8244,3581,38409,12800,231,5157,31685,1543,51485]},{\"avg_logprob\":-0.10092328843616304,\"compression_ratio\":0.8070175438596491,\"end\":29.06,\"id\":2,\"no_speech_prob\":0.13098260760307312,\"seek\":2240,\"start\":22.4,\"temperature\":0,\"text\":\"话说这是一年盛夏,天气燥热得很。\",\"tokens\":[50365,21596,8090,27455,2257,5157,5419,249,42708,11,6135,42204,24184,98,23661,255,5916,4563,1543,50698]}],\"transcription\":\"宝贝,欢迎收听凯书365页,也感谢你关注凯书讲故事的微信公众账号和APP软件。今天凯书要给你讲一个成语故事,叫做《悲公蛇影》,这个故事发生在东汉年间。话说这是一年盛夏,天气燥热得很。\",\"translation\":null}"
  ],
  "status": "succeeded",
  "create_at": 1746492954,
  "completed_at": 1746493015,
  "input": {
    "seed": 0,
    "audio": "https://vmodel.ai/data/dev/model/vmodel/whisper/007_output_01.mp3",
    "model": "large-v3",
    "transcription": "plain text",
    "translate": false,
    "language": "auto",
    "temperature": 0,
    "suppress_tokens": "-1",
    "initial_prompt": "",
    "condition_on_previous_text": true,
    "temperature_increment_on_fallback": 0.2,
    "compression_ratio_threshold": 2.4,
    "logprob_threshold": -1,
    "no_speech_threshold": 0.6
  }
}
Generated in: 6.41 seconds
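Note that the `output` field is a list whose elements are themselves JSON-encoded strings, so the result must be decoded a second time before the transcription fields can be read. A minimal sketch, using a trimmed stand-in for the response above:

```python
import json

# Trimmed stand-in for the response shown above: "output" holds a
# JSON-encoded string, not an already-parsed object.
response = {
    "status": "succeeded",
    "output": [
        '{"detected_language": "chinese", "segments": [], '
        '"transcription": "...", "translation": null}'
    ],
}

# Decode the inner JSON string to reach the actual transcription fields.
result = json.loads(response["output"][0])
print(result["detected_language"])  # chinese
```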
Pricing
Model pricing for vmodel/whisper. Looking for volume pricing? Get in touch.
When using this model:
$0.0100 per use, or 100 uses for $1