Text to Music: Turn Words into Melodies

vmodel/text-to-music

Turn your words into expressive, AI-generated music and bring your stories, emotions, or ideas to life through sound.

Pricing: $0.15 per use, or 6 uses for $1

Input

tags * string

Text prompts to guide music generation, e.g., 'epic,cinematic'

lyrics string

Lyrics for the music. Use [verse], [chorus], and [bridge] to separate different parts of the lyrics. Use [instrumental] or [inst] to generate instrumental music

duration int

Duration of the generated audio in seconds. -1 means a random duration between 30 and 240 seconds.

seed int

Random seed. Set to -1 to randomize.

number_of_steps int

Number of inference steps.

scheduler enum

Scheduler type.

guidance_type enum

Guidance type for CFG.

granularity_scale int

Omega scale for APG guidance, or similar for other CFG types.

guidance_interval float

Guidance interval.

guidance_interval_decay float

Guidance interval decay.

guidance_scale int

Overall guidance scale.

min_guidance_scale int

Minimum guidance scale.

tag_guidance_scale int

Guidance scale for tags (text prompt).

lyric_guidance_scale int

Guidance scale for lyrics.

Output

{
  "task_id": "db2x1fp43o3n10nuwb",
  "user_id": 1,
  "version": "280fc4f9ee507577f880a167f639c02622421d8fecf492454320311217b688f1",
  "error": null,
  "total_time": 6,
  "predict_time": 6,
  "logs": null,
  "output": [
    "https://vmodel.ai/data/model/vmodel/text-to-music/text_to_music_output.mp3"
  ],
  "status": "succeeded",
  "create_at": 1751596221,
  "completed_at": 1751596227,
  "input": {
    "seed": -1,
    "tags": "emotional,melancholic,piano,pop,male vocal,ballad,urban sadness",
    "lyrics": "[verse]\nI left your cup right where you placed it\nStill warm, like you never faced it\nThe end we both anticipated\nBut no one ever really says it\n\n[chorus]\nWe’re too good at acting like we’re fine\nEven goodbye sounds like a borrowed line\nYou said “don’t hold on too tight”\nI smiled and said, “I’ll try”\n\n[verse]\nYou packed your words with quiet grace\nI memorized your empty face\nTried not to beg, tried to give space\nBut silence fills this broken place\n\n[chorus]\nWe’re too good at acting like we’re fine\nEven goodbye sounds like a borrowed line\nYou said “don’t hold on too tight”\nI smiled and said, “I’ll try”\n\n[bridge]\nLove slowed down like falling rain\nThe slower it gets, the clearer the pain\nYou didn’t cry, you just turned away\nI didn’t stop you — I had nothing to say",
    "duration": 60,
    "scheduler": "euler",
    "guidance_type": "apg",
    "guidance_scale": 15,
    "number_of_steps": 60,
    "granularity_scale": 10,
    "guidance_interval": 0.5,
    "min_guidance_scale": 3,
    "tag_guidance_scale": 0,
    "lyric_guidance_scale": 0,
    "guidance_interval_decay": 0
  }
}

Generated in: 6 seconds
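The response above can be read with a few lines of code. The following Python sketch parses a trimmed copy of the sample response, checks `status`, takes the first URL from `output`, and derives the elapsed time from `create_at` and `completed_at`. It assumes those two fields are Unix timestamps in seconds, which is consistent with the reported 6 seconds, and it treats any status other than "succeeded" as a failure, which may be too strict for tasks that are still running. The helper name `summarize_task` is hypothetical.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "task_id": "db2x1fp43o3n10nuwb",
  "error": null,
  "total_time": 6,
  "output": [
    "https://vmodel.ai/data/model/vmodel/text-to-music/text_to_music_output.mp3"
  ],
  "status": "succeeded",
  "create_at": 1751596221,
  "completed_at": 1751596227
}
"""


def summarize_task(raw: str) -> dict:
    """Return the audio URL and elapsed seconds of a succeeded task."""
    task = json.loads(raw)
    if task["status"] != "succeeded":
        # Assumption: every other status is reported as an error here.
        raise RuntimeError(f"task {task['task_id']} did not succeed: {task['error']}")
    return {
        "audio_url": task["output"][0],
        # Assumption: both timestamps are Unix seconds.
        "elapsed_seconds": task["completed_at"] - task["create_at"],
    }


summary = summarize_task(SAMPLE_RESPONSE)
print(summary["audio_url"])
print(summary["elapsed_seconds"])
```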

HTTP Request

Run vmodel/text-to-music:280fc4f9ee507577f880a167f639c02622421d8fecf492454320311217b688f1 using Vmodel's HTTP API.

  # Create a task; parameters from the Input Schema go inside "input".
  curl -X POST https://api.vmodel.ai/api/tasks/v1/create \
    -H "Authorization: Bearer $VModel_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
    "version": "280fc4f9ee507577f880a167f639c02622421d8fecf492454320311217b688f1",
    "input": {}
}'
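The same request can be assembled in Python with only the standard library. This is a minimal sketch that mirrors the curl call above: the endpoint, headers, and version hash come from this page, while the helper name `build_create_request` and the example `tags` and `duration` values are illustrative assumptions. The request is built but not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.vmodel.ai/api/tasks/v1/create"
VERSION = "280fc4f9ee507577f880a167f639c02622421d8fecf492454320311217b688f1"


def build_create_request(model_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a task."""
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_create_request(
    {"tags": "epic,cinematic", "duration": 60},
    # Same environment variable as the curl example; the fallback is a placeholder.
    token=os.environ.get("VModel_API_TOKEN", "<your-token>"),
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```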

Input Schema

These are the fields you can set when running this model through the API. If you omit a field, its default value is used.

tags

Type: string

Default value: -

Description: Text prompts to guide music generation, e.g., 'epic,cinematic'

lyrics

Type: string

Default value:

Description: Lyrics for the music. Use [verse], [chorus], and [bridge] to separate different parts of the lyrics. Use [instrumental] or [inst] to generate instrumental music
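A `lyrics` string in this format can be assembled from structured data. The sketch below is one possible approach: it follows the layout of the sample input on this page (a bracketed tag on its own line, the lines of that section, then a blank line between sections). The helper name `build_lyrics` is hypothetical, and rejecting tags other than the three listed above is an assumption rather than documented API behavior.

```python
SECTION_TAGS = {"verse", "chorus", "bridge"}


def build_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Join (tag, lines) pairs into a bracket-tagged lyrics string."""
    blocks = []
    for tag, lines in sections:
        if tag not in SECTION_TAGS:
            # Assumption: only the tags named in the description are allowed.
            raise ValueError(f"unknown section tag: {tag}")
        blocks.append(f"[{tag}]\n" + "\n".join(lines))
    # Sections are separated by one blank line, as in the sample input.
    return "\n\n".join(blocks)


lyrics = build_lyrics([
    ("verse", [
        "I left your cup right where you placed it",
        "Still warm, like you never faced it",
    ]),
    ("chorus", ["Even goodbye sounds like a borrowed line"]),
])
print(lyrics)

# For instrumental music, the description says to pass the tag by itself.
instrumental_lyrics = "[instrumental]"
```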

duration

Type: int

Default value: 60

Description: Duration of the generated audio in seconds. -1 means a random duration between 30 and 240 seconds.

Range: Min: 1 | Max: 240

seed

Type: int

Default value: -1

Description: Random seed. Set to -1 to randomize.

number_of_steps

Type: int

Default value: 60

Description: Number of inference steps.

Range: Min: 10 | Max: 200

scheduler

Type: enum

Default value: euler

Description: Scheduler type.

Choices: euler, heun

guidance_type

Type: enum

Default value: apg

Description: Guidance type for CFG.

Choices: apg, cfg, cfg_star

granularity_scale

Type: int

Default value: 10

Description: Omega scale for APG guidance, or similar for other CFG types.

Range: Min: -100 | Max: 100

guidance_interval

Type: float

Default value: 0.5

Description: Guidance interval.

Range: Min: 0 | Max: 1

guidance_interval_decay

Type: float

Default value: 0

Description: Guidance interval decay.

Range: Min: 0 | Max: 1

guidance_scale

Type: int

Default value: 15

Description: Overall guidance scale.

Range: Min: 0 | Max: 30

min_guidance_scale

Type: int

Default value: 3

Description: Minimum guidance scale.

Range: Min: 0 | Max: 100

tag_guidance_scale

Type: int

Default value: 0

Description: Guidance scale for tags (text prompt).

Range: Min: 0 | Max: 10

lyric_guidance_scale

Type: int

Default value: 0

Description: Guidance scale for lyrics.

Range: Min: 0 | Max: 10
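The ranges and choices listed above can be checked on the client before a request is sent. The following Python sketch is one way to do that. The limits are copied from this schema, but the helper name `validate_input` is hypothetical, and two points are assumptions: that the asterisk next to `tags` marks it as required, and that a `duration` of -1 (random) is accepted even though the listed minimum is 1.

```python
# Ranges and choices copied from the Input Schema above.
INT_RANGES = {
    "duration": (1, 240),
    "number_of_steps": (10, 200),
    "granularity_scale": (-100, 100),
    "guidance_scale": (0, 30),
    "min_guidance_scale": (0, 100),
    "tag_guidance_scale": (0, 10),
    "lyric_guidance_scale": (0, 10),
}
FLOAT_RANGES = {
    "guidance_interval": (0.0, 1.0),
    "guidance_interval_decay": (0.0, 1.0),
}
CHOICES = {
    "scheduler": {"euler", "heun"},
    "guidance_type": {"apg", "cfg", "cfg_star"},
}


def validate_input(model_input: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    # Assumption: tags is the only required field.
    if not model_input.get("tags"):
        problems.append("tags is required")
    for name, (low, high) in {**INT_RANGES, **FLOAT_RANGES}.items():
        if name not in model_input:
            continue  # omitted fields fall back to their defaults
        value = model_input[name]
        # Assumption: -1 ("random duration") bypasses the 1..240 range.
        if name == "duration" and value == -1:
            continue
        expected = int if name in INT_RANGES else (int, float)
        if not isinstance(value, expected):
            problems.append(f"{name} has the wrong type")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    for name, allowed in CHOICES.items():
        if name in model_input and model_input[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    return problems


print(validate_input({"tags": "epic,cinematic", "duration": 60}))
print(validate_input({"duration": 500, "scheduler": "ddim"}))
```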

Pricing

Model pricing for vmodel/text-to-music. Looking for volume pricing? Get in touch.

When using this model: $0.1500 per use, or 6 uses for $1
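One way to read the two figures together: at $0.15 per use, a $1 budget covers 6 whole uses. The Python sketch below works through that arithmetic with `Decimal` to avoid floating-point rounding. This reading is an assumption, not a statement of how billing works, and the function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_USE = Decimal("0.15")  # per-use price listed above


def uses_for_budget(budget_usd: str) -> int:
    """Number of whole uses a budget covers at the per-use price."""
    return int(Decimal(budget_usd) // PRICE_PER_USE)


def cost_of(uses: int) -> Decimal:
    """Total cost of a number of uses at the per-use price."""
    return PRICE_PER_USE * uses


print(uses_for_budget("1"))  # 6
print(cost_of(6))            # 0.90
```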
