ace-step/ace-step-1.5

★ 10,312 | Python | Audience · general | Complexity · 3/5 | Setup · moderate

Mindmap

mindmap
  root((ACE-Step 1.5))
    How It Works
      Language model planner
      Diffusion Transformer audio
    Input Options
      Text description
      Reference audio
      Lyrics and metadata
    Editing Features
      Cover versions
      Section replacement
      Add layers
      LoRA fine-tuning
    Setup
      Python and uv
      4GB GPU minimum
      Windows and Mac packages


Things people build with this

USE CASE 1

Generate an original music track in any genre and mood by typing a plain-language description.

USE CASE 2

Create a cover version of an existing song by providing the original audio as a reference input.

USE CASE 3

Fine-tune the model on a small collection of your own songs to generate music that matches your personal style.

USE CASE 4

Automatically generate background music for a vocal recording you already have.

Tech stack

Python | uv | Diffusion Transformer | LoRA

Getting it running

Difficulty · moderate | Time to first run · 30 min

Requires a GPU with at least 4GB of VRAM; no CPU-only fallback is available.
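The 4GB requirement can be checked before installing anything. The snippet below is a minimal sketch, not part of ACE-Step: the function names are hypothetical, the detection path assumes PyTorch with a CUDA device, and it returns `None` on other setups (including Macs). Cards sold as "4GB" can report slightly less than 4 GiB, so the exact threshold is an assumption you may want to loosen.

```python
from typing import Optional

# Minimum from the summary above; treating "4GB" as 4 GiB is an assumption.
MIN_VRAM_GIB = 4.0


def meets_minimum(total_vram_bytes: int, min_gib: float = MIN_VRAM_GIB) -> bool:
    """Return True if the reported VRAM is at least min_gib gibibytes."""
    return total_vram_bytes >= min_gib * 1024 ** 3


def detect_vram_bytes() -> Optional[int]:
    """Return total memory of CUDA device 0, or None if it cannot be read.

    Hypothetical helper: only the CUDA backend is covered here.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    vram = detect_vram_bytes()
    if vram is None:
        print("Could not detect a CUDA GPU from Python.")
    else:
        print(f"{vram / 1024 ** 3:.1f} GiB detected, enough: {meets_minimum(vram)}")
```

Keeping the comparison separate from the detection makes it usable with a memory figure read from any other tool.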

In plain English

ACE-Step 1.5 is a locally running music generation model that turns text descriptions and optional reference audio into complete music tracks. You describe what you want in plain language, and the model produces an audio file. It is designed to run on a personal computer rather than in the cloud. It needs around 4GB of graphics card memory at minimum and gives better results with more.

The system uses two parts working together. A language model acts as a planner: it reads your description and expands it into a detailed blueprint covering song structure, lyrics, tempo, key, and style. A second component, a Diffusion Transformer, then takes that blueprint and generates the actual audio. Because of this two-stage approach, you can give a simple description such as a genre and mood and get a full structured song in return, or you can control the details precisely with lyrics and metadata.

Generation is fast for an open-source model. A full song takes under two seconds on high-end server hardware and under ten seconds on a consumer gaming GPU. Tracks can range from ten seconds to ten minutes long. The model supports lyrics in over 50 languages.

Beyond basic generation, the README describes a range of editing features. You can generate a cover version of an existing song, edit specific sections of an audio file, add layers to a track, or automatically create background music for a vocal recording. A LoRA training feature lets you fine-tune the model's style on a small collection of your own songs, which the README says takes about an hour on a mid-range GPU.

Installation uses a package manager called uv. After you clone the repository and run a sync command, a web interface opens at a local address. Portable pre-built packages for Windows and macOS are also available. A free online demo is available on the project's website for those who do not want to install anything locally.
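The planner-then-audio design described above can be pictured as two functions joined by a shared data structure. The following is only an illustrative sketch of that data flow: `Blueprint`, `plan`, and `render` are hypothetical names, not ACE-Step's API, and both stages are stubs (fixed defaults and silence) standing in for the language model and the Diffusion Transformer. The 44100 Hz sample rate is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Blueprint:
    """Hypothetical stand-in for the plan the language model produces."""
    style: str
    tempo_bpm: int
    key: str
    structure: List[str] = field(default_factory=list)
    lyrics: Optional[str] = None
    duration_s: float = 60.0


def plan(description: str, lyrics: Optional[str] = None,
         duration_s: float = 60.0) -> Blueprint:
    """Stage 1 (stub): expand a short description into a blueprint.

    A real planner would be a language model; this stub fills in fixed
    defaults so the hand-off to stage 2 can be followed.
    """
    # The summary states tracks range from ten seconds to ten minutes.
    if not 10.0 <= duration_s <= 600.0:
        raise ValueError("duration must be between 10 and 600 seconds")
    return Blueprint(
        style=description.strip(),
        tempo_bpm=120,            # placeholder value
        key="C major",            # placeholder value
        structure=["intro", "verse", "chorus", "verse", "chorus", "outro"],
        lyrics=lyrics,
        duration_s=duration_s,
    )


def render(blueprint: Blueprint, sample_rate: int = 44100) -> List[float]:
    """Stage 2 (stub): turn a blueprint into audio samples.

    A real renderer would be the Diffusion Transformer; this stub returns
    silence of the requested length.
    """
    n_samples = int(blueprint.duration_s * sample_rate)
    return [0.0] * n_samples


if __name__ == "__main__":
    bp = plan("lo-fi hip hop, relaxed mood", duration_s=30.0)
    audio = render(bp)
    print(bp.structure, len(audio))
```

The point of the split is that the blueprint is an ordinary, inspectable object: a user who wants precise control can supply lyrics or override fields before the audio stage runs, while a user with only a genre and mood can leave everything to the planner.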

Copy-paste prompts

Prompt 1

I have ACE-Step 1.5 installed locally. Write me a detailed generation prompt for a cinematic orchestral track that builds tension and resolves into a triumphant finale.

Prompt 2

Help me write a Python script that calls ACE-Step 1.5 in a loop to generate 5 music tracks from different genre descriptions and save them as numbered WAV files.

Prompt 3

I want to fine-tune ACE-Step 1.5 with LoRA on 20 of my original songs. Walk me through the training config, recommended learning rate, and how to check if the style transfer is working.

Prompt 4

Show me how to use ACE-Step 1.5's section-editing feature to replace only the chorus of an existing MP3 file with newly generated audio while keeping the verses.


Verify against the repo before relying on details.