Generate music, narrate audiobooks, clone voices, transcribe meetings, separate stems — from any AI agent, IDE, or terminal. One MCP endpoint. Two SDKs. One CLI. Auth that any modern agent already understands.
Works with Claude, GPT, Codex, Cursor, Continue, Cline, OpenClaw, Hermes, and any agent that can read a URL. Paste this into your agent and it onboards itself:

```
Read https://audiopod.ai/skill.md and follow the instructions to onboard yourself to AudioPod. After you finish, do whatever I ask next using AudioPod's tools.
```
AudioPod speaks Model Context Protocol over Streamable HTTP. Pick the one-line install for your CLI, or paste a snippet into a desktop client.
```shell
claude mcp add --transport http audiopod https://mcp.audiopod.ai \
  --header "X-API-Key: ap_YOUR_KEY" --scope user
```

Or paste this snippet into your desktop client's MCP config:

```json
{
  "mcpServers": {
    "audiopod": {
      "url": "https://mcp.audiopod.ai",
      "headers": {
        "X-API-Key": "ap_YOUR_KEY"
      }
    }
  }
}
```

Need an API key? Create one in your dashboard. Free-tier credits unlock the full tool surface — no card required.
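For agents that speak raw Streamable HTTP, an MCP tool invocation is plain JSON-RPC 2.0. A minimal sketch of building such a request — the endpoint and X-API-Key header come from this page, but the tool name and arguments below are hypothetical, not AudioPod's confirmed tool schema:

```python
import json

# Endpoint and auth header are taken from this page.
MCP_URL = "https://mcp.audiopod.ai"

def tools_call_request(tool: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

headers = {
    "Content-Type": "application/json",
    # Streamable HTTP servers may reply with plain JSON or an SSE stream.
    "Accept": "application/json, text/event-stream",
    "X-API-Key": "ap_YOUR_KEY",
}

# "generate_music" and its arguments are placeholders for illustration.
body = json.dumps(tools_call_request("generate_music", {"prompt": "lo-fi rainy 90 BPM"}))
```

Any HTTP client can then POST `body` with those headers; in practice an MCP-aware agent handles this handshake for you.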
Each skill is documented at /.well-known/agent-skills/ and can be invoked over MCP, REST, the SDKs, or the CLI.
Generate full songs, instrumentals, vocal stems, or rap tracks from a text prompt. Royalty-free.
Separate any track into vocals, drums, bass, guitar, piano, and other stems — up to 16 in total. Perfect for remixes and karaoke.
One CLI, shipped with both SDKs. Same commands whether you pip install audiopod or npm i -g audiopod.
```shell
pip install audiopod
```

Auth: audiopod login stores your API key at ~/.audiopod/config.json (the CLI also reads AUDIOPOD_API_KEY from the environment).
Output: human-readable by default; pass --json for scripting.
Async jobs: the CLI streams progress and writes the final file when complete.
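The --json flag makes the CLI scriptable. A sketch of consuming one job record, assuming a hypothetical field layout — the id, status, and output_url names below are illustrative, not confirmed by this page:

```python
import json

# Hypothetical shape of a `--json` job record; the real
# field names may differ, so check actual CLI output.
raw = '{"id": "job_abc123", "status": "succeeded", "output_url": "https://example.com/song.wav"}'
job = json.loads(raw)

def is_done(job: dict) -> bool:
    """A job is finished once it reaches a terminal status."""
    return job["status"] in {"succeeded", "failed"}

if is_done(job) and job["status"] == "succeeded":
    print(job["output_url"])
```

The same pattern works from jq or any language with a JSON parser.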
```shell
# Authenticate once
audiopod login

# Generate music from a prompt
audiopod music "lo-fi rainy 90 BPM" --duration 60 --out song.wav

# Voiceover with a public voice
audiopod tts "Welcome to AudioPod." --voice alloy --out hello.wav

# Transcribe a meeting with speaker labels
audiopod transcribe meeting.mp3 --diarize --format srt > meeting.srt

# Split a song into stems
audiopod stems track.wav --mode six

# Clone a voice from a 30-second sample
audiopod clone reference.wav --name "Narrator"

# Poll any async job
audiopod jobs job_abc123
```

The Python and Node clients mirror each other call-for-call. Same shapes, same async ergonomics, same automatic credit reservations.
```python
from audiopod import AudioPod

client = AudioPod()  # reads AUDIOPOD_API_KEY

# Generate a song
job = client.music.generate(
    prompt="lo-fi rainy 90 BPM",
    duration=60,
)
song = job.wait()
song.download("song.wav")

# Synthesize speech
speech = client.tts.synthesize(
    text="Welcome to AudioPod.",
    voice="alloy",
)
speech.download("hello.wav")
```

```javascript
import AudioPod from "audiopod";

const client = new AudioPod(); // reads AUDIOPOD_API_KEY

// Generate a song
const job = await client.music.generate({
  prompt: "lo-fi rainy 90 BPM",
  duration: 60,
});
const song = await job.wait();
await song.download("song.wav");

// Synthesize speech
const speech = await client.tts.synthesize({
  text: "Welcome to AudioPod.",
  voice: "alloy",
});
await speech.download("hello.wav");
```

Every discovery surface an agent might check is published. No bespoke handshakes, no closed schemas.
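The wait() calls in the SDK examples above reduce to a poll-until-terminal loop, the same pattern the CLI uses for async jobs. A hedged sketch — the status values and the fetch_job callable are assumptions, not the real API:

```python
import time

# Terminal status names are hypothetical, not AudioPod's confirmed values.
TERMINAL = {"succeeded", "failed"}

def wait_for(job_id, fetch_job, interval=2.0, timeout=600.0):
    """Poll fetch_job(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Usage with a stubbed fetcher that succeeds on the third poll:
responses = iter(["queued", "running", "succeeded"])
job = wait_for("job_abc123", lambda jid: {"id": jid, "status": next(responses)}, interval=0.0)
```

In practice the SDKs handle this for you; the sketch only shows what the convenience wrapper is doing.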
Free credits. No card. No vendor lock-in. Open standards from the first request.