Generate music, narrate audiobooks, clone voices, transcribe meetings, separate stems — from any AI agent, IDE, or terminal. One MCP endpoint. Two SDKs. One CLI. Auth that any modern agent already understands.
Works with Claude, GPT, Codex, Cursor, Continue, Cline, OpenClaw, Hermes, and any agent that can read a URL. Paste this into your agent and it onboards itself:

```
Read https://audiopod.ai/skill.md and follow the instructions to onboard yourself to AudioPod. After you finish, do whatever I ask next using AudioPod's tools.
```
AudioPod speaks Model Context Protocol over Streamable HTTP. Pick the one-line install for your CLI, or paste a snippet into a desktop client.
```shell
claude mcp add --transport http audiopod https://mcp.audiopod.ai \
  --header "X-API-Key: ap_YOUR_KEY" --scope user
```

Or paste this snippet into your desktop client's MCP config:

```json
{
  "mcpServers": {
    "audiopod": {
      "url": "https://mcp.audiopod.ai",
      "headers": {
        "X-API-Key": "ap_YOUR_KEY"
      }
    }
  }
}
```

Need an API key? Create one in your dashboard. Free-tier credits unlock the full tool surface — no card required.
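For agents that speak raw Streamable HTTP, an MCP tool invocation is plain JSON-RPC 2.0. A minimal sketch of building such a request — the endpoint and X-API-Key header come from this page, but the tool name and arguments below are hypothetical, not AudioPod's confirmed tool schema:

```python
import json

# Endpoint and auth header are taken from this page.
MCP_URL = "https://mcp.audiopod.ai"

def tools_call_request(tool: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

headers = {
    "Content-Type": "application/json",
    # Streamable HTTP servers may reply with plain JSON or an SSE stream.
    "Accept": "application/json, text/event-stream",
    "X-API-Key": "ap_YOUR_KEY",
}

# "generate_music" and its arguments are placeholders for illustration.
body = json.dumps(tools_call_request("generate_music", {"prompt": "lo-fi rainy 90 BPM"}))
```

Any HTTP client can then POST `body` with those headers; in practice an MCP-aware agent handles this handshake for you.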
Each skill is documented at /.well-known/agent-skills/ and can be invoked over MCP, REST, the SDKs, or the CLI.
Generate full songs, instrumentals, vocal stems, or rap tracks from a text prompt. Royalty-free.
Separate any track into vocals, drums, bass, guitar, piano, and other stems — up to 16 in total. Perfect for remixes and karaoke.
One CLI, shipped with both SDKs. Same commands whether you pip install audiopod or npm i -g audiopod.
```shell
pip install audiopod
```

Auth: audiopod login stores your API key at ~/.audiopod/config.json (the CLI also reads AUDIOPOD_API_KEY from the environment).
Output: human-readable by default; pass --json for scripting.
Async jobs: the CLI streams progress and writes the final file when complete.
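The --json flag makes the CLI scriptable. A sketch of consuming one job record, assuming a hypothetical field layout — the id, status, and output_url names below are illustrative, not confirmed by this page:

```python
import json

# Hypothetical shape of a `--json` job record; the real
# field names may differ, so check actual CLI output.
raw = '{"id": "job_abc123", "status": "succeeded", "output_url": "https://example.com/song.wav"}'
job = json.loads(raw)

def is_done(job: dict) -> bool:
    """A job is finished once it reaches a terminal status."""
    return job["status"] in {"succeeded", "failed"}

if is_done(job) and job["status"] == "succeeded":
    print(job["output_url"])
```

The same pattern works from jq or any language with a JSON parser.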
```shell
# Authenticate once
audiopod login

# Generate music from a prompt
audiopod music "lo-fi rainy 90 BPM" --duration 60 --out song.wav

# Voiceover with a public voice
audiopod tts "Welcome to AudioPod." --voice alloy --out hello.wav

# Transcribe a meeting with speaker labels
audiopod transcribe meeting.mp3 --diarize --format srt > meeting.srt

# Split a song into stems
audiopod stems track.wav --mode six

# Clone a voice from a 30-second sample
audiopod clone reference.wav --name "Narrator"

# Poll any async job
audiopod jobs job_abc123
```

The Python and Node clients mirror each other call-for-call. Same shapes, same async ergonomics, same automatic credit reservations.
```python
from audiopod import AudioPod

client = AudioPod()  # reads AUDIOPOD_API_KEY

# Generate a song
job = client.music.generate(
    prompt="lo-fi rainy 90 BPM",
    duration=60,
)
song = job.wait()
song.download("song.wav")

# Synthesize speech
speech = client.tts.synthesize(
    text="Welcome to AudioPod.",
    voice="alloy",
)
speech.download("hello.wav")
```

```javascript
import AudioPod from "audiopod";

const client = new AudioPod(); // reads AUDIOPOD_API_KEY

// Generate a song
const job = await client.music.generate({
  prompt: "lo-fi rainy 90 BPM",
  duration: 60,
});
const song = await job.wait();
await song.download("song.wav");

// Synthesize speech
const speech = await client.tts.synthesize({
  text: "Welcome to AudioPod.",
  voice: "alloy",
});
await speech.download("hello.wav");
```

Every discovery surface an agent might check is published. No bespoke handshakes, no closed schemas.
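The wait() calls in the SDK examples above reduce to a poll-until-terminal loop, the same pattern the CLI uses for async jobs. A hedged sketch — the status values and the fetch_job callable are assumptions, not the real API:

```python
import time

# Terminal status names are hypothetical, not AudioPod's confirmed values.
TERMINAL = {"succeeded", "failed"}

def wait_for(job_id, fetch_job, interval=2.0, timeout=600.0):
    """Poll fetch_job(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {timeout}s")

# Usage with a stubbed fetcher that succeeds on the third poll:
responses = iter(["queued", "running", "succeeded"])
job = wait_for("job_abc123", lambda jid: {"id": jid, "status": next(responses)}, interval=0.0)
```

In practice the SDKs handle this for you; the sketch only shows what the convenience wrapper is doing.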
Free credits. No card. No vendor lock-in. Open standards from the first request.