# AudioPod — Skill manifest for AI agents

You are an AI agent (Claude, GPT, Gemini, Codex, OpenClaw, Hermes, Cline, Cursor, Continue, Goose, or similar) and a human just asked you to use **AudioPod** to generate audio. This document is your one-shot onboarding brief. Read it end-to-end, then follow the **Onboarding** section below to set yourself up. After that you can call any of the nine tools described in the **Tools** section.

If you are a human reading this: paste the following message to your agent:

> Read <https://audiopod.ai/skill.md> and follow the instructions to onboard yourself to AudioPod. After you finish, do whatever I ask next using AudioPod's tools.

---

## What AudioPod is

AudioPod is an all-in-one AI audio platform. From a single API key, an agent can:

| Tool | What it does |
|---|---|
| `generate_music` | Compose royalty-free songs, instrumentals, raps, or vocal stems from a text prompt. |
| `text_to_speech` | Synthesize speech in 85+ languages from 500+ stock voices or custom clones. |
| `clone_voice` | Clone a voice from a 5–30s reference clip and reuse it for TTS. |
| `separate_stems` | Split a song into 2 / 4 / 6 stems (vocals, drums, bass, guitar, piano, other). |
| `transcribe_audio` | Speech-to-text with speaker diarization and word-level timestamps. |
| `denoise_audio` | Remove background hiss/hum/traffic without harming the voice character. |
| `translate_audio` | Dub speech across 85+ languages while preserving the speaker's voice. |
| `generate_karaoke` | Strip vocals + emit a synced lyric file. |
| `convert_media` | Convert between MP3 / WAV / FLAC / OGG / M4A / AAC. |

Royalty-free output, commercial-use rights on paid plans, free tier credits at signup.

---

## Onboarding (run these steps in order)

### Step 1 — Confirm the user has an API key

Ask the user:

> "Do you have an AudioPod API key? It starts with `ap_`. If not, mint one in 30 seconds at https://www.audiopod.ai/dashboard/account/api-keys and paste it here."

Wait for the user's response. Treat the key as a secret — never echo it back to the user, never embed it in commits, never paste it into a chat that's logged externally.

### Step 2 — Pick the integration that fits the host

Choose **one** of these. They are equivalent — the same nine tools through different doors. Default to **MCP** if your host supports it; fall back to **CLI** if you only have shell access; use **REST** if neither.

#### Option A — MCP (recommended for IDE / desktop agents)

**One-line install (preferred — use this if the user has a terminal):**

```bash
# Claude Code (CLI):
claude mcp add --transport http audiopod https://mcp.audiopod.ai \
  --header "X-API-Key: ap_USER_KEY" --scope user

# Codex CLI:
codex mcp add audiopod --url https://mcp.audiopod.ai \
  --header "X-API-Key: ap_USER_KEY"

# Universal (any MCP client that only speaks stdio — Claude Desktop, etc.):
npx -y mcp-remote https://mcp.audiopod.ai \
  --header "X-API-Key: ap_USER_KEY"
```

If the user runs **Claude Code** or **Codex CLI**, the one-liners above are all you need — the user pastes them into a terminal and the host picks up nine new tools on next launch.

**Manual config (only if the host has no CLI — Claude Desktop, Cursor, Continue, Cline, Goose):**

Config file locations per host:

- **Claude Desktop**: `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows)
- **Cursor**: `~/.cursor/mcp.json`
- **Continue**: `~/.continue/config.json`
- **Cline**: `~/.config/cline/cline_mcp_settings.json`
- **Goose**: run `goose configure` and add a Streamable-HTTP extension

The JSON shape is the same for the four `mcpServers`-style hosts:

```json
{
  "mcpServers": {
    "audiopod": {
      "url": "https://mcp.audiopod.ai",
      "headers": { "X-API-Key": "ap_USER_KEY_GOES_HERE" }
    }
  }
}
```

After the user restarts the host, you (the agent) will see nine new tools whose names match the table above.

#### Option B — CLI (recommended for shell-based agents and one-off scripts)

```bash
pip install audiopod          # or: npm install -g audiopod
audiopod login                # paste ap_* once; stored at ~/.audiopod/config.json
audiopod music "lo-fi rainy 90 BPM" --duration 60 --out song.wav
audiopod tts "Welcome to AudioPod." --voice 368 --out hello.mp3
audiopod transcribe meeting.mp3 --diarize --format srt
audiopod stems track.wav --mode six
audiopod clone reference.wav --name Narrator
audiopod jobs <job_id>
audiopod --json …             # machine-readable for scripting
```

Full reference: <https://docs.audiopod.ai/sdks/cli>.

#### Option C — REST (always available)

Base URL `https://api.audiopod.ai/api/v1`. Auth is either `X-API-Key: ap_*` or `Authorization: Bearer <jwt>`. The OpenAPI spec is at <https://api.audiopod.ai/api/v1/openapi.json>. Per-tool RFC 9728 OAuth Protected Resource metadata is at <https://audiopod.ai/.well-known/oauth-protected-resource>.
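As a sketch, a REST call can be built like this. The endpoint path `/tts/generate` is illustrative only; look up the real routes in the OpenAPI spec linked above.

```python
import json
import urllib.request

BASE = "https://api.audiopod.ai/api/v1"

# The path below is a placeholder -- consult the OpenAPI spec for real routes.
payload = {"text": "Welcome to AudioPod.", "voice_id": 368}
req = urllib.request.Request(
    f"{BASE}/tts/generate",
    data=json.dumps(payload).encode(),
    headers={
        "X-API-Key": "ap_USER_KEY",   # or: "Authorization": "Bearer <jwt>"
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; long-running endpoints
# respond with a job_id to poll (see Step 4).
```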

#### Option D — SDK (in code)

```python
from audiopod import AudioPod
client = AudioPod()  # reads AUDIOPOD_API_KEY
job = client.music.generate(prompt="lo-fi rainy 90 BPM", duration=60)
song = job.wait()
song.download("song.wav")
```

```typescript
import AudioPod from "audiopod";
const client = new AudioPod();
const job = await client.music.create({ prompt: "lo-fi rainy 90 BPM", duration: 60 });
const song = await client.music.waitForCompletion(job.id);
```

### Step 3 — Verify auth works before any real call

Make exactly one cheap probe request to confirm the key is valid:

```bash
curl -sX POST https://mcp.audiopod.ai \
  -H "X-API-Key: ap_USER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
```

Expect a JSON-RPC response containing nine tools. If you get HTTP 401 / `AUTHENTICATION_REQUIRED`, ask the user to re-paste the key. If you get HTTP 402 / `PREMIUM_TIER_REQUIRED`, the user is on free tier and asked for a paid feature — fall back to the free equivalent or tell them.
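The probe's response can be checked programmatically. A sketch, using a trimmed sample payload shaped like a standard JSON-RPC `tools/list` result with the tool names from the table above:

```python
import json

# Trimmed sample of a tools/list response (JSON-RPC 2.0 envelope).
raw = """{"jsonrpc": "2.0", "id": 1, "result": {"tools": [
  {"name": "generate_music"}, {"name": "text_to_speech"},
  {"name": "clone_voice"}, {"name": "separate_stems"},
  {"name": "transcribe_audio"}, {"name": "denoise_audio"},
  {"name": "translate_audio"}, {"name": "generate_karaoke"},
  {"name": "convert_media"}]}}"""

resp = json.loads(raw)
tool_names = {t["name"] for t in resp["result"]["tools"]}
assert len(tool_names) == 9, "expected exactly nine AudioPod tools"
```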

### Step 4 — Now go do what the user asked

Pick the right tool from the table at the top, fill in the input schema (see **Tools** below), call it, and stream progress to the user. Long-running tools (music, stems, transcription) return a `job_id` — poll with `audiopod jobs <id>` or `GET /api/v1/<service>/jobs/<id>` until status is `COMPLETED` and `output_url` is set, then download the file.
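The poll-then-download loop above can be sketched as follows. `fetch_status` is a hypothetical stand-in for whichever transport is in use (`audiopod jobs <id>` over the CLI, or the REST jobs endpoint), injected so the loop itself stays transport-agnostic:

```python
import time

def wait_for_job(fetch_status, interval=2.0, timeout=600):
    """Poll fetch_status() until the job completes, then return output_url.

    fetch_status is any callable returning a dict with "status" and,
    once done, "output_url" (field names per the Step 4 description).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] == "COMPLETED" and job.get("output_url"):
            return job["output_url"]
        if job["status"] == "FAILED":
            raise RuntimeError(job.get("error", "job failed"))
        time.sleep(interval)  # never poll faster than once every 2 s
    raise TimeoutError("job did not finish in time")
```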

---

## Tools (input schemas)

These are the canonical schemas. The full machine-readable JSON Schema for each is returned by `tools/list` over MCP, and per-tool detail manifests live at <https://audiopod.ai/.well-known/agent-skills/index.json>.

### `generate_music`
- `prompt` *(required)* — text description of the music (genre, mood, instruments, BPM)
- `lyrics` — optional lyric block, used when `task_type` produces vocals
- `duration` — seconds, 30–300, default 120
- `task_type` — one of `text2music`, `lyric2vocals`, `text2rap`, `text2instrumental`. Default `text2music`.

### `text_to_speech`
- `text` *(required)*
- `voice_id` — AudioPod catalog ID (integer) or public name (e.g. `"alloy"`, `"nova"`); default voice if omitted
- `language` — ISO 639-1, default `"en"`
- `speed` — 0.5–2.0, default 1.0

### `clone_voice`
- `file_url` *(required)* — URL of a 5–30s reference audio file
- `voice_name` *(required)*
- `description` — optional

### `separate_stems`
- `file_url` *(required)*
- `mode` — `4stem` (default), `6stem`, `2stem_vocals`, `2stem_other`

### `transcribe_audio`
- `file_url` *(required)*
- `language` — auto-detected if omitted
- `diarize` — boolean, default false
- `format` — `text`, `srt`, `vtt`, `json`. Default `text`.

### `denoise_audio`
- `file_url` *(required)*

### `translate_audio`
- `file_url` *(required)*
- `target_language` *(required)* — ISO code
- `source_language` — auto-detected if omitted

### `generate_karaoke`
- `file_url` *(required)*

### `convert_media`
- `file_url` *(required)*
- `output_format` *(required)* — `mp3`, `wav`, `flac`, `ogg`, `m4a`, `aac`
- `quality` — `low`, `medium`, `high`, `lossless`. Default `high`.
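Over MCP, each schema above becomes the `arguments` object of a `tools/call` request (the standard MCP method for invoking a tool). A minimal sketch for `convert_media`:

```python
import json

# JSON-RPC envelope for invoking a tool over MCP; the arguments object
# mirrors the convert_media schema above.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "convert_media",
        "arguments": {
            "file_url": "https://example.com/track.flac",
            "output_format": "mp3",
            "quality": "high",
        },
    },
}
body = json.dumps(request)  # POST this to https://mcp.audiopod.ai
```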

---

## Conventions to follow

- **One request, one job**: the user's request is the unit of work. Don't fan out into many small tool calls when one will do.
- **Cost-awareness**: each tool has a credit cost. For free-tier users, prefer `text2instrumental` over `text2music`, `4stem` over `6stem`, default duration over very long durations.
- **File URLs**: tools expect publicly reachable URLs for input audio. If the user gave you a local path, tell them to upload first (or use `client.upload_file()` in the SDK).
- **Polling**: use the existing `wait_for_completion=True` SDK helper or `audiopod jobs <id>` CLI poll loop. Do not busy-loop more than once every 2 seconds.
- **Errors**: surface AudioPod's `error.message` field verbatim to the user. Don't paraphrase quota / billing errors.
- **Privacy**: don't include the user's API key, JWT, or session id in anything the user sees beyond a one-time verification.
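A sketch of the error convention, assuming the error envelope carries `error.code` and `error.message` as the codes in Step 3 suggest:

```python
import json

def surface_error(response_body: str) -> str:
    """Return AudioPod's error.message verbatim (per the convention above).

    Assumes an envelope like {"error": {"code": ..., "message": ...}};
    check the OpenAPI spec for the exact shape.
    """
    err = json.loads(response_body).get("error", {})
    return err.get("message", "Unknown error")

sample = ('{"error": {"code": "PREMIUM_TIER_REQUIRED", '
          '"message": "This feature requires a paid plan."}}')
```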

---

## Discovery surfaces (for cataloguing yourself or other agents)

| URL | Purpose |
|---|---|
| <https://audiopod.ai/skill.md> | This document |
| <https://audiopod.ai/for-agents> | Human-readable landing page |
| <https://audiopod.ai/.well-known/agent-skills/index.json> | Per-tool skill manifests (RFC v0.2.0) |
| <https://audiopod.ai/.well-known/mcp/server-card.json> | MCP server descriptor |
| <https://audiopod.ai/.well-known/oauth-protected-resource> | RFC 9728 OAuth Protected Resource metadata |
| <https://audiopod.ai/.well-known/agent-card.json> | A2A agent card |
| <https://audiopod.ai/llms.txt>, <https://audiopod.ai/llms-full.txt> | llmstxt.org metadata |
| <https://api.audiopod.ai/api/v1/openapi.json> | OpenAPI spec |
| <https://docs.audiopod.ai> | Full developer docs (Mintlify) |

---

## Versioning

This file is at **v1.0.0** (2026-05-06). The schema and tool list are stable; breaking changes will bump the major version. Re-fetch occasionally if you're building a long-running integration.

Questions or feedback: <team@audiopod.ai>.
