Agent Dossier | CLAWHUB | Safety 84/100

Xpersona Agent

Elevenlabs Tts

ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...

4.5K downloads | Trust evidence available
clawhub skill install kn77700wny92h2kvpav2am1yjx80ewfp:elevenlabs-tts

Overall rank

#62

Adoption

4.5K downloads

Trust

Unknown

Freshness

Feb 28, 2026

Last checked Feb 28, 2026

Best For

Elevenlabs Tts is best for general automation workflows where documented compatibility matters.

Not Ideal For

Deterministic execution, because contract metadata is missing or unavailable.

Evidence Sources Checked

editorial-content, CLAWHUB, runtime-metrics, public facts pack

Overview

Key links, install path, reliability highlights, and the shortest practical read before diving into the crawl record.

Verified | editorial-content

Executive Summary

ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7... Capability contract not published. No trust telemetry is available yet. 4.5K downloads reported by the source. Last updated 4/15/2026.

No verified compatibility signals | 4.5K downloads

Trust score

Unknown

Compatibility

Profile only

Freshness

Feb 28, 2026

Vendor

Clawhub

Artifacts

0

Benchmarks

0

Last release

2.2.0

Install & run

Setup Snapshot

clawhub skill install kn77700wny92h2kvpav2am1yjx80ewfp:elevenlabs-tts
  1. Setup complexity is classified as HIGH. You must provision dedicated cloud infrastructure or an isolated VM. Do not run this directly on your local workstation.

  2. Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence & Timeline

Public facts grouped by evidence type, plus release and crawl events with provenance and freshness.

Verified | editorial-content

Public facts

Evidence Ledger

Vendor (1)

Vendor

Clawhub

profile | medium
Observed Apr 15, 2026

Release (1)

Latest release

2.2.0

release | medium
Observed Feb 14, 2026

Adoption (1)

Adoption signal

4.5K downloads

profile | medium
Observed Apr 15, 2026

Security (1)

Handshake status

UNKNOWN

trust | medium
Observed unknown

Artifacts & Docs

Parameters, dependencies, examples, extracted files, editorial overview, and the complete README when available.

Self-declared | CLAWHUB

Captured outputs

Artifacts Archive

Extracted files

3

Examples

6

Snippets

0

Languages

Unknown

Executable Examples

text

[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!

text

[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!

text

[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.

text

[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?

text

[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.

json

{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
Extracted Files

SKILL.md

---
name: elevenlabs-tts
description: ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp, ElevenLabs multilingual support. Generate realistic AI voices using ElevenLabs API.
tags: [elevenlabs, tts, voice, text-to-speech, audio, speech, whatsapp, multilingual, ai-voice]
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"env":["ELEVENLABS_API_KEY"],"system":["ffmpeg"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
allowed-tools: [exec, tts, message]
---

# ElevenLabs TTS (Text-to-Speech)

Generate expressive voice messages using ElevenLabs v3 with audio tags.

## Prerequisites

- **ElevenLabs API Key** (`ELEVENLABS_API_KEY`): Required. Get one at [elevenlabs.io](https://elevenlabs.io) → Profile → API Keys. Configure in `openclaw.json` under `messages.tts.elevenlabs.apiKey`.
- **ffmpeg**: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on PATH.

## Quick Start Examples

**Storytelling (emotional journey):**
```
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
```

**Horror/Suspense (building dread):**
```
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
```

**Conversation with reactions:**
```
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
```

**Hebrew (romantic moment):**
```
[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?
```

**Spanish (celebration to reflection):**
```
[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
```

## Configuration (OpenClaw)

In `openclaw.json`, configure TTS under `messages.tts`:

```json
{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
```

**Getting your API Key:**
1. Go to https://elevenlabs.io
2. Sign up/login
3. Click profile → API Keys
4. Copy your key

## Recommended Voices for v3

These premade voices are optimized for v3 and wo

_meta.json

{
  "ownerId": "kn77700wny92h2kvpav2am1yjx80ewfp",
  "slug": "elevenlabs-tts",
  "version": "2.2.0",
  "publishedAt": 1771087774137
}

references/audio-tags.md

# Audio Tags Reference

Complete guide to ElevenLabs v3 audio tags.

## Prerequisites

- **Model**: `eleven_v3` (alpha) - ONLY this model supports audio tags
- **Voice Type**: IVC (Instant Voice Clone) or designed voices - PVC not optimized yet
- **Prompt Length**: 250+ characters for consistent results
- **Stability**: Creative or Natural mode (Robust reduces tag responsiveness)

## Core Principle

Write NATURAL sentences that tags modify, NOT explanations.

❌ WRONG: `[excited] אני מתרגש!`
✅ RIGHT: `[excited] זה ממש מדהים מה שעשינו היום!`

---

## Tag Categories

### Emotions (High Reliability)

| Tag | Description |
|-----|-------------|
| `[excited]` | Energy, enthusiasm |
| `[happy]` | Joy, cheerfulness |
| `[happily]` | Speaking with happiness |
| `[sad]` | Sadness, melancholy |
| `[sorrowful]` | Deep sadness |
| `[angry]` | Anger, intensity |
| `[curious]` | Curiosity, interest |
| `[nervous]` | Nervousness, anxiety |
| `[sarcastic]` | Sarcasm, irony |
| `[tired]` | Fatigue, weariness |
| `[serious]` | Seriousness |
| `[confident]` | Confidence |
| `[frustrated]` | Frustration |
| `[mischievous]` | Playful mischief |
| `[awe]` | Wonder, amazement |
| `[resigned]` | Acceptance, giving up |
| `[flustered]` | Confused embarrassment |
| `[casual]` | Relaxed, informal |
| `[annoyed]` | Irritation |

### Delivery & Volume (High Reliability)

| Tag | Description |
|-----|-------------|
| `[whispers]` | Quiet, intimate |
| `[shouts]` | Loud, intense |
| `[dramatic tone]` | Theatrical |
| `[dramatic]` | Dramatic delivery |
| `[matter-of-fact]` | Plain, factual |
| `[whiny]` | Complaining tone |
| `[flatly]` | No emotion |
| `[quietly]` | Soft voice |
| `[suspiciously]` | Suspicious tone |

### Pacing & Timing (High Reliability)

| Tag | Description |
|-----|-------------|
| `[pause]` | Brief silence |
| `[breathes]` | Breathing sound |
| `[continues after a beat]` | Pause then continue |
| `[rushed]` | Fast, urgent |
| `[slows down]` | Decreasing speed |
| `[deliberate]` | Careful, intentional |
| `[rapid-fire]` | Very fast |
| `[drawn out]` | Stretched, slow |
| `[stammers]` | Stuttering |
| `[hesitates]` | Uncertainty |
| `[timidly]` | Shy, tentative |
| `[repeats]` | Repetition |

### Emphasis (Medium Reliability)

| Tag | Description |
|-----|-------------|
| `[emphasized]` | Strong emphasis |
| `[stress on next word]` | Emphasize following word |
| `[understated]` | Downplayed delivery |

### Reactions & Sounds (Very High Reliability)

| Tag | Description |
|-----|-------------|
| `[laughs]` | Laughter |
| `[laughs softly]` | Gentle laugh |
| `[laughs harder]` | Increasing laughter |
| `[starts laughing]` | Beginning to laugh |
| `[nervous laugh]` | Anxious laughter |
| `[giggles]` | Small laugh |
| `[wheezing]` | Breathless laugh |
| `[sighs]` | Exhale of emotion |
| `[sigh]` | Single sigh |
| `[gasps]` | Sharp intake |
| `[exhales]` | Breathing out |
| `[clears throat]` | Throat clearing |
| `[gulps]` | Swallowing |
| `[swallows]` | Swallowing sound |

Editorial read

Docs & README

Docs source

CLAWHUB

Editorial quality

ready

Full README

Skill: Elevenlabs Tts

Owner: Shaharsha

Summary: ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...

Tags: ai-voice:2.1.0, audio:2.1.0, elevenlabs:2.1.0, elevenlabs-tts:1.3.2, hebrew:2.1.0, latest:2.2.0, multilingual:2.1.0, nikud:2.1.0, openclaw:1.3.2, podcast:1.2.1, singing:2.1.0, speech:2.1.0, text-to-speech:2.1.0, tts:2.1.0, voice:2.1.0, whatsapp:2.1.0

Version history:

v2.2.0 | 2026-02-14T16:49:34.137Z | user

Security scan fixes

v2.1.0 | 2026-02-09T13:46:48.695Z | user

Comprehensive Hebrew nikud guide: dagesh (B/V, K/Kh, P/F), gender suffixes, homographs, stress placement, foreign names. Clear principle: only nikud where ambiguity exists.

v2.0.0 | 2026-02-09T13:35:12.104Z | user

Major polish: improved description for discoverability, added allowed-tools declaration, fixed stability value in troubleshooting (0.0 not 0.5), selective nikud in Hebrew example, security-clean SKILL.md with lib/audio_convert.py wrapper.

v1.6.0 | 2026-02-09T13:33:27.653Z | user

Security: moved all ffmpeg shell commands into lib/audio_convert.py wrapper script. SKILL.md no longer contains raw bash commands. Added convert and concat CLI utilities.

v1.5.0 | 2026-02-09T13:27:37.042Z | user

Fixed stability values (v3 only accepts 0.0/0.5/1.0). Added singing guide with correct format ([singing] on own line). Updated audio-tags reference with singing tips and limitations.

v1.4.0 | 2026-02-09T12:52:07.278Z | user

Added Hebrew nikud (vowel points) support for accurate pronunciation. Updated Hebrew example with full nikud. Changed config example to use placeholder voiceId.

v1.3.2 | 2026-02-04T12:14:22.141Z | auto

  • Updated the skill description in SKILL.md for improved clarity and focus on key ElevenLabs/OpenClaw features.
  • Revised keywords in tags for better search relevance.
  • No changes to code or functionality—documentation change only.

v1.3.1 | 2026-02-04T12:13:02.242Z | auto

Version 1.3.1

  • No file changes detected in this release.
  • Documentation and usage instructions remain the same.
  • No new features, bug fixes, or updates in this version.

v1.2.9 | 2026-02-04T12:12:26.571Z | auto

elevenlabs-tts 1.2.9

  • Updated SKILL.md with improved and concise description, adding tags for discoverability.
  • Enhanced feature summary to emphasize WhatsApp, multilingual support, and OpenClaw integration.
  • No code changes; documentation only.

v1.2.8 | 2026-02-03T22:09:57.449Z | user

Added: WhatsApp transcribe button only works with Opus format

v1.2.7 | 2026-02-03T22:07:12.207Z | user

Complete WhatsApp workflow: generate→convert to Opus→send. MP3 fails on Android, Opus works everywhere.

v1.2.6 | 2026-02-03T22:05:53.551Z | user

Added WhatsApp sending instructions (message tool with asVoice), audio cutoff fix (add pause at end)

v1.2.5 | 2026-02-03T22:02:39.924Z | user

Improved Quick Start examples with more audio tags demonstrating emotional transitions

v1.3.0 | 2026-02-03T21:59:24.517Z | user

Major update: Added comprehensive best practices for natural-sounding audio tags - how many to use, where to place them, context tips, regeneration strategies, punctuation effects, and updated examples with emotional progressions

v1.2.4 | 2026-02-03T21:56:25.048Z | user

Updated examples to show multiple audio tags per message

v1.2.3 | 2026-02-03T18:08:09.266Z | user

Fixed display name

v1.2.2 | 2026-02-03T18:07:40.253Z | user

Added (Text-to-Speech) to title for clarity

v1.2.1 | 2026-02-03T18:03:39.803Z | user

Added TTS explanation (Text-to-Speech) in description for clarity

v1.2.0 | 2026-02-03T17:54:31.286Z | user

Added: 3 language examples (EN/HE/ES), OpenClaw config guide, 5 recommended voice IDs with table, voice selection tips, how to get API key

v1.1.0 | 2026-02-03T17:50:29.026Z | user

Major update: Added multi-speaker dialogue, 50+ new tags, stability modes guide, speed control, punctuation effects, fixed API limit (10K not 5K), 70+ languages support

v1.0.1 | 2026-02-03T17:49:17.446Z | user

Added references/audio-tags.md

v1.0.0 | 2026-02-03T17:46:32.654Z | auto

  • Initial release of ElevenLabs-TTS with integrated audio tag support for expressive voice synthesis.
  • Supports creation of voice messages, podcasts, audiobooks, and other spoken content with emotional expression.
  • Handles WhatsApp voice compatibility, including guidance on Opus conversion.
  • Provides instructions for segmenting and concatenating long-form audio.
  • Includes quick reference for critical audio tags and troubleshooting tips.

Archive index:

Archive v2.2.0: 3 files, 8483 bytes

Files: references/audio-tags.md (6750b), SKILL.md (10531b), _meta.json (133b)

File v2.2.0:SKILL.md


---
name: elevenlabs-tts
description: ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp, ElevenLabs multilingual support. Generate realistic AI voices using ElevenLabs API.
tags: [elevenlabs, tts, voice, text-to-speech, audio, speech, whatsapp, multilingual, ai-voice]
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"env":["ELEVENLABS_API_KEY"],"system":["ffmpeg"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
allowed-tools: [exec, tts, message]
---

ElevenLabs TTS (Text-to-Speech)

Generate expressive voice messages using ElevenLabs v3 with audio tags.

Prerequisites

  • ElevenLabs API Key (ELEVENLABS_API_KEY): Required. Get one at elevenlabs.io → Profile → API Keys. Configure in openclaw.json under messages.tts.elevenlabs.apiKey.
  • ffmpeg: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on PATH.
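
Both prerequisites can be checked before the first TTS call. The following is a minimal sketch, not part of the skill itself: the function name `check_prerequisites` and its injectable `env` and `which` parameters are assumptions made for illustration and testability.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of problems; an empty list means both prerequisites are met."""
    env = os.environ if env is None else env
    problems = []
    # The skill metadata lists ELEVENLABS_API_KEY as a required environment variable.
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    # ffmpeg must be discoverable on PATH for the MP3 -> Opus conversion.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

Note that this only inspects the environment variable; a key configured solely in `openclaw.json` would not be detected by this check.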

Quick Start Examples

Storytelling (emotional journey):

[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!

Horror/Suspense (building dread):

[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!

Conversation with reactions:

[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.

Hebrew (romantic moment):

[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?

Spanish (celebration to reflection):

[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.

Configuration (OpenClaw)

In openclaw.json, configure TTS under messages.tts:

{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
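
To catch configuration mistakes early, the block above can be parsed and sanity-checked. This is a hedged sketch: the helper name `read_tts_settings` and the choice of which keys to treat as required (`apiKey`, `voiceId`, `modelId`) are assumptions, not rules stated by OpenClaw.

```python
import json

# Assumption: these three keys are treated as mandatory for illustration only.
REQUIRED_KEYS = ("apiKey", "voiceId", "modelId")


def read_tts_settings(config_text):
    """Parse openclaw.json text and return the messages.tts.elevenlabs block."""
    config = json.loads(config_text)
    tts = config["messages"]["tts"]
    if tts.get("provider") != "elevenlabs":
        raise ValueError("messages.tts.provider is not 'elevenlabs'")
    settings = tts["elevenlabs"]
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    return settings
```

Calling `read_tts_settings(open("openclaw.json").read())` would return the nested settings dictionary or raise a `ValueError` that names the problem.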

Getting your API Key:

  1. Go to https://elevenlabs.io
  2. Sign up/login
  3. Click profile → API Keys
  4. Copy your key

Recommended Voices for v3

These premade voices are optimized for v3 and work well with audio tags:

| Voice | ID | Gender | Accent | Best For |
|-------|-----|--------|--------|----------|
| Adam | pNInz6obpgDQGcFmaJgB | Male | American | Deep narration, general use |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Female | American | Calm narration, conversational |
| Brian | nPczCjzI2devNBz1zQrb | Male | American | Deep narration, podcasts |
| Charlotte | XB0fDUnXU5powFXDhCwa | Female | English-Swedish | Expressive, video games |
| George | JBFqnCBsd6RMkjVDRZzb | Male | British | Raspy narration, storytelling |

Finding more voices:

  • Browse: https://elevenlabs.io/voice-library
  • v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH
  • API: GET https://api.elevenlabs.io/v1/voices
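
The API route listed above can be queried with the standard library. This sketch assumes the endpoint authenticates through an `xi-api-key` header and returns a JSON object with a `voices` array whose items carry `voice_id` and `name`; verify both against the current ElevenLabs API reference. The helper names are hypothetical.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key):
    """Build the GET request; the xi-api-key header name is an assumption."""
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def list_voices(api_key):
    """Return (voice_id, name) pairs, assuming the response shape described above."""
    with urllib.request.urlopen(build_voices_request(api_key), timeout=30) as resp:
        payload = json.load(resp)
    return [(voice["voice_id"], voice["name"]) for voice in payload.get("voices", [])]
```

Keeping request construction separate from the network call makes the header and URL easy to inspect without spending API quota.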

Voice selection tips:

  • Use IVC (Instant Voice Clone) or premade voices - PVC not optimized for v3 yet
  • Match voice character to your use case (whispering voice won't shout well)
  • For expressive IVCs, include varied emotional tones in training samples

Model Settings

  • Model: eleven_v3 (alpha) - ONLY model supporting audio tags
  • Languages: 70+ supported with full audio tag control

Stability Modes

| Mode | Stability | Description |
|------|-----------|-------------|
| Creative | 0.3-0.5 | More emotional/expressive, may hallucinate |
| Natural | 0.5-0.7 | Balanced, closest to original voice |
| Robust | 0.7-1.0 | Highly stable, less responsive to tags |

For audio tags, use Creative (0.5) or Natural. Higher stability reduces tag responsiveness.

Speed Control

Range: 0.7 (slow) to 1.2 (fast), default 1.0

Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].

Critical Rules

Length Limits

  • Optimal: <800 characters per segment (best quality)
  • Maximum: 10,000 characters (API hard limit)
  • Quality degrades with longer text - voice becomes inconsistent
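
One way to stay under the optimal length is to pack whole sentences into segments. The sketch below is an illustration under assumptions: the function name `split_into_segments` is hypothetical, sentence boundaries are approximated by whitespace after `.`, `!`, or `?`, and a single sentence longer than the limit is kept whole rather than cut mid-sentence.

```python
import re

OPTIMAL_SEGMENT_CHARS = 800  # the "<800 characters per segment" guideline


def split_into_segments(text, limit=OPTIMAL_SEGMENT_CHARS):
    """Greedily pack whole sentences into segments shorter than `limit`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        # Accept the sentence if it fits, or if the segment is still empty
        # (an oversized single sentence is kept whole).
        if len(candidate) < limit or not current:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Because audio tags persist until the next tag, a tag that applied at the end of one segment may need to be repeated at the start of the following one; this sketch does not do that automatically.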

Audio Tags - Best Practices for Natural Sound

How many tags to use:

  • 1-2 tags per sentence or phrase (not more!)
  • Tags persist until the next tag - no need to repeat
  • Overusing tags sounds unnatural and robotic
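
The 1-2 tags guideline can be checked mechanically before generating audio. A minimal sketch, assuming tags are any bracketed text such as `[excited]` and that sentences end at `.`, `!`, or `?` followed by whitespace; the name `overtagged_sentences` is hypothetical.

```python
import re

# Assumption: anything in square brackets counts as an audio tag.
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")


def overtagged_sentences(text, max_tags=2):
    """Return the sentences that carry more than `max_tags` audio tags."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(TAG_PATTERN.findall(s)) > max_tags]
```

An empty result means every sentence stays within the limit; any returned sentence is a candidate for trimming tags.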

Where to place tags:

  • At emotional transition points
  • Before key dramatic moments
  • When energy/pace changes

Context matters:

  • Write text that matches the tag emotion
  • Longer text with context = better interpretation
  • Example: `[nervous] I... I'm not sure about this. What if it doesn't work?` works better than `[nervous] Hello.`

Combine tags for nuance:

  • [nervously][whispers] = nervous whispering
  • [excited][laughs] = excited laughter
  • Keep combinations to 2 tags max

Regenerate for best results:

  • v3 is non-deterministic - same text = different outputs
  • Generate 3+ versions, pick the best
  • Small text tweaks can improve results

Match tag to voice:

  • Don't use [shouts] on a whispering voice
  • Don't use [whispers] on a loud/energetic voice
  • Test tags with your chosen voice

SSML Not Supported

v3 does NOT support SSML break tags. Use audio tags and punctuation instead.

Punctuation Effects (use with tags!)

Punctuation enhances audio tags:

  • Ellipses (...) → dramatic pauses: [nervous] I... I don't know...
  • CAPS → emphasis: [excited] That's AMAZING!
  • Dashes (—) → interruptions: [explaining] So what you do is— [interrupting] Wait!
  • Question marks → uncertainty: [nervous] Are you sure about this?
  • Exclamation! → energy boost: [happy] We did it!

Combine tags + punctuation for maximum effect:

[tired] It was a long day... [sighs] Nobody listens anymore.

WhatsApp Voice Messages

Complete Workflow

  1. Generate with tts tool (returns MP3)
  2. Convert to Opus (required for Android!)
  3. Send with message tool

Step-by-Step

1. Generate TTS (add [pause] at end to prevent cutoff):

tts text="[excited] This is amazing! [pause]" channel=whatsapp

Returns: MEDIA:/tmp/tts-xxx/voice-123.mp3

2. Convert MP3 → Opus:

ffmpeg -i /tmp/tts-xxx/voice-123.mp3 -c:a libopus -b:a 64k -vbr on -application voip /tmp/tts-xxx/voice-123.ogg

3. Send the Opus file:

Note: The message field below contains a Unicode Left-to-Right Mark (U+200E) between the quotes. This is intentional — WhatsApp requires a non-empty message body to send voice notes. The LTR mark is invisible but satisfies this requirement without displaying any text.

message action=send channel=whatsapp target="+972..." filePath="/tmp/tts-xxx/voice-123.ogg" asVoice=true message="‎"
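
The conversion in step 2 can be wrapped in a small Python helper. This is a sketch under assumptions, not the skill's own wrapper: the function names are hypothetical, and the ffmpeg arguments are copied from the command shown in step 2.

```python
import os
import subprocess


def build_opus_command(mp3_path, ogg_path=None):
    """Build the ffmpeg argument list for MP3 -> Opus, mirroring step 2."""
    if ogg_path is None:
        # Default output: same path with the extension swapped to .ogg.
        ogg_path = os.path.splitext(mp3_path)[0] + ".ogg"
    return [
        "ffmpeg", "-i", mp3_path,
        "-c:a", "libopus", "-b:a", "64k",
        "-vbr", "on", "-application", "voip",
        ogg_path,
    ]


def convert_to_opus(mp3_path, ogg_path=None):
    """Run ffmpeg and return the output path; raises CalledProcessError on failure."""
    command = build_opus_command(mp3_path, ogg_path)
    subprocess.run(command, check=True)
    return command[-1]
```

Passing the arguments as a list avoids shell interpolation of file paths, which matters when paths come from tool output.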

Why Opus?

| Format | iOS | Android | Transcribe |
|--------|-----|---------|------------|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |

Always convert to Opus - it's the only format that:

  • Works on all devices (iOS + Android)
  • Supports WhatsApp's transcribe button

Audio Cutoff Fix

ElevenLabs sometimes cuts off the last word. Always add [pause] or ... at the end:

[excited] This is amazing! [pause]
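
The cutoff fix is easy to apply automatically. A small sketch with a hypothetical helper name, `ensure_trailing_pause`, that appends `[pause]` only when the text does not already end with `[pause]` or an ellipsis:

```python
def ensure_trailing_pause(text):
    """Append [pause] unless the text already ends with [pause] or '...'."""
    stripped = text.rstrip()
    if stripped.endswith("[pause]") or stripped.endswith("..."):
        return stripped
    return stripped + " [pause]"
```

The function is idempotent, so it is safe to call on text that has already been processed.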

Long-Form Audio (Podcasts)

For content >800 chars:

  1. Split into short segments (<800 chars each)
  2. Generate each with tts tool
  3. Concatenate with ffmpeg:
    cat > list.txt << EOF
    file '/path/file1.mp3'
    file '/path/file2.mp3'
    EOF
    ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3
    
  4. Convert to Opus for WhatsApp
  5. Send as single voice message

Important: Don't mention "part 2" or "chapter" - keep it seamless.
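
The concatenation step can be scripted instead of writing `list.txt` by hand. This sketch assumes ffmpeg's concat demuxer list syntax (`file '<path>'`, with an embedded single quote written as `'\''`); the helper names are hypothetical, and the arguments mirror the command in step 3.

```python
def build_concat_list(paths):
    """Return the contents of a concat list file, one `file '...'` line per path."""
    lines = []
    for path in paths:
        # Assumption: single quotes inside a path are escaped as '\''.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_concat_command(list_path, output_path):
    """Build the ffmpeg argument list that joins the listed files without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output_path]
```

Write the string from `build_concat_list` to a file, then pass that file's path to `build_concat_command` and run the result with `subprocess.run(..., check=True)`.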

Multi-Speaker Dialogue

v3 can handle multiple characters in one generation:

Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!

Dialogue tags: [interrupting], [overlapping], [cuts in], [interjecting]
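
A dialogue script in this `Speaker: [tag] line` layout can be assembled from structured turns. A minimal sketch with a hypothetical helper, `format_dialogue`, that takes `(speaker, tag, line)` tuples; an empty tag is simply omitted.

```python
def format_dialogue(turns):
    """Render (speaker, tag, line) tuples as 'Speaker: [tag] line' rows."""
    rows = []
    for speaker, tag, line in turns:
        prefix = f"[{tag}] " if tag else ""
        rows.append(f"{speaker}: {prefix}{line}")
    return "\n".join(rows)


script = format_dialogue([
    ("Jessica", "whispers", "Did you hear that?"),
    ("Chris", "interrupting", "—I heard it too!"),
    ("Jessica", "panicking", "We need to hide!"),
])
```

The resulting `script` string reproduces the three-line example above and can be passed as the text of a single generation.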

Audio Tags Quick Reference

| Category | Tags | When to Use |
|----------|------|-------------|
| Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| Character | [French accent], [British accent], [robotic tone] | Character voice shifts |
| Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |

Most effective tags (reliable results):

  • Emotions: [excited], [nervous], [sad], [happy]
  • Reactions: [laughs], [sighs], [whispers]
  • Pacing: [pause]

Less reliable (test and regenerate):

  • Sound effects: [explosion], [gunshot]
  • Accents: results vary by voice

Full tag list: See references/audio-tags.md

Troubleshooting

Tags read aloud?

  • Verify using eleven_v3 model
  • Use IVC/premade voices, not PVC
  • Simplify tags (no "tone" suffix)
  • Increase text length (250+ chars)

Voice inconsistent?

  • Segment is too long - split at <800 chars
  • Regenerate (v3 is non-deterministic)
  • Try lower stability setting

WhatsApp won't play?

  • Convert to Opus format (see above)

No emotion despite tags?

  • Voice may not match tag style
  • Try Creative stability mode (0.5)
  • Add more context around the tag

File v2.2.0:_meta.json

{ "ownerId": "kn77700wny92h2kvpav2am1yjx80ewfp", "slug": "elevenlabs-tts", "version": "2.2.0", "publishedAt": 1771087774137 }
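
The `publishedAt` value is a Unix timestamp in milliseconds. The sketch below converts it to an ISO 8601 string in UTC using integer arithmetic to avoid floating-point rounding; the function name `published_at_to_iso` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def published_at_to_iso(published_at_ms):
    """Convert epoch milliseconds to an ISO 8601 UTC string with millisecond precision."""
    seconds, millis = divmod(int(published_at_ms), 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)
    return moment.isoformat(timespec="milliseconds")


print(published_at_to_iso(1771087774137))  # 2026-02-14T16:49:34.137+00:00
```

The result matches the v2.2.0 entry in the version history (2026-02-14T16:49:34.137Z).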

File v2.2.0:references/audio-tags.md

Audio Tags Reference

Complete guide to ElevenLabs v3 audio tags.

Prerequisites

  • Model: eleven_v3 (alpha) - ONLY this model supports audio tags
  • Voice Type: IVC (Instant Voice Clone) or designed voices - PVC not optimized yet
  • Prompt Length: 250+ characters for consistent results
  • Stability: Creative or Natural mode (Robust reduces tag responsiveness)

Core Principle

Write NATURAL sentences that tags modify, NOT explanations.

❌ WRONG: `[excited] אני מתרגש!`
✅ RIGHT: `[excited] זה ממש מדהים מה שעשינו היום!`


Tag Categories

Emotions (High Reliability)

| Tag | Description |
|-----|-------------|
| [excited] | Energy, enthusiasm |
| [happy] | Joy, cheerfulness |
| [happily] | Speaking with happiness |
| [sad] | Sadness, melancholy |
| [sorrowful] | Deep sadness |
| [angry] | Anger, intensity |
| [curious] | Curiosity, interest |
| [nervous] | Nervousness, anxiety |
| [sarcastic] | Sarcasm, irony |
| [tired] | Fatigue, weariness |
| [serious] | Seriousness |
| [confident] | Confidence |
| [frustrated] | Frustration |
| [mischievous] | Playful mischief |
| [awe] | Wonder, amazement |
| [resigned] | Acceptance, giving up |
| [flustered] | Confused embarrassment |
| [casual] | Relaxed, informal |
| [annoyed] | Irritation |

Delivery & Volume (High Reliability)

| Tag | Description |
|-----|-------------|
| [whispers] | Quiet, intimate |
| [shouts] | Loud, intense |
| [dramatic tone] | Theatrical |
| [dramatic] | Dramatic delivery |
| [matter-of-fact] | Plain, factual |
| [whiny] | Complaining tone |
| [flatly] | No emotion |
| [quietly] | Soft voice |
| [suspiciously] | Suspicious tone |

Pacing & Timing (High Reliability)

| Tag | Description |
|-----|-------------|
| [pause] | Brief silence |
| [breathes] | Breathing sound |
| [continues after a beat] | Pause then continue |
| [rushed] | Fast, urgent |
| [slows down] | Decreasing speed |
| [deliberate] | Careful, intentional |
| [rapid-fire] | Very fast |
| [drawn out] | Stretched, slow |
| [stammers] | Stuttering |
| [hesitates] | Uncertainty |
| [timidly] | Shy, tentative |
| [repeats] | Repetition |

Emphasis (Medium Reliability)

| Tag | Description |
|-----|-------------|
| [emphasized] | Strong emphasis |
| [stress on next word] | Emphasize following word |
| [understated] | Downplayed delivery |

Reactions & Sounds (Very High Reliability)

| Tag | Description |
|-----|-------------|
| [laughs] | Laughter |
| [laughs softly] | Gentle laugh |
| [laughs harder] | Increasing laughter |
| [starts laughing] | Beginning to laugh |
| [nervous laugh] | Anxious laughter |
| [giggles] | Small laugh |
| [wheezing] | Breathless laugh |
| [sighs] | Exhale of emotion |
| [sigh] | Single sigh |
| [gasps] | Sharp intake |
| [exhales] | Breathing out |
| [clears throat] | Throat clearing |
| [gulps] | Swallowing |
| [swallows] | Swallowing sound |
| [snorts] | Snorting sound |
| [crying] | Sobbing |

Character & Accents (Medium Reliability)

| Tag | Description |
|-----|-------------|
| [French accent] | French accent |
| [American accent] | American accent |
| [British accent] | British accent |
| [Australian accent] | Australian accent |
| [Southern US accent] | Southern American |
| [strong X accent] | Replace X with accent |
| [pirate voice] | Pirate character |
| [evil scientist voice] | Mad scientist |
| [childlike tone] | Child-like voice |
| [robotic tone] | Robot voice |
| [deep voice] | Lower pitch |

Narrative & Genre (Medium Reliability)

| Tag | Description |
|-----|-------------|
| [storytelling tone] | Narrator voice |
| [voice-over style] | Documentary style |
| [fantasy narrator] | Epic fantasy |
| [sci-fi AI voice] | Futuristic AI |
| [classic film noir] | 1940s detective |
| [epic build-up] | Building intensity |
| [narrative flourish] | Dramatic narration |

Multi-Speaker Dialogue

| Tag | Description |
|-----|-------------|
| [interrupting] | Cutting off speaker |
| [overlapping] | Speaking over |
| [cuts in] | Interjecting |
| [interjecting] | Jumping in |
| [fast-paced] | Quick exchange |

Sound Effects (Low-Medium Reliability)

| Tag | Description |
|-----|-------------|
| [gunshot] | Gun sound |
| [clapping] | Applause |
| [applause] | Audience clapping |
| [explosion] | Blast sound |
| [thunder] | Thunder |

Experimental (Test First)

| Tag | Description |
|-----|-------------|
| [sings] | Singing |
| [woo] | Exclamation |
| [fart] | Sound effect |
| [panicked] | Panic |
| [trembling] | Shaking voice |


Usage Guidelines

✅ DO:

  • Use simple tags: [excited] not [excited tone]
  • Write natural sentences that work without tags
  • Use 2-4 tags per paragraph max
  • Place tags at sentence start or key moment
  • Match tags to voice character
  • Test and regenerate (v3 is non-deterministic)
  • Combine tags: [whispering][pause] Did you hear that?

❌ DON'T:

  • Don't add "tone" suffix: [serious tone]
  • Don't overload with tags
  • Don't explain what the tag does
  • Don't use incompatible combos (whisper voice + shout tag)
  • Don't expect consistency (regenerate if needed)

Examples

Emotional Monologue

[sighs] I've been thinking about what you said. [pause] 
And you're right. [sadly] I should have listened earlier.
[determined] But I'm going to fix this. Starting now.

Multi-Character Dialogue

Sarah: [whispers] I think someone's coming.
Mike: [interrupting] —I heard it too! [panicked] Hide!
Sarah: [annoyed] I was TRYING to tell you that!

Comedic Timing

[confident] So I walked up to the boss and said... 
[pause] [nervous laugh] Actually, I didn't say anything. 
[sighs] I just stood there. [laughs] Classic me.

Accent Performance

[British accent] Terribly sorry, but I must insist.
[switches to Southern US accent] Well now, that's mighty kind of y'all.
[French accent] Mon ami, you simply must try ze croissant!

Troubleshooting

Tags being read aloud?

  • Check you're using eleven_v3 (not a Turbo/Flash v2.5 or Multilingual v2 model)
  • Use IVC/designed voices, not PVC
  • Simplify tags (remove "tone", "sound", etc.)
  • Increase prompt length (250+ chars)

Tags not working?

  • Generate multiple times (v3 is variable)
  • Use Creative or Natural stability (not Robust)
  • Add surrounding context text
  • Try different tag placement
  • Voice may not match tag style

Multi-speaker not distinct?

  • Add character cues: [deep voice], [higher pitch]
  • Use accent tags for differentiation
  • Add emotional contrast between speakers

Archive v2.1.0: 4 files, 10870 bytes

Files: lib/audio_convert.py (4003b), references/audio-tags.md (7739b), SKILL.md (11540b), _meta.json (133b)

File v2.1.0:SKILL.md


name: elevenlabs-tts
description: ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 70+ languages, Hebrew with selective nikud, multi-speaker dialogue, and singing. Includes audio converter utility.
tags: [elevenlabs, tts, voice, text-to-speech, audio, speech, whatsapp, multilingual, ai-voice, hebrew, nikud, singing]
allowed-tools: [tts, message, exec]

ElevenLabs TTS (Text-to-Speech)

Generate expressive voice messages using ElevenLabs v3 with audio tags.

Quick Start Examples

Storytelling (emotional journey):

[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!

Horror/Suspense (building dread):

[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!

Conversation with reactions:

[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.

Hebrew (romantic moment - selective nikud only where needed):

[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] אַתְּ יודעת שאני אוהב אותָךְ, נכון?

Spanish (celebration to reflection):

[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.

Configuration (OpenClaw)

In openclaw.json, configure TTS under messages.tts:

{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "YOUR_VOICE_ID",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
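
Before generating audio, it can help to sanity-check the configuration. The following is a minimal sketch, not part of the skill: `check_tts_config` is a hypothetical helper, and it assumes the file has the `messages.tts.elevenlabs` shape shown above and that stability should be one of the three values this document lists for v3.

```python
def check_tts_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in a parsed openclaw.json."""
    problems = []
    eleven = config.get("messages", {}).get("tts", {}).get("elevenlabs", {})

    # Audio tags are only documented here for the eleven_v3 model.
    if eleven.get("modelId") != "eleven_v3":
        problems.append("modelId should be eleven_v3 for audio tags")

    # Catch a missing voice or an unedited placeholder from the template.
    if not eleven.get("voiceId") or eleven.get("voiceId") == "YOUR_VOICE_ID":
        problems.append("voiceId is missing or still a placeholder")

    # This document states that v3 accepts only 0.0, 0.5, or 1.0.
    stability = eleven.get("voiceSettings", {}).get("stability")
    if stability not in (0.0, 0.5, 1.0):
        problems.append("stability should be 0.0, 0.5, or 1.0 for v3")

    return problems
```

An empty list means none of these three checks failed; it does not prove the API key or voice ID is valid.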

Getting your API Key:

  1. Go to https://elevenlabs.io
  2. Sign up/login
  3. Click profile → API Keys
  4. Copy your key

Recommended Voices for v3

These premade voices are optimized for v3 and work well with audio tags:

| Voice | ID | Gender | Accent | Best For |
|-------|-----|--------|--------|----------|
| Adam | pNInz6obpgDQGcFmaJgB | Male | American | Deep narration, general use |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Female | American | Calm narration, conversational |
| Brian | nPczCjzI2devNBz1zQrb | Male | American | Deep narration, podcasts |
| Charlotte | XB0fDUnXU5powFXDhCwa | Female | English-Swedish | Expressive, video games |
| George | JBFqnCBsd6RMkjVDRZzb | Male | British | Raspy narration, storytelling |

Finding more voices:

  • Browse: https://elevenlabs.io/voice-library
  • v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH
  • API: GET https://api.elevenlabs.io/v1/voices
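
The API route above can be queried with the standard library alone. This is a rough sketch under two assumptions that should be checked against the official ElevenLabs API reference: that the key is sent in an `xi-api-key` header, and that the response is a JSON object with a `voices` list whose entries carry `name` and `voice_id`. The function names are illustrative.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build the GET request; the xi-api-key header name is an assumption."""
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def voice_names(payload: dict) -> dict[str, str]:
    """Map voice name -> voice ID from an (assumed) {"voices": [...]} payload."""
    return {v["name"]: v["voice_id"] for v in payload.get("voices", [])}


def list_voices(api_key: str) -> dict[str, str]:
    """Perform the network call and return the name -> ID mapping."""
    with urllib.request.urlopen(build_voices_request(api_key)) as resp:
        return voice_names(json.load(resp))
```

Keeping the parsing in `voice_names` separate from the network call makes it easy to test against a saved response.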

Voice selection tips:

  • Use IVC (Instant Voice Clone) or premade voices - PVC not optimized for v3 yet
  • Match voice character to your use case (whispering voice won't shout well)
  • For expressive IVCs, include varied emotional tones in training samples

Model Settings

  • Model: eleven_v3 (alpha) - ONLY model supporting audio tags
  • Languages: 70+ supported with full audio tag control

Stability Modes

v3 only accepts three values: 0.0, 0.5, 1.0

| Mode | Value | Description |
|------|-------|-------------|
| Creative | 0.0 | Most emotional/expressive, best for singing, may hallucinate |
| Natural | 0.5 | Balanced, closest to original voice |
| Robust | 1.0 | Highly stable, less responsive to tags |

For audio tags, use Creative (0.0) or Natural (0.5). Robust reduces tag responsiveness.
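
If stability comes from user input or an older config, one option is to snap it to the nearest permitted value before sending it. A small sketch (the helper name is hypothetical, and snapping rather than rejecting is a design choice, not something the skill prescribes):

```python
ALLOWED_STABILITY = (0.0, 0.5, 1.0)  # Creative, Natural, Robust


def snap_stability(value: float) -> float:
    """Return the allowed stability value closest to the requested one."""
    return min(ALLOWED_STABILITY, key=lambda allowed: abs(allowed - value))
```

For example, a legacy setting of 0.3 would become 0.5 (Natural).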

Speed Control

Range: 0.7 (slow) to 1.2 (fast), default 1.0

Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].
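
A simple guard keeps requested speeds inside the documented range. This is only an illustrative sketch; `clamp_speed` is not part of the skill.

```python
SPEED_MIN, SPEED_MAX = 0.7, 1.2  # range stated above


def clamp_speed(value: float) -> float:
    """Clamp a requested speed into the documented 0.7-1.2 range."""
    return max(SPEED_MIN, min(SPEED_MAX, value))
```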

Hebrew Nikud (Vowel Points)

Use nikud selectively - only on words where pronunciation is ambiguous. Full nikud on every word can degrade quality.

The rule: only add nikud where the model might guess wrong.

Common cases where nikud helps:

  1. Gender suffixes - שלומֵךְ (f) vs שלומְךָ (m), לָךְ (f) vs לְךָ (m), אותָךְ (f) vs אותְךָ (m)
  2. Dagesh (hard/soft consonants) - letters בכפ change sound with dagesh:
    • פּ = P, פ = F: פִּיצה (pizza), פִּייר (Pierre)
    • בּ = B, ב = V: בְּרָכָה (brakha), בְּדִיוּק (bediyuk)
    • כּ = K, כ = Kh: כּוֹס (kos), כַּמָּה (kama)
  3. Homographs - same spelling, different meaning/pronunciation:
    • בּוֹקֶר (morning) vs בּוֹקֵר (cowboy)
    • עוֹלָם (world) vs עוֹלֵם (concealing)
    • סֵפֶר (book) vs סָפַר (counted)
  4. Foreign names and loanwords - the model often guesses wrong
  5. Stress placement - when it changes meaning or sounds unnatural

When NOT to add nikud:

  • Common words with only one pronunciation (מה, יש, הרבה, שלום, אני, הוא, etc.)
  • Context makes pronunciation obvious
  • Most of the sentence - keep it clean

Example:

❌ Full nikud: מַה שְׁלוֹמְךָ? יֵשׁ לְךָ הַרְבֵּה כֶּסֶף.
✅ Selective: מה שלומְךָ? יש לְךָ הרבה כסף.
✅ Dagesh: ז'אן-פִּייר אפה פִּיצה מושלמת.

Principle: If you read the word and there's only one way to say it - skip the nikud. If there's ambiguity - add it.
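
One way to catch accidental full nikud is to measure what share of Hebrew words carry vowel points. The sketch below is a heuristic with assumed names and an assumed character set (Unicode Hebrew points U+05B0 to U+05BD plus rafe, shin/sin dots, and qamats qatan); any threshold you apply to the ratio is your own choice.

```python
import re

# Hebrew points and dagesh, rafe, shin dot, sin dot, qamats qatan.
NIKUD = re.compile("[\u05B0-\u05BD\u05BF\u05C1\u05C2\u05C7]")
HEBREW_LETTER = re.compile("[\u05D0-\u05EA]")


def nikud_ratio(text: str) -> float:
    """Fraction of Hebrew words that contain at least one nikud mark."""
    words = [w for w in text.split() if HEBREW_LETTER.search(w)]
    if not words:
        return 0.0
    pointed = sum(1 for w in words if NIKUD.search(w))
    return pointed / len(words)
```

A ratio near 1.0 suggests the text is fully pointed and may benefit from stripping nikud on unambiguous words; a low ratio is consistent with the selective style recommended here.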

Critical Rules

Length Limits

  • Optimal: <800 characters per segment (best quality)
  • Maximum: 10,000 characters (API hard limit)
  • Quality degrades with longer text - voice becomes inconsistent
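
To stay under the optimal length, long text can be split at sentence boundaries before generation. The following is a minimal sketch with a hypothetical helper; it splits only on `.`, `!`, `?`, and `…`, and a single sentence longer than the limit is kept whole rather than cut mid-sentence.

```python
import re


def split_segments(text: str, limit: int = 800) -> list[str]:
    """Greedily pack whole sentences into segments shorter than `limit` chars."""
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        # Accept the sentence if it fits, or if the segment is still empty
        # (an oversized single sentence is emitted on its own).
        if len(candidate) < limit or not current:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Because a tag such as `[pause]` that follows sentence-ending punctuation will start the next segment, it is worth reviewing segment boundaries by ear.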

Audio Tags - Best Practices for Natural Sound

How many tags to use:

  • 1-2 tags per sentence or phrase (not more!)
  • Tags persist until the next tag - no need to repeat
  • Overusing tags sounds unnatural and robotic

Where to place tags:

  • At emotional transition points
  • Before key dramatic moments
  • When energy/pace changes

Context matters:

  • Write text that matches the tag emotion
  • Longer text with context = better interpretation
  • Example: "[nervous] I... I'm not sure about this. What if it doesn't work?" works better than "[nervous] Hello."

Combine tags for nuance:

  • [nervously][whispers] = nervous whispering
  • [excited][laughs] = excited laughter
  • Keep combinations to 2 tags max

Regenerate for best results:

  • v3 is non-deterministic - same text = different outputs
  • Generate 3+ versions, pick the best
  • Small text tweaks can improve results

Match tag to voice:

  • Don't use [shouts] on a whispering voice
  • Don't use [whispers] on a loud/energetic voice
  • Test tags with your chosen voice
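
These guidelines can be turned into a rough pre-flight check. The sketch below is illustrative only: `lint_tags` is a hypothetical helper, the default of 2 tags per sentence mirrors the guidance above, and the "tone" suffix warning follows the troubleshooting advice elsewhere in this skill.

```python
import re

TAG = re.compile(r"\[([^\[\]]+)\]")


def lint_tags(text: str, max_per_sentence: int = 2) -> list[str]:
    """Return warnings for tag usage that conflicts with the guidelines."""
    warnings = []
    # Flag tags such as [serious tone]; the simpler [serious] is preferred.
    for tag in TAG.findall(text):
        if tag.lower().endswith(" tone"):
            warnings.append(f"[{tag}]: drop the 'tone' suffix")
    # Flag sentences carrying more tags than the recommended maximum.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        count = len(TAG.findall(sentence))
        if count > max_per_sentence:
            warnings.append(f"{count} tags in one sentence: {sentence!r}")
    return warnings
```

The check cannot judge whether a tag suits the chosen voice, so listening tests remain necessary.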

SSML Not Supported

v3 does NOT support SSML break tags. Use audio tags and punctuation instead.

Punctuation Effects (use with tags!)

Punctuation enhances audio tags:

  • Ellipses (...) → dramatic pauses: [nervous] I... I don't know...
  • CAPS → emphasis: [excited] That's AMAZING!
  • Dashes (—) → interruptions: [explaining] So what you do is— [interrupting] Wait!
  • Question marks → uncertainty: [nervous] Are you sure about this?
  • Exclamation! → energy boost: [happy] We did it!

Combine tags + punctuation for maximum effect:

[tired] It was a long day... [sighs] Nobody listens anymore.

WhatsApp Voice Messages

Complete Workflow

  1. Generate with tts tool (returns MP3)
  2. Convert to Opus (required for Android!)
  3. Send with message tool

Step-by-Step

1. Generate TTS (add [pause] at end to prevent cutoff):

tts text="[excited] This is amazing! [pause]" channel=whatsapp

Returns: MEDIA:/tmp/tts-xxx/voice-123.mp3

2. Convert MP3 → Opus using the included converter:

python3 lib/audio_convert.py convert /tmp/tts-xxx/voice-123.mp3 /tmp/tts-xxx/voice-123.ogg

3. Send the Opus file:

message action=send channel=whatsapp target="+972..." filePath="/tmp/tts-xxx/voice-123.ogg" asVoice=true message="‎"

Why Opus?

| Format | iOS | Android | Transcribe |
|--------|-----|---------|------------|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |

Always convert to Opus - it's the only format that:

  • Works on all devices (iOS + Android)
  • Supports WhatsApp's transcribe button
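
The internals of lib/audio_convert.py are not shown in this document. As a rough equivalent, and assuming `ffmpeg` with the `libopus` encoder is installed, a conversion could look like the sketch below. The 32k bitrate and the helper names are illustrative assumptions, not the skill's actual settings.

```python
import subprocess
from pathlib import Path


def opus_command(src: str, dst: str, bitrate: str = "32k") -> list[str]:
    """Build an ffmpeg command that re-encodes `src` as Opus in an Ogg container."""
    return ["ffmpeg", "-y", "-i", src, "-c:a", "libopus", "-b:a", bitrate, dst]


def convert_to_opus(src: str) -> str:
    """Convert e.g. voice.mp3 to voice.ogg next to it and return the new path."""
    dst = str(Path(src).with_suffix(".ogg"))
    subprocess.run(opus_command(src, dst), check=True)
    return dst
```

When the bundled converter is available, prefer it; this sketch only shows what such a step typically involves.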

Audio Cutoff Fix

ElevenLabs sometimes cuts off the last word. Always add [pause] or ... at the end:

[excited] This is amazing! [pause]
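
This rule is easy to apply automatically before calling the tts tool. A minimal sketch, with a hypothetical helper name:

```python
def ensure_trailing_pause(text: str) -> str:
    """Append [pause] unless the text already ends with [pause] or an ellipsis."""
    stripped = text.rstrip()
    if stripped.endswith(("[pause]", "...", "…")):
        return stripped
    return f"{stripped} [pause]"
```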

Long-Form Audio (Podcasts)

For content >800 chars:

  1. Split into short segments (<800 chars each)
  2. Generate each with tts tool
  3. Concatenate using the included converter:
    python3 lib/audio_convert.py concat /tmp/final.mp3 /tmp/part1.mp3 /tmp/part2.mp3
  4. Convert to Opus for WhatsApp:
    python3 lib/audio_convert.py convert /tmp/final.mp3 /tmp/final.ogg
  5. Send as single voice message

Important: Don't mention "part 2" or "chapter" - keep it seamless.
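
The concatenate and convert steps can be scripted. The sketch below only wraps the two converter commands shown above; the wrapper function names and default paths are illustrative, and it assumes the script runs from the skill's root directory so that `lib/audio_convert.py` resolves.

```python
import subprocess

CONVERTER = "lib/audio_convert.py"


def concat_command(output: str, parts: list[str]) -> list[str]:
    """Command that joins the part files, in order, into `output`."""
    if not parts:
        raise ValueError("at least one part is required")
    return ["python3", CONVERTER, "concat", output, *parts]


def convert_command(src: str, dst: str) -> list[str]:
    """Command that converts `src` (MP3) to `dst` (Opus)."""
    return ["python3", CONVERTER, "convert", src, dst]


def build_voice_message(parts: list[str],
                        final_mp3: str = "/tmp/final.mp3",
                        final_ogg: str = "/tmp/final.ogg") -> str:
    """Run both steps and return the path of the Opus file to send."""
    subprocess.run(concat_command(final_mp3, parts), check=True)
    subprocess.run(convert_command(final_mp3, final_ogg), check=True)
    return final_ogg
```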

Multi-Speaker Dialogue

v3 can handle multiple characters in one generation:

Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!

Dialogue tags: [interrupting], [overlapping], [cuts in], [interjecting]
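
When dialogue is generated programmatically, a small formatter keeps the "Speaker: [tag] line" layout consistent. This is an illustrative sketch; the tuple layout and function name are assumptions.

```python
def format_dialogue(turns: list[tuple[str, str, str]]) -> str:
    """Render (speaker, tag, line) turns as one 'Speaker: [tag] line' per row."""
    lines = []
    for speaker, tag, line in turns:
        prefix = f"[{tag}] " if tag else ""  # an empty tag means no tag at all
        lines.append(f"{speaker}: {prefix}{line}")
    return "\n".join(lines)
```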

Audio Tags Quick Reference

| Category | Tags | When to Use |
|----------|------|-------------|
| Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| Character | [French accent], [British accent], [robotic tone] | Character voice shifts |
| Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |

Most effective tags (reliable results):

  • Emotions: [excited], [nervous], [sad], [happy]
  • Reactions: [laughs], [sighs], [whispers]
  • Pacing: [pause]

Less reliable (test and regenerate):

  • Sound effects: [explosion], [gunshot]
  • Accents: results vary by voice

Full tag list: See references/audio-tags.md

Troubleshooting

Tags read aloud?

  • Verify using eleven_v3 model
  • Use IVC/premade voices, not PVC
  • Simplify tags (no "tone" suffix)
  • Increase text length (250+ chars)

Voice inconsistent?

  • Segment is too long - split at <800 chars
  • Regenerate (v3 is non-deterministic)
  • Try lower stability setting

WhatsApp won't play?

  • Convert to Opus format (see above)

No emotion despite tags?

  • Voice may not match tag style
  • Try Creative stability mode (0.0)
  • Add more context around the tag

File v2.1.0:_meta.json

{ "ownerId": "kn77700wny92h2kvpav2am1yjx80ewfp", "slug": "elevenlabs-tts", "version": "2.1.0", "publishedAt": 1770644808695 }

File v2.1.0:references/audio-tags.md

Audio Tags Reference

Complete guide to ElevenLabs v3 audio tags.

Prerequisites

  • Model: eleven_v3 (alpha) - ONLY this model supports audio tags
  • Voice Type: IVC (Instant Voice Clone) or designed voices - PVC not optimized yet
  • Prompt Length: 250+ characters for consistent results
  • Stability: Creative or Natural mode (Robust reduces tag responsiveness)

Core Principle

Write NATURAL sentences that tags modify, NOT explanations.

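❌ WRONG: [excited] אני מתרגש! ("I'm excited!", which merely restates the tag)
✅ RIGHT: [excited] זה ממש מדהים מה שעשינו היום! ("What we did today is truly amazing!", a natural sentence the tag can color)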


Tag Categories

Emotions (High Reliability)

| Tag | Description |
|-----|-------------|
| [excited] | Energy, enthusiasm |
| [happy] | Joy, cheerfulness |
| [happily] | Speaking with happiness |
| [sad] | Sadness, melancholy |
| [sorrowful] | Deep sadness |
| [angry] | Anger, intensity |
| [curious] | Curiosity, interest |
| [nervous] | Nervousness, anxiety |
| [sarcastic] | Sarcasm, irony |
| [tired] | Fatigue, weariness |
| [serious] | Seriousness |
| [confident] | Confidence |
| [frustrated] | Frustration |
| [mischievous] | Playful mischief |
| [awe] | Wonder, amazement |
| [resigned] | Acceptance, giving up |
| [flustered] | Confused embarrassment |
| [casual] | Relaxed, informal |
| [annoyed] | Irritation |

Delivery & Volume (High Reliability)

| Tag | Description |
|-----|-------------|
| [whispers] | Quiet, intimate |
| [shouts] | Loud, intense |
| [dramatic tone] | Theatrical |
| [dramatic] | Dramatic delivery |
| [matter-of-fact] | Plain, factual |
| [whiny] | Complaining tone |
| [flatly] | No emotion |
| [quietly] | Soft voice |
| [suspiciously] | Suspicious tone |

Pacing & Timing (High Reliability)

| Tag | Description |
|-----|-------------|
| [pause] | Brief silence |
| [breathes] | Breathing sound |
| [continues after a beat] | Pause then continue |
| [rushed] | Fast, urgent |
| [slows down] | Decreasing speed |
| [deliberate] | Careful, intentional |
| [rapid-fire] | Very fast |
| [drawn out] | Stretched, slow |
| [stammers] | Stuttering |
| [hesitates] | Uncertainty |
| [timidly] | Shy, tentative |
| [repeats] | Repetition |

Emphasis (Medium Reliability)

| Tag | Description |
|-----|-------------|
| [emphasized] | Strong emphasis |
| [stress on next word] | Emphasize following word |
| [understated] | Downplayed delivery |

Reactions & Sounds (Very High Reliability)

| Tag | Description |
|-----|-------------|
| [laughs] | Laughter |
| [laughs softly] | Gentle laugh |
| [laughs harder] | Increasing laughter |
| [starts laughing] | Beginning to laugh |
| [nervous laugh] | Anxious laughter |
| [giggles] | Small laugh |
| [wheezing] | Breathless laugh |
| [sighs] | Exhale of emotion |
| [sigh] | Single sigh |
| [gasps] | Sharp intake |
| [exhales] | Breathing out |
| [clears throat] | Throat clearing |
| [gulps] | Swallowing |
| [swallows] | Swallowing sound |
| [snorts] | Snorting sound |
| [crying] | Sobbing |

Character & Accents (Medium Reliability)

| Tag | Description |
|-----|-------------|
| [French accent] | French accent |
| [American accent] | American accent |
| [British accent] | British accent |
| [Australian accent] | Australian accent |
| [Southern US accent] | Southern American |
| [strong X accent] | Replace X with accent |
| [pirate voice] | Pirate character |
| [evil scientist voice] | Mad scientist |
| [childlike tone] | Child-like voice |
| [robotic tone] | Robot voice |
| [deep voice] | Lower pitch |

Narrative & Genre (Medium Reliability)

| Tag | Description |
|-----|-------------|
| [storytelling tone] | Narrator voice |
| [voice-over style] | Documentary style |
| [fantasy narrator] | Epic fantasy |
| [sci-fi AI voice] | Futuristic AI |
| [classic film noir] | 1940s detective |
| [epic build-up] | Building intensity |
| [narrative flourish] | Dramatic narration |

Multi-Speaker Dialogue

| Tag | Description |
|-----|-------------|
| [interrupting] | Cutting off speaker |
| [overlapping] | Speaking over |
| [cuts in] | Interjecting |
| [interjecting] | Jumping in |
| [fast-paced] | Quick exchange |

Sound Effects (Low-Medium Reliability)

| Tag | Description |
|-----|-------------|
| [gunshot] | Gun sound |
| [clapping] | Applause |
| [applause] | Audience clapping |
| [explosion] | Blast sound |
| [thunder] | Thunder |

Experimental (Test First)

| Tag | Description |
|-----|-------------|
| [sings] | Singing |
| [woo] | Exclamation |
| [fart] | Sound effect |
| [panicked] | Panic |
| [trembling] | Shaking voice |


Usage Guidelines

✅ DO:

  • Use simple tags: [excited] not [excited tone]
  • Write natural sentences that work without tags
  • Use 2-4 tags per paragraph max
  • Place tags at sentence start or key moment
  • Match tags to voice character
  • Test and regenerate (v3 is non-deterministic)
  • Combine tags: [whispering][pause] Did you hear that?

❌ DON'T:

  • Don't add "tone" suffix: [serious tone]
  • Don't overload with tags
  • Don't explain what the tag does
  • Don't use incompatible combos (whisper voice + shout tag)
  • Don't expect consistency (regenerate if needed)

Examples

Emotional Monologue

[sighs] I've been thinking about what you said. [pause] 
And you're right. [sadly] I should have listened earlier.
[determined] But I'm going to fix this. Starting now.

Multi-Character Dialogue

Sarah: [whispers] I think someone's coming.
Mike: [interrupting] —I heard it too! [panicked] Hide!
Sarah: [annoyed] I was TRYING to tell you that!

Comedic Timing

[confident] So I walked up to the boss and said... 
[pause] [nervous laugh] Actually, I didn't say anything. 
[sighs] I just stood there. [laughs] Classic me.

Accent Performance

[British accent] Terribly sorry, but I must insist.
[switches to Southern US accent] Well now, that's mighty kind of y'all.
[French accent] Mon ami, you simply must try ze croissant!

Singing

The [singing] tag can produce melodic intonation. Results are inconsistent - v3 is a TTS model, not a music model.

Format (tag on its own line before lyrics):


[singing]
Oh Tommy boy, the pipes the pipes are calling,
from glen to glen and down the mountain side.

Best settings for singing:

  • Stability: Creative (0.0) - most expressive, best for singing
  • Voice: Use v3-optimized premade voices (Adam, Charlotte, etc.)
  • Language: English works best; Hebrew is less reliable for singing
  • Non-deterministic: Generate multiple times - each result is different

Tips:

  • Put [singing] on its own line before lyrics
  • Use known songs the model might recognize
  • Stack with emotion: [happy]\n[singing]\nlyrics...
  • Keep lyrics short per generation

Limitations:

  • Not real singing with full melody - more like melodic speech
  • Results vary heavily by voice and generation
  • For actual music generation, use Suno or Udio

Troubleshooting

Tags being read aloud?

  • Check you're using eleven_v3 (not a Turbo/Flash v2.5 or Multilingual v2 model)
  • Use IVC/designed voices, not PVC
  • Simplify tags (remove "tone", "sound", etc.)
  • Increase prompt length (250+ chars)

Tags not working?

  • Generate multiple times (v3 is variable)
  • Use Creative or Natural stability (not Robust)
  • Add surrounding context text
  • Try different tag placement
  • Voice may not match tag style

Multi-speaker not distinct?

  • Add character cues: [deep voice], [higher pitch]
  • Use accent tags for differentiation
  • Add emotional contrast between speakers

API & Reliability

Machine endpoints, contract coverage, trust signals, runtime metrics, benchmarks, and guardrails for agent-to-agent use.

Missing (CLAWHUB)

Machine interfaces

Contract & API

Contract coverage

Status: missing

Auth: None

Streaming: No

Data region: Unspecified

Protocol support

No protocol metadata captured.

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.

Invocation examples

curl -s "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/snapshot"
curl -s "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/contract"
curl -s "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust"

Operational fit

Reliability & Benchmarks

Trust signals

Handshake: UNKNOWN

Confidence: unknown

Attempts 30d: unknown

Fallback rate: unknown

Runtime metrics

Observed P50: unknown

Observed P95: unknown

Rate limit: unknown

Estimated cost: unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.
No benchmark suites or observed failure patterns are available.

Machine Appendix

Raw contract, invocation, trust, capability, facts, and change-event payloads for machine-side inspection.

Missing (CLAWHUB)

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": []
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "CLAWHUB",
      "generatedAt": "2026-04-17T02:55:20.513Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
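
The `retryPolicy` above can be read as "try up to three times, waiting between attempts, and only for the listed conditions". The sketch below is one possible interpretation, not a documented client: `RetryableError` and `call_with_retry` are hypothetical names, and with three attempts only the first two backoff values are ever used.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_ATTEMPTS = 3
BACKOFF_MS = [500, 1500, 3500]
RETRYABLE = {"HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"}


class RetryableError(Exception):
    """Raised by the caller's function with one of the condition names."""

    def __init__(self, condition: str):
        super().__init__(condition)
        self.condition = condition


def call_with_retry(fn: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on retryable conditions with the listed backoff."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return fn()
        except RetryableError as err:
            is_last = attempt == MAX_ATTEMPTS - 1
            # Give up on unknown conditions or once attempts are exhausted.
            if err.condition not in RETRYABLE or is_last:
                raise
            sleep(BACKOFF_MS[attempt] / 1000)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.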

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [],
  "flattenedTokens": ""
}

Facts JSON

[
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Clawhub",
    "href": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
    "sourceUrl": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "traction",
    "category": "adoption",
    "label": "Adoption signal",
    "value": "4.5K downloads",
    "href": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
    "sourceUrl": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "latest_release",
    "category": "release",
    "label": "Latest release",
    "value": "2.2.0",
    "href": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
    "sourceUrl": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
    "sourceType": "release",
    "confidence": "medium",
    "observedAt": "2026-02-14T16:49:34.137Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "release",
    "title": "Release 2.2.0",
    "description": "Security scan fixes",
    "href": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
    "sourceUrl": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
    "sourceType": "release",
    "confidence": "medium",
    "observedAt": "2026-02-14T16:49:34.137Z",
    "isPublic": true
  }
]
