Xpersona Agent
Elevenlabs Tts
ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...
Skill: Elevenlabs Tts
Owner: Shaharsha
Tags: ai-voice:2.1.0, audio:2.1.0, elevenlabs:2.1.0, elevenlabs-tts:1.3.2, hebrew:2.1.0, latest:2.2.0, multilingual:2.1.0, nikud:2.1.0, openclaw:1.3.2, podcast:1.2.1, singing:2.1.0, speech:2.1.0, text-to-speech:
clawhub skill install kn77700wny92h2kvpav2am1yjx80ewfp:elevenlabs-tts
Overall rank
#62
Adoption
4.5K downloads
Trust
Unknown
Freshness
Last checked Feb 28, 2026
Best For
Elevenlabs Tts is best for general automation workflows where documented compatibility matters.
Not Ideal For
Workflows that require deterministic execution, since contract metadata is missing or unavailable.
Evidence Sources Checked
editorial-content, CLAWHUB, runtime-metrics, public facts pack
Overview
Key links, install path, reliability highlights, and the shortest practical read before diving into the crawl record.
Verified (editorial-content)
Executive Summary
Capability contract not published. No trust telemetry is available yet. 4.5K downloads reported by the source. Last updated 4/15/2026.
Trust score
Unknown
Compatibility
Profile only
Freshness
Feb 28, 2026
Vendor
Clawhub
Artifacts
0
Benchmarks
0
Last release
2.2.0
Install & run
Setup Snapshot
clawhub skill install kn77700wny92h2kvpav2am1yjx80ewfp:elevenlabs-tts
1. Setup complexity is classified as HIGH. You must provision dedicated cloud infrastructure or an isolated VM. Do not run this directly on your local workstation.
2. Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Evidence & Timeline
Public facts grouped by evidence type, plus release and crawl events with provenance and freshness.
Verified (editorial-content)
Public facts
Evidence Ledger
Vendor (1)
Vendor
Clawhub
Release (1)
Latest release
2.2.0
Adoption (1)
Adoption signal
4.5K downloads
Security (1)
Handshake status
UNKNOWN
Events
Release & Crawl Timeline
Artifacts & Docs
Parameters, dependencies, examples, extracted files, editorial overview, and the complete README when available.
Self-declared (CLAWHUB)
Captured outputs
Artifacts Archive
Extracted files
3
Examples
6
Snippets
0
Languages
Unknown
Extracted Files
SKILL.md
---
name: elevenlabs-tts
description: ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp, ElevenLabs multilingual support. Generate realistic AI voices using ElevenLabs API.
tags: [elevenlabs, tts, voice, text-to-speech, audio, speech, whatsapp, multilingual, ai-voice]
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"env":["ELEVENLABS_API_KEY"],"system":["ffmpeg"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
allowed-tools: [exec, tts, message]
---
# ElevenLabs TTS (Text-to-Speech)
Generate expressive voice messages using ElevenLabs v3 with audio tags.
## Prerequisites
- **ElevenLabs API Key** (`ELEVENLABS_API_KEY`): Required. Get one at [elevenlabs.io](https://elevenlabs.io) → Profile → API Keys. Configure in `openclaw.json` under `messages.tts.elevenlabs.apiKey`.
- **ffmpeg**: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on PATH.
## Quick Start Examples
**Storytelling (emotional journey):**
```
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
```
**Horror/Suspense (building dread):**
```
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
```
**Conversation with reactions:**
```
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
```
**Hebrew (romantic moment):**
```
[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?
```
**Spanish (celebration to reflection):**
```
[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
```
## Configuration (OpenClaw)
In `openclaw.json`, configure TTS under `messages.tts`:
```json
{
"messages": {
"tts": {
"provider": "elevenlabs",
"elevenlabs": {
"apiKey": "sk_your_api_key_here",
"voiceId": "pNInz6obpgDQGcFmaJgB",
"modelId": "eleven_v3",
"languageCode": "en",
"voiceSettings": {
"stability": 0.5,
"similarityBoost": 0.75,
"style": 0,
"useSpeakerBoost": true,
"speed": 1
}
}
}
}
}
```
**Getting your API Key:**
1. Go to https://elevenlabs.io
2. Sign up/login
3. Click profile → API Keys
4. Copy your key
## Recommended Voices for v3
These premade voices are optimized for v3 and work well with audio tags: …
_meta.json
{
"ownerId": "kn77700wny92h2kvpav2am1yjx80ewfp",
"slug": "elevenlabs-tts",
"version": "2.2.0",
"publishedAt": 1771087774137
}
Editorial read
Docs & README
Docs source
CLAWHUB
Editorial quality
ready
Full README
Skill: Elevenlabs Tts
Owner: Shaharsha
Summary: ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 7...
Tags: ai-voice:2.1.0, audio:2.1.0, elevenlabs:2.1.0, elevenlabs-tts:1.3.2, hebrew:2.1.0, latest:2.2.0, multilingual:2.1.0, nikud:2.1.0, openclaw:1.3.2, podcast:1.2.1, singing:2.1.0, speech:2.1.0, text-to-speech:2.1.0, tts:2.1.0, voice:2.1.0, whatsapp:2.1.0
Version history:
v2.2.0 | 2026-02-14T16:49:34.137Z | user
Security scan fixes
v2.1.0 | 2026-02-09T13:46:48.695Z | user
Comprehensive Hebrew nikud guide: dagesh (B/V, K/Kh, P/F), gender suffixes, homographs, stress placement, foreign names. Clear principle: only nikud where ambiguity exists.
v2.0.0 | 2026-02-09T13:35:12.104Z | user
Major polish: improved description for discoverability, added allowed-tools declaration, fixed stability value in troubleshooting (0.0 not 0.5), selective nikud in Hebrew example, security-clean SKILL.md with lib/audio_convert.py wrapper.
v1.6.0 | 2026-02-09T13:33:27.653Z | user
Security: moved all ffmpeg shell commands into lib/audio_convert.py wrapper script. SKILL.md no longer contains raw bash commands. Added convert and concat CLI utilities.
v1.5.0 | 2026-02-09T13:27:37.042Z | user
Fixed stability values (v3 only accepts 0.0/0.5/1.0). Added singing guide with correct format ([singing] on own line). Updated audio-tags reference with singing tips and limitations.
v1.4.0 | 2026-02-09T12:52:07.278Z | user
Added Hebrew nikud (vowel points) support for accurate pronunciation. Updated Hebrew example with full nikud. Changed config example to use placeholder voiceId.
v1.3.2 | 2026-02-04T12:14:22.141Z | auto
- Updated the skill description in SKILL.md for improved clarity and focus on key ElevenLabs/OpenClaw features.
- Revised keywords in tags for better search relevance.
- No changes to code or functionality—documentation change only.
v1.3.1 | 2026-02-04T12:13:02.242Z | auto
Version 1.3.1
- No file changes detected in this release.
- Documentation and usage instructions remain the same.
- No new features, bug fixes, or updates in this version.
v1.2.9 | 2026-02-04T12:12:26.571Z | auto
elevenlabs-tts 1.2.9
- Updated SKILL.md with improved and concise description, adding tags for discoverability.
- Enhanced feature summary to emphasize WhatsApp, multilingual support, and OpenClaw integration.
- No code changes; documentation only.
v1.2.8 | 2026-02-03T22:09:57.449Z | user
Added: WhatsApp transcribe button only works with Opus format
v1.2.7 | 2026-02-03T22:07:12.207Z | user
Complete WhatsApp workflow: generate→convert to Opus→send. MP3 fails on Android, Opus works everywhere.
v1.2.6 | 2026-02-03T22:05:53.551Z | user
Added WhatsApp sending instructions (message tool with asVoice), audio cutoff fix (add pause at end)
v1.2.5 | 2026-02-03T22:02:39.924Z | user
Improved Quick Start examples with more audio tags demonstrating emotional transitions
v1.3.0 | 2026-02-03T21:59:24.517Z | user
Major update: Added comprehensive best practices for natural-sounding audio tags - how many to use, where to place them, context tips, regeneration strategies, punctuation effects, and updated examples with emotional progressions
v1.2.4 | 2026-02-03T21:56:25.048Z | user
Updated examples to show multiple audio tags per message
v1.2.3 | 2026-02-03T18:08:09.266Z | user
Fixed display name
v1.2.2 | 2026-02-03T18:07:40.253Z | user
Added (Text-to-Speech) to title for clarity
v1.2.1 | 2026-02-03T18:03:39.803Z | user
Added TTS explanation (Text-to-Speech) in description for clarity
v1.2.0 | 2026-02-03T17:54:31.286Z | user
Added: 3 language examples (EN/HE/ES), OpenClaw config guide, 5 recommended voice IDs with table, voice selection tips, how to get API key
v1.1.0 | 2026-02-03T17:50:29.026Z | user
Major update: Added multi-speaker dialogue, 50+ new tags, stability modes guide, speed control, punctuation effects, fixed API limit (10K not 5K), 70+ languages support
v1.0.1 | 2026-02-03T17:49:17.446Z | user
Added references/audio-tags.md
v1.0.0 | 2026-02-03T17:46:32.654Z | auto
- Initial release of ElevenLabs-TTS with integrated audio tag support for expressive voice synthesis.
- Supports creation of voice messages, podcasts, audiobooks, and other spoken content with emotional expression.
- Handles WhatsApp voice compatibility, including guidance on Opus conversion.
- Provides instructions for segmenting and concatenating long-form audio.
- Includes quick reference for critical audio tags and troubleshooting tips.
Archive index:
Archive v2.2.0: 3 files, 8483 bytes
Files: references/audio-tags.md (6750b), SKILL.md (10531b), _meta.json (133b)
File v2.2.0:SKILL.md
name: elevenlabs-tts
description: ElevenLabs TTS - the best ElevenLabs integration for OpenClaw. ElevenLabs Text-to-Speech with emotional audio tags, ElevenLabs voice synthesis for WhatsApp, ElevenLabs multilingual support. Generate realistic AI voices using ElevenLabs API.
tags: [elevenlabs, tts, voice, text-to-speech, audio, speech, whatsapp, multilingual, ai-voice]
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"env":["ELEVENLABS_API_KEY"],"system":["ffmpeg"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
allowed-tools: [exec, tts, message]
ElevenLabs TTS (Text-to-Speech)
Generate expressive voice messages using ElevenLabs v3 with audio tags.
Prerequisites
- ElevenLabs API Key (ELEVENLABS_API_KEY): Required. Get one at elevenlabs.io → Profile → API Keys. Configure in openclaw.json under messages.tts.elevenlabs.apiKey.
- ffmpeg: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). Must be installed and available on PATH.
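As an illustration only, a small Python pre-flight check for these two prerequisites might look like the sketch below. The function name `missing_prerequisites` is hypothetical and not part of the skill. It only checks that `ELEVENLABS_API_KEY` is non-empty and that an `ffmpeg` executable resolves on PATH. It does not validate the key against the API.

```python
import os
import shutil


def missing_prerequisites(env=None) -> list[str]:
    """Return human-readable problems; an empty list means both prerequisites look present."""
    env = os.environ if env is None else env
    problems = []
    # The skill declares ELEVENLABS_API_KEY as its primary environment variable.
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    # ffmpeg must be resolvable on PATH for the MP3 -> Opus conversion.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```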
Quick Start Examples
Storytelling (emotional journey):
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
Horror/Suspense (building dread):
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
Conversation with reactions:
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
Hebrew (romantic moment):
[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?
Spanish (celebration to reflection):
[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
Configuration (OpenClaw)
In openclaw.json, configure TTS under messages.tts:
{
"messages": {
"tts": {
"provider": "elevenlabs",
"elevenlabs": {
"apiKey": "sk_your_api_key_here",
"voiceId": "pNInz6obpgDQGcFmaJgB",
"modelId": "eleven_v3",
"languageCode": "en",
"voiceSettings": {
"stability": 0.5,
"similarityBoost": 0.75,
"style": 0,
"useSpeakerBoost": true,
"speed": 1
}
}
}
}
}
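To show roughly what this configuration corresponds to on the wire, here is a hedged Python sketch that calls the public ElevenLabs endpoint `POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` directly, using only the standard library. The camelCase → snake_case mapping (`similarityBoost` → `similarity_boost`, and so on) is an assumption about how the OpenClaw settings line up with the API fields. It is not a description of OpenClaw's internals, and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

# Assumed mapping from the camelCase keys in openclaw.json to the API's snake_case fields.
_VOICE_SETTING_KEYS = {
    "stability": "stability",
    "similarityBoost": "similarity_boost",
    "style": "style",
    "useSpeakerBoost": "use_speaker_boost",
    "speed": "speed",
}


def build_tts_request(cfg: dict, text: str) -> urllib.request.Request:
    """Build (but do not send) a TTS request from a messages.tts.elevenlabs config block."""
    settings = cfg.get("voiceSettings", {})
    voice_settings = {api: settings[key] for key, api in _VOICE_SETTING_KEYS.items() if key in settings}
    body = {
        "text": text,
        "model_id": cfg.get("modelId", "eleven_v3"),
        "voice_settings": voice_settings,
    }
    if "languageCode" in cfg:
        body["language_code"] = cfg["languageCode"]
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{cfg['voiceId']}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "xi-api-key": cfg["apiKey"],  # ElevenLabs authenticates with this header
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(cfg: dict, text: str, out_path: str) -> None:
    """Send the request and write the returned MP3 bytes to out_path (needs network access)."""
    with urllib.request.urlopen(build_tts_request(cfg, text)) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Usage, assuming the file layout shown above: load the block with `json.load(open("openclaw.json"))["messages"]["tts"]["elevenlabs"]`, then call `synthesize(cfg, "[excited] This is amazing! [pause]", "voice.mp3")`. The response is MP3, which is why the WhatsApp workflow converts to Opus.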
Getting your API Key:
- Go to https://elevenlabs.io
- Sign up/login
- Click profile → API Keys
- Copy your key
Recommended Voices for v3
These premade voices are optimized for v3 and work well with audio tags:
| Voice | ID | Gender | Accent | Best For |
|-------|-----|--------|--------|----------|
| Adam | pNInz6obpgDQGcFmaJgB | Male | American | Deep narration, general use |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Female | American | Calm narration, conversational |
| Brian | nPczCjzI2devNBz1zQrb | Male | American | Deep narration, podcasts |
| Charlotte | XB0fDUnXU5powFXDhCwa | Female | English-Swedish | Expressive, video games |
| George | JBFqnCBsd6RMkjVDRZzb | Male | British | Raspy narration, storytelling |
Finding more voices:
- Browse: https://elevenlabs.io/voice-library
- v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH
- API: GET https://api.elevenlabs.io/v1/voices
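A hedged sketch of calling the voices endpoint with only the Python standard library. It assumes the key is sent in the `xi-api-key` header and that the response is a JSON object with a `voices` array whose items carry `voice_id` and `name`. The helper names are hypothetical.

```python
import json
import urllib.request


def voice_names_by_id(payload: dict) -> dict[str, str]:
    """Map voice_id -> name from a GET /v1/voices response body."""
    return {v["voice_id"]: v["name"] for v in payload.get("voices", [])}


def fetch_voices(api_key: str) -> dict[str, str]:
    """Call GET https://api.elevenlabs.io/v1/voices (needs network access and a valid key)."""
    req = urllib.request.Request(
        "https://api.elevenlabs.io/v1/voices",
        headers={"xi-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return voice_names_by_id(json.load(resp))
```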
Voice selection tips:
- Use IVC (Instant Voice Clone) or premade voices - PVC not optimized for v3 yet
- Match voice character to your use case (whispering voice won't shout well)
- For expressive IVCs, include varied emotional tones in training samples
Model Settings
- Model: eleven_v3 (alpha) - ONLY model supporting audio tags
- Languages: 70+ supported with full audio tag control
Stability Modes
| Mode | Stability | Description |
|------|-----------|-------------|
| Creative | 0.0 | More emotional/expressive, may hallucinate |
| Natural | 0.5 | Balanced, closest to original voice |
| Robust | 1.0 | Highly stable, less responsive to tags |
For audio tags, use Creative (0.0) or Natural (0.5). Higher stability reduces tag responsiveness.
Speed Control
Range: 0.7 (slow) to 1.2 (fast), default 1.0
Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].
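The v1.5.0 changelog states that v3 only accepts stability values of 0.0, 0.5 and 1.0. This section gives a speed range of 0.7 to 1.2. Below is a minimal Python check under those assumptions. The helper is hypothetical and not part of the skill. Mapping the three values to Creative/Natural/Robust is an assumption based on the mode ordering.

```python
# Stability presets for eleven_v3, assuming the 0.0 / 0.5 / 1.0 rule from the changelog.
STABILITY_MODES = {"creative": 0.0, "natural": 0.5, "robust": 1.0}
SPEED_RANGE = (0.7, 1.2)  # documented range; default 1.0


def check_voice_settings(stability: float, speed: float = 1.0) -> list[str]:
    """Return warnings for stability/speed values outside the documented v3 limits."""
    warnings = []
    if stability not in STABILITY_MODES.values():
        warnings.append(f"stability {stability} is not one of 0.0 / 0.5 / 1.0")
    low, high = SPEED_RANGE
    if not low <= speed <= high:
        warnings.append(f"speed {speed} is outside {low}-{high}")
    return warnings
```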
Critical Rules
Length Limits
- Optimal: <800 characters per segment (best quality)
- Maximum: 10,000 characters (API hard limit)
- Quality degrades with longer text - voice becomes inconsistent
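One way to stay under the 800-character guideline is to pack whole sentences greedily into segments. The Python sketch below is hypothetical and not taken from the skill. It splits after `.`, `!`, `?` or `…`. A sentence that alone exceeds the limit falls back to word boundaries, and a single word longer than the limit is kept intact.

```python
import re

# A sentence ends at ".", "!", "?" or "…" followed by whitespace ("..." is covered by its last ".").
_SENTENCE_END = re.compile(r"(?<=[.!?\u2026])\s+")


def _pack(units: list[str], limit: int) -> list[str]:
    """Greedily join units with single spaces into chunks of at most `limit` characters."""
    chunks, current = [], ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = unit  # a single unit longer than `limit` is kept whole
    if current:
        chunks.append(current)
    return chunks


def split_into_segments(text: str, limit: int = 800) -> list[str]:
    """Split text into segments of at most `limit` characters, preferring sentence boundaries."""
    units = []
    for sentence in _SENTENCE_END.split(text.strip()):
        if len(sentence) <= limit:
            units.append(sentence)
        else:
            # Fall back to word boundaries for a sentence that alone exceeds the limit.
            units.extend(_pack(sentence.split(), limit))
    return _pack(units, limit)
```

Tags persist until the next tag, so a segment that starts mid-scene presumably loses the emotion set in the previous segment. Starting each segment with its own tag avoids relying on that carry-over.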
Audio Tags - Best Practices for Natural Sound
How many tags to use:
- 1-2 tags per sentence or phrase (not more!)
- Tags persist until the next tag - no need to repeat
- Overusing tags sounds unnatural and robotic
Where to place tags:
- At emotional transition points
- Before key dramatic moments
- When energy/pace changes
Context matters:
- Write text that matches the tag emotion
- Longer text with context = better interpretation
- Example: "[nervous] I... I'm not sure about this. What if it doesn't work?" works better than "[nervous] Hello."
Combine tags for nuance:
- [nervously][whispers] = nervous whispering
- [excited][laughs] = excited laughter
- Keep combinations to 2 tags max
Regenerate for best results:
- v3 is non-deterministic - same text = different outputs
- Generate 3+ versions, pick the best
- Small text tweaks can improve results
Match tag to voice:
- Don't use [shouts] on a whispering voice
- Don't use [whispers] on a loud/energetic voice
- Test tags with your chosen voice
SSML Not Supported
v3 does NOT support SSML break tags. Use audio tags and punctuation instead.
Punctuation Effects (use with tags!)
Punctuation enhances audio tags:
- Ellipses (...) → dramatic pauses: [nervous] I... I don't know...
- CAPS → emphasis: [excited] That's AMAZING!
- Dashes (—) → interruptions: [explaining] So what you do is— [interrupting] Wait!
- Question marks → uncertainty: [nervous] Are you sure about this?
- Exclamation marks → energy boost: [happy] We did it!
Combine tags + punctuation for maximum effect:
[tired] It was a long day... [sighs] Nobody listens anymore.
WhatsApp Voice Messages
Complete Workflow
- Generate with the tts tool (returns MP3)
- Convert to Opus (required for Android!)
- Send with the message tool
Step-by-Step
1. Generate TTS (add [pause] at end to prevent cutoff):
tts text="[excited] This is amazing! [pause]" channel=whatsapp
Returns: MEDIA:/tmp/tts-xxx/voice-123.mp3
2. Convert MP3 → Opus:
ffmpeg -i /tmp/tts-xxx/voice-123.mp3 -c:a libopus -b:a 64k -vbr on -application voip /tmp/tts-xxx/voice-123.ogg
3. Send the Opus file:
Note: The message field below contains a Unicode Left-to-Right Mark (U+200E) between the quotes. This is intentional: WhatsApp requires a non-empty message body to send voice notes. The LTR mark is invisible but satisfies this requirement without displaying any text.
message action=send channel=whatsapp target="+972..." filePath="/tmp/tts-xxx/voice-123.ogg" asVoice=true message=""
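The skill ships its own wrapper (`lib/audio_convert.py`, per the v1.6.0 changelog), but that source is not reproduced here. The following is therefore an independent Python sketch, not that script. It runs the ffmpeg command from step 2 through `subprocess`. `stdin` is closed so that, if the `.ogg` already exists, ffmpeg should exit with an error instead of waiting at its overwrite prompt.

```python
import subprocess
from pathlib import Path
from typing import Optional


def opus_command(mp3_path: str, ogg_path: Optional[str] = None) -> list[str]:
    """Build the ffmpeg command from step 2: libopus at 64 kbps VBR, tuned for VoIP."""
    out = ogg_path or str(Path(mp3_path).with_suffix(".ogg"))
    return [
        "ffmpeg", "-i", mp3_path,
        "-c:a", "libopus", "-b:a", "64k", "-vbr", "on", "-application", "voip",
        out,
    ]


def convert_to_opus(mp3_path: str, ogg_path: Optional[str] = None) -> str:
    """Run ffmpeg (must be on PATH) and return the path of the Opus .ogg file."""
    cmd = opus_command(mp3_path, ogg_path)
    # check=True raises CalledProcessError on failure; DEVNULL stdin avoids an interactive prompt.
    subprocess.run(cmd, check=True, capture_output=True, stdin=subprocess.DEVNULL)
    return cmd[-1]
```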
Why Opus?
| Format | iOS | Android | Transcribe |
|--------|-----|---------|------------|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |
Always convert to Opus - it's the only format that:
- Works on all devices (iOS + Android)
- Supports WhatsApp's transcribe button
Audio Cutoff Fix
ElevenLabs sometimes cuts off the last word. Always add [pause] or ... at the end:
[excited] This is amazing! [pause]
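The cutoff fix can be applied automatically before text reaches the tts tool. A hypothetical Python helper: it appends ` [pause]` unless the text already ends in `[pause]` or an ellipsis.

```python
import re

# Matches a trailing "[pause]" tag or an ellipsis ("..." or "…"), ignoring trailing whitespace.
_TRAILING_PAUSE = re.compile(r"(\[pause\]|\.\.\.|\u2026)\s*$")


def with_trailing_pause(text: str) -> str:
    """Append " [pause]" unless the text already ends with [pause] or an ellipsis."""
    text = text.rstrip()
    if _TRAILING_PAUSE.search(text):
        return text
    return f"{text} [pause]"
```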
Long-Form Audio (Podcasts)
For content >800 chars:
- Split into short segments (<800 chars each)
- Generate each with the tts tool
- Concatenate with ffmpeg:
cat > list.txt << EOF
file '/path/file1.mp3'
file '/path/file2.mp3'
EOF
ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3
- Convert to Opus for WhatsApp
- Send as single voice message
Important: Don't mention "part 2" or "chapter" - keep it seamless.
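The concat step can also be scripted. This Python sketch uses hypothetical helpers that are not part of the skill. It writes the same `list.txt` format as the heredoc and runs the same `ffmpeg -f concat -safe 0 ... -c copy` command. Paths are made absolute because the concat demuxer resolves relative entries against the list file's own directory. A single quote inside a path is escaped as `'\''`.

```python
import subprocess
import tempfile
from pathlib import Path


def concat_list(paths: list[str]) -> str:
    """Render an ffmpeg concat-demuxer list file, one `file '...'` line per segment."""
    lines = []
    for p in paths:
        # Inside single quotes, a literal ' is written as '\'' (close, escaped quote, reopen).
        escaped = p.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_mp3(paths: list[str], output: str) -> None:
    """Join MP3 segments without re-encoding, like `ffmpeg -f concat -safe 0 -i list.txt -c copy`."""
    absolute = [str(Path(p).resolve()) for p in paths]
    with tempfile.TemporaryDirectory() as tmp:
        list_file = Path(tmp) / "list.txt"
        list_file.write_text(concat_list(absolute), encoding="utf-8")
        subprocess.run(
            ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file), "-c", "copy", output],
            check=True,
            capture_output=True,
            stdin=subprocess.DEVNULL,
        )
```

`-c copy` only works when every segment has the same codec parameters. That should hold when all segments come from the same TTS configuration.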
Multi-Speaker Dialogue
v3 can handle multiple characters in one generation:
Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!
Dialogue tags: [interrupting], [overlapping], [cuts in], [interjecting]
Audio Tags Quick Reference
| Category | Tags | When to Use |
|----------|------|-------------|
| Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| Character | [French accent], [British accent], [robotic tone] | Character voice shifts |
| Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |
Most effective tags (reliable results):
- Emotions: [excited], [nervous], [sad], [happy]
- Reactions: [laughs], [sighs], [whispers]
- Pacing: [pause]
Less reliable (test and regenerate):
- Sound effects: [explosion], [gunshot]
- Accents: results vary by voice
Full tag list: See references/audio-tags.md
Troubleshooting
Tags read aloud?
- Verify you are using the eleven_v3 model
- Use IVC/premade voices, not PVC
- Simplify tags (no "tone" suffix)
- Increase text length (250+ chars)
Voice inconsistent?
- Segment is too long - split at <800 chars
- Regenerate (v3 is non-deterministic)
- Try a higher stability setting (Natural or Robust)
WhatsApp won't play?
- Convert to Opus format (see above)
No emotion despite tags?
- Voice may not match tag style
- Try Creative stability mode (0.0)
- Add more context around the tag
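Several of these fixes can be checked before generating. The following is a heuristic Python linter built on stated assumptions. The thresholds (250+ characters, under 800 per segment, at most 2 combined tags, `eleven_v3` as the only tag-capable model) come from this document. The function itself is hypothetical.

```python
import re

# Three or more bracketed tags in a row, optionally separated by whitespace.
_TAG_RUN = re.compile(r"(?:\[[^\[\]]+\]\s*){3,}")


def lint_prompt(text: str, model_id: str = "eleven_v3") -> list[str]:
    """Return warnings for likely causes of the problems above; heuristics, not guarantees."""
    issues = []
    if model_id != "eleven_v3":
        issues.append(f"model {model_id!r}: audio tags need eleven_v3")
    if len(text) < 250:
        issues.append(f"only {len(text)} chars; 250+ gives more consistent tag handling")
    if len(text) > 800:
        issues.append(f"{len(text)} chars; split into segments under 800")
    if _TAG_RUN.search(text):
        issues.append("3+ consecutive tags; keep combinations to 2 tags max")
    return issues
```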
File v2.2.0:_meta.json
{ "ownerId": "kn77700wny92h2kvpav2am1yjx80ewfp", "slug": "elevenlabs-tts", "version": "2.2.0", "publishedAt": 1771087774137 }
File v2.2.0:references/audio-tags.md
Audio Tags Reference
Complete guide to ElevenLabs v3 audio tags.
Prerequisites
- Model: eleven_v3 (alpha) - ONLY this model supports audio tags
- Voice Type: IVC (Instant Voice Clone) or designed voices - PVC not optimized yet
- Prompt Length: 250+ characters for consistent results
- Stability: Creative or Natural mode (Robust reduces tag responsiveness)
Core Principle
Write NATURAL sentences that tags modify, NOT explanations.
❌ WRONG: [excited] אני מתרגש! ("I'm excited!": the sentence only restates the tag)
✅ RIGHT: [excited] זה ממש מדהים מה שעשינו היום! ("What we did today is really amazing!")
Tag Categories
Emotions (High Reliability)
| Tag | Description |
|-----|-------------|
| [excited] | Energy, enthusiasm |
| [happy] | Joy, cheerfulness |
| [happily] | Speaking with happiness |
| [sad] | Sadness, melancholy |
| [sorrowful] | Deep sadness |
| [angry] | Anger, intensity |
| [curious] | Curiosity, interest |
| [nervous] | Nervousness, anxiety |
| [sarcastic] | Sarcasm, irony |
| [tired] | Fatigue, weariness |
| [serious] | Seriousness |
| [confident] | Confidence |
| [frustrated] | Frustration |
| [mischievous] | Playful mischief |
| [awe] | Wonder, amazement |
| [resigned] | Acceptance, giving up |
| [flustered] | Confused embarrassment |
| [casual] | Relaxed, informal |
| [annoyed] | Irritation |
Delivery & Volume (High Reliability)
| Tag | Description |
|-----|-------------|
| [whispers] | Quiet, intimate |
| [shouts] | Loud, intense |
| [dramatic tone] | Theatrical |
| [dramatic] | Dramatic delivery |
| [matter-of-fact] | Plain, factual |
| [whiny] | Complaining tone |
| [flatly] | No emotion |
| [quietly] | Soft voice |
| [suspiciously] | Suspicious tone |
Pacing & Timing (High Reliability)
| Tag | Description |
|-----|-------------|
| [pause] | Brief silence |
| [breathes] | Breathing sound |
| [continues after a beat] | Pause then continue |
| [rushed] | Fast, urgent |
| [slows down] | Decreasing speed |
| [deliberate] | Careful, intentional |
| [rapid-fire] | Very fast |
| [drawn out] | Stretched, slow |
| [stammers] | Stuttering |
| [hesitates] | Uncertainty |
| [timidly] | Shy, tentative |
| [repeats] | Repetition |
Emphasis (Medium Reliability)
| Tag | Description |
|-----|-------------|
| [emphasized] | Strong emphasis |
| [stress on next word] | Emphasize following word |
| [understated] | Downplayed delivery |
Reactions & Sounds (Very High Reliability)
| Tag | Description |
|-----|-------------|
| [laughs] | Laughter |
| [laughs softly] | Gentle laugh |
| [laughs harder] | Increasing laughter |
| [starts laughing] | Beginning to laugh |
| [nervous laugh] | Anxious laughter |
| [giggles] | Small laugh |
| [wheezing] | Breathless laugh |
| [sighs] | Exhale of emotion |
| [sigh] | Single sigh |
| [gasps] | Sharp intake |
| [exhales] | Breathing out |
| [clears throat] | Throat clearing |
| [gulps] | Swallowing |
| [swallows] | Swallowing sound |
| [snorts] | Snorting sound |
| [crying] | Sobbing |
Character & Accents (Medium Reliability)
| Tag | Description |
|-----|-------------|
| [French accent] | French accent |
| [American accent] | American accent |
| [British accent] | British accent |
| [Australian accent] | Australian accent |
| [Southern US accent] | Southern American |
| [strong X accent] | Replace X with accent |
| [pirate voice] | Pirate character |
| [evil scientist voice] | Mad scientist |
| [childlike tone] | Child-like voice |
| [robotic tone] | Robot voice |
| [deep voice] | Lower pitch |
Narrative & Genre (Medium Reliability)
| Tag | Description |
|-----|-------------|
| [storytelling tone] | Narrator voice |
| [voice-over style] | Documentary style |
| [fantasy narrator] | Epic fantasy |
| [sci-fi AI voice] | Futuristic AI |
| [classic film noir] | 1940s detective |
| [epic build-up] | Building intensity |
| [narrative flourish] | Dramatic narration |
Multi-Speaker Dialogue
| Tag | Description |
|-----|-------------|
| [interrupting] | Cutting off speaker |
| [overlapping] | Speaking over |
| [cuts in] | Interjecting |
| [interjecting] | Jumping in |
| [fast-paced] | Quick exchange |
Sound Effects (Low-Medium Reliability)
| Tag | Description |
|-----|-------------|
| [gunshot] | Gun sound |
| [clapping] | Applause |
| [applause] | Audience clapping |
| [explosion] | Blast sound |
| [thunder] | Thunder |
Experimental (Test First)
| Tag | Description |
|-----|-------------|
| [sings] | Singing |
| [woo] | Exclamation |
| [fart] | Sound effect |
| [panicked] | Panic |
| [trembling] | Shaking voice |
Usage Guidelines
✅ DO:
- Use simple tags: [excited] not [excited tone]
- Write natural sentences that work without tags
- Use 2-4 tags per paragraph max
- Place tags at sentence start or key moment
- Match tags to voice character
- Test and regenerate (v3 is non-deterministic)
- Combine tags: [whispering][pause] Did you hear that?
❌ DON'T:
- Don't add "tone" suffix: [serious tone] ❌
- Don't overload with tags
- Don't explain what the tag does
- Don't use incompatible combos (whisper voice + shout tag)
- Don't expect consistency (regenerate if needed)
Examples
Emotional Monologue
[sighs] I've been thinking about what you said. [pause]
And you're right. [sadly] I should have listened earlier.
[determined] But I'm going to fix this. Starting now.
Multi-Character Dialogue
Sarah: [whispers] I think someone's coming.
Mike: [interrupting] —I heard it too! [panicked] Hide!
Sarah: [annoyed] I was TRYING to tell you that!
Comedic Timing
[confident] So I walked up to the boss and said...
[pause] [nervous laugh] Actually, I didn't say anything.
[sighs] I just stood there. [laughs] Classic me.
Accent Performance
[British accent] Terribly sorry, but I must insist.
[switches to Southern US accent] Well now, that's mighty kind of y'all.
[French accent] Mon ami, you simply must try ze croissant!
Troubleshooting
Tags being read aloud?
- Check you're using eleven_v3 (not turbo_v3 or v2.5)
- Use IVC/designed voices, not PVC
- Simplify tags (remove "tone", "sound", etc.)
- Increase prompt length (250+ chars)
Tags not working?
- Generate multiple times (v3 is variable)
- Use Creative or Natural stability (not Robust)
- Add surrounding context text
- Try different tag placement
- Voice may not match tag style
Multi-speaker not distinct?
- Add character cues: [deep voice], [higher pitch]
- Use accent tags for differentiation
- Add emotional contrast between speakers
Archive v2.1.0: 4 files, 10870 bytes
Files: lib/audio_convert.py (4003b), references/audio-tags.md (7739b), SKILL.md (11540b), _meta.json (133b)
File v2.1.0:SKILL.md
name: elevenlabs-tts
description: ElevenLabs TTS (Text-to-Speech) with emotional audio tags for expressive voice synthesis. WhatsApp-compatible voice messages with Opus conversion. Supports 70+ languages, Hebrew with selective nikud, multi-speaker dialogue, and singing. Includes audio converter utility.
tags: [elevenlabs, tts, voice, text-to-speech, audio, speech, whatsapp, multilingual, ai-voice, hebrew, nikud, singing]
allowed-tools: [tts, message, exec]
ElevenLabs TTS (Text-to-Speech)
Generate expressive voice messages using ElevenLabs v3 with audio tags.
Quick Start Examples
Storytelling (emotional journey):
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
Horror/Suspense (building dread):
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
Conversation with reactions:
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
Hebrew (romantic moment - selective nikud only where needed):
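[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] אַתְּ יודעת שאני אוהב אותָךְ, נכון?
(English: She stood there, facing the sunset... My heart was pounding so hard. I didn't know what to say. I... You know I love you, right?)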
Spanish (celebration to reflection):
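[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
(English: We did it! I can't believe it... It took so many years of work. Thank you to everyone who believed in me. Every moment was worth it.)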
Configuration (OpenClaw)
In openclaw.json, configure TTS under messages.tts:
{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "YOUR_VOICE_ID",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
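If you would rather not paste the key by hand, you could generate this block from an environment variable. This is a minimal sketch, not part of the skill: the ELEVENLABS_API_KEY variable name and the build_tts_config helper are our own assumptions, and the key names are simply copied from the JSON above. Whether OpenClaw can read environment variables directly is not covered here.

```python
import json
import os


def build_tts_config(voice_id: str, api_key: str, language_code: str = "en") -> dict:
    """Return a dict shaped like the messages.tts block above (hypothetical helper)."""
    return {
        "messages": {
            "tts": {
                "provider": "elevenlabs",
                "elevenlabs": {
                    "apiKey": api_key,
                    "voiceId": voice_id,
                    "modelId": "eleven_v3",  # the only model with audio-tag support
                    "languageCode": language_code,
                    "voiceSettings": {
                        "stability": 0.5,
                        "similarityBoost": 0.75,
                        "style": 0,
                        "useSpeakerBoost": True,
                        "speed": 1,
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    # ELEVENLABS_API_KEY is an assumed variable name; use whatever your setup provides.
    key = os.environ.get("ELEVENLABS_API_KEY", "sk_your_api_key_here")
    print(json.dumps(build_tts_config("YOUR_VOICE_ID", key), indent=2))
```

Redirect the printed JSON into the messages section of your openclaw.json, or merge it with an existing file.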
Getting your API Key:
- Go to https://elevenlabs.io
- Sign up/login
- Click profile → API Keys
- Copy your key
Recommended Voices for v3
These premade voices are optimized for v3 and work well with audio tags:
| Voice | ID | Gender | Accent | Best For |
|-------|-----|--------|--------|----------|
| Adam | pNInz6obpgDQGcFmaJgB | Male | American | Deep narration, general use |
| Rachel | 21m00Tcm4TlvDq8ikWAM | Female | American | Calm narration, conversational |
| Brian | nPczCjzI2devNBz1zQrb | Male | American | Deep narration, podcasts |
| Charlotte | XB0fDUnXU5powFXDhCwa | Female | English-Swedish | Expressive, video games |
| George | JBFqnCBsd6RMkjVDRZzb | Male | British | Raspy narration, storytelling |
Finding more voices:
- Browse: https://elevenlabs.io/voice-library
- v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH
- API: GET https://api.elevenlabs.io/v1/voices
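The voices endpoint can also be queried from a script. The sketch below uses only the standard library and assumes two things taken from ElevenLabs' public API reference rather than from this skill: authentication via the xi-api-key header, and a response shaped like {"voices": [{"voice_id": ..., "name": ..., "category": ...}]}. Check the current API docs before relying on these field names. fetch_voices and summarize_voices are hypothetical helpers.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def fetch_voices(api_key: str) -> dict:
    """GET the voice list; the xi-api-key header is assumed per ElevenLabs' API docs."""
    req = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def summarize_voices(payload: dict) -> list[tuple[str, str, str]]:
    """Extract (name, voice_id, category) from a /v1/voices response body."""
    return [
        (v.get("name", ""), v.get("voice_id", ""), v.get("category", ""))
        for v in payload.get("voices", [])
    ]
```

For example, iterate over summarize_voices(fetch_voices(key)) and print each tuple. Keeping the parsing separate from the network call makes the parsing easy to test offline.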
Voice selection tips:
- Use IVC (Instant Voice Clone) or premade voices - PVC not optimized for v3 yet
- Match voice character to your use case (whispering voice won't shout well)
- For expressive IVCs, include varied emotional tones in training samples
Model Settings
- Model: eleven_v3 (alpha) - the ONLY model supporting audio tags
- Languages: 70+ supported with full audio tag control
Stability Modes
v3 only accepts three values: 0.0, 0.5, 1.0
| Mode | Value | Description |
|------|-------|-------------|
| Creative | 0.0 | Most emotional/expressive, best for singing, may hallucinate |
| Natural | 0.5 | Balanced, closest to original voice |
| Robust | 1.0 | Highly stable, less responsive to tags |
For audio tags, use Creative (0.0) or Natural (0.5). Robust reduces tag responsiveness.
Speed Control
Range: 0.7 (slow) to 1.2 (fast), default 1.0
Extreme values affect quality. For pacing, prefer audio tags like [rushed] or [drawn out].
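Because v3 accepts only three stability values and speed has a fixed range, it can help to normalize settings before writing them into the config. This is a sketch with a hypothetical normalize_voice_settings helper. Snapping to the nearest allowed value (ties go to the lower, more expressive mode) and clamping speed are our own choices; the skill does not say how the API treats out-of-range values.

```python
ALLOWED_STABILITY = (0.0, 0.5, 1.0)  # Creative, Natural, Robust
SPEED_MIN, SPEED_MAX = 0.7, 1.2


def normalize_voice_settings(settings: dict) -> dict:
    """Return a copy of voiceSettings that respects the v3 limits described above."""
    out = dict(settings)
    stability = float(out.get("stability", 0.5))
    # Snap to the nearest allowed value; min() keeps the first tie, i.e. the lower mode.
    out["stability"] = min(ALLOWED_STABILITY, key=lambda v: abs(v - stability))
    speed = float(out.get("speed", 1.0))
    # Clamp into the documented 0.7-1.2 range.
    out["speed"] = max(SPEED_MIN, min(SPEED_MAX, speed))
    return out
```

For instance, {"stability": 0.3, "speed": 1.5} becomes stability 0.5 and speed 1.2, while other keys such as similarityBoost pass through unchanged.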
Hebrew Nikud (Vowel Points)
Use nikud selectively - only on words where pronunciation is ambiguous. Full nikud on every word can degrade quality.
The rule: only add nikud where the model might guess wrong.
Common cases where nikud helps:
- Gender suffixes - שלומֵךְ (f) vs שלומְךָ (m), לָךְ (f) vs לְךָ (m), אותָךְ (f) vs אותְךָ (m)
- Dagesh (hard/soft consonants) - letters בכפ change sound with dagesh:
- פּ = P, פ = F: פִּיצה (pizza), פִּייר (Pierre)
- בּ = B, ב = V: בְּרָכָה (brakha), בְּדִיוּק (bediyuk)
- כּ = K, כ = Kh: כּוֹס (kos), כַּמָּה (kama)
- Homographs - same spelling, different meaning/pronunciation:
- בּוֹקֶר (morning) vs בּוֹקֵר (cowboy)
- עוֹלָם (world) vs עוֹלֵם (concealing)
- סֵפֶר (book) vs סָפַר (counted)
- Foreign names and loanwords - the model often guesses wrong
- Stress placement - when it changes meaning or sounds unnatural
When NOT to add nikud:
- Common words with only one pronunciation (מה, יש, הרבה, שלום, אני, הוא, etc.)
- Context makes pronunciation obvious
- Most of the sentence - keep it clean
Example ("How are you? You have a lot of money." and "Jean-Pierre baked a perfect pizza."):
❌ Full nikud: מַה שְׁלוֹמְךָ? יֵשׁ לְךָ הַרְבֵּה כֶּסֶף.
✅ Selective: מה שלומְךָ? יש לְךָ הרבה כסף.
✅ Dagesh: ז'אן-פִּייר אפה פִּיצה מושלמת.
Principle: If you read the word and there's only one way to say it - skip the nikud. If there's ambiguity - add it.
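To check whether a Hebrew prompt has drifted toward full nikud, you could measure how many words carry vowel points. This is a rough heuristic sketch; nikud_ratio and looks_fully_pointed are hypothetical helpers. They treat U+05B0 to U+05BC (vowel points and dagesh) plus the shin/sin dots and qamats qatan as nikud. The code cannot tell whether a given mark is needed; a ratio close to 1.0 only suggests the fully pointed style the guidance above advises against, and the 0.6 threshold is arbitrary.

```python
import re

# Runs of characters from the Hebrew Unicode block (letters, points, Hebrew punctuation).
HEBREW_WORD = re.compile(r"[\u0590-\u05FF]+")
# Vowel points and dagesh (U+05B0-U+05BC), shin/sin dots (U+05C1, U+05C2), qamats qatan (U+05C7).
NIKUD = re.compile(r"[\u05B0-\u05BC\u05C1\u05C2\u05C7]")


def nikud_ratio(text: str) -> float:
    """Fraction of Hebrew words carrying at least one nikud mark (0.0 if none)."""
    words = HEBREW_WORD.findall(text)
    if not words:
        return 0.0
    pointed = sum(1 for word in words if NIKUD.search(word))
    return pointed / len(words)


def looks_fully_pointed(text: str, threshold: float = 0.6) -> bool:
    """Heuristic flag for 'full nikud' text; the threshold is an arbitrary choice."""
    return nikud_ratio(text) >= threshold
```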
Critical Rules
Length Limits
- Optimal: <800 characters per segment (best quality)
- Maximum: 10,000 characters (API hard limit)
- Quality degrades with longer text - voice becomes inconsistent
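Keeping each segment under 800 characters is easy to automate. This is a minimal sketch, assuming sentence boundaries (., !, ?, …) are acceptable split points; split_segments is a hypothetical helper, and greedy packing is our choice, not something the skill specifies. A sentence longer than the limit falls back to word boundaries, and a single word longer than the limit is left intact. Sentences are rejoined with single spaces, so line breaks (as in dialogue scripts) are not preserved.

```python
import re

# Split after ., !, ? or … when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def split_segments(text: str, limit: int = 800) -> list[str]:
    """Greedily pack whole sentences into segments of at most `limit` characters."""
    segments: list[str] = []
    current = ""
    for sentence in SENTENCE_END.split(text.strip()):
        # Over-long sentences fall back to word boundaries.
        pieces = _hard_split(sentence, limit) if len(sentence) > limit else [sentence]
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if len(candidate) <= limit:
                current = candidate
            else:
                if current:
                    segments.append(current)
                current = piece
    if current:
        segments.append(current)
    return segments


def _hard_split(sentence: str, limit: int) -> list[str]:
    """Break one long sentence at spaces; a single word over `limit` stays whole."""
    chunks: list[str] = []
    buf = ""
    for word in sentence.split():
        trial = f"{buf} {word}".strip()
        if buf and len(trial) > limit:
            chunks.append(buf)
            buf = word
        else:
            buf = trial
    if buf:
        chunks.append(buf)
    return chunks
```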
Audio Tags - Best Practices for Natural Sound
How many tags to use:
- 1-2 tags per sentence or phrase (not more!)
- Tags persist until the next tag - no need to repeat
- Overusing tags sounds unnatural and robotic
Where to place tags:
- At emotional transition points
- Before key dramatic moments
- When energy/pace changes
Context matters:
- Write text that matches the tag emotion
- Longer text with context = better interpretation
- Example: "[nervous] I... I'm not sure about this. What if it doesn't work?" works better than "[nervous] Hello."
Combine tags for nuance:
- [nervously][whispers] = nervous whispering
- [excited][laughs] = excited laughter
- Keep combinations to 2 tags max
Regenerate for best results:
- v3 is non-deterministic - same text = different outputs
- Generate 3+ versions, pick the best
- Small text tweaks can improve results
Match tag to voice:
- Don't use [shouts] on a whispering voice
- Don't use [whispers] on a loud/energetic voice
- Test tags with your chosen voice
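The tag-density advice can be checked mechanically before generating. This is a sketch of a hypothetical lint_tags helper that flags sentences with more than two tags and tags ending in "tone" or "sound". The sentence splitting is naive, and because the Audio Tags Quick Reference lists a few "tone" tags (such as [robotic tone]), treat its output as hints rather than errors.

```python
import re

TAG = re.compile(r"\[([^\[\]]+)\]")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def lint_tags(text: str, max_per_sentence: int = 2) -> list[str]:
    """Return warnings for tag overuse and 'tone'/'sound' suffixes."""
    warnings = []
    for i, sentence in enumerate(SENTENCE_END.split(text.strip()), start=1):
        tags = TAG.findall(sentence)
        if len(tags) > max_per_sentence:
            warnings.append(f"sentence {i}: {len(tags)} tags (max {max_per_sentence})")
        for tag in tags:
            # The guidelines say to drop descriptive suffixes like "tone".
            if tag.endswith((" tone", " sound")):
                warnings.append(f"sentence {i}: simplify [{tag}]")
    return warnings
```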
SSML Not Supported
v3 does NOT support SSML break tags. Use audio tags and punctuation instead.
Punctuation Effects (use with tags!)
Punctuation enhances audio tags:
- Ellipses (...) → dramatic pauses: [nervous] I... I don't know...
- CAPS → emphasis: [excited] That's AMAZING!
- Dashes (—) → interruptions: [explaining] So what you do is— [interrupting] Wait!
- Question marks → uncertainty: [nervous] Are you sure about this?
- Exclamation marks → energy boost: [happy] We did it!
Combine tags + punctuation for maximum effect:
[tired] It was a long day... [sighs] Nobody listens anymore.
WhatsApp Voice Messages
Complete Workflow
- Generate with the tts tool (returns MP3)
- Convert to Opus (required for Android!)
- Send with the message tool
Step-by-Step
1. Generate TTS (add [pause] at end to prevent cutoff):
tts text="[excited] This is amazing! [pause]" channel=whatsapp
Returns: MEDIA:/tmp/tts-xxx/voice-123.mp3
2. Convert MP3 → Opus using the included converter:
python3 lib/audio_convert.py convert /tmp/tts-xxx/voice-123.mp3 /tmp/tts-xxx/voice-123.ogg
3. Send the Opus file:
message action=send channel=whatsapp target="+972..." filePath="/tmp/tts-xxx/voice-123.ogg" asVoice=true message=""
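If you script step 2, the output path can be derived from the tts result. This sketch assumes the tool's result is exactly the MEDIA:&lt;path&gt; string shown above and that commands run from the skill's root directory, so that lib/audio_convert.py resolves; convert_command is a hypothetical helper.

```python
import shlex
from pathlib import PurePosixPath


def convert_command(tts_result: str) -> list[str]:
    """Turn 'MEDIA:/path/voice.mp3' into the converter call that writes a sibling .ogg."""
    prefix = "MEDIA:"
    path = tts_result.strip()
    if path.startswith(prefix):
        path = path[len(prefix):]
    mp3 = PurePosixPath(path)
    ogg = mp3.with_suffix(".ogg")
    return ["python3", "lib/audio_convert.py", "convert", str(mp3), str(ogg)]


if __name__ == "__main__":
    cmd = convert_command("MEDIA:/tmp/tts-xxx/voice-123.mp3")
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would execute it from the skill directory.
```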
Why Opus?
| Format | iOS | Android | Transcribe |
|--------|-----|---------|------------|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |
Always convert to Opus - it's the only format that:
- Works on all devices (iOS + Android)
- Supports WhatsApp's transcribe button
Audio Cutoff Fix
ElevenLabs sometimes cuts off the last word. Always add [pause] or ... at the end:
[excited] This is amazing! [pause]
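A small guard can apply this rule automatically. ensure_trailing_pause is a hypothetical helper; treating an existing trailing [pause] or ellipsis as sufficient follows the advice above.

```python
def ensure_trailing_pause(text: str) -> str:
    """Append [pause] so the last word isn't clipped, unless a pause is already there."""
    stripped = text.rstrip()
    if stripped.endswith(("[pause]", "...", "…")):
        return stripped
    return f"{stripped} [pause]"
```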
Long-Form Audio (Podcasts)
For content >800 chars:
- Split into short segments (<800 chars each)
- Generate each with the tts tool
- Concatenate using the included converter: python3 lib/audio_convert.py concat /tmp/final.mp3 /tmp/part1.mp3 /tmp/part2.mp3
- Convert to Opus for WhatsApp: python3 lib/audio_convert.py convert /tmp/final.mp3 /tmp/final.ogg
- Send as a single voice message
Important: Don't mention "part 2" or "chapter" - keep it seamless.
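The concatenate and convert steps can be driven from Python with subprocess. This sketch only rebuilds the two commands listed above. We have not seen the source of lib/audio_convert.py, so the argument order (output file first for concat) is copied from the example, and longform_commands and run_all are hypothetical helpers.

```python
import subprocess

CONVERTER = ["python3", "lib/audio_convert.py"]


def longform_commands(parts: list[str], final_mp3: str, final_ogg: str) -> list[list[str]]:
    """Build the concat-then-convert commands for a list of MP3 segments."""
    if not parts:
        raise ValueError("need at least one MP3 segment")
    return [
        CONVERTER + ["concat", final_mp3, *parts],
        CONVERTER + ["convert", final_mp3, final_ogg],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them lets you print or log them before anything touches the filesystem.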
Multi-Speaker Dialogue
v3 can handle multiple characters in one generation:
Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!
Dialogue tags: [interrupting], [overlapping], [cuts in], [interjecting]
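When a script assembles the dialogue, a tiny formatter keeps the "Speaker: [tag] text" layout consistent. format_dialogue is a hypothetical helper; it only mirrors the line format in the example above and does nothing model-specific.

```python
def format_dialogue(lines: list[tuple[str, str, str]]) -> str:
    """Render (speaker, tag, text) triples as 'Speaker: [tag] text' lines."""
    rendered = []
    for speaker, tag, text in lines:
        prefix = f"[{tag}] " if tag else ""  # an empty tag means no audio tag
        rendered.append(f"{speaker}: {prefix}{text}")
    return "\n".join(rendered)
```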
Audio Tags Quick Reference
| Category | Tags | When to Use |
|----------|------|-------------|
| Emotions | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| Delivery | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| Reactions | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| Pacing | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| Character | [French accent], [British accent], [robotic tone] | Character voice shifts |
| Dialogue | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |
Most effective tags (reliable results):
- Emotions: [excited], [nervous], [sad], [happy]
- Reactions: [laughs], [sighs], [whispers]
- Pacing: [pause]
Less reliable (test and regenerate):
- Sound effects: [explosion], [gunshot]
- Accents: results vary by voice
Full tag list: See references/audio-tags.md
Troubleshooting
Tags read aloud?
- Verify that you're using the eleven_v3 model
- Use IVC/premade voices, not PVC
- Simplify tags (no "tone" suffix)
- Increase text length (250+ chars)
Voice inconsistent?
- Segment is too long - split at <800 chars
- Regenerate (v3 is non-deterministic)
- Try lower stability setting
WhatsApp won't play?
- Convert to Opus format (see above)
No emotion despite tags?
- Voice may not match tag style
- Try Creative stability mode (0.0)
- Add more context around the tag
File v2.1.0:_meta.json
{ "ownerId": "kn77700wny92h2kvpav2am1yjx80ewfp", "slug": "elevenlabs-tts", "version": "2.1.0", "publishedAt": 1770644808695 }
File v2.1.0:references/audio-tags.md
Audio Tags Reference
Complete guide to ElevenLabs v3 audio tags.
Prerequisites
- Model: eleven_v3 (alpha) - ONLY this model supports audio tags
- Voice Type: IVC (Instant Voice Clone) or designed voices - PVC not optimized yet
- Prompt Length: 250+ characters for consistent results
- Stability: Creative or Natural mode (Robust reduces tag responsiveness)
Core Principle
Write NATURAL sentences that tags modify, NOT explanations.
❌ WRONG: [excited] אני מתרגש! ("I'm excited!", which only restates the tag)
✅ RIGHT: [excited] זה ממש מדהים מה שעשינו היום! ("What we did today is really amazing!")
Tag Categories
Emotions (High Reliability)
| Tag | Description |
|-----|-------------|
| [excited] | Energy, enthusiasm |
| [happy] | Joy, cheerfulness |
| [happily] | Speaking with happiness |
| [sad] | Sadness, melancholy |
| [sorrowful] | Deep sadness |
| [angry] | Anger, intensity |
| [curious] | Curiosity, interest |
| [nervous] | Nervousness, anxiety |
| [sarcastic] | Sarcasm, irony |
| [tired] | Fatigue, weariness |
| [serious] | Seriousness |
| [confident] | Confidence |
| [frustrated] | Frustration |
| [mischievous] | Playful mischief |
| [awe] | Wonder, amazement |
| [resigned] | Acceptance, giving up |
| [flustered] | Confused embarrassment |
| [casual] | Relaxed, informal |
| [annoyed] | Irritation |
Delivery & Volume (High Reliability)
| Tag | Description |
|-----|-------------|
| [whispers] | Quiet, intimate |
| [shouts] | Loud, intense |
| [dramatic tone] | Theatrical |
| [dramatic] | Dramatic delivery |
| [matter-of-fact] | Plain, factual |
| [whiny] | Complaining tone |
| [flatly] | No emotion |
| [quietly] | Soft voice |
| [suspiciously] | Suspicious tone |
Pacing & Timing (High Reliability)
| Tag | Description |
|-----|-------------|
| [pause] | Brief silence |
| [breathes] | Breathing sound |
| [continues after a beat] | Pause then continue |
| [rushed] | Fast, urgent |
| [slows down] | Decreasing speed |
| [deliberate] | Careful, intentional |
| [rapid-fire] | Very fast |
| [drawn out] | Stretched, slow |
| [stammers] | Stuttering |
| [hesitates] | Uncertainty |
| [timidly] | Shy, tentative |
| [repeats] | Repetition |
Emphasis (Medium Reliability)
| Tag | Description |
|-----|-------------|
| [emphasized] | Strong emphasis |
| [stress on next word] | Emphasize following word |
| [understated] | Downplayed delivery |
Reactions & Sounds (Very High Reliability)
| Tag | Description |
|-----|-------------|
| [laughs] | Laughter |
| [laughs softly] | Gentle laugh |
| [laughs harder] | Increasing laughter |
| [starts laughing] | Beginning to laugh |
| [nervous laugh] | Anxious laughter |
| [giggles] | Small laugh |
| [wheezing] | Breathless laugh |
| [sighs] | Exhale of emotion |
| [sigh] | Single sigh |
| [gasps] | Sharp intake |
| [exhales] | Breathing out |
| [clears throat] | Throat clearing |
| [gulps] | Swallowing |
| [swallows] | Swallowing sound |
| [snorts] | Snorting sound |
| [crying] | Sobbing |
Character & Accents (Medium Reliability)
| Tag | Description |
|-----|-------------|
| [French accent] | French accent |
| [American accent] | American accent |
| [British accent] | British accent |
| [Australian accent] | Australian accent |
| [Southern US accent] | Southern American |
| [strong X accent] | Replace X with accent |
| [pirate voice] | Pirate character |
| [evil scientist voice] | Mad scientist |
| [childlike tone] | Child-like voice |
| [robotic tone] | Robot voice |
| [deep voice] | Lower pitch |
Narrative & Genre (Medium Reliability)
| Tag | Description |
|-----|-------------|
| [storytelling tone] | Narrator voice |
| [voice-over style] | Documentary style |
| [fantasy narrator] | Epic fantasy |
| [sci-fi AI voice] | Futuristic AI |
| [classic film noir] | 1940s detective |
| [epic build-up] | Building intensity |
| [narrative flourish] | Dramatic narration |
Multi-Speaker Dialogue
| Tag | Description |
|-----|-------------|
| [interrupting] | Cutting off speaker |
| [overlapping] | Speaking over |
| [cuts in] | Interjecting |
| [interjecting] | Jumping in |
| [fast-paced] | Quick exchange |
Sound Effects (Low-Medium Reliability)
| Tag | Description |
|-----|-------------|
| [gunshot] | Gun sound |
| [clapping] | Applause |
| [applause] | Audience clapping |
| [explosion] | Blast sound |
| [thunder] | Thunder |
Experimental (Test First)
| Tag | Description |
|-----|-------------|
| [sings] | Singing |
| [woo] | Exclamation |
| [fart] | Sound effect |
| [panicked] | Panic |
| [trembling] | Shaking voice |
Usage Guidelines
✅ DO:
- Use simple tags: [excited], not [excited tone]
- Write natural sentences that work without tags
- Use 2-4 tags per paragraph max
- Place tags at sentence start or key moment
- Match tags to voice character
- Test and regenerate (v3 is non-deterministic)
- Combine tags: [whispering][pause] Did you hear that?
❌ DON'T:
- Don't add a "tone" suffix: [serious tone] ❌
- Don't overload with tags
- Don't explain what the tag does
- Don't use incompatible combos (whisper voice + shout tag)
- Don't expect consistency (regenerate if needed)
Examples
Emotional Monologue
[sighs] I've been thinking about what you said. [pause]
And you're right. [sadly] I should have listened earlier.
[determined] But I'm going to fix this. Starting now.
Multi-Character Dialogue
Sarah: [whispers] I think someone's coming.
Mike: [interrupting] —I heard it too! [panicked] Hide!
Sarah: [annoyed] I was TRYING to tell you that!
Comedic Timing
[confident] So I walked up to the boss and said...
[pause] [nervous laugh] Actually, I didn't say anything.
[sighs] I just stood there. [laughs] Classic me.
Accent Performance
[British accent] Terribly sorry, but I must insist.
[switches to Southern US accent] Well now, that's mighty kind of y'all.
[French accent] Mon ami, you simply must try ze croissant!
Singing
The [singing] tag can produce melodic intonation. Results are inconsistent - v3 is a TTS model, not a music model.
Format (tag on its own line before lyrics):
[singing]
Oh Tommy boy, the pipes the pipes are calling,
from glen to glen and down the mountain side.
Best settings for singing:
- Stability: Creative (0.0) - most expressive, best for singing
- Voice: Use v3-optimized premade voices (Adam, Charlotte, etc.)
- Language: English works best; Hebrew is less reliable for singing
- Non-deterministic: Generate multiple times - each result is different
Tips:
- Put [singing] on its own line before the lyrics
- Use known songs the model might recognize
- Stack with emotion: put the emotion tag (e.g. [happy]) on its own line, [singing] on the next line, then the lyrics
- Keep lyrics short per generation
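The tag-on-its-own-line layout is simple to generate. This is a sketch with a hypothetical singing_prompt helper that places an optional emotion tag and [singing] on separate lines above the lyrics, following the tips above.

```python
def singing_prompt(lyrics: str, emotion: str = "") -> str:
    """Stack an optional emotion tag and [singing] on their own lines above the lyrics."""
    lines = []
    if emotion:
        lines.append(f"[{emotion}]")
    lines.append("[singing]")
    lines.append(lyrics.strip())
    return "\n".join(lines)
```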
Limitations:
- Not real singing with full melody - more like melodic speech
- Results vary heavily by voice and generation
- For actual music generation, use Suno or Udio
Troubleshooting
Tags being read aloud?
- Check that you're using eleven_v3 (not turbo_v3 or v2.5)
- Use IVC/designed voices, not PVC
- Simplify tags (remove "tone", "sound", etc.)
- Increase prompt length (250+ chars)
Tags not working?
- Generate multiple times (v3 is variable)
- Use Creative or Natural stability (not Robust)
- Add surrounding context text
- Try different tag placement
- Voice may not match tag style
Multi-speaker not distinct?
- Add character cues: [deep voice], [higher pitch]
- Use accent tags for differentiation
- Add emotional contrast between speakers
API & Reliability
Machine endpoints, contract coverage, trust signals, runtime metrics, benchmarks, and guardrails for agent-to-agent use.
Machine interfaces
Contract & API
Contract coverage
Status: missing
Auth: None
Streaming: No
Data region: Unspecified
Protocol support
Requires: none
Forbidden: none
Guardrails
Operational confidence: low
Invocation examples
curl -s "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/snapshot"
curl -s "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/contract"
curl -s "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust"
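The Invocation Guide in the Machine Appendix publishes a retry policy: 3 attempts, 500/1500/3500 ms backoff, and retries on HTTP 429, HTTP 503 and network timeouts. Below is a standard-library sketch of a client that follows it. fetch_with_retry is hypothetical, and two interpretations are ours: each backoffMs entry is the wait after the n-th failed attempt (so with three attempts the 3500 ms value is never used), and NETWORK_TIMEOUT maps to Python socket timeouts. The opener and sleep parameters exist only so the logic can be tested without network access.

```python
import socket
import time
import urllib.error
import urllib.request

MAX_ATTEMPTS = 3
BACKOFF_S = [0.5, 1.5, 3.5]  # backoffMs from the retry policy, converted to seconds
RETRYABLE_STATUS = {429, 503}


def _is_timeout(err: BaseException) -> bool:
    """True for socket timeouts, including ones wrapped in a URLError."""
    if isinstance(err, urllib.error.URLError):
        err = err.reason
    return isinstance(err, (TimeoutError, socket.timeout))


def fetch_with_retry(url, opener=urllib.request.urlopen, sleep=time.sleep) -> bytes:
    """GET url, retrying retryable failures with the published backoff schedule."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            with opener(url, timeout=10) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:  # must precede the broader OSError clause
            retryable, last = err.code in RETRYABLE_STATUS, err
        except OSError as err:  # URLError, socket.timeout, TimeoutError
            retryable, last = _is_timeout(err), err
        if not retryable or attempt == MAX_ATTEMPTS:
            raise last
        sleep(BACKOFF_S[attempt - 1])  # wait after the n-th failure
    raise RuntimeError("MAX_ATTEMPTS must be at least 1")
```

For example, fetch_with_retry("https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust") returns the raw response bytes, which json.loads can parse.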
Operational fit
Reliability & Benchmarks
Trust signals
Handshake: UNKNOWN
Confidence: unknown
Attempts 30d: unknown
Fallback rate: unknown
Runtime metrics
Observed P50: unknown
Observed P95: unknown
Rate limit: unknown
Estimated cost: unknown
Do not use if
Machine Appendix
Raw contract, invocation, trust, capability, facts, and change-event payloads for machine-side inspection.
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": []
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "CLAWHUB",
"generatedAt": "2026-04-17T02:55:20.513Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [],
"flattenedTokens": ""
}
Facts JSON
[
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Clawhub",
"href": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
"sourceUrl": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T00:45:39.800Z",
"isPublic": true
},
{
"factKey": "traction",
"category": "adoption",
"label": "Adoption signal",
"value": "4.5K downloads",
"href": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
"sourceUrl": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T00:45:39.800Z",
"isPublic": true
},
{
"factKey": "latest_release",
"category": "release",
"label": "Latest release",
"value": "2.2.0",
"href": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
"sourceUrl": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
"sourceType": "release",
"confidence": "medium",
"observedAt": "2026-02-14T16:49:34.137Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-shaharsha-elevenlabs-tts/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "release",
"title": "Release 2.2.0",
"description": "Security scan fixes",
"href": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
"sourceUrl": "https://clawhub.ai/Shaharsha/elevenlabs-tts",
"sourceType": "release",
"confidence": "medium",
"observedAt": "2026-02-14T16:49:34.137Z",
"isPublic": true
}
]