Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Crawler Summary
Give your agent the ability to speak to you real-time. Talk to your Claude! Local TTS, text-to-speech, voice synthesis, audio generation with voice cloning on Apple Silicon. Use for reading articles aloud, audiobook narration, or voice responses. Runs entirely on-device via MLX - private, no API keys. Capability contract not published. No trust telemetry is available yet. 6 GitHub stars reported by the source. Last updated 4/15/2026.
Freshness
Last checked 4/15/2026
Best For
speak-tts is best for general automation workflows where OpenClaw compatibility matters.
Not Ideal For
Workflows that require deterministic execution, since contract metadata is missing or unavailable.
Evidence Sources Checked
editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack
Public facts
5
Change events
1
Artifacts
0
Freshness
Apr 15, 2026
Capability contract not published. No trust telemetry is available yet. 6 GitHub stars reported by the source. Last updated 4/15/2026.
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Apr 15, 2026
Vendor
Emzod
Artifacts
0
Benchmarks
0
Last release
Unpublished
Key links, install path, and a quick operational read before the deeper crawl record.
Summary
Capability contract not published. No trust telemetry is available yet. 6 GitHub stars reported by the source. Last updated 4/15/2026.
Setup snapshot
git clone https://github.com/EmZod/speak.git
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Emzod
Protocol compatibility
OpenClaw
Adoption signal
6 GitHub stars
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
6
Snippets
0
Languages
typescript
Parameters
bash
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
bash
speak article.txt    # → ~/Audio/speak/article.wav (no playback)
speak "Hello"        # → ~/Audio/speak/speak_<timestamp>.wav
bash
mkdir -p ~/.chatter/voices/
mkdir -p ~/Audio/custom/
bash
# -d = use default microphone
# Recording starts immediately and stops after 25 seconds
sox -d -r 24000 -c 1 ~/.chatter/voices/my_voice.wav trim 0 25
bash
# From MP3
ffmpeg -i voice.mp3 -ar 24000 -ac 1 voice.wav
# From M4A (QuickTime)
ffmpeg -i voice.m4a -ar 24000 -ac 1 voice.wav
# Trim to 25 seconds
ffmpeg -i long.wav -t 25 -ar 24000 -ac 1 trimmed.wav
# Check sample properties
ffprobe -i voice.wav 2>&1 | grep -E "Duration|Stream"
# Should show: Duration ~15-25s, 24000 Hz, mono
bash
# Create directory
mkdir -p ~/.chatter/voices/
# Move sample
mv voice.wav ~/.chatter/voices/my_voice.wav
# Test
speak "Testing my voice" --voice ~/.chatter/voices/my_voice.wav --stream
# Use for content
speak notes.txt --voice ~/.chatter/voices/my_voice.wav --output presentation.wav
Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB OPENCLEW
Editorial quality
ready
Give your agent the ability to speak to you real-time. Talk to your Claude! Local TTS, text-to-speech, voice synthesis, audio generation with voice cloning on Apple Silicon. Use for reading articles aloud, audiobook narration, or voice responses. Runs entirely on-device via MLX - private, no API keys.
Give your agent the ability to speak to you real-time. Local text-to-speech, voice cloning, and audio generation on Apple Silicon.
| Requirement | Check | Install |
|-------------|-------|---------|
| Apple Silicon Mac | uname -m → arm64 | Intel not supported |
| macOS 12.0+ | sw_vers | - |
| sox | which sox | brew install sox |
| ffmpeg | which ffmpeg | brew install ffmpeg |
| poppler (PDF) | which pdftotext | brew install poppler |
| Source | Example |
|--------|---------|
| Text file | speak article.txt |
| Markdown | speak doc.md |
| Direct string | speak "Hello" |
| Clipboard | pbpaste \| speak |
| Stdin | cat file.txt \| speak |
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
| Format | Convert Command |
|--------|-----------------|
| PDF | pdftotext doc.pdf doc.txt |
| DOCX | textutil -convert txt doc.docx |
| HTML | pandoc -f html -t plain doc.html > doc.txt |
| Goal | Command |
|------|---------|
| Save for later | speak text.txt --output file.wav |
| Listen now (streaming) | speak text.txt --stream |
| Listen now (complete) | speak text.txt --play |
| Both | speak text.txt --stream --output file.wav |
speak article.txt # → ~/Audio/speak/article.wav (no playback)
speak "Hello" # → ~/Audio/speak/speak_<timestamp>.wav
| Directory | Auto-Created? |
|-----------|---------------|
| ~/Audio/speak/ | ✓ Yes |
| ~/.chatter/voices/ | ✗ No |
| Custom directories | ✗ No |
Always create custom directories first:
mkdir -p ~/.chatter/voices/
mkdir -p ~/Audio/custom/
Voice cloning generates speech that matches your vocal characteristics (pitch, tone, cadence) from a short recording.
Using QuickTime:
Using sox (command line):
# -d = use default microphone
# Recording starts immediately and stops after 25 seconds
sox -d -r 24000 -c 1 ~/.chatter/voices/my_voice.wav trim 0 25
Voice samples MUST be: WAV, 24000 Hz, mono, 10-30 seconds.
# From MP3
ffmpeg -i voice.mp3 -ar 24000 -ac 1 voice.wav
# From M4A (QuickTime)
ffmpeg -i voice.m4a -ar 24000 -ac 1 voice.wav
# Trim to 25 seconds
ffmpeg -i long.wav -t 25 -ar 24000 -ac 1 trimmed.wav
# Check sample properties
ffprobe -i voice.wav 2>&1 | grep -E "Duration|Stream"
# Should show: Duration ~15-25s, 24000 Hz, mono
# Create directory
mkdir -p ~/.chatter/voices/
# Move sample
mv voice.wav ~/.chatter/voices/my_voice.wav
# Test
speak "Testing my voice" --voice ~/.chatter/voices/my_voice.wav --stream
# Use for content
speak notes.txt --voice ~/.chatter/voices/my_voice.wav --output presentation.wav
Path requirements:
| Path | Works? |
|------|--------|
| ~/.chatter/voices/my_voice.wav (tilde expanded by shell) | ✓ Yes |
| /Users/name/.chatter/voices/my_voice.wav | ✓ Yes |
| my_voice.wav (relative path) | ✗ No |
| ./voices/my_voice.wav (relative path) | ✗ No |

| Good Sample | Bad Sample |
|-------------|------------|
| Quiet room | Background noise |
| Natural pace | Rushed or monotone |
| Clear diction | Mumbling |
| Varied content | Repetitive phrases |
When --voice is omitted, a built-in default voice is used:
speak "Hello world" --stream # Uses default voice
Tags produce audible effects (actual sounds), not spoken words:
speak "[sigh] Monday again." --stream
# Output: (sigh sound) "Monday again."
| Tag | Effect |
|-----|--------|
| [laugh] | Laughter |
| [chuckle] | Light chuckle |
| [sigh] | Sighing |
| [gasp] | Gasping |
| [groan] | Groaning |
| [clear throat] | Throat clearing |
| [cough] | Coughing |
| [crying] | Crying |
| [singing] | Sung speech |
NOT supported: [pause], [whisper] (ignored)
For pauses: Use punctuation: "Wait... let me think."
mkdir -p ~/Audio/book/
speak ch01.txt ch02.txt ch03.txt --output-dir ~/Audio/book/
# Creates: ch01.wav, ch02.wav, ch03.wav
# With auto-chunking (for long files)
speak chapters/*.txt --output-dir ~/Audio/book/ --auto-chunk
# Skip completed files
speak chapters/*.txt --output-dir ~/Audio/book/ --skip-existing
When using --auto-chunk with batch processing:
One .wav is produced per input file (e.g., ch01.wav); intermediate chunk files are removed unless you pass --keep-chunks. You don't need to manually concatenate chunks; only concatenate final chapter files.
# Explicit order (recommended)
speak concat ch01.wav ch02.wav ch03.wav --output book.wav
# Glob pattern (REQUIRES zero-padded filenames)
speak concat audiobook/*.wav --output book.wav
Critical for correct concatenation order:
| Files | Correct | Wrong |
|-------|---------|-------|
| 1-9 | 01, 02, ..., 09 | 1, 2, ..., 9 |
| 10-99 | 01, 02, ..., 99 | 1, 10, 2, ... |
| 100+ | 001, 002, ..., 999 | 1, 100, 2, ... |
Why: Shell glob expansion sorts alphabetically, so unpadded names expand as 1, 10, 2 while zero-padded names expand as 01, 02, 10.
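As a concrete illustration, a small rename loop can zero-pad existing single-digit chapter files so glob order matches numeric order (the ch*.wav naming follows the example convention above; this loop is a sketch, not part of the speak tool):

```shell
# Zero-pad single-digit chapter files (ch1.wav -> ch01.wav) so that
# shell globs expand in the intended numeric order.
for f in ch?.wav; do          # ch?.wav matches only single-digit names like ch1.wav
  [ -e "$f" ] || continue     # skip if nothing matched
  n="${f#ch}"                 # strip the "ch" prefix
  n="${n%.wav}"               # strip the ".wav" suffix
  mv "$f" "$(printf 'ch%02d.wav' "$n")"
done
ls ch*.wav                    # now sorts: ch01.wav ch02.wav ... ch10.wav
```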
# Preview table of contents
pdftotext -f 1 -l 5 textbook.pdf toc.txt
cat toc.txt # Note chapter page numbers
# Or search for "Chapter" markers
pdftotext textbook.pdf - | grep -n "Chapter"
# For 100-page book with ~10 chapters
pdftotext -f 1 -l 12 -layout textbook.pdf ch01.txt
pdftotext -f 13 -l 25 -layout textbook.pdf ch02.txt
pdftotext -f 26 -l 38 -layout textbook.pdf ch03.txt
# ... continue for all chapters
speak --estimate ch*.txt
# Shows: total audio duration, generation time, storage needed
# Quick estimates:
# 1 page ≈ 2 min audio ≈ 1 min generation
# 100 pages ≈ 200 min audio ≈ 100 min generation ≈ 500 MB
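The quick estimates above reduce to simple shell arithmetic. A rough back-of-envelope sketch, assuming the per-page ratios in the comments scale linearly (~2 min audio, ~1 min generation, ~5 MB storage per page):

```shell
# Rough sizing from the rule-of-thumb ratios above.
pages=100
echo "audio:      $((pages * 2)) min"
echo "generation: $((pages * 1)) min"
echo "storage:    $((pages * 5)) MB"
```

For a precise figure, prefer speak --estimate, which inspects the actual text.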
mkdir -p audiobook/
speak ch01.txt ch02.txt ch03.txt --output-dir audiobook/ --auto-chunk
# Creates: audiobook/ch01.wav, audiobook/ch02.wav, audiobook/ch03.wav
speak concat audiobook/ch01.wav audiobook/ch02.wav audiobook/ch03.wav --output complete_audiobook.wav
# Or with glob (only if zero-padded):
speak concat audiobook/ch*.wav --output complete_audiobook.wav
| Issue | Solution |
|-------|----------|
| Empty/garbled text | Scanned PDF — use OCR: brew install tesseract |
| Wrong encoding | Try: pdftotext -enc UTF-8 doc.pdf |
| Check word count | pdftotext doc.pdf - \| wc -w (should be >100) |
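The word-count check from the table can be wrapped in a small guard that runs before synthesis; the 100-word threshold comes from the table above, and the file name is illustrative:

```shell
# Warn early if an extracted chapter looks empty or garbled
# (common with scanned PDFs that need OCR).
f=ch01.txt
words=$(wc -w < "$f")
if [ "$words" -lt 100 ]; then
  echo "warning: $f has only $words words; PDF extraction may have failed" >&2
fi
```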
mkdir -p podcast/scripts podcast/wav
echo "Welcome to the show." > podcast/scripts/01_host.txt
echo "Thanks for having me." > podcast/scripts/02_guest.txt
speak podcast/scripts/01_host.txt --voice ~/.chatter/voices/host.wav --output podcast/wav/01.wav
speak podcast/scripts/02_guest.txt --voice ~/.chatter/voices/guest.wav --output podcast/wav/02.wav
speak concat podcast/wav/01.wav podcast/wav/02.wav --output podcast.wav
| Option | Description | Default |
|--------|-------------|---------|
| --stream | Stream as it generates | false |
| --play | Play after complete | false |
| --output <path> | Output file | ~/Audio/speak/ |
| --output-dir <dir> | Batch output directory | - |
| --voice <path> | Voice sample (full path) | default |
| --timeout <sec> | Timeout per file | 300 |
| --auto-chunk | Split long documents | false |
| --chunk-size <n> | Chars per chunk | 6000 |
| --resume <file> | Resume from manifest | - |
| --keep-chunks | Keep intermediate files | false |
| --skip-existing | Skip if output exists | false |
| --estimate | Show duration estimate | false |
| --dry-run | Preview only | false |
| --quiet | Suppress output | false |
| Command | Description |
|---------|-------------|
| speak setup | Set up environment |
| speak health | Check system status |
| speak models | List TTS models |
| speak concat | Concatenate audio |
| speak daemon kill | Stop TTS server |
| speak config | Show configuration |
| Metric | Value |
|--------|-------|
| Cold start | ~4-8s |
| Warm start | ~3-8s |
| Speed | 0.3-0.5x RTF (faster than real-time) |
| Storage | ~2.5 MB/min, ~150 MB/hour |
For interrupted long generations:
# Single file with auto-chunk — use --resume
speak long.txt --auto-chunk --output book.wav
# If interrupted, manifest saved at ~/Audio/speak/manifest.json
speak --resume ~/Audio/speak/manifest.json
# Batch processing — use --skip-existing
speak ch*.txt --output-dir audiobook/ --auto-chunk
# If interrupted, re-run same command:
speak ch*.txt --output-dir audiobook/ --auto-chunk --skip-existing
| Error | Cause | Solution |
|-------|-------|----------|
| "Voice file not found" | Relative path | Use full path: ~/.chatter/voices/x.wav |
| "Invalid WAV format" | Wrong specs | Convert: ffmpeg -i in.wav -ar 24000 -ac 1 out.wav |
| "Voice sample too short" | <10 seconds | Record 15-25 seconds |
| "Output directory doesn't exist" | Not created | mkdir -p dirname/ |
| "sox not found" | Not installed | brew install sox |
| Scrambled concat order | Non-zero-padded | Use 01, 02, not 1, 2 |
| Timeout | >5 min generation | Use --auto-chunk or --timeout 600 |
| "Server not running" | Stale daemon | speak daemon kill && speak health |
speak "test" # Auto-setup on first run (downloads model ~500MB)
speak setup # Or manual setup
speak health # Verify everything works
Server auto-starts and shuts down after 1 hour idle.
speak health # Check status
speak daemon kill # Stop manually
Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
Status
missing
Auth
None
Streaming
No
Data region
Unspecified
Protocol support
Requires: none
Forbidden: none
Guardrails
Operational confidence: low
curl -s "https://xpersona.co/api/v1/agents/emzod-speak/snapshot"
curl -s "https://xpersona.co/api/v1/agents/emzod-speak/contract"
curl -s "https://xpersona.co/api/v1/agents/emzod-speak/trust"
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
Handshake
UNKNOWN
Confidence
unknown
Attempts 30d
unknown
Fallback rate
unknown
Runtime metrics
Observed P50
unknown
Observed P95
unknown
Rate limit
unknown
Estimated cost
unknown
Do not use if
Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.
Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.
Rank
70
AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs
Traction
No public download signal
Freshness
Updated 5d ago
Rank
70
Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!
Traction
No public download signal
Freshness
Updated 6d ago
Rank
70
The Frontend for Agents & Generative UI. React + Angular
Traction
No public download signal
Freshness
Updated 23d ago
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/emzod-speak/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/emzod-speak/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/emzod-speak/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/emzod-speak/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/emzod-speak/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/emzod-speak/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": [
"OPENCLEW"
]
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "GITHUB_OPENCLEW",
"generatedAt": "2026-04-17T01:44:59.355Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [
{
"key": "OPENCLEW",
"type": "protocol",
"support": "unknown",
"confidenceSource": "profile",
"notes": "Listed on profile"
}
],
"flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}
Facts JSON
[
{
"factKey": "docs_crawl",
"category": "integration",
"label": "Crawlable docs",
"value": "6 indexed pages on the official domain",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
},
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Emzod",
"href": "https://github.com/EmZod/speak",
"sourceUrl": "https://github.com/EmZod/speak",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T03:13:51.898Z",
"isPublic": true
},
{
"factKey": "protocols",
"category": "compatibility",
"label": "Protocol compatibility",
"value": "OpenClaw",
"href": "https://xpersona.co/api/v1/agents/emzod-speak/contract",
"sourceUrl": "https://xpersona.co/api/v1/agents/emzod-speak/contract",
"sourceType": "contract",
"confidence": "medium",
"observedAt": "2026-04-15T03:13:51.898Z",
"isPublic": true
},
{
"factKey": "traction",
"category": "adoption",
"label": "Adoption signal",
"value": "6 GitHub stars",
"href": "https://github.com/EmZod/speak",
"sourceUrl": "https://github.com/EmZod/speak",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T03:13:51.898Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/emzod-speak/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/emzod-speak/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "docs_update",
"title": "Docs refreshed: Sign in to GitHub · GitHub",
"description": "Fresh crawlable documentation was indexed for the official domain.",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
}
]
Sponsored
Ads related to speak-tts and adjacent AI workflows.