Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Crawler Summary
Give your agent the ability to speak to you real-time. Talk to your Claude! Local TTS, text-to-speech, voice synthesis, audio generation with voice cloning on Apple Silicon. Use for reading articles aloud, audiobook narration, or voice responses. Runs entirely on-device via MLX - private, no API keys. Capability contract not published. No trust telemetry is available yet. 6 GitHub stars reported by the source. Last updated 4/15/2026.
Freshness
Last checked 4/15/2026
Best For
speak-tts is best for general automation workflows where OpenClaw compatibility matters.
Not Ideal For
Workflows that require deterministic execution, since contract metadata is missing or unavailable.
Evidence Sources Checked
editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack
Public facts
5
Change events
1
Artifacts
0
Freshness
Apr 15, 2026
Capability contract not published. No trust telemetry is available yet. 6 GitHub stars reported by the source. Last updated 4/15/2026.
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Apr 15, 2026
Vendor
Emzod
Artifacts
0
Benchmarks
0
Last release
Unpublished
Key links, install path, and a quick operational read before the deeper crawl record.
Summary
Capability contract not published. No trust telemetry is available yet. 6 GitHub stars reported by the source. Last updated 4/15/2026.
Setup snapshot
git clone https://github.com/EmZod/speak.git
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Emzod
Protocol compatibility
OpenClaw
Adoption signal
6 GitHub stars
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
6
Snippets
0
Languages
typescript
Parameters
bash
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
bash
speak article.txt    # → ~/Audio/speak/article.wav (no playback)
speak "Hello"        # → ~/Audio/speak/speak_<timestamp>.wav
bash
mkdir -p ~/.chatter/voices/
mkdir -p ~/Audio/custom/
bash
# -d = use default microphone
# Recording starts immediately and stops after 25 seconds
sox -d -r 24000 -c 1 ~/.chatter/voices/my_voice.wav trim 0 25
bash
# From MP3
ffmpeg -i voice.mp3 -ar 24000 -ac 1 voice.wav
# From M4A (QuickTime)
ffmpeg -i voice.m4a -ar 24000 -ac 1 voice.wav
# Trim to 25 seconds
ffmpeg -i long.wav -t 25 -ar 24000 -ac 1 trimmed.wav
# Check sample properties
ffprobe -i voice.wav 2>&1 | grep -E "Duration|Stream"
# Should show: Duration ~15-25s, 24000 Hz, mono
bash
# Create directory
mkdir -p ~/.chatter/voices/
# Move sample
mv voice.wav ~/.chatter/voices/my_voice.wav
# Test
speak "Testing my voice" --voice ~/.chatter/voices/my_voice.wav --stream
# Use for content
speak notes.txt --voice ~/.chatter/voices/my_voice.wav --output presentation.wav
Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB OPENCLEW
Editorial quality
ready
Give your agent the ability to speak to you real-time. Talk to your Claude! Local TTS, text-to-speech, voice synthesis, audio generation with voice cloning on Apple Silicon. Use for reading articles aloud, audiobook narration, or voice responses. Runs entirely on-device via MLX - private, no API keys.
Give your agent the ability to speak to you real-time. Local text-to-speech, voice cloning, and audio generation on Apple Silicon.
| Requirement | Check | Install |
|-------------|-------|---------|
| Apple Silicon Mac | uname -m → arm64 | Intel not supported |
| macOS 12.0+ | sw_vers | - |
| sox | which sox | brew install sox |
| ffmpeg | which ffmpeg | brew install ffmpeg |
| poppler (PDF) | which pdftotext | brew install poppler |
| Source | Example |
|--------|---------|
| Text file | speak article.txt |
| Markdown | speak doc.md |
| Direct string | speak "Hello" |
| Clipboard | pbpaste \| speak |
| Stdin | cat file.txt \| speak |
lynx -dump -nolist "https://example.com/article" | speak --output article.wav
| Format | Convert Command |
|--------|-----------------|
| PDF | pdftotext doc.pdf doc.txt |
| DOCX | textutil -convert txt doc.docx |
| HTML | pandoc -f html -t plain doc.html > doc.txt |
| Goal | Command |
|------|---------|
| Save for later | speak text.txt --output file.wav |
| Listen now (streaming) | speak text.txt --stream |
| Listen now (complete) | speak text.txt --play |
| Both | speak text.txt --stream --output file.wav |
speak article.txt # → ~/Audio/speak/article.wav (no playback)
speak "Hello" # → ~/Audio/speak/speak_<timestamp>.wav
| Directory | Auto-Created? |
|-----------|---------------|
| ~/Audio/speak/ | ✓ Yes |
| ~/.chatter/voices/ | ✗ No |
| Custom directories | ✗ No |
Always create custom directories first:
mkdir -p ~/.chatter/voices/
mkdir -p ~/Audio/custom/
Voice cloning generates speech that matches your vocal characteristics (pitch, tone, cadence) from a short recording.
Using QuickTime:
Using sox (command line):
# -d = use default microphone
# Recording starts immediately and stops after 25 seconds
sox -d -r 24000 -c 1 ~/.chatter/voices/my_voice.wav trim 0 25
Voice samples MUST be: WAV, 24000 Hz, mono, 10-30 seconds.
# From MP3
ffmpeg -i voice.mp3 -ar 24000 -ac 1 voice.wav
# From M4A (QuickTime)
ffmpeg -i voice.m4a -ar 24000 -ac 1 voice.wav
# Trim to 25 seconds
ffmpeg -i long.wav -t 25 -ar 24000 -ac 1 trimmed.wav
# Check sample properties
ffprobe -i voice.wav 2>&1 | grep -E "Duration|Stream"
# Should show: Duration ~15-25s, 24000 Hz, mono
# Create directory
mkdir -p ~/.chatter/voices/
# Move sample
mv voice.wav ~/.chatter/voices/my_voice.wav
# Test
speak "Testing my voice" --voice ~/.chatter/voices/my_voice.wav --stream
# Use for content
speak notes.txt --voice ~/.chatter/voices/my_voice.wav --output presentation.wav
Path requirements:
| Path | Works? |
|------|--------|
| ~/.chatter/voices/my_voice.wav (tilde expanded by shell) | ✓ Yes |
| /Users/name/.chatter/voices/my_voice.wav | ✓ Yes |
| my_voice.wav (relative path) | ✗ No |
| ./voices/my_voice.wav (relative path) | ✗ No |

| Good Sample | Bad Sample |
|-------------|------------|
| Quiet room | Background noise |
| Natural pace | Rushed or monotone |
| Clear diction | Mumbling |
| Varied content | Repetitive phrases |
When --voice is omitted, a built-in default voice is used:
speak "Hello world" --stream # Uses default voice
Tags produce audible effects (actual sounds), not spoken words:
speak "[sigh] Monday again." --stream
# Output: (sigh sound) "Monday again."
| Tag | Effect |
|-----|--------|
| [laugh] | Laughter |
| [chuckle] | Light chuckle |
| [sigh] | Sighing |
| [gasp] | Gasping |
| [groan] | Groaning |
| [clear throat] | Throat clearing |
| [cough] | Coughing |
| [crying] | Crying |
| [singing] | Sung speech |
NOT supported: [pause], [whisper] (ignored)
For pauses: Use punctuation: "Wait... let me think."
mkdir -p ~/Audio/book/
speak ch01.txt ch02.txt ch03.txt --output-dir ~/Audio/book/
# Creates: ch01.wav, ch02.wav, ch03.wav
# With auto-chunking (for long files)
speak chapters/*.txt --output-dir ~/Audio/book/ --auto-chunk
# Skip completed files
speak chapters/*.txt --output-dir ~/Audio/book/ --skip-existing
When using --auto-chunk with batch processing:
One .wav is produced per input file (e.g., ch01.wav); intermediate chunk files are removed unless you pass --keep-chunks. You don't need to manually concatenate chunks; only concatenate final chapter files.
# Explicit order (recommended)
speak concat ch01.wav ch02.wav ch03.wav --output book.wav
# Glob pattern (REQUIRES zero-padded filenames)
speak concat audiobook/*.wav --output book.wav
Critical for correct concatenation order:
| Files | Correct | Wrong |
|-------|---------|-------|
| 1-9 | 01, 02, ..., 09 | 1, 2, ..., 9 |
| 10-99 | 01, 02, ..., 99 | 1, 10, 2, ... |
| 100+ | 001, 002, ..., 999 | 1, 100, 2, ... |
Why: Shell glob expansion sorts alphabetically, so unpadded names expand as 1, 10, 2 while zero-padded names expand as 01, 02, 10.
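As a concrete illustration, a small rename loop can zero-pad existing single-digit chapter files so glob order matches numeric order (the ch*.wav naming follows the example convention above; this loop is a sketch, not part of the speak tool):

```shell
# Zero-pad single-digit chapter files (ch1.wav -> ch01.wav) so that
# shell globs expand in the intended numeric order.
for f in ch?.wav; do          # ch?.wav matches only single-digit names like ch1.wav
  [ -e "$f" ] || continue     # skip if nothing matched
  n="${f#ch}"                 # strip the "ch" prefix
  n="${n%.wav}"               # strip the ".wav" suffix
  mv "$f" "$(printf 'ch%02d.wav' "$n")"
done
ls ch*.wav                    # now sorts: ch01.wav ch02.wav ... ch10.wav
```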
# Preview table of contents
pdftotext -f 1 -l 5 textbook.pdf toc.txt
cat toc.txt # Note chapter page numbers
# Or search for "Chapter" markers
pdftotext textbook.pdf - | grep -n "Chapter"
# For 100-page book with ~10 chapters
pdftotext -f 1 -l 12 -layout textbook.pdf ch01.txt
pdftotext -f 13 -l 25 -layout textbook.pdf ch02.txt
pdftotext -f 26 -l 38 -layout textbook.pdf ch03.txt
# ... continue for all chapters
speak --estimate ch*.txt
# Shows: total audio duration, generation time, storage needed
# Quick estimates:
# 1 page ≈ 2 min audio ≈ 1 min generation
# 100 pages ≈ 200 min audio ≈ 100 min generation ≈ 500 MB
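The quick estimates above reduce to simple shell arithmetic. A rough back-of-envelope sketch, assuming the per-page ratios in the comments scale linearly (~2 min audio, ~1 min generation, ~5 MB storage per page):

```shell
# Rough sizing from the rule-of-thumb ratios above.
pages=100
echo "audio:      $((pages * 2)) min"
echo "generation: $((pages * 1)) min"
echo "storage:    $((pages * 5)) MB"
```

For a precise figure, prefer speak --estimate, which inspects the actual text.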
mkdir -p audiobook/
speak ch01.txt ch02.txt ch03.txt --output-dir audiobook/ --auto-chunk
# Creates: audiobook/ch01.wav, audiobook/ch02.wav, audiobook/ch03.wav
speak concat audiobook/ch01.wav audiobook/ch02.wav audiobook/ch03.wav --output complete_audiobook.wav
# Or with glob (only if zero-padded):
speak concat audiobook/ch*.wav --output complete_audiobook.wav
| Issue | Solution |
|-------|----------|
| Empty/garbled text | Scanned PDF — use OCR: brew install tesseract |
| Wrong encoding | Try: pdftotext -enc UTF-8 doc.pdf |
| Check word count | pdftotext doc.pdf - \| wc -w (should be >100) |
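The word-count check from the table can be wrapped in a small guard that runs before synthesis; the 100-word threshold comes from the table above, and the file name is illustrative:

```shell
# Warn early if an extracted chapter looks empty or garbled
# (common with scanned PDFs that need OCR).
f=ch01.txt
words=$(wc -w < "$f")
if [ "$words" -lt 100 ]; then
  echo "warning: $f has only $words words; PDF extraction may have failed" >&2
fi
```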
mkdir -p podcast/scripts podcast/wav
echo "Welcome to the show." > podcast/scripts/01_host.txt
echo "Thanks for having me." > podcast/scripts/02_guest.txt
speak podcast/scripts/01_host.txt --voice ~/.chatter/voices/host.wav --output podcast/wav/01.wav
speak podcast/scripts/02_guest.txt --voice ~/.chatter/voices/guest.wav --output podcast/wav/02.wav
speak concat podcast/wav/01.wav podcast/wav/02.wav --output podcast.wav
| Option | Description | Default |
|--------|-------------|---------|
| --stream | Stream as it generates | false |
| --play | Play after complete | false |
| --output <path> | Output file | ~/Audio/speak/ |
| --output-dir <dir> | Batch output directory | - |
| --voice <path> | Voice sample (full path) | default |
| --timeout <sec> | Timeout per file | 300 |
| --auto-chunk | Split long documents | false |
| --chunk-size <n> | Chars per chunk | 6000 |
| --resume <file> | Resume from manifest | - |
| --keep-chunks | Keep intermediate files | false |
| --skip-existing | Skip if output exists | false |
| --estimate | Show duration estimate | false |
| --dry-run | Preview only | false |
| --quiet | Suppress output | false |
| Command | Description |
|---------|-------------|
| speak setup | Set up environment |
| speak health | Check system status |
| speak models | List TTS models |
| speak concat | Concatenate audio |
| speak daemon kill | Stop TTS server |
| speak config | Show configuration |
| Metric | Value |
|--------|-------|
| Cold start | ~4-8s |
| Warm start | ~3-8s |
| Speed | 0.3-0.5x RTF (faster than real-time) |
| Storage | ~2.5 MB/min, ~150 MB/hour |
For interrupted long generations:
# Single file with auto-chunk — use --resume
speak long.txt --auto-chunk --output book.wav
# If interrupted, manifest saved at ~/Audio/speak/manifest.json
speak --resume ~/Audio/speak/manifest.json
# Batch processing — use --skip-existing
speak ch*.txt --output-dir audiobook/ --auto-chunk
# If interrupted, re-run same command:
speak ch*.txt --output-dir audiobook/ --auto-chunk --skip-existing
| Error | Cause | Solution |
|-------|-------|----------|
| "Voice file not found" | Relative path | Use full path: ~/.chatter/voices/x.wav |
| "Invalid WAV format" | Wrong specs | Convert: ffmpeg -i in.wav -ar 24000 -ac 1 out.wav |
| "Voice sample too short" | <10 seconds | Record 15-25 seconds |
| "Output directory doesn't exist" | Not created | mkdir -p dirname/ |
| "sox not found" | Not installed | brew install sox |
| Scrambled concat order | Non-zero-padded | Use 01, 02, not 1, 2 |
| Timeout | >5 min generation | Use --auto-chunk or --timeout 600 |
| "Server not running" | Stale daemon | speak daemon kill && speak health |
speak "test" # Auto-setup on first run (downloads model ~500MB)
speak setup # Or manual setup
speak health # Verify everything works
Server auto-starts and shuts down after 1 hour idle.
speak health # Check status
speak daemon kill # Stop manually
Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
Status
missing
Auth
None
Streaming
No
Data region
Unspecified
Protocol support
Requires: none
Forbidden: none
Guardrails
Operational confidence: low
curl -s "https://xpersona.co/api/v1/agents/emzod-speak/snapshot"
curl -s "https://xpersona.co/api/v1/agents/emzod-speak/contract"
curl -s "https://xpersona.co/api/v1/agents/emzod-speak/trust"
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
Handshake
UNKNOWN
Confidence
unknown
Attempts 30d
unknown
Fallback rate
unknown
Runtime metrics
Observed P50
unknown
Observed P95
unknown
Rate limit
unknown
Estimated cost
unknown
Do not use if
Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.
Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.
Rank
70
AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs
Traction
No public download signal
Freshness
Updated 5d ago
Rank
70
Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!
Traction
No public download signal
Freshness
Updated 6d ago
Rank
70
The Frontend for Agents & Generative UI. React + Angular
Traction
No public download signal
Freshness
Updated 23d ago
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/emzod-speak/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/emzod-speak/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/emzod-speak/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/emzod-speak/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/emzod-speak/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/emzod-speak/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": [
"OPENCLEW"
]
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "GITHUB_OPENCLEW",
"generatedAt": "2026-04-17T01:44:59.355Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [
{
"key": "OPENCLEW",
"type": "protocol",
"support": "unknown",
"confidenceSource": "profile",
"notes": "Listed on profile"
}
],
"flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}
Facts JSON
[
{
"factKey": "docs_crawl",
"category": "integration",
"label": "Crawlable docs",
"value": "6 indexed pages on the official domain",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
},
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Emzod",
"href": "https://github.com/EmZod/speak",
"sourceUrl": "https://github.com/EmZod/speak",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T03:13:51.898Z",
"isPublic": true
},
{
"factKey": "protocols",
"category": "compatibility",
"label": "Protocol compatibility",
"value": "OpenClaw",
"href": "https://xpersona.co/api/v1/agents/emzod-speak/contract",
"sourceUrl": "https://xpersona.co/api/v1/agents/emzod-speak/contract",
"sourceType": "contract",
"confidence": "medium",
"observedAt": "2026-04-15T03:13:51.898Z",
"isPublic": true
},
{
"factKey": "traction",
"category": "adoption",
"label": "Adoption signal",
"value": "6 GitHub stars",
"href": "https://github.com/EmZod/speak",
"sourceUrl": "https://github.com/EmZod/speak",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T03:13:51.898Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/emzod-speak/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/emzod-speak/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "docs_update",
"title": "Docs refreshed: Sign in to GitHub · GitHub",
"description": "Fresh crawlable documentation was indexed for the official domain.",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
}
]
Sponsored
Ads related to speak-tts and adjacent AI workflows.