Crawler Summary

parakeet answer-first brief

Local speech-to-text using NVIDIA Parakeet TDT (NeMo). 600M-param multilingual ASR with automatic punctuation/capitalization, word-level timestamps, and ~3380x realtime speed on GPU. Supports 25 European languages with auto-detection, long-form audio up to 3 hours, streaming output, SRT/VTT subtitles, batch processing, and URL/YouTube input.

Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Freshness

Last checked 2/25/2026

Best For

parakeet is best for workflows where OpenClaw compatibility matters. Its declared capability tags are malformed, so no more specific use cases could be derived.

Not Ideal For

Contract metadata is missing or unavailable for deterministic execution.

Evidence Sources Checked

editorial-content, GITHUB OPENCLAW, runtime-metrics, public facts pack

Claim this agent
Agent Dossier · GitHub · Safety: 89/100

parakeet

Local speech-to-text using NVIDIA Parakeet TDT (NeMo). 600M-param multilingual ASR with automatic punctuation/capitalization, word-level timestamps, and ~3380x realtime speed on GPU. Supports 25 European languages with auto-detection, long-form audio up to 3 hours, streaming output, SRT/VTT subtitles, batch processing, and URL/YouTube input.

OpenClaw · self-declared

Public facts

4

Change events

1

Artifacts

0

Freshness

Feb 25, 2026

Verified · editorial-content · No verified compatibility signals

Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Trust evidence available

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Feb 25, 2026

Vendor

Theplasmak

Artifacts

0

Benchmarks

0

Last release

Unpublished

Executive Summary

Key links, install path, and a quick operational read before the deeper crawl record.

Verified · editorial-content

Summary

Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Setup snapshot

git clone https://github.com/ThePlasmak/parakeet.git
  1. Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.

  2. Final validation: expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence Ledger

Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.

Verified · editorial-content
Vendor (1)

Vendor

Theplasmak

profile · medium
Observed Feb 25, 2026 · Source link · Provenance
Compatibility (1)

Protocol compatibility

OpenClaw

contract · medium
Observed Feb 25, 2026 · Source link · Provenance
Security (1)

Handshake status

UNKNOWN

trust · medium
Observed unknown · Source link · Provenance
Integration (1)

Crawlable docs

6 indexed pages on the official domain

search_document · medium
Observed Apr 15, 2026 · Source link · Provenance

Release & Crawl Timeline

Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.

Self-declared · agent-index

Artifacts Archive

Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.

Self-declared · GITHUB OPENCLAW

Extracted files

0

Examples

6

Snippets

0

Languages

typescript

Parameters

Executable Examples

bash

# Full install (creates venv, installs NeMo + CUDA PyTorch)
./setup.sh

# Verify installation
./setup.sh --check

# Upgrade NeMo toolkit
./setup.sh --update

bash

# For CUDA 12.x
uv pip install --python ./venv/bin/python torch torchaudio --index-url https://download.pytorch.org/whl/cu121

bash

# Basic transcription (auto-punctuated, auto-capitalized)
./scripts/transcribe audio.wav

# Transcribe an MP3 (auto-converts to WAV via ffmpeg)
./scripts/transcribe recording.mp3

# SRT subtitles
./scripts/transcribe audio.wav --format srt -o subtitles.srt

# WebVTT subtitles
./scripts/transcribe audio.wav --format vtt -o subtitles.vtt

# Full JSON with word/segment/char timestamps
./scripts/transcribe audio.wav --timestamps --format json -o result.json

# Transcribe from YouTube URL
./scripts/transcribe https://youtube.com/watch?v=dQw4w9WgXcQ

# Long lecture (>24 min, up to 3 hours)
./scripts/transcribe lecture.wav --long-form

# Streaming mode (print segments as they're transcribed)
./scripts/transcribe audio.wav --streaming

# Batch process a directory
./scripts/transcribe ./recordings/ -o ./transcripts/

# Batch with glob, skip already-done files
./scripts/transcribe *.wav --skip-existing -o ./transcripts/

# Use English-only v2 model
./scripts/transcribe audio.wav --model nvidia/parakeet-tdt-0.6b-v2

# JSON output with metadata
./scripts/transcribe audio.wav --format json -o result.json

# Specify expected language (validation only; v3 auto-detects)
./scripts/transcribe audio.wav --language fr

text

Input:
  AUDIO                 Audio file(s), directory, glob pattern, or URL
                        Native: .wav, .flac (16kHz mono preferred)
                        Converts via ffmpeg: .mp3, .m4a, .mp4, .mkv, .ogg, .webm, .aac, .wma, .avi, .opus
                        URLs auto-download via yt-dlp (YouTube, direct links, etc.)

Model & Language:
  -m, --model NAME      NeMo ASR model (default: nvidia/parakeet-tdt-0.6b-v3)
                        Also: nvidia/parakeet-tdt-0.6b-v2 (English), nvidia/parakeet-tdt-1.1b (larger)
  -l, --language CODE   Expected language code, e.g. en, es, fr (v3 auto-detects if omitted)
                        Used for validation; does not force the model

Output Format:
  -f, --format FMT      text | json | srt | vtt (default: text)
  --timestamps          Enable word/segment/char timestamps (auto-enabled for srt, vtt, json)
  --max-words-per-line N  For SRT/VTT, split segments into sub-cues of at most N words
  --max-chars-per-line N  For SRT/VTT, split lines so each fits within N characters
                        Takes priority over --max-words-per-line when both are set
  -o, --output PATH     Output file or directory (directory for batch mode)

Long-Form & Streaming:
  --long-form           Enable local attention for audio >24 min (up to ~3 hours)
                        Changes attention model to rel_pos_local_attn with [256,256] context
  --streaming           Print segments as they are transcribed (chunked inference)

Inference:
  --batch-size N        Inference batch size (default: 32; reduce if OOM)

Batch Processing:
  --skip-existing       Skip files whose output already exists (batch mode)

Device:
  --device DEV          auto | cpu | cuda (default: auto)
  -q, --quiet           Suppress progress and status messages

Utility:
  --version             Print version info and exit

text

Hello, welcome to the meeting. Today we'll discuss the quarterly results
and our plans for next quarter.

json

{
  "file": "audio.wav",
  "text": "Hello, welcome to the meeting.",
  "duration": 600.5,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, welcome to the meeting.",
      "words": [
        {"word": "Hello,", "start": 0.0, "end": 0.4},
        {"word": "welcome", "start": 0.5, "end": 0.9},
        {"word": "to", "start": 0.95, "end": 1.1},
        {"word": "the", "start": 1.15, "end": 1.3},
        {"word": "meeting.", "start": 1.35, "end": 2.0}
      ]
    }
  ],
  "words": [...],
  "chars": [...],
  "stats": {
    "processing_time": 0.18,
    "realtime_factor": 3380.0
  }
}

Docs & README

Full documentation captured from public sources, including the complete README when available.

Self-declared · GITHUB OPENCLAW

Docs source

GITHUB OPENCLAW

Editorial quality

ready

Local speech-to-text using NVIDIA Parakeet TDT (NeMo). 600M-param multilingual ASR with automatic punctuation/capitalization, word-level timestamps, and ~3380x realtime speed on GPU. Supports 25 European languages with auto-detection, long-form audio up to 3 hours, streaming output, SRT/VTT subtitles, batch processing, and URL/YouTube input.

Full README

yaml

---
name: parakeet
description: "Local speech-to-text using NVIDIA Parakeet TDT (NeMo). 600M-param multilingual ASR with automatic punctuation/capitalization, word-level timestamps, and ~3380x realtime speed on GPU. Supports 25 European languages with auto-detection, long-form audio up to 3 hours, streaming output, SRT/VTT subtitles, batch processing, and URL/YouTube input."
version: 1.0.0
author: ThePlasmak
tags:
  [
    "audio", "transcription", "speech-to-text", "parakeet", "nemo", "nvidia",
    "ml", "cuda", "gpu", "subtitles", "multilingual", "asr", "timestamps",
  ]
platforms: ["linux", "wsl2"]
metadata:
  {
    "openclaw":
      {
        "emoji": "🦜",
        "requires":
          { "bins": ["python3"], "optionalBins": ["ffmpeg", "yt-dlp"] },
      },
  }
---

🦜 Parakeet: NVIDIA Speech-to-Text

Local speech-to-text using NVIDIA's Parakeet TDT models via the NeMo toolkit. The default parakeet-tdt-0.6b-v3 is a 600-million-parameter multilingual ASR model that delivers state-of-the-art accuracy with automatic punctuation and capitalization, word-level timestamps, and an insane ~3380× realtime inference speed on GPU.

Only needs ~2GB VRAM to run, leaving plenty of room on an RTX 3070 (8GB).

When to Use

Use this skill when you need:

  • Best-accuracy transcription: Parakeet v3 achieves 6.34% average WER on the Open ASR Leaderboard
  • Automatic punctuation & capitalization: no post-processing needed; output is clean and readable
  • European language transcription: 25 languages with automatic detection (no prompting required)
  • Word/segment/char timestamps: precise timing at every granularity level
  • Long audio files: up to 24 min with full attention, up to 3 hours with --long-form local attention
  • Streaming output: see segments as they're transcribed with --streaming
  • Subtitle generation: SRT and VTT formats with word-level line splitting
  • Batch processing: glob patterns, directories, skip-existing support with ETA
  • URL/YouTube transcription: auto-downloads via yt-dlp
  • Blazing speed: ~3380× realtime means a 10-minute file transcribes in under 0.2 seconds on GPU
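The speed bullet is simple arithmetic to verify: at a given realtime factor, processing time is audio duration divided by that factor. A quick sanity check (plain arithmetic, independent of the tool):

```python
def processing_time(duration_s: float, realtime_factor: float) -> float:
    """Estimated wall-clock seconds to transcribe `duration_s` of audio."""
    return duration_s / realtime_factor

# A 10-minute (600 s) file at ~3380x realtime:
print(round(processing_time(600, 3380), 3))  # 0.178, i.e. under 0.2 seconds
```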

Trigger phrases: "transcribe with parakeet", "nemo transcribe", "nvidia speech to text", "parakeet transcribe", "transcribe this audio", "convert speech to text", "what did they say", "make a transcript", "audio to text", "subtitle this video", "transcribe in [language]", "European language transcription", "best accuracy transcription", "high accuracy speech to text", "transcribe long audio", "transcribe lecture", "transcribe meeting", "word timestamps"

🦜 Parakeet vs 🗣️ faster-whisper: When to Use Which

| Feature | Parakeet | faster-whisper |
|---|---|---|
| Accuracy | ✅ Best (6.34% avg WER) | Good (distil: 7.08% WER) |
| Speed | ✅ ~3380× realtime | ~20× realtime |
| Auto punctuation | ✅ Built-in | ❌ Requires post-processing |
| Languages | 25 European | ✅ 99+ worldwide |
| Diarization | ❌ Not supported | ✅ pyannote speaker ID |
| Chapters/search | ❌ Not supported | ✅ Chapter detection, search |
| Output formats | text, JSON, SRT, VTT | ✅ text, JSON, SRT, VTT, ASS, LRC, TTML, CSV, TSV, HTML |
| Translation | ❌ Not supported | ✅ Any language → English |
| VRAM usage | ~2GB | ~1.5GB (distil) |
| Long audio | ✅ Up to 3 hours | Limited by VRAM |
| Streaming | ✅ Chunked inference | ✅ Segment streaming |
| Noise handling | Good (built-in robustness) | ✅ --denoise, --normalize |
| Filler removal | ❌ | ✅ --clean-filler |

Rule of thumb:

  • Parakeet when you want: best accuracy, fastest speed, auto-punctuation, European languages
  • faster-whisper when you need: diarization, 99+ languages, translation, chapters/search, more output formats, audio preprocessing

โš ๏ธ Agent guidance โ€” keep invocations minimal:

CORE RULE: the default command (./scripts/transcribe audio.wav) is the fastest path; add flags only when the user explicitly asks for that capability.

Transcription:

  • Only add --timestamps if the user asks for word-level or segment-level timestamps (auto-enabled for srt, vtt, json formats)
  • Only add --format srt/vtt if the user asks for subtitles/captions in that format
  • Only add --format json if the user wants structured/programmatic output
  • Only add --long-form if the audio is longer than 24 minutes
  • Only add --streaming if the user wants live/progressive output for long files
  • Only add -l/--language CODE if the user specifies a language (auto-detection is usually fine)
  • Only add --model if the user wants a specific model variant (v2, 1.1b)
  • Only add --device cpu if GPU is not available or user requests CPU
  • Only add --batch-size N if the user reports OOM errors
  • Only add --max-words-per-line or --max-chars-per-line for subtitle readability on long segments

Batch processing:

  • Only add --skip-existing when resuming interrupted batch jobs
  • For batch output, always use -o <dir> (directory, not file)
  • ETA is shown automatically for batch jobs (no flag needed)

Output format for agent relay:

  • Text (default) → safe to show directly to user for short files; summarise for long ones
  • JSON → useful for programmatic post-processing; not ideal to paste in full to user
  • SRT/VTT → always write to -o file; tell the user the output path, never paste raw subtitle content

When NOT to use:

  • Need speaker diarization → use faster-whisper with --diarize
  • Non-European languages (Asian, African, etc.) → use faster-whisper
  • Need translation to English → use faster-whisper with --translate
  • Cloud-only environments without local compute

Quick Reference

| Task | Command | Notes |
|---|---|---|
| Basic transcription | ./scripts/transcribe audio.wav | Auto-punctuated, GPU-accelerated |
| Transcribe MP3 | ./scripts/transcribe audio.mp3 | Auto-converts via ffmpeg |
| SRT subtitles | ./scripts/transcribe audio.wav --format srt -o subs.srt | Timestamps auto-enabled |
| VTT subtitles | ./scripts/transcribe audio.wav --format vtt -o subs.vtt | WebVTT format |
| Word timestamps | ./scripts/transcribe audio.wav --timestamps --format json -o out.json | Word/segment/char level |
| JSON output | ./scripts/transcribe audio.wav --format json -o result.json | Full metadata + timestamps |
| Long audio (>24 min) | ./scripts/transcribe lecture.wav --long-form | Local attention, up to ~3 hours |
| Streaming output | ./scripts/transcribe audio.wav --streaming | Print segments as transcribed |
| Specify language | ./scripts/transcribe audio.wav -l fr | Validate language (v3 auto-detects) |
| YouTube/URL | ./scripts/transcribe https://youtube.com/watch?v=... | Auto-downloads via yt-dlp |
| Batch process | ./scripts/transcribe *.wav -o ./transcripts/ | Output to directory |
| Batch with skip | ./scripts/transcribe *.wav --skip-existing -o ./out/ | Resume interrupted batches |
| English-only model | ./scripts/transcribe audio.wav -m nvidia/parakeet-tdt-0.6b-v2 | v2 English-only |
| Larger model | ./scripts/transcribe audio.wav -m nvidia/parakeet-tdt-1.1b | 1.1B param English model |
| CPU mode | ./scripts/transcribe audio.wav --device cpu | If no GPU available |
| Quiet mode | ./scripts/transcribe audio.wav -q | Suppress progress messages |
| Subtitle word wrapping | ./scripts/transcribe audio.wav --format srt --max-words-per-line 8 -o subs.srt | Split long subtitle cues |
| Char-based wrapping | ./scripts/transcribe audio.wav --format srt --max-chars-per-line 42 -o subs.srt | Character limit per line |
| Show version | ./scripts/transcribe --version | Print NeMo version |
| System check | ./setup.sh --check | Verify GPU, Python, NeMo, ffmpeg |
| Upgrade NeMo | ./setup.sh --update | Upgrade without full reinstall |

Model Selection

Available Models

| Model | Params | Languages | Speed | Use Case |
|---|---|---|---|---|
| nvidia/parakeet-tdt-0.6b-v3 | 600M | 25 EU languages | ~3380× RT | Default, best multilingual |
| nvidia/parakeet-tdt-0.6b-v2 | 600M | English only | ~3380× RT | English-only, slightly better English WER |
| nvidia/parakeet-tdt-1.1b | 1.1B | English only | Slower | Maximum English accuracy |

Supported Languages (v3)

Bulgarian (bg), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hungarian (hu), Italian (it), Latvian (lv), Lithuanian (lt), Maltese (mt), Polish (pl), Portuguese (pt), Romanian (ro), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Russian (ru), Ukrainian (uk)

Language is auto-detected; no prompting or configuration needed. The model identifies the language from the audio content automatically.

Accuracy Benchmarks (v3)

| Benchmark | WER |
|---|---|
| Open ASR Leaderboard (avg) | 6.34% |
| LibriSpeech test-clean | 1.93% |
| LibriSpeech test-other | 3.59% |
| AMI | 11.31% |
| GigaSpeech | 9.59% |
| SPGI Speech | 3.97% |
| TED-LIUM v3 | 2.75% |
| VoxPopuli | 6.14% |
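WER in the table above is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal illustrative implementation; leaderboard scoring applies its own text normalization, which is not reproduced here:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution in five reference words:
print(wer("hello welcome to the meeting", "hello welcome to a meeting"))  # 0.2
```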

Setup

Linux / WSL2

# Full install (creates venv, installs NeMo + CUDA PyTorch)
./setup.sh

# Verify installation
./setup.sh --check

# Upgrade NeMo toolkit
./setup.sh --update

Requirements:

  • Python 3.10+
  • NVIDIA GPU with CUDA (strongly recommended; CPU works but is much slower)
  • ffmpeg (recommended for mp3/m4a/mp4 format conversion; not needed for .wav/.flac)
  • Optional: yt-dlp (for URL/YouTube input)

Platform Support

| Platform | Acceleration | Notes |
|---|---|---|
| Linux + NVIDIA GPU | CUDA | ~3380× realtime 🚀 |
| WSL2 + NVIDIA GPU | CUDA | ~3380× realtime 🚀 |
| Linux (no GPU) | CPU | Functional but slow |
| macOS | ❌ | NeMo has limited macOS support |

VRAM: The 600M model only needs ~2GB VRAM to load. An RTX 3070 (8GB) has plenty of headroom.

GPU Support

The setup script auto-detects your GPU and installs PyTorch with CUDA. Always use GPU if available.

If setup didn't detect your GPU, manually install PyTorch with CUDA:

# For CUDA 12.x
uv pip install --python ./venv/bin/python torch torchaudio --index-url https://download.pytorch.org/whl/cu121

Usage

# Basic transcription (auto-punctuated, auto-capitalized)
./scripts/transcribe audio.wav

# Transcribe an MP3 (auto-converts to WAV via ffmpeg)
./scripts/transcribe recording.mp3

# SRT subtitles
./scripts/transcribe audio.wav --format srt -o subtitles.srt

# WebVTT subtitles
./scripts/transcribe audio.wav --format vtt -o subtitles.vtt

# Full JSON with word/segment/char timestamps
./scripts/transcribe audio.wav --timestamps --format json -o result.json

# Transcribe from YouTube URL
./scripts/transcribe https://youtube.com/watch?v=dQw4w9WgXcQ

# Long lecture (>24 min, up to 3 hours)
./scripts/transcribe lecture.wav --long-form

# Streaming mode (print segments as they're transcribed)
./scripts/transcribe audio.wav --streaming

# Batch process a directory
./scripts/transcribe ./recordings/ -o ./transcripts/

# Batch with glob, skip already-done files
./scripts/transcribe *.wav --skip-existing -o ./transcripts/

# Use English-only v2 model
./scripts/transcribe audio.wav --model nvidia/parakeet-tdt-0.6b-v2

# JSON output with metadata
./scripts/transcribe audio.wav --format json -o result.json

# Specify expected language (validation only; v3 auto-detects)
./scripts/transcribe audio.wav --language fr

Options

Input:
  AUDIO                 Audio file(s), directory, glob pattern, or URL
                        Native: .wav, .flac (16kHz mono preferred)
                        Converts via ffmpeg: .mp3, .m4a, .mp4, .mkv, .ogg, .webm, .aac, .wma, .avi, .opus
                        URLs auto-download via yt-dlp (YouTube, direct links, etc.)

Model & Language:
  -m, --model NAME      NeMo ASR model (default: nvidia/parakeet-tdt-0.6b-v3)
                        Also: nvidia/parakeet-tdt-0.6b-v2 (English), nvidia/parakeet-tdt-1.1b (larger)
  -l, --language CODE   Expected language code, e.g. en, es, fr (v3 auto-detects if omitted)
                        Used for validation; does not force the model

Output Format:
  -f, --format FMT      text | json | srt | vtt (default: text)
  --timestamps          Enable word/segment/char timestamps (auto-enabled for srt, vtt, json)
  --max-words-per-line N  For SRT/VTT, split segments into sub-cues of at most N words
  --max-chars-per-line N  For SRT/VTT, split lines so each fits within N characters
                        Takes priority over --max-words-per-line when both are set
  -o, --output PATH     Output file or directory (directory for batch mode)

Long-Form & Streaming:
  --long-form           Enable local attention for audio >24 min (up to ~3 hours)
                        Changes attention model to rel_pos_local_attn with [256,256] context
  --streaming           Print segments as they are transcribed (chunked inference)

Inference:
  --batch-size N        Inference batch size (default: 32; reduce if OOM)

Batch Processing:
  --skip-existing       Skip files whose output already exists (batch mode)

Device:
  --device DEV          auto | cpu | cuda (default: auto)
  -q, --quiet           Suppress progress and status messages

Utility:
  --version             Print version info and exit
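Agents that build invocations programmatically can mirror the flag surface above with a small parser and validate flag combinations before shelling out. This is a hypothetical sketch, not the script's actual argument parser:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the transcribe CLI surface, for pre-validation only."""
    p = argparse.ArgumentParser(prog="transcribe")
    p.add_argument("audio", nargs="+", help="file(s), directory, glob, or URL")
    p.add_argument("-m", "--model", default="nvidia/parakeet-tdt-0.6b-v3")
    p.add_argument("-l", "--language", default=None)
    p.add_argument("-f", "--format", choices=["text", "json", "srt", "vtt"], default="text")
    p.add_argument("--timestamps", action="store_true")
    p.add_argument("--long-form", action="store_true")
    p.add_argument("--streaming", action="store_true")
    p.add_argument("--batch-size", type=int, default=32)
    p.add_argument("--skip-existing", action="store_true")
    p.add_argument("--device", choices=["auto", "cpu", "cuda"], default="auto")
    p.add_argument("-o", "--output", default=None)
    return p

args = build_parser().parse_args(["audio.wav", "--format", "srt", "-o", "subs.srt"])
print(args.format, args.output)  # srt subs.srt
```

Validating arguments locally this way catches an unsupported --format or device value before paying the cost of model load.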

Output Formats

Text (default)

Clean, automatically punctuated and capitalized transcript:

Hello, welcome to the meeting. Today we'll discuss the quarterly results
and our plans for next quarter.

JSON (--format json)

Full metadata including segments, word/char timestamps, and performance stats:

{
  "file": "audio.wav",
  "text": "Hello, welcome to the meeting.",
  "duration": 600.5,
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, welcome to the meeting.",
      "words": [
        {"word": "Hello,", "start": 0.0, "end": 0.4},
        {"word": "welcome", "start": 0.5, "end": 0.9},
        {"word": "to", "start": 0.95, "end": 1.1},
        {"word": "the", "start": 1.15, "end": 1.3},
        {"word": "meeting.", "start": 1.35, "end": 2.0}
      ]
    }
  ],
  "words": [...],
  "chars": [...],
  "stats": {
    "processing_time": 0.18,
    "realtime_factor": 3380.0
  }
}
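Structured output like the sample above is easy to post-process. For example, flattening word timings across segments (field names taken from the sample; the real schema may include more fields):

```python
import json

# Trimmed copy of the sample JSON output shown above
sample = """
{
  "file": "audio.wav",
  "text": "Hello, welcome to the meeting.",
  "duration": 600.5,
  "segments": [
    {"start": 0.0, "end": 2.5,
     "text": "Hello, welcome to the meeting.",
     "words": [{"word": "Hello,", "start": 0.0, "end": 0.4},
               {"word": "meeting.", "start": 1.35, "end": 2.0}]}
  ]
}
"""

result = json.loads(sample)
# Flatten (word, start) pairs across all segments
timings = [(w["word"], w["start"]) for seg in result["segments"] for w in seg["words"]]
print(timings[0])  # ('Hello,', 0.0)
```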

SRT (--format srt)

Standard subtitle format for video players:

1
00:00:00,000 --> 00:00:02,500
Hello, welcome to the meeting.

2
00:00:02,800 --> 00:00:05,200
Today we'll discuss the quarterly results.

VTT (--format vtt)

WebVTT format for web video players:

WEBVTT

1
00:00:00.000 --> 00:00:02.500
Hello, welcome to the meeting.

2
00:00:02.800 --> 00:00:05.200
Today we'll discuss the quarterly results.
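The two cue formats above differ only in the millisecond separator (SRT uses a comma, WebVTT a dot) plus the WEBVTT header line. A small timestamp formatter matching the samples:

```python
def fmt_timestamp(seconds: float, vtt: bool = False) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    sep = "." if vtt else ","
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

print(fmt_timestamp(2.5))            # 00:00:02,500
print(fmt_timestamp(2.5, vtt=True))  # 00:00:02.500
```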

Long-Form Audio

Parakeet v3 supports two attention modes:

| Mode | Max Duration | Flag | Notes |
|---|---|---|---|
| Full attention (default) | ~24 min | (none) | Best accuracy, requires more VRAM |
| Local attention | ~3 hours | --long-form | Slightly reduced accuracy, much lower VRAM |

For audio over 24 minutes, use --long-form:

./scripts/transcribe long-lecture.wav --long-form

This changes the attention model to rel_pos_local_attn with a context window of [256, 256], allowing the model to process very long audio without running out of memory.

URL & YouTube Input

Pass any URL as input; audio is downloaded automatically via yt-dlp:

# YouTube video
./scripts/transcribe https://youtube.com/watch?v=dQw4w9WgXcQ

# Direct audio URL
./scripts/transcribe https://example.com/podcast.mp3

# With options
./scripts/transcribe https://youtube.com/watch?v=... -l en --format srt -o subs.srt

Requires yt-dlp (checks PATH and ~/.local/share/pipx/venvs/yt-dlp/bin/yt-dlp).

Batch Processing

Process multiple files at once with glob patterns, directories, or multiple paths:

# All WAV files in current directory
./scripts/transcribe *.wav -o ./transcripts/

# Specific directory
./scripts/transcribe ./recordings/ -o ./transcripts/

# Resume interrupted batch
./scripts/transcribe *.wav --skip-existing -o ./transcripts/

# SRT subtitles for all files
./scripts/transcribe *.wav --format srt -o ./subtitles/

ETA is shown automatically for batch jobs after the first file completes.
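That ETA can be reproduced with simple bookkeeping: average the per-file time so far and scale by the remaining count (a sketch of the likely logic, not the script's actual code):

```python
def batch_eta(elapsed_s: float, done: int, total: int) -> float:
    """Estimated seconds remaining, given `done` of `total` files in `elapsed_s`."""
    if done == 0:
        raise ValueError("need at least one completed file")
    return (elapsed_s / done) * (total - done)

print(batch_eta(12.0, 3, 10))  # 28.0 (4 s/file average, 7 files left)
```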

Audio Format Support

| Format | Support | Notes |
|---|---|---|
| .wav | ✅ Native | Preferred; 16kHz mono |
| .flac | ✅ Native | Lossless; works directly |
| .mp3 | 🔄 Converts | Requires ffmpeg |
| .m4a | 🔄 Converts | Requires ffmpeg |
| .mp4 | 🔄 Converts | Requires ffmpeg |
| .ogg | 🔄 Converts | Requires ffmpeg |
| .webm | 🔄 Converts | Requires ffmpeg |
| .aac | 🔄 Converts | Requires ffmpeg |
| .wma | 🔄 Converts | Requires ffmpeg |
| .opus | 🔄 Converts | Requires ffmpeg |
| .mkv | 🔄 Converts | Requires ffmpeg |

Non-native formats are automatically converted to 16kHz mono WAV using ffmpeg before transcription.
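That conversion corresponds to a standard ffmpeg invocation: -ar sets the output sample rate and -ac the channel count (both real ffmpeg flags, though the script's exact invocation is not published). Building it as an argument list:

```python
from pathlib import Path

def ffmpeg_to_wav_cmd(src: str) -> list[str]:
    """ffmpeg command converting any input to 16 kHz mono WAV (assumed flags)."""
    dst = str(Path(src).with_suffix(".wav"))
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

print(ffmpeg_to_wav_cmd("recording.mp3"))
# ['ffmpeg', '-y', '-i', 'recording.mp3', '-ar', '16000', '-ac', '1', 'recording.wav']
```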

License

The Parakeet TDT v3 model is released under CC-BY-4.0, free for commercial and non-commercial use.

Contract & API

Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.

Missing · GITHUB OPENCLAW

Contract coverage

Status

missing

Auth

None

Streaming

No

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.
Invocation examples
curl -s "https://xpersona.co/api/v1/agents/theplasmak-parakeet/snapshot"
curl -s "https://xpersona.co/api/v1/agents/theplasmak-parakeet/contract"
curl -s "https://xpersona.co/api/v1/agents/theplasmak-parakeet/trust"

Reliability & Benchmarks

Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.

Missing · runtime-metrics

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.
No benchmark suites or observed failure patterns are available.

Media & Demo

Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.

Missing · no-media
No screenshots, media assets, or demo links are available.

Related Agents

Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.

Self-declared · protocol-neighbors
GITHUB_REPOS · activepieces

Rank

70

AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents

Traction

No public download signal

Freshness

Updated 2d ago

OPENCLAW
GITHUB_REPOS · cherry-studio

Rank

70

AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs

Traction

No public download signal

Freshness

Updated 5d ago

MCP · OPENCLAW
GITHUB_REPOS · AionUi

Rank

70

Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!

Traction

No public download signal

Freshness

Updated 6d ago

MCP · OPENCLAW
GITHUB_REPOS · CopilotKit

Rank

70

The Frontend for Agents & Generative UI. React + Angular

Traction

No public download signal

Freshness

Updated 23d ago

OPENCLAW
Machine Appendix

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/theplasmak-parakeet/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/theplasmak-parakeet/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/theplasmak-parakeet/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/theplasmak-parakeet/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/theplasmak-parakeet/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/theplasmak-parakeet/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "GITHUB_OPENCLEW",
      "generatedAt": "2026-04-16T23:34:21.573Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
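The retry policy above maps directly to a bounded retry loop over the fixed backoff schedule. A sketch, where fetch stands in for any HTTP call returning a (status, body) pair:

```python
import time

RETRYABLE = {"HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"}
BACKOFF_MS = [500, 1500, 3500]  # schedule from the invocation guide above

def call_with_retry(fetch, max_attempts: int = 3):
    """Retry `fetch()` on retryable conditions, sleeping per the backoff schedule."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(BACKOFF_MS[attempt] / 1000)
    return status, body

# Simulated endpoint: rate-limited once, then succeeds
responses = iter([("HTTP_429", None), ("OK", "{...}")])
print(call_with_retry(lambda: next(responses)))  # ('OK', '{...}')
```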

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    },
    {
      "key": "with",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "the",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "two",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile capability:with|supported|profile capability:the|supported|profile capability:two|supported|profile"
}

Facts JSON

[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Theplasmak",
    "href": "https://github.com/ThePlasmak/parakeet",
    "sourceUrl": "https://github.com/ThePlasmak/parakeet",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-02-25T01:47:11.083Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/theplasmak-parakeet/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/theplasmak-parakeet/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-02-25T01:47:11.083Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/theplasmak-parakeet/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/theplasmak-parakeet/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub ยท GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]

Sponsored

Ads related to parakeet and adjacent AI workflows.