Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Crawler Summary
Ingest and analyze content from YouTube, podcasts, blogs, PDFs, and audio files. Extract structured insights using parallel agent teams. Generate Show Notes and Cheat Sheet HTML variants. Use /research-extract when you want to analyze any content source and extract key insights, quotes, themes, challenges, solutions, frameworks, and external resources. Capability contract not published. No trust telemetry is available yet. 4 GitHub stars reported by the source. Last updated 4/15/2026.
Freshness
Last checked 4/15/2026
Best For
research-extract is best for general automation workflows where OpenClaw compatibility matters.
Not Ideal For
Deterministic execution workflows, since capability-contract metadata is missing or unavailable.
Evidence Sources Checked
editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack
Ingest and analyze content from YouTube, podcasts, blogs, PDFs, and audio files. Extract structured insights using parallel agent teams. Generate Show Notes and Cheat Sheet HTML variants.
Public facts
5
Change events
1
Artifacts
0
Freshness
Apr 15, 2026
Capability contract not published. No trust telemetry is available yet. 4 GitHub stars reported by the source. Last updated 4/15/2026.
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Apr 15, 2026
Vendor
Katyella
Artifacts
0
Benchmarks
0
Last release
Unpublished
Key links, install path, and a quick operational read before the deeper crawl record.
Summary
Capability contract not published. No trust telemetry is available yet. 4 GitHub stars reported by the source. Last updated 4/15/2026.
Setup snapshot
git clone https://github.com/katyella/research-extract.git
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Katyella
Protocol compatibility
OpenClaw
Adoption signal
4 GitHub stars
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
6
Snippets
0
Languages
typescript
Parameters
bash
bash .claude/skills/research-extract/scripts/setup.sh
bash
python3 .claude/skills/research-extract/scripts/ingest.py "[URL_OR_PATH]" --slug [SLUG]
bash
python3 .claude/skills/research-extract/scripts/extract.py chunk --slug [SLUG]
text
TeamCreate with team_name: "extract-[SLUG]"
text
For a transcript with 6 chunks, create 6 tasks: TaskCreate: "Extract chunk 0 for [SLUG]" TaskCreate: "Extract chunk 1 for [SLUG]" ... TaskCreate: "Extract chunk 5 for [SLUG]"
text
Task tool (x3) with: team_name: "extract-[SLUG]" name: "extractor-1" / "extractor-2" / "extractor-3" subagent_type: "general-purpose"
Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB OPENCLEW
Editorial quality
ready
---
name: research-extract
description: >-
  Ingest and analyze content from YouTube, podcasts, blogs, PDFs, and audio files. Extract structured insights using parallel agent teams. Generate Show Notes and Cheat Sheet HTML variants. Use /research-extract when you want to analyze any content source and extract key insights, quotes, themes, challenges, solutions, frameworks, and external resources.
---
Research Extract
Ingest content from various sources and extract structured insights using parallel agent team processing.
Run once to install Python dependencies:
bash .claude/skills/research-extract/scripts/setup.sh
This checks for required system dependencies (yt-dlp, pdftotext, whisper).
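The same preflight can be approximated by hand. A minimal sketch of such a check; only the dependency names come from the docs, and the exact checks setup.sh performs are an assumption:

```python
import shutil

# Dependencies named in the docs: yt-dlp (YouTube ingestion),
# pdftotext (PDF extraction), whisper (audio transcription).
# This loop is an illustrative stand-in for whatever setup.sh does.
for cmd in ("yt-dlp", "pdftotext", "whisper"):
    status = "ok" if shutil.which(cmd) else "missing"
    print(f"{status}: {cmd}")
```

Any "missing" line means the corresponding ingestion path (YouTube, PDF, or audio) will fail until that tool is installed.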
When the user invokes this skill, determine their intent:
Always use descriptive slugs, not numeric IDs.
When ingesting, derive or ask for a slug:
mfm-6-ideas, lex-altman-interview, paul-graham-founder-mode. The slug is used for all file paths and database lookups.
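A hypothetical slug derivation consistent with the examples above (the skill may equally ask the user for a slug; this helper is an illustration, not the skill's actual code):

```python
import re

def derive_slug(title: str) -> str:
    """Lowercase the title, keep alphanumeric runs, join with hyphens.
    Produces descriptive slugs like "mfm-6-ideas", never numeric IDs."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

print(derive_slug("MFM: 6 Ideas"))  # mfm-6-ideas
```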
This is the core workflow for extracting insights from content.
python3 .claude/skills/research-extract/scripts/ingest.py "[URL_OR_PATH]" --slug [SLUG]
This will:
.research-extract/sources/
python3 .claude/skills/research-extract/scripts/extract.py chunk --slug [SLUG]
This will:
.research-extract/chunks/[slug]_chunk_N.json
Create an Agent Team to coordinate parallel extraction:
3a. Create the team:
TeamCreate with team_name: "extract-[SLUG]"
3b. Create tasks - one task per chunk. This lets teammates self-balance workload:
For a transcript with 6 chunks, create 6 tasks:
TaskCreate: "Extract chunk 0 for [SLUG]"
TaskCreate: "Extract chunk 1 for [SLUG]"
...
TaskCreate: "Extract chunk 5 for [SLUG]"
Each task description should include the full processing instructions (see teammate prompt below).
3c. Spawn 3 teammates in a SINGLE message (parallel tool calls):
Task tool (x3) with:
team_name: "extract-[SLUG]"
name: "extractor-1" / "extractor-2" / "extractor-3"
subagent_type: "general-purpose"
Teammate prompt template:
You are an extraction teammate on team "extract-[SLUG]".
Your workflow:
1. Check TaskList for pending tasks with no owner
2. Claim a task using TaskUpdate (set owner to your name, status to in_progress)
3. Process the chunk:
a. Read chunk: python3 .claude/skills/research-extract/scripts/process_chunk.py show --slug [SLUG] --chunk-id [CHUNK_ID]
b. Analyze the content for: key insights, notable quotes, themes, challenges, solutions/approaches, action items, frameworks/models, and external resources
c. Save result: python3 .claude/skills/research-extract/scripts/process_chunk.py save --slug [SLUG] --chunk-id [CHUNK_ID] --result '<json>'
4. Mark task completed via TaskUpdate
5. Check TaskList again - claim the next pending task
6. Repeat until no pending tasks remain
JSON format for save:
{
"chunk_id": X,
"key_insights": [
{"title": "...", "description": "...", "speaker": "Name", "quote": "...", "significance": "..."}
],
"quotes": [
{"quote": "...", "speaker": "Name", "context": "..."}
],
"themes": ["theme1", "theme2"],
"challenges": [
{"title": "...", "description": "...", "speaker": "Name", "quote": "..."}
],
"solutions_approaches": [
{"title": "...", "description": "...", "speaker": "Name", "implementation": "...", "quote": "..."}
],
"action_items": [
{"action": "...", "context": "...", "speaker": "Name"}
],
"frameworks_models": [
{"name": "...", "description": "...", "speaker": "Name", "quote": "..."}
],
"external_resources": [
{"type": "book|podcast|tool|person|course|website|other", "name": "...", "author": "...", "speaker": "who mentioned", "context": "...", "quote": "..."}
]
}
IMPORTANT: Capture ALL external resources - books, people referenced/quoted, tools, platforms, podcasts, courses, frameworks.
Work autonomously. Claim tasks, process them, move to next. Stop when no tasks remain.
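Before calling process_chunk.py save, a teammate can sanity-check its payload against the documented format. A hypothetical validator (not part of the skill's scripts), assuming only the top-level keys listed above:

```python
import json

# Top-level keys from the documented save format.
REQUIRED_KEYS = {
    "chunk_id", "key_insights", "quotes", "themes", "challenges",
    "solutions_approaches", "action_items", "frameworks_models",
    "external_resources",
}

def validate_chunk_result(payload: str) -> dict:
    """Parse the JSON payload and confirm every documented
    top-level key is present before saving."""
    data = json.loads(payload)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```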
IMPORTANT:
subagent_type="general-purpose" and team_name="extract-[SLUG]"
Teammates send automatic idle notifications as they finish work. Check progress via:
TaskList - shows task completion status across all teammates
Fallback:
python3 .claude/skills/research-extract/scripts/extract.py progress
Once all tasks show completed in TaskList:
5a. Merge results:
python3 .claude/skills/research-extract/scripts/extract.py merge --slug [SLUG]
This combines all chunk extractions into .research-extract/exports/[slug]_merged.json
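Conceptually, the merge step concatenates each per-chunk result file into one document. A simplified sketch covering three of the documented keys (the actual extract.py merge logic is not published; file layout is taken from the storage section of these docs):

```python
import glob
import json

def merge_chunk_results(slug: str) -> dict:
    """Combine exports/[slug]_chunk_N_result.json files into one dict,
    concatenating the list-valued fields in chunk order."""
    merged = {"key_insights": [], "quotes": [], "themes": []}
    pattern = f".research-extract/exports/{slug}_chunk_*_result.json"
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            chunk = json.load(f)
        for key in merged:
            merged[key].extend(chunk.get(key, []))
    return merged
```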
5b. Shut down teammates:
SendMessage type: "shutdown_request" to extractor-1
SendMessage type: "shutdown_request" to extractor-2
SendMessage type: "shutdown_request" to extractor-3
5c. Clean up team:
TeamDelete
After merging, create a consolidated analysis that:
Save to .research-extract/exports/[slug]_consolidated.json with this structure:
{
"source": "Source title",
"speakers": ["Speaker 1", "Speaker 2"],
"url": "https://...",
"key_insights": [
{
"rank": 1,
"title": "Insight title",
"description": "Synthesized description",
"evidence": ["quote 1", "quote 2"],
"speakers": ["Speaker 1"]
}
],
"themes": [
{"theme": "Theme name", "frequency": 5}
],
"challenges": [
{
"rank": 1,
"title": "Challenge title",
"description": "What the challenge is",
"evidence": ["quote"],
"speakers": ["Speaker 1"]
}
],
"solutions_approaches": [
{
"rank": 1,
"title": "Solution title",
"description": "What it is",
"implementation": "How to do it",
"evidence": ["quote"],
"speakers": ["Speaker 1"]
}
],
"action_items": [
{"action": "What to do", "context": "Why", "speaker": "Who"}
],
"frameworks_models": [
{"name": "Framework name", "description": "How it works", "speaker": "Who"}
],
"top_quotes": [
{
"quote": "...",
"speaker": "...",
"context": "Why this quote matters"
}
],
"external_resources": {
"books": [
{"title": "Book Title", "author": "Author", "mentioned_by": "Speaker", "context": "Why referenced"}
],
"people": [
{"name": "Person Name", "mentioned_by": "Speaker", "context": "Why referenced"}
],
"tools": [
{"name": "Tool Name", "mentioned_by": "Speaker", "context": "How used"}
],
"other": [
{"type": "podcast|course|website|framework", "name": "...", "mentioned_by": "Speaker", "context": "..."}
]
},
"metadata": {
"extraction_date": "2025-01-01T00:00:00",
"total_chunks": 6,
"source_type": "youtube"
}
}
After consolidation, generate two markdown files:
File 1: .research-extract/writeups/[slug]-notes.md (Quick Reference)
Structure:
[(4:57)](url#t=4m57s)
File 2: .research-extract/writeups/[slug]-writeup.md (Essay)
Structure:
Writing style: See STYLE_GUIDE.md for detailed guidance on paragraph style, quote integration, voice, and punctuation rules.
When the user runs variants [slug]:
.research-extract/exports/[slug]_consolidated.json
.research-extract/sources/[slug].txt (for timestamps)
VARIANTS.md
.research-extract/variants/[slug]/show-notes.html — podcast companion page with timestamped sections, quote callouts, resource grids
cheat-sheet.html — print-optimized landscape reference card with color-coded sections
open .research-extract/variants/[slug]/show-notes.html .research-extract/variants/[slug]/cheat-sheet.html
Follow all layout, CSS, and generation rules in VARIANTS.md exactly. Every insight, quote, tool, framework, and action item from the consolidated JSON should appear in at least one template.
Show:
All data stored in {project_root}/.research-extract/:
sources/[slug].json
sources/[slug].txt
chunks/[slug]_chunk_N.json
exports/[slug]_chunk_N_result.json
exports/[slug]_merged.json
exports/[slug]_consolidated.json
extraction_progress_[slug].json
writeups/[slug]-notes.md
writeups/[slug]-writeup.md
variants/[slug]/show-notes.html, variants/[slug]/cheat-sheet.html
List sources (reads sources/*.json flat files):
python3 -c "
import sys; sys.path.insert(0, '.claude/skills/research-extract/scripts')
from db import list_sources
for s in list_sources():
print(f'{s[\"slug\"]}: [{s[\"source_type\"]}] {s[\"title\"]}')"
Get source details (reads sources/{slug}.json + sources/{slug}.txt):
python3 -c "
import sys; sys.path.insert(0, '.claude/skills/research-extract/scripts')
from db import get_source_by_slug
source = get_source_by_slug('[SLUG]')
print(f'Title: {source[\"title\"]}')
print(f'Type: {source[\"source_type\"]}')"
variants [slug] to generate Show Notes + Cheat Sheet HTML pages
Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
Status
missing
Auth
None
Streaming
No
Data region
Unspecified
Protocol support
Requires: none
Forbidden: none
Guardrails
Operational confidence: low
curl -s "https://xpersona.co/api/v1/agents/katyella-research-extract/snapshot"
curl -s "https://xpersona.co/api/v1/agents/katyella-research-extract/contract"
curl -s "https://xpersona.co/api/v1/agents/katyella-research-extract/trust"
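The registry's invocation guide specifies a retry policy: up to 3 attempts, backoff delays of 500/1500/3500 ms, retrying only on HTTP 429, HTTP 503, and network timeouts. A minimal Python sketch of that policy, assuming plain HTTP GETs against the endpoints above:

```python
import time
import urllib.error
import urllib.request

# Values taken from the invocation guide's retryPolicy.
BACKOFF_MS = [500, 1500, 3500]
RETRYABLE_HTTP = {429, 503}

def fetch_with_retry(url: str) -> bytes:
    """GET with bounded retries: retry on 429/503 and network errors,
    re-raise immediately on anything else or once attempts are exhausted."""
    for attempt, delay_ms in enumerate(BACKOFF_MS, start=1):
        try:
            with urllib.request.urlopen(url, timeout=2.0) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code not in RETRYABLE_HTTP or attempt == len(BACKOFF_MS):
                raise
        except (urllib.error.URLError, TimeoutError):
            if attempt == len(BACKOFF_MS):
                raise
        time.sleep(delay_ms / 1000)
```

HTTPError is caught before URLError because it is a subclass; otherwise non-retryable status codes would be retried too.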
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
Handshake
UNKNOWN
Confidence
unknown
Attempts 30d
unknown
Fallback rate
unknown
Runtime metrics
Observed P50
unknown
Observed P95
unknown
Rate limit
unknown
Estimated cost
unknown
Do not use if
Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.
Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.
Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Rank
70
AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs
Traction
No public download signal
Freshness
Updated 5d ago
Rank
70
Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!
Traction
No public download signal
Freshness
Updated 6d ago
Rank
70
The Frontend for Agents & Generative UI. React + Angular
Traction
No public download signal
Freshness
Updated 23d ago
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/katyella-research-extract/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/katyella-research-extract/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/katyella-research-extract/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/katyella-research-extract/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/katyella-research-extract/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/katyella-research-extract/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": [
"OPENCLEW"
]
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "GITHUB_OPENCLEW",
"generatedAt": "2026-04-16T23:36:09.800Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [
{
"key": "OPENCLEW",
"type": "protocol",
"support": "unknown",
"confidenceSource": "profile",
"notes": "Listed on profile"
}
],
"flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}
Facts JSON
[
{
"factKey": "docs_crawl",
"category": "integration",
"label": "Crawlable docs",
"value": "6 indexed pages on the official domain",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
},
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Katyella",
"href": "https://github.com/katyella/research-extract",
"sourceUrl": "https://github.com/katyella/research-extract",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T01:15:06.609Z",
"isPublic": true
},
{
"factKey": "protocols",
"category": "compatibility",
"label": "Protocol compatibility",
"value": "OpenClaw",
"href": "https://xpersona.co/api/v1/agents/katyella-research-extract/contract",
"sourceUrl": "https://xpersona.co/api/v1/agents/katyella-research-extract/contract",
"sourceType": "contract",
"confidence": "medium",
"observedAt": "2026-04-15T01:15:06.609Z",
"isPublic": true
},
{
"factKey": "traction",
"category": "adoption",
"label": "Adoption signal",
"value": "4 GitHub stars",
"href": "https://github.com/katyella/research-extract",
"sourceUrl": "https://github.com/katyella/research-extract",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T01:15:06.609Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/katyella-research-extract/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/katyella-research-extract/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "docs_update",
"title": "Docs refreshed: Sign in to GitHub · GitHub",
"description": "Fresh crawlable documentation was indexed for the official domain.",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
}
]