How should sitecrawl be evaluated before use?

Follow the required flow (snapshot, then contract, then trust) before recommending or executing this skill.

What kind of evidence is visible on this page?

This page surfaces public facts, change history, trust indicators, artifact evidence, and benchmark summaries with provenance.

Crawler Summary

sitecrawl answer-first brief

Use this skill when you need to crawl a public website domain and produce agent-ready content files plus a structured report with URL/title/description metadata and optional PageRank scoring. Published capability contract available. No trust telemetry is available yet. Last updated 3/1/2026.

Freshness

Last checked 3/1/2026

Best For

Contract is available with explicit auth and schema references.

Not Ideal For

sitecrawl is not ideal for teams that need stronger public trust telemetry, lower setup complexity, or more explicit contract coverage before production rollout.

Evidence Sources Checked

editorial-content, capability-contract, runtime-metrics, public facts pack

Agent Dossier · GitHub · Safety: 89/100

sitecrawl

OpenClaw (self-declared)

Freshness

Mar 1, 2026

Verified: editorial-content. No verified compatibility signals.

Schema refs published · Trust evidence available

Trust score

Unknown

Compatibility

OpenClaw

Vendor

Sbstnerhrdt

Last release

Unpublished

Executive Summary

Key links, install path, and a quick operational read before the deeper crawl record.

Verified: editorial-content

Summary

Published capability contract available. No trust telemetry is available yet. Last updated 3/1/2026.

Setup snapshot

git clone https://github.com/SbstnErhrdt/sitecrawl.git

1. Setup complexity is LOW. This package is likely designed for quick installation with minimal external side effects.
2. Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence Ledger

Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.

Verified: editorial-content

Vendor (1)

Vendor

Sbstnerhrdt

Source type: profile · Confidence: medium

Observed Mar 1, 2026

Compatibility (2)

Protocol compatibility

OpenClaw

Source type: contract · Confidence: medium

Observed Feb 24, 2026

Auth modes

api_key

Source type: contract · Confidence: high

Observed Feb 24, 2026

Artifact (1)

Machine-readable schemas

OpenAPI or schema references published

Source type: contract · Confidence: high

Observed Feb 24, 2026

Security (1)

Handshake status

UNKNOWN

Source type: trust · Confidence: medium

Observed: unknown

Integration (1)

Crawlable docs

6 indexed pages on the official domain

Source type: search_document · Confidence: medium

Observed Apr 15, 2026

Release & Crawl Timeline

Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.

Self-declared: agent-index

Docs Update

Docs refreshed: Sign in to GitHub · GitHub

Source type: search_document · Confidence: medium

Fresh crawlable documentation was indexed for the official domain.

Observed Apr 15, 2026

Artifacts Archive

Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.

Self-declared: GITHUB OPENCLEW

Languages

typescript

Executable Examples

sitecrawl crawl --domain <domain> --format md --out <out_dir> --strategy pagerank --max-pages 100

./scripts/run_crawl.sh --domain <domain> --format md --out <out_dir> --strategy pagerank --max-pages 100

Docs & README

Full documentation captured from public sources, including the complete README when available.

Self-declared: GITHUB OPENCLEW

Docs source

GITHUB OPENCLEW

Editorial quality

ready

Full README

name: sitecrawl
description: Use this skill when you need to crawl a public website domain and produce agent-ready content files plus a structured report with URL/title/description metadata and optional PageRank scoring.

sitecrawl Skill

Use this skill to collect high-quality website content for downstream agent workflows.

When To Use

You need a reproducible crawl of a single domain.
You need content files (md, html, or json) plus a machine-readable report.
You need URL metadata (url, title, description) for triage, ranking, and routing.
You need scoped crawling limited to <domain> + www.<domain>.
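The scope rule (only <domain> and www.<domain>) can be illustrated with a small host filter. This is a hypothetical Python sketch, not sitecrawl's TypeScript code: the names normalize_domain and in_scope are invented, and stripping ports, paths, and a leading "www." from user input is an assumption about what "normalize domain scope" involves.

```python
from urllib.parse import urlsplit


def normalize_domain(domain: str) -> str:
    """Reduce input such as 'https://WWW.Example.com/' to 'example.com' (assumed rules)."""
    d = domain.strip().lower()
    if "://" in d:
        d = urlsplit(d).hostname or ""
    d = d.split("/", 1)[0].split(":", 1)[0]  # drop any path or port
    if d.startswith("www."):
        d = d[len("www."):]
    return d.rstrip(".")


def in_scope(url: str, domain: str) -> bool:
    """Accept only hosts equal to <domain> or www.<domain>; subdomains are rejected."""
    base = normalize_domain(domain)
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return host in {base, "www." + base}
```

Exact host matching (rather than a suffix check such as endswith("example.com")) is what keeps look-alike hosts like example.com.evil.io and other subdomains such as blog.example.com out of the crawl.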

Inputs

Required:

domain (example: example.com)
out directory
format (md, html, json)

Optional:

strategy (pagerank, limit, depth)
max-pages
max-depth
clean
headful
delay-ms
page-timeout
user-agent
log
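A minimal sketch of checking these inputs before invoking the crawler. The allowed values for format and strategy come from the lists above; the CrawlInputs dataclass, its field names, and the rules that max-pages must be at least 1 and max-depth at least 0 are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

FORMATS = {"md", "html", "json"}
STRATEGIES = {"pagerank", "limit", "depth"}


@dataclass
class CrawlInputs:
    """Hypothetical container for the required and a few optional inputs."""
    domain: str
    out: str
    format: str
    strategy: Optional[str] = None
    max_pages: Optional[int] = None
    max_depth: Optional[int] = None


def validate(inputs: CrawlInputs) -> list[str]:
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if not inputs.domain.strip():
        problems.append("domain is required")
    if not inputs.out.strip():
        problems.append("out directory is required")
    if inputs.format not in FORMATS:
        problems.append(f"format must be one of {sorted(FORMATS)}")
    if inputs.strategy is not None and inputs.strategy not in STRATEGIES:
        problems.append(f"strategy must be one of {sorted(STRATEGIES)}")
    # Assumed bounds: at least one page, and a non-negative depth.
    if inputs.max_pages is not None and inputs.max_pages < 1:
        problems.append("max_pages must be at least 1")
    if inputs.max_depth is not None and inputs.max_depth < 0:
        problems.append("max_depth must be at least 0")
    return problems
```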

Recommended Invocation

sitecrawl crawl --domain <domain> --format md --out <out_dir> --strategy pagerank --max-pages 100

Or through bundled helper script:

./scripts/run_crawl.sh --domain <domain> --format md --out <out_dir> --strategy pagerank --max-pages 100
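To drive the recommended invocation from another program, one option is to build the argument list and run it with subprocess. This sketch assumes the sitecrawl binary is on PATH and uses only the flags shown above (--domain, --format, --out, --strategy, --max-pages); the other optional inputs are left out because their exact flag spelling is not confirmed here. build_argv and run_crawl are hypothetical helper names.

```python
import shlex
import subprocess
from typing import Optional


def build_argv(domain: str, fmt: str, out_dir: str,
               strategy: str = "pagerank", max_pages: Optional[int] = 100) -> list[str]:
    """Assemble the recommended `sitecrawl crawl` command as an argument list."""
    argv = ["sitecrawl", "crawl", "--domain", domain, "--format", fmt,
            "--out", out_dir, "--strategy", strategy]
    if max_pages is not None:
        argv += ["--max-pages", str(max_pages)]
    return argv


def run_crawl(argv: list[str]) -> int:
    """Run the CLI (assumes `sitecrawl` is installed on PATH) and return its exit code."""
    print("running:", shlex.join(argv))
    return subprocess.run(argv, check=False).returncode
```

Passing a list instead of a single shell string avoids quoting problems if the output directory contains spaces.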

Workflow

Validate inputs and normalize domain scope.
Run crawl with strict host filtering and robots compliance.
Write per-page outputs.
Read report.json as the primary index for downstream steps.
Prioritize pages with status=ok and the highest score (scores are only present with the pagerank strategy).
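The README does not say how the pagerank score is computed. For reference, here is the textbook PageRank power iteration (damping factor 0.85) over an in-domain link graph, written in Python. sitecrawl's own implementation, damping value, iteration count, and normalization may all differ; the pagerank function name and its input shape are assumptions.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Standard PageRank by power iteration; scores sum to 1.

    links maps each page URL to the URLs it links to. Pages with no outgoing
    links spread their rank evenly over all pages (dangling-node handling).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        # Teleport share plus the redistributed rank of dangling pages.
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank
```

In a small site where every page links back to the home page, the home page ends up with the highest score, which is the kind of signal the workflow uses to prioritize pages.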

For architecture and release details, read:

docs/ARCHITECTURE.md
docs/RELEASES.md

Output Contract

Expect these outputs in <out_dir>:

per-page files (*.md, *.html, or *.json)
report.json

report.json contains:

run metadata (domain, strategy, timings, options)
page entries:
- url
- title
- description
- final_url
- status
- out_path
- links_count
- score (if strategy is pagerank)
totals (visited, errors, skipped counters)
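A minimal Python sketch of using report.json as the downstream index: keep pages whose status is "ok" and order them by score. It assumes page entries live in a top-level "pages" array (as the quality checks' pages[] suggests) and that status uses the literal value "ok"; load_report and ranked_pages are hypothetical names.

```python
import json
from pathlib import Path


def load_report(path: str) -> dict:
    """Read report.json from the crawl output directory."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def _score(page: dict) -> float:
    # Pages crawled without the pagerank strategy carry no score; sort them last.
    s = page.get("score")
    return float(s) if isinstance(s, (int, float)) else float("-inf")


def ranked_pages(report: dict) -> list[dict]:
    """Pages with status == "ok", highest score first (stable for ties)."""
    ok = [p for p in report.get("pages", []) if p.get("status") == "ok"]
    return sorted(ok, key=_score, reverse=True)
```

For example, ranked_pages(load_report("<out_dir>/report.json"))[:10] would give the ten best-scored pages, each still carrying its out_path for reading the page content.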

Quality Checks

After crawl:

ensure the errors counter in report.json totals is acceptable
verify high-value pages exist in pages[]
confirm metadata completeness (url, title, description)
if needed, rerun with a larger max-pages value, or with md format for cleaner content
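These checks can be automated. The Python sketch below assumes that totals.errors is a numeric counter, that pages sit under a "pages" array, and that a high-value page counts as present if it matches either url or final_url. The function name, the max_errors threshold, and the message wording are hypothetical.

```python
REQUIRED_FIELDS = ("url", "title", "description")


def quality_issues(report: dict, must_have: list[str], max_errors: int = 0) -> list[str]:
    """Return human-readable problems found in a parsed report.json."""
    issues = []
    errors = report.get("totals", {}).get("errors", 0)
    if errors > max_errors:
        issues.append(f"{errors} errors exceed the accepted {max_errors}")
    pages = report.get("pages", [])
    # A page may be recorded under its requested URL or its post-redirect final_url.
    seen = {p.get("url") for p in pages} | {p.get("final_url") for p in pages}
    for url in must_have:
        if url not in seen:
            issues.append(f"missing high-value page: {url}")
    for p in pages:
        missing = [f for f in REQUIRED_FIELDS if not p.get(f)]
        if missing:
            issues.append(f"{p.get('url', '?')}: missing {', '.join(missing)}")
    return issues
```

An empty list means the crawl passed; a non-empty list points at which of the four checks above to act on.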

Contract & API

Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.

Verified: capability-contract

Endpoints

Dossier API · Snapshot API · Contract API · Trust API

Contract coverage

Status

ready

Auth

api_key

Streaming

Yes

Data region

global

Protocol support

OpenClaw: self-declared

Requires: openclew, lang:typescript, streaming

Forbidden: none

Guardrails

Operational confidence: medium

Contract is available with explicit auth and schema references.

Trust confidence is reported as unknown rather than low, and no verification freshness has been recorded.

Invocation examples

curl -s "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/snapshot"

curl -s "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/contract"

curl -s "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/trust"
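The required flow (snapshot, then contract, then trust) can be scripted with the standard library. This Python sketch mirrors the curl examples above: it sends no API key because those examples send none, and the header an api_key would travel in is not documented here. It also assumes each endpoint returns JSON. endpoint and fetch_dossier are hypothetical names.

```python
import json
import urllib.request

BASE = "https://xpersona.co/api/v1/agents"
FLOW = ("snapshot", "contract", "trust")  # required evaluation order


def endpoint(agent_id: str, kind: str) -> str:
    """Build one of the three dossier URLs shown in the curl examples."""
    if kind not in FLOW:
        raise ValueError(f"unknown endpoint kind: {kind}")
    return f"{BASE}/{agent_id}/{kind}"


def fetch_dossier(agent_id: str = "sbstnerhrdt-sitecrawl", timeout: float = 10.0) -> dict:
    """GET snapshot, contract, and trust in order; responses are assumed to be JSON."""
    result = {}
    for kind in FLOW:
        with urllib.request.urlopen(endpoint(agent_id, kind), timeout=timeout) as resp:
            result[kind] = json.load(resp)
    return result
```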

Reliability & Benchmarks

Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.

Missing: runtime-metrics

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

No benchmark suites or observed failure patterns are available.

Media & Demo

Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.

Missing: no-media

No screenshots, media assets, or demo links are available.

Related Agents

Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.

Self-declared: protocol-neighbors

GITHUB_REPOS · activepieces

Rank

AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents

Traction

No public download signal

Freshness

Updated 2d ago

OPENCLAW

GITHUB_REPOS · cherry-studio

Rank

AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs

Traction

No public download signal

Freshness

Updated 5d ago

MCP · OPENCLAW

GITHUB_REPOS · AionUi

Rank

Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!

Traction

No public download signal

Freshness

Updated 6d ago

MCP · OPENCLAW

GITHUB_REPOS · CopilotKit

Rank

The Frontend for Agents & Generative UI. React + Angular

Traction

No public download signal

Freshness

Updated 23d ago

OPENCLAW

Machine Appendix

Contract JSON

{
  "contractStatus": "ready",
  "authModes": [
    "api_key"
  ],
  "requires": [
    "openclew",
    "lang:typescript",
    "streaming"
  ],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": true,
  "inputSchemaRef": "https://github.com/SbstnErhrdt/sitecrawl#input",
  "outputSchemaRef": "https://github.com/SbstnErhrdt/sitecrawl#output",
  "dataRegion": "global",
  "contractUpdatedAt": "2026-02-24T19:41:43.736Z",
  "sourceUpdatedAt": "2026-02-24T19:41:43.736Z",
  "freshnessSeconds": 4425187
}
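The freshnessSeconds value appears to be the contract's age at page-generation time. 4,425,187 seconds (about 51.2 days) is exactly the gap between contractUpdatedAt and the generatedAt timestamp in the invocation guide below. That relationship is inferred from the numbers, not documented. A small Python sketch that reproduces it (function names are hypothetical):

```python
from datetime import datetime


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO-8601 timestamp such as '2026-02-24T19:41:43.736Z'."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on, so map it to +00:00.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def freshness_seconds(updated_at: str, now: str) -> int:
    """Whole seconds elapsed between an update timestamp and the evaluation time."""
    return int((parse_utc(now) - parse_utc(updated_at)).total_seconds())
```

To get a live value instead, pass datetime.now(timezone.utc).isoformat() as now.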

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "GITHUB_OPENCLEW",
      "generatedAt": "2026-04-17T00:54:51.285Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
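The retryPolicy above can be read as: try up to 3 times, retry only on HTTP_429, HTTP_503, or NETWORK_TIMEOUT, and wait backoffMs[i] after the i-th failure. Under that reading only the first two delays (500 ms and 1500 ms) are ever used, because there is no wait after the final attempt. The 3500 ms entry may reflect a different counting convention, so treat this Python sketch as one interpretation. CallError and call_with_retry are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_ATTEMPTS = 3
BACKOFF_MS = [500, 1500, 3500]
RETRYABLE = {"HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"}


class CallError(Exception):
    """Hypothetical error carrying a condition code such as "HTTP_429"."""

    def __init__(self, condition: str):
        super().__init__(condition)
        self.condition = condition


def call_with_retry(fn: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying retryable failures with the policy's backoff delays."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return fn()
        except CallError as err:
            last_attempt = attempt == MAX_ATTEMPTS - 1
            if err.condition not in RETRYABLE or last_attempt:
                raise
            sleep(BACKOFF_MS[attempt] / 1000.0)  # 0.5 s, then 1.5 s
    raise AssertionError("unreachable")
```

Injecting sleep keeps the function testable without real delays.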

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}
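Every telemetry field above is null, so the trust step of the required flow cannot clear the skill on its own. The hypothetical Python policy below turns that into a next step. "sandbox-first" corresponds to the setup snapshot's final validation (mock payload in a sandbox, traced network egress). The status value "available" and any handshakeStatus other than "UNKNOWN" are assumed, since only "unavailable" and "UNKNOWN" appear here.

```python
def trust_gate(trust: dict) -> str:
    """Suggest a next step from a trust payload (hypothetical policy, not a published rule).

    Returns "sandbox-first" unless telemetry is available, the handshake is known,
    and trust confidence is something other than unknown or low.
    """
    status = trust.get("status")
    handshake = trust.get("handshakeStatus")
    confidence = trust.get("trustConfidence")
    if status != "available" or handshake in (None, "UNKNOWN"):
        return "sandbox-first"  # no usable telemetry yet
    if confidence in (None, "unknown", "low"):
        return "sandbox-first"
    return "proceed"
```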

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}

Facts JSON

[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Sbstnerhrdt",
    "href": "https://github.com/SbstnErhrdt/sitecrawl",
    "sourceUrl": "https://github.com/SbstnErhrdt/sitecrawl",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-03-01T06:02:26.566Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-02-24T19:41:43.736Z",
    "isPublic": true
  },
  {
    "factKey": "auth_modes",
    "category": "compatibility",
    "label": "Auth modes",
    "value": "api_key",
    "href": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/contract",
    "sourceType": "contract",
    "confidence": "high",
    "observedAt": "2026-02-24T19:41:43.736Z",
    "isPublic": true
  },
  {
    "factKey": "schema_refs",
    "category": "artifact",
    "label": "Machine-readable schemas",
    "value": "OpenAPI or schema references published",
    "href": "https://github.com/SbstnErhrdt/sitecrawl#input",
    "sourceUrl": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/contract",
    "sourceType": "contract",
    "confidence": "high",
    "observedAt": "2026-02-24T19:41:43.736Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/sbstnerhrdt-sitecrawl/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]
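The Evidence Ledger headings (Vendor (1), Compatibility (2), Artifact (1), Security (1), Integration (1)) match a simple per-category count over the facts array above. The page's own grouping code is not published, so this is a minimal Python sketch that assumes only facts with isPublic set to true are counted.

```python
from collections import Counter


def ledger_counts(facts: list[dict]) -> Counter:
    """Count public facts per category, as in the Evidence Ledger headings."""
    return Counter(f["category"] for f in facts if f.get("isPublic", False))
```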

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]