Crawler Summary

skill-web-scraper answer-first brief

SKILL: skill-web-scraper — Extract anything. Understand everything. Powered by Jnana (Sanskrit: ज्ञान — knowledge/wisdom). By Darshj.me. Overview: skill-web-scraper is a production-grade intelligent web scraping skill for OpenClaw. It exposes the Jnana extraction engine — a pipeline that combines CSS selectors, Zod schema validation, and optional LLM post-processing to extract typed, structured data from any web page. Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Freshness

Last checked 2/25/2026

Best For

skill-web-scraper is best for general automation workflows where OpenClaw compatibility matters.

Not Ideal For

Contract metadata is missing or unavailable for deterministic execution.

Evidence Sources Checked

editorial-content, GITHUB OPENCLAW, runtime-metrics, public facts pack

Agent Dossier · GitHub · Safety: 89/100

skill-web-scraper


OpenClaw · self-declared

Public facts

4

Change events

1

Artifacts

0

Freshness

Feb 25, 2026

Verified · editorial-content · No verified compatibility signals

Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Trust evidence available

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Feb 25, 2026

Vendor

Darshjme Codes

Artifacts

0

Benchmarks

0

Last release

Unpublished

Executive Summary

Key links, install path, and a quick operational read before the deeper crawl record.

Verified · editorial-content

Summary

Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Setup snapshot

git clone https://github.com/darshjme-codes/skill-web-scraper.git
  1. Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.

  2. Final validation: expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence Ledger

Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.

Verified · editorial-content
Vendor (1)

Vendor

Darshjme Codes

profile · medium confidence
Observed Feb 25, 2026 · Source link · Provenance
Compatibility (1)

Protocol compatibility

OpenClaw

contract · medium confidence
Observed Feb 25, 2026 · Source link · Provenance
Security (1)

Handshake status

UNKNOWN

trust · medium confidence
Observed unknown · Source link · Provenance
Integration (1)

Crawlable docs

6 indexed pages on the official domain

search_document · medium confidence
Observed Apr 15, 2026 · Source link · Provenance

Release & Crawl Timeline

Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.

Self-declared · agent-index

Artifacts Archive

Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.

Self-declared · GITHUB OPENCLAW

Extracted files

0

Examples

6

Snippets

0

Languages

typescript

Parameters

Executable Examples

bash

npm install skill-web-scraper
# Optional: for JS rendering
npm install playwright
npx playwright install chromium

typescript

import { WebScraper, articleExtractor } from 'skill-web-scraper';

const scraper = new WebScraper();

const result = await scraper.extract({
  url: 'https://example.com/article',
  schema: articleExtractor(),
});

console.log(result.data.title);    // "Article Title"
console.log(result.data.body);     // "Full article text..."
console.log(result.fromCache);     // false (first fetch)

typescript

// In your OpenClaw skill handler
import { WebScraper, articleExtractor, productExtractor } from 'skill-web-scraper';

export async function handleScrapeRequest(input: { url: string; type: string }) {
  const scraper = new WebScraper({
    antiDetection: { minDelayMs: 1000, maxDelayMs: 3000 },
    llmComplete: async (prompt) => {
      // Hook into your LLM provider
      return await callYourLLM(prompt);
    },
  });

  const schema = input.type === 'product' ? productExtractor() : articleExtractor();
  const result = await scraper.extract({ url: input.url, schema });
  return result.data;
}

typescript

const scraper = new WebScraper(config?: WebScraperConfig);

typescript

const result: ExtractResult<T> = await scraper.extract({
  url: string,
  schema: ExtractionSchema<T>,
  renderMode?: 'static' | 'playwright',
  requestOptions?: RequestOptions,
  outputFormat?: 'json' | 'csv' | 'markdown' | 'text',
});

typescript

const result: CrawlResult<T> = await scraper.crawl({
  startUrls: string | string[],
  schema: ExtractionSchema<T>,
  nextPageSelector?: string,   // CSS selector for next-page link
  maxPages?: number,           // default: 50
  maxDepth?: number,           // default: 3
  sameDomainOnly?: boolean,    // default: true
  renderMode?: 'static' | 'playwright',
  onPage?: (result, pageIndex) => void,
});

Docs & README

Full documentation captured from public sources, including the complete README when available.

Self-declared · GITHUB OPENCLAW

Docs source

GITHUB OPENCLAW

Editorial quality

ready


Full README

SKILL: skill-web-scraper

Extract anything. Understand everything.
Powered by Jnana (Sanskrit: ज्ञान — knowledge/wisdom)
By Darshj.me


Overview

skill-web-scraper is a production-grade intelligent web scraping skill for OpenClaw. It exposes the Jnana extraction engine — a pipeline that combines CSS selectors, Zod schema validation, and optional LLM post-processing to extract typed, structured data from any web page.

Tagline: Extract anything. Understand everything.


Capabilities

| Capability | Description |
|---|---|
| extract(url, schema) | Extract structured data from a single URL using CSS selectors + Zod validation |
| crawl(url, schema) | Follow pagination/next-page links, depth-limited BFS crawl |
| batch(urls, schema) | Concurrent extraction from multiple URLs with configurable parallelism |
| Article extractor | Title, author, date, body, tags, image — works on most blogs/news sites |
| Product extractor | Name, price, currency, SKU, availability, rating — broad e-commerce coverage |
| Table extractor | Parse HTML tables into typed {headers, rows} structures |
| Link extractor | All <a> links, resolved, filtered, with rel/title attributes |
| Contact extractor | Emails (regex), phones, addresses, social profile links |
| Anti-detection | Rotating user agents, stealth headers, per-domain rate limiting, robots.txt compliance |
| JS rendering | Optional Playwright integration for SPAs and JS-heavy sites |
| Output formats | JSON, CSV, Markdown, plain text |
| Caching | ETag + Last-Modified aware — skip unchanged pages |
| LLM fallback | When CSS selectors miss, pass page text to an LLM for extraction |
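The caching capability ("ETag + Last-Modified aware") can be made concrete with a small sketch of how a cache-aware fetcher builds conditional request headers so a server can answer 304 Not Modified for unchanged pages. The `CachedEntry` shape and function name below are hypothetical; the library's internal cache format is not published.

```typescript
// Illustrative only: derive conditional headers from a previously cached
// response so unchanged pages are not re-downloaded.
interface CachedEntry {
  etag?: string;
  lastModified?: string;
}

function conditionalHeaders(entry: CachedEntry | undefined): Record<string, string> {
  const headers: Record<string, string> = {};
  // If-None-Match pairs with the server's ETag; If-Modified-Since with Last-Modified.
  if (entry?.etag) headers['If-None-Match'] = entry.etag;
  if (entry?.lastModified) headers['If-Modified-Since'] = entry.lastModified;
  return headers;
}
```

A first fetch (no cached entry) sends no conditional headers; subsequent fetches send whichever validators the server previously provided.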


Installation

npm install skill-web-scraper
# Optional: for JS rendering
npm install playwright
npx playwright install chromium

Quick Start

import { WebScraper, articleExtractor } from 'skill-web-scraper';

const scraper = new WebScraper();

const result = await scraper.extract({
  url: 'https://example.com/article',
  schema: articleExtractor(),
});

console.log(result.data.title);    // "Article Title"
console.log(result.data.body);     // "Full article text..."
console.log(result.fromCache);     // false (first fetch)

OpenClaw Integration

// In your OpenClaw skill handler
import { WebScraper, articleExtractor, productExtractor } from 'skill-web-scraper';

export async function handleScrapeRequest(input: { url: string; type: string }) {
  const scraper = new WebScraper({
    antiDetection: { minDelayMs: 1000, maxDelayMs: 3000 },
    llmComplete: async (prompt) => {
      // Hook into your LLM provider
      return await callYourLLM(prompt);
    },
  });

  const schema = input.type === 'product' ? productExtractor() : articleExtractor();
  const result = await scraper.extract({ url: input.url, schema });
  return result.data;
}

API Reference

WebScraper

Main class. Instantiate once and reuse across requests.

const scraper = new WebScraper(config?: WebScraperConfig);

Config options:

| Option | Type | Default | Description |
|---|---|---|---|
| antiDetection | AntiDetectionConfig | {} | Rate limiting, user agents, robots.txt |
| cache | CacheStore | MemoryCacheStore | Custom cache backend |
| cacheEnabled | boolean | true | Enable/disable caching |
| defaultRenderMode | 'static' \| 'playwright' | 'static' | Default render mode |
| llmComplete | (prompt: string) => Promise<string> | undefined | LLM fallback for extraction |
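The CacheStore contract itself is not documented in the captured sources. A plausible minimal in-memory backend, assuming an async get/set interface keyed by URL (which the default MemoryCacheStore likely resembles), might look like this:

```typescript
// Hypothetical sketch: the real CacheStore interface is not published.
// Assumes async get/set keyed by URL.
interface CacheRecord {
  body: string;
  etag?: string;
}

class SimpleMemoryCache {
  private store = new Map<string, CacheRecord>();

  async get(url: string): Promise<CacheRecord | undefined> {
    return this.store.get(url);
  }

  async set(url: string, record: CacheRecord): Promise<void> {
    this.store.set(url, record);
  }
}
```

A persistent backend (Redis, SQLite) would implement the same two methods against external storage.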

scraper.extract(options)

const result: ExtractResult<T> = await scraper.extract({
  url: string,
  schema: ExtractionSchema<T>,
  renderMode?: 'static' | 'playwright',
  requestOptions?: RequestOptions,
  outputFormat?: 'json' | 'csv' | 'markdown' | 'text',
});

scraper.crawl(options)

const result: CrawlResult<T> = await scraper.crawl({
  startUrls: string | string[],
  schema: ExtractionSchema<T>,
  nextPageSelector?: string,   // CSS selector for next-page link
  maxPages?: number,           // default: 50
  maxDepth?: number,           // default: 3
  sameDomainOnly?: boolean,    // default: true
  renderMode?: 'static' | 'playwright',
  onPage?: (result, pageIndex) => void,
});
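The depth-limited BFS semantics that crawl() documents (maxPages, maxDepth) can be sketched against an in-memory link graph; here the graph stands in for pages discovered via nextPageSelector, and all names are illustrative rather than the library's internals.

```typescript
// Illustrative BFS matching the documented crawl limits: stop after maxPages
// visits, and do not expand links from pages at depth >= maxDepth.
function bfsCrawl(
  start: string,
  links: Record<string, string[]>,
  maxPages = 50,
  maxDepth = 3,
): string[] {
  const visited: string[] = [];
  const seen = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [{ url: start, depth: 0 }];
  while (queue.length > 0 && visited.length < maxPages) {
    const { url, depth } = queue.shift()!;
    visited.push(url);
    if (depth >= maxDepth) continue; // depth limit: visit but don't expand
    for (const next of links[url] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push({ url: next, depth: depth + 1 });
      }
    }
  }
  return visited;
}
```

The real crawler also applies sameDomainOnly filtering and invokes onPage per visit, which this sketch omits.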

scraper.batch(options)

const result: BatchResult<T> = await scraper.batch({
  urls: string[],
  schema: ExtractionSchema<T>,
  concurrency?: number,        // default: 3
  renderMode?: 'static' | 'playwright',
});
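How batch() uses its concurrency setting is not documented; one simple reading is that URLs are processed in waves of `concurrency` items. The wave partitioning can be sketched as a plain chunking helper (a real pool that refills slots as requests finish is more likely, but harder to show briefly):

```typescript
// Simplified assumption: partition the URL list into concurrency-sized waves.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```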

Built-in Extractors

articleExtractor()

Extracts: title, author, publishedAt, body, summary, tags[], imageUrl, canonicalUrl

productExtractor()

Extracts: name, price, currency, description, sku, availability, imageUrl, rating, reviewCount

extractLinks(html, baseUrl?)

Returns: LinkData[]{ href, text, title?, rel? }

extractContacts(html)

Returns: ContactData{ emails[], phones[], addresses[], socialLinks{} }

extractTables(html)

Returns: TableData[]{ headers[], rows[][] }
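The "resolved" behavior that extractLinks advertises — relative hrefs turned absolute against the page URL — can be illustrated with the WHATWG URL API. The function name is hypothetical; the library's actual resolution logic is not published.

```typescript
// Sketch of href resolution: relative links resolve against the base page,
// absolute links pass through unchanged.
function resolveHrefs(hrefs: string[], baseUrl: string): string[] {
  return hrefs.map((href) => new URL(href, baseUrl).toString());
}
```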


Custom Schema

import { WebScraper } from 'skill-web-scraper';
import { z } from 'zod';

const scraper = new WebScraper();

const result = await scraper.extract({
  url: 'https://site.com/page',
  schema: {
    selectors: {
      title: 'h1',
      price: { selector: '.price', transform: (v) => parseFloat(v.replace('$', '')) },
      images: { selector: 'img', attr: 'src', multiple: true },
    },
    schema: z.object({
      title: z.string(),
      price: z.number(),
      images: z.array(z.string()),
    }),
    llmPrompt: 'Extract title, price (as number), and all image URLs from this page.',
  },
});
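The transform in the schema above runs on the selected text before Zod validation; shown standalone, its effect is easy to check:

```typescript
// The price transform from the schema above: strip the currency symbol and
// parse the remainder as a number, so z.number() can validate it.
const priceTransform = (v: string): number => parseFloat(v.replace('$', ''));
```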

Anti-Detection Guide

The Jnana engine uses a layered anti-detection approach:

  1. User-Agent rotation — 15-agent pool covering Chrome, Firefox, Safari, Edge, mobile browsers
  2. Stealth headers — Sec-Fetch-*, Accept, Accept-Language, Accept-Encoding exactly mirroring browser behaviour
  3. Randomized delays — Per-domain jitter between configurable min/max bounds
  4. robots.txt compliance — Fetched and cached per-domain; configurable (respectRobotsTxt: false to disable)
  5. Conditional requests — ETag/Last-Modified; sends 304-aware headers to avoid re-fetching unchanged pages
  6. Playwright stealth — When using JS rendering, headers are passed directly into the browser context
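Point 3 above (randomized delays) amounts to drawing a uniform wait within the configured bounds per domain. A minimal sketch, assuming uniform jitter — the engine's actual distribution is not documented:

```typescript
// Illustrative per-domain jitter: uniform random delay in [minDelayMs, maxDelayMs).
function jitterDelayMs(minDelayMs: number, maxDelayMs: number): number {
  return minDelayMs + Math.random() * (maxDelayMs - minDelayMs);
}
```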

Configuration

const scraper = new WebScraper({
  antiDetection: {
    rotateUserAgents: true,    // Enable UA rotation (default: true)
    minDelayMs: 1000,          // Min delay per domain (default: 1000ms)
    maxDelayMs: 3000,          // Max delay per domain (default: 3000ms)
    respectRobotsTxt: true,    // Respect robots.txt (default: true)
    extraHeaders: {            // Additional custom headers
      'X-Forwarded-For': '203.0.113.0',
    },
  },
});

Environment

  • Node.js: ≥ 18.0.0 (uses native Fetch API)
  • TypeScript: 5.x (strict mode)
  • ESM only
  • Optional: Playwright 1.x for JS rendering

Ethical Use Notice

This skill is built for legitimate data extraction use cases: research, monitoring, archiving, and integration. Always:

  • Respect robots.txt (enabled by default)
  • Keep rate limits reasonable (1–3 second delays by default)
  • Check a site's Terms of Service before scraping
  • Never use for unauthorized access, PII harvesting, or harmful purposes

skill-web-scraper — By Darshj.me | MIT License
Jnana engine: Extract anything. Understand everything.

Contract & API

Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.

Missing · GITHUB OPENCLAW

Contract coverage

Status

missing

Auth

None

Streaming

No

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.
Invocation examples
curl -s "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/snapshot"
curl -s "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract"
curl -s "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust"

Reliability & Benchmarks

Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.

Missing · runtime-metrics

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.
No benchmark suites or observed failure patterns are available.

Media & Demo

Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.

Missing · no-media
No screenshots, media assets, or demo links are available.

Related Agents

Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.

Self-declared · protocol-neighbors
GITHUB_REPOS · activepieces

Rank

70

AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents

Traction

No public download signal

Freshness

Updated 2d ago

OPENCLAW
GITHUB_REPOS · cherry-studio

Rank

70

AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs

Traction

No public download signal

Freshness

Updated 5d ago

MCP · OPENCLAW
GITHUB_REPOS · AionUi

Rank

70

Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!

Traction

No public download signal

Freshness

Updated 6d ago

MCP · OPENCLAW
GITHUB_REPOS · CopilotKit

Rank

70

The Frontend for Agents & Generative UI. React + Angular

Traction

No public download signal

Freshness

Updated 23d ago

OPENCLAW
Machine Appendix

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "GITHUB_OPENCLEW",
      "generatedAt": "2026-04-16T23:31:33.592Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
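The retryPolicy in the invocation guide above can be applied with a small helper. This is a sketch against the published numbers; the directory does not ship a client, so the function name and signature are hypothetical.

```typescript
// Applies the documented retryPolicy: up to maxAttempts tries, waiting
// backoffMs[attempt] between retryable failures.
const RETRYABLE = new Set(['HTTP_429', 'HTTP_503', 'NETWORK_TIMEOUT']);
const BACKOFF_MS = [500, 1500, 3500];

// Returns the wait (ms) before the next attempt, or null to give up.
function nextDelayMs(condition: string, attempt: number, maxAttempts = 3): number | null {
  if (!RETRYABLE.has(condition)) return null;        // non-retryable failure
  if (attempt >= maxAttempts - 1) return null;       // attempts exhausted
  return BACKOFF_MS[attempt] ?? BACKOFF_MS[BACKOFF_MS.length - 1];
}
```

A caller would loop: attempt the request, and on failure either wait the returned delay or surface the error when null.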

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}

Facts JSON

[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Darshjme Codes",
    "href": "https://github.com/darshjme-codes/skill-web-scraper",
    "sourceUrl": "https://github.com/darshjme-codes/skill-web-scraper",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-02-25T01:47:20.564Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-02-25T01:47:20.564Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]
