Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Crawler Summary
SKILL: skill-web-scraper **Extract anything. Understand everything.** *Powered by Jnana (Sanskrit: ज्ञान — knowledge/wisdom)* By Darshj.me --- Overview skill-web-scraper is a production-grade intelligent web scraping skill for OpenClaw. It exposes the **Jnana extraction engine** — a pipeline that combines CSS selectors, Zod schema validation, and optional LLM post-processing to extract typed, structured data from any web page. Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.
Freshness
Last checked 2/25/2026
Best For
skill-web-scraper is best for general automation workflows where OpenClaw compatibility matters.
Not Ideal For
Workflows that require deterministic execution: contract metadata is missing or unavailable.
Evidence Sources Checked
editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack
Public facts
4
Change events
1
Artifacts
0
Freshness
Feb 25, 2026
Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Feb 25, 2026
Vendor
Darshjme Codes
Artifacts
0
Benchmarks
0
Last release
Unpublished
Key links, install path, and a quick operational read before the deeper crawl record.
Summary
Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.
Setup snapshot
git clone https://github.com/darshjme-codes/skill-web-scraper.git
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Darshjme Codes
Protocol compatibility
OpenClaw
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
6
Snippets
0
Languages
typescript
Parameters
bash
npm install skill-web-scraper
# Optional: for JS rendering
npm install playwright
npx playwright install chromium
typescript
import { WebScraper, articleExtractor } from 'skill-web-scraper';
const scraper = new WebScraper();
const result = await scraper.extract({
url: 'https://example.com/article',
schema: articleExtractor(),
});
console.log(result.data.title); // "Article Title"
console.log(result.data.body); // "Full article text..."
console.log(result.fromCache); // false (first fetch)
typescript
// In your OpenClaw skill handler
import { WebScraper, articleExtractor, productExtractor } from 'skill-web-scraper';
export async function handleScrapeRequest(input: { url: string; type: string }) {
const scraper = new WebScraper({
antiDetection: { minDelayMs: 1000, maxDelayMs: 3000 },
llmComplete: async (prompt) => {
// Hook into your LLM provider
return await callYourLLM(prompt);
},
});
const schema = input.type === 'product' ? productExtractor() : articleExtractor();
const result = await scraper.extract({ url: input.url, schema });
return result.data;
}
typescript
const scraper = new WebScraper(config?: WebScraperConfig);
typescript
const result: ExtractResult<T> = await scraper.extract({
url: string,
schema: ExtractionSchema<T>,
renderMode?: 'static' | 'playwright',
requestOptions?: RequestOptions,
outputFormat?: 'json' | 'csv' | 'markdown' | 'text',
});
typescript
const result: CrawlResult<T> = await scraper.crawl({
startUrls: string | string[],
schema: ExtractionSchema<T>,
nextPageSelector?: string, // CSS selector for next-page link
maxPages?: number, // default: 50
maxDepth?: number, // default: 3
sameDomainOnly?: boolean, // default: true
renderMode?: 'static' | 'playwright',
onPage?: (result, pageIndex) => void,
});
Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB OPENCLEW
Editorial quality
ready
Extract anything. Understand everything.
Powered by Jnana (Sanskrit: ज्ञान — knowledge/wisdom)
By Darshj.me
skill-web-scraper is a production-grade intelligent web scraping skill for OpenClaw. It exposes the Jnana extraction engine — a pipeline that combines CSS selectors, Zod schema validation, and optional LLM post-processing to extract typed, structured data from any web page.
Tagline: Extract anything. Understand everything.
| Capability | Description |
|---|---|
| extract(url, schema) | Extract structured data from a single URL using CSS selectors + Zod validation |
| crawl(url, schema) | Follow pagination/next-page links, depth-limited BFS crawl |
| batch(urls, schema) | Concurrent extraction from multiple URLs with configurable parallelism |
| Article extractor | Title, author, date, body, tags, image — works on most blogs/news sites |
| Product extractor | Name, price, currency, SKU, availability, rating — broad e-commerce coverage |
| Table extractor | Parse HTML tables into typed {headers, rows} structures |
| Link extractor | All <a> links, resolved, filtered, with rel/title attributes |
| Contact extractor | Emails (regex), phones, addresses, social profile links |
| Anti-detection | Rotating user agents, stealth headers, per-domain rate limiting, robots.txt compliance |
| JS rendering | Optional Playwright integration for SPAs and JS-heavy sites |
| Output formats | JSON, CSV, Markdown, plain text |
| Caching | ETag + Last-Modified aware — skip unchanged pages |
| LLM fallback | When CSS selectors miss, pass page text to an LLM for extraction |
npm install skill-web-scraper
# Optional: for JS rendering
npm install playwright
npx playwright install chromium
import { WebScraper, articleExtractor } from 'skill-web-scraper';
const scraper = new WebScraper();
const result = await scraper.extract({
url: 'https://example.com/article',
schema: articleExtractor(),
});
console.log(result.data.title); // "Article Title"
console.log(result.data.body); // "Full article text..."
console.log(result.fromCache); // false (first fetch)
// In your OpenClaw skill handler
import { WebScraper, articleExtractor, productExtractor } from 'skill-web-scraper';
export async function handleScrapeRequest(input: { url: string; type: string }) {
const scraper = new WebScraper({
antiDetection: { minDelayMs: 1000, maxDelayMs: 3000 },
llmComplete: async (prompt) => {
// Hook into your LLM provider
return await callYourLLM(prompt);
},
});
const schema = input.type === 'product' ? productExtractor() : articleExtractor();
const result = await scraper.extract({ url: input.url, schema });
return result.data;
}
WebScraper
Main class. Instantiate once and reuse across requests.
const scraper = new WebScraper(config?: WebScraperConfig);
Config options:
| Option | Type | Default | Description |
|---|---|---|---|
| antiDetection | AntiDetectionConfig | {} | Rate limiting, user agents, robots.txt |
| cache | CacheStore | MemoryCacheStore | Custom cache backend |
| cacheEnabled | boolean | true | Enable/disable caching |
| defaultRenderMode | 'static' \| 'playwright' | 'static' | Default render mode |
| llmComplete | (prompt: string) => Promise<string> | undefined | LLM fallback for extraction |
scraper.extract(options)
const result: ExtractResult<T> = await scraper.extract({
url: string,
schema: ExtractionSchema<T>,
renderMode?: 'static' | 'playwright',
requestOptions?: RequestOptions,
outputFormat?: 'json' | 'csv' | 'markdown' | 'text',
});
scraper.crawl(options)
const result: CrawlResult<T> = await scraper.crawl({
startUrls: string | string[],
schema: ExtractionSchema<T>,
nextPageSelector?: string, // CSS selector for next-page link
maxPages?: number, // default: 50
maxDepth?: number, // default: 3
sameDomainOnly?: boolean, // default: true
renderMode?: 'static' | 'playwright',
onPage?: (result, pageIndex) => void,
});
scraper.batch(options)
const result: BatchResult<T> = await scraper.batch({
urls: string[],
schema: ExtractionSchema<T>,
concurrency?: number, // default: 3
renderMode?: 'static' | 'playwright',
});
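Pulling the batch signature above together, a minimal usage sketch (the URLs are placeholder fixtures, and the shape of `BatchResult` beyond what is documented here is not specified):

```typescript
import { WebScraper, articleExtractor } from 'skill-web-scraper';

// Hypothetical sketch: scrape several article URLs concurrently.
const scraper = new WebScraper();

const batchResult = await scraper.batch({
  urls: [
    'https://example.com/post-1',
    'https://example.com/post-2',
    'https://example.com/post-3',
  ],
  schema: articleExtractor(),
  concurrency: 2, // lower than the documented default of 3
});
```

Lowering `concurrency` below the default trades throughput for gentler load on the target domain, which pairs with the anti-detection rate limiting described later.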
articleExtractor()
Extracts: title, author, publishedAt, body, summary, tags[], imageUrl, canonicalUrl
productExtractor()
Extracts: name, price, currency, description, sku, availability, imageUrl, rating, reviewCount
extractLinks(html, baseUrl?)
Returns: LinkData[] — { href, text, title?, rel? }
extractContacts(html)
Returns: ContactData — { emails[], phones[], addresses[], socialLinks{} }
extractTables(html)
Returns: TableData[] — { headers[], rows[][] }
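A sketch of the HTML-level helpers above, assuming the listed signatures; the HTML string is a made-up fixture, not taken from the project:

```typescript
import { extractLinks, extractTables } from 'skill-web-scraper';

// Illustrative fixture; real usage would pass fetched page HTML.
const html = `
  <a href="/docs" title="Docs">Documentation</a>
  <table>
    <tr><th>Name</th><th>Price</th></tr>
    <tr><td>Widget</td><td>$9.99</td></tr>
  </table>
`;

// Per the signature, a baseUrl resolves relative hrefs.
const links = extractLinks(html, 'https://example.com');
// LinkData[], e.g. { href: 'https://example.com/docs', text: 'Documentation', title: 'Docs' }

const tables = extractTables(html);
// TableData[], e.g. { headers: ['Name', 'Price'], rows: [['Widget', '$9.99']] }
```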
import { WebScraper } from 'skill-web-scraper';
import { z } from 'zod';
const scraper = new WebScraper();
const result = await scraper.extract({
url: 'https://site.com/page',
schema: {
selectors: {
title: 'h1',
price: { selector: '.price', transform: (v) => parseFloat(v.replace('$', '')) },
images: { selector: 'img', attr: 'src', multiple: true },
},
schema: z.object({
title: z.string(),
price: z.number(),
images: z.array(z.string()),
}),
llmPrompt: 'Extract title, price (as number), and all image URLs from this page.',
},
});
The Jnana engine uses a layered anti-detection approach:
const scraper = new WebScraper({
antiDetection: {
rotateUserAgents: true, // Enable UA rotation (default: true)
minDelayMs: 1000, // Min delay per domain (default: 1000ms)
maxDelayMs: 3000, // Max delay per domain (default: 3000ms)
respectRobotsTxt: true, // Respect robots.txt (default: true)
extraHeaders: { // Additional custom headers
'X-Forwarded-For': '203.0.113.0',
},
},
});
This skill is built for legitimate data extraction use cases: research, monitoring, archiving, and integration. Always:
Respect robots.txt (enabled by default)
skill-web-scraper — By Darshj.me | MIT License
Jnana engine: Extract anything. Understand everything.
Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
Status
missing
Auth
None
Streaming
No
Data region
Unspecified
Protocol support
Requires: none
Forbidden: none
Guardrails
Operational confidence: low
curl -s "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/snapshot"
curl -s "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract"
curl -s "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust"
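The invocation guide further down specifies a retry policy: 3 attempts, backoff of 500/1500/3500 ms, retrying on HTTP 429, HTTP 503, and network timeouts. A self-contained TypeScript sketch of that policy; `doFetch` is a placeholder for whatever HTTP client a caller already uses, not part of any published API:

```typescript
// Retry helper matching the documented policy:
// maxAttempts: 3, backoffMs: [500, 1500, 3500],
// retryable: HTTP_429, HTTP_503, NETWORK_TIMEOUT.
const BACKOFF_MS = [500, 1500, 3500];
const RETRYABLE_STATUS = new Set([429, 503]);

async function fetchWithRetry(
  url: string,
  doFetch: (url: string) => Promise<{ status: number; body: string }>,
): Promise<{ status: number; body: string }> {
  let lastError: unknown;
  for (let attempt = 0; attempt < 3; attempt++) {
    try {
      const res = await doFetch(url);
      // Non-retryable statuses (including success) are returned as-is.
      if (!RETRYABLE_STATUS.has(res.status)) return res;
      lastError = new Error(`HTTP_${res.status}`);
    } catch (err) {
      lastError = err; // thrown errors stand in for network timeouts
    }
    // Back off before every attempt except the last.
    if (attempt < 2) await new Promise((r) => setTimeout(r, BACKOFF_MS[attempt]));
  }
  throw lastError;
}
```

Note the backoff list has one more entry than the sketch ever uses (the third delay would only apply before a fourth attempt); that mirrors the listing's JSON as published.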
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
Handshake
UNKNOWN
Confidence
unknown
Attempts 30d
unknown
Fallback rate
unknown
Runtime metrics
Observed P50
unknown
Observed P95
unknown
Rate limit
unknown
Estimated cost
unknown
Do not use if
Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.
Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.
Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Rank
70
AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs
Traction
No public download signal
Freshness
Updated 5d ago
Rank
70
Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!
Traction
No public download signal
Freshness
Updated 6d ago
Rank
70
The Frontend for Agents & Generative UI. React + Angular
Traction
No public download signal
Freshness
Updated 23d ago
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": [
"OPENCLEW"
]
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "GITHUB_OPENCLEW",
"generatedAt": "2026-04-16T23:31:33.592Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [
{
"key": "OPENCLEW",
"type": "protocol",
"support": "unknown",
"confidenceSource": "profile",
"notes": "Listed on profile"
}
],
"flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}
Facts JSON
[
{
"factKey": "docs_crawl",
"category": "integration",
"label": "Crawlable docs",
"value": "6 indexed pages on the official domain",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
},
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Darshjme Codes",
"href": "https://github.com/darshjme-codes/skill-web-scraper",
"sourceUrl": "https://github.com/darshjme-codes/skill-web-scraper",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-02-25T01:47:20.564Z",
"isPublic": true
},
{
"factKey": "protocols",
"category": "compatibility",
"label": "Protocol compatibility",
"value": "OpenClaw",
"href": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract",
"sourceUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/contract",
"sourceType": "contract",
"confidence": "medium",
"observedAt": "2026-02-25T01:47:20.564Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/darshjme-codes-skill-web-scraper/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "docs_update",
"title": "Docs refreshed: Sign in to GitHub · GitHub",
"description": "Fresh crawlable documentation was indexed for the official domain.",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
}
]