Crawler Summary

skill-testing: answer-first brief

Test and evaluate skill projects during development. Use when the user wants to test, evaluate, or assess a skill that is currently being developed in the workspace — i.e., the skill project in the current directory or a specified skill folder. This skill generates test cases, executes them against the target skill, and produces an evaluation report with strengths, weaknesses, and optimization suggestions. Triggers on requests like "test this skill", "evaluate my skill", "run skill tests", "assess skill quality", or "check if my skill works".

Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Freshness

Last checked 2/25/2026

Best For

skill-testing is best for testing and evaluating skill projects under active development in workflows where OpenClaw compatibility matters.

Not Ideal For

Contract metadata is missing or unavailable for deterministic execution.

Evidence Sources Checked

editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack

Agent Dossier · GitHub · Safety: 89/100

skill-testing

Test and evaluate skill projects during development. Use when the user wants to test, evaluate, or assess a skill that is currently being developed in the workspace — i.e., the skill project in the current directory or a specified skill folder. This skill generates test cases, executes them against the target skill, and produces an evaluation report with strengths, weaknesses, and optimization suggestions. Triggers on requests like "test this skill", "evaluate my skill", "run skill tests", "assess skill quality", or "check if my skill works".

OpenClaw (self-declared)

Public facts

4

Change events

1

Artifacts

0

Freshness

Feb 25, 2026

Verified · editorial-content · No verified compatibility signals

Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Trust evidence available

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Feb 25, 2026

Vendor

Alen Hh

Artifacts

0

Benchmarks

0

Last release

Unpublished

Executive Summary

Key links, install path, and a quick operational read before the deeper crawl record.

Verified · editorial-content

Summary

Capability contract not published. No trust telemetry is available yet. Last updated 2/25/2026.

Setup snapshot

git clone https://github.com/alen-hh/skill-testing.git
  1. Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.

  2. Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence Ledger

Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.

Verified · editorial-content
Vendor (1)

Vendor

Alen Hh

profile · medium confidence · Observed Feb 25, 2026
Compatibility (1)

Protocol compatibility

OpenClaw

contract · medium confidence · Observed Feb 25, 2026
Security (1)

Handshake status

UNKNOWN

trust · medium confidence · Observed: unknown
Integration (1)

Crawlable docs

6 indexed pages on the official domain

search_document · medium confidence · Observed Apr 15, 2026

Release & Crawl Timeline

Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.

Self-declared · agent-index

Artifacts Archive

Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.

Self-declared · GITHUB OPENCLEW

Extracted files

0

Examples

2

Snippets

0

Languages

typescript

Parameters

Executable Examples

text

test-report/
├── TEST-REPORT.md          # Main evaluation report
├── test-case-1.md          # Test case 1 definition + execution result
├── test-case-2.md          # Test case 2 definition + execution result
└── test-case-3.md          # Test case 3 definition + execution result (if applicable)

markdown

# Test Case N: [Name]

## Definition
- **Scenario**: [description]
- **Input**: [user message]
- **Expected Behavior**: [what should happen]

## Execution Trace
[Which files were read, scripts run, decisions made — step by step]

## Output
[The actual output the skill produced]

Docs & README

Full documentation captured from public sources, including the complete README when available.

Self-declared · GITHUB OPENCLEW

Docs source

GITHUB OPENCLEW

Editorial quality

ready

Test and evaluate skill projects during development. Use when the user wants to test, evaluate, or assess a skill that is currently being developed in the workspace — i.e., the skill project in the current directory or a specified skill folder. This skill generates test cases, executes them against the target skill, and produces an evaluation report with strengths, weaknesses, and optimization suggestions. Triggers on requests like "test this skill", "evaluate my skill", "run skill tests", "assess skill quality", or "check if my skill works".

Full README

name: skill-testing
description: Test and evaluate skill projects during development. Use when the user wants to test, evaluate, or assess a skill that is currently being developed in the workspace — i.e., the skill project in the current directory or a specified skill folder. This skill generates test cases, executes them against the target skill, and produces an evaluation report with strengths, weaknesses, and optimization suggestions. Triggers on requests like "test this skill", "evaluate my skill", "run skill tests", "assess skill quality", or "check if my skill works".

Skills Testing

Evaluate a skill project's effectiveness by generating test cases, executing them, and producing an expert assessment report.

Workflow

Testing a skill project involves these steps:

  1. Discover the target skill project
  2. Deep-read and understand the skill
  3. Generate test cases
  4. Execute each test case against the skill
  5. Evaluate results and produce report

Step 1: Discover the Target Skill Project

Locate the skill being tested. The target is the current project (workspace), not a skill already in the skill list.

  • Look for SKILL.md in the workspace root or common skill locations (.cursor/skills/*/, .agents/skills/*/, skills/*/).
  • If multiple skill projects exist, ask the user which one to test.
  • If no SKILL.md is found, inform the user and stop.

Critical: The skill under test is the one being developed in the workspace, not one from the installed skill list.
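A minimal discovery sketch of this step, assuming a Node.js environment with only the standard fs/path modules (the candidate locations mirror the list above; the real skill would normally use the agent's own file tools rather than Node):

import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Locate SKILL.md in the workspace root or the common skill folders listed above.
function findSkillManifests(workspaceRoot: string): string[] {
  const hits: string[] = [];
  const rootManifest = join(workspaceRoot, "SKILL.md");
  if (existsSync(rootManifest)) hits.push(rootManifest);
  for (const base of [".cursor/skills", ".agents/skills", "skills"]) {
    const dir = join(workspaceRoot, base);
    if (!existsSync(dir)) continue;
    for (const entry of readdirSync(dir)) {
      const manifest = join(dir, entry, "SKILL.md");
      if (existsSync(manifest)) hits.push(manifest);
    }
  }
  return hits; // 0 hits => stop; 1 => test it; >1 => ask the user which one
}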

Step 2: Deep-Read and Understand the Skill

Read all key files of the target skill project:

  1. SKILL.md — Parse frontmatter (name, description) and body (instructions, workflow, examples).
  2. references/ — Read all reference docs to understand domain knowledge and supplementary guidance.
  3. scripts/ — Read all scripts to understand automation and tooling capabilities.
  4. assets/ — List asset files (do not read binary files) to understand templates and resources.

Build a mental model of:

  • What the skill does (purpose and scope)
  • When it triggers (description triggers)
  • How it works (workflow, instructions, decision points)
  • What resources it uses (scripts, references, assets)
  • What output it produces
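A sketch of the deep-read entry point, assuming the SKILL.md frontmatter is plain YAML delimited by ---; the line-scan field extraction is a simplification, not a full YAML parser:

import { readFileSync } from "node:fs";

// Split a SKILL.md into frontmatter and body, pulling out only `name` and `description`.
function readSkillManifest(path: string) {
  const raw = readFileSync(path, "utf8");
  const match = raw.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  const frontmatter = match ? match[1] : "";
  const body = match ? match[2] : raw;
  const field = (key: string) =>
    frontmatter.match(new RegExp(`^${key}:\\s*(.*)$`, "m"))?.[1]?.trim();
  return { name: field("name"), description: field("description"), body };
}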

Step 3: Generate Test Cases

Generate 2-3 test cases from different perspectives. Each test case must include:

  • Name: Short descriptive title
  • Scenario: What the user is trying to accomplish
  • Input: Concrete user message or request that should trigger the skill
  • Expected Behavior: What the skill should do (steps, resources used, output)
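The same structure expressed as a TypeScript type, for readers who prefer a schema view; field names follow the bullet list above, and the `kind` field is a convenience reflecting the selection strategy described below:

// One generated test case.
interface TestCase {
  name: string;              // short descriptive title
  scenario: string;          // what the user is trying to accomplish
  input: string;             // concrete user message that should trigger the skill
  expectedBehavior: string;  // steps, resources used, and output the skill should produce
  kind: "happy-path" | "edge-case" | "boundary";
}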

Test Case Selection Strategy

Choose test cases that cover different dimensions:

  1. Happy path — A straightforward use case that matches the skill's primary purpose. Tests whether core functionality works correctly.
  2. Edge case / variation — A less common but valid use case. Tests the skill's flexibility and handling of variations.
  3. Boundary / stress — A request at the boundary of the skill's scope, or one requiring multiple features together. Tests robustness and completeness.
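As a purely illustrative example, here is what the three dimensions might look like for a hypothetical PDF-summarizing skill (the skill, file names, and inputs are invented; the fields follow the test-case structure above):

// Hypothetical instances of the three dimensions for an imaginary "pdf-summary" skill.
const exampleCases = [
  { kind: "happy-path", name: "Summarize a short report",
    scenario: "User wants a one-page summary of a 10-page PDF",
    input: "Summarize report.pdf for me",
    expectedBehavior: "Reads the PDF, follows the summarization workflow, returns a markdown summary" },
  { kind: "edge-case", name: "Scanned (image-only) PDF",
    scenario: "The PDF has no extractable text layer",
    input: "Summarize scan.pdf",
    expectedBehavior: "Detects the missing text layer and explains the limitation instead of failing silently" },
  { kind: "boundary", name: "Batch of mixed documents",
    scenario: "Several PDFs plus a DOCX in one request",
    input: "Summarize everything in ./docs",
    expectedBehavior: "Handles the in-scope PDFs and states clearly that DOCX is out of scope" },
];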

Using Project Test Cases

Check for user-provided test case files:

  1. If the user specifies files to use as test cases, use those.
  2. Otherwise, look for a /test-case or /test-cases directory in the project root.
  3. If test case files are found, read them and incorporate their content into the generated test cases.
  4. If no test case files exist, generate all test cases from the skill understanding built in Step 2.
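A sketch of that precedence order, again assuming Node's fs module; `userSpecified` stands in for file paths the user passed explicitly:

import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Resolve test-case sources: user-specified files first, then /test-case or /test-cases,
// otherwise an empty list (generate cases from the Step 2 understanding).
function resolveTestCaseFiles(projectRoot: string, userSpecified: string[] = []): string[] {
  if (userSpecified.length > 0) return userSpecified;
  for (const dir of ["test-case", "test-cases"]) {
    const full = join(projectRoot, dir);
    if (existsSync(full)) {
      return readdirSync(full).map((f) => join(full, f));
    }
  }
  return [];
}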

Web Search for Test Cases (when applicable)

If the skill involves a specific domain, library, or technology where real-world context would improve test quality:

  • Use available web search tools (tavily, brave, linkup.so, exa, serpapi, etc.) to find realistic scenarios.
  • Use available URL fetch tools (fetch, tavily, jina reader, etc.) to retrieve relevant page content.
  • Only do this when it meaningfully improves test case realism.

Step 4: Execute Test Cases

For each test case, simulate the skill execution:

  1. Pretend to be a user sending the test case's input message.
  2. Follow the skill's instructions exactly as another Claude instance would — read the SKILL.md, follow the workflow, use referenced scripts/resources.
  3. Execute any scripts referenced in the skill using the shell. Capture output and errors.
  4. Produce the output the skill would generate for the user.
  5. Record the full execution trace: which files were read, which scripts were run, what decisions were made, and the final output.

Important: Execute the skill faithfully. Do not shortcut or skip steps. The goal is to see how the skill actually performs, not how it ideally should perform.
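For the script-execution part of the trace, a minimal sketch using Node's child_process; running referenced scripts through the agent's own shell tool is equally valid, this only illustrates capturing stdout and stderr for the execution trace:

import { execFileSync } from "node:child_process";

// Run one script referenced by the skill and capture its output for the trace.
// Network errors and timeouts surface through the catch branch as stderr text.
function runReferencedScript(command: string, args: string[] = []) {
  try {
    const stdout = execFileSync(command, args, { encoding: "utf8", timeout: 60_000 });
    return { ok: true, stdout, stderr: "" };
  } catch (err: any) {
    return { ok: false, stdout: err.stdout ?? "", stderr: err.stderr ?? String(err) };
  }
}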

Step 5: Evaluate and Report

Read references/evaluation-criteria.md for the evaluation rubric and report template.

Evaluate each test case execution against the 7 criteria:

  1. Triggering & Description Quality
  2. Instruction Clarity & Completeness
  3. Degree-of-Freedom Calibration
  4. Resource Organization
  5. Script Quality (if applicable)
  6. Output Quality
  7. Error Handling & Edge Cases
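A sketch of a per-test-case rating record keyed by those seven criteria; the 1-5 scale here is an assumption, and the authoritative rubric lives in references/evaluation-criteria.md:

// Criterion keys follow the numbered list above.
type Criterion =
  | "triggering"
  | "instructionClarity"
  | "degreeOfFreedom"
  | "resourceOrganization"
  | "scriptQuality"
  | "outputQuality"
  | "errorHandling";

interface Evaluation {
  testCase: string;
  ratings: Partial<Record<Criterion, 1 | 2 | 3 | 4 | 5>>; // scriptQuality omitted if not applicable
  notes: string;
}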

Output: /test-report Folder

Create a test-report/ folder in the project root with all test artifacts:

test-report/
├── TEST-REPORT.md          # Main evaluation report
├── test-case-1.md          # Test case 1 definition + execution result
├── test-case-2.md          # Test case 2 definition + execution result
└── test-case-3.md          # Test case 3 definition + execution result (if applicable)

Strict Output Rules

MANDATORY: The test-report/ folder and its .md files are the sole deliverables of this skill. You MUST follow these rules exactly:

  1. Only create the test-report/ folder — do NOT create any other folders (e.g., no output/, results/, reports/, logs/, tmp/, etc.).
  2. Only create .md files inside test-report/ — do NOT create any other file types (no .json, .html, .txt, .csv, .yaml, .log, or any other format).
  3. Only create the files listed above (TEST-REPORT.md and the test-case-N.md files). Do NOT create extra files like summary.md, index.md, raw-data.md, or anything beyond the specified structure.
  4. Do NOT modify any existing project files — this skill is read-only with respect to the skill project under test. The only writes allowed are creating the test-report/ folder and its .md files.
  5. Do NOT create scripts, configs, or helper files — no shell scripts, no temporary files, no intermediate artifacts. All analysis and evaluation must be written directly into the markdown reports.
  6. If the test-report/ folder already exists, overwrite its contents rather than creating a differently named folder.
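A small self-check, sketched in TypeScript, that enforces rules 1-3 above before finishing the run:

import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Verify the deliverable: only test-report/, only .md files, only the allowed names.
function checkDeliverables(projectRoot: string): string[] {
  const problems: string[] = [];
  const reportDir = join(projectRoot, "test-report");
  const allowed = /^(TEST-REPORT|test-case-\d+)\.md$/;
  for (const entry of readdirSync(reportDir)) {
    if (statSync(join(reportDir, entry)).isDirectory()) problems.push(`unexpected folder: ${entry}`);
    else if (!allowed.test(entry)) problems.push(`unexpected file: ${entry}`);
  }
  return problems; // empty array => the output obeys the strict rules
}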

Test case files (test-case-N.md)

Each test case file must contain:

# Test Case N: [Name]

## Definition
- **Scenario**: [description]
- **Input**: [user message]
- **Expected Behavior**: [what should happen]

## Execution Trace
[Which files were read, scripts run, decisions made — step by step]

## Output
[The actual output the skill produced]

Main report (TEST-REPORT.md)

Follow the template in references/evaluation-criteria.md. The report must include:

  • Per-test-case results with ratings (referencing the individual test case files)
  • Evaluation summary table
  • Strengths — what the skill does well
  • Weaknesses — where it falls short
  • Characteristics — notable traits or patterns observed
  • Optimization Suggestions — concrete, actionable improvements ranked by priority
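A skeleton of that layout, restating only the required sections listed above; the authoritative template and rating format live in references/evaluation-criteria.md:

# TEST-REPORT

## Test Case Results
[Per-test-case ratings, referencing test-case-1.md, test-case-2.md, ...]

## Evaluation Summary
[Summary table across the 7 criteria]

## Strengths
## Weaknesses
## Characteristics
## Optimization Suggestions
[Concrete, actionable improvements ranked by priority]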

References

Contract & API

Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.

Missing · GITHUB OPENCLEW

Contract coverage

Status

missing

Auth

None

Streaming

No

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.
Invocation examples
curl -s "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/snapshot"
curl -s "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/contract"
curl -s "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/trust"

Reliability & Benchmarks

Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.

Missing · runtime-metrics

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.
No benchmark suites or observed failure patterns are available.

Media & Demo

Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.

Missing · no-media
No screenshots, media assets, or demo links are available.

Related Agents

Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.

Self-declared · protocol-neighbors
GITHUB_REPOS · activepieces

Rank

70

AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents

Traction

No public download signal

Freshness

Updated 2d ago

OPENCLAW
GITHUB_REPOS · cherry-studio

Rank

70

AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs

Traction

No public download signal

Freshness

Updated 5d ago

MCP · OPENCLAW
GITHUB_REPOS · AionUi

Rank

70

Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!

Traction

No public download signal

Freshness

Updated 6d ago

MCP · OPENCLAW
GITHUB_REPOS · CopilotKit

Rank

70

The Frontend for Agents & Generative UI. React + Angular

Traction

No public download signal

Freshness

Updated 23d ago

OPENCLAW
Machine Appendix

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/alen-hh-skill-testing/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/alen-hh-skill-testing/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/alen-hh-skill-testing/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "GITHUB_OPENCLEW",
      "generatedAt": "2026-04-17T01:45:25.878Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}

Facts JSON

[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Alen Hh",
    "href": "https://github.com/alen-hh/skill-testing",
    "sourceUrl": "https://github.com/alen-hh/skill-testing",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-02-25T02:28:20.608Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-02-25T02:28:20.608Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/alen-hh-skill-testing/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]
