Agent Dossier · GITHUB OPENCLEW · Safety 80/100

Xpersona Agent

context-tuning

Systematic tuning loop for any AI system. Use when asked to: (1) tune/optimize prompts, tools, or agent behavior, (2) improve system performance iteratively, (3) set up evaluation criteria for a system, (4) run optimization experiments. Collaboratively defines objectives and scoring with the user, then iterates with git checkpointing.

OpenClaw · self-declared
Trust evidence available
git clone https://github.com/vinceyyy/context-tuning-skill.git

Overall rank

#23

Adoption

No public adoption signal

Trust

Unknown

Freshness

Last checked Apr 15, 2026

Best For

context-tuning is best for workflows where OpenClaw compatibility matters.

Not Ideal For

Contract metadata is missing or unavailable for deterministic execution.

Evidence Sources Checked

editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack

Overview

Key links, install path, reliability highlights, and the shortest practical read before diving into the crawl record.

Verified · editorial-content

Overview

Executive Summary

Systematic tuning loop for any AI system. Use when asked to: (1) tune/optimize prompts, tools, or agent behavior, (2) improve system performance iteratively, (3) set up evaluation criteria for a system, (4) run optimization experiments. Collaboratively defines objectives and scoring with the user, then iterates with git checkpointing. Capability contract not published. No trust telemetry is available yet. Last updated Apr 15, 2026.

No verified compatibility signals

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Apr 15, 2026

Vendor

Vinceyyy

Artifacts

0

Benchmarks

0

Last release

Unpublished

Install & run

Setup Snapshot

git clone https://github.com/vinceyyy/context-tuning-skill.git
  1. Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.

  2. Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence & Timeline

Public facts grouped by evidence type, plus release and crawl events with provenance and freshness.

Verified · editorial-content

Public facts

Evidence Ledger

Vendor (1)

Vendor

Vinceyyy

profile · medium confidence
Observed Apr 15, 2026 · Source link · Provenance
Compatibility (1)

Protocol compatibility

OpenClaw

contract · medium confidence
Observed Apr 15, 2026 · Source link · Provenance
Security (1)

Handshake status

UNKNOWN

trust · medium confidence
Observed unknown · Source link · Provenance
Integration (1)

Crawlable docs

6 indexed pages on the official domain

search_document · medium confidence
Observed Apr 15, 2026 · Source link · Provenance

Artifacts & Docs

Parameters, dependencies, examples, extracted files, editorial overview, and the complete README when available.

Self-declared · GITHUB OPENCLEW

Captured outputs

Artifacts Archive

Extracted files

0

Examples

6

Snippets

0

Languages

typescript

Parameters

Executable Examples

text

┌─────────────────────────────────────────────────────────────────┐
│  PHASE 1: OBJECTIVE DISCOVERY                                   │
│  Understand what user wants to optimize → Refine through dialog │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  PHASE 2: SCORING SYSTEM DESIGN                                 │
│  Propose dimensions & rubric → Refine with user feedback        │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  PHASE 3: BASELINE & VALIDATION                                 │
│  Run system once → Score with rubric → Validate with user       │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  PHASE 4: CODEBASE ANALYSIS                                     │
│  Map tunable components → Compare to best practices             │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  PHASE 5: ITERATION LOOP                                        │
│  Evaluate → Identify weakness → Apply ONE fix → Checkpoint      │
└─────────────────────────────────────────────────────────────────┘

text

"So if I understand correctly, you want to optimize [system] to:
- [Primary goal]
- [Secondary goal]
- While avoiding [failure mode]

Is that right? Anything to add or adjust?"

markdown

# Tuning Session

**Started**: {timestamp}
**System**: {description of what's being tuned}
**Status**: Defining objectives

## Objectives

### Primary Goal
{what success looks like}

### Secondary Goals
- {goal 2}
- {goal 3}

### Known Issues
- {current problem 1}
- {current problem 2}

## Scoring System
(to be defined)

## Iteration Log
(to be added)

text

Based on your objectives, I propose evaluating on these dimensions:

1. **[Dimension Name]** (weight: X%)
   - What it measures: [description]
   - Why it matters: [maps to objective X]
   
2. **[Dimension Name]** (weight: X%)
   - What it measures: [description]
   - Why it matters: [maps to objective Y]

Does this capture what matters? Should we add, remove, or adjust anything?

text

For **[Dimension]**, I'd score like this:

| Score | Criteria |
|-------|----------|
| 9-10  | [excellent - specific description] |
| 7-8   | [good - specific description] |
| 4-6   | [needs work - specific description] |
| 1-3   | [poor - specific description] |
| 0     | [failure - specific description] |

Does this match your intuition? Any criteria to adjust?

markdown

## Scoring System

**Threshold**: {N.N}
**Max Iterations**: {N}

### Dimensions

#### {Dimension 1} ({weight}%)
{description}

| Score | Criteria |
|-------|----------|
| 9-10  | ... |
| 7-8   | ... |
| 4-6   | ... |
| 1-3   | ... |

#### {Dimension 2} ({weight}%)
...

Editorial read

Docs & README

Docs source

GITHUB OPENCLEW

Editorial quality

ready

Systematic tuning loop for any AI system. Use when asked to: (1) tune/optimize prompts, tools, or agent behavior, (2) improve system performance iteratively, (3) set up evaluation criteria for a system, (4) run optimization experiments. Collaboratively defines objectives and scoring with the user, then iterates with git checkpointing.

Full README

---
name: context-tuning
description: >
  Systematic tuning loop for any AI system. Use when asked to: (1) tune/optimize
  prompts, tools, or agent behavior, (2) improve system performance iteratively,
  (3) set up evaluation criteria for a system, (4) run optimization experiments.
  Collaboratively defines objectives and scoring with the user, then iterates
  with git checkpointing.
---

Context Tuning Skill

Systematic, evaluation-driven optimization for AI systems. Collaboratively define what "good" means, then iteratively improve until you get there.

Process Overview

┌─────────────────────────────────────────────────────────────────┐
│  PHASE 1: OBJECTIVE DISCOVERY                                   │
│  Understand what user wants to optimize → Refine through dialog │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  PHASE 2: SCORING SYSTEM DESIGN                                 │
│  Propose dimensions & rubric → Refine with user feedback        │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  PHASE 3: BASELINE & VALIDATION                                 │
│  Run system once → Score with rubric → Validate with user       │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  PHASE 4: CODEBASE ANALYSIS                                     │
│  Map tunable components → Compare to best practices             │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  PHASE 5: ITERATION LOOP                                        │
│  Evaluate → Identify weakness → Apply ONE fix → Checkpoint      │
└─────────────────────────────────────────────────────────────────┘

Phase 1: Objective Discovery

Goal: Understand what the user wants to optimize.

1.1 Initial questions

Ask the user (one or two at a time, not all at once):

  1. "What system are you trying to improve?" (agent, prompt, pipeline, etc.)
  2. "What does success look like? What should it do well?"
  3. "What's currently not working or could be better?"
  4. "Do you have examples of good vs. bad outputs?"

1.2 Clarify and refine

Based on answers, reflect back understanding:

"So if I understand correctly, you want to optimize [system] to:
- [Primary goal]
- [Secondary goal]
- While avoiding [failure mode]

Is that right? Anything to add or adjust?"

1.3 Document objective

Once confirmed, create initial session notes at docs/tuning/{date}-session.md:

# Tuning Session

**Started**: {timestamp}
**System**: {description of what's being tuned}
**Status**: Defining objectives

## Objectives

### Primary Goal
{what success looks like}

### Secondary Goals
- {goal 2}
- {goal 3}

### Known Issues
- {current problem 1}
- {current problem 2}

## Scoring System
(to be defined)

## Iteration Log
(to be added)

Phase 2: Scoring System Design

Goal: Create a custom rubric tailored to the user's objectives.

2.1 Propose dimensions

Based on objectives, propose 2-4 evaluation dimensions. Each dimension should:

  • Map to a stated objective or known issue
  • Be observable in system output
  • Be scorable on a 0-10 scale

Example proposal format:

Based on your objectives, I propose evaluating on these dimensions:

1. **[Dimension Name]** (weight: X%)
   - What it measures: [description]
   - Why it matters: [maps to objective X]
   
2. **[Dimension Name]** (weight: X%)
   - What it measures: [description]
   - Why it matters: [maps to objective Y]

Does this capture what matters? Should we add, remove, or adjust anything?

See references/rubric-templates.md for common dimension patterns.

2.2 Define scoring criteria

For each dimension, propose specific scoring criteria:

For **[Dimension]**, I'd score like this:

| Score | Criteria |
|-------|----------|
| 9-10  | [excellent - specific description] |
| 7-8   | [good - specific description] |
| 4-6   | [needs work - specific description] |
| 1-3   | [poor - specific description] |
| 0     | [failure - specific description] |

Does this match your intuition? Any criteria to adjust?

2.3 Set threshold and weights

Confirm with user:

  • Pass threshold (default: 7.0 for "good enough", 8.0 for "high quality")
  • Dimension weights (should sum to 100%)
  • Max iterations (default: 5)
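
The weighted overall score implied by 2.3 can be sketched as a small helper. This is a minimal sketch; the dimension names, scores, and weights below are illustrative, not taken from any real session:

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted overall score. Weights are percentages and must sum to 100."""
    if abs(sum(weights.values()) - 100.0) > 1e-9:
        raise ValueError("dimension weights must sum to 100%")
    return sum(scores[dim] * weights[dim] / 100.0 for dim in weights)

# Illustrative dimensions (hypothetical names):
weights = {"accuracy": 60.0, "conciseness": 40.0}
scores = {"accuracy": 8.0, "conciseness": 6.0}
overall = overall_score(scores, weights)  # 8.0*0.6 + 6.0*0.4 = 7.2
```

Against the 7.0 default threshold, this illustrative baseline would pass only marginally, so the weakest dimension (conciseness at 6.0) would still be the iteration target.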

2.4 Document scoring system

Add to session notes:

## Scoring System

**Threshold**: {N.N}
**Max Iterations**: {N}

### Dimensions

#### {Dimension 1} ({weight}%)
{description}

| Score | Criteria |
|-------|----------|
| 9-10  | ... |
| 7-8   | ... |
| 4-6   | ... |
| 1-3   | ... |

#### {Dimension 2} ({weight}%)
...

Phase 3: Baseline & Validation

Goal: Verify the scoring system works and establish baseline.

3.1 Verify git state

Run git status --porcelain. If dirty, ask user to commit or stash first.
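
This check can be sketched as a pure helper over the `git status --porcelain` output, which prints one line per staged, unstaged, or untracked change and nothing when the worktree is clean. The helper name is an illustration, not part of the skill:

```python
def is_worktree_clean(porcelain_output: str) -> bool:
    """True when `git status --porcelain` printed nothing at all,
    i.e. no staged, unstaged, or untracked changes."""
    return porcelain_output.strip() == ""

# Obtaining the output (sketch, not executed here):
#   import subprocess
#   out = subprocess.run(["git", "status", "--porcelain"],
#                        capture_output=True, text=True, check=True).stdout
#   If is_worktree_clean(out) is False, ask the user to commit or stash first.
```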

3.2 Run system once

Execute the system with a representative input. Capture full output/trace.

3.3 Score with new rubric

Apply the scoring system. Show work:

**Baseline Evaluation**

Input: {what was tested}

**{Dimension 1}**: {score}/10
- Evidence: {specific observation}
- Reasoning: {why this score}

**{Dimension 2}**: {score}/10
- Evidence: {specific observation}
- Reasoning: {why this score}

**Overall**: {weighted score}

3.4 Validate with user

Ask for confirmation:

"Does this scoring feel right? 

- Does a {X}/10 on {Dimension 1} match your intuition?
- Is there anything the rubric missed or misjudged?
- Should we adjust the criteria before proceeding?"

If adjustments needed, return to Phase 2. Otherwise, proceed.

3.5 Commit baseline

git add docs/tuning/{date}-session.md
git commit -m "tune: begin session - baseline {overall_score}"

Phase 4: Codebase Analysis

Goal: Understand what can be tuned and identify opportunities.

4.1 Map tunable components

Explore the codebase to identify:

| Component Type | What to Look For |
|----------------|------------------|
| System prompts | Main instructions, role definitions |
| Tool definitions | Names, descriptions, parameters |
| Tool implementations | Return values, error handling |
| Orchestration | Agent loops, routing logic, handoffs |
| Context management | What's included, summarization, memory |

Document findings:

## Tunable Components

### Prompts
- `path/to/prompt.py`: Main system prompt (~200 lines)
- `path/to/agent.py`: Agent instructions

### Tools
- `tool_name`: {purpose} - description could be clearer
- `other_tool`: {purpose} - parameters ambiguous

### Orchestration
- Single agent / Multi-agent with {pattern}
- Loop exits when: {conditions}

4.2 Compare to best practices

See references/component-checklist.md for what good looks like.

Identify gaps:

## Improvement Opportunities

### High Priority (likely impact on failing dimensions)
- [ ] {Specific issue}: {maps to Dimension X}
- [ ] {Specific issue}: {maps to Dimension Y}

### Medium Priority
- [ ] {Issue}

### Low Priority / Nice to Have
- [ ] {Issue}

4.3 Propose iteration plan

Based on the baseline score and codebase analysis:

**Weakest dimension**: {dimension} at {score}
**Root cause hypothesis**: {what I think is causing it}
**Proposed first fix**: {specific change}

Does this plan make sense? Ready to start iterating?

Phase 5: Iteration Loop

Goal: Systematically improve until threshold met or plateau reached.

5.1 Evaluate

Run system 3x for stability. Score each dimension. Report:

**Iteration {N}**

| Dimension | Score | vs Threshold | Δ from Last |
|-----------|-------|--------------|-------------|
| {Dim 1}   | X.X   | {pass/fail}  | +/-X.X      |
| {Dim 2}   | X.X   | {pass/fail}  | +/-X.X      |
| **Overall** | X.X | {pass/fail}  | +/-X.X      |

5.2 Check convergence

| Condition | Criteria | Action |
|-----------|----------|--------|
| SUCCESS | All dimensions ≥ threshold | Go to Completion |
| PLATEAU | <0.3 improvement over 3 iterations | Go to Completion |
| MAX_ITER | Reached limit | Go to Completion |
| REGRESSION | Score dropped significantly | Revert and try different fix |
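
The convergence rules in 5.2 can be sketched as a single check over the history of overall scores. This simplifies the SUCCESS condition to the overall score (the rules check every dimension), and the 0.5 regression cutoff is an assumption the rules leave unspecified:

```python
def check_convergence(history: list[float], threshold: float, max_iters: int,
                      plateau_delta: float = 0.3, plateau_window: int = 3,
                      regression_delta: float = 0.5) -> str:
    """Map the overall-score history (history[0] is the baseline,
    history[-1] the latest iteration) to the next action."""
    latest = history[-1]
    if latest >= threshold:
        return "SUCCESS"      # threshold met -> go to Completion
    if len(history) > plateau_window and \
            latest - history[-1 - plateau_window] < plateau_delta:
        return "PLATEAU"      # <0.3 improvement over 3 iterations
    if len(history) - 1 >= max_iters:
        return "MAX_ITER"     # iteration limit reached
    if len(history) >= 2 and latest < history[-2] - regression_delta:
        return "REGRESSION"   # significant drop -> revert, try another fix
    return "CONTINUE"
```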

5.3 Identify target and pattern

Find lowest dimension below threshold. Analyze evidence for failure pattern.

See references/failure-patterns.md for pattern catalog.

5.4 Select and apply fix

ONE change per iteration to isolate effects.

See references/fix-techniques.md for technique selection.

Before applying, self-review:

  1. Does this directly address the observed failure?
  2. Could it break something currently passing?
  3. Is this the minimal change that could work?

5.5 Checkpoint

Update session notes with iteration entry:

### Iteration {N} - {timestamp}

**Scores**: {dim1}={X.X}, {dim2}={X.X}
**Target**: {dimension} (at {X.X})
**Pattern**: {what went wrong}
**Evidence**: {specific example}

**Change**:
- File: {path}
- Technique: {from fix-techniques}
```diff
- {old}
+ {new}
```

**Result**: {improved/no change/regression}

Commit:

```bash
git add -A
git commit -m "tune(iter-{N}): {description} [{dim}: {before}→{after}]"
```

5.6 Return to 5.1


Completion

Final summary

## Summary

**Status**: {success/plateau/max_iterations}
**Iterations**: {N}
**Improvement**: {baseline} → {final} (+{delta})

### Score Progression
| Iter | {Dim1} | {Dim2} | Overall |
|------|--------|--------|---------|
| 0    | X.X    | X.X    | X.X     |
| ...  | ...    | ...    | ...     |

### What Worked
- {technique}: {dimension} {before}→{after}

### What Didn't Work
- {technique}: {result}

### Recommendations
- {any remaining improvements to consider}

Final commit:

git commit -m "tune: complete - {status} [overall: {baseline}→{final}]"

Recovery

Regression

git revert HEAD --no-edit

Record: **Result**: REGRESSION - reverted. Try a different technique.

Resume (--continue)

Read session notes, find last iteration, resume from Phase 5.


Key Principles

  1. Collaborate on objectives - User defines what "good" means
  2. Validate the rubric - Test scoring before iterating
  3. ONE change per iteration - Isolate effects
  4. Evidence-based fixes - Only address observed failures
  5. Checkpoint everything - Git commit each iteration

API & Reliability

Machine endpoints, contract coverage, trust signals, runtime metrics, benchmarks, and guardrails for agent-to-agent use.

Missing · GITHUB OPENCLEW

Machine interfaces

Contract & API

Contract coverage

Status

missing

Auth

None

Streaming

No

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.
Invocation examples
curl -s "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/snapshot"
curl -s "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/contract"
curl -s "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/trust"

Operational fit

Reliability & Benchmarks

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.
No benchmark suites or observed failure patterns are available.

Machine Appendix

Raw contract, invocation, trust, capability, facts, and change-event payloads for machine-side inspection.

Missing · GITHUB OPENCLEW

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "GITHUB_OPENCLEW",
      "generatedAt": "2026-04-17T04:47:33.639Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
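
The declared `retryPolicy` can be honored with a small client-side wrapper. This is a sketch: the `{"ok": ..., "error": ...}` result shape mirrors the JSON templates above but is otherwise an assumption, and `sleep` is injectable so the backoff can be exercised without waiting:

```python
import time
from typing import Callable

# Retryable conditions from the declared retryPolicy.
RETRYABLE = {"HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"}

def call_with_retry(call: Callable[[], dict], max_attempts: int = 3,
                    backoff_ms: tuple = (500, 1500, 3500),
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry `call` per the declared policy: up to max_attempts tries,
    sleeping the matching backoff before each retry; anything outside
    the retryable conditions is returned to the caller immediately."""
    result: dict = {}
    for attempt in range(max_attempts):
        result = call()
        if result.get("ok"):
            return result
        if result.get("error") not in RETRYABLE or attempt == max_attempts - 1:
            return result
        sleep(backoff_ms[attempt] / 1000.0)
    return result
```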

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    },
    {
      "key": "be",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile capability:be|supported|profile"
}

Facts JSON

[
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Vinceyyy",
    "href": "https://github.com/vinceyyy/context-tuning-skill",
    "sourceUrl": "https://github.com/vinceyyy/context-tuning-skill",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:21:22.124Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:21:22.124Z",
    "isPublic": true
  },
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/vinceyyy-context-tuning-skill/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]
