Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Crawler Summary
A general-purpose methodology and toolkit for evaluating product features. Use it to (1) design an evaluation-criteria system for a new feature, (2) write scoring prompts that implement LLM-as-a-Judge, (3) analyze agreement between human and model scores, and (4) iteratively refine the evaluation criteria and prompts. It suits AI feature evaluation, conversation-quality assessment, and hardware/software product-experience evaluation. Use this skill when a user needs to design an evaluation system, create scoring criteria, analyze evaluation data, or optimize an evaluation workflow. Capability contract not published. No trust telemetry is available yet. 4 GitHub stars reported by the source. Last updated 4/14/2026.
Freshness
Last checked 4/14/2026
Best For
evaluation is best for general automation workflows where OpenClaw compatibility matters.
Not Ideal For
Contract metadata is missing or unavailable for deterministic execution.
Evidence Sources Checked
editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack
Public facts
5
Change events
1
Artifacts
0
Freshness
Apr 14, 2026
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Apr 14, 2026
Vendor
Fangmenglin918 Web
Artifacts
0
Benchmarks
0
Last release
Unpublished
Key links, install path, and a quick operational read before the deeper crawl record.
Setup snapshot
git clone https://github.com/fangmenglin918-web/evaluation-skill.git
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side effects.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Fangmenglin918 Web
Protocol compatibility
OpenClaw
Adoption signal
4 GitHub stars
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
2
Snippets
0
Languages
typescript
Parameters
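Define feature → Decompose dimensions → Set criteria → Human-scored baseline
    ↑                                                        ↓
Iterate ← Agreement comparison ← Model scoring ← Prompt optimization ← Criteria optimization
1. Role (a strict auditor)
2. Task background (evaluation scenario and goals)
3. Score-level definitions (detailed criteria for 0-4)
4. Scoring procedure (inverted-pyramid screening)
5. Input format description
6. Output format specification (JSON)
7. Reference examples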
Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB OPENCLEW
Editorial quality
ready
---
name: evaluation
description: A general-purpose methodology and toolkit for evaluating product features. Use it to (1) design an evaluation-criteria system for a new feature, (2) write scoring prompts that implement LLM-as-a-Judge, (3) analyze agreement between human and model scores, and (4) iteratively refine the evaluation criteria and prompts. It suits AI feature evaluation, conversation-quality assessment, and hardware/software product-experience evaluation. Use this skill when a user needs to design an evaluation system, create scoring criteria, analyze evaluation data, or optimize an evaluation workflow.
---
General Evaluation Methodology Skill
Core idea
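Evaluation = the means of turning the "uncertainty" in model/product output into an engineering-controllable measurement.
The evaluation system delivers three kinds of value:
1. From gut feeling to numbers: turn a vague "it feels better" into concrete metrics.
2. Stop playing whack-a-mole: use regression tests to make sure new features do not break existing capabilities.
3. Enable LLM-as-a-Judge: use a strong model to evaluate a weaker one, so iterations take minutes.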
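Methodology flow (PDCA cycle)
Define feature → Decompose dimensions → Set criteria → Human-scored baseline
    ↑                                                        ↓
Iterate ← Agreement comparison ← Model scoring ← Prompt optimization ← Criteria optimization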
Quick reference
| Task | Action |
|------|------|
| Design an evaluation system | See "Step 1-3" below; refer to references/dimension_design.md |
| Create a scoring prompt | See "Step 4"; use assets/prompt_template.md |
| Compute agreement metrics | Run scripts/calculate_consistency.py |
| Analyze scoring discrepancies | Run scripts/analyze_discrepancy.py |
| Detect red-line issues | Refer to assets/redline_template.md |
Pin down what the evaluation target fundamentally is:
Deliverable: a feature definition document (1-2 paragraphs).
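Break "is it good?" down into dimensions that can be evaluated independently. A hierarchical structure is recommended.
General dimension framework (choose according to product type):
| Level-1 dimension | When it applies | Example level-2 dimensions |
|---------|---------|-------------|
| Basic effectiveness (the baseline) | Required for all products | Safety, factual accuracy, relevance |
| Content quality (core value) | Information-output products | Depth, comprehensibility, information density |
| Interaction experience (user perception) | Interactive products | Fluency, response strategy, context management |
| Emotional connection (emotional value) | Companion/service products | Empathy, stance, persona consistency |
| Expression style (presentation) | Content-generation products | Naturalness, expressiveness, format compliance |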
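One way to hold this two-level framework in code is a small list of records. You can then pick the dimensions that apply to a given product from that list. This is a minimal sketch, not part of the skill. The `Dimension` class, the snake_case identifiers, the product-type labels, and `select_dimensions` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    applies_to: str  # product type that uses this level-1 dimension ("all" = mandatory)
    sub_dimensions: list[str] = field(default_factory=list)


# Level-1 / level-2 dimensions transcribed from the framework table above.
FRAMEWORK = [
    Dimension("basic_effectiveness", "all", ["safety", "factual_accuracy", "relevance"]),
    Dimension("content_quality", "information_output",
              ["depth", "comprehensibility", "information_density"]),
    Dimension("interaction_experience", "interactive",
              ["fluency", "response_strategy", "context_management"]),
    Dimension("emotional_connection", "companion_service",
              ["empathy", "stance", "persona_consistency"]),
    Dimension("expression_style", "content_generation",
              ["naturalness", "expressiveness", "format_compliance"]),
]


def select_dimensions(product_types: set[str]) -> list[Dimension]:
    """Keep the mandatory baseline dimension plus every dimension matching the product's types."""
    return [d for d in FRAMEWORK if d.applies_to == "all" or d.applies_to in product_types]
```

For example, a companion chatbot tagged `{"interactive", "companion_service"}` would get the baseline, interaction-experience, and emotional-connection dimensions. A content-only product would get the baseline plus whichever output-type dimensions match.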
Dimension design principles:
See references/dimension_design.md for the detailed guide.
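A 5-level scale (0-4 points) is recommended, as it balances discriminative power and ease of use:
| Score | Definition | Expected share |
|-----|------|---------|
| 0 | Red-line / fatal issue | <5% |
| 1 | No value / severe defect | <10% |
| 2 | Flawed / noticeable problems | ~15% |
| 3 | Acceptable / meets the standard | ~60% |
| 4 | Outstanding / exceptional | <10% |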
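The expected shares can double as a sanity check on a batch of scores. For example, they can reveal a rubric that puts almost everything at 3. This is a minimal sketch under two assumptions. First, "<x%" is read as a strict upper bound. Second, "~x%" is read as "within a hypothetical ±10 percentage points". `check_distribution`, `EXPECTED`, and `TOLERANCE` are illustrative names, not part of the skill.

```python
from collections import Counter

# Expected share per score level, transcribed from the table above.
# "<" = strictly below the target; "~" = about the target (see TOLERANCE).
EXPECTED = {0: ("<", 0.05), 1: ("<", 0.10), 2: ("~", 0.15), 3: ("~", 0.60), 4: ("<", 0.10)}
TOLERANCE = 0.10  # hypothetical: how far an "about" share may drift, in absolute terms


def check_distribution(scores: list[int]) -> dict[int, tuple[float, bool]]:
    """Return {level: (observed share, within expectation?)} for levels 0-4."""
    if not scores:
        raise ValueError("no scores to check")
    counts = Counter(scores)
    report = {}
    for level, (kind, target) in EXPECTED.items():
        share = counts[level] / len(scores)
        ok = share < target if kind == "<" else abs(share - target) <= TOLERANCE
        report[level] = (share, ok)
    return report
```

In a batch of 20 scores containing two 4s, level 4 reaches exactly 10%. It is flagged because the table asks for less than 10%. Whether that warrants tightening the level-4 criteria is a judgment call.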
Each score level should specify:
Key principles:
See assets/rubric_template.md for the detailed template.
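Principle 1: Give clear, specific instructions.
Principle 2: Give the model enough time to think.
1. Role (a strict auditor)
2. Task background (evaluation scenario and goals)
3. Score-level definitions (detailed criteria for 0-4)
4. Scoring procedure (inverted-pyramid screening)
5. Input format description
6. Output format specification (JSON)
7. Reference examples
See assets/prompt_template.md for the full template.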
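The seven-part structure can be assembled mechanically, so no section is forgotten. The judge's JSON reply can also be validated before its score is used. This is a sketch under stated assumptions:

- The section titles are paraphrases.
- The section bodies would come from assets/prompt_template.md.
- The `{"score": ..., "reason": ...}` reply shape is a hypothetical schema. The actual output specification lives in the template and is not reproduced here.

`build_judge_prompt` and `parse_judgement` are illustrative names.

```python
import json

# The seven sections listed above, in order (titles paraphrased).
SECTIONS = [
    "Role",                # a strict auditor
    "Task background",     # evaluation scenario and goals
    "Score levels (0-4)",  # detailed criteria per level
    "Scoring procedure",   # inverted-pyramid screening
    "Input format",
    "Output format (JSON)",
    "Reference examples",
]


def build_judge_prompt(bodies: dict[str, str]) -> str:
    """Concatenate the sections in the prescribed order; fail loudly if one is missing."""
    missing = [s for s in SECTIONS if s not in bodies]
    if missing:
        raise ValueError(f"missing prompt sections: {missing}")
    return "\n\n".join(f"## {s}\n{bodies[s]}" for s in SECTIONS)


def parse_judgement(raw: str) -> dict:
    """Parse the judge's reply; the {"score", "reason"} shape is a hypothetical schema."""
    data = json.loads(raw)
    score = data.get("score")
    if not isinstance(score, int) or not 0 <= score <= 4:
        raise ValueError(f"score missing or outside 0-4: {score!r}")
    return data
```

Rejecting out-of-range or missing scores keeps a malformed judge reply from silently skewing the agreement metrics computed in the next step.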
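| Metric | How it is computed | Target threshold |
|-----|---------|---------|
| Exact agreement rate | Share of items with identical scores | ≥70% |
| Adjacent agreement rate | Share of items whose scores differ by ≤1 | ≥90% |
| Cohen's Kappa | Agreement corrected for chance | ≥0.6 |
| MAE | Mean absolute error | ≤0.5 |
Compute these with scripts/calculate_consistency.py.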
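The four metrics are straightforward to compute from paired human and model scores. The sketch below is not scripts/calculate_consistency.py, whose code is not shown here. It is an independent, standard-library illustration using unweighted Cohen's kappa:

kappa = (p_o - p_e) / (1 - p_e)

Here p_o is the exact agreement rate and p_e is the chance agreement implied by each rater's score distribution. `agreement_metrics` and `passes` are hypothetical names. Returning kappa = 1 when both raters use one identical score throughout (where kappa is undefined) is a convention chosen here.

```python
from collections import Counter


def agreement_metrics(human: list[int], model: list[int]) -> dict[str, float]:
    """Exact/adjacent agreement, unweighted Cohen's kappa, and MAE for paired scores."""
    if not human or len(human) != len(model):
        raise ValueError("need two non-empty score lists of equal length")
    n = len(human)
    diffs = [abs(h - m) for h, m in zip(human, model)]
    exact = sum(d == 0 for d in diffs) / n
    adjacent = sum(d <= 1 for d in diffs) / n
    mae = sum(diffs) / n
    # Chance agreement: probability both raters pick the same level independently.
    h_counts, m_counts = Counter(human), Counter(model)
    p_e = sum(h_counts[k] * m_counts[k] for k in h_counts) / (n * n)
    kappa = 1.0 if p_e == 1 else (exact - p_e) / (1 - p_e)
    return {"exact": exact, "adjacent": adjacent, "kappa": kappa, "mae": mae}


# Target thresholds from the table above.
MIN_THRESHOLDS = {"exact": 0.70, "adjacent": 0.90, "kappa": 0.6}
MAX_MAE = 0.5


def passes(metrics: dict[str, float]) -> bool:
    """True when every metric meets its target threshold."""
    return all(metrics[k] >= v for k, v in MIN_THRESHOLDS.items()) and metrics["mae"] <= MAX_MAE
```

As a cross-check, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted kappa. It also accepts `weights="linear"` or `weights="quadratic"`, which may suit an ordinal 0-4 scale better.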
Look for the following patterns:
Analyze them with scripts/analyze_discrepancy.py.
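The specific patterns are not listed here. The following is therefore one possible starting point, not a description of scripts/analyze_discrepancy.py. It computes three things:

- a (human, model) confusion count, showing which levels get swapped;
- the model's mean signed bias, showing whether the judge is systematically lenient or harsh;
- the items where the two raters differ by two or more points, as candidates for manual review.

`discrepancy_report` and its return shape are hypothetical.

```python
from collections import Counter


def discrepancy_report(ids: list[str], human: list[int], model: list[int]) -> dict:
    """Summarize how a judge model's scores diverge from human scores (illustrative)."""
    if not ids or not (len(ids) == len(human) == len(model)):
        raise ValueError("need equal-length, non-empty id/score lists")
    triples = list(zip(ids, human, model))
    confusion = Counter((h, m) for _, h, m in triples)          # (human, model) -> count
    bias = sum(m - h for _, h, m in triples) / len(triples)     # > 0: model more lenient on average
    large_gaps = [i for i, h, m in triples if abs(h - m) >= 2]  # items worth a manual look
    return {"confusion": confusion, "bias": bias, "large_gaps": large_gaps}
```

Large-gap items are usually the most informative. Each one points either at an ambiguous rubric clause or at a prompt instruction the judge is not following.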
references/dimension_design.md - detailed guide to dimension decomposition
references/consistency_metrics.md - agreement metrics explained
assets/rubric_template.md - scoring rubric template
assets/prompt_template.md - scoring prompt template
assets/redline_template.md - red-line detection prompt template
scripts/calculate_consistency.py - agreement calculation
scripts/analyze_discrepancy.py - discrepancy analysis
Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
Status
missing
Auth
None
Streaming
No
Data region
Unspecified
Protocol support
Requires: none
Forbidden: none
Guardrails
Operational confidence: low
curl -s "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/snapshot"
curl -s "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/contract"
curl -s "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/trust"
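The Invocation Guide JSON further down this page specifies a retry policy for these endpoints:

- up to 3 attempts;
- backoff of 500, 1500, and 3500 ms;
- retries on HTTP 429, HTTP 503, and network timeouts.

Below is a standard-library sketch of a client that follows this policy. It assumes plain GET requests, like the curl examples above. It also assumes the n-th backoff is applied after the n-th failed attempt, so with 3 attempts the 3500 ms value is never reached. `fetch_with_retry` is a hypothetical helper, and the 10 s per-request timeout is an arbitrary assumption. `fetch` and `sleep` are injectable so the logic can be tested without network access.

```python
import socket
import time
import urllib.error
import urllib.request

# Values copied from the retryPolicy in the Invocation Guide JSON.
MAX_ATTEMPTS = 3
BACKOFF_MS = [500, 1500, 3500]
RETRYABLE_STATUS = {429, 503}


def _is_retryable(err: Exception) -> bool:
    """HTTP 429/503 or a network timeout; everything else fails fast."""
    if isinstance(err, urllib.error.HTTPError):  # check first: HTTPError subclasses URLError
        return err.code in RETRYABLE_STATUS
    if isinstance(err, urllib.error.URLError):   # e.g. a connect timeout wrapped by urlopen
        return isinstance(err.reason, socket.timeout)
    return isinstance(err, socket.timeout)       # a read timeout raised directly


def _default_fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:  # timeout is an assumption
        return resp.read().decode("utf-8")


def fetch_with_retry(url: str, fetch=_default_fetch, sleep=time.sleep) -> str:
    for attempt in range(MAX_ATTEMPTS):
        try:
            return fetch(url)
        except Exception as err:
            if not _is_retryable(err) or attempt == MAX_ATTEMPTS - 1:
                raise
            sleep(BACKOFF_MS[attempt] / 1000)  # wait before the next attempt
    raise RuntimeError("MAX_ATTEMPTS must be at least 1")
```

Usage: `fetch_with_retry("https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/snapshot")` returns the response body as text.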
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
Handshake
UNKNOWN
Confidence
unknown
Attempts 30d
unknown
Fallback rate
unknown
Runtime metrics
Observed P50
unknown
Observed P95
unknown
Rate limit
unknown
Estimated cost
unknown
Do not use if
Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.
Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.
Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Rank
70
AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs
Traction
No public download signal
Freshness
Updated 5d ago
Rank
70
Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!
Traction
No public download signal
Freshness
Updated 6d ago
Rank
70
The Frontend for Agents & Generative UI. React + Angular
Traction
No public download signal
Freshness
Updated 23d ago
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": [
"OPENCLEW"
]
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "GITHUB_OPENCLEW",
"generatedAt": "2026-04-17T00:49:17.337Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [
{
"key": "OPENCLEW",
"type": "protocol",
"support": "unknown",
"confidenceSource": "profile",
"notes": "Listed on profile"
}
],
"flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}
Facts JSON
[
{
"factKey": "docs_crawl",
"category": "integration",
"label": "Crawlable docs",
"value": "6 indexed pages on the official domain",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
},
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Fangmenglin918 Web",
"href": "https://github.com/fangmenglin918-web/evaluation-skill",
"sourceUrl": "https://github.com/fangmenglin918-web/evaluation-skill",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-14T22:27:18.522Z",
"isPublic": true
},
{
"factKey": "protocols",
"category": "compatibility",
"label": "Protocol compatibility",
"value": "OpenClaw",
"href": "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/contract",
"sourceUrl": "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/contract",
"sourceType": "contract",
"confidence": "medium",
"observedAt": "2026-04-14T22:27:18.522Z",
"isPublic": true
},
{
"factKey": "traction",
"category": "adoption",
"label": "Adoption signal",
"value": "4 GitHub stars",
"href": "https://github.com/fangmenglin918-web/evaluation-skill",
"sourceUrl": "https://github.com/fangmenglin918-web/evaluation-skill",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-14T22:27:18.522Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/fangmenglin918-web-evaluation-skill/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "docs_update",
"title": "Docs refreshed: Sign in to GitHub · GitHub",
"description": "Fresh crawlable documentation was indexed for the official domain.",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
}
]