Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Xpersona Agent
Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations.
clawhub skill install kn77yjs5esft2kgsd6dpz9c92n80dgsy:prompt-performance-tester
Overall rank
#62
Adoption
1.9K downloads
Trust
Unknown
Freshness
Feb 28, 2026
Freshness
Last checked Feb 28, 2026
Best For
Prompt Performance Tester - UnisAI is best for general automation workflows where OpenClaw compatibility matters.
Not Ideal For
Workflows that require deterministic execution, since capability contract metadata is missing or unpublished.
Evidence Sources Checked
CLAWHUB, runtime-metrics, public facts pack
Key links, install path, reliability highlights, and the shortest practical read before diving into the crawl record.
Overview
Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations. Capability contract not published. No trust telemetry is available yet. 1.9K downloads reported by the source. Last updated Apr 15, 2026.
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Feb 28, 2026
Vendor
Clawhub
Artifacts
0
Benchmarks
0
Last release
1.1.9
Install & run
Install using `clawhub skill install kn77yjs5esft2kgsd6dpz9c92n80dgsy:prompt-performance-tester` in an isolated environment before connecting it to live workloads.
No published capability contract is available yet, so validate auth and request/response behavior manually.
Review the upstream CLAWHUB listing at https://clawhub.ai/vedantsingh60/prompt-performance-tester before using production credentials.
Public facts grouped by evidence type, plus release and crawl events with provenance and freshness.
Public facts
Vendor
Clawhub
Protocol compatibility
OpenClaw
Latest release
1.1.9
Adoption signal
1.9K downloads
Handshake status
UNKNOWN
Parameters, dependencies, examples, extracted files, editorial overview, and the complete README when available.
Captured outputs
Extracted files
4
Examples
6
Snippets
0
Languages
Unknown
text
PROMPT: "Write a professional customer service response about a delayed shipment"
┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 523ms │
│ Cost: $0.000025 │
│ Quality: 65/100 │
│ Tokens: 28 in / 87 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 710ms │
│ Cost: $0.000048 │
│ Quality: 70/100 │
│ Tokens: 28 in / 92 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 891ms │
│ Cost: $0.000145 │
│ Quality: 78/100 │
│ Tokens: 28 in / 102 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 645ms │
│ Cost: $0
bash
# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."
# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."
# Google (Gemini models)
export GOOGLE_API_KEY="AI..."
# DeepSeek
export DEEPSEEK_API_KEY="..."
# xAI (Grok models)
export XAI_API_KEY="..."
# MiniMax
export MINIMAX_API_KEY="..."
# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."
# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."
# Mistral
export MISTRAL_API_KEY="..."
bash
# Install only what you need
pip install anthropic # Claude
pip install openai # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai # Gemini
pip install mistralai # Mistral
# Or install everything
pip install anthropic openai google-generativeai mistralai
python
import os
from prompt_performance_tester import PromptPerformanceTester
tester = PromptPerformanceTester() # reads API keys from environment
results = tester.test_prompt(
prompt_text="Write a professional email apologizing for a delayed shipment",
models=[
"claude-haiku-4-5-20251001",
"gpt-5.2",
"gemini-2.5-flash",
"deepseek-chat",
],
num_runs=3,
max_tokens=500
)
print(tester.format_results(results))
print(f"🏆 Best quality: {results.best_model}")
print(f"💰 Cheapest: {results.cheapest_model}")
print(f"⚡ Fastest: {results.fastest_model}")
bash
# Test across multiple models
prompt-tester test "Your prompt here" \
  --models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
  --runs 3
# Export results
prompt-tester test "Your prompt here" --export results.json
SKILL.md
# Prompt Performance Tester

**Model-agnostic prompt benchmarking across 9 providers.** Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.

---

## 🚀 Why This Skill?

### Problem Statement

Comparing LLM models across providers requires manual testing:
- No systematic way to measure performance across models
- Cost differences are significant but not easily comparable
- Quality varies by use case and provider
- Manual API testing is time-consuming and error-prone

### The Solution

Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.

### Example Cost Comparison

For 10,000 requests/day with average 28 input + 115 output tokens:
- Claude Opus 4.6: ~$30.15/day ($903/month)
- Gemini 2.5 Flash-Lite: ~$0.05/day ($1.50/month)
- DeepSeek Chat: ~$0.14/day ($4.20/month)
- Monthly cost difference (Opus vs Flash-Lite): $901.50

---

## ✨ What You Get

### Model-Agnostic Multi-Provider Testing

Pass any model ID — provider is auto-detected from the model name prefix. No hardcoded list; new models work without code changes.
| Provider | Example Models | Prefix | Required Key |
|----------|---------------|--------|--------------|
| **Anthropic** | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | `claude-` | ANTHROPIC_API_KEY |
| **OpenAI** | gpt-5.2-pro, gpt-5.2, gpt-5.1 | `gpt-`, `o1`, `o3` | OPENAI_API_KEY |
| **Google** | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | `gemini-` | GOOGLE_API_KEY |
| **Mistral** | mistral-large-latest, mistral-small-latest | `mistral-`, `mixtral-` | MISTRAL_API_KEY |
| **DeepSeek** | deepseek-chat, deepseek-reasoner | `deepseek-` | DEEPSEEK_API_KEY |
| **xAI** | grok-4-1-fast, grok-3-beta | `grok-` | XAI_API_KEY |
| **MiniMax** | MiniMax-M2.1 | `MiniMax`, `minimax` | MINIMAX_API_KEY |
| **Qwen** | qwen3.5-plus, qwen3-max-instruct | `qwen` | DASHSCOPE_API_KEY |
| **Meta Llama** | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | `meta-llama/`, `llama-` | OPENROUTER_API_KEY |

### Known Pricing (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.1 | $2.00 | $8.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| mistral-large-latest | $2.00 | $6.00 |
| mistral-small-latest | $0.10 | $0.30 |
| deepseek-chat | $0.27 | $1.10 |
| deepseek-reasoner | $0.55 | $2.19 |
| grok-4-1-fast | $5.00 | $25.00 |
| grok-3-beta | $3.00 | $15.00 |
| MiniMax-M2.1 | $0.40 | $1.60 |
| qwen3.5-plus | $0.57 | $2.29 |
| qwen3-max-instruct | $1.60 | $6.40 |
| meta-llama/llama-4-maverick | $0.20 | $0.60 |
| meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |
_meta.json
{
"ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy",
"slug": "prompt-performance-tester",
"version": "1.1.9",
"publishedAt": 1772213259522
}
LICENSE.md
# UniAI Skills - Proprietary License
**Version 1.0 | Effective Date: February 2, 2024**
## 1. GRANT OF LICENSE
UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the ClawHub Skills ("Software") solely in accordance with the terms of this license agreement.
## 2. LICENSE RESTRICTIONS
You may NOT:
- Reverse engineer, decompile, or disassemble the Software
- Modify, alter, or create derivative works of the Software
- Remove, obscure, or alter any proprietary notices or labels on the Software
- Share, distribute, or sublicense the Software to any third party
- Use the Software for commercial purposes without a commercial license
- Access or use the Software beyond the scope of your subscription tier
- Attempt to circumvent licensing controls or API rate limits
## 3. INTELLECTUAL PROPERTY RIGHTS
All intellectual property rights in and to the Software are retained by Licensor. This includes:
- Source code and object code
- Algorithms and methodologies
- Performance optimization techniques
- Quality scoring mechanisms
- Proprietary data structures
- Trade secrets and confidential information
## 4. PERMITTED USES
You may only:
- Use the Software as provided through the ClawHub platform
- Access features available in your subscription tier
- Create test results and reports for internal use
- Share results with your team (if on a team plan)
- Provide feedback to improve the Software
## 5. SUBSCRIPTION TIERS
### Starter (Free)
- 5 tests per month
- 2 models per test
- Basic features
- Personal use only
### Professional ($29/month)
- Unlimited tests
- All models supported
- Advanced analytics
- API access
- Commercial use permitted
### Enterprise ($99/month)
- Team collaboration
- White-label option
- Custom integrations
- Dedicated support
- SLA guarantees
## 6. API KEY AND CREDENTIALS
- You are responsible for keeping your API keys confidential
- Do not share your license key with others
- One license per person/organization
- License keys are non-transferable
- Unauthorized sharing may result in account termination
## 7. DATA PRIVACY
- We do not retain your test data by default
- Free tier: 30-day retention
- Paid tiers: 90-day retention
- You can request data deletion anytime
- See Privacy Policy for full details
## 8. WARRANTY DISCLAIMER
THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:
- Merchantability
- Fitness for a particular purpose
- Non-infringement
- Accuracy of results
## 9. LIMITATION OF LIABILITY
IN NO EVENT SHALL LICENSOR BE LIABLE FOR:
- Any indirect, incidental, special, or consequential damages
- Loss of data, revenue, or profits
- Business interruption
- Even if advised of the possibility of such damages
## 10. TERMINATION
Licensor may terminate your license if you:
- Violate any terms of this agreement
- Fail to pay subscription fees
- Attempt to reverse engineer the Software
manifest.yaml
name: "Prompt Performance Tester"
id: "prompt-performance-tester"
version: "1.1.8"
description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."
homepage: "https://unisai.vercel.app"
repository: "https://github.com/vedantsingh60/prompt-performance-tester"
source: "included"
intellectual_property:
  license: "free-to-use"
  license_file: "LICENSE.md"
  copyright: "© 2026 UnisAI. All rights reserved."
  distribution: "via-clawhub-only"
  source_code_access: "included"
  modification: "personal-use-only"
  reverse_engineering: "allowed-for-security-audit"
author:
  company: "UnisAI"
  contact: "hello@unisai.vercel.app"
  website: "https://unisai.vercel.app"
category: "ai-testing"
tags:
- "prompt-testing"
- "performance-analysis"
- "cost-optimization"
- "multi-llm"
- "quality-assurance"
- "benchmarking"
- "llm-comparison"
- "ai-testing"
pricing:
  model: "free"
runtime: "local"
execution: "python"
required_env_vars:
- "ANTHROPIC_API_KEY" # Required if testing Claude models
- "OPENAI_API_KEY" # Required if testing GPT models
- "GOOGLE_API_KEY" # Required if testing Gemini models
- "MISTRAL_API_KEY" # Required if testing Mistral models
- "DEEPSEEK_API_KEY" # Required if testing DeepSeek models
- "XAI_API_KEY" # Required if testing Grok/xAI models
- "MINIMAX_API_KEY" # Required if testing MiniMax models
- "DASHSCOPE_API_KEY" # Required if testing Qwen/Alibaba models
- "OPENROUTER_API_KEY" # Required if testing Llama/OpenRouter models
primary_credential: "At least ONE provider API key is required per provider you want to test"
dependencies:
  python: ">=3.9"
  packages:
    - "anthropic>=0.40.0"
    - "openai>=1.60.0"
    - "google-generativeai>=0.8.0"
    - "mistralai>=1.3.0"
  install_all: "pip install anthropic openai google-generativeai mistralai"
  install_selective: |
    pip install anthropic # Claude
    pip install openai # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat)
    pip install google-generativeai # Gemini
    pip install mistralai # Mistral
  note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL."
  requirements_file: "requirements.txt"
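The dependencies note above says the OpenAI-compatible providers all go through the `openai` package with a per-provider base URL. A minimal sketch of that routing follows; the URLs are assumptions drawn from each provider's public documentation, not taken from this skill's source, and the helper name is illustrative.

```python
from typing import Optional

# Hypothetical base-URL routing for the OpenAI-compatible providers the
# note describes. URLs are assumptions from public provider docs, not
# extracted from this skill.
OPENAI_COMPAT_BASE_URLS = {
    "deepseek":   "https://api.deepseek.com",
    "xai":        "https://api.x.ai/v1",
    "minimax":    "https://api.minimax.chat/v1",
    "qwen":       "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def compat_base_url(provider: str) -> Optional[str]:
    """Return the base_url to pass to openai.OpenAI, or None for providers
    with a native SDK (Anthropic, Google, Mistral) or for OpenAI itself."""
    return OPENAI_COMPAT_BASE_URLS.get(provider)
```

With the `openai` package this would be used roughly as `OpenAI(api_key=..., base_url=compat_base_url("deepseek"))`.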
security:
  data_retention: "0 days"
  data_flow: "prompts-sent-to-chosen-ai-providers"
  third_party_data_sharing: |
    WARNING: This skill sends your prompts to whichever AI providers you select for testing.
    Each provider has their own data retention and privacy policies:
    - Anthropic: https://www.anthropic.com/legal/privacy
    - OpenAI: https://openai.com/policies/privacy-policy
    - Google: https://ai.google.dev/gemini-api/terms
    - Mistral: https://mistral.ai/terms/
    - DeepSeek: https://www.deepseek.com/privacy_policy

Editorial read
Docs source
CLAWHUB
Editorial quality
thin
Skill: Prompt Performance Tester - UnisAI
Owner: vedantsingh60
Summary: Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations.
Tags: Latest:1.1.4, ai-testing:1.0.1, ai-testing multi-provider prompt-optimization cost-analysis llm-benchmarking claude gpt gemini performance-testing api-comparison multi-model:1.1.2, claude-api:1.0.1, cost-analysis:1.0.1, latest:1.1.9, llm-benchmarking:1.0.1, openai-api:1.0.1, prompt-optimization:1.0.1
Version history:
v1.1.9 | 2026-02-27T17:27:39.522Z | user
v1.1.8 | 2026-02-27T17:27:27.786Z | user
v1.1.7 | 2026-02-27T16:50:17.543Z | user
Expanded to support model-agnostic benchmarking across 9 major LLM providers.
v1.1.6 | 2026-02-27T16:49:57.000Z | user
Expanded to support model-agnostic benchmarking across 9 major LLM providers.
v1.1.5 | 2026-02-16T20:07:53.120Z | user
v1.1.4 | 2026-02-02T03:45:04.686Z | user
1.1.4 is a documentation cleanup and simplification release.
v1.1.3 | 2026-02-02T03:15:28.352Z | user
v1.1.2 | 2026-02-02T03:03:35.305Z | user
Version 1.1.2
v1.1.1 | 2026-02-02T02:55:31.272Z | user
v1.1.0 | 2026-02-02T02:52:52.200Z | user
Version "1.1.0": - "✨ Multi-provider support: Claude, GPT, and Gemini" - "✨ 9 LLM models supported across 3 providers" - "✨ Cross-provider cost comparison engine" - "✨ Provider-specific API optimizations" - "✨ Enhanced recommendations with multi-provider insights" - "✨ Rebranded from Prompt Migrator to UniAI" - "🏷️ Updated tags for better discoverability (14 tags)" - "📊 Improved cost calculation accuracy" - "🔧 Added OpenAI and Google API integrations" - "📝 Updated documentation with multi-provider examples"
v1.0.1 | 2026-02-02T02:33:38.204Z | user
IP_PROTECTION_GUIDE.md.v1.0.0 | 2026-02-02T02:22:51.962Z | user
Initial release - Multi-model prompt testing across OpenAI, Claude Haiku, Sonnet, and Opus with latency, cost, and quality metrics
Archive index:
Archive v1.1.9: 5 files, 17508 bytes
Files: LICENSE.md (4799b), manifest.yaml (7490b), prompt_performance_tester.py (22862b), SKILL.md (17974b), _meta.json (144b)
File v1.1.9:SKILL.md
Model-agnostic prompt benchmarking across 9 providers.
Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.
Comparing LLM models across providers requires manual testing:
Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.
For 10,000 requests/day with average 28 input + 115 output tokens:
Pass any model ID — provider is auto-detected from the model name prefix. No hardcoded list; new models work without code changes.
| Provider | Example Models | Prefix | Required Key |
|----------|---------------|--------|--------------|
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | claude- | ANTHROPIC_API_KEY |
| OpenAI | gpt-5.2-pro, gpt-5.2, gpt-5.1 | gpt-, o1, o3 | OPENAI_API_KEY |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | gemini- | GOOGLE_API_KEY |
| Mistral | mistral-large-latest, mistral-small-latest | mistral-, mixtral- | MISTRAL_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | deepseek- | DEEPSEEK_API_KEY |
| xAI | grok-4-1-fast, grok-3-beta | grok- | XAI_API_KEY |
| MiniMax | MiniMax-M2.1 | MiniMax, minimax | MINIMAX_API_KEY |
| Qwen | qwen3.5-plus, qwen3-max-instruct | qwen | DASHSCOPE_API_KEY |
| Meta Llama | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | meta-llama/, llama- | OPENROUTER_API_KEY |
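The prefix column above drives provider auto-detection. As a rough illustration, a detection function might look like the following; this is a sketch mirroring the table, not the skill's actual PROVIDER_MAP, and the names here are illustrative.

```python
# Minimal sketch of prefix-based provider detection, mirroring the table
# above. Order matters: "meta-llama/" is checked before the bare "llama-".
PROVIDER_PREFIXES = [
    ("claude-", "anthropic"),
    ("gpt-", "openai"), ("o1", "openai"), ("o3", "openai"),
    ("gemini-", "google"),
    ("mistral-", "mistral"), ("mixtral-", "mistral"),
    ("deepseek-", "deepseek"),
    ("grok-", "xai"),
    ("minimax", "minimax"),
    ("qwen", "qwen"),
    ("meta-llama/", "openrouter"), ("llama-", "openrouter"),
]

def detect_provider(model_id: str) -> str:
    """Return the provider for a model ID, or raise if no prefix matches."""
    lowered = model_id.lower()
    for prefix, provider in PROVIDER_PREFIXES:
        if lowered.startswith(prefix):
            return provider
    raise ValueError(f"Unrecognized model prefix: {model_id}")
```

Because matching is by prefix, a new model such as a future `claude-*` release would route to Anthropic without any code change, which is the behavior the listing claims.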
| Model | Input | Output |
|-------|-------|--------|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.1 | $2.00 | $8.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| mistral-large-latest | $2.00 | $6.00 |
| mistral-small-latest | $0.10 | $0.30 |
| deepseek-chat | $0.27 | $1.10 |
| deepseek-reasoner | $0.55 | $2.19 |
| grok-4-1-fast | $5.00 | $25.00 |
| grok-3-beta | $3.00 | $15.00 |
| MiniMax-M2.1 | $0.40 | $1.60 |
| qwen3.5-plus | $0.57 | $2.29 |
| qwen3-max-instruct | $1.60 | $6.40 |
| meta-llama/llama-4-maverick | $0.20 | $0.60 |
| meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |
Note: Unlisted models still work — cost calculation returns $0.00 with a warning. Pricing table is for reference only, not a validation gate.
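The note's fallback behavior can be illustrated with a small cost helper. This is a sketch, assuming per-1M-token prices keyed by model ID; it is not the skill's actual implementation, and only a few models from the table are included.

```python
import warnings

# Per-1M-token prices (input, output) for a few models from the table
# above; unlisted models fall back to $0.00 with a warning, as the note
# describes.
PRICING = {
    "claude-haiku-4-5-20251001": (1.00, 5.00),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "deepseek-chat": (0.27, 1.10),
}

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request; 0.0 (with a warning) for unlisted models."""
    if model not in PRICING:
        warnings.warn(f"No pricing for {model}; reporting $0.00")
        return 0.0
    price_in, price_out = PRICING[model]
    return tokens_in * price_in / 1e6 + tokens_out * price_out / 1e6
```

The key design point from the note: the missing-price path warns rather than raises, so an unlisted model still runs and only its cost column is blank.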
Every test measures latency, cost, quality, and consistency.
Get instant answers to which model is cheapest, fastest, and highest quality for your prompt.
PROMPT: "Write a professional customer service response about a delayed shipment"
┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 523ms │
│ Cost: $0.000025 │
│ Quality: 65/100 │
│ Tokens: 28 in / 87 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 710ms │
│ Cost: $0.000048 │
│ Quality: 70/100 │
│ Tokens: 28 in / 92 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 891ms │
│ Cost: $0.000145 │
│ Quality: 78/100 │
│ Tokens: 28 in / 102 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 645ms │
│ Cost: $0.000402 │
│ Quality: 88/100 │
│ Tokens: 28 in / 98 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE OPUS 4.6 (Anthropic) 🏆 HIGHEST QUALITY │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 1,234ms │
│ Cost: $0.001875 │
│ Quality: 94/100 │
│ Tokens: 28 in / 125 out │
└─────────────────────────────────────────────────────────────────┘
🎯 RECOMMENDATIONS:
1. Most cost-effective: Gemini 2.5 Flash-Lite ($0.000025/request) — 99.98% cheaper than Opus
2. Budget pick: DeepSeek Chat ($0.000048/request) — strong quality at low cost
3. Best quality: Claude Opus 4.6 (94/100) — state-of-the-art reasoning & analysis
4. Smart pick: Claude Haiku 4.5 ($0.000145/request) — 81% cheaper, 83% quality match
5. Speed + Quality: GPT-5.2 ($0.000402/request) — excellent quality at mid-range cost
💡 Potential monthly savings (10,000 requests/day, 28 input + 115 output tokens avg):
- Using Gemini 2.5 Flash-Lite vs Opus: $903/month saved ($1.44 vs $904.50)
- Using DeepSeek Chat vs Opus: $899/month saved ($4.50 vs $904.50)
- Using Claude Haiku vs Opus: $731/month saved ($173.40 vs $904.50)
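The savings lines above are all instances of the same arithmetic: the per-request cost difference scaled by daily volume and days per month. A generic sketch (the example inputs in the usage note are hypothetical, not the listing's figures):

```python
# Monthly savings from moving a workload from model A to model B, given
# per-request costs in dollars. A 30-day month is assumed by default.
def monthly_savings(cost_a: float, cost_b: float,
                    requests_per_day: int = 10_000, days: int = 30) -> float:
    """Dollars saved per month by switching from model A to model B."""
    return (cost_a - cost_b) * requests_per_day * days
```

For example, `monthly_savings(0.001, 0.0001)` with hypothetical per-request costs of $0.001 and $0.0001 yields $270/month at 10,000 requests/day.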
Click "Subscribe" on ClawHub to get access.
Add keys for the providers you want to test:
# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."
# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."
# Google (Gemini models)
export GOOGLE_API_KEY="AI..."
# DeepSeek
export DEEPSEEK_API_KEY="..."
# xAI (Grok models)
export XAI_API_KEY="..."
# MiniMax
export MINIMAX_API_KEY="..."
# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."
# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."
# Mistral
export MISTRAL_API_KEY="..."
You only need keys for the providers you plan to test.
# Install only what you need
pip install anthropic # Claude
pip install openai # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai # Gemini
pip install mistralai # Mistral
# Or install everything
pip install anthropic openai google-generativeai mistralai
Option A: Python
import os
from prompt_performance_tester import PromptPerformanceTester
tester = PromptPerformanceTester() # reads API keys from environment
results = tester.test_prompt(
prompt_text="Write a professional email apologizing for a delayed shipment",
models=[
"claude-haiku-4-5-20251001",
"gpt-5.2",
"gemini-2.5-flash",
"deepseek-chat",
],
num_runs=3,
max_tokens=500
)
print(tester.format_results(results))
print(f"🏆 Best quality: {results.best_model}")
print(f"💰 Cheapest: {results.cheapest_model}")
print(f"⚡ Fastest: {results.fastest_model}")
Option B: CLI
# Test across multiple models
prompt-tester test "Your prompt here" \
--models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
--runs 3
# Export results
prompt-tester test "Your prompt here" --export results.json
anthropic, openai, google-generativeai, mistralai (install only what you need)
PROVIDER_MAP detects provider from model name; no hardcoded whitelist
openai SDK with a custom base_url
cost=0 with a warning
Every test captures:
Q: Do I need API keys for all 9 providers?
A: No. You only need keys for the providers you want to test. If you only test Claude models, you only need ANTHROPIC_API_KEY.
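The per-provider key requirement described in this answer can be checked up front. A minimal sketch; the helper and table names are illustrative, not part of the skill's API.

```python
import os

# Env var needed per provider, per the table in SKILL.md (subset shown).
# Helper names are illustrative, not the skill's API.
REQUIRED_KEY = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}

def missing_keys(providers, env=os.environ):
    """Return the env vars that must still be set to test these providers."""
    return [REQUIRED_KEY[p] for p in providers if REQUIRED_KEY[p] not in env]
```

Running this before a test run turns a mid-run authentication failure into an immediate, readable error list.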
Q: Who pays for the API costs?
A: You do. You provide your own API keys and pay each provider directly. This skill has no per-request fees.
Q: How accurate are the cost calculations?
A: Costs are calculated from the known pricing table using actual token counts. Models not in the pricing table return $0.00 — the model still runs, the cost just won't be shown.
Q: Can I test models not in the pricing table?
A: Yes. Any model whose name starts with a supported prefix will run. Cost will show as $0.00 for unlisted models.
Q: Can I test prompts in non-English languages?
A: Yes. All supported providers handle multiple languages.
Q: Can I use this in production/CI/CD?
A: Yes. Import PromptPerformanceTester directly from Python or call via CLI.
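For CI use, exported results can gate a pipeline. A hypothetical sketch: it assumes the CLI's `--export results.json` output contains per-model records with `model`, `latency_ms`, and `cost` fields, which is an assumption since the export schema is not documented in this listing.

```python
# Hypothetical CI gate over exported results. Assumes per-model records
# with "model", "latency_ms", and "cost" fields; the real export schema
# is not documented here. Budget values are examples.
MAX_LATENCY_MS = 1500
MAX_COST_PER_REQUEST = 0.001

def over_budget(records):
    """Return the models that exceed the latency or cost budget."""
    return [r["model"] for r in records
            if r["latency_ms"] > MAX_LATENCY_MS
            or r["cost"] > MAX_COST_PER_REQUEST]
```

A CI step could load the exported JSON with `json.load` and fail the build whenever `over_budget` returns a non-empty list.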
Q: What if my prompt is very long?
A: Set max_tokens appropriately. The skill passes your prompt as-is to each provider's API.
This skill is distributed via ClawHub under the following terms.
Full Terms: See LICENSE.md
Removed old license key (PROPRIETARY_SKILL_VEDANT_2024) from docs
Uses PROPRIETARY_SKILL_UNISAI_2026_MULTI_PROVIDER throughout
Last Updated: February 27, 2026
Current Version: 1.1.8
Status: Active & Maintained
© 2026 UnisAI. All rights reserved.
File v1.1.9:_meta.json
{ "ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy", "slug": "prompt-performance-tester", "version": "1.1.9", "publishedAt": 1772213259522 }
File v1.1.9:LICENSE.md
Version 1.0 | Effective Date: February 2, 2024
UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the ClawHub Skills ("Software") solely in accordance with the terms of this license agreement.
You may NOT:
All intellectual property rights in and to the Software are retained by Licensor. This includes:
You may only:
THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:
IN NO EVENT SHALL LICENSOR BE LIABLE FOR:
Licensor may terminate your license if you:
Upon termination:
To use the Software for commercial purposes:
For commercial use with Starter tier, contact: hello@unisai.vercel.app
The Software uses third-party services (e.g., Anthropic API). Your use is also subject to their terms of service:
Licensor reserves the right to:
You agree to comply with all applicable laws and regulations in your jurisdiction when using the Software.
Any disputes arising from this agreement shall be:
This agreement, along with our Privacy Policy and Terms of Service, constitutes the entire agreement between you and Licensor regarding the Software.
For licensing inquiries or support:
By using the Software, you acknowledge that you have read, understood, and agree to be bound by this License Agreement.
© 2026 UniAI. All rights reserved.
File v1.1.9:manifest.yaml
name: "Prompt Performance Tester"
id: "prompt-performance-tester"
version: "1.1.8"
description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."
homepage: "https://unisai.vercel.app"
repository: "https://github.com/vedantsingh60/prompt-performance-tester"
source: "included"
intellectual_property:
  license: "free-to-use"
  license_file: "LICENSE.md"
  copyright: "© 2026 UnisAI. All rights reserved."
  distribution: "via-clawhub-only"
  source_code_access: "included"
  modification: "personal-use-only"
  reverse_engineering: "allowed-for-security-audit"
author:
  company: "UnisAI"
  contact: "hello@unisai.vercel.app"
  website: "https://unisai.vercel.app"
category: "ai-testing"
tags:
pricing:
  model: "free"
runtime: "local"
execution: "python"
required_env_vars:
dependencies:
  python: ">=3.9"
  packages:
    - "anthropic>=0.40.0"
    - "openai>=1.60.0"
    - "google-generativeai>=0.8.0"
    - "mistralai>=1.3.0"
  install_all: "pip install anthropic openai google-generativeai mistralai"
  install_selective: |
    pip install anthropic # Claude
    pip install openai # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat)
    pip install google-generativeai # Gemini
    pip install mistralai # Mistral
  note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL."
  requirements_file: "requirements.txt"
security:
  data_retention: "0 days"
  data_flow: "prompts-sent-to-chosen-ai-providers"
  third_party_data_sharing: |
    WARNING: This skill sends your prompts to whichever AI providers you select for testing.
    Each provider has their own data retention and privacy policies:
    - Anthropic: https://www.anthropic.com/legal/privacy
    - OpenAI: https://openai.com/policies/privacy-policy
    - Google: https://ai.google.dev/gemini-api/terms
    - Mistral: https://mistral.ai/terms/
    - DeepSeek: https://www.deepseek.com/privacy_policy
    - xAI: https://x.ai/privacy
    - OpenRouter: https://openrouter.ai/privacy
  api_key_storage: "Environment variables only — never hardcoded or logged"
  network_access: "Required to call chosen AI provider APIs"
capabilities:
  functions:
    - name: "testPrompt"
      description: "Test a prompt across multiple LLM models and providers"
      parameters:
        prompt_text:
          type: "string"
          description: "The prompt to benchmark"
          required: true
        models:
          type: "array"
          description: "List of model IDs to test — any model matching a supported prefix works"
          items:
            type: "string"
          examples:
            - "claude-sonnet-4-6"
            - "gpt-5.2"
            - "deepseek-chat"
            - "grok-4-1-fast"
            - "gemini-2.5-flash"
          required: false
        num_runs:
          type: "number"
          description: "Number of runs per model for consistency testing"
          default: 1
          range: [1, 10]
        system_prompt:
          type: "string"
          description: "Optional system prompt"
        max_tokens:
          type: "number"
          description: "Maximum response tokens"
          default: 1000
          range: [100, 4000]
environment_variables:
  ANTHROPIC_API_KEY:
    description: "Anthropic API key — required for any claude-* model"
    required_for_prefix: "claude-"
  OPENAI_API_KEY:
    description: "OpenAI API key — required for any gpt-, o1, o3* model"
    required_for_prefix: "gpt-, o1, o3"
  GOOGLE_API_KEY:
    description: "Google AI API key — required for any gemini-* model"
    required_for_prefix: "gemini-"
  MISTRAL_API_KEY:
    description: "Mistral API key — required for mistral-, mixtral- models"
    required_for_prefix: "mistral-, mixtral-"
  DEEPSEEK_API_KEY:
    description: "DeepSeek API key — required for any deepseek-* model"
    required_for_prefix: "deepseek-"
  XAI_API_KEY:
    description: "xAI API key — required for any grok-* model"
    required_for_prefix: "grok-"
  MINIMAX_API_KEY:
    description: "MiniMax API key — required for minimax* or MiniMax* models"
    required_for_prefix: "minimax, MiniMax"
  DASHSCOPE_API_KEY:
    description: "Alibaba DashScope API key — required for any qwen* model"
    required_for_prefix: "qwen"
  OPENROUTER_API_KEY:
    description: "OpenRouter API key — required for meta-llama/* or llama-* models"
    required_for_prefix: "meta-llama/, llama-"
support: support_email: "support@unisai.vercel.app" website: "https://unisai.vercel.app" github: "https://github.com/vedantsingh60/prompt-performance-tester" documentation: "See SKILL.md in this package" response_time: "Best effort — community supported"
restrictions:
changelog: "1.1.8": - "🏗️ Model-agnostic architecture — provider auto-detected from model name prefix, no hardcoded whitelist" - "✨ Added DeepSeek, xAI Grok, MiniMax, Qwen as first-class providers (9 total)" - "✨ Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)" - "✨ Any future model works automatically without code changes" - "🔧 Lazy client initialization — only loads SDKs for providers actually used" - "🔧 Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter" - "📝 Fixed UnisAI branding (was UniAI)" - "💰 Updated pricing table with 20 models across 9 providers" "1.1.5": - "🚀 Updated to latest 2026 models" - "✨ GPT-5.2 series (Instant, Thinking, Pro)" - "✨ Gemini 3 Pro and 2.5 series" - "✨ Claude 4.5 pricing updates" - "✨ 10 total models across 3 providers" "1.1.0": - "✨ Multi-provider support (Claude, GPT, Gemini)" - "✨ Cross-provider cost comparison" - "✨ Enhanced recommendations engine" "1.0.0": - "Initial release with Claude-only support" - "Performance metrics: latency, cost, quality, consistency"
metadata: status: "active" created_at: "2024-02-02T00:00:00Z" updated_at: "2026-02-27T00:00:00Z" maturity: "production" maintenance: "actively-maintained" compatibility: - "OpenClaw v1.0+" - "Claude Code" - "ClawhHub v2.0+" security_audit: "Source code included for security review and transparency"
Archive v1.1.8: 5 files, 17509 bytes
Files: LICENSE.md (4799b), manifest.yaml (7490b), prompt_performance_tester.py (22862b), SKILL.md (17974b), _meta.json (144b)
File v1.1.8:SKILL.md
Model-agnostic prompt benchmarking across 9 providers.
Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.
Without a harness, comparing LLM models across providers requires manual, provider-by-provider testing:
Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.
For 10,000 requests/day with average 28 input + 115 output tokens:
Pass any model ID — provider is auto-detected from the model name prefix. No hardcoded list; new models work without code changes.
| Provider | Example Models | Prefix | Required Key |
|----------|---------------|--------|--------------|
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | claude- | ANTHROPIC_API_KEY |
| OpenAI | gpt-5.2-pro, gpt-5.2, gpt-5.1 | gpt-, o1, o3 | OPENAI_API_KEY |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | gemini- | GOOGLE_API_KEY |
| Mistral | mistral-large-latest, mistral-small-latest | mistral-, mixtral- | MISTRAL_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | deepseek- | DEEPSEEK_API_KEY |
| xAI | grok-4-1-fast, grok-3-beta | grok- | XAI_API_KEY |
| MiniMax | MiniMax-M2.1 | MiniMax, minimax | MINIMAX_API_KEY |
| Qwen | qwen3.5-plus, qwen3-max-instruct | qwen | DASHSCOPE_API_KEY |
| Meta Llama | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | meta-llama/, llama- | OPENROUTER_API_KEY |
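The prefix-based routing described above can be sketched as a longest-prefix lookup. This is an illustrative reconstruction, not the skill's actual code; the map keys mirror the table, and `detect_provider` is a hypothetical name.

```python
# Hypothetical sketch of prefix-based provider detection.
# Keys mirror the prefix table above; longest prefix wins so
# "meta-llama/" matches before the shorter "llama-".
PROVIDER_MAP = {
    "claude-": "anthropic",
    "gpt-": "openai", "o1": "openai", "o3": "openai",
    "gemini-": "google",
    "mistral-": "mistral", "mixtral-": "mistral",
    "deepseek-": "deepseek",
    "grok-": "xai",
    "minimax": "minimax",
    "qwen": "dashscope",
    "meta-llama/": "openrouter", "llama-": "openrouter",
}

def detect_provider(model_id: str) -> str:
    """Return the provider key for a model ID, matching the longest prefix."""
    for prefix in sorted(PROVIDER_MAP, key=len, reverse=True):
        if model_id.lower().startswith(prefix.lower()):
            return PROVIDER_MAP[prefix]
    raise ValueError(f"No provider matches model ID: {model_id}")
```

Because matching is purely lexical, a new model like `claude-opus-5` would route to Anthropic with no code change, which is the behavior the listing claims.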
| Model | Input | Output |
|-------|-------|--------|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.1 | $2.00 | $8.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| mistral-large-latest | $2.00 | $6.00 |
| mistral-small-latest | $0.10 | $0.30 |
| deepseek-chat | $0.27 | $1.10 |
| deepseek-reasoner | $0.55 | $2.19 |
| grok-4-1-fast | $5.00 | $25.00 |
| grok-3-beta | $3.00 | $15.00 |
| MiniMax-M2.1 | $0.40 | $1.60 |
| qwen3.5-plus | $0.57 | $2.29 |
| qwen3-max-instruct | $1.60 | $6.40 |
| meta-llama/llama-4-maverick | $0.20 | $0.60 |
| meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |
Note: Unlisted models still work — cost calculation returns $0.00 with a warning. Pricing table is for reference only, not a validation gate.
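The per-request cost arithmetic implied by the table can be sketched as follows. This is an assumption-laden illustration: the per-1M-token unit and the `request_cost` helper are mine, not confirmed from the skill's source; the $0.00 fallback for unlisted models mirrors the note above.

```python
# Illustrative cost calculation; prices assumed to be USD per 1M tokens.
# Only a few table rows are included for brevity.
PRICING = {  # model: (input price, output price)
    "claude-opus-4-6": (15.00, 75.00),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "deepseek-chat": (0.27, 1.10),
}

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request; unlisted models fall back to $0.00."""
    price_in, price_out = PRICING.get(model, (0.0, 0.0))
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000
```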
Every test measures:
Get instant answers to:
PROMPT: "Write a professional customer service response about a delayed shipment"
┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 523ms │
│ Cost: $0.000025 │
│ Quality: 65/100 │
│ Tokens: 28 in / 87 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 710ms │
│ Cost: $0.000048 │
│ Quality: 70/100 │
│ Tokens: 28 in / 92 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 891ms │
│ Cost: $0.000145 │
│ Quality: 78/100 │
│ Tokens: 28 in / 102 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 645ms │
│ Cost: $0.000402 │
│ Quality: 88/100 │
│ Tokens: 28 in / 98 out │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE OPUS 4.6 (Anthropic) 🏆 HIGHEST QUALITY │
├─────────────────────────────────────────────────────────────────┤
│ Latency: 1,234ms │
│ Cost: $0.001875 │
│ Quality: 94/100 │
│ Tokens: 28 in / 125 out │
└─────────────────────────────────────────────────────────────────┘
🎯 RECOMMENDATIONS:
1. Most cost-effective: Gemini 2.5 Flash-Lite ($0.000025/request) — 98.7% cheaper than Opus
2. Budget pick: DeepSeek Chat ($0.000048/request) — strong quality at low cost
3. Best quality: Claude Opus 4.6 (94/100) — state-of-the-art reasoning & analysis
4. Smart pick: Claude Haiku 4.5 ($0.000145/request) — 92% cheaper, 83% quality match
5. Speed + Quality: GPT-5.2 ($0.000402/request) — excellent quality at mid-range cost
💡 Potential monthly savings (10,000 requests/day, 28 input + 115 output tokens avg):
- Using Gemini 2.5 Flash-Lite vs Opus: $903/month saved ($1.44 vs $904.50)
- Using DeepSeek Chat vs Opus: $900/month saved ($4.50 vs $904.50)
- Using Claude Haiku vs Opus: $731/month saved ($173.40 vs $904.50)
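The savings figures above are a straightforward projection from per-request cost and request volume. A minimal sketch (the helper names are mine; exact dollar figures depend on the measured per-request costs, which in the sample output above do not all reconcile with the pricing table):

```python
def monthly_cost(cost_per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Project a monthly bill from a measured per-request cost."""
    return cost_per_request * requests_per_day * days

def monthly_savings(cheap: float, expensive: float,
                    requests_per_day: int, days: int = 30) -> float:
    """Dollar savings from switching the whole volume to the cheaper model."""
    return (expensive - cheap) * requests_per_day * days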
Click "Subscribe" on ClawhHub to get access.
Add keys for the providers you want to test:
# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."
# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."
# Google (Gemini models)
export GOOGLE_API_KEY="AI..."
# DeepSeek
export DEEPSEEK_API_KEY="..."
# xAI (Grok models)
export XAI_API_KEY="..."
# MiniMax
export MINIMAX_API_KEY="..."
# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."
# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."
# Mistral
export MISTRAL_API_KEY="..."
You only need keys for the providers you plan to test.
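A quick preflight check for which keys a planned run still needs can be sketched like this. The helper and the (abbreviated) prefix-to-key map are illustrative, assembled from the environment-variable table; they are not part of the skill's published API.

```python
import os

# Prefix -> required env var, abbreviated from the table above.
KEY_FOR_PREFIX = {
    "claude-": "ANTHROPIC_API_KEY",
    "gpt-": "OPENAI_API_KEY",
    "gemini-": "GOOGLE_API_KEY",
    "deepseek-": "DEEPSEEK_API_KEY",
}

def missing_keys(models: list[str]) -> list[str]:
    """Return env vars that the given models need but that are unset."""
    needed = set()
    for model in models:
        for prefix, key in KEY_FOR_PREFIX.items():
            if model.lower().startswith(prefix):
                needed.add(key)
    return sorted(k for k in needed if not os.environ.get(k))
```

Running this before a benchmark avoids a mid-run failure on the first model whose provider key is absent.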
# Install only what you need
pip install anthropic # Claude
pip install openai # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai # Gemini
pip install mistralai # Mistral
# Or install everything
pip install anthropic openai google-generativeai mistralai
Option A: Python
import os
from prompt_performance_tester import PromptPerformanceTester
tester = PromptPerformanceTester() # reads API keys from environment
results = tester.test_prompt(
prompt_text="Write a professional email apologizing for a delayed shipment",
models=[
"claude-haiku-4-5-20251001",
"gpt-5.2",
"gemini-2.5-flash",
"deepseek-chat",
],
num_runs=3,
max_tokens=500
)
print(tester.format_results(results))
print(f"🏆 Best quality: {results.best_model}")
print(f"💰 Cheapest: {results.cheapest_model}")
print(f"⚡ Fastest: {results.fastest_model}")
Option B: CLI
# Test across multiple models
prompt-tester test "Your prompt here" \
--models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
--runs 3
# Export results
prompt-tester test "Your prompt here" --export results.json
- Dependencies: anthropic, openai, google-generativeai, mistralai (install only what you need)
- PROVIDER_MAP detects the provider from the model name; no hardcoded whitelist
- DeepSeek, xAI, MiniMax, Qwen, and Llama use the openai SDK with a custom base_url
- Unlisted models report cost=0 with a warning

Every test captures:
Q: Do I need API keys for all 9 providers?
A: No. You only need keys for the providers you want to test. If you only test Claude models, you only need ANTHROPIC_API_KEY.
Q: Who pays for the API costs?
A: You do. You provide your own API keys and pay each provider directly. This skill has no per-request fees.
Q: How accurate are the cost calculations?
A: Costs are calculated from the known pricing table using actual token counts. Models not in the pricing table return $0.00 — the model still runs, the cost just won't be shown.
Q: Can I test models not in the pricing table?
A: Yes. Any model whose name starts with a supported prefix will run. Cost will show as $0.00 for unlisted models.
Q: Can I test prompts in non-English languages?
A: Yes. All supported providers handle multiple languages.
Q: Can I use this in production/CI/CD?
A: Yes. Import PromptPerformanceTester directly from Python or call via CLI.
Q: What if my prompt is very long?
A: Set max_tokens appropriately. The skill passes your prompt as-is to each provider's API.
This skill is distributed via ClawHub under the following terms.
Full Terms: See LICENSE.md
- License identifier updated: PROPRIETARY_SKILL_VEDANT_2024 removed from docs; PROPRIETARY_SKILL_UNISAI_2026_MULTI_PROVIDER used throughout

Last Updated: February 27, 2026
Current Version: 1.1.8
Status: Active & Maintained
© 2026 UnisAI. All rights reserved.
File v1.1.8:_meta.json
{ "ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy", "slug": "prompt-performance-tester", "version": "1.1.8", "publishedAt": 1772213247786 }
File v1.1.8:LICENSE.md
Version 1.0 | Effective Date: February 2, 2024
UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the ClawHub Skills ("Software") solely in accordance with the terms of this license agreement.
You may NOT:
All intellectual property rights in and to the Software are retained by Licensor. This includes:
You may only:
THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:
IN NO EVENT SHALL LICENSOR BE LIABLE FOR:
Licensor may terminate your license if you:
Upon termination:
To use the Software for commercial purposes:
For commercial use with Starter tier, contact: hello@unisai.vercel.app
The Software uses third-party services (e.g., Anthropic API). Your use is also subject to their terms of service:
Licensor reserves the right to:
You agree to comply with all applicable laws and regulations in your jurisdiction when using the Software.
Any disputes arising from this agreement shall be:
This agreement, along with our Privacy Policy and Terms of Service, constitutes the entire agreement between you and Licensor regarding the Software.
For licensing inquiries or support:
By using the Software, you acknowledge that you have read, understood, and agree to be bound by this License Agreement.
© 2026 UniAI. All rights reserved.
File v1.1.8:manifest.yaml
name: "Prompt Performance Tester" id: "prompt-performance-tester" version: "1.1.8" description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."
homepage: "https://unisai.vercel.app" repository: "https://github.com/vedantsingh60/prompt-performance-tester" source: "included"
intellectual_property: license: "free-to-use" license_file: "LICENSE.md" copyright: "© 2026 UnisAI. All rights reserved." distribution: "via-clawhub-only" source_code_access: "included" modification: "personal-use-only" reverse_engineering: "allowed-for-security-audit"
author: company: "UnisAI" contact: "hello@unisai.vercel.app" website: "https://unisai.vercel.app"
category: "ai-testing" tags:
pricing: model: "free"
runtime: "local" execution: "python"
required_env_vars:
dependencies: python: ">=3.9" packages: - "anthropic>=0.40.0" - "openai>=1.60.0" - "google-generativeai>=0.8.0" - "mistralai>=1.3.0" install_all: "pip install anthropic openai google-generativeai mistralai" install_selective: | pip install anthropic # Claude pip install openai # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat) pip install google-generativeai # Gemini pip install mistralai # Mistral note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL." requirements_file: "requirements.txt"
security: data_retention: "0 days" data_flow: "prompts-sent-to-chosen-ai-providers" third_party_data_sharing: | WARNING: This skill sends your prompts to whichever AI providers you select for testing. Each provider has their own data retention and privacy policies: - Anthropic: https://www.anthropic.com/legal/privacy - OpenAI: https://openai.com/policies/privacy-policy - Google: https://ai.google.dev/gemini-api/terms - Mistral: https://mistral.ai/terms/ - DeepSeek: https://www.deepseek.com/privacy_policy - xAI: https://x.ai/privacy - OpenRouter: https://openrouter.ai/privacy api_key_storage: "Environment variables only — never hardcoded or logged" network_access: "Required to call chosen AI provider APIs"
capabilities: functions: - name: "testPrompt" description: "Test a prompt across multiple LLM models and providers" parameters: prompt_text: type: "string" description: "The prompt to benchmark" required: true models: type: "array" description: "List of model IDs to test — any model matching a supported prefix works" items: type: "string" examples: - "claude-sonnet-4-6" - "gpt-5.2" - "deepseek-chat" - "grok-4-1-fast" - "gemini-2.5-flash" required: false num_runs: type: "number" description: "Number of runs per model for consistency testing" default: 1 range: [1, 10] system_prompt: type: "string" description: "Optional system prompt" max_tokens: type: "number" description: "Maximum response tokens" default: 1000 range: [100, 4000]
environment_variables: ANTHROPIC_API_KEY: description: "Anthropic API key — required for any claude-* model" required_for_prefix: "claude-" OPENAI_API_KEY: description: "OpenAI API key — required for any gpt-, o1, o3* model" required_for_prefix: "gpt-, o1, o3" GOOGLE_API_KEY: description: "Google AI API key — required for any gemini-* model" required_for_prefix: "gemini-" MISTRAL_API_KEY: description: "Mistral API key — required for mistral-, mixtral- models" required_for_prefix: "mistral-, mixtral-" DEEPSEEK_API_KEY: description: "DeepSeek API key — required for any deepseek-* model" required_for_prefix: "deepseek-" XAI_API_KEY: description: "xAI API key — required for any grok-* model" required_for_prefix: "grok-" MINIMAX_API_KEY: description: "MiniMax API key — required for minimax* or MiniMax* models" required_for_prefix: "minimax, MiniMax" DASHSCOPE_API_KEY: description: "Alibaba DashScope API key — required for any qwen* model" required_for_prefix: "qwen" OPENROUTER_API_KEY: description: "OpenRouter API key — required for meta-llama/* or llama-* models" required_for_prefix: "meta-llama/, llama-"
support: support_email: "support@unisai.vercel.app" website: "https://unisai.vercel.app" github: "https://github.com/vedantsingh60/prompt-performance-tester" documentation: "See SKILL.md in this package" response_time: "Best effort — community supported"
restrictions:
changelog: "1.1.8": - "🏗️ Model-agnostic architecture — provider auto-detected from model name prefix, no hardcoded whitelist" - "✨ Added DeepSeek, xAI Grok, MiniMax, Qwen as first-class providers (9 total)" - "✨ Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)" - "✨ Any future model works automatically without code changes" - "🔧 Lazy client initialization — only loads SDKs for providers actually used" - "🔧 Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter" - "📝 Fixed UnisAI branding (was UniAI)" - "💰 Updated pricing table with 20 models across 9 providers" "1.1.5": - "🚀 Updated to latest 2026 models" - "✨ GPT-5.2 series (Instant, Thinking, Pro)" - "✨ Gemini 3 Pro and 2.5 series" - "✨ Claude 4.5 pricing updates" - "✨ 10 total models across 3 providers" "1.1.0": - "✨ Multi-provider support (Claude, GPT, Gemini)" - "✨ Cross-provider cost comparison" - "✨ Enhanced recommendations engine" "1.0.0": - "Initial release with Claude-only support" - "Performance metrics: latency, cost, quality, consistency"
metadata: status: "active" created_at: "2024-02-02T00:00:00Z" updated_at: "2026-02-27T00:00:00Z" maturity: "production" maintenance: "actively-maintained" compatibility: - "OpenClaw v1.0+" - "Claude Code" - "ClawHub v2.0+" security_audit: "Source code included for security review and transparency"
Machine endpoints, contract coverage, trust signals, runtime metrics, benchmarks, and guardrails for agent-to-agent use.
Machine interfaces
Contract coverage
Status
missing
Auth
None
Streaming
No
Data region
Unspecified
Protocol support
Requires: none
Forbidden: none
Guardrails
Operational confidence: low
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot"
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract"
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust"
Operational fit
Trust signals
Handshake
UNKNOWN
Confidence
unknown
Attempts 30d
unknown
Fallback rate
unknown
Runtime metrics
Observed P50
unknown
Observed P95
unknown
Rate limit
unknown
Estimated cost
unknown
Do not use if
Raw contract, invocation, trust, capability, facts, and change-event payloads for machine-side inspection.
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": [
"OPENCLEW"
]
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "CLAWHUB",
"generatedAt": "2026-04-17T02:48:13.304Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [
{
"key": "OPENCLEW",
"type": "protocol",
"support": "unknown",
"confidenceSource": "profile",
"notes": "Listed on profile"
}
],
"flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}
Facts JSON
[
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Clawhub",
"href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
"sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T00:45:39.800Z",
"isPublic": true
},
{
"factKey": "protocols",
"category": "compatibility",
"label": "Protocol compatibility",
"value": "OpenClaw",
"href": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
"sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
"sourceType": "contract",
"confidence": "medium",
"observedAt": "2026-04-15T00:45:39.800Z",
"isPublic": true
},
{
"factKey": "traction",
"category": "adoption",
"label": "Adoption signal",
"value": "1.9K downloads",
"href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
"sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T00:45:39.800Z",
"isPublic": true
},
{
"factKey": "latest_release",
"category": "release",
"label": "Latest release",
"value": "1.1.9",
"href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
"sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
"sourceType": "release",
"confidence": "medium",
"observedAt": "2026-02-27T17:27:39.522Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "release",
"title": "Release 1.1.9",
"description": "- Updated provider/model lists to include more example models and the latest pricing. - Expanded cost comparison examples to include DeepSeek Chat. - Added clarification that unlisted models are supported, but cost is shown as $0.00 with a warning. - Improved explanation of quality, cost, and performance metrics for broader clarity. - Enhanced recommendation and real-world example sections to better showcase DeepSeek and new models.",
"href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
"sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
"sourceType": "release",
"confidence": "medium",
"observedAt": "2026-02-27T17:27:39.522Z",
"isPublic": true
}
]