Agent Dossier · CLAWHUB · Safety 84/100

Xpersona Agent

Prompt Performance Tester - UnisAI

Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations.

OpenClaw · self-declared
1.9K downloads · Trust evidence available
clawhub skill install kn77yjs5esft2kgsd6dpz9c92n80dgsy:prompt-performance-tester

Overall rank

#62

Adoption

1.9K downloads

Trust

Unknown

Freshness

Last checked Feb 28, 2026

Best For

Prompt Performance Tester - UnisAI is best for general automation workflows where OpenClaw compatibility matters.

Not Ideal For

Deterministic execution workflows: contract metadata is missing or unavailable, so request/response behavior cannot be verified in advance.

Evidence Sources Checked

CLAWHUB, runtime-metrics, public facts pack

Overview

Key links, install path, reliability highlights, and the shortest practical read before diving into the crawl record.

Self-declared · CLAWHUB

Executive Summary

Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations. Capability contract not published. No trust telemetry is available yet. 1.9K downloads reported by the source. Last updated Apr 15, 2026.

No verified compatibility signals · 1.9K downloads

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Feb 28, 2026

Vendor

Clawhub

Artifacts

0

Benchmarks

0

Last release

1.1.9

Install & run

Setup Snapshot

clawhub skill install kn77yjs5esft2kgsd6dpz9c92n80dgsy:prompt-performance-tester
  1. Install using `clawhub skill install kn77yjs5esft2kgsd6dpz9c92n80dgsy:prompt-performance-tester` in an isolated environment before connecting it to live workloads.

  2. No published capability contract is available yet, so validate auth and request/response behavior manually.

  3. Review the upstream CLAWHUB listing at https://clawhub.ai/vedantsingh60/prompt-performance-tester before using production credentials.

Evidence & Timeline

Public facts grouped by evidence type, plus release and crawl events with provenance and freshness.

Self-declared · CLAWHUB

Public facts

Evidence Ledger

Vendor (1)

Vendor

Clawhub

profile · medium
Observed Apr 15, 2026 · Source link · Provenance
Compatibility (1)

Protocol compatibility

OpenClaw

contract · medium
Observed Apr 15, 2026 · Source link · Provenance
Release (1)

Latest release

1.1.9

release · medium
Observed Feb 27, 2026 · Source link · Provenance
Adoption (1)

Adoption signal

1.9K downloads

profile · medium
Observed Apr 15, 2026 · Source link · Provenance
Security (1)

Handshake status

UNKNOWN

trust · medium
Observed: unknown · Source link · Provenance

Artifacts & Docs

Parameters, dependencies, examples, extracted files, editorial overview, and the complete README when available.

Self-declared · CLAWHUB

Captured outputs

Artifacts Archive

Extracted files

4

Examples

6

Snippets

0

Languages

Unknown

Executable Examples

text

PROMPT: "Write a professional customer service response about a delayed shipment"

┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE              │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  523ms                                                 │
│ Cost:     $0.000025                                             │
│ Quality:  65/100                                                │
│ Tokens:   28 in / 87 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK                        │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  710ms                                                 │
│ Cost:     $0.000048                                             │
│ Quality:  70/100                                                │
│ Tokens:   28 in / 92 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER             │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  891ms                                                 │
│ Cost:     $0.000145                                             │
│ Quality:  78/100                                                │
│ Tokens:   28 in / 102 out                                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY                          │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  645ms                                                 │
│ Cost:     $0.000402                                             │
│ Quality:  88/100                                                │
│ Tokens:   28 in / 98 out                                        │
└─────────────────────────────────────────────────────────────────┘

bash

# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."

# Google (Gemini models)
export GOOGLE_API_KEY="AI..."

# DeepSeek
export DEEPSEEK_API_KEY="..."

# xAI (Grok models)
export XAI_API_KEY="..."

# MiniMax
export MINIMAX_API_KEY="..."

# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."

# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."

# Mistral
export MISTRAL_API_KEY="..."
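
Only keys for the providers you plan to test are needed. A quick way to see which providers are configured is a sketch like the following (env-var names are taken from the list above; the helper itself is illustrative, not part of the skill):

```python
import os

# Provider -> required environment variable, per the key list above
REQUIRED_KEYS = {
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Google": "GOOGLE_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
    "xAI": "XAI_API_KEY",
    "MiniMax": "MINIMAX_API_KEY",
    "Alibaba (Qwen)": "DASHSCOPE_API_KEY",
    "OpenRouter (Llama)": "OPENROUTER_API_KEY",
    "Mistral": "MISTRAL_API_KEY",
}

def configured_providers(env=os.environ):
    """Return the providers whose API key is present and non-empty."""
    return sorted(p for p, var in REQUIRED_KEYS.items() if env.get(var))

if __name__ == "__main__":
    ready = configured_providers()
    if not ready:
        print("No provider keys set; at least one is required.")
    else:
        print("Configured providers:", ", ".join(ready))
```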

bash

# Install only what you need
pip install anthropic          # Claude
pip install openai             # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai  # Gemini
pip install mistralai          # Mistral

# Or install everything
pip install anthropic openai google-generativeai mistralai
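
The single `openai` package covers several providers because they expose OpenAI-compatible endpoints. A sketch of the pattern is below; the base URLs are assumptions to verify against each provider's current documentation, not values published by this skill:

```python
# Hypothetical mapping of OpenAI-compatible providers to (base_url, key env var).
# Verify each base URL against the provider's documentation before use.
OPENAI_COMPAT = {
    "deepseek": ("https://api.deepseek.com", "DEEPSEEK_API_KEY"),
    "xai": ("https://api.x.ai/v1", "XAI_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
}

def client_config(provider):
    """Return (base_url, api_key_env_var) for an OpenAI-compatible provider."""
    try:
        return OPENAI_COMPAT[provider]
    except KeyError:
        raise ValueError(f"No OpenAI-compatible config for {provider!r}")

# Usage with the openai SDK (not executed here):
#   import os
#   from openai import OpenAI
#   base_url, key_var = client_config("deepseek")
#   client = OpenAI(api_key=os.environ[key_var], base_url=base_url)
```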

python

import os
from prompt_performance_tester import PromptPerformanceTester

tester = PromptPerformanceTester()  # reads API keys from environment

results = tester.test_prompt(
    prompt_text="Write a professional email apologizing for a delayed shipment",
    models=[
        "claude-haiku-4-5-20251001",
        "gpt-5.2",
        "gemini-2.5-flash",
        "deepseek-chat",
    ],
    num_runs=3,
    max_tokens=500
)

print(tester.format_results(results))
print(f"🏆 Best quality:  {results.best_model}")
print(f"💰 Cheapest:      {results.cheapest_model}")
print(f"⚡ Fastest:       {results.fastest_model}")

bash

# Test across multiple models
prompt-tester test "Your prompt here" \
  --models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
  --runs 3

# Export results
prompt-tester test "Your prompt here" --export results.json

Extracted Files

SKILL.md

# Prompt Performance Tester

**Model-agnostic prompt benchmarking across 9 providers.**

Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.

---

## 🚀 Why This Skill?

### Problem Statement
Comparing LLM models across providers requires manual testing:
- No systematic way to measure performance across models
- Cost differences are significant but not easily comparable
- Quality varies by use case and provider
- Manual API testing is time-consuming and error-prone

### The Solution
Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.

### Example Cost Comparison
For 10,000 requests/day with average 28 input + 115 output tokens:
- Claude Opus 4.6: ~$30.15/day ($903/month)
- Gemini 2.5 Flash-Lite: ~$0.05/day ($1.50/month)
- DeepSeek Chat: ~$0.14/day ($4.20/month)
- Monthly cost difference (Opus vs Flash-Lite): $901.50
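
As a sanity check on figures like these, per-request cost follows directly from the per-1M-token prices. A minimal sketch (prices copied from the pricing table below; the headline daily figures above may rest on different pricing snapshots or request-volume assumptions):

```python
# Per-1M-token prices (USD input, output), from the pricing table in this document
PRICES = {
    "claude-opus-4-6": (15.00, 75.00),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "deepseek-chat": (0.27, 1.10),
}

def request_cost(model, tokens_in, tokens_out):
    """Cost of one request: tokens / 1M * per-1M price, input plus output."""
    p_in, p_out = PRICES[model]
    return tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out

def monthly_cost(model, requests_per_day, tokens_in=28, tokens_out=115, days=30):
    """Scale the per-request cost to a monthly total."""
    return request_cost(model, tokens_in, tokens_out) * requests_per_day * days

if __name__ == "__main__":
    for m in PRICES:
        print(f"{m}: ${monthly_cost(m, 10_000):,.2f}/month")
```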

---

## ✨ What You Get

### Model-Agnostic Multi-Provider Testing
Pass any model ID — provider is auto-detected from the model name prefix.
No hardcoded list; new models work without code changes.

| Provider | Example Models | Prefix | Required Key |
|----------|---------------|--------|--------------|
| **Anthropic** | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | `claude-` | ANTHROPIC_API_KEY |
| **OpenAI** | gpt-5.2-pro, gpt-5.2, gpt-5.1 | `gpt-`, `o1`, `o3` | OPENAI_API_KEY |
| **Google** | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | `gemini-` | GOOGLE_API_KEY |
| **Mistral** | mistral-large-latest, mistral-small-latest | `mistral-`, `mixtral-` | MISTRAL_API_KEY |
| **DeepSeek** | deepseek-chat, deepseek-reasoner | `deepseek-` | DEEPSEEK_API_KEY |
| **xAI** | grok-4-1-fast, grok-3-beta | `grok-` | XAI_API_KEY |
| **MiniMax** | MiniMax-M2.1 | `MiniMax`, `minimax` | MINIMAX_API_KEY |
| **Qwen** | qwen3.5-plus, qwen3-max-instruct | `qwen` | DASHSCOPE_API_KEY |
| **Meta Llama** | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | `meta-llama/`, `llama-` | OPENROUTER_API_KEY |
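
The prefix-based auto-detection described above can be sketched as an ordered prefix match over this table (illustrative, not the skill's actual implementation):

```python
# Prefix -> provider, from the table above. Checked in order so that
# "meta-llama/" wins over the more general "llama-".
PREFIX_PROVIDERS = [
    ("meta-llama/", "openrouter"),
    ("claude-", "anthropic"),
    ("gemini-", "google"),
    ("mistral-", "mistral"),
    ("mixtral-", "mistral"),
    ("deepseek-", "deepseek"),
    ("grok-", "xai"),
    ("minimax", "minimax"),
    ("qwen", "qwen"),
    ("llama-", "openrouter"),
    ("gpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
]

def detect_provider(model_id):
    """Map a model ID to its provider via case-insensitive prefix match."""
    lowered = model_id.lower()
    for prefix, provider in PREFIX_PROVIDERS:
        if lowered.startswith(prefix):
            return provider
    raise ValueError(f"Unknown provider for model {model_id!r}")
```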

### Known Pricing (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.1 | $2.00 | $8.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| mistral-large-latest | $2.00 | $6.00 |
| mistral-small-latest | $0.10 | $0.30 |
| deepseek-chat | $0.27 | $1.10 |
| deepseek-reasoner | $0.55 | $2.19 |
| grok-4-1-fast | $5.00 | $25.00 |
| grok-3-beta | $3.00 | $15.00 |
| MiniMax-M2.1 | $0.40 | $1.60 |
| qwen3.5-plus | $0.57 | $2.29 |
| qwen3-max-instruct | $1.60 | $6.40 |
| meta-llama/llama-4-maverick | $0.20 | $0.60 |
| meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |

_meta.json

{
  "ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy",
  "slug": "prompt-performance-tester",
  "version": "1.1.9",
  "publishedAt": 1772213259522
}

LICENSE.md

# UniAI Skills - Proprietary License

**Version 1.0 | Effective Date: February 2, 2024**

## 1. GRANT OF LICENSE

UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the Clawhub Skills ("Software") solely in accordance with the terms of this license agreement.

## 2. LICENSE RESTRICTIONS

You may NOT:
- Reverse engineer, decompile, or disassemble the Software
- Modify, alter, or create derivative works of the Software
- Remove, obscure, or alter any proprietary notices or labels on the Software
- Share, distribute, or sublicense the Software to any third party
- Use the Software for commercial purposes without a commercial license
- Access or use the Software beyond the scope of your subscription tier
- Attempt to circumvent licensing controls or API rate limits

## 3. INTELLECTUAL PROPERTY RIGHTS

All intellectual property rights in and to the Software are retained by Licensor. This includes:
- Source code and object code
- Algorithms and methodologies
- Performance optimization techniques
- Quality scoring mechanisms
- Proprietary data structures
- Trade secrets and confidential information

## 4. PERMITTED USES

You may only:
- Use the Software as provided through the Clawhub platform
- Access features available in your subscription tier
- Create test results and reports for internal use
- Share results with your team (if on a team plan)
- Provide feedback to improve the Software

## 5. SUBSCRIPTION TIERS

### Starter (Free)
- 5 tests per month
- 2 models per test
- Basic features
- Personal use only

### Professional ($29/month)
- Unlimited tests
- All models supported
- Advanced analytics
- API access
- Commercial use permitted

### Enterprise ($99/month)
- Team collaboration
- White-label option
- Custom integrations
- Dedicated support
- SLA guarantees

## 6. API KEY AND CREDENTIALS

- You are responsible for keeping your API keys confidential
- Do not share your license key with others
- One license per person/organization
- License keys are non-transferable
- Unauthorized sharing may result in account termination

## 7. DATA PRIVACY

- We do not retain your test data by default
- Free tier: 30-day retention
- Paid tiers: 90-day retention
- You can request data deletion anytime
- See Privacy Policy for full details

## 8. WARRANTY DISCLAIMER

THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:
- Merchantability
- Fitness for a particular purpose
- Non-infringement
- Accuracy of results

## 9. LIMITATION OF LIABILITY

IN NO EVENT SHALL LICENSOR BE LIABLE FOR:
- Any indirect, incidental, special, or consequential damages
- Loss of data, revenue, or profits
- Business interruption
- Even if advised of the possibility of such damages

## 10. TERMINATION

Licensor may terminate your license if you:
- Violate any terms of this agreement
- Fail to pay subscription fees
- Attempt to reverse engineer the Software

manifest.yaml

name: "Prompt Performance Tester"
id: "prompt-performance-tester"
version: "1.1.8"
description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."

homepage: "https://unisai.vercel.app"
repository: "https://github.com/vedantsingh60/prompt-performance-tester"
source: "included"

intellectual_property:
  license: "free-to-use"
  license_file: "LICENSE.md"
  copyright: "© 2026 UnisAI. All rights reserved."
  distribution: "via-clawhub-only"
  source_code_access: "included"
  modification: "personal-use-only"
  reverse_engineering: "allowed-for-security-audit"

author:
  company: "UnisAI"
  contact: "hello@unisai.vercel.app"
  website: "https://unisai.vercel.app"

category: "ai-testing"
tags:
  - "prompt-testing"
  - "performance-analysis"
  - "cost-optimization"
  - "multi-llm"
  - "quality-assurance"
  - "benchmarking"
  - "llm-comparison"
  - "ai-testing"

pricing:
  model: "free"

runtime: "local"
execution: "python"

required_env_vars:
  - "ANTHROPIC_API_KEY"   # Required if testing Claude models
  - "OPENAI_API_KEY"      # Required if testing GPT models
  - "GOOGLE_API_KEY"      # Required if testing Gemini models
  - "MISTRAL_API_KEY"     # Required if testing Mistral models
  - "DEEPSEEK_API_KEY"    # Required if testing DeepSeek models
  - "XAI_API_KEY"         # Required if testing Grok/xAI models
  - "MINIMAX_API_KEY"     # Required if testing MiniMax models
  - "DASHSCOPE_API_KEY"   # Required if testing Qwen/Alibaba models
  - "OPENROUTER_API_KEY"  # Required if testing Llama/OpenRouter models
primary_credential: "At least one provider API key is required; set the key for each provider you plan to test"

dependencies:
  python: ">=3.9"
  packages:
    - "anthropic>=0.40.0"
    - "openai>=1.60.0"
    - "google-generativeai>=0.8.0"
    - "mistralai>=1.3.0"
  install_all: "pip install anthropic openai google-generativeai mistralai"
  install_selective: |
    pip install anthropic          # Claude
    pip install openai             # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat)
    pip install google-generativeai  # Gemini
    pip install mistralai          # Mistral
  note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL."
  requirements_file: "requirements.txt"

security:
  data_retention: "0 days"
  data_flow: "prompts-sent-to-chosen-ai-providers"
  third_party_data_sharing: |
    WARNING: This skill sends your prompts to whichever AI providers you select for testing.
    Each provider has their own data retention and privacy policies:
    - Anthropic: https://www.anthropic.com/legal/privacy
    - OpenAI: https://openai.com/policies/privacy-policy
    - Google: https://ai.google.dev/gemini-api/terms
    - Mistral: https://mistral.ai/terms/
    - DeepSeek: https://www.deepseek

Editorial read

Docs & README

Docs source

CLAWHUB

Editorial quality

thin


Full README

Skill: Prompt Performance Tester - UnisAI

Owner: vedantsingh60

Summary: Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations.

Tags: Latest:1.1.4, ai-testing:1.0.1, ai-testing multi-provider prompt-optimization cost-analysis llm-benchmarking claude gpt gemini performance-testing api-comparison multi-model:1.1.2, claude-api:1.0.1, cost-analysis:1.0.1, latest:1.1.9, llm-benchmarking:1.0.1, openai-api:1.0.1, prompt-optimization:1.0.1

Version history:

v1.1.9 | 2026-02-27T17:27:39.522Z | user

  • Updated provider/model lists to include more example models and the latest pricing.
  • Expanded cost comparison examples to include DeepSeek Chat.
  • Added clarification that unlisted models are supported, but cost is shown as $0.00 with a warning.
  • Improved explanation of quality, cost, and performance metrics for broader clarity.
  • Enhanced recommendation and real-world example sections to better showcase DeepSeek and new models.

v1.1.8 | 2026-02-27T17:27:27.786Z | user

  • Updated provider/model lists to include more example models and the latest pricing.
  • Expanded cost comparison examples to include DeepSeek Chat.
  • Added clarification that unlisted models are supported, but cost is shown as $0.00 with a warning.
  • Improved explanation of quality, cost, and performance metrics for broader clarity.
  • Enhanced recommendation and real-world example sections to better showcase DeepSeek and new models.

v1.1.7 | 2026-02-27T16:50:17.543Z | user

Expanded to support model-agnostic benchmarking across 9 major LLM providers.

  • Now supports any model ID with automatic provider detection based on name prefix—no hardcoded model list.
  • Added support for Anthropic, OpenAI, Google, Mistral, DeepSeek, xAI (Grok), MiniMax, Qwen, and Meta Llama via OpenRouter.
  • Documentation updated with supported model prefixes, API keys, and latest known per-token pricing for each provider.
  • All previous features (latency, cost, quality, consistency, recommendations) remain and apply to all supported models.

v1.1.6 | 2026-02-27T16:49:57.000Z | user

Expanded to support model-agnostic benchmarking across 9 major LLM providers.

  • Now supports any model ID with automatic provider detection based on name prefix—no hardcoded model list.
  • Added support for Anthropic, OpenAI, Google, Mistral, DeepSeek, xAI (Grok), MiniMax, Qwen, and Meta Llama via OpenRouter.
  • Documentation updated with supported model prefixes, API keys, and latest known per-token pricing for each provider.
  • All previous features (latency, cost, quality, consistency, recommendations) remain and apply to all supported models.
  • No code changes in this release; update is documentation-only.

v1.1.5 | 2026-02-16T20:07:53.120Z | user

  • Documentation reformatted with minor cleanups for clarity and consistency.
  • No functional or code changes in this release.
  • All prompts, features, and instructions remain unchanged.

v1.1.4 | 2026-02-02T03:45:04.686Z | user

1.1.4 is a documentation cleanup and simplification release.

  • SKILL.md significantly condensed for clarity and brevity.
  • Streamlined value proposition and example cost calculations.
  • Updated real-world model comparison table to use more recent models and prices.
  • Removed marketing language and focus on data-driven cost/quality analysis.
  • Reorganized use case and getting started sections for easier navigation.
  • No code or logic changes; documentation only.

v1.1.3 | 2026-02-02T03:15:28.352Z | user

  • Expanded support to 10 leading AI models, including latest Claude 4.5, GPT-5.2, and Gemini 3 series.
  • Updated model selection and pricing details to reflect 2026 releases and current rates.
  • Quick Start and plan descriptions now accommodate 10 models per test.
  • Enhanced marketing copy with up-to-date benchmarks and role-based benefits.
  • Real-world example and recommendations remain for user clarity.

v1.1.2 | 2026-02-02T03:03:35.305Z | user

Version 1.1.2

  • No code changes detected; documentation updated only.
  • SKILL.md rewritten for clarity and to emphasize product benefits.
  • Feature descriptions, real-world examples, and pricing explanations improved.
  • Enhanced summary of supported models and use cases.
  • Quick Start, API setup, and output formatting instructions are clearer and more actionable.

v1.1.1 | 2026-02-02T02:55:31.272Z | user

  • Major upgrade: Now supports prompt testing across Claude, OpenAI GPT, and Google Gemini, not just Claude.
  • Added cross-provider cost and quality comparison for 9 different LLM models.
  • New reporting shows latency, cost, and quality side-by-side for Anthropic, OpenAI, and Google models.
  • Recommendations now include fastest, cheapest, and best quality models across all providers.
  • Expanded use cases and provider-specific examples in documentation.

v1.1.0 | 2026-02-02T02:52:52.200Z | user

Version 1.1.0:

  • ✨ Multi-provider support: Claude, GPT, and Gemini
  • ✨ 9 LLM models supported across 3 providers
  • ✨ Cross-provider cost comparison engine
  • ✨ Provider-specific API optimizations
  • ✨ Enhanced recommendations with multi-provider insights
  • ✨ Rebranded from Prompt Migrator to UniAI
  • 🏷️ Updated tags for better discoverability (14 tags)
  • 📊 Improved cost calculation accuracy
  • 🔧 Added OpenAI and Google API integrations
  • 📝 Updated documentation with multi-provider examples

v1.0.1 | 2026-02-02T02:33:38.204Z | user

Version 1.0.1

  • Removed the file IP_PROTECTION_GUIDE.md.
  • Updated documentation links and support contact information to use unisai.vercel.app addresses.

v1.0.0 | 2026-02-02T02:22:51.962Z | user

Initial release - Multi-model prompt testing across OpenAI, Claude Haiku, Sonnet, and Opus with latency, cost, and quality metrics

Archive index:

Archive v1.1.9: 5 files, 17508 bytes

Files: LICENSE.md (4799b), manifest.yaml (7490b), prompt_performance_tester.py (22862b), SKILL.md (17974b), _meta.json (144b)

File v1.1.9:SKILL.md

Prompt Performance Tester

Model-agnostic prompt benchmarking across 9 providers.

Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.


🚀 Why This Skill?

Problem Statement

Comparing LLM models across providers requires manual testing:

  • No systematic way to measure performance across models
  • Cost differences are significant but not easily comparable
  • Quality varies by use case and provider
  • Manual API testing is time-consuming and error-prone

The Solution

Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.

Example Cost Comparison

For 10,000 requests/day with average 28 input + 115 output tokens:

  • Claude Opus 4.6: ~$30.15/day ($903/month)
  • Gemini 2.5 Flash-Lite: ~$0.05/day ($1.50/month)
  • DeepSeek Chat: ~$0.14/day ($4.20/month)
  • Monthly cost difference (Opus vs Flash-Lite): $901.50

✨ What You Get

Model-Agnostic Multi-Provider Testing

Pass any model ID — provider is auto-detected from the model name prefix. No hardcoded list; new models work without code changes.

| Provider | Example Models | Prefix | Required Key |
|----------|---------------|--------|--------------|
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | claude- | ANTHROPIC_API_KEY |
| OpenAI | gpt-5.2-pro, gpt-5.2, gpt-5.1 | gpt-, o1, o3 | OPENAI_API_KEY |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | gemini- | GOOGLE_API_KEY |
| Mistral | mistral-large-latest, mistral-small-latest | mistral-, mixtral- | MISTRAL_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | deepseek- | DEEPSEEK_API_KEY |
| xAI | grok-4-1-fast, grok-3-beta | grok- | XAI_API_KEY |
| MiniMax | MiniMax-M2.1 | MiniMax, minimax | MINIMAX_API_KEY |
| Qwen | qwen3.5-plus, qwen3-max-instruct | qwen | DASHSCOPE_API_KEY |
| Meta Llama | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | meta-llama/, llama- | OPENROUTER_API_KEY |

Known Pricing (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.1 | $2.00 | $8.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| mistral-large-latest | $2.00 | $6.00 |
| mistral-small-latest | $0.10 | $0.30 |
| deepseek-chat | $0.27 | $1.10 |
| deepseek-reasoner | $0.55 | $2.19 |
| grok-4-1-fast | $5.00 | $25.00 |
| grok-3-beta | $3.00 | $15.00 |
| MiniMax-M2.1 | $0.40 | $1.60 |
| qwen3.5-plus | $0.57 | $2.29 |
| qwen3-max-instruct | $1.60 | $6.40 |
| meta-llama/llama-4-maverick | $0.20 | $0.60 |
| meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |

Note: Unlisted models still work — cost calculation returns $0.00 with a warning. Pricing table is for reference only, not a validation gate.
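
The $0.00-with-a-warning fallback for unlisted models might look like this (a sketch; `KNOWN_PRICES` and the warning text are illustrative, not the skill's actual code):

```python
import warnings

# Illustrative subset of the pricing table (USD per 1M tokens: input, output)
KNOWN_PRICES = {
    "deepseek-chat": (0.27, 1.10),
    "gemini-2.5-flash": (0.30, 2.50),
}

def lookup_price(model_id):
    """Return (input, output) per-1M prices; $0.00 with a warning if unlisted."""
    if model_id in KNOWN_PRICES:
        return KNOWN_PRICES[model_id]
    warnings.warn(f"No pricing for {model_id!r}; cost will be reported as $0.00")
    return (0.0, 0.0)
```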

Performance Metrics

Every test measures:

  • Latency — Response time in milliseconds
  • 💰 Cost — Exact API cost per request (input + output tokens)
  • 🎯 Quality — Response quality score (0–100)
  • 📊 Token Usage — Input and output token counts
  • 🔄 Consistency — Variance across multiple test runs
  • Error Tracking — API failures, timeouts, rate limits
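
The consistency metric (variance across multiple runs) can be sketched as a coefficient-of-variation score over per-run latencies. The formula below is illustrative; the skill's actual scoring method is not published:

```python
from statistics import mean, stdev

def consistency_score(latencies_ms):
    """0-100 score: 100 means identical runs, lower means more variance.

    Penalizes by the coefficient of variation (stdev / mean).
    Illustrative formula, not the skill's published scoring method.
    """
    if len(latencies_ms) < 2:
        return 100.0  # a single run has no measurable variance
    cv = stdev(latencies_ms) / mean(latencies_ms)
    return max(0.0, 100.0 * (1.0 - cv))
```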

Smart Recommendations

Get instant answers to:

  • Which model is fastest for your prompt?
  • Which is most cost-effective?
  • Which produces best quality responses?
  • How much can you save by switching providers?

📊 Real-World Example

PROMPT: "Write a professional customer service response about a delayed shipment"

┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE              │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  523ms                                                 │
│ Cost:     $0.000025                                             │
│ Quality:  65/100                                                │
│ Tokens:   28 in / 87 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK                        │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  710ms                                                 │
│ Cost:     $0.000048                                             │
│ Quality:  70/100                                                │
│ Tokens:   28 in / 92 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER             │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  891ms                                                 │
│ Cost:     $0.000145                                             │
│ Quality:  78/100                                                │
│ Tokens:   28 in / 102 out                                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY                          │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  645ms                                                 │
│ Cost:     $0.000402                                             │
│ Quality:  88/100                                                │
│ Tokens:   28 in / 98 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE OPUS 4.6 (Anthropic) 🏆 HIGHEST QUALITY                 │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  1,234ms                                               │
│ Cost:     $0.001875                                             │
│ Quality:  94/100                                                │
│ Tokens:   28 in / 125 out                                       │
└─────────────────────────────────────────────────────────────────┘

🎯 RECOMMENDATIONS:
1. Most cost-effective: Gemini 2.5 Flash-Lite ($0.000025/request) — 99.98% cheaper than Opus
2. Budget pick: DeepSeek Chat ($0.000048/request) — strong quality at low cost
3. Best quality: Claude Opus 4.6 (94/100) — state-of-the-art reasoning & analysis
4. Smart pick: Claude Haiku 4.5 ($0.000145/request) — 81% cheaper, 83% quality match
5. Speed + Quality: GPT-5.2 ($0.000402/request) — excellent quality at mid-range cost

💡 Potential monthly savings (10,000 requests/day, 28 input + 115 output tokens avg):
   - Using Gemini 2.5 Flash-Lite vs Opus: $903/month saved ($1.44 vs $904.50)
   - Using DeepSeek Chat vs Opus: $899/month saved ($4.50 vs $904.50)
   - Using Claude Haiku vs Opus: $731/month saved ($173.40 vs $904.50)

Use Cases

Production Deployment

  • Evaluate models before production selection
  • Compare cost vs quality tradeoffs
  • Benchmark API latency across providers

Prompt Development

  • Test prompt variations across models
  • Measure quality scores consistently
  • Compare performance metrics

Cost Analysis

  • Analyze LLM API spending by model
  • Compare provider pricing structures
  • Identify cost-efficient alternatives

Performance Testing

  • Measure latency and response times
  • Test consistency across multiple runs
  • Evaluate quality scores

🚀 Quick Start

1. Subscribe to Skill

Click "Subscribe" on ClawHub to get access.

2. Set API Keys

Add keys for the providers you want to test:

# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."

# Google (Gemini models)
export GOOGLE_API_KEY="AI..."

# DeepSeek
export DEEPSEEK_API_KEY="..."

# xAI (Grok models)
export XAI_API_KEY="..."

# MiniMax
export MINIMAX_API_KEY="..."

# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."

# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."

# Mistral
export MISTRAL_API_KEY="..."

You only need keys for the providers you plan to test.
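Before running a test, it can help to confirm that the keys for your chosen models are actually exported. This is a convenience sketch, not part of the skill; the prefix-to-key mapping below is abbreviated to four of the nine providers for illustration.

```python
import os

# Hypothetical pre-flight check: which provider keys are missing for the
# models you plan to test? (Mapping abbreviated for illustration.)
KEY_FOR_PREFIX = {
    "claude-": "ANTHROPIC_API_KEY",
    "gpt-": "OPENAI_API_KEY",
    "gemini-": "GOOGLE_API_KEY",
    "deepseek-": "DEEPSEEK_API_KEY",
}

def missing_keys(models):
    """Return the env var names that are needed but not set."""
    needed = {key for prefix, key in KEY_FOR_PREFIX.items()
              if any(m.startswith(prefix) for m in models)}
    return sorted(key for key in needed if not os.environ.get(key))

print(missing_keys(["claude-haiku-4-5-20251001", "gpt-5.2"]))
```

If this prints a non-empty list, export the listed keys before invoking the tester.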

3. Install Dependencies

# Install only what you need
pip install anthropic          # Claude
pip install openai             # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai  # Gemini
pip install mistralai          # Mistral

# Or install everything
pip install anthropic openai google-generativeai mistralai

4. Run Your First Test

Option A: Python

import os
from prompt_performance_tester import PromptPerformanceTester

tester = PromptPerformanceTester()  # reads API keys from environment

results = tester.test_prompt(
    prompt_text="Write a professional email apologizing for a delayed shipment",
    models=[
        "claude-haiku-4-5-20251001",
        "gpt-5.2",
        "gemini-2.5-flash",
        "deepseek-chat",
    ],
    num_runs=3,
    max_tokens=500
)

print(tester.format_results(results))
print(f"🏆 Best quality:  {results.best_model}")
print(f"💰 Cheapest:      {results.cheapest_model}")
print(f"⚡ Fastest:       {results.fastest_model}")

Option B: CLI

# Test across multiple models
prompt-tester test "Your prompt here" \
  --models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
  --runs 3

# Export results
prompt-tester test "Your prompt here" --export results.json

🔒 Security & Privacy

API Key Safety

  • Keys stored in environment variables only — never hardcoded or logged
  • Never transmitted to UnisAI servers
  • HTTPS encryption for all provider API calls

Data Privacy

  • Your prompts are sent only to the AI providers you select for testing
  • Each provider has their own data retention policy (see their privacy pages)
  • No data stored on UnisAI infrastructure

📚 Technical Details

System Requirements

  • Python: 3.9+
  • Dependencies: anthropic, openai, google-generativeai, mistralai (install only what you need)
  • Platform: macOS, Linux, Windows

Architecture

  • Lazy client initialization — SDK clients only loaded for providers actually tested
  • Prefix-based routing — PROVIDER_MAP detects the provider from the model name prefix; no hardcoded whitelist
  • OpenAI-compat path — DeepSeek, xAI, MiniMax, Qwen, and OpenRouter all use the openai SDK with a custom base_url
  • Pricing table — used for cost calculation only; unknown models get cost=0 with a warning
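The prefix-routing idea can be sketched as follows. This is a hypothetical reconstruction of the behavior described above; the skill's actual PROVIDER_MAP may differ in detail.

```python
# Hypothetical sketch of prefix-based provider routing; the skill's real
# PROVIDER_MAP may differ.
PROVIDER_MAP = {
    "claude-": "anthropic",
    "gpt-": "openai", "o1": "openai", "o3": "openai",
    "gemini-": "google",
    "mistral-": "mistral", "mixtral-": "mistral",
    "deepseek-": "deepseek",
    "grok-": "xai",
    "minimax": "minimax",
    "qwen": "alibaba",
    "meta-llama/": "openrouter", "llama-": "openrouter",
}

def detect_provider(model_id: str) -> str:
    lowered = model_id.lower()
    # Try longer prefixes first so "meta-llama/..." never falls through
    # to the shorter "llama-" entry.
    for prefix in sorted(PROVIDER_MAP, key=len, reverse=True):
        if lowered.startswith(prefix):
            return PROVIDER_MAP[prefix]
    raise ValueError(f"No supported provider prefix matches {model_id!r}")

print(detect_provider("claude-sonnet-4-6"))            # anthropic
print(detect_provider("meta-llama/llama-4-maverick"))  # openrouter
```

Because any matching prefix routes the request, a brand-new model ID works the day it ships, with no code change.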

Metrics Collected

Every test captures:

  • Latency: Total response time (ms)
  • Cost: Input + output cost based on known pricing (USD)
  • Quality: Heuristic response score based on length, completeness (0–100)
  • Tokens: Exact input/output token counts per provider
  • Consistency: Standard deviation across multiple runs
  • Errors: Timeouts, rate limits, API failures
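The cost and consistency definitions above can be checked by hand. A minimal sketch, using prices from the skill's published table (function names are illustrative):

```python
import statistics

def request_cost(in_tokens, out_tokens, in_price, out_price):
    # Prices are quoted per 1M tokens, so scale accordingly.
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# 28 input / 115 output tokens at Claude Haiku 4.5's listed $1/$5 per 1M:
print(f"${request_cost(28, 115, 1.00, 5.00):.6f}")  # $0.000603

# Consistency = standard deviation of per-run latencies (ms):
latencies_ms = [891, 874, 902]
print(f"{statistics.stdev(latencies_ms):.1f} ms")
```

Unknown models skip the first calculation and report $0.00, as noted in the Architecture section.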

❓ Frequently Asked Questions

Q: Do I need API keys for all 9 providers? A: No. You only need keys for the providers you want to test. If you only test Claude models, you only need ANTHROPIC_API_KEY.

Q: Who pays for the API costs? A: You do. You provide your own API keys and pay each provider directly. This skill has no per-request fees.

Q: How accurate are the cost calculations? A: Costs are calculated from the known pricing table using actual token counts. Models not in the pricing table still run — their cost is simply reported as $0.00 with a warning.

Q: Can I test models not in the pricing table? A: Yes. Any model whose name starts with a supported prefix will run. Cost will show as $0.00 for unlisted models.

Q: Can I test prompts in non-English languages? A: Yes. All supported providers handle multiple languages.

Q: Can I use this in production/CI/CD? A: Yes. Import PromptPerformanceTester directly from Python or call via CLI.

Q: What if my prompt is very long? A: Set max_tokens appropriately. The skill passes your prompt as-is to each provider's API.


🗺️ Roadmap

✅ Current Release (v1.1.8)

  • Model-agnostic architecture — any model ID works via prefix detection
  • 9 providers, 20 known models with pricing
  • DeepSeek, xAI Grok, MiniMax, Qwen, Meta Llama as first-class providers
  • Claude 4.6 series (opus-4-6, sonnet-4-6)
  • Lazy client initialization — only loads SDKs for providers actually used
  • Fixed UnisAI branding throughout

🚧 Coming Soon (v1.3)

  • Batch testing: Test 100+ prompts simultaneously
  • Historical tracking: Track model performance over time
  • Webhook integrations: Slack, Discord, email notifications

🔮 Future (v1.4+)

  • A/B testing framework: Scientific prompt experimentation
  • Fine-tuning insights: Which models to fine-tune for your use case
  • Custom benchmarks: Create your own evaluation criteria
  • Auto-optimization: AI-powered prompt improvement suggestions

📞 Support

  • Email: support@unisai.vercel.app
  • Website: https://unisai.vercel.app
  • Bug Reports: support@unisai.vercel.app

📄 License & Terms

This skill is distributed via ClawHub under the following terms.

✅ You CAN:

  • Use for your own business and projects
  • Test prompts for internal applications
  • Modify source code for personal use

❌ You CANNOT:

  • Redistribute outside the ClawHub registry
  • Resell or sublicense
  • Use UnisAI trademark without permission

Full Terms: See LICENSE.md


📝 Changelog

[1.1.8] - 2026-02-27

Fixes & Polish

  • Bumped version to 1.1.8
  • SKILL.md fully rewritten — cleaned up formatting, removed stale content
  • Removed old IP watermark reference (PROPRIETARY_SKILL_VEDANT_2024) from docs
  • Corrected watermark to PROPRIETARY_SKILL_UNISAI_2026_MULTI_PROVIDER throughout
  • Fixed all UnisAI branding (was UniAI in v1.1.0 changelog)
  • Updated pricing table to include all 20 known models
  • Cleaned up FAQ, Quick Start, and Use Cases sections

[1.1.6] - 2026-02-27

🏗️ Model-Agnostic Architecture

  • Provider auto-detected from model name prefix — no hardcoded whitelist
  • Any new model works automatically without code changes
  • Added DeepSeek, xAI Grok, MiniMax, Qwen, Meta Llama as first-class providers (9 total)
  • Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)
  • Lazy client initialization — only loads SDKs for providers actually tested
  • Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter

[1.1.5] - 2026-02-01

🚀 Latest Models Update

  • GPT-5.2 Series — Added Instant, Thinking, and Pro variants
  • Gemini 2.5 Series — Updated to 2.5 Pro, Flash, and Flash-Lite
  • Claude 4.5 pricing updates
  • 10 total models across 3 providers

[1.1.0] - 2026-01-15

✨ Major Features

  • Multi-provider support — Claude, GPT, Gemini
  • Cross-provider cost comparison
  • Enhanced recommendations engine
  • Rebranded to UnisAI

[1.0.0] - 2024-02-02

Initial Release

  • Claude-only prompt testing (Haiku, Sonnet, Opus)
  • Performance metrics: latency, cost, quality, consistency
  • Basic recommendations engine

Last Updated: February 27, 2026
Current Version: 1.1.8
Status: Active & Maintained

© 2026 UnisAI. All rights reserved.

File v1.1.9:_meta.json

{ "ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy", "slug": "prompt-performance-tester", "version": "1.1.9", "publishedAt": 1772213259522 }

File v1.1.9:LICENSE.md

UniAI Skills - Proprietary License

Version 1.0 | Effective Date: February 2, 2024

1. GRANT OF LICENSE

UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the ClawHub Skills ("Software") solely in accordance with the terms of this license agreement.

2. LICENSE RESTRICTIONS

You may NOT:

  • Reverse engineer, decompile, or disassemble the Software
  • Modify, alter, or create derivative works of the Software
  • Remove, obscure, or alter any proprietary notices or labels on the Software
  • Share, distribute, or sublicense the Software to any third party
  • Use the Software for commercial purposes without a commercial license
  • Access or use the Software beyond the scope of your subscription tier
  • Attempt to circumvent licensing controls or API rate limits

3. INTELLECTUAL PROPERTY RIGHTS

All intellectual property rights in and to the Software are retained by Licensor. This includes:

  • Source code and object code
  • Algorithms and methodologies
  • Performance optimization techniques
  • Quality scoring mechanisms
  • Proprietary data structures
  • Trade secrets and confidential information

4. PERMITTED USES

You may only:

  • Use the Software as provided through the ClawHub platform
  • Access features available in your subscription tier
  • Create test results and reports for internal use
  • Share results with your team (if on a team plan)
  • Provide feedback to improve the Software

5. SUBSCRIPTION TIERS

Starter (Free)

  • 5 tests per month
  • 2 models per test
  • Basic features
  • Personal use only

Professional ($29/month)

  • Unlimited tests
  • All models supported
  • Advanced analytics
  • API access
  • Commercial use permitted

Enterprise ($99/month)

  • Team collaboration
  • White-label option
  • Custom integrations
  • Dedicated support
  • SLA guarantees

6. API KEY AND CREDENTIALS

  • You are responsible for keeping your API keys confidential
  • Do not share your license key with others
  • One license per person/organization
  • License keys are non-transferable
  • Unauthorized sharing may result in account termination

7. DATA PRIVACY

  • We do not retain your test data by default
  • Free tier: 30-day retention
  • Paid tiers: 90-day retention
  • You can request data deletion anytime
  • See Privacy Policy for full details

8. WARRANTY DISCLAIMER

THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:

  • Merchantability
  • Fitness for a particular purpose
  • Non-infringement
  • Accuracy of results

9. LIMITATION OF LIABILITY

IN NO EVENT SHALL LICENSOR BE LIABLE FOR:

  • Any indirect, incidental, special, or consequential damages
  • Loss of data, revenue, or profits
  • Business interruption
  • Even if advised of the possibility of such damages

10. TERMINATION

Licensor may terminate your license if you:

  • Violate any terms of this agreement
  • Fail to pay subscription fees
  • Attempt to reverse engineer the Software
  • Share your license key with others
  • Use the Software unlawfully

Upon termination:

  • Your access to the Software is immediately revoked
  • You must destroy all copies of the Software in your possession
  • Any refunds are subject to our refund policy

11. COMMERCIAL LICENSE

To use the Software for commercial purposes:

  • Starter tier: Personal use only
  • Professional tier: Commercial use permitted
  • Enterprise tier: Team commercial use permitted

For commercial use with Starter tier, contact: hello@unisai.vercel.app

12. THIRD-PARTY SERVICES

The Software uses third-party services (e.g., Anthropic API). Your use is also subject to their terms of service:

  • Anthropic: https://www.anthropic.com/terms
  • OpenAI: https://openai.com/terms (if applicable)

13. MODIFICATIONS TO SOFTWARE

Licensor reserves the right to:

  • Update the Software at any time
  • Add or remove features
  • Change pricing (with 30 days notice)
  • Discontinue the Software (with 60 days notice)

14. COMPLIANCE

You agree to comply with all applicable laws and regulations in your jurisdiction when using the Software.

15. DISPUTE RESOLUTION

Any disputes arising from this agreement shall be:

  • Resolved through binding arbitration
  • Governed by California law
  • Conducted in English

16. ENTIRE AGREEMENT

This agreement, along with our Privacy Policy and Terms of Service, constitutes the entire agreement between you and Licensor regarding the Software.

17. CONTACT

For licensing inquiries or support:

  • Email: hello@unisai.vercel.app
  • Website: https://unisai.vercel.app
  • Support: vedxnts@gmail.com
  • X: vedxnts

By using the Software, you acknowledge that you have read, understood, and agree to be bound by this License Agreement.

© 2026 UniAI. All rights reserved.

File v1.1.9:manifest.yaml

name: "Prompt Performance Tester"
id: "prompt-performance-tester"
version: "1.1.8"
description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."

homepage: "https://unisai.vercel.app"
repository: "https://github.com/vedantsingh60/prompt-performance-tester"
source: "included"

intellectual_property:
  license: "free-to-use"
  license_file: "LICENSE.md"
  copyright: "© 2026 UnisAI. All rights reserved."
  distribution: "via-clawhub-only"
  source_code_access: "included"
  modification: "personal-use-only"
  reverse_engineering: "allowed-for-security-audit"

author:
  company: "UnisAI"
  contact: "hello@unisai.vercel.app"
  website: "https://unisai.vercel.app"

category: "ai-testing"
tags:

  • "prompt-testing"
  • "performance-analysis"
  • "cost-optimization"
  • "multi-llm"
  • "quality-assurance"
  • "benchmarking"
  • "llm-comparison"
  • "ai-testing"

pricing:
  model: "free"

runtime: "local"
execution: "python"

required_env_vars:

  • "ANTHROPIC_API_KEY" # Required if testing Claude models
  • "OPENAI_API_KEY" # Required if testing GPT models
  • "GOOGLE_API_KEY" # Required if testing Gemini models
  • "MISTRAL_API_KEY" # Required if testing Mistral models
  • "DEEPSEEK_API_KEY" # Required if testing DeepSeek models
  • "XAI_API_KEY" # Required if testing Grok/xAI models
  • "MINIMAX_API_KEY" # Required if testing MiniMax models
  • "DASHSCOPE_API_KEY" # Required if testing Qwen/Alibaba models
  • "OPENROUTER_API_KEY" # Required if testing Llama/OpenRouter models

primary_credential: "At least ONE provider API key is required per provider you want to test"

dependencies:
  python: ">=3.9"
  packages:
    - "anthropic>=0.40.0"
    - "openai>=1.60.0"
    - "google-generativeai>=0.8.0"
    - "mistralai>=1.3.0"
  install_all: "pip install anthropic openai google-generativeai mistralai"
  install_selective: |
    pip install anthropic            # Claude
    pip install openai               # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat)
    pip install google-generativeai  # Gemini
    pip install mistralai            # Mistral
  note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL."
  requirements_file: "requirements.txt"

security:
  data_retention: "0 days"
  data_flow: "prompts-sent-to-chosen-ai-providers"
  third_party_data_sharing: |
    WARNING: This skill sends your prompts to whichever AI providers you
    select for testing. Each provider has their own data retention and
    privacy policies:
    - Anthropic: https://www.anthropic.com/legal/privacy
    - OpenAI: https://openai.com/policies/privacy-policy
    - Google: https://ai.google.dev/gemini-api/terms
    - Mistral: https://mistral.ai/terms/
    - DeepSeek: https://www.deepseek.com/privacy_policy
    - xAI: https://x.ai/privacy
    - OpenRouter: https://openrouter.ai/privacy
  api_key_storage: "Environment variables only — never hardcoded or logged"
  network_access: "Required to call chosen AI provider APIs"

capabilities:
  functions:
    - name: "testPrompt"
      description: "Test a prompt across multiple LLM models and providers"
      parameters:
        prompt_text:
          type: "string"
          description: "The prompt to benchmark"
          required: true
        models:
          type: "array"
          description: "List of model IDs to test — any model matching a supported prefix works"
          items:
            type: "string"
          examples:
            - "claude-sonnet-4-6"
            - "gpt-5.2"
            - "deepseek-chat"
            - "grok-4-1-fast"
            - "gemini-2.5-flash"
          required: false
        num_runs:
          type: "number"
          description: "Number of runs per model for consistency testing"
          default: 1
          range: [1, 10]
        system_prompt:
          type: "string"
          description: "Optional system prompt"
        max_tokens:
          type: "number"
          description: "Maximum response tokens"
          default: 1000
          range: [100, 4000]

environment_variables:
  ANTHROPIC_API_KEY:
    description: "Anthropic API key — required for any claude-* model"
    required_for_prefix: "claude-"
  OPENAI_API_KEY:
    description: "OpenAI API key — required for any gpt-*, o1*, or o3* model"
    required_for_prefix: "gpt-, o1, o3"
  GOOGLE_API_KEY:
    description: "Google AI API key — required for any gemini-* model"
    required_for_prefix: "gemini-"
  MISTRAL_API_KEY:
    description: "Mistral API key — required for mistral-* or mixtral-* models"
    required_for_prefix: "mistral-, mixtral-"
  DEEPSEEK_API_KEY:
    description: "DeepSeek API key — required for any deepseek-* model"
    required_for_prefix: "deepseek-"
  XAI_API_KEY:
    description: "xAI API key — required for any grok-* model"
    required_for_prefix: "grok-"
  MINIMAX_API_KEY:
    description: "MiniMax API key — required for minimax* or MiniMax* models"
    required_for_prefix: "minimax, MiniMax"
  DASHSCOPE_API_KEY:
    description: "Alibaba DashScope API key — required for any qwen* model"
    required_for_prefix: "qwen"
  OPENROUTER_API_KEY:
    description: "OpenRouter API key — required for meta-llama/* or llama-* models"
    required_for_prefix: "meta-llama/, llama-"

support:
  support_email: "support@unisai.vercel.app"
  website: "https://unisai.vercel.app"
  github: "https://github.com/vedantsingh60/prompt-performance-tester"
  documentation: "See SKILL.md in this package"
  response_time: "Best effort — community supported"

restrictions:

  • "No redistribution outside ClawHub registry"
  • "No resale or sublicensing"
  • "No trademark usage without permission"
  • "Modifications allowed for personal use only"

changelog:
  "1.1.8":
    - "🏗️ Model-agnostic architecture — provider auto-detected from model name prefix, no hardcoded whitelist"
    - "✨ Added DeepSeek, xAI Grok, MiniMax, Qwen as first-class providers (9 total)"
    - "✨ Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)"
    - "✨ Any future model works automatically without code changes"
    - "🔧 Lazy client initialization — only loads SDKs for providers actually used"
    - "🔧 Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter"
    - "📝 Fixed UnisAI branding (was UniAI)"
    - "💰 Updated pricing table with 20 models across 9 providers"
  "1.1.5":
    - "🚀 Updated to latest 2026 models"
    - "✨ GPT-5.2 series (Instant, Thinking, Pro)"
    - "✨ Gemini 3 Pro and 2.5 series"
    - "✨ Claude 4.5 pricing updates"
    - "✨ 10 total models across 3 providers"
  "1.1.0":
    - "✨ Multi-provider support (Claude, GPT, Gemini)"
    - "✨ Cross-provider cost comparison"
    - "✨ Enhanced recommendations engine"
  "1.0.0":
    - "Initial release with Claude-only support"
    - "Performance metrics: latency, cost, quality, consistency"

metadata:
  status: "active"
  created_at: "2024-02-02T00:00:00Z"
  updated_at: "2026-02-27T00:00:00Z"
  maturity: "production"
  maintenance: "actively-maintained"
  compatibility:
    - "OpenClaw v1.0+"
    - "Claude Code"
    - "ClawHub v2.0+"
  security_audit: "Source code included for security review and transparency"

Archive v1.1.8: 5 files, 17509 bytes

Files: LICENSE.md (4799b), manifest.yaml (7490b), prompt_performance_tester.py (22862b), SKILL.md (17974b), _meta.json (144b)

File v1.1.8:SKILL.md

Prompt Performance Tester

Model-agnostic prompt benchmarking across 9 providers.

Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.


🚀 Why This Skill?

Problem Statement

Comparing LLM models across providers requires manual testing:

  • No systematic way to measure performance across models
  • Cost differences are significant but not easily comparable
  • Quality varies by use case and provider
  • Manual API testing is time-consuming and error-prone

The Solution

Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.

Example Cost Comparison

For 10,000 requests/day with average 28 input + 115 output tokens:

  • Claude Opus 4.6: ~$30.15/day ($903/month)
  • Gemini 2.5 Flash-Lite: ~$0.05/day ($1.50/month)
  • DeepSeek Chat: ~$0.14/day ($4.20/month)
  • Monthly cost difference (Opus vs Flash-Lite): $901.50

✨ What You Get

Model-Agnostic Multi-Provider Testing

Pass any model ID — provider is auto-detected from the model name prefix. No hardcoded list; new models work without code changes.

| Provider | Example Models | Prefix | Required Key | |----------|---------------|--------|--------------| | Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | claude- | ANTHROPIC_API_KEY | | OpenAI | gpt-5.2-pro, gpt-5.2, gpt-5.1 | gpt-, o1, o3 | OPENAI_API_KEY | | Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | gemini- | GOOGLE_API_KEY | | Mistral | mistral-large-latest, mistral-small-latest | mistral-, mixtral- | MISTRAL_API_KEY | | DeepSeek | deepseek-chat, deepseek-reasoner | deepseek- | DEEPSEEK_API_KEY | | xAI | grok-4-1-fast, grok-3-beta | grok- | XAI_API_KEY | | MiniMax | MiniMax-M2.1 | MiniMax, minimax | MINIMAX_API_KEY | | Qwen | qwen3.5-plus, qwen3-max-instruct | qwen | DASHSCOPE_API_KEY | | Meta Llama | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | meta-llama/, llama- | OPENROUTER_API_KEY |

Known Pricing (per 1M tokens)

| Model | Input | Output | |-------|-------|--------| | claude-opus-4-6 | $15.00 | $75.00 | | claude-sonnet-4-6 | $3.00 | $15.00 | | claude-haiku-4-5-20251001 | $1.00 | $5.00 | | gpt-5.2-pro | $21.00 | $168.00 | | gpt-5.2 | $1.75 | $14.00 | | gpt-5.1 | $2.00 | $8.00 | | gemini-2.5-pro | $1.25 | $10.00 | | gemini-2.5-flash | $0.30 | $2.50 | | gemini-2.5-flash-lite | $0.10 | $0.40 | | mistral-large-latest | $2.00 | $6.00 | | mistral-small-latest | $0.10 | $0.30 | | deepseek-chat | $0.27 | $1.10 | | deepseek-reasoner | $0.55 | $2.19 | | grok-4-1-fast | $5.00 | $25.00 | | grok-3-beta | $3.00 | $15.00 | | MiniMax-M2.1 | $0.40 | $1.60 | | qwen3.5-plus | $0.57 | $2.29 | | qwen3-max-instruct | $1.60 | $6.40 | | meta-llama/llama-4-maverick | $0.20 | $0.60 | | meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |

Note: Unlisted models still work — cost calculation returns $0.00 with a warning. Pricing table is for reference only, not a validation gate.

Performance Metrics

Every test measures:

  • Latency — Response time in milliseconds
  • 💰 Cost — Exact API cost per request (input + output tokens)
  • 🎯 Quality — Response quality score (0–100)
  • 📊 Token Usage — Input and output token counts
  • 🔄 Consistency — Variance across multiple test runs
  • Error Tracking — API failures, timeouts, rate limits

Smart Recommendations

Get instant answers to:

  • Which model is fastest for your prompt?
  • Which is most cost-effective?
  • Which produces best quality responses?
  • How much can you save by switching providers?

📊 Real-World Example

PROMPT: "Write a professional customer service response about a delayed shipment"

┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE              │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  523ms                                                 │
│ Cost:     $0.000025                                             │
│ Quality:  65/100                                                │
│ Tokens:   28 in / 87 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK                        │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  710ms                                                 │
│ Cost:     $0.000048                                             │
│ Quality:  70/100                                                │
│ Tokens:   28 in / 92 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER             │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  891ms                                                 │
│ Cost:     $0.000145                                             │
│ Quality:  78/100                                                │
│ Tokens:   28 in / 102 out                                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY                          │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  645ms                                                 │
│ Cost:     $0.000402                                             │
│ Quality:  88/100                                                │
│ Tokens:   28 in / 98 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE OPUS 4.6 (Anthropic) 🏆 HIGHEST QUALITY                 │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  1,234ms                                               │
│ Cost:     $0.001875                                             │
│ Quality:  94/100                                                │
│ Tokens:   28 in / 125 out                                       │
└─────────────────────────────────────────────────────────────────┘

🎯 RECOMMENDATIONS:
1. Most cost-effective: Gemini 2.5 Flash-Lite ($0.000025/request) — 99.98% cheaper than Opus
2. Budget pick: DeepSeek Chat ($0.000048/request) — strong quality at low cost
3. Best quality: Claude Opus 4.6 (94/100) — state-of-the-art reasoning & analysis
4. Smart pick: Claude Haiku 4.5 ($0.000145/request) — 81% cheaper, 83% quality match
5. Speed + Quality: GPT-5.2 ($0.000402/request) — excellent quality at mid-range cost

💡 Potential monthly savings (10,000 requests/day, 28 input + 115 output tokens avg):
   - Using Gemini 2.5 Flash-Lite vs Opus: $903/month saved ($1.44 vs $904.50)
   - Using DeepSeek Chat vs Opus: $899/month saved ($4.50 vs $904.50)
   - Using Claude Haiku vs Opus: $731/month saved ($173.40 vs $904.50)

Use Cases

Production Deployment

  • Evaluate models before production selection
  • Compare cost vs quality tradeoffs
  • Benchmark API latency across providers

Prompt Development

  • Test prompt variations across models
  • Measure quality scores consistently
  • Compare performance metrics

Cost Analysis

  • Analyze LLM API spending by model
  • Compare provider pricing structures
  • Identify cost-efficient alternatives

Performance Testing

  • Measure latency and response times
  • Test consistency across multiple runs
  • Evaluate quality scores

🚀 Quick Start

1. Subscribe to Skill

Click "Subscribe" on ClawhHub to get access.

2. Set API Keys

Add keys for the providers you want to test:

# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."

# Google (Gemini models)
export GOOGLE_API_KEY="AI..."

# DeepSeek
export DEEPSEEK_API_KEY="..."

# xAI (Grok models)
export XAI_API_KEY="..."

# MiniMax
export MINIMAX_API_KEY="..."

# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."

# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."

# Mistral
export MISTRAL_API_KEY="..."

You only need keys for the providers you plan to test.

3. Install Dependencies

# Install only what you need
pip install anthropic          # Claude
pip install openai             # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai  # Gemini
pip install mistralai          # Mistral

# Or install everything
pip install anthropic openai google-generativeai mistralai

4. Run Your First Test

Option A: Python

import os
from prompt_performance_tester import PromptPerformanceTester

tester = PromptPerformanceTester()  # reads API keys from environment

results = tester.test_prompt(
    prompt_text="Write a professional email apologizing for a delayed shipment",
    models=[
        "claude-haiku-4-5-20251001",
        "gpt-5.2",
        "gemini-2.5-flash",
        "deepseek-chat",
    ],
    num_runs=3,
    max_tokens=500
)

print(tester.format_results(results))
print(f"🏆 Best quality:  {results.best_model}")
print(f"💰 Cheapest:      {results.cheapest_model}")
print(f"⚡ Fastest:       {results.fastest_model}")

Option B: CLI

# Test across multiple models
prompt-tester test "Your prompt here" \
  --models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
  --runs 3

# Export results
prompt-tester test "Your prompt here" --export results.json

🔒 Security & Privacy

API Key Safety

  • Keys stored in environment variables only — never hardcoded or logged
  • Never transmitted to UnisAI servers
  • HTTPS encryption for all provider API calls

Data Privacy

  • Your prompts are sent only to the AI providers you select for testing
  • Each provider has their own data retention policy (see their privacy pages)
  • No data stored on UnisAI infrastructure

📚 Technical Details

System Requirements

  • Python: 3.9+
  • Dependencies: anthropic, openai, google-generativeai, mistralai (install only what you need)
  • Platform: macOS, Linux, Windows

Architecture

  • Lazy client initialization — SDK clients only loaded for providers actually tested
  • Prefix-based routingPROVIDER_MAP detects provider from model name; no hardcoded whitelist
  • OpenAI-compat path — DeepSeek, xAI, MiniMax, Qwen, and OpenRouter all use the openai SDK with a custom base_url
  • Pricing table — used for cost calculation only; unknown models get cost=0 with a warning

Metrics Collected

Every test captures:

  • Latency: Total response time (ms)
  • Cost: Input + output cost based on known pricing (USD)
  • Quality: Heuristic response score based on length, completeness (0–100)
  • Tokens: Exact input/output token counts per provider
  • Consistency: Standard deviation across multiple runs
  • Errors: Timeouts, rate limits, API failures

❓ Frequently Asked Questions

Q: Do I need API keys for all 9 providers? A: No. You only need keys for the providers you want to test. If you only test Claude models, you only need ANTHROPIC_API_KEY.

Q: Who pays for the API costs? A: You do. You provide your own API keys and pay each provider directly. This skill has no per-request fees.

Q: How accurate are the cost calculations? A: Costs are calculated from the known pricing table using actual token counts. Models not in the pricing table return $0.00 — the model still runs, the cost just won't be shown.

Q: Can I test models not in the pricing table? A: Yes. Any model whose name starts with a supported prefix will run. Cost will show as $0.00 for unlisted models.

Q: Can I test prompts in non-English languages? A: Yes. All supported providers handle multiple languages.

Q: Can I use this in production/CI/CD? A: Yes. Import PromptPerformanceTester directly from Python or call via CLI.

Q: What if my prompt is very long? A: Set max_tokens appropriately. The skill passes your prompt as-is to each provider's API.


🗺️ Roadmap

✅ Current Release (v1.1.8)

  • Model-agnostic architecture — any model ID works via prefix detection
  • 9 providers, 20 known models with pricing
  • DeepSeek, xAI Grok, MiniMax, Qwen, Meta Llama as first-class providers
  • Claude 4.6 series (opus-4-6, sonnet-4-6)
  • Lazy client initialization — only loads SDKs for providers actually used
  • Fixed UnisAI branding throughout

🚧 Coming Soon (v1.3)

  • Batch testing: Test 100+ prompts simultaneously
  • Historical tracking: Track model performance over time
  • Webhook integrations: Slack, Discord, email notifications

🔮 Future (beyond v1.3)

  • A/B testing framework: Scientific prompt experimentation
  • Fine-tuning insights: Which models to fine-tune for your use case
  • Custom benchmarks: Create your own evaluation criteria
  • Auto-optimization: AI-powered prompt improvement suggestions

📞 Support

  • Email: support@unisai.vercel.app
  • Website: https://unisai.vercel.app
  • Bug Reports: support@unisai.vercel.app

📄 License & Terms

This skill is distributed via ClawHub under the following terms.

✅ You CAN:

  • Use for your own business and projects
  • Test prompts for internal applications
  • Modify source code for personal use

❌ You CANNOT:

  • Redistribute outside the ClawHub registry
  • Resell or sublicense
  • Use UnisAI trademark without permission

Full Terms: See LICENSE.md


📝 Changelog

[1.1.8] - 2026-02-27

Fixes & Polish

  • Bumped version to 1.1.8
  • SKILL.md fully rewritten — cleaned up formatting, removed stale content
  • Removed old IP watermark reference (PROPRIETARY_SKILL_VEDANT_2024) from docs
  • Corrected watermark to PROPRIETARY_SKILL_UNISAI_2026_MULTI_PROVIDER throughout
  • Fixed all UnisAI branding (was UniAI in v1.1.0 changelog)
  • Updated pricing table to include all 20 known models
  • Cleaned up FAQ, Quick Start, and Use Cases sections

[1.1.6] - 2026-02-27

🏗️ Model-Agnostic Architecture

  • Provider auto-detected from model name prefix — no hardcoded whitelist
  • Any new model works automatically without code changes
  • Added DeepSeek, xAI Grok, MiniMax, Qwen, Meta Llama as first-class providers (9 total)
  • Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)
  • Lazy client initialization — only loads SDKs for providers actually tested
  • Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter

[1.1.5] - 2026-02-01

🚀 Latest Models Update

  • GPT-5.2 Series — Added Instant, Thinking, and Pro variants
  • Gemini 2.5 Series — Updated to 2.5 Pro, Flash, and Flash-Lite
  • Claude 4.5 pricing updates
  • 10 total models across 3 providers

[1.1.0] - 2026-01-15

✨ Major Features

  • Multi-provider support — Claude, GPT, Gemini
  • Cross-provider cost comparison
  • Enhanced recommendations engine
  • Rebranded to UnisAI

[1.0.0] - 2024-02-02

Initial Release

  • Claude-only prompt testing (Haiku, Sonnet, Opus)
  • Performance metrics: latency, cost, quality, consistency
  • Basic recommendations engine

Last Updated: February 27, 2026 · Current Version: 1.1.8 · Status: Active & Maintained

© 2026 UnisAI. All rights reserved.

File v1.1.8:_meta.json

{
  "ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy",
  "slug": "prompt-performance-tester",
  "version": "1.1.8",
  "publishedAt": 1772213247786
}

File v1.1.8:LICENSE.md

UniAI Skills - Proprietary License

Version 1.0 | Effective Date: February 2, 2024

1. GRANT OF LICENSE

UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the ClawHub Skills ("Software") solely in accordance with the terms of this license agreement.

2. LICENSE RESTRICTIONS

You may NOT:

  • Reverse engineer, decompile, or disassemble the Software
  • Modify, alter, or create derivative works of the Software
  • Remove, obscure, or alter any proprietary notices or labels on the Software
  • Share, distribute, or sublicense the Software to any third party
  • Use the Software for commercial purposes without a commercial license
  • Access or use the Software beyond the scope of your subscription tier
  • Attempt to circumvent licensing controls or API rate limits

3. INTELLECTUAL PROPERTY RIGHTS

All intellectual property rights in and to the Software are retained by Licensor. This includes:

  • Source code and object code
  • Algorithms and methodologies
  • Performance optimization techniques
  • Quality scoring mechanisms
  • Proprietary data structures
  • Trade secrets and confidential information

4. PERMITTED USES

You may only:

  • Use the Software as provided through the ClawHub platform
  • Access features available in your subscription tier
  • Create test results and reports for internal use
  • Share results with your team (if on a team plan)
  • Provide feedback to improve the Software

5. SUBSCRIPTION TIERS

Starter (Free)

  • 5 tests per month
  • 2 models per test
  • Basic features
  • Personal use only

Professional ($29/month)

  • Unlimited tests
  • All models supported
  • Advanced analytics
  • API access
  • Commercial use permitted

Enterprise ($99/month)

  • Team collaboration
  • White-label option
  • Custom integrations
  • Dedicated support
  • SLA guarantees

6. API KEY AND CREDENTIALS

  • You are responsible for keeping your API keys confidential
  • Do not share your license key with others
  • One license per person/organization
  • License keys are non-transferable
  • Unauthorized sharing may result in account termination

7. DATA PRIVACY

  • We do not retain your test data by default
  • Free tier: 30-day retention
  • Paid tiers: 90-day retention
  • You can request data deletion anytime
  • See Privacy Policy for full details

8. WARRANTY DISCLAIMER

THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:

  • Merchantability
  • Fitness for a particular purpose
  • Non-infringement
  • Accuracy of results

9. LIMITATION OF LIABILITY

IN NO EVENT SHALL LICENSOR BE LIABLE FOR:

  • Any indirect, incidental, special, or consequential damages
  • Loss of data, revenue, or profits
  • Business interruption
  • Even if advised of the possibility of such damages

10. TERMINATION

Licensor may terminate your license if you:

  • Violate any terms of this agreement
  • Fail to pay subscription fees
  • Attempt to reverse engineer the Software
  • Share your license key with others
  • Use the Software unlawfully

Upon termination:

  • Your access to the Software is immediately revoked
  • You must destroy all copies of the Software in your possession
  • Any refunds are subject to our refund policy

11. COMMERCIAL LICENSE

To use the Software for commercial purposes:

  • Starter tier: Personal use only
  • Professional tier: Commercial use permitted
  • Enterprise tier: Team commercial use permitted

For commercial use with Starter tier, contact: hello@unisai.vercel.app

12. THIRD-PARTY SERVICES

The Software uses third-party services (e.g., Anthropic API). Your use is also subject to their terms of service:

  • Anthropic: https://www.anthropic.com/terms
  • OpenAI: https://openai.com/terms (if applicable)

13. MODIFICATIONS TO SOFTWARE

Licensor reserves the right to:

  • Update the Software at any time
  • Add or remove features
  • Change pricing (with 30 days notice)
  • Discontinue the Software (with 60 days notice)

14. COMPLIANCE

You agree to comply with all applicable laws and regulations in your jurisdiction when using the Software.

15. DISPUTE RESOLUTION

Any disputes arising from this agreement shall be:

  • Resolved through binding arbitration
  • Governed by California law
  • Conducted in English

16. ENTIRE AGREEMENT

This agreement, along with our Privacy Policy and Terms of Service, constitutes the entire agreement between you and Licensor regarding the Software.

17. CONTACT

For licensing inquiries or support:

  • Email: hello@unisai.vercel.app
  • Website: https://unisai.vercel.app
  • Support: vedxnts@gmail.com
  • X: vedxnts

By using the Software, you acknowledge that you have read, understood, and agree to be bound by this License Agreement.

© 2026 UniAI. All rights reserved.

File v1.1.8:manifest.yaml

name: "Prompt Performance Tester"
id: "prompt-performance-tester"
version: "1.1.8"
description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."

homepage: "https://unisai.vercel.app"
repository: "https://github.com/vedantsingh60/prompt-performance-tester"
source: "included"

intellectual_property:
  license: "free-to-use"
  license_file: "LICENSE.md"
  copyright: "© 2026 UnisAI. All rights reserved."
  distribution: "via-clawhub-only"
  source_code_access: "included"
  modification: "personal-use-only"
  reverse_engineering: "allowed-for-security-audit"

author:
  company: "UnisAI"
  contact: "hello@unisai.vercel.app"
  website: "https://unisai.vercel.app"

category: "ai-testing"
tags:
  - "prompt-testing"
  - "performance-analysis"
  - "cost-optimization"
  - "multi-llm"
  - "quality-assurance"
  - "benchmarking"
  - "llm-comparison"
  - "ai-testing"

pricing:
  model: "free"

runtime: "local"
execution: "python"

required_env_vars:
  - "ANTHROPIC_API_KEY"   # Required if testing Claude models
  - "OPENAI_API_KEY"      # Required if testing GPT models
  - "GOOGLE_API_KEY"      # Required if testing Gemini models
  - "MISTRAL_API_KEY"     # Required if testing Mistral models
  - "DEEPSEEK_API_KEY"    # Required if testing DeepSeek models
  - "XAI_API_KEY"         # Required if testing Grok/xAI models
  - "MINIMAX_API_KEY"     # Required if testing MiniMax models
  - "DASHSCOPE_API_KEY"   # Required if testing Qwen/Alibaba models
  - "OPENROUTER_API_KEY"  # Required if testing Llama/OpenRouter models
primary_credential: "An API key is required for each provider you want to test"

dependencies:
  python: ">=3.9"
  packages:
    - "anthropic>=0.40.0"
    - "openai>=1.60.0"
    - "google-generativeai>=0.8.0"
    - "mistralai>=1.3.0"
  install_all: "pip install anthropic openai google-generativeai mistralai"
  install_selective: |
    pip install anthropic            # Claude
    pip install openai               # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat)
    pip install google-generativeai  # Gemini
    pip install mistralai            # Mistral
  note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL."
  requirements_file: "requirements.txt"

security:
  data_retention: "0 days"
  data_flow: "prompts-sent-to-chosen-ai-providers"
  third_party_data_sharing: |
    WARNING: This skill sends your prompts to whichever AI providers you
    select for testing. Each provider has their own data retention and
    privacy policies:
    - Anthropic: https://www.anthropic.com/legal/privacy
    - OpenAI: https://openai.com/policies/privacy-policy
    - Google: https://ai.google.dev/gemini-api/terms
    - Mistral: https://mistral.ai/terms/
    - DeepSeek: https://www.deepseek.com/privacy_policy
    - xAI: https://x.ai/privacy
    - OpenRouter: https://openrouter.ai/privacy
  api_key_storage: "Environment variables only — never hardcoded or logged"
  network_access: "Required to call chosen AI provider APIs"

capabilities:
  functions:
    - name: "testPrompt"
      description: "Test a prompt across multiple LLM models and providers"
      parameters:
        prompt_text:
          type: "string"
          description: "The prompt to benchmark"
          required: true
        models:
          type: "array"
          description: "List of model IDs to test — any model matching a supported prefix works"
          items:
            type: "string"
          examples:
            - "claude-sonnet-4-6"
            - "gpt-5.2"
            - "deepseek-chat"
            - "grok-4-1-fast"
            - "gemini-2.5-flash"
          required: false
        num_runs:
          type: "number"
          description: "Number of runs per model for consistency testing"
          default: 1
          range: [1, 10]
        system_prompt:
          type: "string"
          description: "Optional system prompt"
        max_tokens:
          type: "number"
          description: "Maximum response tokens"
          default: 1000
          range: [100, 4000]

environment_variables:
  ANTHROPIC_API_KEY:
    description: "Anthropic API key — required for any claude-* model"
    required_for_prefix: "claude-"
  OPENAI_API_KEY:
    description: "OpenAI API key — required for any gpt-, o1, o3* model"
    required_for_prefix: "gpt-, o1, o3"
  GOOGLE_API_KEY:
    description: "Google AI API key — required for any gemini-* model"
    required_for_prefix: "gemini-"
  MISTRAL_API_KEY:
    description: "Mistral API key — required for mistral-, mixtral- models"
    required_for_prefix: "mistral-, mixtral-"
  DEEPSEEK_API_KEY:
    description: "DeepSeek API key — required for any deepseek-* model"
    required_for_prefix: "deepseek-"
  XAI_API_KEY:
    description: "xAI API key — required for any grok-* model"
    required_for_prefix: "grok-"
  MINIMAX_API_KEY:
    description: "MiniMax API key — required for minimax* or MiniMax* models"
    required_for_prefix: "minimax, MiniMax"
  DASHSCOPE_API_KEY:
    description: "Alibaba DashScope API key — required for any qwen* model"
    required_for_prefix: "qwen"
  OPENROUTER_API_KEY:
    description: "OpenRouter API key — required for meta-llama/* or llama-* models"
    required_for_prefix: "meta-llama/, llama-"

support:
  support_email: "support@unisai.vercel.app"
  website: "https://unisai.vercel.app"
  github: "https://github.com/vedantsingh60/prompt-performance-tester"
  documentation: "See SKILL.md in this package"
  response_time: "Best effort — community supported"

restrictions:
  - "No redistribution outside the ClawHub registry"
  - "No resale or sublicensing"
  - "No trademark usage without permission"
  - "Modifications allowed for personal use only"

changelog:
  "1.1.8":
    - "🏗️ Model-agnostic architecture — provider auto-detected from model name prefix, no hardcoded whitelist"
    - "✨ Added DeepSeek, xAI Grok, MiniMax, Qwen as first-class providers (9 total)"
    - "✨ Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)"
    - "✨ Any future model works automatically without code changes"
    - "🔧 Lazy client initialization — only loads SDKs for providers actually used"
    - "🔧 Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter"
    - "📝 Fixed UnisAI branding (was UniAI)"
    - "💰 Updated pricing table with 20 models across 9 providers"
  "1.1.5":
    - "🚀 Updated to latest 2026 models"
    - "✨ GPT-5.2 series (Instant, Thinking, Pro)"
    - "✨ Gemini 3 Pro and 2.5 series"
    - "✨ Claude 4.5 pricing updates"
    - "✨ 10 total models across 3 providers"
  "1.1.0":
    - "✨ Multi-provider support (Claude, GPT, Gemini)"
    - "✨ Cross-provider cost comparison"
    - "✨ Enhanced recommendations engine"
  "1.0.0":
    - "Initial release with Claude-only support"
    - "Performance metrics: latency, cost, quality, consistency"

metadata:
  status: "active"
  created_at: "2024-02-02T00:00:00Z"
  updated_at: "2026-02-27T00:00:00Z"
  maturity: "production"
  maintenance: "actively-maintained"
  compatibility:
    - "OpenClaw v1.0+"
    - "Claude Code"
    - "ClawHub v2.0+"
  security_audit: "Source code included for security review and transparency"

API & Reliability

Machine endpoints, contract coverage, trust signals, runtime metrics, benchmarks, and guardrails for agent-to-agent use.

MissingCLAWHUB

Machine interfaces

Contract & API

Contract coverage

Status

missing

Auth

None

Streaming

No

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.
Invocation examples
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot"
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract"
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust"

Operational fit

Reliability & Benchmarks

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.
No benchmark suites or observed failure patterns are available.

Machine Appendix

Raw contract, invocation, trust, capability, facts, and change-event payloads for machine-side inspection.

MissingCLAWHUB

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "CLAWHUB",
      "generatedAt": "2026-04-17T02:48:13.304Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
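The retry policy in the guide above can be applied client-side. A minimal sketch, assuming the retryable condition is surfaced as a RuntimeError message — that error-signalling convention is an assumption for illustration, not part of the published payload:

```python
import time

# Documented policy: up to 3 attempts, 500/1500/3500 ms backoff,
# retry on HTTP_429, HTTP_503, and NETWORK_TIMEOUT.
BACKOFF_MS = [500, 1500, 3500]
RETRYABLE = {"HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"}

def call_with_retry(fn, max_attempts: int = 3, sleep=time.sleep):
    """Call fn(); on a retryable condition, back off and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError as exc:
            condition = str(exc)
            # Re-raise non-retryable conditions and the final failure.
            if condition not in RETRYABLE or attempt == max_attempts - 1:
                raise
            sleep(BACKOFF_MS[attempt] / 1000.0)
```

Injecting `sleep` keeps the sketch testable; in production the default `time.sleep` applies the documented backoff schedule.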

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}

Facts JSON

[
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Clawhub",
    "href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "traction",
    "category": "adoption",
    "label": "Adoption signal",
    "value": "1.9K downloads",
    "href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "latest_release",
    "category": "release",
    "label": "Latest release",
    "value": "1.1.9",
    "href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceType": "release",
    "confidence": "medium",
    "observedAt": "2026-02-27T17:27:39.522Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "release",
    "title": "Release 1.1.9",
    "description": "- Updated provider/model lists to include more example models and the latest pricing. - Expanded cost comparison examples to include DeepSeek Chat. - Added clarification that unlisted models are supported, but cost is shown as $0.00 with a warning. - Improved explanation of quality, cost, and performance metrics for broader clarity. - Enhanced recommendation and real-world example sections to better showcase DeepSeek and new models.",
    "href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceType": "release",
    "confidence": "medium",
    "observedAt": "2026-02-27T17:27:39.522Z",
    "isPublic": true
  }
]
