Agent Dossier · CLAWHUB · Safety 84/100

Xpersona Agent

Prompt Performance Tester - UnisAI

Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations.

OpenClaw · self-declared
1.9K downloads · Trust evidence available
clawhub skill install kn77yjs5esft2kgsd6dpz9c92n80dgsy:prompt-performance-tester

Overall rank

#62

Adoption

1.9K downloads

Trust

Unknown

Freshness

Last checked Feb 28, 2026

Best For

Prompt Performance Tester - UnisAI is best for general automation workflows where OpenClaw compatibility matters.

Not Ideal For

Deterministic execution workflows: contract metadata is missing or unavailable, so request/response behavior cannot be verified in advance.

Evidence Sources Checked

CLAWHUB, runtime-metrics, public facts pack

Overview

Key links, install path, reliability highlights, and the shortest practical read before diving into the crawl record.

Self-declared · CLAWHUB

Executive Summary

Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations. Capability contract not published. No trust telemetry is available yet. 1.9K downloads reported by the source. Last updated Apr 15, 2026.

No verified compatibility signals · 1.9K downloads

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Feb 28, 2026

Vendor

Clawhub

Artifacts

0

Benchmarks

0

Last release

1.1.9

Install & run

Setup Snapshot

clawhub skill install kn77yjs5esft2kgsd6dpz9c92n80dgsy:prompt-performance-tester
  1. Install using `clawhub skill install kn77yjs5esft2kgsd6dpz9c92n80dgsy:prompt-performance-tester` in an isolated environment before connecting it to live workloads.

  2. No published capability contract is available yet, so validate auth and request/response behavior manually.

  3. Review the upstream CLAWHUB listing at https://clawhub.ai/vedantsingh60/prompt-performance-tester before using production credentials.

Evidence & Timeline

Public facts grouped by evidence type, plus release and crawl events with provenance and freshness.

Self-declared · CLAWHUB

Public facts

Evidence Ledger

Vendor (1)

Vendor

Clawhub

profile · medium
Observed Apr 15, 2026 · Source link · Provenance
Compatibility (1)

Protocol compatibility

OpenClaw

contract · medium
Observed Apr 15, 2026 · Source link · Provenance
Release (1)

Latest release

1.1.9

release · medium
Observed Feb 27, 2026 · Source link · Provenance
Adoption (1)

Adoption signal

1.9K downloads

profile · medium
Observed Apr 15, 2026 · Source link · Provenance
Security (1)

Handshake status

UNKNOWN

trust · medium
Observed: unknown · Source link · Provenance

Artifacts & Docs

Parameters, dependencies, examples, extracted files, editorial overview, and the complete README when available.

Self-declared · CLAWHUB

Captured outputs

Artifacts Archive

Extracted files

4

Examples

6

Snippets

0

Languages

Unknown

Executable Examples

text

PROMPT: "Write a professional customer service response about a delayed shipment"

┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE              │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  523ms                                                 │
│ Cost:     $0.000025                                             │
│ Quality:  65/100                                                │
│ Tokens:   28 in / 87 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK                        │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  710ms                                                 │
│ Cost:     $0.000048                                             │
│ Quality:  70/100                                                │
│ Tokens:   28 in / 92 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER             │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  891ms                                                 │
│ Cost:     $0.000145                                             │
│ Quality:  78/100                                                │
│ Tokens:   28 in / 102 out                                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY                          │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  645ms                                                 │
│ Cost:     $0.000402                                             │
│ Quality:  88/100                                                │
│ Tokens:   28 in / 98 out                                        │
└─────────────────────────────────────────────────────────────────┘

bash

# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."

# Google (Gemini models)
export GOOGLE_API_KEY="AI..."

# DeepSeek
export DEEPSEEK_API_KEY="..."

# xAI (Grok models)
export XAI_API_KEY="..."

# MiniMax
export MINIMAX_API_KEY="..."

# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."

# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."

# Mistral
export MISTRAL_API_KEY="..."
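
Only keys for the providers you plan to test are needed. A quick way to see which providers are configured is a sketch like the following (env-var names are taken from the list above; the helper itself is illustrative, not part of the skill):

```python
import os

# Provider -> required environment variable, per the key list above
REQUIRED_KEYS = {
    "Anthropic": "ANTHROPIC_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Google": "GOOGLE_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
    "xAI": "XAI_API_KEY",
    "MiniMax": "MINIMAX_API_KEY",
    "Alibaba (Qwen)": "DASHSCOPE_API_KEY",
    "OpenRouter (Llama)": "OPENROUTER_API_KEY",
    "Mistral": "MISTRAL_API_KEY",
}

def configured_providers(env=os.environ):
    """Return the providers whose API key is present and non-empty."""
    return sorted(p for p, var in REQUIRED_KEYS.items() if env.get(var))

if __name__ == "__main__":
    ready = configured_providers()
    if not ready:
        print("No provider keys set; at least one is required.")
    else:
        print("Configured providers:", ", ".join(ready))
```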

bash

# Install only what you need
pip install anthropic          # Claude
pip install openai             # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai  # Gemini
pip install mistralai          # Mistral

# Or install everything
pip install anthropic openai google-generativeai mistralai
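
The single `openai` package covers several providers because they expose OpenAI-compatible endpoints. A sketch of the pattern is below; the base URLs are assumptions to verify against each provider's current documentation, not values published by this skill:

```python
# Hypothetical mapping of OpenAI-compatible providers to (base_url, key env var).
# Verify each base URL against the provider's documentation before use.
OPENAI_COMPAT = {
    "deepseek": ("https://api.deepseek.com", "DEEPSEEK_API_KEY"),
    "xai": ("https://api.x.ai/v1", "XAI_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
}

def client_config(provider):
    """Return (base_url, api_key_env_var) for an OpenAI-compatible provider."""
    try:
        return OPENAI_COMPAT[provider]
    except KeyError:
        raise ValueError(f"No OpenAI-compatible config for {provider!r}")

# Usage with the openai SDK (not executed here):
#   import os
#   from openai import OpenAI
#   base_url, key_var = client_config("deepseek")
#   client = OpenAI(api_key=os.environ[key_var], base_url=base_url)
```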

python

import os
from prompt_performance_tester import PromptPerformanceTester

tester = PromptPerformanceTester()  # reads API keys from environment

results = tester.test_prompt(
    prompt_text="Write a professional email apologizing for a delayed shipment",
    models=[
        "claude-haiku-4-5-20251001",
        "gpt-5.2",
        "gemini-2.5-flash",
        "deepseek-chat",
    ],
    num_runs=3,
    max_tokens=500
)

print(tester.format_results(results))
print(f"🏆 Best quality:  {results.best_model}")
print(f"💰 Cheapest:      {results.cheapest_model}")
print(f"⚡ Fastest:       {results.fastest_model}")

bash

# Test across multiple models
prompt-tester test "Your prompt here" \
  --models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
  --runs 3

# Export results
prompt-tester test "Your prompt here" --export results.json

Extracted Files

SKILL.md

# Prompt Performance Tester

**Model-agnostic prompt benchmarking across 9 providers.**

Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.

---

## 🚀 Why This Skill?

### Problem Statement
Comparing LLM models across providers requires manual testing:
- No systematic way to measure performance across models
- Cost differences are significant but not easily comparable
- Quality varies by use case and provider
- Manual API testing is time-consuming and error-prone

### The Solution
Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.

### Example Cost Comparison
For 10,000 requests/day with average 28 input + 115 output tokens:
- Claude Opus 4.6: ~$30.15/day ($903/month)
- Gemini 2.5 Flash-Lite: ~$0.05/day ($1.50/month)
- DeepSeek Chat: ~$0.14/day ($4.20/month)
- Monthly cost difference (Opus vs Flash-Lite): $901.50
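
As a sanity check on figures like these, per-request cost follows directly from the per-1M-token prices. A minimal sketch (prices copied from the pricing table below; the headline daily figures above may rest on different pricing snapshots or request-volume assumptions):

```python
# Per-1M-token prices (USD input, output), from the pricing table in this document
PRICES = {
    "claude-opus-4-6": (15.00, 75.00),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "deepseek-chat": (0.27, 1.10),
}

def request_cost(model, tokens_in, tokens_out):
    """Cost of one request: tokens / 1M * per-1M price, input plus output."""
    p_in, p_out = PRICES[model]
    return tokens_in / 1e6 * p_in + tokens_out / 1e6 * p_out

def monthly_cost(model, requests_per_day, tokens_in=28, tokens_out=115, days=30):
    """Scale the per-request cost to a monthly total."""
    return request_cost(model, tokens_in, tokens_out) * requests_per_day * days

if __name__ == "__main__":
    for m in PRICES:
        print(f"{m}: ${monthly_cost(m, 10_000):,.2f}/month")
```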

---

## ✨ What You Get

### Model-Agnostic Multi-Provider Testing
Pass any model ID — provider is auto-detected from the model name prefix.
No hardcoded list; new models work without code changes.

| Provider | Example Models | Prefix | Required Key |
|----------|---------------|--------|--------------|
| **Anthropic** | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | `claude-` | ANTHROPIC_API_KEY |
| **OpenAI** | gpt-5.2-pro, gpt-5.2, gpt-5.1 | `gpt-`, `o1`, `o3` | OPENAI_API_KEY |
| **Google** | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | `gemini-` | GOOGLE_API_KEY |
| **Mistral** | mistral-large-latest, mistral-small-latest | `mistral-`, `mixtral-` | MISTRAL_API_KEY |
| **DeepSeek** | deepseek-chat, deepseek-reasoner | `deepseek-` | DEEPSEEK_API_KEY |
| **xAI** | grok-4-1-fast, grok-3-beta | `grok-` | XAI_API_KEY |
| **MiniMax** | MiniMax-M2.1 | `MiniMax`, `minimax` | MINIMAX_API_KEY |
| **Qwen** | qwen3.5-plus, qwen3-max-instruct | `qwen` | DASHSCOPE_API_KEY |
| **Meta Llama** | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | `meta-llama/`, `llama-` | OPENROUTER_API_KEY |
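
The prefix-based auto-detection described above can be sketched as an ordered prefix match over this table (illustrative, not the skill's actual implementation):

```python
# Prefix -> provider, from the table above. Checked in order so that
# "meta-llama/" wins over the more general "llama-".
PREFIX_PROVIDERS = [
    ("meta-llama/", "openrouter"),
    ("claude-", "anthropic"),
    ("gemini-", "google"),
    ("mistral-", "mistral"),
    ("mixtral-", "mistral"),
    ("deepseek-", "deepseek"),
    ("grok-", "xai"),
    ("minimax", "minimax"),
    ("qwen", "qwen"),
    ("llama-", "openrouter"),
    ("gpt-", "openai"),
    ("o1", "openai"),
    ("o3", "openai"),
]

def detect_provider(model_id):
    """Map a model ID to its provider via case-insensitive prefix match."""
    lowered = model_id.lower()
    for prefix, provider in PREFIX_PROVIDERS:
        if lowered.startswith(prefix):
            return provider
    raise ValueError(f"Unknown provider for model {model_id!r}")
```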

### Known Pricing (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.1 | $2.00 | $8.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| mistral-large-latest | $2.00 | $6.00 |
| mistral-small-latest | $0.10 | $0.30 |
| deepseek-chat | $0.27 | $1.10 |
| deepseek-reasoner | $0.55 | $2.19 |
| grok-4-1-fast | $5.00 | $25.00 |
| grok-3-beta | $3.00 | $15.00 |
| MiniMax-M2.1 | $0.40 | $1.60 |
| qwen3.5-plus | $0.57 | $2.29 |
| qwen3-max-instruct | $1.60 | $6.40 |
| meta-llama/llama-4-maverick | $0.20 | $0.60 |
| meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |

_meta.json

{
  "ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy",
  "slug": "prompt-performance-tester",
  "version": "1.1.9",
  "publishedAt": 1772213259522
}

LICENSE.md

# UniAI Skills - Proprietary License

**Version 1.0 | Effective Date: February 2, 2024**

## 1. GRANT OF LICENSE

UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the Clawhub Skills ("Software") solely in accordance with the terms of this license agreement.

## 2. LICENSE RESTRICTIONS

You may NOT:
- Reverse engineer, decompile, or disassemble the Software
- Modify, alter, or create derivative works of the Software
- Remove, obscure, or alter any proprietary notices or labels on the Software
- Share, distribute, or sublicense the Software to any third party
- Use the Software for commercial purposes without a commercial license
- Access or use the Software beyond the scope of your subscription tier
- Attempt to circumvent licensing controls or API rate limits

## 3. INTELLECTUAL PROPERTY RIGHTS

All intellectual property rights in and to the Software are retained by Licensor. This includes:
- Source code and object code
- Algorithms and methodologies
- Performance optimization techniques
- Quality scoring mechanisms
- Proprietary data structures
- Trade secrets and confidential information

## 4. PERMITTED USES

You may only:
- Use the Software as provided through the Clawhub platform
- Access features available in your subscription tier
- Create test results and reports for internal use
- Share results with your team (if on a team plan)
- Provide feedback to improve the Software

## 5. SUBSCRIPTION TIERS

### Starter (Free)
- 5 tests per month
- 2 models per test
- Basic features
- Personal use only

### Professional ($29/month)
- Unlimited tests
- All models supported
- Advanced analytics
- API access
- Commercial use permitted

### Enterprise ($99/month)
- Team collaboration
- White-label option
- Custom integrations
- Dedicated support
- SLA guarantees

## 6. API KEY AND CREDENTIALS

- You are responsible for keeping your API keys confidential
- Do not share your license key with others
- One license per person/organization
- License keys are non-transferable
- Unauthorized sharing may result in account termination

## 7. DATA PRIVACY

- We do not retain your test data by default
- Free tier: 30-day retention
- Paid tiers: 90-day retention
- You can request data deletion anytime
- See Privacy Policy for full details

## 8. WARRANTY DISCLAIMER

THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:
- Merchantability
- Fitness for a particular purpose
- Non-infringement
- Accuracy of results

## 9. LIMITATION OF LIABILITY

IN NO EVENT SHALL LICENSOR BE LIABLE FOR:
- Any indirect, incidental, special, or consequential damages
- Loss of data, revenue, or profits
- Business interruption
- Even if advised of the possibility of such damages

## 10. TERMINATION

Licensor may terminate your license if you:
- Violate any terms of this agreement
- Fail to pay subscription fees
- Attempt to reverse engineer the Software

manifest.yaml

name: "Prompt Performance Tester"
id: "prompt-performance-tester"
version: "1.1.8"
description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."

homepage: "https://unisai.vercel.app"
repository: "https://github.com/vedantsingh60/prompt-performance-tester"
source: "included"

intellectual_property:
  license: "free-to-use"
  license_file: "LICENSE.md"
  copyright: "© 2026 UnisAI. All rights reserved."
  distribution: "via-clawhub-only"
  source_code_access: "included"
  modification: "personal-use-only"
  reverse_engineering: "allowed-for-security-audit"

author:
  company: "UnisAI"
  contact: "hello@unisai.vercel.app"
  website: "https://unisai.vercel.app"

category: "ai-testing"
tags:
  - "prompt-testing"
  - "performance-analysis"
  - "cost-optimization"
  - "multi-llm"
  - "quality-assurance"
  - "benchmarking"
  - "llm-comparison"
  - "ai-testing"

pricing:
  model: "free"

runtime: "local"
execution: "python"

required_env_vars:
  - "ANTHROPIC_API_KEY"   # Required if testing Claude models
  - "OPENAI_API_KEY"      # Required if testing GPT models
  - "GOOGLE_API_KEY"      # Required if testing Gemini models
  - "MISTRAL_API_KEY"     # Required if testing Mistral models
  - "DEEPSEEK_API_KEY"    # Required if testing DeepSeek models
  - "XAI_API_KEY"         # Required if testing Grok/xAI models
  - "MINIMAX_API_KEY"     # Required if testing MiniMax models
  - "DASHSCOPE_API_KEY"   # Required if testing Qwen/Alibaba models
  - "OPENROUTER_API_KEY"  # Required if testing Llama/OpenRouter models
primary_credential: "At least one provider API key is required; set the key for each provider you plan to test"

dependencies:
  python: ">=3.9"
  packages:
    - "anthropic>=0.40.0"
    - "openai>=1.60.0"
    - "google-generativeai>=0.8.0"
    - "mistralai>=1.3.0"
  install_all: "pip install anthropic openai google-generativeai mistralai"
  install_selective: |
    pip install anthropic          # Claude
    pip install openai             # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat)
    pip install google-generativeai  # Gemini
    pip install mistralai          # Mistral
  note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL."
  requirements_file: "requirements.txt"

security:
  data_retention: "0 days"
  data_flow: "prompts-sent-to-chosen-ai-providers"
  third_party_data_sharing: |
    WARNING: This skill sends your prompts to whichever AI providers you select for testing.
    Each provider has their own data retention and privacy policies:
    - Anthropic: https://www.anthropic.com/legal/privacy
    - OpenAI: https://openai.com/policies/privacy-policy
    - Google: https://ai.google.dev/gemini-api/terms
    - Mistral: https://mistral.ai/terms/
    - DeepSeek: https://www.deepseek

Editorial read

Docs & README

Docs source

CLAWHUB

Editorial quality

thin


Full README

Skill: Prompt Performance Tester - UnisAI

Owner: vedantsingh60

Summary: Test prompts across Claude, GPT, and Gemini models and get detailed latency, cost, quality, consistency, and error metrics with smart recommendations.

Tags: Latest:1.1.4, ai-testing:1.0.1, ai-testing multi-provider prompt-optimization cost-analysis llm-benchmarking claude gpt gemini performance-testing api-comparison multi-model:1.1.2, claude-api:1.0.1, cost-analysis:1.0.1, latest:1.1.9, llm-benchmarking:1.0.1, openai-api:1.0.1, prompt-optimization:1.0.1

Version history:

v1.1.9 | 2026-02-27T17:27:39.522Z | user

  • Updated provider/model lists to include more example models and the latest pricing.
  • Expanded cost comparison examples to include DeepSeek Chat.
  • Added clarification that unlisted models are supported, but cost is shown as $0.00 with a warning.
  • Improved explanation of quality, cost, and performance metrics for broader clarity.
  • Enhanced recommendation and real-world example sections to better showcase DeepSeek and new models.

v1.1.8 | 2026-02-27T17:27:27.786Z | user

  • Updated provider/model lists to include more example models and the latest pricing.
  • Expanded cost comparison examples to include DeepSeek Chat.
  • Added clarification that unlisted models are supported, but cost is shown as $0.00 with a warning.
  • Improved explanation of quality, cost, and performance metrics for broader clarity.
  • Enhanced recommendation and real-world example sections to better showcase DeepSeek and new models.

v1.1.7 | 2026-02-27T16:50:17.543Z | user

Expanded to support model-agnostic benchmarking across 9 major LLM providers.

  • Now supports any model ID with automatic provider detection based on name prefix—no hardcoded model list.
  • Added support for Anthropic, OpenAI, Google, Mistral, DeepSeek, xAI (Grok), MiniMax, Qwen, and Meta Llama via OpenRouter.
  • Documentation updated with supported model prefixes, API keys, and latest known per-token pricing for each provider.
  • All previous features (latency, cost, quality, consistency, recommendations) remain and apply to all supported models.

v1.1.6 | 2026-02-27T16:49:57.000Z | user

Expanded to support model-agnostic benchmarking across 9 major LLM providers.

  • Now supports any model ID with automatic provider detection based on name prefix—no hardcoded model list.
  • Added support for Anthropic, OpenAI, Google, Mistral, DeepSeek, xAI (Grok), MiniMax, Qwen, and Meta Llama via OpenRouter.
  • Documentation updated with supported model prefixes, API keys, and latest known per-token pricing for each provider.
  • All previous features (latency, cost, quality, consistency, recommendations) remain and apply to all supported models.
  • No code changes in this release; update is documentation-only.

v1.1.5 | 2026-02-16T20:07:53.120Z | user

  • Documentation reformatted with minor cleanups for clarity and consistency.
  • No functional or code changes in this release.
  • All prompts, features, and instructions remain unchanged.

v1.1.4 | 2026-02-02T03:45:04.686Z | user

1.1.4 is a documentation cleanup and simplification release.

  • SKILL.md significantly condensed for clarity and brevity.
  • Streamlined value proposition and example cost calculations.
  • Updated real-world model comparison table to use more recent models and prices.
  • Removed marketing language and focus on data-driven cost/quality analysis.
  • Reorganized use case and getting started sections for easier navigation.
  • No code or logic changes; documentation only.

v1.1.3 | 2026-02-02T03:15:28.352Z | user

  • Expanded support to 10 leading AI models, including latest Claude 4.5, GPT-5.2, and Gemini 3 series.
  • Updated model selection and pricing details to reflect 2026 releases and current rates.
  • Quick Start and plan descriptions now accommodate 10 models per test.
  • Enhanced marketing copy with up-to-date benchmarks and role-based benefits.
  • Real-world example and recommendations remain for user clarity.

v1.1.2 | 2026-02-02T03:03:35.305Z | user

Version 1.1.2

  • No code changes detected; documentation updated only.
  • SKILL.md rewritten for clarity and to emphasize product benefits.
  • Feature descriptions, real-world examples, and pricing explanations improved.
  • Enhanced summary of supported models and use cases.
  • Quick Start, API setup, and output formatting instructions are clearer and more actionable.

v1.1.1 | 2026-02-02T02:55:31.272Z | user

  • Major upgrade: Now supports prompt testing across Claude, OpenAI GPT, and Google Gemini, not just Claude.
  • Added cross-provider cost and quality comparison for 9 different LLM models.
  • New reporting shows latency, cost, and quality side-by-side for Anthropic, OpenAI, and Google models.
  • Recommendations now include fastest, cheapest, and best quality models across all providers.
  • Expanded use cases and provider-specific examples in documentation.

v1.1.0 | 2026-02-02T02:52:52.200Z | user

Version 1.1.0:

  • ✨ Multi-provider support: Claude, GPT, and Gemini
  • ✨ 9 LLM models supported across 3 providers
  • ✨ Cross-provider cost comparison engine
  • ✨ Provider-specific API optimizations
  • ✨ Enhanced recommendations with multi-provider insights
  • ✨ Rebranded from Prompt Migrator to UniAI
  • 🏷️ Updated tags for better discoverability (14 tags)
  • 📊 Improved cost calculation accuracy
  • 🔧 Added OpenAI and Google API integrations
  • 📝 Updated documentation with multi-provider examples

v1.0.1 | 2026-02-02T02:33:38.204Z | user

Version 1.0.1

  • Removed the file IP_PROTECTION_GUIDE.md.
  • Updated documentation links and support contact information to use unisai.vercel.app addresses.

v1.0.0 | 2026-02-02T02:22:51.962Z | user

Initial release - Multi-model prompt testing across OpenAI, Claude Haiku, Sonnet, and Opus with latency, cost, and quality metrics

Archive index:

Archive v1.1.9: 5 files, 17508 bytes

Files: LICENSE.md (4799b), manifest.yaml (7490b), prompt_performance_tester.py (22862b), SKILL.md (17974b), _meta.json (144b)

File v1.1.9:SKILL.md

Prompt Performance Tester

Model-agnostic prompt benchmarking across 9 providers.

Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.


🚀 Why This Skill?

Problem Statement

Comparing LLM models across providers requires manual testing:

  • No systematic way to measure performance across models
  • Cost differences are significant but not easily comparable
  • Quality varies by use case and provider
  • Manual API testing is time-consuming and error-prone

The Solution

Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.

Example Cost Comparison

For 10,000 requests/day with average 28 input + 115 output tokens:

  • Claude Opus 4.6: ~$30.15/day ($903/month)
  • Gemini 2.5 Flash-Lite: ~$0.05/day ($1.50/month)
  • DeepSeek Chat: ~$0.14/day ($4.20/month)
  • Monthly cost difference (Opus vs Flash-Lite): $901.50

✨ What You Get

Model-Agnostic Multi-Provider Testing

Pass any model ID — provider is auto-detected from the model name prefix. No hardcoded list; new models work without code changes.

| Provider | Example Models | Prefix | Required Key |
|----------|---------------|--------|--------------|
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | claude- | ANTHROPIC_API_KEY |
| OpenAI | gpt-5.2-pro, gpt-5.2, gpt-5.1 | gpt-, o1, o3 | OPENAI_API_KEY |
| Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | gemini- | GOOGLE_API_KEY |
| Mistral | mistral-large-latest, mistral-small-latest | mistral-, mixtral- | MISTRAL_API_KEY |
| DeepSeek | deepseek-chat, deepseek-reasoner | deepseek- | DEEPSEEK_API_KEY |
| xAI | grok-4-1-fast, grok-3-beta | grok- | XAI_API_KEY |
| MiniMax | MiniMax-M2.1 | MiniMax, minimax | MINIMAX_API_KEY |
| Qwen | qwen3.5-plus, qwen3-max-instruct | qwen | DASHSCOPE_API_KEY |
| Meta Llama | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | meta-llama/, llama- | OPENROUTER_API_KEY |

Known Pricing (per 1M tokens)

| Model | Input | Output |
|-------|-------|--------|
| claude-opus-4-6 | $15.00 | $75.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 |
| gpt-5.2-pro | $21.00 | $168.00 |
| gpt-5.2 | $1.75 | $14.00 |
| gpt-5.1 | $2.00 | $8.00 |
| gemini-2.5-pro | $1.25 | $10.00 |
| gemini-2.5-flash | $0.30 | $2.50 |
| gemini-2.5-flash-lite | $0.10 | $0.40 |
| mistral-large-latest | $2.00 | $6.00 |
| mistral-small-latest | $0.10 | $0.30 |
| deepseek-chat | $0.27 | $1.10 |
| deepseek-reasoner | $0.55 | $2.19 |
| grok-4-1-fast | $5.00 | $25.00 |
| grok-3-beta | $3.00 | $15.00 |
| MiniMax-M2.1 | $0.40 | $1.60 |
| qwen3.5-plus | $0.57 | $2.29 |
| qwen3-max-instruct | $1.60 | $6.40 |
| meta-llama/llama-4-maverick | $0.20 | $0.60 |
| meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |

Note: Unlisted models still work — cost calculation returns $0.00 with a warning. Pricing table is for reference only, not a validation gate.
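
The $0.00-with-a-warning fallback for unlisted models might look like this (a sketch; `KNOWN_PRICES` and the warning text are illustrative, not the skill's actual code):

```python
import warnings

# Illustrative subset of the pricing table (USD per 1M tokens: input, output)
KNOWN_PRICES = {
    "deepseek-chat": (0.27, 1.10),
    "gemini-2.5-flash": (0.30, 2.50),
}

def lookup_price(model_id):
    """Return (input, output) per-1M prices; $0.00 with a warning if unlisted."""
    if model_id in KNOWN_PRICES:
        return KNOWN_PRICES[model_id]
    warnings.warn(f"No pricing for {model_id!r}; cost will be reported as $0.00")
    return (0.0, 0.0)
```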

Performance Metrics

Every test measures:

  • Latency — Response time in milliseconds
  • 💰 Cost — Exact API cost per request (input + output tokens)
  • 🎯 Quality — Response quality score (0–100)
  • 📊 Token Usage — Input and output token counts
  • 🔄 Consistency — Variance across multiple test runs
  • Error Tracking — API failures, timeouts, rate limits
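
The consistency metric (variance across multiple runs) can be sketched as a coefficient-of-variation score over per-run latencies. The formula below is illustrative; the skill's actual scoring method is not published:

```python
from statistics import mean, stdev

def consistency_score(latencies_ms):
    """0-100 score: 100 means identical runs, lower means more variance.

    Penalizes by the coefficient of variation (stdev / mean).
    Illustrative formula, not the skill's published scoring method.
    """
    if len(latencies_ms) < 2:
        return 100.0  # a single run has no measurable variance
    cv = stdev(latencies_ms) / mean(latencies_ms)
    return max(0.0, 100.0 * (1.0 - cv))
```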

Smart Recommendations

Get instant answers to:

  • Which model is fastest for your prompt?
  • Which is most cost-effective?
  • Which produces best quality responses?
  • How much can you save by switching providers?

📊 Real-World Example

PROMPT: "Write a professional customer service response about a delayed shipment"

┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE              │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  523ms                                                 │
│ Cost:     $0.000025                                             │
│ Quality:  65/100                                                │
│ Tokens:   28 in / 87 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK                        │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  710ms                                                 │
│ Cost:     $0.000048                                             │
│ Quality:  70/100                                                │
│ Tokens:   28 in / 92 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER             │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  891ms                                                 │
│ Cost:     $0.000145                                             │
│ Quality:  78/100                                                │
│ Tokens:   28 in / 102 out                                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY                          │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  645ms                                                 │
│ Cost:     $0.000402                                             │
│ Quality:  88/100                                                │
│ Tokens:   28 in / 98 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE OPUS 4.6 (Anthropic) 🏆 HIGHEST QUALITY                 │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  1,234ms                                               │
│ Cost:     $0.001875                                             │
│ Quality:  94/100                                                │
│ Tokens:   28 in / 125 out                                       │
└─────────────────────────────────────────────────────────────────┘

🎯 RECOMMENDATIONS:
1. Most cost-effective: Gemini 2.5 Flash-Lite ($0.000025/request) — 99.98% cheaper than Opus
2. Budget pick: DeepSeek Chat ($0.000048/request) — strong quality at low cost
3. Best quality: Claude Opus 4.6 (94/100) — state-of-the-art reasoning & analysis
4. Smart pick: Claude Haiku 4.5 ($0.000145/request) — 81% cheaper, 83% quality match
5. Speed + Quality: GPT-5.2 ($0.000402/request) — excellent quality at mid-range cost

💡 Potential monthly savings (10,000 requests/day, 28 input + 115 output tokens avg):
   - Using Gemini 2.5 Flash-Lite vs Opus: $903/month saved ($1.44 vs $904.50)
   - Using DeepSeek Chat vs Opus: $899/month saved ($4.50 vs $904.50)
   - Using Claude Haiku vs Opus: $731/month saved ($173.40 vs $904.50)

Use Cases

Production Deployment

  • Evaluate models before production selection
  • Compare cost vs quality tradeoffs
  • Benchmark API latency across providers

Prompt Development

  • Test prompt variations across models
  • Measure quality scores consistently
  • Compare performance metrics

Cost Analysis

  • Analyze LLM API spending by model
  • Compare provider pricing structures
  • Identify cost-efficient alternatives

Performance Testing

  • Measure latency and response times
  • Test consistency across multiple runs
  • Evaluate quality scores

🚀 Quick Start

1. Subscribe to Skill

Click "Subscribe" on ClawHub to get access.

2. Set API Keys

Add keys for the providers you want to test:

# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."

# Google (Gemini models)
export GOOGLE_API_KEY="AI..."

# DeepSeek
export DEEPSEEK_API_KEY="..."

# xAI (Grok models)
export XAI_API_KEY="..."

# MiniMax
export MINIMAX_API_KEY="..."

# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."

# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."

# Mistral
export MISTRAL_API_KEY="..."

You only need keys for the providers you plan to test.
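Before running a test, it can help to confirm that the keys for your chosen models are actually exported. This is a convenience sketch, not part of the skill; the prefix-to-key mapping below is abbreviated to four of the nine providers for illustration.

```python
import os

# Hypothetical pre-flight check: which provider keys are missing for the
# models you plan to test? (Mapping abbreviated for illustration.)
KEY_FOR_PREFIX = {
    "claude-": "ANTHROPIC_API_KEY",
    "gpt-": "OPENAI_API_KEY",
    "gemini-": "GOOGLE_API_KEY",
    "deepseek-": "DEEPSEEK_API_KEY",
}

def missing_keys(models):
    """Return the env var names that are needed but not set."""
    needed = {key for prefix, key in KEY_FOR_PREFIX.items()
              if any(m.startswith(prefix) for m in models)}
    return sorted(key for key in needed if not os.environ.get(key))

print(missing_keys(["claude-haiku-4-5-20251001", "gpt-5.2"]))
```

If this prints a non-empty list, export the listed keys before invoking the tester.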

3. Install Dependencies

# Install only what you need
pip install anthropic          # Claude
pip install openai             # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai  # Gemini
pip install mistralai          # Mistral

# Or install everything
pip install anthropic openai google-generativeai mistralai

4. Run Your First Test

Option A: Python

import os
from prompt_performance_tester import PromptPerformanceTester

tester = PromptPerformanceTester()  # reads API keys from environment

results = tester.test_prompt(
    prompt_text="Write a professional email apologizing for a delayed shipment",
    models=[
        "claude-haiku-4-5-20251001",
        "gpt-5.2",
        "gemini-2.5-flash",
        "deepseek-chat",
    ],
    num_runs=3,
    max_tokens=500
)

print(tester.format_results(results))
print(f"🏆 Best quality:  {results.best_model}")
print(f"💰 Cheapest:      {results.cheapest_model}")
print(f"⚡ Fastest:       {results.fastest_model}")

Option B: CLI

# Test across multiple models
prompt-tester test "Your prompt here" \
  --models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
  --runs 3

# Export results
prompt-tester test "Your prompt here" --export results.json

🔒 Security & Privacy

API Key Safety

  • Keys stored in environment variables only — never hardcoded or logged
  • Never transmitted to UnisAI servers
  • HTTPS encryption for all provider API calls

Data Privacy

  • Your prompts are sent only to the AI providers you select for testing
  • Each provider has their own data retention policy (see their privacy pages)
  • No data stored on UnisAI infrastructure

📚 Technical Details

System Requirements

  • Python: 3.9+
  • Dependencies: anthropic, openai, google-generativeai, mistralai (install only what you need)
  • Platform: macOS, Linux, Windows

Architecture

  • Lazy client initialization — SDK clients only loaded for providers actually tested
  • Prefix-based routing — PROVIDER_MAP detects the provider from the model name prefix; no hardcoded whitelist
  • OpenAI-compat path — DeepSeek, xAI, MiniMax, Qwen, and OpenRouter all use the openai SDK with a custom base_url
  • Pricing table — used for cost calculation only; unknown models get cost=0 with a warning
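The prefix-routing idea can be sketched as follows. This is a hypothetical reconstruction of the behavior described above; the skill's actual PROVIDER_MAP may differ in detail.

```python
# Hypothetical sketch of prefix-based provider routing; the skill's real
# PROVIDER_MAP may differ.
PROVIDER_MAP = {
    "claude-": "anthropic",
    "gpt-": "openai", "o1": "openai", "o3": "openai",
    "gemini-": "google",
    "mistral-": "mistral", "mixtral-": "mistral",
    "deepseek-": "deepseek",
    "grok-": "xai",
    "minimax": "minimax",
    "qwen": "alibaba",
    "meta-llama/": "openrouter", "llama-": "openrouter",
}

def detect_provider(model_id: str) -> str:
    lowered = model_id.lower()
    # Try longer prefixes first so "meta-llama/..." never falls through
    # to the shorter "llama-" entry.
    for prefix in sorted(PROVIDER_MAP, key=len, reverse=True):
        if lowered.startswith(prefix):
            return PROVIDER_MAP[prefix]
    raise ValueError(f"No supported provider prefix matches {model_id!r}")

print(detect_provider("claude-sonnet-4-6"))            # anthropic
print(detect_provider("meta-llama/llama-4-maverick"))  # openrouter
```

Because any matching prefix routes the request, a brand-new model ID works the day it ships, with no code change.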

Metrics Collected

Every test captures:

  • Latency: Total response time (ms)
  • Cost: Input + output cost based on known pricing (USD)
  • Quality: Heuristic response score based on length, completeness (0–100)
  • Tokens: Exact input/output token counts per provider
  • Consistency: Standard deviation across multiple runs
  • Errors: Timeouts, rate limits, API failures
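The cost and consistency definitions above can be checked by hand. A minimal sketch, using prices from the skill's published table (function names are illustrative):

```python
import statistics

def request_cost(in_tokens, out_tokens, in_price, out_price):
    # Prices are quoted per 1M tokens, so scale accordingly.
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# 28 input / 115 output tokens at Claude Haiku 4.5's listed $1/$5 per 1M:
print(f"${request_cost(28, 115, 1.00, 5.00):.6f}")  # $0.000603

# Consistency = standard deviation of per-run latencies (ms):
latencies_ms = [891, 874, 902]
print(f"{statistics.stdev(latencies_ms):.1f} ms")
```

Unknown models skip the first calculation and report $0.00, as noted in the Architecture section.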

❓ Frequently Asked Questions

Q: Do I need API keys for all 9 providers? A: No. You only need keys for the providers you want to test. If you only test Claude models, you only need ANTHROPIC_API_KEY.

Q: Who pays for the API costs? A: You do. You provide your own API keys and pay each provider directly. This skill has no per-request fees.

Q: How accurate are the cost calculations? A: Costs are calculated from the known pricing table using actual token counts. Models not in the pricing table still run — their cost is simply reported as $0.00 with a warning.

Q: Can I test models not in the pricing table? A: Yes. Any model whose name starts with a supported prefix will run. Cost will show as $0.00 for unlisted models.

Q: Can I test prompts in non-English languages? A: Yes. All supported providers handle multiple languages.

Q: Can I use this in production/CI/CD? A: Yes. Import PromptPerformanceTester directly from Python or call via CLI.

Q: What if my prompt is very long? A: Set max_tokens appropriately. The skill passes your prompt as-is to each provider's API.


🗺️ Roadmap

✅ Current Release (v1.1.8)

  • Model-agnostic architecture — any model ID works via prefix detection
  • 9 providers, 20 known models with pricing
  • DeepSeek, xAI Grok, MiniMax, Qwen, Meta Llama as first-class providers
  • Claude 4.6 series (opus-4-6, sonnet-4-6)
  • Lazy client initialization — only loads SDKs for providers actually used
  • Fixed UnisAI branding throughout

🚧 Coming Soon (v1.3)

  • Batch testing: Test 100+ prompts simultaneously
  • Historical tracking: Track model performance over time
  • Webhook integrations: Slack, Discord, email notifications

🔮 Future (v1.4+)

  • A/B testing framework: Scientific prompt experimentation
  • Fine-tuning insights: Which models to fine-tune for your use case
  • Custom benchmarks: Create your own evaluation criteria
  • Auto-optimization: AI-powered prompt improvement suggestions

📞 Support

  • Email: support@unisai.vercel.app
  • Website: https://unisai.vercel.app
  • Bug Reports: support@unisai.vercel.app

📄 License & Terms

This skill is distributed via ClawHub under the following terms.

✅ You CAN:

  • Use for your own business and projects
  • Test prompts for internal applications
  • Modify source code for personal use

❌ You CANNOT:

  • Redistribute outside the ClawHub registry
  • Resell or sublicense
  • Use UnisAI trademark without permission

Full Terms: See LICENSE.md


📝 Changelog

[1.1.8] - 2026-02-27

Fixes & Polish

  • Bumped version to 1.1.8
  • SKILL.md fully rewritten — cleaned up formatting, removed stale content
  • Removed old IP watermark reference (PROPRIETARY_SKILL_VEDANT_2024) from docs
  • Corrected watermark to PROPRIETARY_SKILL_UNISAI_2026_MULTI_PROVIDER throughout
  • Fixed all UnisAI branding (was UniAI in v1.1.0 changelog)
  • Updated pricing table to include all 20 known models
  • Cleaned up FAQ, Quick Start, and Use Cases sections

[1.1.6] - 2026-02-27

🏗️ Model-Agnostic Architecture

  • Provider auto-detected from model name prefix — no hardcoded whitelist
  • Any new model works automatically without code changes
  • Added DeepSeek, xAI Grok, MiniMax, Qwen, Meta Llama as first-class providers (9 total)
  • Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)
  • Lazy client initialization — only loads SDKs for providers actually tested
  • Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter

[1.1.5] - 2026-02-01

🚀 Latest Models Update

  • GPT-5.2 Series — Added Instant, Thinking, and Pro variants
  • Gemini 2.5 Series — Updated to 2.5 Pro, Flash, and Flash-Lite
  • Claude 4.5 pricing updates
  • 10 total models across 3 providers

[1.1.0] - 2026-01-15

✨ Major Features

  • Multi-provider support — Claude, GPT, Gemini
  • Cross-provider cost comparison
  • Enhanced recommendations engine
  • Rebranded to UnisAI

[1.0.0] - 2024-02-02

Initial Release

  • Claude-only prompt testing (Haiku, Sonnet, Opus)
  • Performance metrics: latency, cost, quality, consistency
  • Basic recommendations engine

Last Updated: February 27, 2026
Current Version: 1.1.8
Status: Active & Maintained

© 2026 UnisAI. All rights reserved.

File v1.1.9:_meta.json

{ "ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy", "slug": "prompt-performance-tester", "version": "1.1.9", "publishedAt": 1772213259522 }

File v1.1.9:LICENSE.md

UniAI Skills - Proprietary License

Version 1.0 | Effective Date: February 2, 2024

1. GRANT OF LICENSE

UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the ClawHub Skills ("Software") solely in accordance with the terms of this license agreement.

2. LICENSE RESTRICTIONS

You may NOT:

  • Reverse engineer, decompile, or disassemble the Software
  • Modify, alter, or create derivative works of the Software
  • Remove, obscure, or alter any proprietary notices or labels on the Software
  • Share, distribute, or sublicense the Software to any third party
  • Use the Software for commercial purposes without a commercial license
  • Access or use the Software beyond the scope of your subscription tier
  • Attempt to circumvent licensing controls or API rate limits

3. INTELLECTUAL PROPERTY RIGHTS

All intellectual property rights in and to the Software are retained by Licensor. This includes:

  • Source code and object code
  • Algorithms and methodologies
  • Performance optimization techniques
  • Quality scoring mechanisms
  • Proprietary data structures
  • Trade secrets and confidential information

4. PERMITTED USES

You may only:

  • Use the Software as provided through the ClawHub platform
  • Access features available in your subscription tier
  • Create test results and reports for internal use
  • Share results with your team (if on a team plan)
  • Provide feedback to improve the Software

5. SUBSCRIPTION TIERS

Starter (Free)

  • 5 tests per month
  • 2 models per test
  • Basic features
  • Personal use only

Professional ($29/month)

  • Unlimited tests
  • All models supported
  • Advanced analytics
  • API access
  • Commercial use permitted

Enterprise ($99/month)

  • Team collaboration
  • White-label option
  • Custom integrations
  • Dedicated support
  • SLA guarantees

6. API KEY AND CREDENTIALS

  • You are responsible for keeping your API keys confidential
  • Do not share your license key with others
  • One license per person/organization
  • License keys are non-transferable
  • Unauthorized sharing may result in account termination

7. DATA PRIVACY

  • We do not retain your test data by default
  • Free tier: 30-day retention
  • Paid tiers: 90-day retention
  • You can request data deletion anytime
  • See Privacy Policy for full details

8. WARRANTY DISCLAIMER

THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:

  • Merchantability
  • Fitness for a particular purpose
  • Non-infringement
  • Accuracy of results

9. LIMITATION OF LIABILITY

IN NO EVENT SHALL LICENSOR BE LIABLE FOR:

  • Any indirect, incidental, special, or consequential damages
  • Loss of data, revenue, or profits
  • Business interruption
  • Even if advised of the possibility of such damages

10. TERMINATION

Licensor may terminate your license if you:

  • Violate any terms of this agreement
  • Fail to pay subscription fees
  • Attempt to reverse engineer the Software
  • Share your license key with others
  • Use the Software unlawfully

Upon termination:

  • Your access to the Software is immediately revoked
  • You must destroy all copies of the Software in your possession
  • Any refunds are subject to our refund policy

11. COMMERCIAL LICENSE

To use the Software for commercial purposes:

  • Starter tier: Personal use only
  • Professional tier: Commercial use permitted
  • Enterprise tier: Team commercial use permitted

For commercial use with Starter tier, contact: hello@unisai.vercel.app

12. THIRD-PARTY SERVICES

The Software uses third-party services (e.g., Anthropic API). Your use is also subject to their terms of service:

  • Anthropic: https://www.anthropic.com/terms
  • OpenAI: https://openai.com/terms (if applicable)

13. MODIFICATIONS TO SOFTWARE

Licensor reserves the right to:

  • Update the Software at any time
  • Add or remove features
  • Change pricing (with 30 days notice)
  • Discontinue the Software (with 60 days notice)

14. COMPLIANCE

You agree to comply with all applicable laws and regulations in your jurisdiction when using the Software.

15. DISPUTE RESOLUTION

Any disputes arising from this agreement shall be:

  • Resolved through binding arbitration
  • Governed by California law
  • Conducted in English

16. ENTIRE AGREEMENT

This agreement, along with our Privacy Policy and Terms of Service, constitutes the entire agreement between you and Licensor regarding the Software.

17. CONTACT

For licensing inquiries or support:

  • Email: hello@unisai.vercel.app
  • Website: https://unisai.vercel.app
  • Support: vedxnts@gmail.com
  • X: vedxnts

By using the Software, you acknowledge that you have read, understood, and agree to be bound by this License Agreement.

© 2026 UniAI. All rights reserved.

File v1.1.9:manifest.yaml

name: "Prompt Performance Tester"
id: "prompt-performance-tester"
version: "1.1.8"
description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."

homepage: "https://unisai.vercel.app"
repository: "https://github.com/vedantsingh60/prompt-performance-tester"
source: "included"

intellectual_property:
  license: "free-to-use"
  license_file: "LICENSE.md"
  copyright: "© 2026 UnisAI. All rights reserved."
  distribution: "via-clawhub-only"
  source_code_access: "included"
  modification: "personal-use-only"
  reverse_engineering: "allowed-for-security-audit"

author:
  company: "UnisAI"
  contact: "hello@unisai.vercel.app"
  website: "https://unisai.vercel.app"

category: "ai-testing"
tags:

  • "prompt-testing"
  • "performance-analysis"
  • "cost-optimization"
  • "multi-llm"
  • "quality-assurance"
  • "benchmarking"
  • "llm-comparison"
  • "ai-testing"

pricing:
  model: "free"

runtime: "local"
execution: "python"

required_env_vars:

  • "ANTHROPIC_API_KEY" # Required if testing Claude models
  • "OPENAI_API_KEY" # Required if testing GPT models
  • "GOOGLE_API_KEY" # Required if testing Gemini models
  • "MISTRAL_API_KEY" # Required if testing Mistral models
  • "DEEPSEEK_API_KEY" # Required if testing DeepSeek models
  • "XAI_API_KEY" # Required if testing Grok/xAI models
  • "MINIMAX_API_KEY" # Required if testing MiniMax models
  • "DASHSCOPE_API_KEY" # Required if testing Qwen/Alibaba models
  • "OPENROUTER_API_KEY" # Required if testing Llama/OpenRouter models

primary_credential: "At least ONE provider API key is required per provider you want to test"

dependencies:
  python: ">=3.9"
  packages:
    - "anthropic>=0.40.0"
    - "openai>=1.60.0"
    - "google-generativeai>=0.8.0"
    - "mistralai>=1.3.0"
  install_all: "pip install anthropic openai google-generativeai mistralai"
  install_selective: |
    pip install anthropic            # Claude
    pip install openai               # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat)
    pip install google-generativeai  # Gemini
    pip install mistralai            # Mistral
  note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL."
  requirements_file: "requirements.txt"

security:
  data_retention: "0 days"
  data_flow: "prompts-sent-to-chosen-ai-providers"
  third_party_data_sharing: |
    WARNING: This skill sends your prompts to whichever AI providers you
    select for testing. Each provider has their own data retention and
    privacy policies:
    - Anthropic: https://www.anthropic.com/legal/privacy
    - OpenAI: https://openai.com/policies/privacy-policy
    - Google: https://ai.google.dev/gemini-api/terms
    - Mistral: https://mistral.ai/terms/
    - DeepSeek: https://www.deepseek.com/privacy_policy
    - xAI: https://x.ai/privacy
    - OpenRouter: https://openrouter.ai/privacy
  api_key_storage: "Environment variables only — never hardcoded or logged"
  network_access: "Required to call chosen AI provider APIs"

capabilities:
  functions:
    - name: "testPrompt"
      description: "Test a prompt across multiple LLM models and providers"
      parameters:
        prompt_text:
          type: "string"
          description: "The prompt to benchmark"
          required: true
        models:
          type: "array"
          description: "List of model IDs to test — any model matching a supported prefix works"
          items:
            type: "string"
          examples:
            - "claude-sonnet-4-6"
            - "gpt-5.2"
            - "deepseek-chat"
            - "grok-4-1-fast"
            - "gemini-2.5-flash"
          required: false
        num_runs:
          type: "number"
          description: "Number of runs per model for consistency testing"
          default: 1
          range: [1, 10]
        system_prompt:
          type: "string"
          description: "Optional system prompt"
        max_tokens:
          type: "number"
          description: "Maximum response tokens"
          default: 1000
          range: [100, 4000]

environment_variables:
  ANTHROPIC_API_KEY:
    description: "Anthropic API key — required for any claude-* model"
    required_for_prefix: "claude-"
  OPENAI_API_KEY:
    description: "OpenAI API key — required for any gpt-*, o1*, or o3* model"
    required_for_prefix: "gpt-, o1, o3"
  GOOGLE_API_KEY:
    description: "Google AI API key — required for any gemini-* model"
    required_for_prefix: "gemini-"
  MISTRAL_API_KEY:
    description: "Mistral API key — required for mistral-* or mixtral-* models"
    required_for_prefix: "mistral-, mixtral-"
  DEEPSEEK_API_KEY:
    description: "DeepSeek API key — required for any deepseek-* model"
    required_for_prefix: "deepseek-"
  XAI_API_KEY:
    description: "xAI API key — required for any grok-* model"
    required_for_prefix: "grok-"
  MINIMAX_API_KEY:
    description: "MiniMax API key — required for minimax* or MiniMax* models"
    required_for_prefix: "minimax, MiniMax"
  DASHSCOPE_API_KEY:
    description: "Alibaba DashScope API key — required for any qwen* model"
    required_for_prefix: "qwen"
  OPENROUTER_API_KEY:
    description: "OpenRouter API key — required for meta-llama/* or llama-* models"
    required_for_prefix: "meta-llama/, llama-"

support:
  support_email: "support@unisai.vercel.app"
  website: "https://unisai.vercel.app"
  github: "https://github.com/vedantsingh60/prompt-performance-tester"
  documentation: "See SKILL.md in this package"
  response_time: "Best effort — community supported"

restrictions:

  • "No redistribution outside ClawHub registry"
  • "No resale or sublicensing"
  • "No trademark usage without permission"
  • "Modifications allowed for personal use only"

changelog:
  "1.1.8":
    - "🏗️ Model-agnostic architecture — provider auto-detected from model name prefix, no hardcoded whitelist"
    - "✨ Added DeepSeek, xAI Grok, MiniMax, Qwen as first-class providers (9 total)"
    - "✨ Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)"
    - "✨ Any future model works automatically without code changes"
    - "🔧 Lazy client initialization — only loads SDKs for providers actually used"
    - "🔧 Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter"
    - "📝 Fixed UnisAI branding (was UniAI)"
    - "💰 Updated pricing table with 20 models across 9 providers"
  "1.1.5":
    - "🚀 Updated to latest 2026 models"
    - "✨ GPT-5.2 series (Instant, Thinking, Pro)"
    - "✨ Gemini 3 Pro and 2.5 series"
    - "✨ Claude 4.5 pricing updates"
    - "✨ 10 total models across 3 providers"
  "1.1.0":
    - "✨ Multi-provider support (Claude, GPT, Gemini)"
    - "✨ Cross-provider cost comparison"
    - "✨ Enhanced recommendations engine"
  "1.0.0":
    - "Initial release with Claude-only support"
    - "Performance metrics: latency, cost, quality, consistency"

metadata:
  status: "active"
  created_at: "2024-02-02T00:00:00Z"
  updated_at: "2026-02-27T00:00:00Z"
  maturity: "production"
  maintenance: "actively-maintained"
  compatibility:
    - "OpenClaw v1.0+"
    - "Claude Code"
    - "ClawHub v2.0+"
  security_audit: "Source code included for security review and transparency"

Archive v1.1.8: 5 files, 17509 bytes

Files: LICENSE.md (4799b), manifest.yaml (7490b), prompt_performance_tester.py (22862b), SKILL.md (17974b), _meta.json (144b)

File v1.1.8:SKILL.md

Prompt Performance Tester

Model-agnostic prompt benchmarking across 9 providers.

Pass any model ID — provider auto-detected. Compare latency, cost, quality, and consistency across Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, and Mistral.


🚀 Why This Skill?

Problem Statement

Comparing LLM models across providers requires manual testing:

  • No systematic way to measure performance across models
  • Cost differences are significant but not easily comparable
  • Quality varies by use case and provider
  • Manual API testing is time-consuming and error-prone

The Solution

Test prompts across any model from any supported provider simultaneously. Get performance metrics and recommendations based on latency, cost, and quality.

Example Cost Comparison

For 10,000 requests/day with average 28 input + 115 output tokens:

  • Claude Opus 4.6: ~$30.15/day ($903/month)
  • Gemini 2.5 Flash-Lite: ~$0.05/day ($1.50/month)
  • DeepSeek Chat: ~$0.14/day ($4.20/month)
  • Monthly cost difference (Opus vs Flash-Lite): $901.50

✨ What You Get

Model-Agnostic Multi-Provider Testing

Pass any model ID — provider is auto-detected from the model name prefix. No hardcoded list; new models work without code changes.

| Provider | Example Models | Prefix | Required Key | |----------|---------------|--------|--------------| | Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 | claude- | ANTHROPIC_API_KEY | | OpenAI | gpt-5.2-pro, gpt-5.2, gpt-5.1 | gpt-, o1, o3 | OPENAI_API_KEY | | Google | gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite | gemini- | GOOGLE_API_KEY | | Mistral | mistral-large-latest, mistral-small-latest | mistral-, mixtral- | MISTRAL_API_KEY | | DeepSeek | deepseek-chat, deepseek-reasoner | deepseek- | DEEPSEEK_API_KEY | | xAI | grok-4-1-fast, grok-3-beta | grok- | XAI_API_KEY | | MiniMax | MiniMax-M2.1 | MiniMax, minimax | MINIMAX_API_KEY | | Qwen | qwen3.5-plus, qwen3-max-instruct | qwen | DASHSCOPE_API_KEY | | Meta Llama | meta-llama/llama-4-maverick, meta-llama/llama-3.3-70b-instruct | meta-llama/, llama- | OPENROUTER_API_KEY |

Known Pricing (per 1M tokens)

| Model | Input | Output | |-------|-------|--------| | claude-opus-4-6 | $15.00 | $75.00 | | claude-sonnet-4-6 | $3.00 | $15.00 | | claude-haiku-4-5-20251001 | $1.00 | $5.00 | | gpt-5.2-pro | $21.00 | $168.00 | | gpt-5.2 | $1.75 | $14.00 | | gpt-5.1 | $2.00 | $8.00 | | gemini-2.5-pro | $1.25 | $10.00 | | gemini-2.5-flash | $0.30 | $2.50 | | gemini-2.5-flash-lite | $0.10 | $0.40 | | mistral-large-latest | $2.00 | $6.00 | | mistral-small-latest | $0.10 | $0.30 | | deepseek-chat | $0.27 | $1.10 | | deepseek-reasoner | $0.55 | $2.19 | | grok-4-1-fast | $5.00 | $25.00 | | grok-3-beta | $3.00 | $15.00 | | MiniMax-M2.1 | $0.40 | $1.60 | | qwen3.5-plus | $0.57 | $2.29 | | qwen3-max-instruct | $1.60 | $6.40 | | meta-llama/llama-4-maverick | $0.20 | $0.60 | | meta-llama/llama-3.3-70b-instruct | $0.59 | $0.79 |

Note: Unlisted models still work — cost calculation returns $0.00 with a warning. Pricing table is for reference only, not a validation gate.

Performance Metrics

Every test measures:

  • Latency — Response time in milliseconds
  • 💰 Cost — Exact API cost per request (input + output tokens)
  • 🎯 Quality — Response quality score (0–100)
  • 📊 Token Usage — Input and output token counts
  • 🔄 Consistency — Variance across multiple test runs
  • Error Tracking — API failures, timeouts, rate limits

Smart Recommendations

Get instant answers to:

  • Which model is fastest for your prompt?
  • Which is most cost-effective?
  • Which produces best quality responses?
  • How much can you save by switching providers?

📊 Real-World Example

PROMPT: "Write a professional customer service response about a delayed shipment"

┌─────────────────────────────────────────────────────────────────┐
│ GEMINI 2.5 FLASH-LITE (Google) 💰 MOST AFFORDABLE              │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  523ms                                                 │
│ Cost:     $0.000025                                             │
│ Quality:  65/100                                                │
│ Tokens:   28 in / 87 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEEK CHAT (DeepSeek) 💡 BUDGET PICK                        │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  710ms                                                 │
│ Cost:     $0.000048                                             │
│ Quality:  70/100                                                │
│ Tokens:   28 in / 92 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE HAIKU 4.5 (Anthropic) 🚀 BALANCED PERFORMER             │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  891ms                                                 │
│ Cost:     $0.000145                                             │
│ Quality:  78/100                                                │
│ Tokens:   28 in / 102 out                                       │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ GPT-5.2 (OpenAI) 💡 EXCELLENT QUALITY                          │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  645ms                                                 │
│ Cost:     $0.000402                                             │
│ Quality:  88/100                                                │
│ Tokens:   28 in / 98 out                                        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ CLAUDE OPUS 4.6 (Anthropic) 🏆 HIGHEST QUALITY                 │
├─────────────────────────────────────────────────────────────────┤
│ Latency:  1,234ms                                               │
│ Cost:     $0.001875                                             │
│ Quality:  94/100                                                │
│ Tokens:   28 in / 125 out                                       │
└─────────────────────────────────────────────────────────────────┘

🎯 RECOMMENDATIONS:
1. Most cost-effective: Gemini 2.5 Flash-Lite ($0.000025/request) — 99.98% cheaper than Opus
2. Budget pick: DeepSeek Chat ($0.000048/request) — strong quality at low cost
3. Best quality: Claude Opus 4.6 (94/100) — state-of-the-art reasoning & analysis
4. Smart pick: Claude Haiku 4.5 ($0.000145/request) — 81% cheaper, 83% quality match
5. Speed + Quality: GPT-5.2 ($0.000402/request) — excellent quality at mid-range cost

💡 Potential monthly savings (10,000 requests/day, 28 input + 115 output tokens avg):
   - Using Gemini 2.5 Flash-Lite vs Opus: $903/month saved ($1.44 vs $904.50)
   - Using DeepSeek Chat vs Opus: $899/month saved ($4.50 vs $904.50)
   - Using Claude Haiku vs Opus: $731/month saved ($173.40 vs $904.50)

Use Cases

Production Deployment

  • Evaluate models before production selection
  • Compare cost vs quality tradeoffs
  • Benchmark API latency across providers

Prompt Development

  • Test prompt variations across models
  • Measure quality scores consistently
  • Compare performance metrics

Cost Analysis

  • Analyze LLM API spending by model
  • Compare provider pricing structures
  • Identify cost-efficient alternatives

Performance Testing

  • Measure latency and response times
  • Test consistency across multiple runs
  • Evaluate quality scores

🚀 Quick Start

1. Subscribe to Skill

Click "Subscribe" on ClawhHub to get access.

2. Set API Keys

Add keys for the providers you want to test:

# Anthropic (Claude models)
export ANTHROPIC_API_KEY="sk-ant-..."

# OpenAI (GPT models)
export OPENAI_API_KEY="sk-..."

# Google (Gemini models)
export GOOGLE_API_KEY="AI..."

# DeepSeek
export DEEPSEEK_API_KEY="..."

# xAI (Grok models)
export XAI_API_KEY="..."

# MiniMax
export MINIMAX_API_KEY="..."

# Alibaba (Qwen models)
export DASHSCOPE_API_KEY="..."

# OpenRouter (Meta Llama models)
export OPENROUTER_API_KEY="..."

# Mistral
export MISTRAL_API_KEY="..."

You only need keys for the providers you plan to test.

3. Install Dependencies

# Install only what you need
pip install anthropic          # Claude
pip install openai             # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama
pip install google-generativeai  # Gemini
pip install mistralai          # Mistral

# Or install everything
pip install anthropic openai google-generativeai mistralai

4. Run Your First Test

Option A: Python

import os
from prompt_performance_tester import PromptPerformanceTester

tester = PromptPerformanceTester()  # reads API keys from environment

results = tester.test_prompt(
    prompt_text="Write a professional email apologizing for a delayed shipment",
    models=[
        "claude-haiku-4-5-20251001",
        "gpt-5.2",
        "gemini-2.5-flash",
        "deepseek-chat",
    ],
    num_runs=3,
    max_tokens=500
)

print(tester.format_results(results))
print(f"🏆 Best quality:  {results.best_model}")
print(f"💰 Cheapest:      {results.cheapest_model}")
print(f"⚡ Fastest:       {results.fastest_model}")

Option B: CLI

# Test across multiple models
prompt-tester test "Your prompt here" \
  --models claude-haiku-4-5-20251001 gpt-5.2 gemini-2.5-flash deepseek-chat \
  --runs 3

# Export results
prompt-tester test "Your prompt here" --export results.json

🔒 Security & Privacy

API Key Safety

  • Keys stored in environment variables only — never hardcoded or logged
  • Never transmitted to UnisAI servers
  • HTTPS encryption for all provider API calls

Data Privacy

  • Your prompts are sent only to the AI providers you select for testing
  • Each provider has their own data retention policy (see their privacy pages)
  • No data stored on UnisAI infrastructure

📚 Technical Details

System Requirements

  • Python: 3.9+
  • Dependencies: anthropic, openai, google-generativeai, mistralai (install only what you need)
  • Platform: macOS, Linux, Windows

Architecture

  • Lazy client initialization — SDK clients only loaded for providers actually tested
  • Prefix-based routingPROVIDER_MAP detects provider from model name; no hardcoded whitelist
  • OpenAI-compat path — DeepSeek, xAI, MiniMax, Qwen, and OpenRouter all use the openai SDK with a custom base_url
  • Pricing table — used for cost calculation only; unknown models get cost=0 with a warning

Metrics Collected

Every test captures:

  • Latency: Total response time (ms)
  • Cost: Input + output cost based on known pricing (USD)
  • Quality: Heuristic response score based on length, completeness (0–100)
  • Tokens: Exact input/output token counts per provider
  • Consistency: Standard deviation across multiple runs
  • Errors: Timeouts, rate limits, API failures

❓ Frequently Asked Questions

Q: Do I need API keys for all 9 providers? A: No. You only need keys for the providers you want to test. If you only test Claude models, you only need ANTHROPIC_API_KEY.

Q: Who pays for the API costs? A: You do. You provide your own API keys and pay each provider directly. This skill has no per-request fees.

Q: How accurate are the cost calculations? A: Costs are calculated from the known pricing table using actual token counts. Models not in the pricing table return $0.00 — the model still runs, the cost just won't be shown.

Q: Can I test models not in the pricing table? A: Yes. Any model whose name starts with a supported prefix will run. Cost will show as $0.00 for unlisted models.

Q: Can I test prompts in non-English languages? A: Yes. All supported providers handle multiple languages.

Q: Can I use this in production/CI/CD? A: Yes. Import PromptPerformanceTester directly from Python or call via CLI.

Q: What if my prompt is very long? A: Set max_tokens appropriately. The skill passes your prompt as-is to each provider's API.


🗺️ Roadmap

✅ Current Release (v1.1.8)

  • Model-agnostic architecture — any model ID works via prefix detection
  • 9 providers, 20 known models with pricing
  • DeepSeek, xAI Grok, MiniMax, Qwen, Meta Llama as first-class providers
  • Claude 4.6 series (opus-4-6, sonnet-4-6)
  • Lazy client initialization — only loads SDKs for providers actually used
  • Fixed UnisAI branding throughout

🚧 Coming Soon (v1.3)

  • Batch testing: Test 100+ prompts simultaneously
  • Historical tracking: Track model performance over time
  • Webhook integrations: Slack, Discord, email notifications

🔮 Future (beyond v1.3)

  • A/B testing framework: Scientific prompt experimentation
  • Fine-tuning insights: Which models to fine-tune for your use case
  • Custom benchmarks: Create your own evaluation criteria
  • Auto-optimization: AI-powered prompt improvement suggestions

📞 Support

  • Email: support@unisai.vercel.app
  • Website: https://unisai.vercel.app
  • Bug Reports: support@unisai.vercel.app

📄 License & Terms

This skill is distributed via ClawHub under the following terms.

✅ You CAN:

  • Use for your own business and projects
  • Test prompts for internal applications
  • Modify source code for personal use

❌ You CANNOT:

  • Redistribute outside the ClawHub registry
  • Resell or sublicense
  • Use UnisAI trademark without permission

Full Terms: See LICENSE.md


📝 Changelog

[1.1.8] - 2026-02-27

Fixes & Polish

  • Bumped version to 1.1.8
  • SKILL.md fully rewritten — cleaned up formatting, removed stale content
  • Removed old IP watermark reference (PROPRIETARY_SKILL_VEDANT_2024) from docs
  • Corrected watermark to PROPRIETARY_SKILL_UNISAI_2026_MULTI_PROVIDER throughout
  • Fixed all UnisAI branding (was UniAI in v1.1.0 changelog)
  • Updated pricing table to include all 20 known models
  • Cleaned up FAQ, Quick Start, and Use Cases sections

[1.1.6] - 2026-02-27

🏗️ Model-Agnostic Architecture

  • Provider auto-detected from model name prefix — no hardcoded whitelist
  • Any new model works automatically without code changes
  • Added DeepSeek, xAI Grok, MiniMax, Qwen, Meta Llama as first-class providers (9 total)
  • Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)
  • Lazy client initialization — only loads SDKs for providers actually tested
  • Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter

[1.1.5] - 2026-02-01

🚀 Latest Models Update

  • GPT-5.2 Series — Added Instant, Thinking, and Pro variants
  • Gemini 2.5 Series — Updated to 2.5 Pro, Flash, and Flash-Lite
  • Claude 4.5 pricing updates
  • 10 total models across 3 providers

[1.1.0] - 2026-01-15

✨ Major Features

  • Multi-provider support — Claude, GPT, Gemini
  • Cross-provider cost comparison
  • Enhanced recommendations engine
  • Rebranded to UnisAI

[1.0.0] - 2024-02-02

Initial Release

  • Claude-only prompt testing (Haiku, Sonnet, Opus)
  • Performance metrics: latency, cost, quality, consistency
  • Basic recommendations engine

Last Updated: February 27, 2026 · Current Version: 1.1.8 · Status: Active & Maintained

© 2026 UnisAI. All rights reserved.

File v1.1.8:_meta.json

{
  "ownerId": "kn77yjs5esft2kgsd6dpz9c92n80dgsy",
  "slug": "prompt-performance-tester",
  "version": "1.1.8",
  "publishedAt": 1772213247786
}

File v1.1.8:LICENSE.md

UniAI Skills - Proprietary License

Version 1.0 | Effective Date: February 2, 2024

1. GRANT OF LICENSE

UniAI ("Licensor") grants you ("Licensee") a limited, non-exclusive, non-transferable, revocable license to use the ClawHub Skills ("Software") solely in accordance with the terms of this license agreement.

2. LICENSE RESTRICTIONS

You may NOT:

  • Reverse engineer, decompile, or disassemble the Software
  • Modify, alter, or create derivative works of the Software
  • Remove, obscure, or alter any proprietary notices or labels on the Software
  • Share, distribute, or sublicense the Software to any third party
  • Use the Software for commercial purposes without a commercial license
  • Access or use the Software beyond the scope of your subscription tier
  • Attempt to circumvent licensing controls or API rate limits

3. INTELLECTUAL PROPERTY RIGHTS

All intellectual property rights in and to the Software are retained by Licensor. This includes:

  • Source code and object code
  • Algorithms and methodologies
  • Performance optimization techniques
  • Quality scoring mechanisms
  • Proprietary data structures
  • Trade secrets and confidential information

4. PERMITTED USES

You may only:

  • Use the Software as provided through the ClawHub platform
  • Access features available in your subscription tier
  • Create test results and reports for internal use
  • Share results with your team (if on a team plan)
  • Provide feedback to improve the Software

5. SUBSCRIPTION TIERS

Starter (Free)

  • 5 tests per month
  • 2 models per test
  • Basic features
  • Personal use only

Professional ($29/month)

  • Unlimited tests
  • All models supported
  • Advanced analytics
  • API access
  • Commercial use permitted

Enterprise ($99/month)

  • Team collaboration
  • White-label option
  • Custom integrations
  • Dedicated support
  • SLA guarantees

6. API KEY AND CREDENTIALS

  • You are responsible for keeping your API keys confidential
  • Do not share your license key with others
  • One license per person/organization
  • License keys are non-transferable
  • Unauthorized sharing may result in account termination

7. DATA PRIVACY

  • We do not retain your test data by default
  • Free tier: 30-day retention
  • Paid tiers: 90-day retention
  • You can request data deletion anytime
  • See Privacy Policy for full details

8. WARRANTY DISCLAIMER

THE SOFTWARE IS PROVIDED "AS-IS" WITHOUT ANY WARRANTIES. LICENSOR DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO:

  • Merchantability
  • Fitness for a particular purpose
  • Non-infringement
  • Accuracy of results

9. LIMITATION OF LIABILITY

IN NO EVENT SHALL LICENSOR BE LIABLE FOR:

  • Any indirect, incidental, special, or consequential damages
  • Loss of data, revenue, or profits
  • Business interruption
  • Even if advised of the possibility of such damages

10. TERMINATION

Licensor may terminate your license if you:

  • Violate any terms of this agreement
  • Fail to pay subscription fees
  • Attempt to reverse engineer the Software
  • Share your license key with others
  • Use the Software unlawfully

Upon termination:

  • Your access to the Software is immediately revoked
  • You must destroy all copies of the Software in your possession
  • Any refunds are subject to our refund policy

11. COMMERCIAL LICENSE

To use the Software for commercial purposes:

  • Starter tier: Personal use only
  • Professional tier: Commercial use permitted
  • Enterprise tier: Team commercial use permitted

For commercial use with Starter tier, contact: hello@unisai.vercel.app

12. THIRD-PARTY SERVICES

The Software uses third-party services (e.g., Anthropic API). Your use is also subject to their terms of service:

  • Anthropic: https://www.anthropic.com/terms
  • OpenAI: https://openai.com/terms (if applicable)

13. MODIFICATIONS TO SOFTWARE

Licensor reserves the right to:

  • Update the Software at any time
  • Add or remove features
  • Change pricing (with 30 days notice)
  • Discontinue the Software (with 60 days notice)

14. COMPLIANCE

You agree to comply with all applicable laws and regulations in your jurisdiction when using the Software.

15. DISPUTE RESOLUTION

Any disputes arising from this agreement shall be:

  • Resolved through binding arbitration
  • Governed by California law
  • Conducted in English

16. ENTIRE AGREEMENT

This agreement, along with our Privacy Policy and Terms of Service, constitutes the entire agreement between you and Licensor regarding the Software.

17. CONTACT

For licensing inquiries or support:

  • Email: hello@unisai.vercel.app
  • Website: https://unisai.vercel.app
  • Support: vedxnts@gmail.com
  • X: vedxnts

By using the Software, you acknowledge that you have read, understood, and agree to be bound by this License Agreement.

© 2026 UniAI. All rights reserved.

File v1.1.8:manifest.yaml

name: "Prompt Performance Tester"
id: "prompt-performance-tester"
version: "1.1.8"
description: "Model-agnostic prompt benchmarking across 9 providers. Pass any model ID from Claude, GPT, Gemini, DeepSeek, Grok, MiniMax, Qwen, Llama, Mistral — provider auto-detected. Measures latency, cost, quality, and consistency."

homepage: "https://unisai.vercel.app"
repository: "https://github.com/vedantsingh60/prompt-performance-tester"
source: "included"

intellectual_property:
  license: "free-to-use"
  license_file: "LICENSE.md"
  copyright: "© 2026 UnisAI. All rights reserved."
  distribution: "via-clawhub-only"
  source_code_access: "included"
  modification: "personal-use-only"
  reverse_engineering: "allowed-for-security-audit"

author:
  company: "UnisAI"
  contact: "hello@unisai.vercel.app"
  website: "https://unisai.vercel.app"

category: "ai-testing"
tags:
  - "prompt-testing"
  - "performance-analysis"
  - "cost-optimization"
  - "multi-llm"
  - "quality-assurance"
  - "benchmarking"
  - "llm-comparison"
  - "ai-testing"

pricing:
  model: "free"

runtime: "local"
execution: "python"

required_env_vars:
  - "ANTHROPIC_API_KEY"   # Required if testing Claude models
  - "OPENAI_API_KEY"      # Required if testing GPT models
  - "GOOGLE_API_KEY"      # Required if testing Gemini models
  - "MISTRAL_API_KEY"     # Required if testing Mistral models
  - "DEEPSEEK_API_KEY"    # Required if testing DeepSeek models
  - "XAI_API_KEY"         # Required if testing Grok/xAI models
  - "MINIMAX_API_KEY"     # Required if testing MiniMax models
  - "DASHSCOPE_API_KEY"   # Required if testing Qwen/Alibaba models
  - "OPENROUTER_API_KEY"  # Required if testing Llama/OpenRouter models
primary_credential: "An API key is required for each provider you want to test"

dependencies:
  python: ">=3.9"
  packages:
    - "anthropic>=0.40.0"
    - "openai>=1.60.0"
    - "google-generativeai>=0.8.0"
    - "mistralai>=1.3.0"
  install_all: "pip install anthropic openai google-generativeai mistralai"
  install_selective: |
    pip install anthropic            # Claude
    pip install openai               # GPT, DeepSeek, xAI, MiniMax, Qwen, Llama (OpenAI-compat)
    pip install google-generativeai  # Gemini
    pip install mistralai            # Mistral
  note: "Install only the SDKs for the providers you plan to test. DeepSeek, xAI, MiniMax, Qwen, and Llama all use the openai package with a custom base URL."
  requirements_file: "requirements.txt"

security:
  data_retention: "0 days"
  data_flow: "prompts-sent-to-chosen-ai-providers"
  third_party_data_sharing: |
    WARNING: This skill sends your prompts to whichever AI providers you
    select for testing. Each provider has their own data retention and
    privacy policies:
    - Anthropic: https://www.anthropic.com/legal/privacy
    - OpenAI: https://openai.com/policies/privacy-policy
    - Google: https://ai.google.dev/gemini-api/terms
    - Mistral: https://mistral.ai/terms/
    - DeepSeek: https://www.deepseek.com/privacy_policy
    - xAI: https://x.ai/privacy
    - OpenRouter: https://openrouter.ai/privacy
  api_key_storage: "Environment variables only — never hardcoded or logged"
  network_access: "Required to call chosen AI provider APIs"

capabilities:
  functions:
    - name: "testPrompt"
      description: "Test a prompt across multiple LLM models and providers"
      parameters:
        prompt_text:
          type: "string"
          description: "The prompt to benchmark"
          required: true
        models:
          type: "array"
          description: "List of model IDs to test — any model matching a supported prefix works"
          items:
            type: "string"
          examples:
            - "claude-sonnet-4-6"
            - "gpt-5.2"
            - "deepseek-chat"
            - "grok-4-1-fast"
            - "gemini-2.5-flash"
          required: false
        num_runs:
          type: "number"
          description: "Number of runs per model for consistency testing"
          default: 1
          range: [1, 10]
        system_prompt:
          type: "string"
          description: "Optional system prompt"
        max_tokens:
          type: "number"
          description: "Maximum response tokens"
          default: 1000
          range: [100, 4000]

environment_variables:
  ANTHROPIC_API_KEY:
    description: "Anthropic API key — required for any claude-* model"
    required_for_prefix: "claude-"
  OPENAI_API_KEY:
    description: "OpenAI API key — required for any gpt-, o1, o3* model"
    required_for_prefix: "gpt-, o1, o3"
  GOOGLE_API_KEY:
    description: "Google AI API key — required for any gemini-* model"
    required_for_prefix: "gemini-"
  MISTRAL_API_KEY:
    description: "Mistral API key — required for mistral-, mixtral- models"
    required_for_prefix: "mistral-, mixtral-"
  DEEPSEEK_API_KEY:
    description: "DeepSeek API key — required for any deepseek-* model"
    required_for_prefix: "deepseek-"
  XAI_API_KEY:
    description: "xAI API key — required for any grok-* model"
    required_for_prefix: "grok-"
  MINIMAX_API_KEY:
    description: "MiniMax API key — required for minimax* or MiniMax* models"
    required_for_prefix: "minimax, MiniMax"
  DASHSCOPE_API_KEY:
    description: "Alibaba DashScope API key — required for any qwen* model"
    required_for_prefix: "qwen"
  OPENROUTER_API_KEY:
    description: "OpenRouter API key — required for meta-llama/* or llama-* models"
    required_for_prefix: "meta-llama/, llama-"

support:
  support_email: "support@unisai.vercel.app"
  website: "https://unisai.vercel.app"
  github: "https://github.com/vedantsingh60/prompt-performance-tester"
  documentation: "See SKILL.md in this package"
  response_time: "Best effort — community supported"

restrictions:
  - "No redistribution outside the ClawHub registry"
  - "No resale or sublicensing"
  - "No trademark usage without permission"
  - "Modifications allowed for personal use only"

changelog:
  "1.1.8":
    - "🏗️ Model-agnostic architecture — provider auto-detected from model name prefix, no hardcoded whitelist"
    - "✨ Added DeepSeek, xAI Grok, MiniMax, Qwen as first-class providers (9 total)"
    - "✨ Updated Claude to 4.6 series (claude-opus-4-6, claude-sonnet-4-6)"
    - "✨ Any future model works automatically without code changes"
    - "🔧 Lazy client initialization — only loads SDKs for providers actually used"
    - "🔧 Unified OpenAI-compat path for DeepSeek, xAI, MiniMax, Qwen, OpenRouter"
    - "📝 Fixed UnisAI branding (was UniAI)"
    - "💰 Updated pricing table with 20 models across 9 providers"
  "1.1.5":
    - "🚀 Updated to latest 2026 models"
    - "✨ GPT-5.2 series (Instant, Thinking, Pro)"
    - "✨ Gemini 3 Pro and 2.5 series"
    - "✨ Claude 4.5 pricing updates"
    - "✨ 10 total models across 3 providers"
  "1.1.0":
    - "✨ Multi-provider support (Claude, GPT, Gemini)"
    - "✨ Cross-provider cost comparison"
    - "✨ Enhanced recommendations engine"
  "1.0.0":
    - "Initial release with Claude-only support"
    - "Performance metrics: latency, cost, quality, consistency"

metadata:
  status: "active"
  created_at: "2024-02-02T00:00:00Z"
  updated_at: "2026-02-27T00:00:00Z"
  maturity: "production"
  maintenance: "actively-maintained"
  compatibility:
    - "OpenClaw v1.0+"
    - "Claude Code"
    - "ClawHub v2.0+"
  security_audit: "Source code included for security review and transparency"

API & Reliability

Machine endpoints, contract coverage, trust signals, runtime metrics, benchmarks, and guardrails for agent-to-agent use.

MissingCLAWHUB

Machine interfaces

Contract & API

Contract coverage

Status

missing

Auth

None

Streaming

No

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.
Invocation examples
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot"
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract"
curl -s "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust"

Operational fit

Reliability & Benchmarks

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.
No benchmark suites or observed failure patterns are available.

Machine Appendix

Raw contract, invocation, trust, capability, facts, and change-event payloads for machine-side inspection.

MissingCLAWHUB

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "CLAWHUB",
      "generatedAt": "2026-04-17T02:48:13.304Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
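The retry policy in the guide above can be applied client-side. A minimal sketch, assuming the retryable condition is surfaced as a RuntimeError message — that error-signalling convention is an assumption for illustration, not part of the published payload:

```python
import time

# Documented policy: up to 3 attempts, 500/1500/3500 ms backoff,
# retry on HTTP_429, HTTP_503, and NETWORK_TIMEOUT.
BACKOFF_MS = [500, 1500, 3500]
RETRYABLE = {"HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"}

def call_with_retry(fn, max_attempts: int = 3, sleep=time.sleep):
    """Call fn(); on a retryable condition, back off and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError as exc:
            condition = str(exc)
            # Re-raise non-retryable conditions and the final failure.
            if condition not in RETRYABLE or attempt == max_attempts - 1:
                raise
            sleep(BACKOFF_MS[attempt] / 1000.0)
```

Injecting `sleep` keeps the sketch testable; in production the default `time.sleep` applies the documented backoff schedule.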

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}

Facts JSON

[
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Clawhub",
    "href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "traction",
    "category": "adoption",
    "label": "Adoption signal",
    "value": "1.9K downloads",
    "href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "latest_release",
    "category": "release",
    "label": "Latest release",
    "value": "1.1.9",
    "href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceType": "release",
    "confidence": "medium",
    "observedAt": "2026-02-27T17:27:39.522Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-vedantsingh60-prompt-performance-tester/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "release",
    "title": "Release 1.1.9",
    "description": "- Updated provider/model lists to include more example models and the latest pricing. - Expanded cost comparison examples to include DeepSeek Chat. - Added clarification that unlisted models are supported, but cost is shown as $0.00 with a warning. - Improved explanation of quality, cost, and performance metrics for broader clarity. - Enhanced recommendation and real-world example sections to better showcase DeepSeek and new models.",
    "href": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceUrl": "https://clawhub.ai/vedantsingh60/prompt-performance-tester",
    "sourceType": "release",
    "confidence": "medium",
    "observedAt": "2026-02-27T17:27:39.522Z",
    "isPublic": true
  }
]
