Crawler Summary
An MCP (Model Context Protocol) server that provides a panel of **33 specialized judges** to evaluate AI-generated code for security, cost, and quality — acting as an independent quality gate regardless of which project is being reviewed. Includes **built-in AST analysis** powered by the TypeScript Compiler API — no separate parser server needed. Highlights: an **App Builder Workflow (3-step)** demo for release decisions. Capability contract not published. No trust telemetry is available yet. 5 GitHub stars reported by the source. Last updated 2/25/2026.
Freshness
Last checked 2/25/2026
Best For
@kevinrabun/judges is best for MCP-server workflows (tagged mcp, model-context-protocol, mcp-server) where MCP compatibility matters.
Not Ideal For
Workflows that require deterministic execution, since capability contract metadata is missing or unavailable.
Evidence Sources Checked
editorial-content, GITHUB MCP, runtime-metrics, public facts pack
Public facts
5
Change events
1
Artifacts
0
Freshness
Feb 25, 2026
Trust score
Unknown
Compatibility
MCP
Freshness
Feb 25, 2026
Vendor
Kevinrabun
Artifacts
0
Benchmarks
0
Last release
2.1.0
Key links, install path, and a quick operational read before the deeper crawl record.
Summary
Capability contract not published. No trust telemetry is available yet. 5 GitHub stars reported by the source. Last updated 2/25/2026.
Setup snapshot
git clone https://github.com/KevinRabun/judges.git

Setup complexity is MEDIUM. Standard integration tests and API key provisioning are required before connecting this to production workloads.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Kevinrabun
Protocol compatibility
MCP
Adoption signal
5 GitHub stars
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
6
Snippets
0
Languages
typescript
bash
git clone https://github.com/KevinRabun/judges.git
cd judges
npm install
npm run build
bash
npm run demo
bash
npm test
json
{
"servers": {
"judges": {
"command": "node",
"args": ["/absolute/path/to/judges/dist/index.js"]
}
}
}

Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB MCP
Editorial quality
ready
An MCP (Model Context Protocol) server that provides a panel of 33 specialized judges to evaluate AI-generated code — acting as an independent quality gate regardless of which project is being reviewed. Includes built-in AST analysis powered by the TypeScript Compiler API — no separate parser server needed.
Highlights:
git clone https://github.com/KevinRabun/judges.git
cd judges
npm install
npm run build
Run the included demo to see all 33 judges evaluate a purposely flawed API server:
npm run demo
This evaluates examples/sample-vulnerable-api.ts — a file intentionally packed with security holes, performance anti-patterns, and code quality issues — and prints a full verdict with per-judge scores and findings.
The demo now also includes an App Builder Workflow (3-step) section. In a single run, you get both tribunal output and workflow output:
a release decision (Ship now / Ship with caution / Do not ship) and prioritized P0/P1 items.

Sample workflow output (truncated):
╔══════════════════════════════════════════════════════════════╗
║ App Builder Workflow Demo (3-Step) ║
╚══════════════════════════════════════════════════════════════╝
Decision : Do not ship
Verdict : FAIL (47/100)
Risk Counts : Critical 24 | High 27 | Medium 55
Step 2 — Plain-Language Findings:
- [CRITICAL] DATA-001: Hardcoded password detected
What: ...
Why : ...
Next: ...
Step 3 — Prioritized Tasks:
- P0 | DEVELOPER | Effort L | DATA-001
Task: ...
Done: ...
AI-Fixable Now (P0/P1):
- P0 DATA-001: ...
Sample tribunal output (truncated):
╔══════════════════════════════════════════════════════════════╗
║ Judges Panel — Full Tribunal Demo ║
╚══════════════════════════════════════════════════════════════╝
Overall Verdict : FAIL
Overall Score : 43/100
Critical Issues : 15
High Issues : 17
Total Findings : 83
Judges Run : 33
Per-Judge Breakdown:
────────────────────────────────────────────────────────────────
❌ Judge Data Security 0/100 7 finding(s)
❌ Judge Cybersecurity 0/100 7 finding(s)
❌ Judge Cost Effectiveness 52/100 5 finding(s)
⚠️ Judge Scalability 65/100 4 finding(s)
❌ Judge Cloud Readiness 61/100 4 finding(s)
❌ Judge Software Practices 45/100 6 finding(s)
❌ Judge Accessibility 0/100 8 finding(s)
❌ Judge API Design 0/100 9 finding(s)
❌ Judge Reliability 54/100 3 finding(s)
❌ Judge Observability 45/100 5 finding(s)
❌ Judge Performance 27/100 5 finding(s)
❌ Judge Compliance 0/100 4 finding(s)
⚠️ Judge Testing 90/100 1 finding(s)
⚠️ Judge Documentation 70/100 4 finding(s)
⚠️ Judge Internationalization 65/100 4 finding(s)
⚠️ Judge Dependency Health 90/100 1 finding(s)
❌ Judge Concurrency 44/100 4 finding(s)
❌ Judge Ethics & Bias 65/100 2 finding(s)
❌ Judge Maintainability 52/100 4 finding(s)
❌ Judge Error Handling 27/100 3 finding(s)
❌ Judge Authentication 0/100 4 finding(s)
❌ Judge Database 0/100 5 finding(s)
❌ Judge Caching 62/100 3 finding(s)
❌ Judge Configuration Mgmt 0/100 3 finding(s)
⚠️ Judge Backwards Compat 80/100 2 finding(s)
⚠️ Judge Portability 72/100 2 finding(s)
❌ Judge UX 52/100 4 finding(s)
❌ Judge Logging Privacy 0/100 4 finding(s)
❌ Judge Rate Limiting 27/100 4 finding(s)
⚠️ Judge CI/CD 80/100 2 finding(s)
npm test
Runs automated tests covering all judges, AST parsers, markdown formatters, and edge cases.
Add the Judges Panel as an MCP server so your AI coding assistant can use it automatically.
VS Code — create .vscode/mcp.json in your project:
{
"servers": {
"judges": {
"command": "node",
"args": ["/absolute/path/to/judges/dist/index.js"]
}
}
}
Claude Desktop — add to claude_desktop_config.json:
{
"mcpServers": {
"judges": {
"command": "node",
"args": ["/absolute/path/to/judges/dist/index.js"]
}
}
}
Or install from npm instead of cloning:
npm install -g @kevinrabun/judges
Then use judges as the command in your MCP config (no args needed).
Yes — users can include Judges as part of GitHub-based review workflows, with one important caveat:
copilot-pull-request-reviewer on GitHub does not currently let you directly attach arbitrary local MCP servers the same way VS Code does.

Create .github/workflows/judges-pr-review.yml:
name: Judges PR Review
on:
pull_request:
types: [opened, synchronize, reopened]
jobs:
judges:
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: 20
cache: npm
- name: Install
run: npm ci
- name: Generate Judges report
run: |
npx tsx -e "import { generateRepoReportFromLocalPath } from './src/reports/public-repo-report.ts';
const result = generateRepoReportFromLocalPath({
repoPath: process.cwd(),
outputPath: 'judges-pr-report.md',
maxFiles: 600,
maxFindingsInReport: 150,
});
console.log('Overall:', result.overallVerdict, result.averageScore);"
- name: Upload report artifact
uses: actions/upload-artifact@v4
with:
name: judges-pr-report
path: judges-pr-report.md
This gives every PR a reproducible Judges output your team (and Copilot) can reference.
Add .github/instructions/judges.instructions.md with guidance such as:
When reviewing pull requests:
1. Read the latest Judges report artifact/check output first.
2. Prioritize CRITICAL and HIGH findings in remediation guidance.
3. If findings conflict, defer to security/compliance-related Judges.
4. Include rule IDs (e.g., DATA-001, CYBER-004) in suggested fixes.
This helps keep Copilot feedback aligned with Judges findings.
| Judge | Domain | Rule Prefix | What It Evaluates |
|-------|--------|-------------|-------------------|
| Data Security | Data Security & Privacy | DATA- | Encryption, PII handling, secrets management, access controls |
| Cybersecurity | Cybersecurity & Threat Defense | CYBER- | Injection attacks, XSS, CSRF, auth flaws, OWASP Top 10 |
| Cost Effectiveness | Cost Optimization | COST- | Algorithm efficiency, N+1 queries, memory waste, caching strategy |
| Scalability | Scalability & Performance | SCALE- | Statelessness, horizontal scaling, concurrency, bottlenecks |
| Cloud Readiness | Cloud-Native & DevOps | CLOUD- | 12-Factor compliance, containerization, graceful shutdown, IaC |
| Software Practices | Engineering Best Practices | SWDEV- | SOLID principles, type safety, error handling, input validation |
| Accessibility | Accessibility (a11y) | A11Y- | WCAG compliance, screen reader support, keyboard navigation, ARIA |
| API Design | API Design & Contracts | API- | REST conventions, versioning, pagination, error responses |
| Reliability | Reliability & Resilience | REL- | Error handling, timeouts, retries, circuit breakers |
| Observability | Observability & Monitoring | OBS- | Structured logging, health checks, metrics, tracing |
| Performance | Performance & Efficiency | PERF- | N+1 queries, sync I/O, caching, memory leaks |
| Compliance | Regulatory Compliance | COMP- | GDPR/CCPA, PII protection, consent, data retention, audit trails |
| Data Sovereignty | Data Sovereignty & Jurisdictional Controls | SOV- | Data residency, cross-border transfer controls, jurisdiction-aware routing, sovereignty guardrails |
| Testing | Testing & Quality Assurance | TEST- | Test coverage, assertions, test isolation, naming |
| Documentation | Documentation & Readability | DOC- | JSDoc/docstrings, magic numbers, TODOs, code comments |
| Internationalization | Internationalization (i18n) | I18N- | Hardcoded strings, locale handling, currency formatting |
| Dependency Health | Dependency Management | DEPS- | Version pinning, deprecated packages, supply chain |
| Concurrency | Concurrency & Async Safety | CONC- | Race conditions, unbounded parallelism, missing await |
| Ethics & Bias | Ethics & Bias | ETHICS- | Demographic logic, dark patterns, inclusive language |
| Maintainability | Code Maintainability & Technical Debt | MAINT- | Any types, magic numbers, deep nesting, dead code, file length |
| Error Handling | Error Handling & Fault Tolerance | ERR- | Empty catch blocks, missing error handlers, swallowed errors |
| Authentication | Authentication & Authorization | AUTH- | Hardcoded creds, missing auth middleware, token in query params |
| Database | Database Design & Query Efficiency | DB- | SQL injection, N+1 queries, connection pooling, transactions |
| Caching | Caching Strategy & Data Freshness | CACHE- | Unbounded caches, missing TTL, no HTTP cache headers |
| Configuration Mgmt | Configuration & Secrets Management | CFG- | Hardcoded secrets, missing env vars, config validation |
| Backwards Compat | Backwards Compatibility & Versioning | COMPAT- | API versioning, breaking changes, response consistency |
| Portability | Platform Portability & Vendor Independence | PORTA- | OS-specific paths, vendor lock-in, hardcoded hosts |
| UX | User Experience & Interface Quality | UX- | Loading states, error messages, pagination, destructive actions |
| Logging Privacy | Logging Privacy & Data Redaction | LOGPRIV- | PII in logs, token logging, structured logging, redaction |
| Rate Limiting | Rate Limiting & Throttling | RATE- | Missing rate limits, unbounded queries, backoff strategy |
| CI/CD | CI/CD Pipeline & Deployment Safety | CICD- | Test infrastructure, lint config, Docker tags, build scripts |
| Code Structure | Structural Analysis (AST) | STRUCT- | Cyclomatic complexity, nesting depth, function length, dead code, type safety |
| Agent Instructions | Agent Instruction Markdown Quality & Safety | AGENT- | Instruction hierarchy, conflict detection, unsafe overrides, scope, validation, policy guidance |
The tribunal operates in three layers:
Pattern-Based Analysis — All tools (evaluate_code, evaluate_code_single_judge, evaluate_project, evaluate_diff) perform heuristic analysis using regex pattern matching to catch common anti-patterns. This works entirely offline, with zero external API calls.
AST-Based Structural Analysis — The Code Structure judge (STRUCT-* rules) uses real Abstract Syntax Tree parsing to measure cyclomatic complexity, nesting depth, function length, parameter count, dead code, and type safety with precision that regex cannot achieve. JavaScript/TypeScript files are parsed via the TypeScript Compiler API; Python, Rust, Go, Java, and C# use a scope-tracking structural parser. No external AST server required.
LLM-Powered Deep Analysis (Prompts) — The server exposes MCP prompts (e.g., judge-data-security, full-tribunal) that provide each judge's expert persona as a system prompt. When used by an LLM-based client, this enables deeper, context-aware analysis beyond what static analysis can detect.
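The pattern-based layer can be pictured as a table of regexes mapped to rule IDs. This is a minimal sketch only: the rule IDs below (DATA-001, CYBER-004) are borrowed from examples in the docs, but the patterns and the actual shipped rule set are assumptions.

```typescript
// Minimal sketch of the regex-based heuristic layer. Rule IDs are
// taken from examples in the docs; the patterns themselves are
// illustrative assumptions, not the server's real rules.

interface Finding {
  ruleId: string;
  severity: "critical" | "high" | "medium";
  line: number;       // 1-based
  message: string;
}

const PATTERNS: Array<{
  ruleId: string;
  severity: Finding["severity"];
  regex: RegExp;
  message: string;
}> = [
  { ruleId: "DATA-001", severity: "critical", regex: /password\s*=\s*["'][^"']+["']/i, message: "Hardcoded password detected" },
  { ruleId: "CYBER-004", severity: "high", regex: /\beval\s*\(/, message: "Use of eval()" },
];

function heuristicScan(code: string): Finding[] {
  const findings: Finding[] = [];
  code.split("\n").forEach((lineText, i) => {
    for (const p of PATTERNS) {
      if (p.regex.test(lineText)) {
        findings.push({ ruleId: p.ruleId, severity: p.severity, line: i + 1, message: p.message });
      }
    }
  });
  return findings;
}
```

Because this layer is plain string matching, it runs offline with no external API calls, which matches the trade-off described above.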
Judges Panel covers heuristic pattern detection and AST structural analysis in a single server — fast, offline, and self-contained. It does not try to be a CVE scanner or a linter. Those capabilities belong in dedicated MCP servers that an AI agent can orchestrate alongside Judges.
Unlike earlier versions that recommended a separate AST MCP server, Judges Panel now includes real AST-based structural analysis out of the box:
JavaScript/TypeScript is parsed with the TypeScript Compiler API (ts.createSourceFile) for a full-fidelity AST. The Code Structure judge (STRUCT-*) uses these parsers to accurately measure:
| Rule | Metric | Threshold |
|------|--------|-----------|
| STRUCT-001 | Cyclomatic complexity | > 10 per function (high) |
| STRUCT-002 | Nesting depth | > 4 levels (medium) |
| STRUCT-003 | Function length | > 50 lines (medium) |
| STRUCT-004 | Parameter count | > 5 parameters (medium) |
| STRUCT-005 | Dead code | Unreachable statements (low) |
| STRUCT-006 | Weak types | any, dynamic, Object, interface{}, unsafe (medium) |
| STRUCT-007 | File complexity | > 40 total cyclomatic complexity (high) |
| STRUCT-008 | Extreme complexity | > 20 per function (critical) |
| STRUCT-009 | Extreme parameters | > 8 parameters (high) |
| STRUCT-010 | Extreme function length | > 150 lines (high) |
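Applied to already-measured metrics, the thresholds in the table above might look like the sketch below. Whether the lower-tier rules also fire alongside their "extreme" counterparts is an assumption (the sketch suppresses them), and the metric extraction itself is omitted.

```typescript
// Sketch of the STRUCT-* threshold logic from the table above.
// Assumption: an "extreme" rule suppresses its lower-tier sibling.

interface FunctionMetrics {
  name: string;
  cyclomaticComplexity: number;
  nestingDepth: number;
  lineCount: number;
  parameterCount: number;
}

function structFindings(fn: FunctionMetrics): string[] {
  const out: string[] = [];
  if (fn.cyclomaticComplexity > 20) out.push("STRUCT-008");      // extreme complexity (critical)
  else if (fn.cyclomaticComplexity > 10) out.push("STRUCT-001"); // high complexity
  if (fn.nestingDepth > 4) out.push("STRUCT-002");               // deep nesting
  if (fn.parameterCount > 8) out.push("STRUCT-009");             // extreme parameters
  else if (fn.parameterCount > 5) out.push("STRUCT-004");        // too many parameters
  if (fn.lineCount > 150) out.push("STRUCT-010");                // extreme function length
  else if (fn.lineCount > 50) out.push("STRUCT-003");            // long function
  return out;
}
```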
When your AI coding assistant connects to multiple MCP servers, each one contributes its specialty:
┌─────────────────────────────────────────────────────────┐
│ AI Coding Assistant │
│ (Claude, Copilot, Cursor, etc.) │
└──────┬──────────────────┬──────────┬───────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌────────┐ ┌────────┐
│ Judges │ │ CVE / │ │ Linter │
│ Panel │ │ SBOM │ │ Server │
│ ─────────────│ └────────┘ └────────┘
│ 32 Heuristic │ Vuln DB Style &
│ judges │ scanning correctness
│ + AST judge │
└──────────────┘
Patterns +
structural
analysis
| Layer | What It Does | Example Servers |
|-------|-------------|-----------------|
| Judges Panel | 33-judge quality gate — security patterns, AST analysis, cost, scalability, a11y, compliance, sovereignty, ethics, dependency health, agent instruction governance | This server |
| CVE / SBOM | Vulnerability scanning against live databases — known CVEs, license risks, supply chain | OSV, Snyk, Trivy, Grype MCP servers |
| Linting | Language-specific style and correctness rules | ESLint, Ruff, Clippy MCP servers |
| Runtime Profiling | Memory, CPU, latency measurement on running code | Custom profiling MCP servers |
When you ask your AI assistant "Is this code production-ready?", the agent can, among other steps, check package.json against known vulnerabilities. Each server returns structured findings. The AI synthesizes everything into a single, actionable review — no single server needs to do it all.
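The multi-server synthesis described above can be sketched as a merge of structured findings. The server names and the finding shape below are assumptions for illustration; the actual payloads each MCP server returns will differ.

```typescript
// Hypothetical sketch: an agent merging structured findings from
// several MCP servers (Judges Panel, a CVE scanner, a linter) into
// one severity-ordered review. Shapes and server names are assumed.

interface ServerFinding {
  server: string;
  severity: "critical" | "high" | "medium" | "low";
  message: string;
}

const SEVERITY_ORDER = { critical: 0, high: 1, medium: 2, low: 3 } as const;

function mergeFindings(batches: ServerFinding[][]): ServerFinding[] {
  // Flatten each server's batch, then surface the worst issues first.
  return batches
    .flat()
    .sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);
}
```

The point of the design is that prioritization happens once, in the agent, rather than inside any single server.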
evaluate_v2

Run a V2 context-aware tribunal evaluation designed to raise feedback quality toward lead engineer/architect-level review:
Policy profiles: default, startup, regulated, healthcare, fintech, public-sector.

Supports:
- Single-file mode: code + language
- Project mode: files[]

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| code | string | conditional | Source code for single-file mode |
| language | string | conditional | Programming language for single-file mode |
| files | array | conditional | { path, content, language }[] for project mode |
| context | string | no | High-level review context |
| includeAstFindings | boolean | no | Include AST/code-structure findings (default: true) |
| minConfidence | number | no | Minimum finding confidence to include (0-1, default: 0) |
| policyProfile | enum | no | default, startup, regulated, healthcare, fintech, public-sector |
| evaluationContext | object | no | Structured architecture/constraint context |
| evidence | object | no | Runtime/operational evidence for confidence calibration |
evaluate_app_builder_flow

Run a 3-step app-builder workflow for technical and non-technical stakeholders:
Supports:
- Single-file mode: code + language
- Project mode: files[]
- Diff mode: code + language + changedLines[]

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| code | string | conditional | Full source content (code/diff mode) |
| language | string | conditional | Programming language (code/diff mode) |
| files | array | conditional | { path, content, language }[] for project mode |
| changedLines | number[] | no | 1-based changed lines for diff mode |
| context | string | no | Optional business/technical context |
| maxFindings | number | no | Max translated top findings (default: 10) |
| maxTasks | number | no | Max generated tasks (default: 20) |
| includeAstFindings | boolean | no | Include AST/code-structure findings (default: true) |
| minConfidence | number | no | Minimum finding confidence to include (0-1, default: 0) |
evaluate_public_repo_report

Clone a public repository URL, run the full judges panel across eligible source files, and generate a consolidated markdown report.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| repoUrl | string | yes | Public repository URL (https://...) |
| branch | string | no | Optional branch name |
| outputPath | string | no | Optional path to write report markdown |
| maxFiles | number | no | Max files analyzed (default: 600) |
| maxFileBytes | number | no | Max file size in bytes (default: 300000) |
| maxFindingsInReport | number | no | Max detailed findings in output (default: 150) |
| credentialMode | string | no | Credential detection mode: standard (default) or strict |
| includeAstFindings | boolean | no | Include AST/code-structure findings (default: true) |
| minConfidence | number | no | Minimum finding confidence to include (0-1, default: 0) |
| quickStart | flag | no | Opinionated high-signal defaults for onboarding (minConfidence=0.9, credentialMode=strict, path exclusions) |
| keepClone | boolean | no | Keep cloned repo on disk for inspection |
Quick examples
Generate a report from CLI:
npm run report:public-repo -- --repoUrl https://github.com/microsoft/vscode --output reports/vscode-judges-report.md
# stricter credential-signal mode (optional)
npm run report:public-repo -- --repoUrl https://github.com/openclaw/openclaw --credentialMode strict --output reports/openclaw-judges-report-strict.md
# judge findings only (exclude AST/code-structure findings)
npm run report:public-repo -- --repoUrl https://github.com/openclaw/openclaw --includeAstFindings false --output reports/openclaw-judges-report-no-ast.md
# show only findings at 80%+ confidence
npm run report:public-repo -- --repoUrl https://github.com/openclaw/openclaw --minConfidence 0.8 --output reports/openclaw-judges-report-high-confidence.md
# opinionated quick-start mode (recommended first run)
npm run report:quickstart -- --repoUrl https://github.com/openclaw/openclaw --output reports/openclaw-quickstart.md
Call from MCP client:
{
"tool": "evaluate_public_repo_report",
"arguments": {
"repoUrl": "https://github.com/microsoft/vscode",
"branch": "main",
"maxFiles": 400,
"maxFindingsInReport": 120,
"credentialMode": "strict",
"includeAstFindings": false,
"minConfidence": 0.8,
"outputPath": "reports/vscode-judges-report.md"
}
}
Typical response summary includes:
Sample report snippet:
# Public Repository Full Judges Report
Generated from https://github.com/microsoft/vscode on 2026-02-21T12:00:00.000Z.
## Executive Summary
- Overall verdict: WARNING
- Average file score: 78/100
- Total findings: 412 (critical 3, high 29, medium 114, low 185, info 81)
get_judges

List all available judges with their domains and descriptions.
evaluate_code

Submit code to the full judges panel. All 33 judges evaluate independently and return a combined verdict.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| code | string | yes | The source code to evaluate |
| language | string | yes | Programming language (e.g., typescript, python) |
| context | string | no | Additional context about the code |
| includeAstFindings | boolean | no | Include AST/code-structure findings (default: true) |
| minConfidence | number | no | Minimum finding confidence to include (0-1, default: 0) |
evaluate_code_single_judge

Submit code to a specific judge for targeted review.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| code | string | yes | The source code to evaluate |
| language | string | yes | Programming language |
| judgeId | string | yes | See judge IDs below |
| context | string | no | Additional context |
| minConfidence | number | no | Minimum finding confidence to include (0-1, default: 0) |
evaluate_project

Submit multiple files for project-level analysis. All 33 judges evaluate each file, plus cross-file architectural analysis detects code duplication, inconsistent error handling, and dependency cycles.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| files | array | yes | Array of { path, content, language } objects |
| context | string | no | Optional project context |
| includeAstFindings | boolean | no | Include AST/code-structure findings (default: true) |
| minConfidence | number | no | Minimum finding confidence to include (0-1, default: 0) |
evaluate_diff

Evaluate only the changed lines in a code diff. Runs all 33 judges on the full file but filters findings to lines you specify. Ideal for PR reviews and incremental analysis.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| code | string | yes | The full file content (post-change) |
| language | string | yes | Programming language |
| changedLines | number[] | yes | 1-based line numbers that were changed |
| context | string | no | Optional context about the change |
| includeAstFindings | boolean | no | Include AST/code-structure findings (default: true) |
| minConfidence | number | no | Minimum finding confidence to include (0-1, default: 0) |
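The filtering behavior described for evaluate_diff can be sketched from the parameter table: keep only findings on the changed lines that clear minConfidence. The Finding shape below is an assumption; only changedLines (1-based) and minConfidence come from the table.

```typescript
// Sketch of the evaluate_diff filtering step: judges run on the full
// file, then findings are narrowed to the changed lines and to the
// requested confidence floor. The finding shape is assumed.

interface DiffFinding {
  ruleId: string;
  line: number;       // 1-based, matching changedLines
  confidence: number; // 0-1, compared against minConfidence
}

function filterDiffFindings(
  findings: DiffFinding[],
  changedLines: number[],
  minConfidence = 0, // default: 0, i.e. keep everything on changed lines
): DiffFinding[] {
  const changed = new Set(changedLines);
  return findings.filter(f => changed.has(f.line) && f.confidence >= minConfidence);
}
```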
analyze_dependencies

Analyze a dependency manifest file for supply-chain risks, version pinning issues, typosquatting indicators, and dependency hygiene. Supports package.json, requirements.txt, Cargo.toml, go.mod, pom.xml, and .csproj files.
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| manifest | string | yes | Contents of the dependency manifest file |
| manifestType | string | yes | File type: package.json, requirements.txt, etc. |
| context | string | no | Optional context |
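One of the checks named above, version pinning, can be sketched for the package.json case. This is an illustrative heuristic only; the tool's actual rules and thresholds are not published here.

```typescript
// Illustrative sketch of a version-pinning check on a package.json
// manifest: flag semver ranges that allow versions to drift between
// installs. Not the actual analyze_dependencies rule set.

function findUnpinnedDeps(manifestJson: string): string[] {
  const manifest = JSON.parse(manifestJson) as {
    dependencies?: Record<string, string>;
  };
  const unpinned: string[] = [];
  for (const [name, version] of Object.entries(manifest.dependencies ?? {})) {
    // ^, ~, *, and "latest" all resolve to a moving target
    if (/^[\^~]/.test(version) || version === "*" || version === "latest") {
      unpinned.push(name);
    }
  }
  return unpinned;
}
```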
data-security · cybersecurity · cost-effectiveness · scalability · cloud-readiness · software-practices · accessibility · api-design · reliability · observability · performance · compliance · data-sovereignty · testing · documentation · internationalization · dependency-health · concurrency · ethics-bias · maintainability · error-handling · authentication · database · caching · configuration-management · backwards-compatibility · portability · ux · logging-privacy · rate-limiting · ci-cd · code-structure · agent-instructions
Each judge has a corresponding prompt for LLM-powered deep analysis:
| Prompt | Description |
|--------|-------------|
| judge-data-security | Deep data security review |
| judge-cybersecurity | Deep cybersecurity review |
| judge-cost-effectiveness | Deep cost optimization review |
| judge-scalability | Deep scalability review |
| judge-cloud-readiness | Deep cloud readiness review |
| judge-software-practices | Deep software practices review |
| judge-accessibility | Deep accessibility/WCAG review |
| judge-api-design | Deep API design review |
| judge-reliability | Deep reliability & resilience review |
| judge-observability | Deep observability & monitoring review |
| judge-performance | Deep performance optimization review |
| judge-compliance | Deep regulatory compliance review |
| judge-data-sovereignty | Deep data sovereignty and jurisdictional controls review |
| judge-testing | Deep testing quality review |
| judge-documentation | Deep documentation quality review |
| judge-internationalization | Deep i18n review |
| judge-dependency-health | Deep dependency health review |
| judge-concurrency | Deep concurrency & async safety review |
| judge-ethics-bias | Deep ethics & bias review |
| judge-maintainability | Deep maintainability & tech debt review |
| judge-error-handling | Deep error handling review |
| judge-authentication | Deep authentication & authorization review |
| judge-database | Deep database design & query review |
| judge-caching | Deep caching strategy review |
| judge-configuration-management | Deep configuration & secrets review |
| judge-backwards-compatibility | Deep backwards compatibility review |
| judge-portability | Deep platform portability review |
| judge-ux | Deep user experience review |
| judge-logging-privacy | Deep logging privacy review |
| judge-rate-limiting | Deep rate limiting review |
| judge-ci-cd | Deep CI/CD pipeline review |
| judge-code-structure | Deep AST-based structural analysis review |
| judge-agent-instructions | Deep review of agent instruction markdown quality and safety |
| full-tribunal | All 33 judges in a single prompt |
Each judge scores the code from 0 to 100:
| Severity | Score Deduction |
|----------|----------------|
| Critical | −30 points |
| High | −18 points |
| Medium | −10 points |
| Low | −5 points |
| Info | −2 points |
Verdict logic:
The overall tribunal score is the average of all 33 judges. The overall verdict fails if any judge fails.
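The deduction table and averaging rule above can be sketched directly; note that the per-judge pass/fail threshold is not stated here, so any concrete cutoff would be an assumption and is left out of the sketch.

```typescript
// Sketch of the published scoring model: each judge starts at 100,
// subtracts a fixed deduction per finding severity (floored at 0),
// and the tribunal score is the average across judges.

type Severity = "critical" | "high" | "medium" | "low" | "info";

const DEDUCTION: Record<Severity, number> = {
  critical: 30,
  high: 18,
  medium: 10,
  low: 5,
  info: 2,
};

function judgeScore(findingSeverities: Severity[]): number {
  const total = findingSeverities.reduce((sum, s) => sum + DEDUCTION[s], 0);
  return Math.max(0, 100 - total); // a judge's score never goes below 0
}

function tribunalScore(judgeScores: number[]): number {
  return Math.round(judgeScores.reduce((a, b) => a + b, 0) / judgeScores.length);
}
```

This explains the 0/100 scores in the demo output: a handful of critical findings (at −30 each) is enough to floor a judge.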
judges/
├── src/
│ ├── index.ts # MCP server entry point — tools, prompts, transport
│ ├── types.ts # TypeScript interfaces (Finding, JudgeEvaluation, etc.)
│ ├── ast/ # AST analysis engine (built-in, no external deps)
│ │ ├── index.ts # analyzeStructure() — routes to correct parser
│ │ ├── types.ts # FunctionInfo, CodeStructure interfaces
│ │ ├── typescript-ast.ts # TypeScript Compiler API parser (JS/TS)
│ │ └── structural-parser.ts # Scope-tracking parser (Python/Rust/Go/Java/C#)
│ ├── evaluators/ # Analysis engine for each judge
│ │ ├── index.ts # evaluateWithJudge(), evaluateWithTribunal(), evaluateProject(), etc.
│ │ ├── shared.ts # Scoring, verdict logic, markdown formatters
│ │ └── *.ts # One analyzer per judge (33 files)
│ ├── reports/
│ │ └── public-repo-report.ts # Public repo clone + full tribunal report generation
│ └── judges/ # Judge definitions (id, name, domain, system prompt)
│ ├── index.ts # JUDGES array, getJudge(), getJudgeSummaries()
│ └── *.ts # One definition per judge (33 files)
├── scripts/
│ ├── generate-public-repo-report.ts # Run: npm run report:public-repo -- --repoUrl <url>
│ └── daily-popular-repo-autofix.ts # Run: npm run automation:daily-popular
├── examples/
│ ├── sample-vulnerable-api.ts # Intentionally flawed code (triggers all judges)
│ └── demo.ts # Run: npm run demo
├── tests/
│ └── judges.test.ts # Run: npm test
├── server.json # MCP Registry manifest
├── package.json
├── tsconfig.json
└── README.md
| Command | Description |
|---------|-------------|
| npm run build | Compile TypeScript to dist/ |
| npm run dev | Watch mode — recompile on save |
| npm test | Run the full test suite |
| npm run demo | Run the sample tribunal demo |
| npm run report:public-repo -- --repoUrl <url> | Generate a full tribunal report for a public repository URL |
| npm run report:quickstart -- --repoUrl <url> | Run opinionated high-signal report defaults for fast adoption |
| npm run automation:daily-popular | Analyze up to 10 rotating popular repos/day and open up to 5 remediation PRs per repo |
| npm start | Start the MCP server |
| npm run clean | Remove dist/ |
This repo includes a scheduled workflow at .github/workflows/daily-popular-repo-autofix.yml that:
Required secret:
JUDGES_AUTOFIX_GH_TOKEN — GitHub token with permission to fork/push/create PRs for target repositories.

Manual run:
gh workflow run "Judges Daily Full-Run Autofix PRs" -f targetRepoUrl=https://github.com/owner/repo
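The daily automation is documented as analyzing up to 10 rotating popular repos per day. Below is one plausible rotation scheme, sketched under the assumption of a day-indexed window over a popularity-ordered list; the real selection logic lives in scripts/daily-popular-repo-autofix.ts and may differ:

```typescript
// Hypothetical day-based rotation over a popularity-ordered repo list.
// BATCH_SIZE matches the documented "up to 10 repos/day"; the list contents
// and the rotation formula are illustrative assumptions.

const BATCH_SIZE = 10;

function pickDailyBatch(repos: string[], dayIndex: number): string[] {
  if (repos.length === 0) return [];
  const start = (dayIndex * BATCH_SIZE) % repos.length;
  // Wrap around the list so every repo is eventually visited.
  return Array.from(
    { length: Math.min(BATCH_SIZE, repos.length) },
    (_, i) => repos[(start + i) % repos.length],
  );
}

const popular = Array.from({ length: 25 }, (_, i) => `owner/repo-${i}`);
console.log(pickDailyBatch(popular, 0)); // repos 0..9
console.log(pickDailyBatch(popular, 1)); // repos 10..19
```

With a 25-repo list, day 2 wraps: it yields repos 20..24 followed by 0..4, so no repo is permanently skipped.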
MIT
Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
- Status: missing
- Auth: None
- Streaming: No
- Data region: Unspecified

Protocol support
- Requires: none
- Forbidden: none

Guardrails
- Operational confidence: low
curl -s "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/snapshot"
curl -s "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/contract"
curl -s "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/trust"
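These endpoints pair naturally with the retry policy published in the invocation guide below (maxAttempts 3, backoff of 500/1500/3500 ms, retrying on HTTP 429, HTTP 503, and network timeouts). A minimal TypeScript sketch of one reading of that policy; the injectable fetchFn shape and the exact backoff placement are assumptions:

```typescript
// One reading of the published retry policy: up to MAX_ATTEMPTS tries,
// sleeping BACKOFF_MS[n-1] before retry n, and retrying only on HTTP 429,
// HTTP 503, or a thrown transport error (e.g. a network timeout).
// fetchFn is injectable (hypothetical shape) so the sketch needs no network.

const MAX_ATTEMPTS = 3;
const BACKOFF_MS = [500, 1500, 3500];

type FetchLike = (url: string) => Promise<{ status: number; body: string }>;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function fetchWithRetry(
  url: string,
  fetchFn: FetchLike,
  backoffMs: number[] = BACKOFF_MS,
): Promise<{ status: number; body: string }> {
  let lastError: unknown;
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    if (attempt > 0) await sleep(backoffMs[attempt - 1] ?? 0);
    try {
      const res = await fetchFn(url);
      if (res.status === 429 || res.status === 503) {
        lastError = new Error(`HTTP_${res.status}`); // retryable condition
        continue;
      }
      return res; // success or a non-retryable status: return as-is
    } catch (err) {
      lastError = err; // NETWORK_TIMEOUT and similar transport failures
    }
  }
  throw lastError;
}
```

With a stubbed fetchFn that returns 503 twice, this succeeds on the third attempt; a fourth failure would surface the last error to the caller.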
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
- Handshake: UNKNOWN
- Confidence: unknown
- Attempts (30d): unknown
- Fallback rate: unknown

Runtime metrics
- Observed P50: unknown
- Observed P95: unknown
- Rate limit: unknown
- Estimated cost: unknown
Do not use if: contract metadata is missing or unavailable for deterministic execution.
Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.
Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.
| Rank | Description | Traction | Freshness |
|------|-------------|----------|-----------|
| 83 | A Model Context Protocol (MCP) server for GitLab | No public download signal | Updated 2d ago |
| 80 | A Model Context Protocol (MCP) server for GitLab | No public download signal | Updated 2d ago |
| 74 | Expose OpenAPI definition endpoints as MCP tools using the official Rust SDK for the Model Context Protocol (https://github.com/modelcontextprotocol/rust-sdk) | No public download signal | Updated 2d ago |
| 72 | An actix_web backend for the official Rust SDK for the Model Context Protocol (https://github.com/modelcontextprotocol/rust-sdk) | No public download signal | Updated 2d ago |
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": [
"MCP"
]
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "GITHUB_MCP",
"generatedAt": "2026-04-17T05:13:39.737Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [
{
"key": "MCP",
"type": "protocol",
"support": "unknown",
"confidenceSource": "profile",
"notes": "Listed on profile"
},
{
"key": "mcp",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "model-context-protocol",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "mcp-server",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "code-review",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "security",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "tribunal",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "ai-code-review",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "code-quality",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "static-analysis",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
},
{
"key": "cli",
"type": "capability",
"support": "supported",
"confidenceSource": "profile",
"notes": "Declared in agent profile metadata"
}
],
"flattenedTokens": "protocol:MCP|unknown|profile capability:mcp|supported|profile capability:model-context-protocol|supported|profile capability:mcp-server|supported|profile capability:code-review|supported|profile capability:security|supported|profile capability:tribunal|supported|profile capability:ai-code-review|supported|profile capability:code-quality|supported|profile capability:static-analysis|supported|profile capability:cli|supported|profile"
}
Facts JSON
[
{
"factKey": "docs_crawl",
"category": "integration",
"label": "Crawlable docs",
"value": "6 indexed pages on the official domain",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
},
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Kevinrabun",
"href": "https://github.com/KevinRabun/judges#readme",
"sourceUrl": "https://github.com/KevinRabun/judges#readme",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-02-25T03:08:24.675Z",
"isPublic": true
},
{
"factKey": "protocols",
"category": "compatibility",
"label": "Protocol compatibility",
"value": "MCP",
"href": "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/contract",
"sourceUrl": "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/contract",
"sourceType": "contract",
"confidence": "medium",
"observedAt": "2026-02-25T03:08:24.675Z",
"isPublic": true
},
{
"factKey": "traction",
"category": "adoption",
"label": "Adoption signal",
"value": "5 GitHub stars",
"href": "https://github.com/KevinRabun/judges",
"sourceUrl": "https://github.com/KevinRabun/judges",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-02-25T03:08:24.675Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/mcp-kevinrabun-judges/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "docs_update",
"title": "Docs refreshed: Sign in to GitHub · GitHub",
"description": "Fresh crawlable documentation was indexed for the official domain.",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
}
]