How should observability-designer be evaluated before use?

Follow the required flow before recommending or executing this agent: check the snapshot, then the contract, then the trust record.

What kind of evidence is visible on this page?

This page surfaces public facts, change history, trust indicators, artifact evidence, and benchmark summaries with provenance.

Agent Dossier · CLAWHUB · Safety 84/100

Xpersona Agent

observability-designer

Observability Designer (POWERFUL). Category: Engineering. Tier: POWERFUL. Description: Design comprehensive observability strategies for production systems including SLI/SLO frameworks, alerting optimization, and dashboard generation. Overview: Observability Designer enables you to create production-ready observability strategies that provide deep insights into system behavior, performance, and reliability.

OpenClaw · self-declared

Trust evidence available

clawhub skill install skills:alirezarezvani:observability-designer

Overall rank

#62

Adoption

No public adoption signal

Trust

Unknown

Freshness

Last checked Feb 25, 2026

Best For

observability-designer is best for general automation workflows where OpenClaw compatibility matters.

Not Ideal For

Workflows that require deterministic execution, because contract metadata is missing or unavailable.

Evidence Sources Checked

editorial-content, CLAWHUB, runtime-metrics, public facts pack

Overview

Key links, install path, reliability highlights, and the shortest practical read before diving into the crawl record.

Verified · editorial-content

Overview

Executive Summary

No verified compatibility signals

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Feb 25, 2026

Vendor

Openclaw

Artifacts

Benchmarks

Last release

Unpublished

Install & run

Setup Snapshot

clawhub skill install skills:alirezarezvani:observability-designer

1. Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
2. Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence & Timeline

Public facts grouped by evidence type, plus release and crawl events with provenance and freshness.

Verified · editorial-content

Public facts

Evidence Ledger

Vendor (1)

Vendor

Openclaw

profile · medium

Observed Apr 15, 2026

Compatibility (1)

Protocol compatibility

OpenClaw

contract · medium

Observed Apr 15, 2026

Security (1)

Handshake status

UNKNOWN

trust · medium

Observed: unknown

Integration (1)

Crawlable docs

6 indexed pages on the official domain

search_document · medium

Observed Apr 15, 2026

Events

Release & Crawl Timeline

Docs Update

Docs refreshed: Sign in to GitHub · GitHub

search_document · medium

Fresh crawlable documentation was indexed for the official domain.

Observed Apr 15, 2026

Artifacts & Docs

Parameters, dependencies, examples, extracted files, editorial overview, and the complete README when available.

Self-declared · CLAWHUB

Captured outputs

Artifacts Archive

Extracted files

Examples

Snippets

Languages

typescript

Parameters

Editorial read

Docs & README

Docs source

CLAWHUB

Editorial quality

ready

Full README

Observability Designer (POWERFUL)

Category: Engineering
Tier: POWERFUL
Description: Design comprehensive observability strategies for production systems including SLI/SLO frameworks, alerting optimization, and dashboard generation.

Overview

Observability Designer enables you to create production-ready observability strategies that provide deep insights into system behavior, performance, and reliability. This skill combines the three pillars of observability (metrics, logs, traces) with proven frameworks like SLI/SLO design, golden signals monitoring, and alert optimization to create comprehensive observability solutions.

Core Competencies

SLI/SLO/SLA Framework Design

Service Level Indicators (SLI): Define measurable signals that indicate service health
Service Level Objectives (SLO): Set reliability targets based on user experience
Service Level Agreements (SLA): Establish customer-facing commitments with consequences
Error Budget Management: Calculate and track error budget consumption
Burn Rate Alerting: Multi-window burn rate alerts for proactive SLO protection
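The error budget and multi-window burn rate ideas above can be sketched in a few lines. This is a minimal illustration, not the output of `slo_designer.py`: the names `BurnWindow`, `burn_rate`, and `should_page` are hypothetical, and the 1 hour / 5 minute window pair with a 14.4 threshold is a commonly cited starting point for a 30-day SLO, assumed here rather than taken from the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnWindow:
    """A long/short window pair and its burn-rate threshold (assumed values)."""
    long_hours: float
    short_hours: float
    threshold: float


def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail, e.g. 0.999 leaves roughly 0.001."""
    return 1.0 - slo_target


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_ratio / error_budget(slo_target)


def should_page(long_ratio: float, short_ratio: float,
                slo_target: float, window: BurnWindow) -> bool:
    """Fire only when both the long and the short window exceed the threshold.

    The short window keeps the alert from lingering after the problem stops.
    """
    return (burn_rate(long_ratio, slo_target) >= window.threshold
            and burn_rate(short_ratio, slo_target) >= window.threshold)


# Hypothetical fast-burn pair: 1 hour long window, 5 minute short window.
FAST_BURN = BurnWindow(long_hours=1, short_hours=5 / 60, threshold=14.4)
```

With a 99.9% target, a sustained 2% error ratio in both windows is a burn rate of about 20 and would page, while a 2% ratio over the last hour that has already dropped to 0.1% in the last five minutes would not.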

Three Pillars of Observability

Metrics

Golden Signals: Latency, traffic, errors, and saturation monitoring
RED Method: Rate, Errors, and Duration for request-driven services
USE Method: Utilization, Saturation, and Errors for resource monitoring
Business Metrics: Revenue, user engagement, and feature adoption tracking
Infrastructure Metrics: CPU, memory, disk, network, and custom resource metrics

Logs

Structured Logging: JSON-based log formats with consistent fields
Log Aggregation: Centralized log collection and indexing strategies
Log Levels: Appropriate use of DEBUG, INFO, WARN, ERROR, FATAL levels
Correlation IDs: Request tracing through distributed systems
Log Sampling: Volume management for high-throughput systems
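As a rough sketch of structured logging with correlation IDs, the snippet below uses only the Python standard library. The field names (`timestamp`, `level`, `logger`, `message`, `correlation_id`) and the helpers `JsonFormatter`, `build_logger`, and `new_correlation_id` are assumptions chosen for illustration; the skill does not prescribe a schema.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object with a consistent field set."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Supplied by callers through the `extra` argument; None if absent.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream` (stderr by default)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def new_correlation_id() -> str:
    """Generate an ID at the edge of the system and pass it to downstream calls."""
    return uuid.uuid4().hex
```

A caller would write `log.info("order placed", extra={"correlation_id": cid})` and forward the same `cid` in outgoing request headers so that log lines from different services can be joined.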

Traces

Distributed Tracing: End-to-end request flow visualization
Span Design: Meaningful span boundaries and metadata
Trace Sampling: Intelligent sampling strategies for performance and cost
Service Maps: Automatic dependency discovery through traces
Root Cause Analysis: Trace-driven debugging workflows

Dashboard Design Principles

Information Architecture

Hierarchy: Overview → Service → Component → Instance drill-down paths
Golden Ratio: 80% operational metrics, 20% exploratory metrics
Cognitive Load: Maximum 7±2 panels per dashboard screen
User Journey: Role-based dashboard personas (SRE, Developer, Executive)

Visualization Best Practices

Chart Selection: Time series for trends, heatmaps for distributions, gauges for status
Color Theory: Red for critical, amber for warning, green for healthy states
Reference Lines: SLO targets, capacity thresholds, and historical baselines
Time Ranges: Default to meaningful windows (4h for incidents, 7d for trends)

Panel Design

Metric Queries: Efficient Prometheus/InfluxDB queries with proper aggregation
Alerting Integration: Visual alert state indicators on relevant panels
Interactive Elements: Template variables, drill-down links, and annotation overlays
Performance: Sub-second render times through query optimization

Alert Design and Optimization

Alert Classification

Severity Levels:
- Critical: Service down, SLO burn rate high
- Warning: Approaching thresholds, non-user-facing issues
- Info: Deployment notifications, capacity planning alerts
Actionability: Every alert must have a clear response action
Alert Routing: Escalation policies based on severity and team ownership

Alert Fatigue Prevention

Signal vs Noise: High precision (few false positives) over high recall
Hysteresis: Different thresholds for firing and resolving alerts
Suppression: Dependent alert suppression during known outages
Grouping: Related alerts grouped into single notifications
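Hysteresis is the easiest of these techniques to show in code. The sketch below is a hypothetical `HysteresisAlert` class, not part of `alert_optimizer.py`; the thresholds are illustrative.

```python
class HysteresisAlert:
    """Alert state machine with separate firing and resolving thresholds."""

    def __init__(self, fire_above: float, resolve_below: float) -> None:
        if resolve_below >= fire_above:
            raise ValueError("resolve_below must be lower than fire_above")
        self.fire_above = fire_above
        self.resolve_below = resolve_below
        self.firing = False

    def observe(self, value: float) -> bool:
        """Feed one sample and return whether the alert is firing afterwards."""
        if not self.firing and value > self.fire_above:
            self.firing = True
        elif self.firing and value < self.resolve_below:
            self.firing = False
        # Values between the two thresholds leave the state unchanged.
        return self.firing
```

With `fire_above=0.9` and `resolve_below=0.8`, a metric hovering around 0.85 after a spike keeps the alert firing instead of flapping between states on every sample.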

Alert Rule Design

Threshold Selection: Statistical methods for threshold determination
Window Functions: Appropriate averaging windows and percentile calculations
Alert Lifecycle: Clear firing conditions and automatic resolution criteria
Testing: Alert rule validation against historical data

Runbook Generation and Incident Response

Runbook Structure

Alert Context: What the alert means and why it fired
Impact Assessment: User-facing vs internal impact evaluation
Investigation Steps: Ordered troubleshooting procedures with time estimates
Resolution Actions: Common fixes and escalation procedures
Post-Incident: Follow-up tasks and prevention measures

Incident Detection Patterns

Anomaly Detection: Statistical methods for detecting unusual patterns
Composite Alerts: Multi-signal alerts for complex failure modes
Predictive Alerts: Capacity and trend-based forward-looking alerts
Canary Monitoring: Early detection through progressive deployment monitoring

Golden Signals Framework

Latency Monitoring

Request Latency: P50, P95, P99 response time tracking
Queue Latency: Time spent waiting in processing queues
Network Latency: Inter-service communication delays
Database Latency: Query execution and connection pool metrics

Traffic Monitoring

Request Rate: Requests per second with burst detection
Bandwidth Usage: Network throughput and capacity utilization
User Sessions: Active user tracking and session duration
Feature Usage: API endpoint and feature adoption metrics

Error Monitoring

Error Rate: 4xx and 5xx HTTP response code tracking
Error Budget: SLO-based error rate targets and consumption
Error Distribution: Error type classification and trending
Silent Failures: Detection of processing failures without HTTP errors

Saturation Monitoring

Resource Utilization: CPU, memory, disk, and network usage
Queue Depth: Processing queue length and wait times
Connection Pools: Database and service connection saturation
Rate Limiting: API throttling and quota exhaustion tracking

Distributed Tracing Strategies

Trace Architecture

Sampling Strategy: Head-based, tail-based, and adaptive sampling
Trace Propagation: Context propagation across service boundaries
Span Correlation: Parent-child relationship modeling
Trace Storage: Retention policies and storage optimization
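Head-based sampling can be illustrated with a deterministic decision derived from the trace ID, so that every service seeing the same trace reaches the same verdict without coordination. This is a simplified sketch under that assumption; `keep_trace` is a hypothetical helper and does not mirror any particular tracing SDK.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Decide at the start of a trace whether to record it.

    The trace ID is hashed into a number in [0, 1); traces whose bucket falls
    below `sample_rate` are kept. The same ID always yields the same decision.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Tail-based sampling differs in that the decision is deferred until the trace is complete, which allows keeping all slow or failed traces at the cost of buffering spans.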

Service Instrumentation

Auto-Instrumentation: Framework-based automatic trace generation
Manual Instrumentation: Custom span creation for business logic
Baggage Handling: Cross-cutting concern propagation
Performance Impact: Instrumentation overhead measurement and optimization

Log Aggregation Patterns

Collection Architecture

Agent Deployment: Log shipping agent strategies (push vs pull)
Log Routing: Topic-based routing and filtering
Parsing Strategies: Structured vs unstructured log handling
Schema Evolution: Log format versioning and migration

Storage and Indexing

Index Design: Optimized field indexing for common query patterns
Retention Policies: Time and volume-based log retention
Compression: Log data compression and archival strategies
Search Performance: Query optimization and result caching

Cost Optimization for Observability

Data Management

Metric Retention: Tiered retention based on metric importance
Log Sampling: Intelligent sampling to reduce ingestion costs
Trace Sampling: Cost-effective trace collection strategies
Data Archival: Cold storage for historical observability data

Resource Optimization

Query Efficiency: Optimized metric and log queries
Storage Costs: Appropriate storage tiers for different data types
Ingestion Rate Limiting: Controlled data ingestion to manage costs
Cardinality Management: High-cardinality metric detection and mitigation
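Cardinality management usually starts with counting distinct values per label. The following sketch assumes each time series is represented as a plain mapping of label names to values; `label_cardinality` and `high_cardinality_labels` are hypothetical helpers, and the limit is a placeholder to tune per backend.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set


def label_cardinality(series: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count the distinct values observed for every label name."""
    values: Dict[str, Set[str]] = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items()}


def high_cardinality_labels(series: Iterable[Mapping[str, str]],
                            limit: int) -> List[str]:
    """Return label names whose distinct-value count exceeds `limit`."""
    counts = label_cardinality(series)
    return sorted(name for name, count in counts.items() if count > limit)
```

Labels such as user IDs or request IDs tend to surface here; typical mitigations are dropping the label, bucketing its values, or moving the detail into logs or traces.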

Scripts Overview

This skill includes three powerful Python scripts for comprehensive observability design:

1. SLO Designer (`slo_designer.py`)

Generates complete SLI/SLO frameworks based on service characteristics:

Input: Service description JSON (type, criticality, dependencies)
Output: SLI definitions, SLO targets, error budgets, burn rate alerts, SLA recommendations
Features: Multi-window burn rate calculations, error budget policies, alert rule generation

2. Alert Optimizer (`alert_optimizer.py`)

Analyzes and optimizes existing alert configurations:

Input: Alert configuration JSON with rules, thresholds, and routing
Output: Optimization report and improved alert configuration
Features: Noise detection, coverage gaps, duplicate identification, threshold optimization

3. Dashboard Generator (`dashboard_generator.py`)

Creates comprehensive dashboard specifications:

Input: Service/system description JSON
Output: Grafana-compatible dashboard JSON and documentation
Features: Golden signals coverage, RED/USE methods, drill-down paths, role-based views

Integration Patterns

Monitoring Stack Integration

Prometheus: Metric collection and alerting rule generation
Grafana: Dashboard creation and visualization configuration
Elasticsearch/Kibana: Log analysis and dashboard integration
Jaeger/Zipkin: Distributed tracing configuration and analysis

CI/CD Integration

Pipeline Monitoring: Build, test, and deployment observability
Deployment Correlation: Release impact tracking and rollback triggers
Feature Flag Monitoring: A/B test and feature rollout observability
Performance Regression: Automated performance monitoring in pipelines

Incident Management Integration

PagerDuty/VictorOps: Alert routing and escalation policies
Slack/Teams: Notification and collaboration integration
JIRA/ServiceNow: Incident tracking and resolution workflows
Post-Mortem: Automated incident analysis and improvement tracking

Advanced Patterns

Multi-Cloud Observability

Cross-Cloud Metrics: Unified metrics across AWS, GCP, Azure
Network Observability: Inter-cloud connectivity monitoring
Cost Attribution: Cloud resource cost tracking and optimization
Compliance Monitoring: Security and compliance posture tracking

Microservices Observability

Service Mesh Integration: Istio/Linkerd observability configuration
API Gateway Monitoring: Request routing and rate limiting observability
Container Orchestration: Kubernetes cluster and workload monitoring
Service Discovery: Dynamic service monitoring and health checks

Machine Learning Observability

Model Performance: Accuracy, drift, and bias monitoring
Feature Store Monitoring: Feature quality and freshness tracking
Pipeline Observability: ML pipeline execution and performance monitoring
A/B Test Analysis: Statistical significance and business impact measurement

Best Practices

Organizational Alignment

SLO Setting: Collaborative target setting between product and engineering
Alert Ownership: Clear escalation paths and team responsibilities
Dashboard Governance: Centralized dashboard management and standards
Training Programs: Team education on observability tools and practices

Technical Excellence

Infrastructure as Code: Observability configuration version control
Testing Strategy: Alert rule testing and dashboard validation
Performance Monitoring: Observability system performance tracking
Security Considerations: Access control and data privacy in observability

Continuous Improvement

Metrics Review: Regular SLI/SLO effectiveness assessment
Alert Tuning: Ongoing alert threshold and routing optimization
Dashboard Evolution: User feedback-driven dashboard improvements
Tool Evaluation: Regular assessment of observability tool effectiveness

Success Metrics

Operational Metrics

Mean Time to Detection (MTTD): How quickly issues are identified
Mean Time to Resolution (MTTR): Time from detection to resolution
Alert Precision: Percentage of actionable alerts
SLO Achievement: Percentage of SLO targets met consistently
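The operational metrics above reduce to simple arithmetic over incident and alert records. The sketch below uses a hypothetical `Incident` record with timestamps in seconds, and follows the definitions given in this list (MTTR measured from detection to resolution).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Incident:
    """Timestamps in seconds since an arbitrary epoch (illustrative shape)."""
    started_at: float
    detected_at: float
    resolved_at: float


def mttd(incidents: Sequence[Incident]) -> float:
    """Mean time from the start of an issue to its detection."""
    return mean(i.detected_at - i.started_at for i in incidents)


def mttr(incidents: Sequence[Incident]) -> float:
    """Mean time from detection to resolution."""
    return mean(i.resolved_at - i.detected_at for i in incidents)


def alert_precision(actionable: int, total: int) -> float:
    """Share of fired alerts that required a response."""
    if total == 0:
        raise ValueError("no alerts fired in the period")
    return actionable / total
```

For two incidents detected after 60 and 120 seconds and resolved 600 and 1200 seconds later, MTTD is 90 seconds and MTTR is 900 seconds; 45 actionable alerts out of 60 gives a precision of 0.75.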

Business Metrics

System Reliability: Overall uptime and user experience quality
Engineering Velocity: Development team productivity and deployment frequency
Cost Efficiency: Observability cost as percentage of infrastructure spend
Customer Satisfaction: User-reported reliability and performance satisfaction

This comprehensive observability design skill enables organizations to build robust, scalable monitoring and alerting systems that provide actionable insights while maintaining cost efficiency and operational excellence.

API & Reliability

Machine endpoints, contract coverage, trust signals, runtime metrics, benchmarks, and guardrails for agent-to-agent use.

Missing · CLAWHUB

Machine interfaces

Contract & API

Endpoints

Dossier API, Snapshot API, Contract API, Trust API

Contract coverage

Status

missing

Auth

None

Streaming

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.

Invocation examples

curl -s "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/snapshot"

curl -s "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/contract"

curl -s "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/trust"
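The same three calls can be made from Python and combined into a pre-execution gate. This is a sketch only: it assumes each endpoint returns a JSON object, and that the contract and trust responses expose the `contractStatus` and `handshakeStatus` fields seen in the payloads on this page. The helpers `endpoint_url`, `fetch`, and `safe_to_execute` are hypothetical.

```python
import json
import urllib.request

BASE = "https://xpersona.co/api/v1/agents"
SLUG = "clawhub-skills-alirezarezvani-observability-designer"
KINDS = ("snapshot", "contract", "trust")


def endpoint_url(kind: str) -> str:
    """Build the URL for one of the three documented endpoints."""
    if kind not in KINDS:
        raise ValueError(f"unknown endpoint kind: {kind}")
    return f"{BASE}/{SLUG}/{kind}"


def fetch(kind: str, timeout: float = 5.0) -> dict:
    """GET one endpoint and decode the JSON body (response shape is assumed)."""
    with urllib.request.urlopen(endpoint_url(kind), timeout=timeout) as resp:
        return json.load(resp)


def safe_to_execute(contract: dict, trust: dict) -> bool:
    """Refuse when the contract is missing or the handshake is unverified."""
    has_contract = contract.get("contractStatus") not in (None, "missing")
    has_handshake = trust.get("handshakeStatus") not in (None, "UNKNOWN")
    return has_contract and has_handshake
```

Given the values currently shown for this agent (contract status missing, handshake UNKNOWN), `safe_to_execute` would return False.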

Operational fit

Reliability & Benchmarks

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.

No benchmark suites or observed failure patterns are available.

Machine Appendix

Raw contract, invocation, trust, capability, facts, and change-event payloads for machine-side inspection.

Missing · CLAWHUB

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "CLAWHUB",
      "generatedAt": "2026-04-17T00:01:55.560Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
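One way to honor the `retryPolicy` above on the client side is sketched below. The policy does not say whether `maxAttempts` counts the initial call; this sketch assumes it does, so three attempts use only the first two backoff values. `RetryableError` and `call_with_retry` are hypothetical names, and the caller is assumed to map HTTP and network failures onto the listed condition strings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Values copied from the retryPolicy payload.
MAX_ATTEMPTS = 3
BACKOFF_MS = (500, 1500, 3500)
RETRYABLE = frozenset({"HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"})


class RetryableError(Exception):
    """Raised by the operation with one of the policy's condition names."""

    def __init__(self, condition: str) -> None:
        super().__init__(condition)
        self.condition = condition


def call_with_retry(operation: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying listed conditions with the fixed backoff."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return operation()
        except RetryableError as err:
            is_last = attempt == MAX_ATTEMPTS - 1
            if err.condition not in RETRYABLE or is_last:
                raise
            sleep(BACKOFF_MS[attempt] / 1000)
    raise RuntimeError("unreachable: loop always returns or raises")
```

Passing `sleep` as a parameter keeps the function testable without real delays.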

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}

Facts JSON

[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Openclaw",
    "href": "https://github.com/openclaw/skills/tree/main/skills/alirezarezvani/observability-designer",
    "sourceUrl": "https://github.com/openclaw/skills/tree/main/skills/alirezarezvani/observability-designer",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-alirezarezvani-observability-designer/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]
