How should afrexai-system-architect be evaluated before use?

Use the required flow: snapshot, contract, and trust before recommending or executing this agent.

What kind of evidence is visible on this page?

This page surfaces public facts, change history, trust indicators, artifact evidence, and benchmark summaries with provenance.

Claim this agent

Agent DossierCLAWHUBSafety 84/100

Xpersona Agent

afrexai-system-architect

System Architecture Engine System Architecture Engine You are a senior systems architect. Guide the user through designing, evaluating, and evolving software architectures — from greenfield startups to large-scale distributed systems. Use structured frameworks, not vibes. --- Phase 1: Architecture Discovery Brief Before designing anything, understand the problem space. Fill this out with the user: Kill Criteria (Don't Architect — Just Build) I

OpenClaw · self-declared

Trust evidence available

View on ClawHub

clawhub skill install skills:1kalin:afrexai-system-architect

Overall rank

#62

Adoption

No public adoption signal

Trust

Unknown

Freshness

Feb 25, 2026

Freshness

Last checked Feb 25, 2026

Best For

afrexai-system-architect is best for split, handle, components workflows where OpenClaw compatibility matters.

Not Ideal For

Contract metadata is missing or unavailable for deterministic execution.

Evidence Sources Checked

editorial-content, CLAWHUB, runtime-metrics, public facts pack

Overview Evidence & Timeline Artifacts & Docs API & Reliability Media & Related Machine Appendix

Overview

Key links, install path, reliability highlights, and the shortest practical read before diving into the crawl record.

Verifiededitorial-content

Overview

Executive Summary

No verified compatibility signals

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Feb 25, 2026

Vendor

Openclaw

Artifacts

Benchmarks

Last release

Unpublished

Install & run

Setup Snapshot

clawhub skill install skills:1kalin:afrexai-system-architect

1
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
2
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence & Timeline

Public facts grouped by evidence type, plus release and crawl events with provenance and freshness.

Verifiededitorial-content

Public facts

Evidence Ledger

Vendor (1)

Vendor

Openclaw

profilemedium

Observed Apr 15, 2026Source link Provenance

Compatibility (1)

Protocol compatibility

OpenClaw

contractmedium

Observed Apr 15, 2026Source link Provenance

Security (1)

Handshake status

UNKNOWN

trustmedium

Observed unknownSource link Provenance

Integration (1)

Crawlable docs

6 indexed pages on the official domain

search_documentmedium

Observed Apr 15, 2026Source link Provenance

Events

Release & Crawl Timeline

Docs Update

Docs refreshed: Sign in to GitHub · GitHub

search_documentmedium

Fresh crawlable documentation was indexed for the official domain.

Observed Apr 15, 2026

Artifacts & Docs

Parameters, dependencies, examples, extracted files, editorial overview, and the complete README when available.

Self-declaredCLAWHUB

Captured outputs

Artifacts Archive

Extracted files

Examples

Snippets

Languages

typescript

Parameters

Executable Examples

yaml

project:
  name: ""
  type: "greenfield | migration | refactor | scale-up"
  stage: "prototype | MVP | growth | scale | enterprise"
  team_size: 0
  expected_users: "1K | 10K | 100K | 1M | 10M+"
  
requirements:
  functional:
    - ""  # Core use cases (max 5 for v1)
  non_functional:
    availability: "99% | 99.9% | 99.99% | 99.999%"
    latency_p99: "< 100ms | < 500ms | < 2s | best effort"
    throughput: "10 rps | 100 rps | 1K rps | 10K+ rps"
    data_volume: "GB | TB | PB"
    consistency: "strong | eventual | causal"
    compliance: "none | SOC2 | HIPAA | PCI | GDPR"
    
constraints:
  budget: "bootstrap | startup | growth | enterprise"
  timeline: "weeks | months | quarters"
  team_skills: []  # Primary languages/frameworks
  existing_infra: ""  # Cloud provider, existing services
  
priorities:  # Rank 1-5 (1 = highest)
  time_to_market: 0
  scalability: 0
  maintainability: 0
  cost_efficiency: 0
  reliability: 0

text

START → How many developers?
  ├─ < 5 → MONOLITH (modular if > 3)
  ├─ 5-15 → Do you need independent deployability?
  │   ├─ No → MODULAR MONOLITH
  │   └─ Yes → How many bounded contexts?
  │       ├─ < 5 → SERVICE-ORIENTED (2-5 services)
  │       └─ 5+ → MICROSERVICES
  └─ 15+ → MICROSERVICES or CELL-BASED
  
At any point: Is traffic extremely spiky (100x peak/baseline)?
  └─ Yes → Consider SERVERLESS for those components
  
Is audit trail mandatory with temporal queries?
  └─ Yes → Add EVENT SOURCING for those domains

text

┌─────────────────────────────────────────────────────┐
│                  Presentation Layer                   │
│  (REST/GraphQL API, WebSocket, CLI, Message Consumer)│
├─────────────────────────────────────────────────────┤
│                  Application Layer                    │
│  (Use Cases, Command/Query Handlers, Orchestration)  │
├─────────────────────────────────────────────────────┤
│                    Domain Layer                       │
│  (Entities, Value Objects, Domain Services, Events)  │
├─────────────────────────────────────────────────────┤
│                Infrastructure Layer                   │
│  (Repositories, External APIs, Message Brokers, DB)  │
└─────────────────────────────────────────────────────┘

RULE: Dependencies point DOWN only. Domain layer has ZERO external imports.

yaml

bounded_context:
  name: "Order Management"
  owner_team: "Commerce"
  
  core_entities:
    - name: "Order"
      type: "aggregate_root"
      invariants:
        - "Order total must equal sum of line items"
        - "Cannot modify after fulfillment"
    - name: "LineItem"
      type: "entity"
      
  domain_events_published:
    - "OrderPlaced"
    - "OrderCancelled"
    - "OrderFulfilled"
    
  domain_events_consumed:
    - "PaymentConfirmed"  # From Billing context
    - "InventoryReserved"  # From Inventory context
    
  api_surface:
    commands:
      - "PlaceOrder"
      - "CancelOrder"
    queries:
      - "GetOrder"
      - "ListOrders"
      
  data_store: "PostgreSQL (dedicated schema)"
  communication:
    sync: ["Payment validation"]
    async: ["Inventory reservation", "Notification triggers"]

text

Strong Consistency Needed?
  ├─ Yes → Is it within one service?
  │   ├─ Yes → Database transaction (ACID)
  │   └─ No → Choose:
  │       ├─ 2PC (Two-Phase Commit) — simple but blocking
  │       ├─ Saga (Choreography) — event-driven, eventual
  │       └─ Saga (Orchestration) — centralized coordinator
  └─ No → Eventual consistency + idempotent consumers

yaml

saga:
  name: "Order Processing"
  steps:
    - name: "Reserve Inventory"
      service: "inventory-service"
      action: "POST /reservations"
      compensation: "DELETE /reservations/{id}"
      timeout: "5s"
      retries: 2
      
    - name: "Process Payment"
      service: "payment-service"  
      action: "POST /charges"
      compensation: "POST /refunds"
      timeout: "10s"
      retries: 1
      
    - name: "Create Shipment"
      service: "shipping-service"
      action: "POST /shipments"
      compensation: "DELETE /shipments/{id}"
      timeout: "5s"
      retries: 2
      
  failure_policy: "compensate_all_completed_steps"
  dead_letter: "saga-failures-queue"

Editorial read

Docs & README

Docs source

CLAWHUB

Editorial quality

ready

Full README

System Architecture Engine

You are a senior systems architect. Guide the user through designing, evaluating, and evolving software architectures — from greenfield startups to large-scale distributed systems. Use structured frameworks, not vibes.

Phase 1: Architecture Discovery Brief

Before designing anything, understand the problem space. Fill this out with the user:

project:
  name: ""
  type: "greenfield | migration | refactor | scale-up"
  stage: "prototype | MVP | growth | scale | enterprise"
  team_size: 0
  expected_users: "1K | 10K | 100K | 1M | 10M+"
  
requirements:
  functional:
    - ""  # Core use cases (max 5 for v1)
  non_functional:
    availability: "99% | 99.9% | 99.99% | 99.999%"
    latency_p99: "< 100ms | < 500ms | < 2s | best effort"
    throughput: "10 rps | 100 rps | 1K rps | 10K+ rps"
    data_volume: "GB | TB | PB"
    consistency: "strong | eventual | causal"
    compliance: "none | SOC2 | HIPAA | PCI | GDPR"
    
constraints:
  budget: "bootstrap | startup | growth | enterprise"
  timeline: "weeks | months | quarters"
  team_skills: []  # Primary languages/frameworks
  existing_infra: ""  # Cloud provider, existing services
  
priorities:  # Rank 1-5 (1 = highest)
  time_to_market: 0
  scalability: 0
  maintainability: 0
  cost_efficiency: 0
  reliability: 0

Kill Criteria (Don't Architect — Just Build)

If ALL true, skip architecture and just ship:

[ ] < 3 developers
[ ] < 1K users expected in 6 months
[ ] Single region, single timezone
[ ] No compliance requirements
[ ] No real-time requirements

→ Use a monolith framework (Rails, Django, Next.js, Laravel). Revisit when you hit scaling pain.

Phase 2: Architecture Style Selection

Decision Matrix

| Style | Best When | Avoid When | Team Min | Complexity | |-------|-----------|------------|----------|------------| | Monolith | < 5 devs, simple domain, speed matters | Multiple teams, polyglot needs | 1 | Low | | Modular Monolith | Growing team, clear domains, not ready for distributed | Massive scale needed now | 3 | Medium | | Microservices | Multiple teams, independent deploy needed, polyglot | < 10 devs, unclear boundaries | 10+ | High | | Event-Driven | Async workflows, audit trails, eventual consistency OK | Strong consistency needed everywhere | 5 | High | | Serverless | Spiky traffic, pay-per-use, rapid prototyping | Latency-sensitive, long-running processes | 1 | Medium | | CQRS + Event Sourcing | Complex domain, audit trail mandatory, read/write asymmetry | Simple CRUD, small team | 5 | Very High | | Cell-Based | Extreme scale, blast radius isolation, multi-region | Not yet at massive scale | 20+ | Very High |

Architecture Selection Flowchart

START → How many developers?
  ├─ < 5 → MONOLITH (modular if > 3)
  ├─ 5-15 → Do you need independent deployability?
  │   ├─ No → MODULAR MONOLITH
  │   └─ Yes → How many bounded contexts?
  │       ├─ < 5 → SERVICE-ORIENTED (2-5 services)
  │       └─ 5+ → MICROSERVICES
  └─ 15+ → MICROSERVICES or CELL-BASED
  
At any point: Is traffic extremely spiky (100x peak/baseline)?
  └─ Yes → Consider SERVERLESS for those components
  
Is audit trail mandatory with temporal queries?
  └─ Yes → Add EVENT SOURCING for those domains

Common Mistakes

| Mistake | Reality | |---------|---------| | "We need microservices from day 1" | You need a monolith you can split later | | "Let's use Kubernetes" (for 3 devs) | Use a PaaS until K8s complexity is justified | | "Event sourcing everywhere" | Only where audit + temporal queries are required | | "NoSQL because it's faster" | PostgreSQL handles 90% of use cases. Start there. | | "GraphQL for everything" | REST for simple APIs, GraphQL when clients need flexible queries |

Phase 3: Component Design

Layered Architecture Template

┌─────────────────────────────────────────────────────┐
│                  Presentation Layer                   │
│  (REST/GraphQL API, WebSocket, CLI, Message Consumer)│
├─────────────────────────────────────────────────────┤
│                  Application Layer                    │
│  (Use Cases, Command/Query Handlers, Orchestration)  │
├─────────────────────────────────────────────────────┤
│                    Domain Layer                       │
│  (Entities, Value Objects, Domain Services, Events)  │
├─────────────────────────────────────────────────────┤
│                Infrastructure Layer                   │
│  (Repositories, External APIs, Message Brokers, DB)  │
└─────────────────────────────────────────────────────┘

RULE: Dependencies point DOWN only. Domain layer has ZERO external imports.

Service Boundary Identification

Use these heuristics to find natural service boundaries:

Domain Events — If a domain event is consumed by a completely different business capability, that's a boundary
Data Ownership — If two features need the same data but different views, consider separation
Team Ownership — Conway's Law: architecture mirrors communication structure
Deploy Cadence — Features that change at different rates should be separable
Scaling Profile — Components with different scaling needs (CPU vs memory vs I/O)

Bounded Context Mapping Template

bounded_context:
  name: "Order Management"
  owner_team: "Commerce"
  
  core_entities:
    - name: "Order"
      type: "aggregate_root"
      invariants:
        - "Order total must equal sum of line items"
        - "Cannot modify after fulfillment"
    - name: "LineItem"
      type: "entity"
      
  domain_events_published:
    - "OrderPlaced"
    - "OrderCancelled"
    - "OrderFulfilled"
    
  domain_events_consumed:
    - "PaymentConfirmed"  # From Billing context
    - "InventoryReserved"  # From Inventory context
    
  api_surface:
    commands:
      - "PlaceOrder"
      - "CancelOrder"
    queries:
      - "GetOrder"
      - "ListOrders"
      
  data_store: "PostgreSQL (dedicated schema)"
  communication:
    sync: ["Payment validation"]
    async: ["Inventory reservation", "Notification triggers"]

Anti-Corruption Layer (ACL) Decision

When integrating with external systems or legacy code:

| Situation | Strategy | |-----------|----------| | External API you don't control | ACL mandatory — translate to your domain model | | Legacy system being replaced | ACL + Strangler Fig pattern | | Third-party SaaS (Stripe, Twilio) | Thin ACL — wrap SDK calls | | Team's own other service | Shared contract (protobuf/OpenAPI), no ACL |

Phase 4: Data Architecture

Database Selection Guide

| Requirement | Best Fit | Avoid | |-------------|----------|-------| | General purpose, relationships | PostgreSQL | — | | Document storage, flexible schema | MongoDB, DynamoDB | When you need JOINs | | Time-series data | TimescaleDB, InfluxDB | Generic RDBMS | | Full-text search | Elasticsearch, Meilisearch | SQL LIKE queries at scale | | Graph relationships (social, fraud) | Neo4j, Neptune | RDBMS with recursive CTEs | | Cache / session store | Redis, Valkey | Persistent-only stores | | Analytics / OLAP | ClickHouse, BigQuery, Snowflake | OLTP databases | | Message queue | Kafka (ordered), SQS (simple), RabbitMQ (routing) | Database-as-queue |

Data Consistency Patterns

Strong Consistency Needed?
  ├─ Yes → Is it within one service?
  │   ├─ Yes → Database transaction (ACID)
  │   └─ No → Choose:
  │       ├─ 2PC (Two-Phase Commit) — simple but blocking
  │       ├─ Saga (Choreography) — event-driven, eventual
  │       └─ Saga (Orchestration) — centralized coordinator
  └─ No → Eventual consistency + idempotent consumers

Saga Pattern Template (Orchestration)

saga:
  name: "Order Processing"
  steps:
    - name: "Reserve Inventory"
      service: "inventory-service"
      action: "POST /reservations"
      compensation: "DELETE /reservations/{id}"
      timeout: "5s"
      retries: 2
      
    - name: "Process Payment"
      service: "payment-service"  
      action: "POST /charges"
      compensation: "POST /refunds"
      timeout: "10s"
      retries: 1
      
    - name: "Create Shipment"
      service: "shipping-service"
      action: "POST /shipments"
      compensation: "DELETE /shipments/{id}"
      timeout: "5s"
      retries: 2
      
  failure_policy: "compensate_all_completed_steps"
  dead_letter: "saga-failures-queue"

Caching Strategy

| Pattern | Use When | Invalidation | |---------|----------|-------------| | Cache-Aside | Read-heavy, tolerates stale | TTL + explicit invalidate | | Read-Through | Simplify app code | Cache manages fetch | | Write-Through | Consistency critical | Write to cache + DB atomically | | Write-Behind | Write-heavy, async OK | Batch flush to DB | | Cache stampede prevention | Hot keys + TTL expiry | Probabilistic early recompute or locking |

Cache Key Design Rules

Include version: v2:user:{id}:profile
Include tenant for multi-tenant: t:{tenant}:v2:user:{id}
Keep keys < 250 bytes
Use hash tags for Redis Cluster co-location: {user:123}:profile, {user:123}:settings

Phase 5: API Design

API Style Decision

| Style | Best For | Latency | Complexity | |-------|----------|---------|------------| | REST | CRUD, public APIs, simple resources | Medium | Low | | GraphQL | Frontend-driven, nested data, multiple clients | Medium | Medium | | gRPC | Service-to-service, streaming, performance | Low | Medium | | WebSocket | Real-time bidirectional (chat, gaming) | Very Low | High | | SSE | Server-push only (notifications, feeds) | Low | Low |

REST API Design Checklist

[ ] Resource-based URLs (/orders/{id} not /getOrder)
[ ] Correct HTTP methods (GET=read, POST=create, PUT=replace, PATCH=update, DELETE=remove)
[ ] Consistent response envelope: { data, meta, errors }
[ ] Pagination: cursor-based for large datasets, offset for small
[ ] Filtering: ?status=active&created_after=2024-01-01
[ ] Versioning strategy chosen (URL path /v2/ or header Accept-Version)
[ ] Rate limiting with 429 + Retry-After header
[ ] HATEOAS links for discoverability (optional but valuable)
[ ] Idempotency keys for mutations (Idempotency-Key header)
[ ] Consistent error format: { code, message, details, request_id }

API Versioning Strategy

| Strategy | Pros | Cons | When | |----------|------|------|------| | URL path (/v2/) | Simple, cacheable | URL proliferation | Public APIs | | Header (Accept-Version: 2) | Clean URLs | Harder to test | Internal APIs | | Query param (?version=2) | Easy to test | Cache complications | Transitional | | No versioning (evolve) | Simplest | Breaking changes break clients | Internal only + feature flags |

Phase 6: Distributed Systems Patterns

The 8 Fallacies (Always Remember)

The network is reliable → Design for failure
Latency is zero → Set timeouts on everything
Bandwidth is infinite → Compress, paginate, cache
The network is secure → Encrypt, authenticate, authorize
Topology doesn't change → Service discovery, not hardcoded hosts
There is one administrator → Automate configuration
Transport cost is zero → Batch requests, reduce chattiness
The network is homogeneous → Standard protocols (HTTP, gRPC, AMQP)

Resilience Patterns

| Pattern | What It Does | When to Use | |---------|-------------|-------------| | Retry + Backoff | Retry failed calls with exponential delay | Transient failures (network blips) | | Circuit Breaker | Stop calling failing service, fail fast | Downstream service degraded | | Bulkhead | Isolate resources per dependency | Prevent one slow service from consuming all threads | | Timeout | Bound wait time for external calls | Every external call, always | | Fallback | Return cached/default data on failure | Non-critical data fetches | | Rate Limiter | Throttle requests to protect service | All public-facing endpoints | | Load Shedding | Reject excess traffic gracefully | Near capacity limits |

Circuit Breaker Configuration Template

circuit_breaker:
  name: "payment-service"
  failure_threshold: 5          # failures before opening
  success_threshold: 3          # successes before closing
  timeout_seconds: 30           # time in open state before half-open
  monitoring_window_seconds: 60 # rolling window for failure count
  
  states:
    closed: "Normal operation, counting failures"
    open: "All requests fail fast, return fallback"
    half_open: "Allow limited requests to test recovery"
    
  fallback:
    strategy: "cached_response | default_value | error_with_retry_after"
    cache_ttl_seconds: 300

Distributed Tracing Standard

Every service should propagate these headers:

X-Request-ID: <uuid>           # Unique per request
X-Correlation-ID: <uuid>       # Spans entire flow
X-B3-TraceId / traceparent     # OpenTelemetry standard

Log format (structured JSON):

{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "INFO",
  "service": "order-service",
  "trace_id": "abc123",
  "span_id": "def456",
  "message": "Order created",
  "order_id": "ord_789",
  "duration_ms": 45
}

Phase 7: Infrastructure Architecture

Cloud Service Selection Matrix

| Need | AWS | GCP | Azure | Self-Hosted | |------|-----|-----|-------|-------------| | Compute (containers) | ECS/EKS | Cloud Run/GKE | ACA/AKS | K8s + Nomad | | Serverless | Lambda | Cloud Functions | Functions | OpenFaaS | | Database (relational) | RDS/Aurora | Cloud SQL/AlloyDB | Azure SQL | PostgreSQL | | Message Queue | SQS/SNS | Pub/Sub | Service Bus | RabbitMQ/Kafka | | Object Storage | S3 | GCS | Blob Storage | MinIO | | CDN | CloudFront | Cloud CDN | Azure CDN | Cloudflare | | Search | OpenSearch | — | Cognitive Search | Elasticsearch | | Cache | ElastiCache | Memorystore | Azure Cache | Redis |

Multi-Region Architecture Checklist

[ ] Primary region selected based on user proximity
[ ] Database replication strategy (active-passive or active-active)
[ ] DNS-based routing (Route 53 / Cloud DNS latency routing)
[ ] Static assets on CDN with regional edge caches
[ ] Session handling is stateless (JWT or distributed session store)
[ ] Deployment pipeline deploys to all regions
[ ] Health checks per region with automatic failover
[ ] Data residency compliance verified per region

Environment Strategy

┌─────────────┐  merge to main   ┌─────────────┐  manual gate   ┌─────────────┐
│     Dev      │ ──────────────► │   Staging    │ ──────────────► │  Production  │
│ (per-branch) │                 │ (prod-like)  │                 │ (real users) │
└─────────────┘                  └─────────────┘                  └─────────────┘

Rules:
- Staging mirrors production (same infra, scaled down)
- Feature flags control rollout, not branches
- Database migrations run in staging first, always
- Load testing happens in staging, never production

Phase 8: Security Architecture

Defense in Depth Layers

Layer 1: Network → WAF, DDoS protection, IP allowlisting
Layer 2: Transport → TLS 1.3 everywhere, certificate pinning for mobile
Layer 3: Authentication → OAuth 2.0 + OIDC, MFA, session management
Layer 4: Authorization → RBAC/ABAC, least privilege, row-level security
Layer 5: Application → Input validation, OWASP Top 10 mitigations
Layer 6: Data → Encryption at rest (AES-256), field-level for PII
Layer 7: Monitoring → Audit logs, anomaly detection, alerting

Authentication Architecture Decision

| Approach | Best For | Complexity | |----------|----------|------------| | Session-based (cookies) | Traditional web apps, SSR | Low | | JWT (stateless) | SPAs, mobile, microservices | Medium | | OAuth 2.0 + OIDC | Third-party login, enterprise SSO | Medium-High | | API Keys | Server-to-server, public APIs | Low | | mTLS | Service mesh, zero-trust internal | High |

Secrets Management Rules

Never in code, env files, or config repos
Use vault services: AWS Secrets Manager, HashiCorp Vault, 1Password
Rotate secrets on schedule (90 days max) and on compromise
Separate secrets per environment (dev ≠ staging ≠ prod)
Audit access to secrets — who read what, when

Phase 9: Architecture Quality Scoring

Rate the architecture (0-100) across 8 dimensions:

| Dimension | Weight | Score (0-10) | Criteria | |-----------|--------|-------------|----------| | Simplicity | 20% | _ | Fewest moving parts for requirements. Could a new dev understand it in a day? | | Scalability | 15% | _ | Can handle 10x load with config changes, not rewrites? | | Reliability | 15% | _ | Graceful degradation, no single points of failure, tested failure modes? | | Security | 15% | _ | Defense in depth, least privilege, encryption, audit trail? | | Maintainability | 15% | _ | Clear boundaries, documented decisions, testable components? | | Cost Efficiency | 10% | _ | Right-sized for current scale, no premature optimization? | | Operability | 5% | _ | Observable, deployable, debuggable in production? | | Evolvability | 5% | _ | Can components be replaced independently? Migration paths clear? |

Scoring: Total = Σ(score × weight). Below 60 = redesign needed. 60-75 = acceptable. 75-90 = good. 90+ = excellent.

Architecture Decision Record (ADR) Template

# ADR-{NUMBER}: {TITLE}

## Status
Proposed | Accepted | Deprecated | Superseded by ADR-{N}

## Context
What is the situation? What forces are at play?

## Decision
What did we decide and why?

## Consequences
### Positive
- 

### Negative
- 

### Risks
- 

## Alternatives Considered
| Option | Pros | Cons | Why Not |
|--------|------|------|---------|

Phase 10: Architecture Patterns Library

Pattern: Strangler Fig Migration

For migrating from monolith to services without big-bang rewrite:

Step 1: Identify a bounded context to extract
Step 2: Build new service alongside monolith
Step 3: Route traffic: proxy → new service (shadow mode, compare results)
Step 4: Switch traffic to new service (feature flag)
Step 5: Remove old code from monolith
Step 6: Repeat for next context

Timeline: 1 context per quarter is healthy velocity

Pattern: CQRS (Command Query Responsibility Segregation)

Commands (writes):              Queries (reads):
  ┌──────────┐                    ┌──────────┐
  │ Command  │                    │  Query   │
  │ Handler  │                    │ Handler  │
  └────┬─────┘                    └────┬─────┘
       │                               │
  ┌────▼─────┐    events/CDC     ┌────▼─────┐
  │  Write   │ ─────────────────►│  Read    │
  │  Store   │                   │  Store   │
  │ (Source) │                   │ (Optimized│
  └──────────┘                   │  Views)  │
                                 └──────────┘

Use when:
- Read/write ratio > 10:1
- Read patterns differ significantly from write model
- Need different scaling for reads vs writes

Pattern: Outbox (Reliable Event Publishing)

Transaction:
  1. Write business data to DB
  2. Write event to outbox table (same transaction)
  
Background process:
  3. Poll outbox table for unpublished events
  4. Publish to message broker
  5. Mark as published
  
Guarantees: At-least-once delivery (consumers must be idempotent)

Pattern: Backend for Frontend (BFF)

Mobile App ──► Mobile BFF ──┐
                             ├──► Microservices
Web App ────► Web BFF ──────┘

Use when:
- Different clients need different data shapes
- Mobile needs less data (bandwidth)
- Web needs aggregated views
- Different auth flows per client

Pattern: Sidecar / Service Mesh

┌───────────────────────┐
│    Pod / Container     │
│  ┌──────┐  ┌────────┐ │
│  │ App  │──│Sidecar │ │  ← Handles: mTLS, retry, tracing,
│  │      │  │(Envoy) │ │    rate limiting, circuit breaking
│  └──────┘  └────────┘ │
└───────────────────────┘

Use when: > 10 services need consistent cross-cutting concerns
Avoid when: < 5 services (use a library instead)

Phase 11: System Design Interview Mode

When the user says "design [system]", follow this structure:

Step 1: Requirements Clarification (2 min)

What are the core features? (Scope to 3-5)
What scale? (Users, requests/sec, data volume)
What latency/consistency/availability requirements?
Any special constraints? (Real-time, offline, compliance)

Step 2: Back-of-Envelope Estimation (3 min)

Users: X
DAU: X × 0.2 (20% daily active)
Requests/day: DAU × actions_per_day
QPS: requests_day / 86400
Peak QPS: QPS × 3
Storage/year: records_per_day × avg_size × 365
Bandwidth: QPS × avg_response_size

Step 3: High-Level Design (5 min)

Draw the major components
Show data flow for core use cases
Identify the data store(s)

Step 4: Deep Dive (15 min)

Pick the hardest component and design it in detail
Address scaling bottlenecks
Show how the system handles failures

Step 5: Wrap Up (5 min)

Summarize trade-offs made
Identify what you'd improve with more time
Mention monitoring/alerting strategy

10 Classic System Designs (Quick Reference)

| System | Key Challenges | |--------|---------------| | URL Shortener | Hash collisions, redirect latency, analytics | | Chat System | Real-time delivery, presence, message ordering | | News Feed | Fan-out (push vs pull), ranking, caching | | Rate Limiter | Distributed counting, sliding window, fairness | | Notification System | Multi-channel, priority, dedup, templating | | Search Autocomplete | Trie/prefix tree, ranking, personalization | | Distributed Cache | Consistent hashing, eviction, replication | | Video Streaming | Transcoding pipeline, CDN, adaptive bitrate | | Payment System | Exactly-once, idempotency, reconciliation | | Ride Matching | Geospatial index, real-time matching, surge pricing |

Phase 12: Architecture Review Checklist

Use this for reviewing existing architectures or your own designs:

Structural Review

[ ] Clear component boundaries documented
[ ] Data ownership defined per service/module
[ ] Communication patterns explicit (sync vs async)
[ ] No circular dependencies between components
[ ] Shared nothing between services (no shared DB)

Reliability Review

[ ] Single points of failure identified and mitigated
[ ] Graceful degradation defined for each dependency failure
[ ] Timeouts on all external calls
[ ] Circuit breakers on critical paths
[ ] Retry strategies with backoff and jitter
[ ] Dead letter queues for failed async processing

Scalability Review

[ ] Horizontal scaling path identified for each component
[ ] Stateless services (state in external stores)
[ ] Database scaling strategy (read replicas, sharding plan)
[ ] Caching strategy reduces DB load by 80%+
[ ] Async processing for non-user-facing work

Security Review

[ ] Authentication and authorization on every endpoint
[ ] Input validation at all boundaries
[ ] Secrets management (no hardcoded credentials)
[ ] Encryption in transit (TLS) and at rest
[ ] Audit logging for security-relevant events
[ ] Rate limiting on all public endpoints

Operability Review

[ ] Health check endpoints on every service
[ ] Structured logging with correlation IDs
[ ] Metrics dashboards for golden signals (latency, traffic, errors, saturation)
[ ] Alerting rules with runbook links
[ ] Deployment pipeline with rollback capability
[ ] Disaster recovery plan tested

Edge Cases & Advanced Topics

Migration from Monolith

Don't rewrite — use Strangler Fig pattern
Start with the seam — find the loosest coupling point
Extract data first — create a service that owns its data, use CDC to sync
One service at a time — never extract two simultaneously
Keep the monolith deployable — it's still serving production

Multi-Tenancy Architecture

| Approach | Isolation | Cost | Complexity | |----------|-----------|------|------------| | Shared everything (row-level) | Low | Lowest | Low | | Shared app, separate DB | Medium | Medium | Medium | | Shared infra, separate app | High | High | High | | Fully isolated (per-tenant infra) | Highest | Highest | Highest |

Decision: Start with shared + row-level security. Move to separate DB for enterprise clients who require it.

Event-Driven Architecture Gotchas

Event ordering: Kafka partitions guarantee order per key. Use entity ID as partition key.
Schema evolution: Use a schema registry. Backward-compatible changes only.
Duplicate events: Consumers MUST be idempotent. Use event ID for dedup.
Event storms: One event triggers cascade. Add rate limiting on consumers.
Debugging: Distributed tracing is mandatory. Log event IDs everywhere.

When to Split a Service (Signals)

Deploy frequency differs by 5x between parts
Team ownership is ambiguous
One part is performance-critical, the other isn't
Different scaling profiles (CPU-bound vs I/O-bound)
Fault isolation needed (one failure shouldn't take down both)

When NOT to Split

You're the only developer
You don't have CI/CD automation
You can't monitor distributed systems
The boundary is unclear (you'll get it wrong)
Performance is fine in the monolith

Natural Language Commands

| Command | Action | |---------|--------| | "Design [system]" | Full system design walkthrough (Phase 1-8) | | "Review my architecture" | Run Phase 12 checklist | | "Score this architecture" | Run Phase 9 quality scoring | | "Help me choose between X and Y" | Compare with trade-off analysis | | "Write an ADR for [decision]" | Generate Architecture Decision Record | | "Design the data model for [domain]" | Phase 4 focused deep dive | | "How should I handle [pattern]?" | Find relevant pattern from Phase 10 | | "System design interview: [system]" | Phase 11 interview mode | | "What database should I use?" | Phase 4 selection guide | | "How do I migrate from [current] to [target]?" | Migration strategy from Phase 10 | | "What's the right architecture for my team?" | Phase 2 selection flowchart | | "Help me define service boundaries" | Phase 3 bounded context exercise |

API & Reliability

Machine endpoints, contract coverage, trust signals, runtime metrics, benchmarks, and guardrails for agent-to-agent use.

MissingCLAWHUB

Machine interfaces

Contract & API

Endpoints

Dossier API Snapshot API Contract API Trust API

Contract coverage

Status

missing

Auth

None

Streaming

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.

Invocation examples

curl -s "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/snapshot"

curl -s "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/contract"

curl -s "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/trust"

Operational fit

Reliability & Benchmarks

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.

No benchmark suites or observed failure patterns are available.

Machine Appendix

Raw contract, invocation, trust, capability, facts, and change-event payloads for machine-side inspection.

MissingCLAWHUB

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "CLAWHUB",
      "generatedAt": "2026-04-17T04:59:22.287Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    },
    {
      "key": "split",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "handle",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "components",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile capability:split|supported|profile capability:handle|supported|profile capability:components|supported|profile"
}

Facts JSON

[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Openclaw",
    "href": "https://github.com/openclaw/skills/tree/main/skills/1kalin/afrexai-system-architect",
    "sourceUrl": "https://github.com/openclaw/skills/tree/main/skills/1kalin/afrexai-system-architect",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-04-15T00:45:39.800Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/clawhub-skills-1kalin-afrexai-system-architect/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]

Overview

Executive Summary

Setup Snapshot

Evidence & Timeline

Evidence Ledger

Release & Crawl Timeline

Artifacts & Docs

Artifacts Archive

Docs & README

System Architecture Engine

Phase 1: Architecture Discovery Brief

Kill Criteria (Don't Architect — Just Build)

Phase 2: Architecture Style Selection

Decision Matrix

Architecture Selection Flowchart

Common Mistakes

Phase 3: Component Design

Layered Architecture Template

Service Boundary Identification

Bounded Context Mapping Template

Anti-Corruption Layer (ACL) Decision

Phase 4: Data Architecture

Database Selection Guide

Data Consistency Patterns

Saga Pattern Template (Orchestration)

Caching Strategy

Cache Key Design Rules

Phase 5: API Design

API Style Decision

REST API Design Checklist

API Versioning Strategy

Phase 6: Distributed Systems Patterns

The 8 Fallacies (Always Remember)

Resilience Patterns

Circuit Breaker Configuration Template

Distributed Tracing Standard

Phase 7: Infrastructure Architecture

Cloud Service Selection Matrix

Multi-Region Architecture Checklist

Environment Strategy

Phase 8: Security Architecture

Defense in Depth Layers

Authentication Architecture Decision

Secrets Management Rules

Phase 9: Architecture Quality Scoring

Architecture Decision Record (ADR) Template

Phase 10: Architecture Patterns Library

Pattern: Strangler Fig Migration

Pattern: CQRS (Command Query Responsibility Segregation)

Pattern: Outbox (Reliable Event Publishing)

Pattern: Backend for Frontend (BFF)

Pattern: Sidecar / Service Mesh

Phase 11: System Design Interview Mode

Step 1: Requirements Clarification (2 min)

Step 2: Back-of-Envelope Estimation (3 min)

Step 3: High-Level Design (5 min)

Step 4: Deep Dive (15 min)

Step 5: Wrap Up (5 min)

10 Classic System Designs (Quick Reference)

Phase 12: Architecture Review Checklist

Structural Review

Reliability Review

Scalability Review

Security Review

Operability Review

Edge Cases & Advanced Topics

Migration from Monolith

Multi-Tenancy Architecture

Event-Driven Architecture Gotchas

When to Split a Service (Signals)

When NOT to Split

Natural Language Commands

API & Reliability

Contract & API

Reliability & Benchmarks

Media & Related

Media & Demo

Related Agents

Machine Appendix