Crawler Summary

devops-engineer answer-first brief

Expert DevOps engineer specializing in CI/CD pipelines, container orchestration, infrastructure as code, cloud platforms, monitoring, security, and deployment automation. Use when working on deployments, infrastructure, containers, Kubernetes, CI/CD pipelines, cloud resources, monitoring setup, or when the user asks for DevOps help. Published capability contract available. No trust telemetry is available yet. Last updated 3/1/2026.

Freshness

Last checked 3/1/2026

Best For

A published capability contract is available, with explicit auth modes and schema references.

Not Ideal For

devops-engineer is not ideal for teams that need stronger public trust telemetry, lower setup complexity, or more explicit contract coverage before production rollout.

Evidence Sources Checked

editorial-content, capability-contract, runtime-metrics, public facts pack

Agent Dossier | GitHub | Safety: 89/100

devops-engineer

Expert DevOps engineer specializing in CI/CD pipelines, container orchestration, infrastructure as code, cloud platforms, monitoring, security, and deployment automation. Use when working on deployments, infrastructure, containers, Kubernetes, CI/CD pipelines, cloud resources, monitoring setup, or when the user asks for DevOps help.

OpenClaw (self-declared)

Public facts

6

Change events

1

Artifacts

0

Freshness

Mar 1, 2026

Verified: editorial-content. No verified compatibility signals.

Published capability contract available. No trust telemetry is available yet. Last updated 3/1/2026.

Schema refs published. Trust evidence available.

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Mar 1, 2026

Vendor

Mrshorrid

Artifacts

0

Benchmarks

0

Last release

Unpublished

Executive Summary

Key links, install path, and a quick operational read before the deeper crawl record.

Verified: editorial-content

Summary

Published capability contract available. No trust telemetry is available yet. Last updated 3/1/2026.

Setup snapshot

git clone https://github.com/MrsHorrid/devops-engineer-skill.git
  1. Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.

  2. Final validation: expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence Ledger

Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.

Verified: editorial-content
Vendor (1)

Vendor

Mrshorrid

profile | medium
Observed Mar 1, 2026 | Source link | Provenance
Compatibility (2)

Protocol compatibility

OpenClaw

contract | medium
Observed Feb 24, 2026 | Source link | Provenance

Auth modes

api_key

contract | high
Observed Feb 24, 2026 | Source link | Provenance
Artifact (1)

Machine-readable schemas

OpenAPI or schema references published

contract | high
Observed Feb 24, 2026 | Source link | Provenance
Security (1)

Handshake status

UNKNOWN

trust | medium
Observed: unknown | Source link | Provenance
Integration (1)

Crawlable docs

6 indexed pages on the official domain

search_document | medium
Observed Apr 15, 2026 | Source link | Provenance

Release & Crawl Timeline

Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.

Self-declared: agent-index

Artifacts Archive

Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.

Self-declared: GITHUB OPENCLAW

Extracted files

0

Examples

6

Snippets

0

Languages

typescript

Parameters

Executable Examples

yaml

name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: |
          npm install
          npm test
      
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .
      
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to production
        run: |
          # Deployment commands

yaml

stages:
  - test
  - build
  - deploy

test:
  stage: test
  script:
    - npm test
  coverage: '/Coverage: \d+\.\d+%/'

build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  only:
    - main
    - develop

deploy:production:
  stage: deploy
  script:
    - kubectl set image deployment/myapp myapp=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  only:
    - main
  when: manual

dockerfile

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev

# Runtime stage
FROM node:20-alpine
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
WORKDIR /app
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
USER nodejs
EXPOSE 3000
CMD ["node", "dist/main.js"]

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: v1.2.3
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      containers:
      - name: myapp
        image: myregistry/myapp:1.2.3
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: http
        env:
        - name: NODE_ENV
          value: "production"
        envFrom:
        - secretRef:
            name: myapp-secrets
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 3000
    name: http
  type: ClusterIP

text

mychart/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   └── _helpers.tpl

yaml

replicaCount: 3

image:
  repository: myregistry/myapp
  tag: 1.2.3
  pullPolicy: IfNotPresent

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

Docs & README

Full documentation captured from public sources, including the complete README when available.

Self-declared: GITHUB OPENCLAW

Docs source

GITHUB OPENCLAW

Editorial quality

ready

Expert DevOps engineer specializing in CI/CD pipelines, container orchestration, infrastructure as code, cloud platforms, monitoring, security, and deployment automation. Use when working on deployments, infrastructure, containers, Kubernetes, CI/CD pipelines, cloud resources, monitoring setup, or when the user asks for DevOps help.

Full README

name: devops-engineer
description: Expert DevOps engineer specializing in CI/CD pipelines, container orchestration, infrastructure as code, cloud platforms, monitoring, security, and deployment automation. Use when working on deployments, infrastructure, containers, Kubernetes, CI/CD pipelines, cloud resources, monitoring setup, or when the user asks for DevOps help.

DevOps Engineer

Transform into an active DevOps developer with comprehensive coverage of modern DevOps practices, tools, and workflows.

Core Principles

Infrastructure as Code First

  • All infrastructure changes must be codified
  • No manual cloud console changes in production
  • Version control everything
  • Peer review infrastructure changes like code

Security by Default

  • Secrets never in code or logs
  • Least privilege access always
  • Scan everything (code, containers, dependencies)
  • Encrypt at rest and in transit

Observability Built-In

  • Metrics, logs, and traces from day one
  • Alert on symptoms, not causes
  • Dashboard for every service
  • SLOs define reliability targets

Automation Over Documentation

  • If you're doing it twice, automate it
  • Scripts are better than runbooks
  • Self-service over tickets
  • Fail fast, recover faster

CI/CD Pipelines

GitHub Actions Patterns

Basic workflow structure:

name: CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: |
          npm install
          npm test
      
  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .
      
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to production
        run: |
          # Deployment commands

Key practices:

  • Use matrix builds for multiple environments
  • Cache dependencies aggressively
  • Fail fast on security scans
  • Store artifacts for debugging
  • Use environments for approval gates

GitLab CI Patterns

stages:
  - test
  - build
  - deploy

test:
  stage: test
  script:
    - npm test
  coverage: '/Coverage: \d+\.\d+%/'

build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  only:
    - main
    - develop

deploy:production:
  stage: deploy
  script:
    - kubectl set image deployment/myapp myapp=$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
  only:
    - main
  when: manual

Pipeline Security Checklist

  • [ ] Dependency scanning enabled
  • [ ] SAST (static analysis) runs on every PR
  • [ ] Container image scanning before push
  • [ ] Secrets detection configured
  • [ ] No secrets in environment variables
  • [ ] Use OIDC for cloud authentication (not access keys)
  • [ ] Audit logs enabled for pipeline changes

Container & Kubernetes

Docker Best Practices

Multi-stage builds:

# Build stage
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
RUN npm prune --omit=dev

# Runtime stage
FROM node:20-alpine
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
WORKDIR /app
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
USER nodejs
EXPOSE 3000
CMD ["node", "dist/main.js"]

Security hardening:

  • Use specific base image versions (no latest)
  • Run as non-root user
  • Scan images with Trivy or Snyk
  • Keep images small (alpine, distroless)
  • Use .dockerignore aggressively
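Two of the hardening rules above (pinned base images, non-root user) are easy to gate in CI even without a full linter. A minimal sketch, with a hypothetical `lint_dockerfile` helper; hadolint covers far more cases:

```shell
# Minimal Dockerfile hardening check: fail on unpinned base images
# (":latest" or no tag at all) and on a missing USER directive.
lint_dockerfile() {
  local f="$1" rc=0
  if grep -qE '^FROM [^ ]+:latest|^FROM [^: ]+$' "$f"; then
    echo "FAIL: base image not pinned to a version"; rc=1
  fi
  if ! grep -qE '^USER ' "$f"; then
    echo "FAIL: no USER directive (container runs as root)"; rc=1
  fi
  [ "$rc" -eq 0 ] && echo "OK"
  return "$rc"
}

# Demo: a Dockerfile that satisfies both rules passes.
cat > /tmp/Dockerfile.demo <<'EOF'
FROM node:20-alpine
USER node
EOF
lint_dockerfile /tmp/Dockerfile.demo    # OK
```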

Kubernetes Deployment Patterns

Production-ready deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        version: v1.2.3
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      containers:
      - name: myapp
        image: myregistry/myapp:1.2.3
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3000
          name: http
        env:
        - name: NODE_ENV
          value: "production"
        envFrom:
        - secretRef:
            name: myapp-secrets
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 3000
    name: http
  type: ClusterIP

Key Kubernetes patterns:

  • Always set resource limits
  • Use health checks (liveness + readiness)
  • Run multiple replicas for availability
  • Use namespaces for isolation
  • Apply Pod Security Standards
  • ConfigMaps for config, Secrets for sensitive data
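The "always set resource limits" rule can get a crude pre-apply gate in CI. This grep-level sketch (`has_limits` is a hypothetical helper) is no substitute for kubeconform or an admission policy such as Gatekeeper:

```shell
# Reject manifests that never set resource limits before kubectl apply.
has_limits() {
  grep -q 'limits:' "$1"
}

# Demo manifest fragment with both requests and limits set.
cat > /tmp/deploy-demo.yaml <<'EOF'
resources:
  requests:
    cpu: 100m
  limits:
    cpu: 500m
EOF

if has_limits /tmp/deploy-demo.yaml; then
  echo "limits present"
else
  echo "missing resource limits" >&2
fi
```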

Helm Charts

Chart structure:

mychart/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   └── _helpers.tpl

values.yaml:

replicaCount: 3

image:
  repository: myregistry/myapp
  tag: 1.2.3
  pullPolicy: IfNotPresent

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

Infrastructure as Code

Terraform Best Practices

Project structure:

terraform/
├── modules/
│   ├── vpc/
│   ├── eks/
│   └── rds/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── production/
└── .terraform-docs.yml

Module pattern:

# modules/eks/main.tf
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster.arn
  version  = var.kubernetes_version

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = var.public_access_enabled
    security_group_ids      = [aws_security_group.cluster.id]
  }

  enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]

  tags = merge(
    var.tags,
    {
      Environment = var.environment
      ManagedBy   = "Terraform"
    }
  )
}

# modules/eks/variables.tf
variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
}

variable "kubernetes_version" {
  description = "Kubernetes version"
  type        = string
  default     = "1.28"
}

# modules/eks/outputs.tf
output "cluster_endpoint" {
  description = "Endpoint for EKS cluster"
  value       = aws_eks_cluster.main.endpoint
}

Terraform workflow:

# Initialize and validate
terraform init
terraform fmt -recursive
terraform validate

# Plan and review
terraform plan -out=tfplan

# Apply with approval
terraform apply tfplan

# Always use remote state
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/eks/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-lock"
  }
}

CloudFormation Patterns

Stack template structure:

AWSTemplateFormatVersion: '2010-09-09'
Description: 'Production ECS Cluster'

Parameters:
  Environment:
    Type: String
    Default: production
    AllowedValues:
      - dev
      - staging
      - production

Resources:
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Sub '${Environment}-cluster'
      CapacityProviders:
        - FARGATE
        - FARGATE_SPOT
      DefaultCapacityProviderStrategy:
        - CapacityProvider: FARGATE
          Weight: 1
        - CapacityProvider: FARGATE_SPOT
          Weight: 1

Outputs:
  ClusterArn:
    Description: ECS Cluster ARN
    Value: !GetAtt ECSCluster.Arn
    Export:
      Name: !Sub '${Environment}-cluster-arn'

Cloud Platforms

AWS Essentials

IAM least privilege policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-app-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:us-west-2:123456789012:table/MyTable"
    }
  ]
}

Common services decision matrix:

| Use Case | Service | Alternative |
|----------|---------|-------------|
| Container orchestration | EKS | ECS Fargate |
| Serverless compute | Lambda | Fargate |
| Object storage | S3 | EFS (file) |
| Database (relational) | RDS | Aurora |
| Database (NoSQL) | DynamoDB | DocumentDB |
| Message queue | SQS | SNS (pub/sub) |
| Secret management | Secrets Manager | Parameter Store |
| Load balancing | ALB | NLB (TCP) |

Azure Essentials

Resource naming convention:

{resource-type}-{app-name}-{environment}-{region}-{instance}

Examples:
- vm-myapp-prod-eastus-01
- stmyappprodeastus01 (storage account; storage account names allow only lowercase letters and digits, no hyphens)
- kv-myapp-prod-eastus-01 (key vault)
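The convention is mechanical enough to script. A small sketch with a hypothetical `az_name` helper (storage accounts are the exception, since their names cannot contain hyphens):

```shell
# Build an Azure resource name from the convention:
# {resource-type}-{app-name}-{environment}-{region}-{instance}
az_name() {
  printf '%s-%s-%s-%s-%s\n' "$1" "$2" "$3" "$4" "$5"
}

az_name vm myapp prod eastus 01    # vm-myapp-prod-eastus-01
az_name kv myapp prod eastus 01    # kv-myapp-prod-eastus-01
```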

Azure CLI common operations:

# Create resource group
az group create --name rg-myapp-prod --location eastus

# Deploy from template
az deployment group create \
  --resource-group rg-myapp-prod \
  --template-file main.bicep \
  --parameters @parameters.json

# Scale app service
az appservice plan update \
  --name plan-myapp-prod \
  --resource-group rg-myapp-prod \
  --sku P2V2 \
  --number-of-workers 3

GCP Essentials

gcloud common patterns:

# Set project context
gcloud config set project my-project-id

# Deploy to Cloud Run
gcloud run deploy myapp \
  --image gcr.io/my-project/myapp:1.2.3 \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars "NODE_ENV=production" \
  --min-instances 1 \
  --max-instances 10

# Create GKE cluster
gcloud container clusters create my-cluster \
  --region us-central1 \
  --num-nodes 3 \
  --enable-autoscaling \
  --min-nodes 3 \
  --max-nodes 10 \
  --enable-autorepair \
  --enable-autoupgrade

Monitoring & Observability

Prometheus Setup

Deployment with alerts:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    
    rule_files:
      - /etc/prometheus/rules/*.yml
    
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
    
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
  
  alerts.yml: |
    groups:
      - name: application
        interval: 30s
        rules:
          - alert: HighErrorRate
            expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "High error rate detected"
              description: "Error rate is {{ $value }} errors per second"
          
          - alert: HighMemoryUsage
            expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "High memory usage"
              description: "Memory usage is above 90%"

Application Instrumentation

Node.js with Prometheus client:

const promClient = require('prom-client');
const express = require('express');

const app = express();

// Create metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  buckets: [0.1, 0.5, 1, 2, 5]
});

const httpRequestTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status']
});

// Middleware
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    httpRequestDuration.labels(req.method, req.route?.path || 'unknown', res.statusCode).observe(duration);
    httpRequestTotal.labels(req.method, req.route?.path || 'unknown', res.statusCode).inc();
  });
  next();
});

// Metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.end(await promClient.register.metrics());
});

Grafana Dashboards

Key dashboard components:

  1. Golden Signals (top row):

    • Request rate (queries/sec)
    • Error rate (%)
    • Duration (p50, p95, p99)
    • Saturation (CPU, memory, disk)
  2. Service Health (middle):

    • Uptime
    • Active connections
    • Queue depth
    • Cache hit rate
  3. Infrastructure (bottom):

    • Pod count and status
    • Node resource usage
    • Network I/O
    • Disk I/O

Dashboard JSON template available in: grafana-dashboards.md
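The error-rate golden signal is simply failed requests over total requests; a quick awk sanity check with hypothetical counts:

```shell
# Compute error rate (percent) from raw counters: errors / total * 100.
error_rate() {
  awk -v err="$1" -v total="$2" 'BEGIN { printf "%.2f\n", (err / total) * 100 }'
}

error_rate 3 200    # 3 errors out of 200 requests -> 1.50
```

In Grafana the same ratio comes from PromQL, e.g. a `rate()` of 5xx requests divided by a `rate()` of all requests.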


Security & Secrets Management

Secrets Storage Hierarchy

  1. Development: .env files (gitignored) or local vaults
  2. CI/CD: GitHub Secrets, GitLab CI Variables, encrypted
  3. Kubernetes: Sealed Secrets or External Secrets Operator
  4. Cloud: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager
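The hierarchy implies that .env files must never reach production. A small guard sketch (the `load_env_file` helper and `APP_ENV` variable are illustrative) makes that rule explicit in deploy tooling:

```shell
# Source a .env file only outside production; in production, secrets
# should come from a secrets manager, never from a file.
load_env_file() {
  if [ "${APP_ENV:-dev}" = "production" ]; then
    echo "refusing to load .env in production; use a secrets manager" >&2
    return 1
  fi
  set -a    # export everything the file defines
  . "$1"
  set +a
}

# Demo: loading works in dev, is refused in production.
printf 'API_TOKEN=dev-token\n' > /tmp/demo.env
APP_ENV=dev load_env_file /tmp/demo.env
echo "$API_TOKEN"
```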

Kubernetes Secrets with External Secrets Operator

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets
    kind: SecretStore
  target:
    name: myapp-secrets
    creationPolicy: Owner
  data:
    - secretKey: database-url
      remoteRef:
        key: production/myapp/database-url
    - secretKey: api-key
      remoteRef:
        key: production/myapp/api-key

Security Scanning Pipeline

Complete security workflow:

name: Security Scan

on:
  push:
    branches: [main, develop]
  pull_request:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM

jobs:
  code-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run SAST with Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: auto
      
      - name: Dependency scan
        run: |
          npm audit --audit-level=high
          npm run snyk-test
  
  container-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      
      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: myapp:${{ github.sha }}
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
  
  secrets-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      
      - name: TruffleHog scan
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}

Deployment Strategies

Blue-Green Deployment

Kubernetes pattern:

# Blue deployment (current)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
      - name: myapp
        image: myapp:1.0.0
---
# Green deployment (new)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
      - name: myapp
        image: myapp:2.0.0
---
# Service (switch selector to cutover)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Change to 'green' to switch
  ports:
  - port: 80
    targetPort: 3000

Cutover script:

#!/bin/bash
set -euo pipefail

# Deploy green
kubectl apply -f deployment-green.yaml
kubectl rollout status deployment/myapp-green --timeout=5m

# Run smoke tests
./smoke-tests.sh http://myapp-green-service

# Switch traffic
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'

# Monitor for 10 minutes
sleep 600

# Check error rates
ERROR_RATE=$(kubectl exec prometheus-0 -- \
  promtool query instant 'rate(http_requests_total{status=~"5.."}[5m])' | \
  grep -oP '\d+\.\d+')

if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
  echo "High error rate detected, rolling back"
  kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
  exit 1
fi

echo "Deployment successful"

Canary Deployment

Using Argo Rollouts:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 10
  strategy:
    canary:
      steps:
      - setWeight: 10    # 10% of traffic to canary
      - pause: {duration: 5m}
      - setWeight: 25    # Increase to 25%
      - pause: {duration: 5m}
      - setWeight: 50    # Increase to 50%
      - pause: {duration: 5m}
      - setWeight: 75    # Increase to 75%
      - pause: {duration: 5m}
      # Automatically promote to 100% if no issues
      
      analysis:
        templates:
        - templateName: success-rate
        startingStep: 1  # Start analysis at first canary step
        args:
        - name: service-name
          value: myapp
      
      trafficRouting:
        istio:
          virtualService:
            name: myapp
            routes:
            - primary
  
  selector:
    matchLabels:
      app: myapp
  
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:2.0.0
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 1m
    successCondition: result >= 0.95
    failureLimit: 3
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
          /
          sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))

Automation Scripts

Health Check Script

#!/bin/bash
# health-check.sh - Comprehensive service health validation

set -euo pipefail

SERVICE_URL="${1:-http://localhost:3000}"
MAX_RETRIES=30
RETRY_DELAY=2

echo "Checking health of $SERVICE_URL"

for i in $(seq 1 $MAX_RETRIES); do
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "$SERVICE_URL/health" || echo "000")
  
  if [[ "$HTTP_CODE" == "200" ]]; then
    echo "✓ Service is healthy (attempt $i/$MAX_RETRIES)"
    
    # Additional checks
    RESPONSE=$(curl -s "$SERVICE_URL/health")
    
    # Check database connectivity
    if echo "$RESPONSE" | jq -e '.database == "connected"' > /dev/null; then
      echo "✓ Database connection OK"
    else
      echo "✗ Database connection failed"
      exit 1
    fi
    
    # Check external dependencies
    if echo "$RESPONSE" | jq -e '.dependencies | all' > /dev/null; then
      echo "✓ All dependencies healthy"
    else
      echo "⚠ Some dependencies unhealthy"
      echo "$RESPONSE" | jq '.dependencies'
    fi
    
    exit 0
  fi
  
  echo "✗ Service not ready (HTTP $HTTP_CODE), retrying in ${RETRY_DELAY}s... ($i/$MAX_RETRIES)"
  sleep $RETRY_DELAY
done

echo "✗ Service failed to become healthy after $MAX_RETRIES attempts"
exit 1

Database Migration Script

#!/bin/bash
# db-migrate.sh - Safe database migration with rollback

set -euo pipefail

ENVIRONMENT="${1:-}"
if [[ -z "$ENVIRONMENT" ]]; then
  echo "Usage: $0 <environment>"
  exit 1
fi

# Load environment-specific config
source "config/$ENVIRONMENT.env"

echo "=== Database Migration for $ENVIRONMENT ==="
echo "Database: $DB_NAME"
echo "Host: $DB_HOST"

# Create backup
echo "Creating backup..."
BACKUP_FILE="backup-$(date +%Y%m%d-%H%M%S).sql"
pg_dump -h "$DB_HOST" -U "$DB_USER" "$DB_NAME" > "$BACKUP_FILE"
echo "✓ Backup created: $BACKUP_FILE"

# Run migrations
echo "Running migrations..."
if npm run migrate:up; then
  echo "✓ Migrations completed successfully"
  
  # Verify database schema
  echo "Verifying schema..."
  if npm run db:verify; then
    echo "✓ Schema verification passed"
  else
    echo "✗ Schema verification failed, rolling back..."
    npm run migrate:down
    echo "Restoring from backup..."
    psql -h "$DB_HOST" -U "$DB_USER" "$DB_NAME" < "$BACKUP_FILE"
    exit 1
  fi
else
  echo "✗ Migration failed, restoring from backup..."
  psql -h "$DB_HOST" -U "$DB_USER" "$DB_NAME" < "$BACKUP_FILE"
  exit 1
fi

echo "✓ Migration completed successfully"

Log Analysis Script

#!/bin/bash
# analyze-logs.sh - Quick log analysis for troubleshooting

set -euo pipefail

NAMESPACE="${1:-default}"
APP_LABEL="${2:-app=myapp}"
SINCE="${3:-1h}"

echo "=== Analyzing logs for $APP_LABEL in $NAMESPACE (last $SINCE) ==="

# Get pod list
PODS=$(kubectl get pods -n "$NAMESPACE" -l "$APP_LABEL" -o name)

if [[ -z "$PODS" ]]; then
  echo "No pods found matching label: $APP_LABEL"
  exit 1
fi

echo "Found $(echo "$PODS" | wc -l) pods"
echo

# Error rate
echo "=== Error Summary ==="
kubectl logs -n "$NAMESPACE" -l "$APP_LABEL" --since="$SINCE" | \
  grep -iE "error|exception|fatal" | \
  sort | uniq -c | sort -rn | head -20

echo
echo "=== Status Code Distribution ==="
kubectl logs -n "$NAMESPACE" -l "$APP_LABEL" --since="$SINCE" | \
  grep -oP 'status=\K\d+' | \
  sort | uniq -c | sort -rn

echo
echo "=== Slowest Endpoints (>1s) ==="
# Keep the path next to each duration so slow endpoints are identifiable
kubectl logs -n "$NAMESPACE" -l "$APP_LABEL" --since="$SINCE" | \
  grep -oP 'path=\S+ duration=\d+' | \
  awk -F'duration=' '$2 > 1000 {print $2, $1}' | \
  sort -rn | head -20

echo
echo "=== Recent Restarts ==="
kubectl get pods -n "$NAMESPACE" -l "$APP_LABEL" -o json | \
  jq -r '.items[] | .metadata.name as $pod | .status.containerStatuses[]? | select(.restartCount > 0) |
    "\($pod): \(.restartCount) restarts - Last: \(.lastState.terminated.finishedAt // "n/a")"'

Quick Reference

Common Commands

Kubernetes:

# Get all resources
kubectl get all -n <namespace>

# Describe resource
kubectl describe <resource-type> <name> -n <namespace>

# Logs with follow
kubectl logs -f <pod-name> -n <namespace>

# Execute command in pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Port forward
kubectl port-forward <pod-name> 8080:3000 -n <namespace>

# Scale deployment
kubectl scale deployment <name> --replicas=5 -n <namespace>

# Rollback deployment
kubectl rollout undo deployment/<name> -n <namespace>

# Get events
kubectl get events --sort-by='.lastTimestamp' -n <namespace>

Docker:

# Build with tag
docker build -t myapp:1.0.0 .

# Run with environment variables
docker run -e NODE_ENV=production -p 3000:3000 myapp:1.0.0

# View logs
docker logs -f <container-id>

# Execute in running container
docker exec -it <container-id> /bin/sh

# Clean up (removes ALL unused images, networks, build cache, and volumes)
docker system prune -af --volumes

Git for DevOps:

# Tag release
git tag -a v1.0.0 -m "Release 1.0.0"
git push origin v1.0.0

# View deployment history
git log --oneline --decorate --graph

# Cherry-pick hotfix
git cherry-pick <commit-hash>

Incident Response Workflow

Step 1: Detect and Alert

  • Alert triggers (Prometheus, CloudWatch, etc.)
  • Verify impact and scope
  • Declare incident severity (P0-P4)

Step 2: Triage

# Check service health
kubectl get pods -n production
curl https://myapp.com/health

# Check recent deployments
kubectl rollout history deployment/myapp -n production

# Check logs for errors
kubectl logs -l app=myapp -n production --since=30m | grep -i error

# Check metrics
# Open Grafana dashboard
# Look for: error rate, latency, throughput

Step 3: Mitigate

  • Rollback if recent deployment:
    kubectl rollout undo deployment/myapp -n production
    
  • Scale up if capacity issue:
    kubectl scale deployment/myapp --replicas=10 -n production
    
  • Circuit breaker if dependency issue:
    kubectl patch configmap myapp-config -n production --patch '{"data":{"EXTERNAL_API_ENABLED":"false"}}'
    kubectl rollout restart deployment/myapp -n production
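The mitigations above can be encoded as a small lookup so the on-call engineer gets the right command for the triaged symptom. The symptom labels and the `myapp`/`production` names are illustrative:

```bash
#!/bin/bash
set -euo pipefail

# Sketch: map a triaged symptom to its mitigation command.
# Labels and resource names are assumptions for illustration.
mitigation_for() {
  case "$1" in
    bad-deploy) echo "kubectl rollout undo deployment/myapp -n production" ;;
    capacity)   echo "kubectl scale deployment/myapp --replicas=10 -n production" ;;
    dependency) echo "disable EXTERNAL_API_ENABLED via configmap, then rollout restart" ;;
    *)          echo "unknown symptom: escalate" ;;
  esac
}

mitigation_for capacity
```

Printing the command rather than running it keeps a human in the loop during an incident; wiring it to execute directly is a deliberate later step.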
    

Step 4: Communicate

  • Update status page
  • Post in incident channel
  • Notify stakeholders

Step 5: Post-Mortem

  • Document timeline
  • Identify root cause
  • Create action items
  • Update runbooks

Best Practices Checklist

For Every Deployment

  • [ ] Changes are in version control
  • [ ] CI pipeline passes all checks
  • [ ] Security scans are clean
  • [ ] Health checks configured
  • [ ] Resource limits set
  • [ ] Monitoring and alerts configured
  • [ ] Rollback plan documented
  • [ ] Secrets managed properly (not in code)

For Every Service

  • [ ] Dockerfile uses specific base image version
  • [ ] Runs as non-root user
  • [ ] Has liveness and readiness probes
  • [ ] Exposes /health and /metrics endpoints
  • [ ] Logs to stdout (not files)
  • [ ] Uses structured logging (JSON)
  • [ ] Has SLO defined
  • [ ] Has runbook for common issues

For Infrastructure

  • [ ] All resources defined in IaC
  • [ ] State stored remotely with locking
  • [ ] Environments isolated (dev/staging/prod)
  • [ ] Network security groups configured
  • [ ] Encryption at rest and in transit
  • [ ] Backup and disaster recovery tested
  • [ ] Cost alerts configured
  • [ ] Tags for resource management


Decision Framework

When facing a DevOps decision, ask:

  1. Reliability: Will this improve system reliability?
  2. Security: Does this introduce security risks?
  3. Observability: Can we detect and debug issues?
  4. Automation: Is manual intervention required?
  5. Scalability: Will this work at 10x scale?
  6. Cost: What's the ROI of this approach?

Golden rule: Make it work, make it right, make it fast (in that order).

Contract & API

Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.

Verified · capability-contract

Contract coverage

Status

ready

Auth

api_key

Streaming

No

Data region

global

Protocol support

OpenClaw: self-declared

Requires: openclew, lang:typescript

Forbidden: none

Guardrails

Operational confidence: medium

Contract is available with explicit auth and schema references.
Trust confidence is not low and verification freshness is acceptable.
Invocation examples
curl -s "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/snapshot"
curl -s "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/contract"
curl -s "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/trust"

Reliability & Benchmarks

Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.

Missing · runtime-metrics

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

No benchmark suites or observed failure patterns are available.

Media & Demo

Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.

Missing · no-media
No screenshots, media assets, or demo links are available.

Related Agents

Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.

Self-declared · protocol-neighbors
GITHUB_REPOS · activepieces

Rank

70

AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents

Traction

No public download signal

Freshness

Updated 2d ago

OPENCLAW
GITHUB_REPOS · cherry-studio

Rank

70

AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs

Traction

No public download signal

Freshness

Updated 5d ago

MCP · OPENCLAW
GITHUB_REPOS · AionUi

Rank

70

Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!

Traction

No public download signal

Freshness

Updated 6d ago

MCP · OPENCLAW
GITHUB_REPOS · CopilotKit

Rank

70

The Frontend for Agents & Generative UI. React + Angular

Traction

No public download signal

Freshness

Updated 23d ago

OPENCLAW
Machine Appendix

Contract JSON

{
  "contractStatus": "ready",
  "authModes": [
    "api_key"
  ],
  "requires": [
    "openclew",
    "lang:typescript"
  ],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": "https://github.com/MrsHorrid/devops-engineer-skill#input",
  "outputSchemaRef": "https://github.com/MrsHorrid/devops-engineer-skill#output",
  "dataRegion": "global",
  "contractUpdatedAt": "2026-02-24T19:45:25.523Z",
  "sourceUpdatedAt": "2026-02-24T19:45:25.523Z",
  "freshnessSeconds": 4419868
}
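The freshnessSeconds value can be sanity-checked against the timestamps: 4419868 seconds is 51 whole days, which matches the gap between contractUpdatedAt (2026-02-24) and the Invocation Guide's generatedAt (2026-04-16). A one-liner sketch:

```bash
#!/bin/bash
set -euo pipefail

# Convert the contract's freshnessSeconds into whole days.
freshness_seconds=4419868
freshness_days=$(( freshness_seconds / 86400 ))
echo "${freshness_days} days stale"   # 4419868 / 86400 = 51
```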

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "GITHUB_OPENCLEW",
      "generatedAt": "2026-04-16T23:29:54.437Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
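The retryPolicy above (3 attempts, backoffMs of 500/1500/3500, retrying on HTTP 429, HTTP 503, and network timeouts) can be sketched as a shell wrapper. How the backoffMs entries map to sleeps between attempts is an assumption; with maxAttempts=3 only the first two delays are ever used:

```bash
#!/bin/bash
set -euo pipefail

# retry_with_backoff CMD...: up to 3 attempts, sleeping between attempts
# per the policy's backoffMs. The 3.5 s entry would apply only if a
# fourth attempt were allowed.
retry_with_backoff() {
  local backoffs=(0.5 1.5 3.5)   # seconds, from backoffMs
  local attempt
  for attempt in 0 1 2; do
    if "$@"; then
      return 0
    fi
    # No sleep after the final attempt.
    if (( attempt < 2 )); then
      sleep "${backoffs[$attempt]}"
    fi
  done
  return 1
}
```

In practice the wrapped command would be one of the curl invocations above, with its HTTP status checked so that only 429/503/timeouts count as retryable failures.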

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    },
    {
      "key": "everything",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "images",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "on",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "run",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "with",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "uses",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    },
    {
      "key": "we",
      "type": "capability",
      "support": "supported",
      "confidenceSource": "profile",
      "notes": "Declared in agent profile metadata"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile capability:everything|supported|profile capability:images|supported|profile capability:on|supported|profile capability:run|supported|profile capability:with|supported|profile capability:uses|supported|profile capability:we|supported|profile"
}

Facts JSON

[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Mrshorrid",
    "href": "https://github.com/MrsHorrid/devops-engineer-skill",
    "sourceUrl": "https://github.com/MrsHorrid/devops-engineer-skill",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-03-01T06:05:17.269Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-02-24T19:45:25.523Z",
    "isPublic": true
  },
  {
    "factKey": "auth_modes",
    "category": "compatibility",
    "label": "Auth modes",
    "value": "api_key",
    "href": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/contract",
    "sourceType": "contract",
    "confidence": "high",
    "observedAt": "2026-02-24T19:45:25.523Z",
    "isPublic": true
  },
  {
    "factKey": "schema_refs",
    "category": "artifact",
    "label": "Machine-readable schemas",
    "value": "OpenAPI or schema references published",
    "href": "https://github.com/MrsHorrid/devops-engineer-skill#input",
    "sourceUrl": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/contract",
    "sourceType": "contract",
    "confidence": "high",
    "observedAt": "2026-02-24T19:45:25.523Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/mrshorrid-devops-engineer-skill/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]
