Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Crawler Summary
ClawVid — Generate short-form videos (YouTube Shorts, TikTok, Reels) from text prompts. You are the orchestrator. You plan scenes, write prompts, generate a workflow JSON, and call clawvid generate to execute the full pipeline. --- 🚨 MANDATORY: READ THIS SKILL EVERY TIME **Before creating ANY video workflow, you MUST read this entire SKILL.md file.** This skill contains critical rules about: - Workflow JSON structure. Capability contract not published. No trust telemetry is available yet. 5 GitHub stars reported by the source. Last updated 2/25/2026.
Freshness
Last checked 2/25/2026
Best For
clawvid is best for short-form video generation workflows where OpenClaw compatibility matters.
Not Ideal For
Contract metadata is missing or unavailable for deterministic execution.
Evidence Sources Checked
editorial-content, GITHUB OPENCLAW, runtime-metrics, public facts pack
Public facts
5
Change events
1
Artifacts
0
Freshness
Feb 25, 2026
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Feb 25, 2026
Vendor
Neur0map
Artifacts
0
Benchmarks
0
Last release
Unpublished
Key links, install path, and a quick operational read before the deeper crawl record.
Setup snapshot
git clone https://github.com/neur0map/clawvid.git
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Neur0map
Protocol compatibility
OpenClaw
Adoption signal
5 GitHub stars
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
6
Snippets
0
Languages
typescript
Parameters
json
{
"id": "intro",
"type": "talking_head",
"image_generation": {
"model": "fal-ai/nano-banana-pro",
"input": {
"prompt": "Friendly female news anchor, professional attire, neutral background, looking at camera",
"aspect_ratio": "9:16"
}
},
"talking_head": {
"model": "veed/fabric-1.0/text",
"input": {
"text": "Welcome to today's deep dive into one of history's greatest mysteries...",
"resolution": "720p",
"voice_description": "Confident female voice, American accent, news anchor style"
}
},
"timing": {}
}
{
"id": "scene_3",
"type": "static",
"static_image": {
"url": "https://example.com/historical-map.jpg",
"fit": "contain",
"background": "#000000"
},
"narration": "This map from 1706 shows...",
"timing": { "duration": 10 },
"effects": ["kenburns_slow_zoom"]
}
{
"id": "scene_2",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth camera transition, continuous motion",
"style": "3d_animation" // optional, for PixVerse
},
"type": "image",
...
}
{
"scenes": [
{
"id": "scene_1",
"type": "video",
"narration": "Welcome to the show!",
"image_generation": { ... },
"video_generation": { ... }
},
{
"id": "scene_2",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition, chef continues cooking, camera stays fixed"
},
"type": "image",
"narration": "First, gather your ingredients...",
"image_generation": { ... }
},
{
"id": "scene_3",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition, chef mixing bowl, continuous motion"
},
"type": "video",
"narration": "Mix until smooth...",
"image_generation": { ... },
"video_generation": { ... }
}
]
}
{
"video_generation": {
"model": "fal-ai/vidu/image-to-video",
"input": {
"prompt": "Chef whisks batter while camera stays completely fixed and static, only chef and whisk move, NOT realistic",
"duration": "4",
"movement_amplitude": "small"
}
}
}
{
"consistency": {
"reference_prompt": "Cartoon French chef character, white hat, blue apron, kitchen background, Pixar style",
"seed": 55555555,
"model": "fal-ai/nano-banana-pro"
}
}
Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB OPENCLAW
Editorial quality
ready
Generate short-form videos (YouTube Shorts, TikTok, Reels) from text prompts.
You are the orchestrator. You plan scenes, write prompts, generate a workflow JSON, and call clawvid generate to execute the full pipeline.
Before creating ANY video workflow, you MUST read this entire SKILL.md file.
This skill contains critical rules about:
Do not rely on memory. Read this file fresh each time.
| Scene Type | When to Use | Motion Source |
|------------|-------------|---------------|
| type: "image" | Narration-heavy, descriptions, establishing shots | Ken Burns effects only |
| type: "video" | Action moments, reveals, dramatic beats | AI video generation |
| type: "static" | Real photos, maps, documents to SHOW as-is | None (displayed as-is) |
| type: "talking_head" | AI presenter, character speaking | VEED Fabric lip-sync |
Key insight: Each type: "video" scene generates an independent 4-8s clip. Without transitions, these clips are hard-cut together, causing jarring jumps.
Use type: "talking_head" to create AI presenter videos with lip-synced speech:
{
"id": "intro",
"type": "talking_head",
"image_generation": {
"model": "fal-ai/nano-banana-pro",
"input": {
"prompt": "Friendly female news anchor, professional attire, neutral background, looking at camera",
"aspect_ratio": "9:16"
}
},
"talking_head": {
"model": "veed/fabric-1.0/text",
"input": {
"text": "Welcome to today's deep dive into one of history's greatest mysteries...",
"resolution": "720p",
"voice_description": "Confident female voice, American accent, news anchor style"
}
},
"timing": {}
}
How it works:
Talking head fields:
- text: The speech to lip-sync (required)
- resolution: 720p or 480p (default: 720p)
- voice_description: Voice styling (e.g., "British accent", "Deep male voice")
When to use talking head:
Cost: ~$0.50 per talking head clip
⚠️ Important: Talking head scenes generate their own audio. Don't add separate narration in the workflow: the talking_head.input.text IS the narration.
Use type: "static" when you want to display an existing image without AI generation:
{
"id": "scene_3",
"type": "static",
"static_image": {
"url": "https://example.com/historical-map.jpg",
"fit": "contain",
"background": "#000000"
},
"narration": "This map from 1706 shows...",
"timing": { "duration": 10 },
"effects": ["kenburns_slow_zoom"]
}
When to use static images:
Static image fields:
- url: URL or local path to the image
- fit: contain (letterbox), cover (crop), or fill (stretch)
- background: Background color for letterboxing (default: black)
⚠️ Static images are SHOWN, not used for image-to-video generation.
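The three fit modes follow standard letterbox/crop arithmetic. As an illustration (our own sketch, not ClawVid's internal code; the function name is ours), this is what each mode does to an image's drawn size inside the output frame:

```python
def fit_dimensions(img_w, img_h, frame_w, frame_h, fit="contain"):
    """Drawn (width, height) of an image inside a frame for each fit mode.

    contain: scale until the whole image fits (letterbox fills the rest),
    cover:   scale until the frame is fully covered (excess is cropped),
    fill:    ignore aspect ratio and stretch to the frame exactly.
    """
    if fit == "fill":
        return frame_w, frame_h
    scale = (min if fit == "contain" else max)(frame_w / img_w, frame_h / img_h)
    return round(img_w * scale), round(img_h * scale)

# A 1000x500 landscape image in a 1080x1920 portrait frame:
# contain -> (1080, 540), letterboxed on the "background" color
# cover   -> (3840, 1920), heavily cropped left and right
```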
Problem: Video models generate isolated clips. Concatenating them creates jarring cuts with no motion continuity.
Solution: Use the transition field to generate interpolated videos between scenes.
When a scene has a transition object:
{
"id": "scene_2",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth camera transition, continuous motion",
"style": "3d_animation" // optional, for PixVerse
},
"type": "image",
...
}
Supported models for transitions:
- fal-ai/vidu/q3/image-to-video: best quality, smooth morphing ($0.50-1.50)
- fal-ai/pixverse/image-to-video: good quality, supports style ($0.45)

| Scenario | Use Transition? | Notes |
|----------|-----------------|-------|
| Cooking show (fixed camera) | ✅ YES on every scene | Creates continuous "footage" feel |
| Horror (jump cuts intentional) | ⚠️ SELECTIVE | Use on atmosphere scenes, skip for jump scares |
| Talking head / tutorial | ✅ YES | Smooth presenter movements |
| Fast-paced montage | ❌ NO | Hard cuts are stylistically appropriate |
| Scene with dramatic reveal | ❌ NO | Hard cut adds impact |
{
"scenes": [
{
"id": "scene_1",
"type": "video",
"narration": "Welcome to the show!",
"image_generation": { ... },
"video_generation": { ... }
},
{
"id": "scene_2",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition, chef continues cooking, camera stays fixed"
},
"type": "image",
"narration": "First, gather your ingredients...",
"image_generation": { ... }
},
{
"id": "scene_3",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition, chef mixing bowl, continuous motion"
},
"type": "video",
"narration": "Mix until smooth...",
"image_generation": { ... },
"video_generation": { ... }
}
]
}
Note: The first scene cannot have a transition (no previous scene to transition FROM).
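That rule is easy to enforce mechanically. A minimal pre-flight check (our own sketch, not part of the clawvid CLI) could reject workflows that put a transition on the first scene or omit the fields the transition examples above always include:

```python
def check_transitions(workflow):
    """Return a list of human-readable problems with a workflow's transitions."""
    problems = []
    for i, scene in enumerate(workflow.get("scenes", [])):
        transition = scene.get("transition")
        if transition is None:
            continue
        scene_id = scene.get("id", f"scene {i + 1}")
        if i == 0:
            # No previous scene exists to transition FROM.
            problems.append(f"{scene_id}: first scene cannot have a transition")
        if "model" not in transition:
            problems.append(f"{scene_id}: transition missing 'model'")
        if "prompt" not in transition:
            problems.append(f"{scene_id}: transition missing 'prompt'")
    return problems
```

Running this before `clawvid generate` catches the most common workflow-authoring mistakes before any money is spent on generation.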
When using type: "video":
"movement_amplitude": "small" for stability{
"video_generation": {
"model": "fal-ai/vidu/image-to-video",
"input": {
"prompt": "Chef whisks batter while camera stays completely fixed and static, only chef and whisk move, NOT realistic",
"duration": "4",
"movement_amplitude": "small"
}
}
}
For consistent characters/settings across scenes:
- consistency.reference_prompt and consistency.seed
- fal-ai/nano-banana-pro/edit maintains reference style
{
"consistency": {
"reference_prompt": "Cartoon French chef character, white hat, blue apron, kitchen background, Pixar style",
"seed": 55555555,
"model": "fal-ai/nano-banana-pro"
}
}
For content that should feel like "one continuous shot":
effects: ["kenburns_*"]Example image prompt for fixed camera:
"Fixed camera cooking show shot, medium wide angle view, cute cartoon chef behind kitchen counter, same TV studio kitchen set, bright even studio lighting, static straight-on camera angle at chest height, [SCENE SPECIFIC ACTION], Pixar Disney 3D animation style"
For butter-smooth scene transitions, enable frame chaining. This extracts the last frame of each video and uses it as the start frame for the next scene.
This creates videos that literally pick up exactly where the previous scene ended.
Add video_settings to your workflow:
{
"name": "My Video",
"video_settings": {
"chain_frames": true,
"chain_model": "fal-ai/vidu/q3/image-to-video", // optional, defaults to scene model
"chain_duration": "5" // optional
},
"scenes": [ ... ]
}
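The mechanism described is just last-frame extraction. For intuition, this is roughly the ffmpeg invocation such a chaining step would make (a sketch of the idea, not ClawVid's actual internals; the helper name is ours):

```python
def last_frame_cmd(video_path, out_png):
    """Build an ffmpeg command that writes the final frame of a clip to a PNG."""
    return [
        "ffmpeg", "-y",
        "-sseof", "-0.1",   # seek to 0.1s before end-of-file
        "-i", video_path,
        "-frames:v", "1",   # emit a single video frame
        out_png,
    ]
```

The resulting PNG then becomes the image input for the next scene's image-to-video call, which is why frame chaining gives tighter continuity than morphing between two independently generated images.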
| Feature | Transitions | Frame Chaining |
|---------|-------------|----------------|
| Input | Two scene images | Previous video's end frame + current image |
| Continuity | Good (image → image morph) | Best (actual frame continuity) |
| Use case | Style morphs, location changes | Same character/scene evolving |
| Cost | +1 video per transition | Same (replaces standard video gen) |
When to use frame chaining:
When to use transitions instead:
Note: When chain_frames is enabled, the transition field on scenes is ignored (chaining provides better continuity).
Before starting any generation, tell the user:
"Video generation takes 20-25 minutes for a typical 6-scene video. This includes:
- TTS narration (~1-2 min)
- Image generation (~3-5 min)
- Video clip generation (~8-12 min)
- Transition generation (~3-5 min if using transitions)
- Sound effects + music (~2-3 min)
- Transcription for subtitles (~2-3 min)
- Audio mixing + Remotion render (~3-5 min)
I'll keep you updated on progress. Ready to start?"
DO NOT set timeouts on clawvid commands. The pipeline runs many sequential API calls and will complete on its own.
When running clawvid generate:
- process poll to check status periodically
Example execution:
# CORRECT - no timeout, let it run
clawvid generate --workflow workflow.json
# WRONG - timeout will kill the process mid-generation
# timeout 600 clawvid generate --workflow workflow.json
| Quality | Video Clips | Transitions | Estimated Cost |
|---------|-------------|-------------|----------------|
| budget | 1 clip | 0 | $1-2 |
| balanced | 2-3 clips | 0 | $3-5 |
| balanced + transitions | 2-3 clips | 4-6 | $6-10 |
| max_quality | 3+ clips (Vidu) | 5-7 | $12-20 |
Premium video models (Kling 2.6 Pro, Vidu Q3) and transitions cost more but produce much smoother results.
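These tiers can be sanity-checked with back-of-envelope arithmetic from the per-asset prices quoted elsewhere in this doc (treat the numbers as assumptions and lower bounds; actual pricing varies by model and duration):

```python
# Per-asset prices as quoted in this doc (assumed, lower-bound figures).
PRICES = {
    "image": 0.15,             # nano-banana-pro
    "video_clip": 0.50,        # Vidu Q3, low end
    "transition": 0.50,        # Vidu Q3 transition, low end
    "tts_per_1k_chars": 0.09,  # qwen-3-tts
    "music": 0.10,             # beatoven music track
    "sfx": 0.10,               # beatoven sound effect
}

def estimate_cost(images=0, video_clips=0, transitions=0,
                  narration_chars=0, music_tracks=0, sfx=0):
    """Rough dollar estimate for one workflow run."""
    total = (images * PRICES["image"]
             + video_clips * PRICES["video_clip"]
             + transitions * PRICES["transition"]
             + (narration_chars / 1000) * PRICES["tts_per_1k_chars"]
             + music_tracks * PRICES["music"]
             + sfx * PRICES["sfx"])
    return round(total, 2)
```

For the cooking-show plan later in this doc (7 images, 5 clips, 6 transitions, ~1000 narration characters, 1 music track, 4 SFX), this lower-bound estimate lands around $7, consistent with the quoted $10-14 once premium transition pricing is factored in.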
- Run clawvid generate --workflow workflow.json to execute it.
- Outputs land in output/{date}-{slug}/ for each platform.
You control everything through the workflow JSON and config.json. No code changes needed.
When a user first invokes ClawVid or has no preferences.json, run this setup flow.
Which platforms do you create for? (can pick multiple)
1. YouTube Shorts (16:9, up to 60s)
2. TikTok (9:16, up to 60s)
3. Instagram Reels (9:16, up to 90s)
4. All of the above
What type of content do you mainly create?
1. horror – Scary stories, creepypasta, true crime
2. motivation – Quotes, success stories, self-improvement
3. quiz – Trivia, "did you know", interactive questions
4. reddit – Reddit post readings, AITA, confessions
5. custom – I'll define my own style each time
How should I balance quality vs cost?
1. max_quality – Premium models (Vidu/Kling), best motion, $8-15 per video
2. balanced – Default models, 2-3 video clips, $3-5 per video
3. budget – Fewer clips, faster generation, $1-2 per video
What visual style fits your brand?
1. Photorealistic
2. Cinematic
3. Illustrated
4. Anime/Manga
5. Minimal/Clean
6. Mixed (choose per video)
Voice Style:
1. Use my own voice (provide recordings)
2. AI voice – male, deep
3. AI voice – female, warm
4. No narration (music/text only)
Pacing: 0.8 (slow) to 1.2 (fast), default 1.0
Ask the user which models to use for each generation type, or let them choose custom per-video:
Which AI models would you like to use? (or choose "custom" to pick per-video)
📷 IMAGE GENERATION:
1. fal-ai/kling-image/v3/text-to-image – Fast, good quality ($0.03)
2. fal-ai/nano-banana-pro – Best for consistency/reference ($0.15)
3. custom – Choose per video
🎬 VIDEO GENERATION:
1. fal-ai/kandinsky5-pro/image-to-video – Budget, 5s clips ($0.04-0.12)
2. fal-ai/kling-video/v2.6/pro/image-to-video – Better motion, 5s ($0.35)
3. fal-ai/vidu/q3/image-to-video – Best quality, 8s clips ($1.50+)
4. custom – Choose per video
🎵 MUSIC GENERATION:
1. beatoven/music-generation – AI-generated background music ($0.10)
2. none – I'll provide my own music files
3. custom – Choose per video
🔊 SOUND EFFECTS:
1. beatoven/sound-effect-generation – AI-generated SFX ($0.10 each)
2. none – No sound effects
3. custom – Choose per video
🗣️ TTS (Text-to-Speech):
1. fal-ai/qwen-3-tts/voice-design/1.7b – AI voice design ($0.09/1K chars)
2. none – I'll provide my own voice recordings
3. custom – Choose per video
📝 SUBTITLES:
1. enabled – Word-by-word animated subtitles (uses Whisper for timing)
2. disabled – No subtitles
3. custom – Choose per video
After setup, save to preferences.json (gitignored):
{
"platforms": ["tiktok"],
"template": "horror",
"quality_mode": "max_quality",
"voice": {
"style": "ai_male_deep",
"pacing": 0.85
},
"visual_style": "anime",
"models": {
"image": "fal-ai/kling-image/v3/text-to-image",
"video": "fal-ai/vidu/q3/image-to-video",
"music": "beatoven/music-generation",
"sound_effects": "beatoven/sound-effect-generation",
"tts": "fal-ai/qwen-3-tts/voice-design/1.7b",
"subtitles": "enabled"
},
"created_at": "2026-02-13",
"updated_at": "2026-02-13"
}
Or run: clawvid setup (interactive) / clawvid setup --reset (start over).
Ask targeted questions based on how specific the user is:
Vague request:
User: "Make a horror video"
You: "Got it โ horror video. A few questions:
1. What's the story/topic?
2. Do you have a script, or should I write one?
3. Any specific scenes you're imagining?"
Specific request:
User: "Make a horror video about a guy who finds a VHS tape in his attic"
You: "Perfect premise. Let me confirm:
1. POV: First-person narrator or third-person?
2. Tone: Slow-burn dread or jump scares?
3. Ending: Resolved, cliffhanger, or ambiguous?"
CRITICAL: Before building the workflow, gather accurate information and reference images.
Use research tools (web_search + web_fetch):

| Scenario | Action |
|----------|--------|
| Vague topic without details | Research to find interesting angles/facts |
| "Did you know" / quiz / trivia | Verify facts, find accurate stats |
| How-to / recipe / tutorial | Search for accurate steps and details |
| Historical / scientific claims | Fact-check before including in narration |
| Trending topics | Search for latest info and context |
| User provides no source | Research authoritative sources |
Example research flow:
User: "Make a video about how to boil the perfect egg"
You: [uses web_search for "perfect boiled egg timing methods"]
[uses web_fetch on top cooking sites]
"Did some research! Here's what I found:
- Soft boil: 6-7 min
- Medium: 9-10 min
- Hard boil: 12-13 min
- Pro tip: Ice bath immediately after
Want me to use these timings in the video?"
When visual consistency matters or the user needs specific imagery:
- reference_image in workflow
User: "Make a video about ancient Tartarian architecture"
You: [searches for reference images]
[sends 3-4 options to chat]
"Found some reference images for the Tartarian aesthetic:
[image 1] - Ornate domed building
[image 2] - Victorian exhibition hall
[image 3] - Old sepia photograph style
Which style should I use as the reference for consistent visuals?
Or should I generate without a reference?"
ALWAYS send to user chat:
Use the message tool to send media:
message action=send filePath=/path/to/video.mp4 caption="Here's your video!"
"Your defaults are 9:16, 60 seconds, horror template. Want to keep these or adjust?
- Keep defaults
- Change duration (30s / 90s)
- Different template
- Different visual style"
For content requiring smooth motion (cooking shows, tutorials, presentations):
"This type of content works best with:
- Fixed camera angle (same framing every scene)
- Transitions between scenes (smooth interpolation)
- Minimal video motion (prevents jarring clips)
This adds ~$4-6 for transitions but looks much more professional. Enable transitions?"
Horror:
Motivation:
Quiz:
Reddit:
Cooking Show / Tutorial:
Present a scene breakdown before generating:
"Here's my plan for 'Chef Pierre's Kitchen' (60s, cooking show):
SCENES (7 total):
1. [0-10s] VIDEO – Chef intro, arms wide (NO transition - first scene)
2. [10-25s] IMAGE – Ingredient reveal (TRANSITION from scene 1)
3. [25-35s] VIDEO – Mixing batter (TRANSITION from scene 2)
4. [35-46s] IMAGE – Secret tip moment (TRANSITION from scene 3)
5. [46-56s] VIDEO – Pan swirl technique (TRANSITION from scene 4)
6. [56-65s] VIDEO – The flip! (TRANSITION from scene 5)
7. [65-78s] VIDEO – Final presentation (TRANSITION from scene 6)
TRANSITIONS: 6 (smooth interpolation between all scenes)
CAMERA: Fixed, medium-wide, straight-on
SOUND EFFECTS:
- Scene 1: Applause (0s offset, 4s duration)
- Scene 3: Whisking sounds (0s offset, 5s duration)
- Scene 5: Sizzling pan (1s offset, 4s duration)
- Scene 7: Applause + outro fanfare (3s offset, 4s duration)
AUDIO: French-accented male host voice, upbeat cooking show music
EFFECTS: None (fixed camera style)
Estimated: 7 images + 5 video clips + 6 transitions + 7 TTS + 4 SFX + 1 music track
Time: ~25-30 minutes
Cost: ~$10-14 (using Vidu Q3 for transitions)
Ready to proceed?"
After approval, create the workflow JSON file and run it.
Remember: Tell the user it will take 20-30 minutes before starting!
{
"name": "Video Title",
"template": "quiz",
"timing_mode": "tts_driven",
"scene_padding_seconds": 0.3,
"consistency": {
"reference_prompt": "Character/setting description for consistency",
"seed": 12345678,
"model": "fal-ai/nano-banana-pro"
},
"scenes": [ ... ],
"audio": {
"tts": { ... },
"music": { ... }
},
"subtitles": {
"enabled": true,
"style": { ... }
},
"output": {
"filename": "output_name.mp4",
"resolution": "1080x1920",
"fps": 30,
"format": "mp4"
}
}
{
"id": "scene_1",
"description": "Human-readable description",
"type": "video", // or "image"
"timing": {},
"transition": { // OPTIONAL - only on scenes 2+
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition description"
},
"narration": "What the voice says for this scene",
"image_generation": {
"model": "fal-ai/nano-banana-pro/edit",
"input": {
"prompt": "Detailed visual description",
"negative_prompt": "Things to avoid",
"aspect_ratio": "9:16",
"seed": 12345678
}
},
"video_generation": { // Only for type: "video"
"model": "fal-ai/vidu/image-to-video",
"input": {
"prompt": "Motion description, camera stays fixed, only subject moves",
"duration": "4",
"movement_amplitude": "small"
}
},
"sound_effects": [
{
"prompt": "Sound description",
"timing_offset": 0,
"duration": 4,
"volume": 0.6
}
],
"effects": [] // Ken Burns, vignette, grain, etc.
}
For high-quality horror videos with visual consistency, use this structure:
{
"name": "The Watchers - Horror Production",
"template": "horror",
"timing_mode": "tts_driven",
"scene_padding_seconds": 0.5,
"min_scene_duration_seconds": 5,
"consistency": {
"reference_prompt": "Full-body character design of a dark animated horror entity...",
"seed": 666,
"resolution": "2K"
},
"scenes": [
{
"id": "frame_1",
"description": "Exterior - Abandoned mansion at night, establishing shot",
"type": "video",
"timing": {},
"narration": "They say the mansion on Ashwood Lane has been empty for forty years...",
"image_generation": {
"model": "fal-ai/nano-banana-pro/edit",
"input": {
"prompt": "Wide establishing shot looking up at a massive three-story Victorian Gothic mansion at night...",
"aspect_ratio": "9:16"
}
},
"video_generation": {
"model": "fal-ai/vidu/q3/image-to-video",
"input": {
"prompt": "Slow steady dolly push toward the mansion entrance from the gate...",
"duration": "8",
"resolution": "720p"
}
},
"sound_effects": [
{
"prompt": "Howling wind gusting through dead tree branches at night...",
"timing_offset": 0,
"duration": 8,
"volume": 0.6
}
],
"effects": ["vignette_heavy", "grain", "flicker_subtle"]
}
],
"audio": {
"tts": {
"model": "fal-ai/qwen-3-tts/voice-design/1.7b",
"voice_prompt": "A low raspy whispering male voice, speaking slowly with dread...",
"language": "en",
"speed": 0.85
},
"music": {
"generate": true,
"prompt": "Dark ambient horror soundtrack, deep pulsing sub-bass drones in D minor...",
"duration": 60,
"volume": 0.15,
"fade_in": 3,
"fade_out": 4
}
},
"subtitles": {
"enabled": true,
"style": {
"font": "Impact",
"color": "#ffffff",
"stroke_color": "#000000",
"stroke_width": 5,
"position": "center",
"animation": "word_by_word",
"font_size": 72
}
},
"output": {
"filename": "the_watchers_horror.mp4",
"fps": 30,
"format": "mp4",
"platforms": ["tiktok"]
}
}
All models are configured in config.json under the fal section. Use full fal.ai model IDs in workflow JSON.
| Model | When to Use | Cost | Notes |
|-------|-------------|------|-------|
| fal-ai/kling-image/v3/text-to-image | Standard scenes | $0.03 | Uses aspect_ratio (e.g. "9:16") |
| fal-ai/nano-banana-pro | Reference images | $0.15 | For consistency base |
| fal-ai/nano-banana-pro/edit | Consistent scenes | $0.15 | Edit from reference |
| Model | Duration | Cost | Quality | Notes |
|-------|----------|------|---------|-------|
| fal-ai/kandinsky5-pro/image-to-video | 5s | $0.04-0.12 | Good | Use duration: "5s" (with "s" suffix!) |
| fal-ai/kling-video/v2.6/pro/image-to-video | 5s | $0.35 | Better | Premium motion |
| fal-ai/vidu/image-to-video | 4s | $0.20 | Good | Basic Vidu |
| fal-ai/vidu/q3/image-to-video | 1-16s | $0.50-1.50 | Best | Smoothest motion, transitions |
⚠️ Duration format matters:
- Kandinsky: "duration": "5s" (with "s" suffix)
- Vidu/Kling: "duration": "5" or "duration": "8" (number as string)

| Model | Resolution | Cost | Notes |
|-------|------------|------|-------|
| veed/fabric-1.0/text | 720p, 480p | ~$0.50 | Best lip-sync, generates audio |
Use for: AI presenters, news anchors, character dialogue, any "person speaking" content.
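Because the duration convention differs per model family (Kandinsky wants a trailing "s", Vidu and Kling take the bare number as a string), a tiny normalizer avoids one of the most common workflow mistakes. A sketch under those assumptions, using the model IDs from this doc:

```python
def normalize_duration(model, seconds):
    """Format a clip duration string for a given fal.ai model ID.

    Kandinsky models expect "5s" (with suffix); Vidu/Kling expect "5".
    """
    value = str(int(seconds))
    if model.startswith("fal-ai/kandinsky"):
        return value + "s"
    return value
```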
| Model | Cost | Quality | Notes |
|-------|------|---------|-------|
| fal-ai/vidu/q3/image-to-video | $0.50-1.50 | Best | Smooth morphing between keyframes |
| fal-ai/pixverse/image-to-video | $0.45 | Good | Supports style parameter |
| Model | Purpose | Cost |
|-------|---------|------|
| fal-ai/qwen-3-tts/voice-design/1.7b | Voice-designed TTS narration | $0.09/1K chars |
| fal-ai/whisper | Transcription for subtitle timing | $0.001/sec |
| beatoven/sound-effect-generation | AI sound effect generation (1-35s) | $0.10/req |
| beatoven/music-generation | AI background music generation (5-150s) | $0.10/req |
For a 60-second video:
type: "image" with Ken Burns effects for narration-heavy scenestype: "video" for dramatic moments that need motionIs this the first scene?
├── YES → type: "video" (strong hook), NO transition
└── NO → Does this scene need motion?
    ├── YES → type: "video"
    │   └── Should it flow smoothly from previous scene?
    │       ├── YES → ADD transition
    │       └── NO (jump cut is intentional) → NO transition
    └── NO → type: "image"
        └── Should it flow smoothly from previous scene?
            ├── YES → ADD transition
            └── NO → NO transition, use Ken Burns for subtle motion
| Content Type | Video Scenes | Transitions | Effects |
|--------------|--------------|-------------|---------|
| Horror | 3-4 | Selective | vignette, grain, flicker |
| Cooking Show | 4-5 | ALL (except first) | None (clean look) |
| Tutorial | 3-4 | ALL (except first) | None or minimal |
| Motivation | 2-3 | Optional | kenburns on images |
| Quiz/Trivia | 2-3 | None | Clean, vibrant |
| Fast montage | 3-5 | None (hard cuts) | Template-dependent |
# Generate video from workflow JSON (full pipeline)
clawvid generate --workflow workflow.json
clawvid generate --workflow workflow.json --quality max_quality
clawvid generate --workflow workflow.json --template horror --skip-cache
# PHASED GENERATION - Generate in stages with review
clawvid generate --workflow workflow.json --phase images # Images only, pause for review
clawvid generate --workflow workflow.json --phase videos # Videos only (uses existing images)
clawvid generate --workflow workflow.json --phase audio # Audio only
clawvid generate --workflow workflow.json --phase render # Render only
# VISION QA - Check images for issues before continuing
clawvid generate --workflow workflow.json --qa # Enable QA checks
clawvid generate --workflow workflow.json --qa-auto-fix # Auto-regenerate failed images
# SELECTIVE REGENERATION - Fix specific scenes
clawvid generate --workflow workflow.json --regenerate scene_5,scene_6
clawvid generate --workflow workflow.json --use-existing-images --regenerate scene_3
# Re-render from a previous run's assets
clawvid render --run output/2026-02-11-haunted-library/
clawvid render --run output/2026-02-11-haunted-library/ --all-platforms
clawvid render --run output/2026-02-11-haunted-library/ --platform tiktok
# Preview workflow in Remotion
clawvid preview --workflow workflow.json
clawvid preview --workflow workflow.json --platform youtube
# Launch Remotion studio for visual editing
clawvid studio
# Configure preferences
clawvid setup
clawvid setup --reset
Phase 1 (1-2 min): Load config, validate workflow, create output directory
Phase 2 (2-4 min): Generate TTS narration for all scenes
Phase 3 (3-5 min): Generate images (kling-image or nano-banana-pro)
Phase 4 (8-12 min): Generate video clips (slowest phase)
Phase 5 (3-5 min): Generate transitions (if any scenes have transition field)
Phase 6 (1-2 min): Generate sound effects (beatoven)
Phase 7 (1-2 min): Generate background music (beatoven)
Phase 8 (2-3 min): Transcribe narration with Whisper (for word-level subtitles)
Phase 9 (1-2 min): Mix audio (narration + music + SFX)
Phase 10 (2-3 min): Render with Remotion + FFmpeg post-processing
Total: ~20-30 minutes for a 6-scene video with transitions
Effects are applied per-scene via the effects array. Names are fuzzy-matched.
| Effect | Variants | Description |
|--------|----------|-------------|
| vignette | vignette_subtle, vignette_heavy | Dark edges |
| grain | grain_subtle, grain_heavy | Film grain noise |
| ken_burns | kenburns_slow_zoom, kenburns_slow_pan, kenburns_zoom_out | Zoom/pan on images |
| flicker | flicker_subtle | Light flickering |
| glitch | glitch_subtle, glitch_heavy | RGB splitting |
| chromatic_aberration | chromatic_aberration_subtle | Color fringing |
Note: For fixed-camera content (cooking shows), do NOT use Ken Burns effects.
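"Fuzzy-matched" presumably means loosely spelled names resolve to the nearest canonical variant. One plausible way to do that (our own sketch, not ClawVid's matcher, using the variant names from the table above):

```python
import difflib

KNOWN_EFFECTS = [
    "vignette_subtle", "vignette_heavy", "grain_subtle", "grain_heavy",
    "kenburns_slow_zoom", "kenburns_slow_pan", "kenburns_zoom_out",
    "flicker_subtle", "glitch_subtle", "glitch_heavy",
    "chromatic_aberration_subtle",
]

def resolve_effect(name):
    """Map a loosely-spelled effect name to the closest known variant, or None."""
    candidate = name.lower().replace(" ", "_")
    matches = difflib.get_close_matches(candidate, KNOWN_EFFECTS, n=1, cutoff=0.5)
    return matches[0] if matches else None
```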
Templates apply color grading, overlays, and default effects.
- saturate(0.6) brightness(0.85) contrast(1.15)
- saturate(1.1) brightness(1.05) sepia(0.12)
- saturate(1.25) brightness(1.08) contrast(1.1)
- saturate(0.9) brightness(0.95)

ClawVid includes Vision QA to automatically detect common issues in AI-generated images.
| Issue Type | Severity | Example |
|------------|----------|---------|
| hallucinated_text | Error | "PROJECT: MIDNIGHT ECHO" appearing in image |
| unwanted_logo | Error | History Channel logo, stock watermarks |
| stock_image | Warning | Generic stock photo look |
| style_drift | Warning | Image style differs from reference |
| missing_element | Warning | Requested subject not visible |
Image models often hallucinate text/logos when prompted with certain terms:
Trigger words that cause issues:
Safe alternatives:
# Check images after generation
clawvid generate --workflow x.json --qa
# Auto-fix by regenerating with sanitized prompts
clawvid generate --workflow x.json --qa-auto-fix
Always include in negative_prompt:
"negative_prompt": "text, watermark, logo, brand, copyright, title card, news ticker, TV graphics, stock photo"
For critical projects, generate images first and review:
# Step 1: Generate images only
clawvid generate --workflow x.json --phase images
# Step 2: Review images in output folder
# Step 3: Fix problematic scenes
clawvid generate --workflow x.json --regenerate scene_5,scene_6
# Step 4: Continue with videos
clawvid generate --workflow x.json --phase videos --use-existing-images
Add skip_qa: true to scenes that should bypass checking:
{
"id": "scene_3",
"type": "static",
"skip_qa": true,
"static_image": { "url": "..." }
}
```json
// WRONG - cooking show with hard cuts
{ "id": "scene_2", "type": "video", ... }
{ "id": "scene_3", "type": "video", ... }

// CORRECT - smooth flow
{ "id": "scene_2", "transition": { "model": "fal-ai/vidu/q3/image-to-video", "duration": "4", "prompt": "..." }, "type": "video", ... }
```

```json
// WRONG - no previous scene to transition from
{ "id": "scene_1", "transition": { ... }, ... }

// CORRECT - first scene has no transition
{ "id": "scene_1", "type": "video", ... }
```

```json
// WRONG - video describes new content
"image_prompt": "Chef standing at counter"
"video_prompt": "Chef running through kitchen"

// CORRECT - video describes motion OF the image
"image_prompt": "Chef standing at counter"
"video_prompt": "Chef gestures while standing at counter, camera fixed"
```

```json
// WRONG - breaks the "fixed camera" illusion
"effects": ["kenburns_slow_zoom"]

// CORRECT - no camera movement effects
"effects": []
```

```json
// WRONG - triggers History Channel branding
"prompt": "Dark cinematic documentary style, History Channel conspiracy aesthetic..."

// CORRECT - same style without brand references
"prompt": "Dark cinematic film style, moody lighting, dramatic shadows, blue and orange color grading, film grain texture...",
"negative_prompt": "text, watermark, logo, brand, title card, TV graphics"
```

```json
// WRONG - different angles break continuity
"scene_1 prompt": "...wide angle shot..."
"scene_2 prompt": "...close-up shot..."
"scene_3 prompt": "...overhead view..."

// CORRECT - same angle throughout
"All prompts": "Fixed camera cooking show shot, medium wide angle view, straight-on at chest height, ..."
```
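The mechanical subset of these rules can be caught before generation with a pre-flight lint pass. A minimal sketch, assuming the field names shown in the examples above; `lint_workflow` and `BRAND_TERMS` are illustrative, not clawvid features.

```python
# Hypothetical pre-flight linter for the structural workflow rules:
# first scene has no transition, no camera-movement effects, no brand terms.
BRAND_TERMS = ("history channel", "netflix", "bbc")  # assumed example list

def lint_workflow(scenes: list[dict]) -> list[str]:
    problems = []
    for i, scene in enumerate(scenes):
        sid = scene.get("id", f"scene_{i + 1}")
        # Rule: the first scene has no previous scene to transition from.
        if i == 0 and "transition" in scene:
            problems.append(f"{sid}: first scene must not define a transition")
        # Rule: camera-movement effects break the fixed-camera illusion.
        for eff in scene.get("effects", []):
            if "kenburns" in eff or "zoom" in eff:
                problems.append(f"{sid}: remove camera-movement effect {eff!r}")
        # Rule: brand references trigger hallucinated logos.
        prompt = scene.get("prompt", "").lower()
        for term in BRAND_TERMS:
            if term in prompt:
                problems.append(f"{sid}: brand reference {term!r} in prompt")
    return problems

issues = lint_workflow([
    {"id": "scene_1", "transition": {}, "effects": ["kenburns_slow_zoom"]},
    {"id": "scene_2", "type": "video"},
])
```

Semantic rules (video prompts describing motion of the image, consistent camera angle) still need human review; a linter can only flag the structural ones.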
1. READ THIS SKILL → Every time, fresh
2. CHECK PREFERENCES → Load preferences.json or run setup
3. GATHER REQUIREMENTS → Topic, format, style questions
4. DECIDE ON TRANSITIONS → For continuous content, recommend transitions
5. BUILD PLAN → Present scene breakdown with transition plan
6. GET APPROVAL → Wait for explicit "go"
7. WARN ABOUT TIME → "This will take 20-30 minutes. Ready?"
8. GENERATE WORKFLOW → Create the workflow JSON (following ALL rules above)
9. EXECUTE → Run `clawvid generate --workflow <file>` (NO TIMEOUT!)
10. MONITOR → Poll process and report progress
11. REVIEW → Check outputs
12. DELIVER → Compress, send video to chat, show cost summary
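Steps 9 and 10 (execute with no timeout, then poll and report progress) can be sketched with `subprocess`. The progress-line handling is hypothetical, since clawvid's actual output format isn't specified here; only the "no timeout, keep polling" pattern comes from the skill text.

```python
import subprocess

def run_and_monitor(cmd: list[str]) -> int:
    """Launch the generator and stream its output until it exits.
    Deliberately sets no timeout: generation can take 20-30 minutes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    # Step 10: poll the process and report progress as lines arrive.
    for line in proc.stdout:
        print("progress:", line.rstrip())
    # Step 11 starts once the process exits; return code signals success.
    return proc.wait()

# e.g. run_and_monitor(["clawvid", "generate", "--workflow", "x.json"])
```

A nonzero return code should route back to the REVIEW step rather than straight to DELIVER.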
When generation completes, deliver the result with the message tool (pass `filePath`):

```bash
# Compress for chat delivery
ffmpeg -y -i output/.../tiktok/final.mp4 \
  -c:v libx264 -preset fast -crf 28 \
  -c:a aac -b:a 128k \
  output/.../tiktok/playable.mp4

# Send to user
message action=send filePath=/path/to/playable.mp4 caption="🎬 Your video is ready!"
```
After each generation step, verify the outputs. Use `"movement_amplitude": "small"` for stable video clips.

Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
- Status: missing
- Auth: None
- Streaming: No
- Data region: Unspecified

Protocol support
- Requires: none
- Forbidden: none

Guardrails
- Operational confidence: low
```bash
curl -s "https://xpersona.co/api/v1/agents/neur0map-clawvid/snapshot"
curl -s "https://xpersona.co/api/v1/agents/neur0map-clawvid/contract"
curl -s "https://xpersona.co/api/v1/agents/neur0map-clawvid/trust"
```
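A dependency-free sketch of calling these read-only endpoints from Python, applying the retry policy documented in the Invocation Guide (3 attempts, 500/1500/3500 ms backoff, retry on HTTP 429/503 and network timeouts). The helper names are illustrative.

```python
import json
import time
import urllib.error
import urllib.request

BASE = "https://xpersona.co/api/v1/agents/neur0map-clawvid"
BACKOFF_MS = [500, 1500, 3500]  # retryPolicy.backoffMs from the Invocation Guide

def should_retry(status, attempt):
    """Retry on the documented retryable conditions (HTTP 429/503,
    or a network timeout signalled here as status None) while attempts remain."""
    retryable = status in (429, 503) or status is None
    return retryable and attempt < len(BACKOFF_MS)

def fetch(path):
    """GET one of the snapshot/contract/trust endpoints with backoff."""
    url = f"{BASE}/{path}"
    for attempt in range(1, len(BACKOFF_MS) + 1):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if not should_retry(err.code, attempt):
                raise
        except urllib.error.URLError:
            if not should_retry(None, attempt):
                raise
        time.sleep(BACKOFF_MS[attempt - 1] / 1000)
```

For this agent, `fetch("contract")` currently reports `"contractStatus": "missing"`, so callers should treat responses as untyped rather than validating against a schema.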
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
- Handshake: UNKNOWN
- Confidence: unknown
- Attempts (30d): unknown
- Fallback rate: unknown

Runtime metrics
- Observed P50: unknown
- Observed P95: unknown
- Rate limit: unknown
- Estimated cost: unknown
Do not use if
Contract JSON

```json
{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}
```

Invocation Guide
```json
{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/neur0map-clawvid/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/neur0map-clawvid/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/neur0map-clawvid/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": ["OPENCLEW"]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": { "summary": "...", "confidence": 0.9 },
    "meta": { "source": "GITHUB_OPENCLEW", "generatedAt": "2026-04-17T01:49:16.727Z" }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [500, 1500, 3500],
    "retryableConditions": ["HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"]
  }
}
```

Trust JSON
```json
{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}
```

Capability Matrix
```json
{
  "rows": [
    { "key": "OPENCLEW", "type": "protocol", "support": "unknown", "confidenceSource": "profile", "notes": "Listed on profile" },
    { "key": "accent", "type": "capability", "support": "supported", "confidenceSource": "profile", "notes": "Declared in agent profile metadata" },
    { "key": "pick", "type": "capability", "support": "supported", "confidenceSource": "profile", "notes": "Declared in agent profile metadata" },
    { "key": "style", "type": "capability", "support": "supported", "confidenceSource": "profile", "notes": "Declared in agent profile metadata" }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile capability:accent|supported|profile capability:pick|supported|profile capability:style|supported|profile"
}
```

Facts JSON
```json
[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Neur0map",
    "href": "https://github.com/neur0map/clawvid",
    "sourceUrl": "https://github.com/neur0map/clawvid",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-02-25T02:07:08.413Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/neur0map-clawvid/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-02-25T02:07:08.413Z",
    "isPublic": true
  },
  {
    "factKey": "traction",
    "category": "adoption",
    "label": "Adoption signal",
    "value": "5 GitHub stars",
    "href": "https://github.com/neur0map/clawvid",
    "sourceUrl": "https://github.com/neur0map/clawvid",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-02-25T02:07:08.413Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/neur0map-clawvid/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]
```

Change Events JSON
```json
[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]
```