Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Crawler Summary
ClawVid — Generate short-form videos (YouTube Shorts, TikTok, Reels) from text prompts. You are the orchestrator. You plan scenes, write prompts, generate a workflow JSON, and call clawvid generate to execute the full pipeline. --- 🚨 MANDATORY: READ THIS SKILL EVERY TIME **Before creating ANY video workflow, you MUST read this entire SKILL.md file.** This skill contains critical rules about: - Workflow JSON structure. Capability contract not published. No trust telemetry is available yet. 5 GitHub stars reported by the source. Last updated 2/25/2026.
Freshness
Last checked 2/25/2026
Best For
clawvid is best for short-form video generation workflows where OpenClaw compatibility matters.
Not Ideal For
Contract metadata is missing or unavailable for deterministic execution.
Evidence Sources Checked
editorial-content, GITHUB OPENCLAW, runtime-metrics, public facts pack
Public facts
5
Change events
1
Artifacts
0
Freshness
Feb 25, 2026
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Feb 25, 2026
Vendor
Neur0map
Artifacts
0
Benchmarks
0
Last release
Unpublished
Key links, install path, and a quick operational read before the deeper crawl record.
Setup snapshot
git clone https://github.com/neur0map/clawvid.git
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Neur0map
Protocol compatibility
OpenClaw
Adoption signal
5 GitHub stars
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
6
Snippets
0
Languages
typescript
Parameters
json
{
"id": "intro",
"type": "talking_head",
"image_generation": {
"model": "fal-ai/nano-banana-pro",
"input": {
"prompt": "Friendly female news anchor, professional attire, neutral background, looking at camera",
"aspect_ratio": "9:16"
}
},
"talking_head": {
"model": "veed/fabric-1.0/text",
"input": {
"text": "Welcome to today's deep dive into one of history's greatest mysteries...",
"resolution": "720p",
"voice_description": "Confident female voice, American accent, news anchor style"
}
},
"timing": {}
}
{
"id": "scene_3",
"type": "static",
"static_image": {
"url": "https://example.com/historical-map.jpg",
"fit": "contain",
"background": "#000000"
},
"narration": "This map from 1706 shows...",
"timing": { "duration": 10 },
"effects": ["kenburns_slow_zoom"]
}
{
"id": "scene_2",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth camera transition, continuous motion",
"style": "3d_animation" // optional, for PixVerse
},
"type": "image",
...
}
{
"scenes": [
{
"id": "scene_1",
"type": "video",
"narration": "Welcome to the show!",
"image_generation": { ... },
"video_generation": { ... }
},
{
"id": "scene_2",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition, chef continues cooking, camera stays fixed"
},
"type": "image",
"narration": "First, gather your ingredients...",
"image_generation": { ... }
},
{
"id": "scene_3",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition, chef mixing bowl, continuous motion"
},
"type": "video",
"narration": "Mix until smooth...",
"image_generation": { ... },
"video_generation": { ... }
}
]
}
{
"video_generation": {
"model": "fal-ai/vidu/image-to-video",
"input": {
"prompt": "Chef whisks batter while camera stays completely fixed and static, only chef and whisk move, NOT realistic",
"duration": "4",
"movement_amplitude": "small"
}
}
}
{
"consistency": {
"reference_prompt": "Cartoon French chef character, white hat, blue apron, kitchen background, Pixar style",
"seed": 55555555,
"model": "fal-ai/nano-banana-pro"
}
}
Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB OPENCLAW
Editorial quality
ready
Generate short-form videos (YouTube Shorts, TikTok, Reels) from text prompts.
You are the orchestrator. You plan scenes, write prompts, generate a workflow JSON, and call clawvid generate to execute the full pipeline.
Before creating ANY video workflow, you MUST read this entire SKILL.md file.
This skill contains critical rules about:
Do not rely on memory. Read this file fresh each time.
| Scene Type | When to Use | Motion Source |
|------------|-------------|---------------|
| type: "image" | Narration-heavy, descriptions, establishing shots | Ken Burns effects only |
| type: "video" | Action moments, reveals, dramatic beats | AI video generation |
| type: "static" | Real photos, maps, documents to SHOW as-is | None (displayed as-is) |
| type: "talking_head" | AI presenter, character speaking | VEED Fabric lip-sync |
Key insight: Each type: "video" scene generates an independent 4-8s clip. Without transitions, these clips are hard-cut together, causing jarring jumps.
Use type: "talking_head" to create AI presenter videos with lip-synced speech:
{
"id": "intro",
"type": "talking_head",
"image_generation": {
"model": "fal-ai/nano-banana-pro",
"input": {
"prompt": "Friendly female news anchor, professional attire, neutral background, looking at camera",
"aspect_ratio": "9:16"
}
},
"talking_head": {
"model": "veed/fabric-1.0/text",
"input": {
"text": "Welcome to today's deep dive into one of history's greatest mysteries...",
"resolution": "720p",
"voice_description": "Confident female voice, American accent, news anchor style"
}
},
"timing": {}
}
How it works:
Talking head fields:
- text: The speech to lip-sync (required)
- resolution: 720p or 480p (default: 720p)
- voice_description: Voice styling (e.g., "British accent", "Deep male voice")
When to use talking head:
Cost: ~$0.50 per talking head clip
⚠️ Important: Talking head scenes generate their own audio. Don't add separate narration in the workflow: the talking_head.input.text IS the narration.
Use type: "static" when you want to display an existing image without AI generation:
{
"id": "scene_3",
"type": "static",
"static_image": {
"url": "https://example.com/historical-map.jpg",
"fit": "contain",
"background": "#000000"
},
"narration": "This map from 1706 shows...",
"timing": { "duration": 10 },
"effects": ["kenburns_slow_zoom"]
}
When to use static images:
Static image fields:
- url: URL or local path to the image
- fit: contain (letterbox), cover (crop), or fill (stretch)
- background: Background color for letterboxing (default: black)
⚠️ Static images are SHOWN, not used for image-to-video generation.
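The three fit modes follow standard letterbox/crop arithmetic. As an illustration (our own sketch, not ClawVid's internal code; the function name is ours), this is what each mode does to an image's drawn size inside the output frame:

```python
def fit_dimensions(img_w, img_h, frame_w, frame_h, fit="contain"):
    """Drawn (width, height) of an image inside a frame for each fit mode.

    contain: scale until the whole image fits (letterbox fills the rest),
    cover:   scale until the frame is fully covered (excess is cropped),
    fill:    ignore aspect ratio and stretch to the frame exactly.
    """
    if fit == "fill":
        return frame_w, frame_h
    scale = (min if fit == "contain" else max)(frame_w / img_w, frame_h / img_h)
    return round(img_w * scale), round(img_h * scale)

# A 1000x500 landscape image in a 1080x1920 portrait frame:
# contain -> (1080, 540), letterboxed on the "background" color
# cover   -> (3840, 1920), heavily cropped left and right
```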
Problem: Video models generate isolated clips. Concatenating them creates jarring cuts with no motion continuity.
Solution: Use the transition field to generate interpolated videos between scenes.
When a scene has a transition object:
{
"id": "scene_2",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth camera transition, continuous motion",
"style": "3d_animation" // optional, for PixVerse
},
"type": "image",
...
}
Supported models for transitions:
- fal-ai/vidu/q3/image-to-video: best quality, smooth morphing ($0.50-1.50)
- fal-ai/pixverse/image-to-video: good quality, supports style ($0.45)

| Scenario | Use Transition? | Notes |
|----------|-----------------|-------|
| Cooking show (fixed camera) | ✅ YES on every scene | Creates continuous "footage" feel |
| Horror (jump cuts intentional) | ⚠️ SELECTIVE | Use on atmosphere scenes, skip for jump scares |
| Talking head / tutorial | ✅ YES | Smooth presenter movements |
| Fast-paced montage | ❌ NO | Hard cuts are stylistically appropriate |
| Scene with dramatic reveal | ❌ NO | Hard cut adds impact |
{
"scenes": [
{
"id": "scene_1",
"type": "video",
"narration": "Welcome to the show!",
"image_generation": { ... },
"video_generation": { ... }
},
{
"id": "scene_2",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition, chef continues cooking, camera stays fixed"
},
"type": "image",
"narration": "First, gather your ingredients...",
"image_generation": { ... }
},
{
"id": "scene_3",
"transition": {
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition, chef mixing bowl, continuous motion"
},
"type": "video",
"narration": "Mix until smooth...",
"image_generation": { ... },
"video_generation": { ... }
}
]
}
Note: The first scene cannot have a transition (no previous scene to transition FROM).
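That rule is easy to enforce mechanically. A minimal pre-flight check (our own sketch, not part of the clawvid CLI) could reject workflows that put a transition on the first scene or omit the fields the transition examples above always include:

```python
def check_transitions(workflow):
    """Return a list of human-readable problems with a workflow's transitions."""
    problems = []
    for i, scene in enumerate(workflow.get("scenes", [])):
        transition = scene.get("transition")
        if transition is None:
            continue
        scene_id = scene.get("id", f"scene {i + 1}")
        if i == 0:
            # No previous scene exists to transition FROM.
            problems.append(f"{scene_id}: first scene cannot have a transition")
        if "model" not in transition:
            problems.append(f"{scene_id}: transition missing 'model'")
        if "prompt" not in transition:
            problems.append(f"{scene_id}: transition missing 'prompt'")
    return problems
```

Running this before `clawvid generate` catches the most common workflow-authoring mistakes before any money is spent on generation.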
When using type: "video":
"movement_amplitude": "small" for stability{
"video_generation": {
"model": "fal-ai/vidu/image-to-video",
"input": {
"prompt": "Chef whisks batter while camera stays completely fixed and static, only chef and whisk move, NOT realistic",
"duration": "4",
"movement_amplitude": "small"
}
}
}
For consistent characters/settings across scenes:
- consistency.reference_prompt and consistency.seed
- fal-ai/nano-banana-pro/edit maintains reference style
{
"consistency": {
"reference_prompt": "Cartoon French chef character, white hat, blue apron, kitchen background, Pixar style",
"seed": 55555555,
"model": "fal-ai/nano-banana-pro"
}
}
For content that should feel like "one continuous shot":
effects: ["kenburns_*"]Example image prompt for fixed camera:
"Fixed camera cooking show shot, medium wide angle view, cute cartoon chef behind kitchen counter, same TV studio kitchen set, bright even studio lighting, static straight-on camera angle at chest height, [SCENE SPECIFIC ACTION], Pixar Disney 3D animation style"
For butter-smooth scene transitions, enable frame chaining. This extracts the last frame of each video and uses it as the start frame for the next scene.
This creates videos that literally pick up exactly where the previous scene ended.
Add video_settings to your workflow:
{
"name": "My Video",
"video_settings": {
"chain_frames": true,
"chain_model": "fal-ai/vidu/q3/image-to-video", // optional, defaults to scene model
"chain_duration": "5" // optional
},
"scenes": [ ... ]
}
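The mechanism described is just last-frame extraction. For intuition, this is roughly the ffmpeg invocation such a chaining step would make (a sketch of the idea, not ClawVid's actual internals; the helper name is ours):

```python
def last_frame_cmd(video_path, out_png):
    """Build an ffmpeg command that writes the final frame of a clip to a PNG."""
    return [
        "ffmpeg", "-y",
        "-sseof", "-0.1",   # seek to 0.1s before end-of-file
        "-i", video_path,
        "-frames:v", "1",   # emit a single video frame
        out_png,
    ]
```

The resulting PNG then becomes the image input for the next scene's image-to-video call, which is why frame chaining gives tighter continuity than morphing between two independently generated images.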
| Feature | Transitions | Frame Chaining |
|---------|-------------|----------------|
| Input | Two scene images | Previous video's end frame + current image |
| Continuity | Good (image → image morph) | Best (actual frame continuity) |
| Use case | Style morphs, location changes | Same character/scene evolving |
| Cost | +1 video per transition | Same (replaces standard video gen) |
When to use frame chaining:
When to use transitions instead:
Note: When chain_frames is enabled, the transition field on scenes is ignored (chaining provides better continuity).
Before starting any generation, tell the user:
"Video generation takes 20-25 minutes for a typical 6-scene video. This includes:
- TTS narration (~1-2 min)
- Image generation (~3-5 min)
- Video clip generation (~8-12 min)
- Transition generation (~3-5 min if using transitions)
- Sound effects + music (~2-3 min)
- Transcription for subtitles (~2-3 min)
- Audio mixing + Remotion render (~3-5 min)
I'll keep you updated on progress. Ready to start?"
DO NOT set timeouts on clawvid commands. The pipeline runs many sequential API calls and will complete on its own.
When running clawvid generate:
- process poll to check status periodically
Example execution:
# CORRECT - no timeout, let it run
clawvid generate --workflow workflow.json
# WRONG - timeout will kill the process mid-generation
# timeout 600 clawvid generate --workflow workflow.json
| Quality | Video Clips | Transitions | Estimated Cost |
|---------|-------------|-------------|----------------|
| budget | 1 clip | 0 | $1-2 |
| balanced | 2-3 clips | 0 | $3-5 |
| balanced + transitions | 2-3 clips | 4-6 | $6-10 |
| max_quality | 3+ clips (Vidu) | 5-7 | $12-20 |
Premium video models (Kling 2.6 Pro, Vidu Q3) and transitions cost more but produce much smoother results.
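These tiers can be sanity-checked with back-of-envelope arithmetic from the per-asset prices quoted elsewhere in this doc (treat the numbers as assumptions and lower bounds; actual pricing varies by model and duration):

```python
# Per-asset prices as quoted in this doc (assumed, lower-bound figures).
PRICES = {
    "image": 0.15,             # nano-banana-pro
    "video_clip": 0.50,        # Vidu Q3, low end
    "transition": 0.50,        # Vidu Q3 transition, low end
    "tts_per_1k_chars": 0.09,  # qwen-3-tts
    "music": 0.10,             # beatoven music track
    "sfx": 0.10,               # beatoven sound effect
}

def estimate_cost(images=0, video_clips=0, transitions=0,
                  narration_chars=0, music_tracks=0, sfx=0):
    """Rough dollar estimate for one workflow run."""
    total = (images * PRICES["image"]
             + video_clips * PRICES["video_clip"]
             + transitions * PRICES["transition"]
             + (narration_chars / 1000) * PRICES["tts_per_1k_chars"]
             + music_tracks * PRICES["music"]
             + sfx * PRICES["sfx"])
    return round(total, 2)
```

For the cooking-show plan later in this doc (7 images, 5 clips, 6 transitions, ~1000 narration characters, 1 music track, 4 SFX), this lower-bound estimate lands around $7, consistent with the quoted $10-14 once premium transition pricing is factored in.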
- Run clawvid generate --workflow workflow.json to execute it.
- Outputs land in output/{date}-{slug}/ for each platform.
You control everything through the workflow JSON and config.json. No code changes needed.
When a user first invokes ClawVid or has no preferences.json, run this setup flow.
Which platforms do you create for? (can pick multiple)
1. YouTube Shorts (16:9, up to 60s)
2. TikTok (9:16, up to 60s)
3. Instagram Reels (9:16, up to 90s)
4. All of the above
What type of content do you mainly create?
1. horror – Scary stories, creepypasta, true crime
2. motivation – Quotes, success stories, self-improvement
3. quiz – Trivia, "did you know", interactive questions
4. reddit – Reddit post readings, AITA, confessions
5. custom – I'll define my own style each time
How should I balance quality vs cost?
1. max_quality – Premium models (Vidu/Kling), best motion, $8-15 per video
2. balanced – Default models, 2-3 video clips, $3-5 per video
3. budget – Fewer clips, faster generation, $1-2 per video
What visual style fits your brand?
1. Photorealistic
2. Cinematic
3. Illustrated
4. Anime/Manga
5. Minimal/Clean
6. Mixed (choose per video)
Voice Style:
1. Use my own voice (provide recordings)
2. AI voice – male, deep
3. AI voice – female, warm
4. No narration (music/text only)
Pacing: 0.8 (slow) to 1.2 (fast), default 1.0
Ask the user which models to use for each generation type, or let them choose custom per-video:
Which AI models would you like to use? (or choose "custom" to pick per-video)
📷 IMAGE GENERATION:
1. fal-ai/kling-image/v3/text-to-image – Fast, good quality ($0.03)
2. fal-ai/nano-banana-pro – Best for consistency/reference ($0.15)
3. custom – Choose per video
🎬 VIDEO GENERATION:
1. fal-ai/kandinsky5-pro/image-to-video – Budget, 5s clips ($0.04-0.12)
2. fal-ai/kling-video/v2.6/pro/image-to-video – Better motion, 5s ($0.35)
3. fal-ai/vidu/q3/image-to-video – Best quality, 8s clips ($1.50+)
4. custom – Choose per video
🎵 MUSIC GENERATION:
1. beatoven/music-generation – AI-generated background music ($0.10)
2. none – I'll provide my own music files
3. custom – Choose per video
🔊 SOUND EFFECTS:
1. beatoven/sound-effect-generation – AI-generated SFX ($0.10 each)
2. none – No sound effects
3. custom – Choose per video
🗣️ TTS (Text-to-Speech):
1. fal-ai/qwen-3-tts/voice-design/1.7b – AI voice design ($0.09/1K chars)
2. none – I'll provide my own voice recordings
3. custom – Choose per video
📝 SUBTITLES:
1. enabled – Word-by-word animated subtitles (uses Whisper for timing)
2. disabled – No subtitles
3. custom – Choose per video
After setup, save to preferences.json (gitignored):
{
"platforms": ["tiktok"],
"template": "horror",
"quality_mode": "max_quality",
"voice": {
"style": "ai_male_deep",
"pacing": 0.85
},
"visual_style": "anime",
"models": {
"image": "fal-ai/kling-image/v3/text-to-image",
"video": "fal-ai/vidu/q3/image-to-video",
"music": "beatoven/music-generation",
"sound_effects": "beatoven/sound-effect-generation",
"tts": "fal-ai/qwen-3-tts/voice-design/1.7b",
"subtitles": "enabled"
},
"created_at": "2026-02-13",
"updated_at": "2026-02-13"
}
Or run: clawvid setup (interactive) / clawvid setup --reset (start over).
Ask targeted questions based on how specific the user is:
Vague request:
User: "Make a horror video"
You: "Got it โ horror video. A few questions:
1. What's the story/topic?
2. Do you have a script, or should I write one?
3. Any specific scenes you're imagining?"
Specific request:
User: "Make a horror video about a guy who finds a VHS tape in his attic"
You: "Perfect premise. Let me confirm:
1. POV: First-person narrator or third-person?
2. Tone: Slow-burn dread or jump scares?
3. Ending: Resolved, cliffhanger, or ambiguous?"
CRITICAL: Before building the workflow, gather accurate information and reference images.
Use research tools (web_search + web_fetch):

| Scenario | Action |
|----------|--------|
| Vague topic without details | Research to find interesting angles/facts |
| "Did you know" / quiz / trivia | Verify facts, find accurate stats |
| How-to / recipe / tutorial | Search for accurate steps and details |
| Historical / scientific claims | Fact-check before including in narration |
| Trending topics | Search for latest info and context |
| User provides no source | Research authoritative sources |
Example research flow:
User: "Make a video about how to boil the perfect egg"
You: [uses web_search for "perfect boiled egg timing methods"]
[uses web_fetch on top cooking sites]
"Did some research! Here's what I found:
- Soft boil: 6-7 min
- Medium: 9-10 min
- Hard boil: 12-13 min
- Pro tip: Ice bath immediately after
Want me to use these timings in the video?"
When visual consistency matters or the user needs specific imagery:
- reference_image in workflow
User: "Make a video about ancient Tartarian architecture"
You: [searches for reference images]
[sends 3-4 options to chat]
"Found some reference images for the Tartarian aesthetic:
[image 1] - Ornate domed building
[image 2] - Victorian exhibition hall
[image 3] - Old sepia photograph style
Which style should I use as the reference for consistent visuals?
Or should I generate without a reference?"
ALWAYS send to user chat:
Use the message tool to send media:
message action=send filePath=/path/to/video.mp4 caption="Here's your video!"
"Your defaults are 9:16, 60 seconds, horror template. Want to keep these or adjust?
- Keep defaults
- Change duration (30s / 90s)
- Different template
- Different visual style"
For content requiring smooth motion (cooking shows, tutorials, presentations):
"This type of content works best with:
- Fixed camera angle (same framing every scene)
- Transitions between scenes (smooth interpolation)
- Minimal video motion (prevents jarring clips)
This adds ~$4-6 for transitions but looks much more professional. Enable transitions?"
Horror:
Motivation:
Quiz:
Reddit:
Cooking Show / Tutorial:
Present a scene breakdown before generating:
"Here's my plan for 'Chef Pierre's Kitchen' (60s, cooking show):
SCENES (7 total):
1. [0-10s] VIDEO – Chef intro, arms wide (NO transition - first scene)
2. [10-25s] IMAGE – Ingredient reveal (TRANSITION from scene 1)
3. [25-35s] VIDEO – Mixing batter (TRANSITION from scene 2)
4. [35-46s] IMAGE – Secret tip moment (TRANSITION from scene 3)
5. [46-56s] VIDEO – Pan swirl technique (TRANSITION from scene 4)
6. [56-65s] VIDEO – The flip! (TRANSITION from scene 5)
7. [65-78s] VIDEO – Final presentation (TRANSITION from scene 6)
TRANSITIONS: 6 (smooth interpolation between all scenes)
CAMERA: Fixed, medium-wide, straight-on
SOUND EFFECTS:
- Scene 1: Applause (0s offset, 4s duration)
- Scene 3: Whisking sounds (0s offset, 5s duration)
- Scene 5: Sizzling pan (1s offset, 4s duration)
- Scene 7: Applause + outro fanfare (3s offset, 4s duration)
AUDIO: French-accented male host voice, upbeat cooking show music
EFFECTS: None (fixed camera style)
Estimated: 7 images + 5 video clips + 6 transitions + 7 TTS + 4 SFX + 1 music track
Time: ~25-30 minutes
Cost: ~$10-14 (using Vidu Q3 for transitions)
Ready to proceed?"
After approval, create the workflow JSON file and run it.
Remember: Tell the user it will take 20-30 minutes before starting!
{
"name": "Video Title",
"template": "quiz",
"timing_mode": "tts_driven",
"scene_padding_seconds": 0.3,
"consistency": {
"reference_prompt": "Character/setting description for consistency",
"seed": 12345678,
"model": "fal-ai/nano-banana-pro"
},
"scenes": [ ... ],
"audio": {
"tts": { ... },
"music": { ... }
},
"subtitles": {
"enabled": true,
"style": { ... }
},
"output": {
"filename": "output_name.mp4",
"resolution": "1080x1920",
"fps": 30,
"format": "mp4"
}
}
{
"id": "scene_1",
"description": "Human-readable description",
"type": "video", // or "image"
"timing": {},
"transition": { // OPTIONAL - only on scenes 2+
"model": "fal-ai/vidu/q3/image-to-video",
"duration": "4",
"prompt": "Smooth transition description"
},
"narration": "What the voice says for this scene",
"image_generation": {
"model": "fal-ai/nano-banana-pro/edit",
"input": {
"prompt": "Detailed visual description",
"negative_prompt": "Things to avoid",
"aspect_ratio": "9:16",
"seed": 12345678
}
},
"video_generation": { // Only for type: "video"
"model": "fal-ai/vidu/image-to-video",
"input": {
"prompt": "Motion description, camera stays fixed, only subject moves",
"duration": "4",
"movement_amplitude": "small"
}
},
"sound_effects": [
{
"prompt": "Sound description",
"timing_offset": 0,
"duration": 4,
"volume": 0.6
}
],
"effects": [] // Ken Burns, vignette, grain, etc.
}
For high-quality horror videos with visual consistency, use this structure:
{
"name": "The Watchers - Horror Production",
"template": "horror",
"timing_mode": "tts_driven",
"scene_padding_seconds": 0.5,
"min_scene_duration_seconds": 5,
"consistency": {
"reference_prompt": "Full-body character design of a dark animated horror entity...",
"seed": 666,
"resolution": "2K"
},
"scenes": [
{
"id": "frame_1",
"description": "Exterior - Abandoned mansion at night, establishing shot",
"type": "video",
"timing": {},
"narration": "They say the mansion on Ashwood Lane has been empty for forty years...",
"image_generation": {
"model": "fal-ai/nano-banana-pro/edit",
"input": {
"prompt": "Wide establishing shot looking up at a massive three-story Victorian Gothic mansion at night...",
"aspect_ratio": "9:16"
}
},
"video_generation": {
"model": "fal-ai/vidu/q3/image-to-video",
"input": {
"prompt": "Slow steady dolly push toward the mansion entrance from the gate...",
"duration": "8",
"resolution": "720p"
}
},
"sound_effects": [
{
"prompt": "Howling wind gusting through dead tree branches at night...",
"timing_offset": 0,
"duration": 8,
"volume": 0.6
}
],
"effects": ["vignette_heavy", "grain", "flicker_subtle"]
}
],
"audio": {
"tts": {
"model": "fal-ai/qwen-3-tts/voice-design/1.7b",
"voice_prompt": "A low raspy whispering male voice, speaking slowly with dread...",
"language": "en",
"speed": 0.85
},
"music": {
"generate": true,
"prompt": "Dark ambient horror soundtrack, deep pulsing sub-bass drones in D minor...",
"duration": 60,
"volume": 0.15,
"fade_in": 3,
"fade_out": 4
}
},
"subtitles": {
"enabled": true,
"style": {
"font": "Impact",
"color": "#ffffff",
"stroke_color": "#000000",
"stroke_width": 5,
"position": "center",
"animation": "word_by_word",
"font_size": 72
}
},
"output": {
"filename": "the_watchers_horror.mp4",
"fps": 30,
"format": "mp4",
"platforms": ["tiktok"]
}
}
All models are configured in config.json under the fal section. Use full fal.ai model IDs in workflow JSON.
| Model | When to Use | Cost | Notes |
|-------|-------------|------|-------|
| fal-ai/kling-image/v3/text-to-image | Standard scenes | $0.03 | Uses aspect_ratio (e.g. "9:16") |
| fal-ai/nano-banana-pro | Reference images | $0.15 | For consistency base |
| fal-ai/nano-banana-pro/edit | Consistent scenes | $0.15 | Edit from reference |
| Model | Duration | Cost | Quality | Notes |
|-------|----------|------|---------|-------|
| fal-ai/kandinsky5-pro/image-to-video | 5s | $0.04-0.12 | Good | Use duration: "5s" (with "s" suffix!) |
| fal-ai/kling-video/v2.6/pro/image-to-video | 5s | $0.35 | Better | Premium motion |
| fal-ai/vidu/image-to-video | 4s | $0.20 | Good | Basic Vidu |
| fal-ai/vidu/q3/image-to-video | 1-16s | $0.50-1.50 | Best | Smoothest motion, transitions |
⚠️ Duration format matters:
- Kandinsky: "duration": "5s" (with "s" suffix)
- Vidu/Kling: "duration": "5" or "duration": "8" (number as string)

| Model | Resolution | Cost | Notes |
|-------|------------|------|-------|
| veed/fabric-1.0/text | 720p, 480p | ~$0.50 | Best lip-sync, generates audio |
Use for: AI presenters, news anchors, character dialogue, any "person speaking" content.
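Because the duration convention differs per model family (Kandinsky wants a trailing "s", Vidu and Kling take the bare number as a string), a tiny normalizer avoids one of the most common workflow mistakes. A sketch under those assumptions, using the model IDs from this doc:

```python
def normalize_duration(model, seconds):
    """Format a clip duration string for a given fal.ai model ID.

    Kandinsky models expect "5s" (with suffix); Vidu/Kling expect "5".
    """
    value = str(int(seconds))
    if model.startswith("fal-ai/kandinsky"):
        return value + "s"
    return value
```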
| Model | Cost | Quality | Notes |
|-------|------|---------|-------|
| fal-ai/vidu/q3/image-to-video | $0.50-1.50 | Best | Smooth morphing between keyframes |
| fal-ai/pixverse/image-to-video | $0.45 | Good | Supports style parameter |
| Model | Purpose | Cost |
|-------|---------|------|
| fal-ai/qwen-3-tts/voice-design/1.7b | Voice-designed TTS narration | $0.09/1K chars |
| fal-ai/whisper | Transcription for subtitle timing | $0.001/sec |
| beatoven/sound-effect-generation | AI sound effect generation (1-35s) | $0.10/req |
| beatoven/music-generation | AI background music generation (5-150s) | $0.10/req |
For a 60-second video:
type: "image" with Ken Burns effects for narration-heavy scenestype: "video" for dramatic moments that need motionIs this the first scene?
├── YES → type: "video" (strong hook), NO transition
└── NO → Does this scene need motion?
    ├── YES → type: "video"
    │   └── Should it flow smoothly from previous scene?
    │       ├── YES → ADD transition
    │       └── NO (jump cut is intentional) → NO transition
    └── NO → type: "image"
        └── Should it flow smoothly from previous scene?
            ├── YES → ADD transition
            └── NO → NO transition, use Ken Burns for subtle motion
| Content Type | Video Scenes | Transitions | Effects |
|--------------|--------------|-------------|---------|
| Horror | 3-4 | Selective | vignette, grain, flicker |
| Cooking Show | 4-5 | ALL (except first) | None (clean look) |
| Tutorial | 3-4 | ALL (except first) | None or minimal |
| Motivation | 2-3 | Optional | kenburns on images |
| Quiz/Trivia | 2-3 | None | Clean, vibrant |
| Fast montage | 3-5 | None (hard cuts) | Template-dependent |
# Generate video from workflow JSON (full pipeline)
clawvid generate --workflow workflow.json
clawvid generate --workflow workflow.json --quality max_quality
clawvid generate --workflow workflow.json --template horror --skip-cache
# PHASED GENERATION - Generate in stages with review
clawvid generate --workflow workflow.json --phase images # Images only, pause for review
clawvid generate --workflow workflow.json --phase videos # Videos only (uses existing images)
clawvid generate --workflow workflow.json --phase audio # Audio only
clawvid generate --workflow workflow.json --phase render # Render only
# VISION QA - Check images for issues before continuing
clawvid generate --workflow workflow.json --qa # Enable QA checks
clawvid generate --workflow workflow.json --qa-auto-fix # Auto-regenerate failed images
# SELECTIVE REGENERATION - Fix specific scenes
clawvid generate --workflow workflow.json --regenerate scene_5,scene_6
clawvid generate --workflow workflow.json --use-existing-images --regenerate scene_3
# Re-render from a previous run's assets
clawvid render --run output/2026-02-11-haunted-library/
clawvid render --run output/2026-02-11-haunted-library/ --all-platforms
clawvid render --run output/2026-02-11-haunted-library/ --platform tiktok
# Preview workflow in Remotion
clawvid preview --workflow workflow.json
clawvid preview --workflow workflow.json --platform youtube
# Launch Remotion studio for visual editing
clawvid studio
# Configure preferences
clawvid setup
clawvid setup --reset
Phase 1 (1-2 min): Load config, validate workflow, create output directory
Phase 2 (2-4 min): Generate TTS narration for all scenes
Phase 3 (3-5 min): Generate images (kling-image or nano-banana-pro)
Phase 4 (8-12 min): Generate video clips (slowest phase)
Phase 5 (3-5 min): Generate transitions (if any scenes have transition field)
Phase 6 (1-2 min): Generate sound effects (beatoven)
Phase 7 (1-2 min): Generate background music (beatoven)
Phase 8 (2-3 min): Transcribe narration with Whisper (for word-level subtitles)
Phase 9 (1-2 min): Mix audio (narration + music + SFX)
Phase 10 (2-3 min): Render with Remotion + FFmpeg post-processing
Total: ~20-30 minutes for a 6-scene video with transitions
Effects are applied per-scene via the effects array. Names are fuzzy-matched.
| Effect | Variants | Description |
|--------|----------|-------------|
| vignette | vignette_subtle, vignette_heavy | Dark edges |
| grain | grain_subtle, grain_heavy | Film grain noise |
| ken_burns | kenburns_slow_zoom, kenburns_slow_pan, kenburns_zoom_out | Zoom/pan on images |
| flicker | flicker_subtle | Light flickering |
| glitch | glitch_subtle, glitch_heavy | RGB splitting |
| chromatic_aberration | chromatic_aberration_subtle | Color fringing |
Note: For fixed-camera content (cooking shows), do NOT use Ken Burns effects.
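"Fuzzy-matched" presumably means loosely spelled names resolve to the nearest canonical variant. One plausible way to do that (our own sketch, not ClawVid's matcher, using the variant names from the table above):

```python
import difflib

KNOWN_EFFECTS = [
    "vignette_subtle", "vignette_heavy", "grain_subtle", "grain_heavy",
    "kenburns_slow_zoom", "kenburns_slow_pan", "kenburns_zoom_out",
    "flicker_subtle", "glitch_subtle", "glitch_heavy",
    "chromatic_aberration_subtle",
]

def resolve_effect(name):
    """Map a loosely-spelled effect name to the closest known variant, or None."""
    candidate = name.lower().replace(" ", "_")
    matches = difflib.get_close_matches(candidate, KNOWN_EFFECTS, n=1, cutoff=0.5)
    return matches[0] if matches else None
```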
Templates apply color grading, overlays, and default effects.
- saturate(0.6) brightness(0.85) contrast(1.15)
- saturate(1.1) brightness(1.05) sepia(0.12)
- saturate(1.25) brightness(1.08) contrast(1.1)
- saturate(0.9) brightness(0.95)

ClawVid includes Vision QA to automatically detect common issues in AI-generated images.
| Issue Type | Severity | Example |
|------------|----------|---------|
| hallucinated_text | Error | "PROJECT: MIDNIGHT ECHO" appearing in image |
| unwanted_logo | Error | History Channel logo, stock watermarks |
| stock_image | Warning | Generic stock photo look |
| style_drift | Warning | Image style differs from reference |
| missing_element | Warning | Requested subject not visible |
Image models often hallucinate text/logos when prompted with certain terms:
Trigger words that cause issues:
Safe alternatives:
# Check images after generation
clawvid generate --workflow x.json --qa
# Auto-fix by regenerating with sanitized prompts
clawvid generate --workflow x.json --qa-auto-fix
Always include in negative_prompt:
"negative_prompt": "text, watermark, logo, brand, copyright, title card, news ticker, TV graphics, stock photo"
For critical projects, generate images first and review:
# Step 1: Generate images only
clawvid generate --workflow x.json --phase images
# Step 2: Review images in output folder
# Step 3: Fix problematic scenes
clawvid generate --workflow x.json --regenerate scene_5,scene_6
# Step 4: Continue with videos
clawvid generate --workflow x.json --phase videos --use-existing-images
Add skip_qa: true to scenes that should bypass checking:
{
"id": "scene_3",
"type": "static",
"skip_qa": true,
"static_image": { "url": "..." }
}
```json
// WRONG - cooking show with hard cuts
{ "id": "scene_2", "type": "video", ... }
{ "id": "scene_3", "type": "video", ... }

// CORRECT - smooth flow
{ "id": "scene_2", "transition": { "model": "fal-ai/vidu/q3/image-to-video", "duration": "4", "prompt": "..." }, "type": "video", ... }
```

```json
// WRONG - no previous scene to transition from
{ "id": "scene_1", "transition": { ... }, ... }

// CORRECT - first scene has no transition
{ "id": "scene_1", "type": "video", ... }
```

```json
// WRONG - video describes new content
"image_prompt": "Chef standing at counter"
"video_prompt": "Chef running through kitchen"

// CORRECT - video describes motion OF the image
"image_prompt": "Chef standing at counter"
"video_prompt": "Chef gestures while standing at counter, camera fixed"
```

```json
// WRONG - breaks the "fixed camera" illusion
"effects": ["kenburns_slow_zoom"]

// CORRECT - no camera movement effects
"effects": []
```

```json
// WRONG - triggers History Channel branding
"prompt": "Dark cinematic documentary style, History Channel conspiracy aesthetic..."

// CORRECT - same style without brand references
"prompt": "Dark cinematic film style, moody lighting, dramatic shadows, blue and orange color grading, film grain texture...",
"negative_prompt": "text, watermark, logo, brand, title card, TV graphics"
```

```json
// WRONG - different angles break continuity
"scene_1 prompt": "...wide angle shot..."
"scene_2 prompt": "...close-up shot..."
"scene_3 prompt": "...overhead view..."

// CORRECT - same angle throughout
"All prompts": "Fixed camera cooking show shot, medium wide angle view, straight-on at chest height, ..."
```
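The mechanical subset of these rules can be caught before generation with a pre-flight lint pass. A minimal sketch, assuming the field names shown in the examples above; `lint_workflow` and `BRAND_TERMS` are illustrative, not clawvid features.

```python
# Hypothetical pre-flight linter for the structural workflow rules:
# first scene has no transition, no camera-movement effects, no brand terms.
BRAND_TERMS = ("history channel", "netflix", "bbc")  # assumed example list

def lint_workflow(scenes: list[dict]) -> list[str]:
    problems = []
    for i, scene in enumerate(scenes):
        sid = scene.get("id", f"scene_{i + 1}")
        # Rule: the first scene has no previous scene to transition from.
        if i == 0 and "transition" in scene:
            problems.append(f"{sid}: first scene must not define a transition")
        # Rule: camera-movement effects break the fixed-camera illusion.
        for eff in scene.get("effects", []):
            if "kenburns" in eff or "zoom" in eff:
                problems.append(f"{sid}: remove camera-movement effect {eff!r}")
        # Rule: brand references trigger hallucinated logos.
        prompt = scene.get("prompt", "").lower()
        for term in BRAND_TERMS:
            if term in prompt:
                problems.append(f"{sid}: brand reference {term!r} in prompt")
    return problems

issues = lint_workflow([
    {"id": "scene_1", "transition": {}, "effects": ["kenburns_slow_zoom"]},
    {"id": "scene_2", "type": "video"},
])
```

Semantic rules (video prompts describing motion of the image, consistent camera angle) still need human review; a linter can only flag the structural ones.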
1. READ THIS SKILL → Every time, fresh
2. CHECK PREFERENCES → Load preferences.json or run setup
3. GATHER REQUIREMENTS → Topic, format, style questions
4. DECIDE ON TRANSITIONS → For continuous content, recommend transitions
5. BUILD PLAN → Present scene breakdown with transition plan
6. GET APPROVAL → Wait for explicit "go"
7. WARN ABOUT TIME → "This will take 20-30 minutes. Ready?"
8. GENERATE WORKFLOW → Create the workflow JSON (following ALL rules above)
9. EXECUTE → Run `clawvid generate --workflow <file>` (NO TIMEOUT!)
10. MONITOR → Poll process and report progress
11. REVIEW → Check outputs
12. DELIVER → Compress, send video to chat, show cost summary
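Steps 9 and 10 (execute with no timeout, then poll and report progress) can be sketched with `subprocess`. The progress-line handling is hypothetical, since clawvid's actual output format isn't specified here; only the "no timeout, keep polling" pattern comes from the skill text.

```python
import subprocess

def run_and_monitor(cmd: list[str]) -> int:
    """Launch the generator and stream its output until it exits.
    Deliberately sets no timeout: generation can take 20-30 minutes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    # Step 10: poll the process and report progress as lines arrive.
    for line in proc.stdout:
        print("progress:", line.rstrip())
    # Step 11 starts once the process exits; return code signals success.
    return proc.wait()

# e.g. run_and_monitor(["clawvid", "generate", "--workflow", "x.json"])
```

A nonzero return code should route back to the REVIEW step rather than straight to DELIVER.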
When generation completes, deliver the result with the message tool (pass `filePath`):

```bash
# Compress for chat delivery
ffmpeg -y -i output/.../tiktok/final.mp4 \
  -c:v libx264 -preset fast -crf 28 \
  -c:a aac -b:a 128k \
  output/.../tiktok/playable.mp4

# Send to user
message action=send filePath=/path/to/playable.mp4 caption="🎬 Your video is ready!"
```
After each generation step, verify the outputs. Use `"movement_amplitude": "small"` for stable video clips.

Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
- Status: missing
- Auth: None
- Streaming: No
- Data region: Unspecified

Protocol support
- Requires: none
- Forbidden: none

Guardrails
- Operational confidence: low
```bash
curl -s "https://xpersona.co/api/v1/agents/neur0map-clawvid/snapshot"
curl -s "https://xpersona.co/api/v1/agents/neur0map-clawvid/contract"
curl -s "https://xpersona.co/api/v1/agents/neur0map-clawvid/trust"
```
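A dependency-free sketch of calling these read-only endpoints from Python, applying the retry policy documented in the Invocation Guide (3 attempts, 500/1500/3500 ms backoff, retry on HTTP 429/503 and network timeouts). The helper names are illustrative.

```python
import json
import time
import urllib.error
import urllib.request

BASE = "https://xpersona.co/api/v1/agents/neur0map-clawvid"
BACKOFF_MS = [500, 1500, 3500]  # retryPolicy.backoffMs from the Invocation Guide

def should_retry(status, attempt):
    """Retry on the documented retryable conditions (HTTP 429/503,
    or a network timeout signalled here as status None) while attempts remain."""
    retryable = status in (429, 503) or status is None
    return retryable and attempt < len(BACKOFF_MS)

def fetch(path):
    """GET one of the snapshot/contract/trust endpoints with backoff."""
    url = f"{BASE}/{path}"
    for attempt in range(1, len(BACKOFF_MS) + 1):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if not should_retry(err.code, attempt):
                raise
        except urllib.error.URLError:
            if not should_retry(None, attempt):
                raise
        time.sleep(BACKOFF_MS[attempt - 1] / 1000)
```

For this agent, `fetch("contract")` currently reports `"contractStatus": "missing"`, so callers should treat responses as untyped rather than validating against a schema.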
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
- Handshake: UNKNOWN
- Confidence: unknown
- Attempts (30d): unknown
- Fallback rate: unknown

Runtime metrics
- Observed P50: unknown
- Observed P95: unknown
- Rate limit: unknown
- Estimated cost: unknown
Do not use if
Contract JSON

```json
{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}
```

Invocation Guide
```json
{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/neur0map-clawvid/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/neur0map-clawvid/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/neur0map-clawvid/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": ["OPENCLEW"]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": { "summary": "...", "confidence": 0.9 },
    "meta": { "source": "GITHUB_OPENCLEW", "generatedAt": "2026-04-17T01:49:16.727Z" }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [500, 1500, 3500],
    "retryableConditions": ["HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"]
  }
}
```

Trust JSON
```json
{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}
```

Capability Matrix
```json
{
  "rows": [
    { "key": "OPENCLEW", "type": "protocol", "support": "unknown", "confidenceSource": "profile", "notes": "Listed on profile" },
    { "key": "accent", "type": "capability", "support": "supported", "confidenceSource": "profile", "notes": "Declared in agent profile metadata" },
    { "key": "pick", "type": "capability", "support": "supported", "confidenceSource": "profile", "notes": "Declared in agent profile metadata" },
    { "key": "style", "type": "capability", "support": "supported", "confidenceSource": "profile", "notes": "Declared in agent profile metadata" }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile capability:accent|supported|profile capability:pick|supported|profile capability:style|supported|profile"
}
```

Facts JSON
```json
[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Neur0map",
    "href": "https://github.com/neur0map/clawvid",
    "sourceUrl": "https://github.com/neur0map/clawvid",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-02-25T02:07:08.413Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/neur0map-clawvid/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-02-25T02:07:08.413Z",
    "isPublic": true
  },
  {
    "factKey": "traction",
    "category": "adoption",
    "label": "Adoption signal",
    "value": "5 GitHub stars",
    "href": "https://github.com/neur0map/clawvid",
    "sourceUrl": "https://github.com/neur0map/clawvid",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-02-25T02:07:08.413Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/neur0map-clawvid/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/neur0map-clawvid/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]
```

Change Events JSON
```json
[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]
```