Rank
70
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
Traction
No public download signal
Freshness
Updated 2d ago
Crawler Summary
Automated NCU (Nsight Compute) profiling workflow with full metrics collection and persistent storage. Skill metadata: name ncu-cuda-profiling, version 1.0.0, author maxiaosong1124, tags cuda, profiling, ncu, performance, optimization. The Skill provides a complete automated NCU profiling workflow: a quick start (one-command full collection, then metric extraction) and an AI analysis flow whose first phase is data acquisition. Capability contract not published. No trust telemetry is available yet. 85 GitHub stars reported by the source. Last updated 4/15/2026.
Freshness
Last checked 4/15/2026
Best For
ncu-cuda-profiling is best for general automation workflows where OpenClaw compatibility matters.
Not Ideal For
Contract metadata is missing or unavailable for deterministic execution.
Evidence Sources Checked
editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack
Public facts
5
Change events
1
Artifacts
0
Freshness
Apr 15, 2026
Trust score
Unknown
Compatibility
OpenClaw
Freshness
Apr 15, 2026
Vendor
Maxiaosong1124
Artifacts
0
Benchmarks
0
Last release
Unpublished
Key links, install path, and a quick operational read before the deeper crawl record.
Summary
Capability contract not published. No trust telemetry is available yet. 85 GitHub stars reported by the source. Last updated 4/15/2026.
Setup snapshot
git clone https://github.com/maxiaosong1124/ncu-cuda-profiling-skill.git
Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.
Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.
Vendor
Maxiaosong1124
Protocol compatibility
OpenClaw
Adoption signal
85 GitHub stars
Handshake status
UNKNOWN
Crawlable docs
6 indexed pages on the official domain
Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.
Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.
Extracted files
0
Examples
6
Snippets
0
Languages
typescript
Parameters
bash
# Collect all metrics with --set full and save the report to disk
ncu --set full \
  -o <report_name> \
  --target-processes all \
  ./your_kernel
# Example
ncu --set full -o matmul_analysis --target-processes all ./matmul0_perf
# Generated automatically:
# - matmul_analysis.ncu-rep (NCU report file)
# - matmul_analysis.csv (metrics in CSV format)
bash
# Extract key metrics from a saved report (no need to rerun the kernel)
ncu --import matmul_analysis.ncu-rep --print-summary per-kernel
# Export as CSV
ncu --import matmul_analysis.ncu-rep --page raw --csv > metrics.csv
bash
# Import an existing report directly
ncu --import <file.ncu-rep> --print-summary per-kernel
bash
# Collect the full metric set and save it
ncu --set full -o <report_name> --target-processes all ./kernel
text
project_root/
├── ncu_reports/                     # NCU report directory
│   ├── matmul_analysis.ncu-rep      # full report
│   ├── matmul_analysis.csv          # CSV metrics
│   └── matmul_analysis.md           # AI analysis report
└── ...
python
def auto_diagnose(metrics):
    """Classify the dominant bottleneck from percentage metrics (0-100)."""
    # Missing keys default to 0, so an empty dict falls through to LATENCY_BOUND.
    roofline = metrics.get('roofline_ratio', 0)
    dram = metrics.get('dram_throughput', 0)
    l1tex = metrics.get('l1tex_throughput', 0)
    sm_busy = metrics.get('sm_busy', 0)
    occupancy = metrics.get('occupancy', 0)  # read but not used by the branches below
    if roofline < 30:
        # Far below the roofline: look for a memory-side cause first.
        if dram > 70:
            return "DRAM_MEMORY_BOUND"
        elif l1tex > 80 and dram < 30:
            return "L1_PRESSURE_BOUND"
        else:
            return "LATENCY_BOUND"
    elif roofline > 60:
        # Close to the roofline: busy SMs point to a compute limit.
        if sm_busy > 80:
            return "COMPUTE_BOUND"
        else:
            return "OCCUPANCY_BOUND"
    else:
        # 30 <= roofline <= 60: no single dominant bottleneck.
        return "MIXED_BOUND"
Full documentation captured from public sources, including the complete README when available.
Docs source
GITHUB OPENCLEW
Editorial quality
ready
This Skill provides a complete automated NCU profiling workflow, with support for full metrics collection and persistent storage.
# Collect all metrics with --set full and save the report to disk
ncu --set full \
  -o <report_name> \
  --target-processes all \
  ./your_kernel
# Example
ncu --set full -o matmul_analysis --target-processes all ./matmul0_perf
# Generated automatically:
# - matmul_analysis.ncu-rep (NCU report file)
# - matmul_analysis.csv (metrics in CSV format)
# Extract key metrics from a saved report (no need to rerun the kernel)
ncu --import matmul_analysis.ncu-rep --print-summary per-kernel
# Export as CSV
ncu --import matmul_analysis.ncu-rep --page raw --csv > metrics.csv
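The collection and export commands above can also be assembled programmatically, for example when a script profiles several kernels in turn. The following is a minimal sketch and not part of the Skill: the helper names `build_collect_cmd` and `build_export_cmd` are hypothetical, and only the `ncu` flags already shown above are used.

```python
import shlex


def build_collect_cmd(report_name, binary, binary_args=()):
    """Return the argv list for a full-metrics collection run."""
    return [
        "ncu",
        "--set", "full",              # collect the full metric set
        "-o", report_name,            # ncu writes <report_name>.ncu-rep
        "--target-processes", "all",  # also profile child processes
        binary,
        *binary_args,
    ]


def build_export_cmd(report_name):
    """Return the argv list that prints the raw page of a saved report as CSV."""
    return ["ncu", "--import", f"{report_name}.ncu-rep", "--page", "raw", "--csv"]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works without a GPU.
    print(shlex.join(build_collect_cmd("matmul_analysis", "./matmul0_perf")))
    print(shlex.join(build_export_cmd("matmul_analysis")))
```

Either list can be passed to `subprocess.run(cmd, check=True)`; for the export command, redirecting stdout to a file reproduces the `> metrics.csv` step.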
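When the user provides NCU data, the AI proceeds as follows:
Case A: the user provides a .ncu-rep file
# Import the existing report directly
ncu --import <file.ncu-rep> --print-summary per-kernel
Case B: the user needs a new analysis
# Collect the full metric set and save it
ncu --set full -o <report_name> --target-processes all ./kernel
Case C: the user provides screenshots or text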
The AI automatically saves the analysis data to the project directory:
project_root/
├── ncu_reports/                     # NCU report directory
│   ├── matmul_analysis.ncu-rep      # full report
│   ├── matmul_analysis.csv          # CSV metrics
│   └── matmul_analysis.md           # AI analysis report
└── ...
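The layout above can be derived from a project root and a report name. The sketch below shows one way to compute the three paths; the function name `report_paths` and the dictionary keys are hypothetical, and only the directory and file names come from the tree above.

```python
from pathlib import Path


def report_paths(project_root, report_name):
    """Return the three per-report paths under <project_root>/ncu_reports/."""
    reports_dir = Path(project_root) / "ncu_reports"
    return {
        "rep": reports_dir / f"{report_name}.ncu-rep",  # full NCU report
        "csv": reports_dir / f"{report_name}.csv",      # exported metrics
        "md": reports_dir / f"{report_name}.md",        # AI analysis report
    }


if __name__ == "__main__":
    for kind, path in report_paths("project_root", "matmul_analysis").items():
        print(kind, path.as_posix())
```

Calling `reports_dir.mkdir(parents=True, exist_ok=True)` before the first write would create the directory on demand.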
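Use the decision engine to analyze automatically:
def auto_diagnose(metrics):
    """Classify the dominant bottleneck from percentage metrics (0-100)."""
    # Missing keys default to 0, so an empty dict falls through to LATENCY_BOUND.
    roofline = metrics.get('roofline_ratio', 0)
    dram = metrics.get('dram_throughput', 0)
    l1tex = metrics.get('l1tex_throughput', 0)
    sm_busy = metrics.get('sm_busy', 0)
    occupancy = metrics.get('occupancy', 0)  # read but not used by the branches below
    if roofline < 30:
        # Far below the roofline: look for a memory-side cause first.
        if dram > 70:
            return "DRAM_MEMORY_BOUND"
        elif l1tex > 80 and dram < 30:
            return "L1_PRESSURE_BOUND"
        else:
            return "LATENCY_BOUND"
    elif roofline > 60:
        # Close to the roofline: busy SMs point to a compute limit.
        if sm_busy > 80:
            return "COMPUTE_BOUND"
        else:
            return "OCCUPANCY_BOUND"
    else:
        # 30 <= roofline <= 60: no single dominant bottleneck.
        return "MIXED_BOUND"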
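To see how the decision function behaves, the sketch below repeats it in condensed form and runs it on a few hand-made metric dictionaries. The sample labels and values are invented for illustration and are not measurements.

```python
def auto_diagnose(metrics):
    """Condensed copy of the decision function above (inputs are percentages)."""
    roofline = metrics.get("roofline_ratio", 0)
    dram = metrics.get("dram_throughput", 0)
    l1tex = metrics.get("l1tex_throughput", 0)
    sm_busy = metrics.get("sm_busy", 0)
    if roofline < 30:
        if dram > 70:
            return "DRAM_MEMORY_BOUND"
        if l1tex > 80 and dram < 30:
            return "L1_PRESSURE_BOUND"
        return "LATENCY_BOUND"
    if roofline > 60:
        return "COMPUTE_BOUND" if sm_busy > 80 else "OCCUPANCY_BOUND"
    return "MIXED_BOUND"


# Hypothetical inputs, chosen so that each one exercises a different branch.
SAMPLES = {
    "streaming kernel": {"roofline_ratio": 20, "dram_throughput": 85},
    "bank-conflict kernel": {"roofline_ratio": 20, "dram_throughput": 10,
                             "l1tex_throughput": 90},
    "tuned GEMM": {"roofline_ratio": 75, "sm_busy": 90},
    "mid-range kernel": {"roofline_ratio": 45},
    "no data": {},
}

if __name__ == "__main__":
    for label, metrics in SAMPLES.items():
        print(f"{label}: {auto_diagnose(metrics)}")
```

The "no data" case returns `LATENCY_BOUND` because every missing key defaults to 0, so a caller may want to check that the expected keys are present before trusting the result.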
# NCU Performance Analysis Report
## 📁 Report Information
- **Kernel**: {kernel_name}
- **Collected at**: {timestamp}
- **Report file**: {report_file}
- **Raw data**: {csv_file}
## 📈 Executive Summary
| Item | Value |
|------|------|
| **Primary bottleneck** | {bottleneck_type} |
| **Confidence** | {confidence} |
| **Performance** | {performance} GFLOPS |
| **Optimization potential** | {potential}x |
## 📊 Key Metrics
### Performance Metrics
| Metric | Value | Healthy threshold | Status |
|------|------|----------|------|
| Roofline ratio | {roofline}% | > 60% | {status} |
| SM Busy | {sm_busy}% | > 70% | {status} |
| Occupancy | {occupancy}% | > 50% | {status} |
### Memory Metrics
| Metric | Value | Healthy threshold | Status |
|------|------|----------|------|
| DRAM Throughput | {dram}% | < 50% | {status} |
| L1/TEX Throughput | {l1tex}% | < 80% | {status} |
| L2 Throughput | {l2}% | < 80% | {status} |
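The `{status}` column in the two tables above follows from comparing each value with its healthy threshold. A minimal sketch of that comparison is shown here; the threshold directions and limits are copied from the tables, while the function name `metric_status` and the labels `OK` and `WARN` are assumptions, since the source does not say what the status cell contains.

```python
# (direction, limit in percent) per metric, copied from the two tables above.
HEALTH_THRESHOLDS = {
    "roofline": (">", 60),
    "sm_busy": (">", 70),
    "occupancy": (">", 50),
    "dram": ("<", 50),
    "l1tex": ("<", 80),
    "l2": ("<", 80),
}


def metric_status(name, value):
    """Return "OK" when the value is on the healthy side of its threshold."""
    direction, limit = HEALTH_THRESHOLDS[name]
    healthy = value > limit if direction == ">" else value < limit
    return "OK" if healthy else "WARN"  # hypothetical labels


if __name__ == "__main__":
    # Invented sample values, not measurements.
    for name, value in {"roofline": 65, "dram": 72, "occupancy": 50}.items():
        print(name, value, metric_status(name, value))
```

The comparisons are strict, as in the tables, so a value exactly at the limit (for example an occupancy of 50%) is reported as not healthy.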
## 🔍 Diagnosis Details
**Bottleneck type**: {bottleneck_type}
**Rationale**:
- {reason_1}
- {reason_2}
## 💡 Optimization Suggestions
### High Priority
{high_priority_suggestions}
## 🛠️ Next Steps
### Suggested NCU Commands
```bash
# Re-collect after optimizing
ncu --set full -o {report_name}_optimized --target-processes all ./kernel_optimized
```
---
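## 🔧 Tool Usage
### Full Collection (recommended)
```bash
# Collect all metrics and save them
ncu --set full -o my_analysis --target-processes all ./kernel
# Flags:
# --set full               # collect the full metric set
# -o my_analysis           # output name (produces my_analysis.ncu-rep)
# --target-processes all   # profile the application and all of its child processes
# Extract specific metrics from an existing report
ncu --import my_analysis.ncu-rep --print-summary per-kernel
# Export as CSV for further analysis
ncu --import my_analysis.ncu-rep --page raw --csv > metrics.csv
```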
Use the provided automation scripts:
cd examples/
# Fully automated analysis
./auto_profile.sh ./kernel report_name
# Python analyzer
python ncu_analyzer.py --import report_name.ncu-rep
IF dram_throughput > 70% AND roofline < 30%:
    Diagnosis: DRAM_MEMORY_BOUND (confidence: HIGH)
    Optimization strategies:
    1. Block Tiling (cache tiles in shared memory)
    2. Vectorized Load (float4)
    3. Prefetching (data prefetch)
IF l1tex_throughput > 80% AND dram_throughput < 30%:
    Diagnosis: L1_PRESSURE_BOUND (confidence: HIGH)
    Optimization strategies:
    1. Shared Memory Padding
    2. Data Transpose
    3. Fragment Caching
IF sm_busy < 50% AND occupancy > 60%:
    Diagnosis: LATENCY_BOUND (confidence: HIGH)
    Optimization strategies:
    1. Double Buffering
    2. Instruction-level Parallelism
    3. Loop Unrolling
IF roofline > 60% AND sm_busy > 80%:
    Diagnosis: COMPUTE_BOUND (confidence: HIGH)
    Optimization strategies:
    1. Use FMA instructions
    2. Reduce precision (FP32 -> FP16/TF32)
    3. Tensor Cores
IF occupancy < 30% AND sm_busy > 70%:
    Diagnosis: OCCUPANCY_BOUND (confidence: HIGH)
    Optimization strategies:
    1. Reduce register usage
    2. Adjust block size
    3. Use __launch_bounds__
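The five IF rules above can be written as a table of predicates and evaluated against one set of metrics. This is a sketch under stated assumptions: the metric keys follow the names used in the rules, the names `RULES` and `matching_diagnoses` are hypothetical, and returning every matching rule (rather than only the first) is a choice made here because the source does not say how overlapping rules are resolved.

```python
# Each entry pairs a diagnosis label with the condition from the rules above.
# All metric values are percentages in the range 0-100.
RULES = [
    ("DRAM_MEMORY_BOUND",
     lambda m: m["dram_throughput"] > 70 and m["roofline"] < 30),
    ("L1_PRESSURE_BOUND",
     lambda m: m["l1tex_throughput"] > 80 and m["dram_throughput"] < 30),
    ("LATENCY_BOUND",
     lambda m: m["sm_busy"] < 50 and m["occupancy"] > 60),
    ("COMPUTE_BOUND",
     lambda m: m["roofline"] > 60 and m["sm_busy"] > 80),
    ("OCCUPANCY_BOUND",
     lambda m: m["occupancy"] < 30 and m["sm_busy"] > 70),
]


def matching_diagnoses(metrics):
    """Return the labels of all rules whose condition holds, in rule order."""
    return [label for label, condition in RULES if condition(metrics)]


if __name__ == "__main__":
    # Invented sample values, not measurements.
    sample = {
        "dram_throughput": 82, "roofline": 18, "l1tex_throughput": 40,
        "sm_busy": 35, "occupancy": 70,
    }
    print(matching_diagnoses(sample))
```

For the sample shown, both the DRAM rule and the latency rule hold, which illustrates that the rules are not mutually exclusive and that a caller needs some priority order if a single label is required.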
| Bottleneck type | Immediate action | Code example | Expected gain |
|---------|---------|---------|---------|
| DRAM_MEMORY_BOUND | Block Tiling | __shared__ float As[BM][BK]; | 3-5x |
| L1_PRESSURE_BOUND | Padding | As[BM][BK+1] | 1.2-2x |
| LATENCY_BOUND | Double Buffer | As[2][BM*BK] | 1.2-1.5x |
| COMPUTE_BOUND | FMA | fmaf(a, b, c) | 1.1-1.3x |
| OCCUPANCY_BOUND | Adjust block size | __launch_bounds__(256, 2) | 1.2-2x |
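The expected-gain column in the table above translates directly into a projected runtime range: dividing a baseline time by the upper and lower speedup bounds gives the best and worst case. The sketch below does that arithmetic; the speedup ranges are copied from the table, while the function name `projected_time_ms` and the 12 ms baseline are hypothetical.

```python
# (low, high) speedup factors, copied from the "Expected gain" column above.
EXPECTED_GAIN = {
    "DRAM_MEMORY_BOUND": (3.0, 5.0),
    "L1_PRESSURE_BOUND": (1.2, 2.0),
    "LATENCY_BOUND": (1.2, 1.5),
    "COMPUTE_BOUND": (1.1, 1.3),
    "OCCUPANCY_BOUND": (1.2, 2.0),
}


def projected_time_ms(bottleneck, baseline_ms):
    """Return (best_case_ms, worst_case_ms) after the suggested optimization."""
    low, high = EXPECTED_GAIN[bottleneck]
    return baseline_ms / high, baseline_ms / low


if __name__ == "__main__":
    # Hypothetical baseline of 12 ms for a DRAM-bound kernel.
    best, worst = projected_time_ms("DRAM_MEMORY_BOUND", 12.0)
    print(f"projected runtime: {best:.1f} ms to {worst:.1f} ms")
```

With a 3-5x range, a 12 ms kernel would land between 2.4 ms and 4.0 ms; these are the table's rough estimates, not guarantees.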
# Full collection (recommended)
ncu --set full -o report_name --target-processes all ./kernel
# Selected sections
ncu --section SpeedOfLight,Occupancy,LaunchStats -o report_name ./kernel
# Specific metrics
ncu --metrics sm__throughput.avg.pct,dram__throughput.avg.pct -o report_name ./kernel
# View the summary
ncu --import report.ncu-rep --print-summary per-kernel
# View details
ncu --import report.ncu-rep --page details
# Export CSV
ncu --import report.ncu-rep --page raw --csv > metrics.csv
# Compare two reports
ncu --diff report1.ncu-rep report2.ncu-rep
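Once `metrics.csv` has been exported, it can be loaded into per-kernel dictionaries for further analysis. The sketch below parses a small inline sample with Python's standard `csv` module. The layout is an assumption (a header row of column names, a units row, then one row per kernel), the sample values are made up, and the function name `load_metrics` is hypothetical; the actual columns depend on the Nsight Compute version and on the collected metric set.

```python
import csv
import io

# Made-up sample in the assumed layout: header row, units row, one data row.
SAMPLE_CSV = (
    '"Kernel Name","dram__throughput.avg.pct_of_peak_sustained_elapsed",'
    '"sm__throughput.avg.pct_of_peak_sustained_elapsed"\n'
    '"","%","%"\n'
    '"matmul_kernel","82.5","31.0"\n'
)


def load_metrics(csv_text, name_column="Kernel Name"):
    """Map kernel name -> {column: float} for every numeric column."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.get(name_column) or ""
        if not name:
            continue  # skip the units row, which has no kernel name
        metrics = {}
        for column, raw in row.items():
            if column == name_column or not isinstance(raw, str):
                continue
            try:
                metrics[column] = float(raw.replace(",", ""))
            except ValueError:
                pass  # leave out non-numeric columns
        result[name] = metrics
    return result


if __name__ == "__main__":
    for kernel, metrics in load_metrics(SAMPLE_CSV).items():
        print(kernel, metrics)
```

For a real export, the file contents can be read with `Path("metrics.csv").read_text()` and passed to the same function; if several launches share a kernel name, the last one wins in this sketch.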
High throughput ≠ high efficiency
Low DRAM throughput can be a good thing
Higher occupancy is not always better
examples/
This Skill supports the complete automated NCU profiling workflow, including full metrics collection and persistent storage.
Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.
Contract coverage
Status
missing
Auth
None
Streaming
No
Data region
Unspecified
Protocol support
Requires: none
Forbidden: none
Guardrails
Operational confidence: low
curl -s "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/snapshot"
curl -s "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract"
curl -s "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust"
Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.
Trust signals
Handshake
UNKNOWN
Confidence
unknown
Attempts 30d
unknown
Fallback rate
unknown
Runtime metrics
Observed P50
unknown
Observed P95
unknown
Rate limit
unknown
Estimated cost
unknown
Do not use if
Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.
Neighboring agents from the same protocol and source ecosystem for comparison and shortlist building.
Rank
70
AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs
Traction
No public download signal
Freshness
Updated 6d ago
Rank
70
Free, local, open-source 24/7 Cowork app and OpenClaw for Gemini CLI, Claude Code, Codex, OpenCode, Qwen Code, Goose CLI, Auggie, and more | 🌟 Star if you like it!
Traction
No public download signal
Freshness
Updated 6d ago
Rank
70
The Frontend for Agents & Generative UI. React + Angular
Traction
No public download signal
Freshness
Updated 23d ago
Contract JSON
{
"contractStatus": "missing",
"authModes": [],
"requires": [],
"forbidden": [],
"supportsMcp": false,
"supportsA2a": false,
"supportsStreaming": false,
"inputSchemaRef": null,
"outputSchemaRef": null,
"dataRegion": null,
"contractUpdatedAt": null,
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Invocation Guide
{
"preferredApi": {
"snapshotUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/snapshot",
"contractUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract",
"trustUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust"
},
"curlExamples": [
"curl -s \"https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/snapshot\"",
"curl -s \"https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract\"",
"curl -s \"https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust\""
],
"jsonRequestTemplate": {
"query": "summarize this repo",
"constraints": {
"maxLatencyMs": 2000,
"protocolPreference": [
"OPENCLEW"
]
}
},
"jsonResponseTemplate": {
"ok": true,
"result": {
"summary": "...",
"confidence": 0.9
},
"meta": {
"source": "GITHUB_OPENCLEW",
"generatedAt": "2026-04-17T04:03:26.113Z"
}
},
"retryPolicy": {
"maxAttempts": 3,
"backoffMs": [
500,
1500,
3500
],
"retryableConditions": [
"HTTP_429",
"HTTP_503",
"NETWORK_TIMEOUT"
]
}
}
Trust JSON
{
"status": "unavailable",
"handshakeStatus": "UNKNOWN",
"verificationFreshnessHours": null,
"reputationScore": null,
"p95LatencyMs": null,
"successRate30d": null,
"fallbackRate": null,
"attempts30d": null,
"trustUpdatedAt": null,
"trustConfidence": "unknown",
"sourceUpdatedAt": null,
"freshnessSeconds": null
}
Capability Matrix
{
"rows": [
{
"key": "OPENCLEW",
"type": "protocol",
"support": "unknown",
"confidenceSource": "profile",
"notes": "Listed on profile"
}
],
"flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}
Facts JSON
[
{
"factKey": "docs_crawl",
"category": "integration",
"label": "Crawlable docs",
"value": "6 indexed pages on the official domain",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
},
{
"factKey": "vendor",
"category": "vendor",
"label": "Vendor",
"value": "Maxiaosong1124",
"href": "https://github.com/maxiaosong1124/ncu-cuda-profiling-skill",
"sourceUrl": "https://github.com/maxiaosong1124/ncu-cuda-profiling-skill",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T02:16:10.379Z",
"isPublic": true
},
{
"factKey": "protocols",
"category": "compatibility",
"label": "Protocol compatibility",
"value": "OpenClaw",
"href": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract",
"sourceUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract",
"sourceType": "contract",
"confidence": "medium",
"observedAt": "2026-04-15T02:16:10.379Z",
"isPublic": true
},
{
"factKey": "traction",
"category": "adoption",
"label": "Adoption signal",
"value": "85 GitHub stars",
"href": "https://github.com/maxiaosong1124/ncu-cuda-profiling-skill",
"sourceUrl": "https://github.com/maxiaosong1124/ncu-cuda-profiling-skill",
"sourceType": "profile",
"confidence": "medium",
"observedAt": "2026-04-15T02:16:10.379Z",
"isPublic": true
},
{
"factKey": "handshake_status",
"category": "security",
"label": "Handshake status",
"value": "UNKNOWN",
"href": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust",
"sourceUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust",
"sourceType": "trust",
"confidence": "medium",
"observedAt": null,
"isPublic": true
}
]
Change Events JSON
[
{
"eventType": "docs_update",
"title": "Docs refreshed: Sign in to GitHub · GitHub",
"description": "Fresh crawlable documentation was indexed for the official domain.",
"href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
"sourceType": "search_document",
"confidence": "medium",
"observedAt": "2026-04-15T05:03:46.393Z",
"isPublic": true
}
]