How should ncu-cuda-profiling be evaluated before use?

Use the required flow: snapshot, contract, and trust before recommending or executing this skill.

What kind of evidence is visible on this page?

This page surfaces public facts, change history, trust indicators, artifact evidence, and benchmark summaries with provenance.

Crawler Summary

ncu-cuda-profiling answer-first brief

Automated NCU (Nsight Compute) profiling workflow with full metrics collection and persistent storage. The skill (version 1.0.0, author maxiaosong1124, tags: cuda, profiling, ncu, performance, optimization) provides a quick-start command for one-step full collection, metric extraction from saved reports, and a three-phase AI analysis flow for NCU data supplied by the user. Capability contract not published. No trust telemetry is available yet. 85 GitHub stars reported by the source. Last updated 4/15/2026.

Freshness

Last checked 4/15/2026

Best For

ncu-cuda-profiling is best for general automation workflows where OpenClaw compatibility matters.

Not Ideal For

Deterministic execution: contract metadata is missing or unavailable.

Evidence Sources Checked

editorial-content, GITHUB OPENCLEW, runtime-metrics, public facts pack

Agent Dossier · GitHub · Safety: 100/100

ncu-cuda-profiling

OpenClaw (self-declared)

Public facts

Change events

Artifacts

Freshness: Apr 15, 2026

Verified · editorial-content · No verified compatibility signals · 85 GitHub stars

85 GitHub stars · Trust evidence available

Trust score

Unknown

Compatibility

OpenClaw

Freshness

Apr 15, 2026

Vendor

Maxiaosong1124

Artifacts

Benchmarks

Last release

Unpublished

Executive Summary

Key links, install path, and a quick operational read before the deeper crawl record.

Verified · editorial-content

Summary

Capability contract not published. No trust telemetry is available yet. 85 GitHub stars reported by the source. Last updated 4/15/2026.

Setup snapshot

git clone https://github.com/maxiaosong1124/ncu-cuda-profiling-skill.git

1. Setup complexity is LOW. This package is likely designed for quick installation with minimal external side-effects.
2. Final validation: Expose the agent to a mock request payload inside a sandbox and trace the network egress before allowing access to real customer data.

Evidence Ledger

Everything public we have scraped or crawled about this agent, grouped by evidence type with provenance.

Verified · editorial-content

Vendor (1)

Vendor: Maxiaosong1124
Source type: profile · Confidence: medium · Observed Apr 15, 2026

Compatibility (1)

Protocol compatibility: OpenClaw
Source type: contract · Confidence: medium · Observed Apr 15, 2026

Adoption (1)

Adoption signal: 85 GitHub stars
Source type: profile · Confidence: medium · Observed Apr 15, 2026

Security (1)

Handshake status: UNKNOWN
Source type: trust · Confidence: medium · Observed: unknown

Integration (1)

Crawlable docs: 6 indexed pages on the official domain
Source type: search_document · Confidence: medium · Observed Apr 15, 2026

Release & Crawl Timeline

Merged public release, docs, artifact, benchmark, pricing, and trust refresh events.

Self-declared · agent-index

Docs Update

Docs refreshed: Sign in to GitHub · GitHub
Source type: search_document · Confidence: medium · Observed Apr 15, 2026

Fresh crawlable documentation was indexed for the official domain.

Artifacts Archive

Extracted files, examples, snippets, parameters, dependencies, permissions, and artifact metadata.

Self-declared · GITHUB OPENCLEW

Extracted files

Examples

Snippets

Languages

typescript

Parameters

Executable Examples

bash

# Collect all metrics with --set full and save the report to disk
ncu --set full \
    -o <report_name> \
    --target-processes all \
    ./your_kernel

# Example
ncu --set full -o matmul_analysis --target-processes all ./matmul0_perf

# Output files:
# - matmul_analysis.ncu-rep    (NCU report file, written by -o)
# - matmul_analysis.csv        (metrics in CSV format, once exported with --csv)

bash

# Extract key metrics from a saved report (no need to re-run the kernel)
ncu --import matmul_analysis.ncu-rep --print-summary per-kernel

# Export as CSV
ncu --import matmul_analysis.ncu-rep --page raw --csv > metrics.csv

bash

# Import an existing report directly
ncu --import <file.ncu-rep> --print-summary per-kernel

bash

# Full collection with persistent storage
ncu --set full -o <report_name> --target-processes all ./kernel

text

project_root/
├── ncu_reports/                    # NCU report directory
│   ├── matmul_analysis.ncu-rep    # full report
│   ├── matmul_analysis.csv        # CSV metrics
│   └── matmul_analysis.md         # AI analysis report
└── ...

python

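def auto_diagnose(metrics):
    """Classify the dominant bottleneck from NCU metrics given in percent (0-100)."""
    # Missing metrics default to 0, so an empty dict falls through to LATENCY_BOUND.
    roofline = metrics.get('roofline_ratio', 0)
    dram = metrics.get('dram_throughput', 0)
    l1tex = metrics.get('l1tex_throughput', 0)
    sm_busy = metrics.get('sm_busy', 0)
    occupancy = metrics.get('occupancy', 0)  # read but not used by the rules below

    if roofline < 30:
        # Far below the roofline: look for a memory-side cause first.
        if dram > 70:
            return "DRAM_MEMORY_BOUND"
        elif l1tex > 80 and dram < 30:
            return "L1_PRESSURE_BOUND"
        else:
            return "LATENCY_BOUND"
    elif roofline > 60:
        # Close to the roofline: distinguish busy SMs from underused ones.
        if sm_busy > 80:
            return "COMPUTE_BOUND"
        else:
            return "OCCUPANCY_BOUND"
    else:
        # 30 <= roofline <= 60: no single dominant bottleneck.
        return "MIXED_BOUND"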

Docs & README

Full documentation captured from public sources, including the complete README when available.

Self-declared · GITHUB OPENCLEW

Docs source

GITHUB OPENCLEW

Editorial quality

ready

Full README

name: ncu-cuda-profiling
description: Automated NCU (Nsight Compute) profiling workflow with full metrics collection and persistent storage
version: 1.0.0
author: maxiaosong1124
tags: [cuda, profiling, ncu, performance, optimization]

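NCU CUDA Automated Performance Analysis

This Skill provides a complete automated NCU profiling workflow with full metrics collection and persistent storage.

🚀 Quick Start

Recommended: One-Step Full Collection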

# Collect all metrics with --set full and save the report to disk
ncu --set full \
    -o <report_name> \
    --target-processes all \
    ./your_kernel

# Example
ncu --set full -o matmul_analysis --target-processes all ./matmul0_perf

# Output files:
# - matmul_analysis.ncu-rep    (NCU report file, written by -o)
# - matmul_analysis.csv        (metrics in CSV format, once exported with --csv)

Metric Extraction (After Collection)

# Extract key metrics from a saved report (no need to re-run the kernel)
ncu --import matmul_analysis.ncu-rep --print-summary per-kernel

# Export as CSV
ncu --import matmul_analysis.ncu-rep --page raw --csv > metrics.csv
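Once the metrics are exported to CSV they can be loaded for scripted analysis. The following is a minimal sketch and not part of the skill. It assumes the export starts with a header row of column names and may contain a units row directly below it, which is skipped. The function names `read_kernel_rows` and `to_float` and the sample data are hypothetical; real column names depend on the ncu version and the collected metric set.

```python
import csv
import io


def read_kernel_rows(csv_text):
    """Return one dict per kernel row from CSV text.

    Assumption: the first row is the header. Rows whose "ID" cell is not
    an integer (for example a units row) are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if (row.get("ID") or "").strip().isdigit()]


def to_float(cell):
    """Convert a cell such as '1,234.5' to float; return None if not numeric."""
    try:
        return float(cell.replace(",", ""))
    except (AttributeError, ValueError):
        return None


# Hypothetical sample shaped like a raw-page export (values are made up).
SAMPLE = (
    '"ID","Kernel Name","dram__throughput.avg.pct_of_peak_sustained_elapsed"\n'
    '"","","%"\n'
    '"0","matmul_kernel","72.5"\n'
)

rows = read_kernel_rows(SAMPLE)
dram_pct = to_float(rows[0]["dram__throughput.avg.pct_of_peak_sustained_elapsed"])
print(rows[0]["Kernel Name"], dram_pct)
```

To process a real export, pass the contents of `metrics.csv` to `read_kernel_rows` instead of `SAMPLE`.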

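📋 AI Analysis Flow

When the user provides NCU data, the AI processes it as follows:

Phase 1: Data Acquisition (in priority order)

Case A: The user provides a .ncu-rep file

# Import the existing report directly
ncu --import <file.ncu-rep> --print-summary per-kernel

Case B: The user needs a new analysis

# Full collection with persistent storage
ncu --set full -o <report_name> --target-processes all ./kernel

Case C: The user provides a screenshot or text

Extract the numeric values directly from the screenshot or text and analyze them.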
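The three cases above can be sketched as a small dispatcher that returns the ncu command line to run, or nothing when the values come from a screenshot or pasted text. This is an illustration only: the function name `plan_acquisition` and its parameters are hypothetical, while the ncu flags are the ones shown in the commands above.

```python
def plan_acquisition(rep_file=None, binary=None, report_name="analysis"):
    """Return the ncu argv for case A or B, or None for case C."""
    if rep_file:  # Case A: an existing report takes priority
        return ["ncu", "--import", rep_file, "--print-summary", "per-kernel"]
    if binary:    # Case B: profile the binary and persist the report
        return ["ncu", "--set", "full", "-o", report_name,
                "--target-processes", "all", binary]
    return None   # Case C: nothing to run, values come from text


print(plan_acquisition(rep_file="matmul_analysis.ncu-rep"))
print(plan_acquisition(binary="./kernel", report_name="matmul_analysis"))
print(plan_acquisition())
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building an argv list rather than a shell string avoids quoting problems with paths.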

Phase 2: Data Persistence

The AI automatically saves the analysis data to the project directory:

project_root/
├── ncu_reports/                    # NCU report directory
│   ├── matmul_analysis.ncu-rep    # full report
│   ├── matmul_analysis.csv        # CSV metrics
│   └── matmul_analysis.md         # AI analysis report
└── ...
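The layout above can be derived from a project root and a report name. A minimal sketch, assuming the three artifacts always share one base name; the helper `report_paths` and its dictionary keys are hypothetical and the code does not touch the filesystem.

```python
from pathlib import Path


def report_paths(project_root, name):
    """Map each artifact kind to its path under <project_root>/ncu_reports/."""
    base = Path(project_root) / "ncu_reports"
    return {
        "report": base / f"{name}.ncu-rep",
        "csv": base / f"{name}.csv",
        "analysis": base / f"{name}.md",
    }


paths = report_paths("project_root", "matmul_analysis")
for kind, path in paths.items():
    print(kind, path.as_posix())
```

Before writing files, the directory can be created with `paths["report"].parent.mkdir(parents=True, exist_ok=True)`.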

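Phase 3: Automatic Diagnosis

Diagnose automatically with the decision engine:

def auto_diagnose(metrics):
    """Classify the dominant bottleneck from NCU metrics given in percent (0-100)."""
    # Missing metrics default to 0, so an empty dict falls through to LATENCY_BOUND.
    roofline = metrics.get('roofline_ratio', 0)
    dram = metrics.get('dram_throughput', 0)
    l1tex = metrics.get('l1tex_throughput', 0)
    sm_busy = metrics.get('sm_busy', 0)
    occupancy = metrics.get('occupancy', 0)  # read but not used by the rules below

    if roofline < 30:
        # Far below the roofline: look for a memory-side cause first.
        if dram > 70:
            return "DRAM_MEMORY_BOUND"
        elif l1tex > 80 and dram < 30:
            return "L1_PRESSURE_BOUND"
        else:
            return "LATENCY_BOUND"
    elif roofline > 60:
        # Close to the roofline: distinguish busy SMs from underused ones.
        if sm_busy > 80:
            return "COMPUTE_BOUND"
        else:
            return "OCCUPANCY_BOUND"
    else:
        # 30 <= roofline <= 60: no single dominant bottleneck.
        return "MIXED_BOUND"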
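To see how the decision engine above behaves, it helps to run it on a few inputs. The sketch below restates the function compactly with the same thresholds so that the block runs on its own; the sample metric sets and their labels are hypothetical.

```python
def auto_diagnose(metrics):
    """Compact restatement of the decision engine above (same thresholds)."""
    roofline = metrics.get("roofline_ratio", 0)
    dram = metrics.get("dram_throughput", 0)
    l1tex = metrics.get("l1tex_throughput", 0)
    sm_busy = metrics.get("sm_busy", 0)
    if roofline < 30:
        if dram > 70:
            return "DRAM_MEMORY_BOUND"
        if l1tex > 80 and dram < 30:
            return "L1_PRESSURE_BOUND"
        return "LATENCY_BOUND"
    if roofline > 60:
        return "COMPUTE_BOUND" if sm_busy > 80 else "OCCUPANCY_BOUND"
    return "MIXED_BOUND"


# Hypothetical metric sets, all values in percent.
samples = {
    "streaming kernel": {"roofline_ratio": 20, "dram_throughput": 85},
    "cache-heavy kernel": {"roofline_ratio": 20, "dram_throughput": 10,
                           "l1tex_throughput": 90},
    "tuned kernel": {"roofline_ratio": 75, "sm_busy": 90},
    "in-between kernel": {"roofline_ratio": 45},
    "no metrics": {},
}
for label, metrics in samples.items():
    print(f"{label}: {auto_diagnose(metrics)}")
```

Because missing metrics default to 0, an empty input is classified as LATENCY_BOUND rather than rejected, so callers may want to validate the input before trusting the label.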

📊 Output Template

# NCU Performance Analysis Report

## 📁 Report Information
- **Kernel**: {kernel_name}
- **Collected at**: {timestamp}
- **Report file**: {report_file}
- **Raw data**: {csv_file}

## 📈 Executive Summary

| Item | Value |
|------|------|
| **Primary bottleneck** | {bottleneck_type} |
| **Confidence** | {confidence} |
| **Performance** | {performance} GFLOPS |
| **Optimization potential** | {potential}x |

## 📊 Key Metrics

### Performance Metrics
| Metric | Value | Healthy threshold | Status |
|------|------|----------|------|
| Roofline performance ratio | {roofline}% | > 60% | {status} |
| SM Busy | {sm_busy}% | > 70% | {status} |
| Occupancy | {occupancy}% | > 50% | {status} |

### Memory Metrics
| Metric | Value | Healthy threshold | Status |
|------|------|----------|------|
| DRAM Throughput | {dram}% | < 50% | {status} |
| L1/TEX Throughput | {l1tex}% | < 80% | {status} |
| L2 Throughput | {l2}% | < 80% | {status} |

## 🔍 Diagnosis Details

**Bottleneck type**: {bottleneck_type}

**Rationale**:
- {reason_1}
- {reason_2}

## 💡 Optimization Suggestions

### High Priority
{high_priority_suggestions}

## 🛠️ Next Steps

### Suggested NCU Command
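```bash
# Re-collect after optimization
ncu --set full -o {report_name}_optimized --target-processes all ./kernel_optimized
```

Verification Checklist

[ ] Apply the suggested optimizations
[ ] Re-run the NCU collection
[ ] Compare the data before and after optimization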
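The Status column in the template's metric tables can be filled by comparing each value with its healthy threshold. A minimal sketch under assumptions: the metric keys are hypothetical (they reuse the decision engine's names and add `l2_throughput`), and the status labels "OK" and "CHECK" are placeholders, since the template does not say what the status text should be.

```python
import operator

# (comparison, limit in percent) per metric, taken from the template tables.
HEALTHY = {
    "roofline_ratio": (operator.gt, 60),
    "sm_busy": (operator.gt, 70),
    "occupancy": (operator.gt, 50),
    "dram_throughput": (operator.lt, 50),
    "l1tex_throughput": (operator.lt, 80),
    "l2_throughput": (operator.lt, 80),
}


def metric_status(name, value):
    """Return 'OK' when the value meets its healthy threshold, else 'CHECK'."""
    compare, limit = HEALTHY[name]
    return "OK" if compare(value, limit) else "CHECK"


# Hypothetical values in percent.
for name, value in {"sm_busy": 85, "dram_throughput": 72, "occupancy": 50}.items():
    print(name, value, metric_status(name, value))
```

The comparisons are strict, matching the `>` and `<` signs in the tables, so a value exactly on the limit is flagged.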


🔧 Tool Usage

Full Collection (Recommended)

# Collect all metrics and save the report
ncu --set full -o my_analysis --target-processes all ./kernel

# Options:
# --set full          # collect the full metric set
# -o my_analysis      # output file name (writes my_analysis.ncu-rep)
# --target-processes all  # profile all processes

Incremental Analysis (Existing Report)

# Extract specific metrics from an existing report
ncu --import my_analysis.ncu-rep --print-summary per-kernel

# Export as CSV for easier analysis
ncu --import my_analysis.ncu-rep --page raw --csv > metrics.csv

Automation Scripts

Use the provided automation scripts:

cd examples/

# Fully automated analysis
./auto_profile.sh ./kernel report_name

# Python analyzer
python ncu_analyzer.py --import report_name.ncu-rep

📖 Diagnostic Rules in Detail

DRAM_MEMORY_BOUND

IF dram_throughput > 70% AND roofline < 30%:
    Diagnosis: DRAM_MEMORY_BOUND (confidence: HIGH)

    Optimization strategies:
    1. Block Tiling (cache tiles in shared memory)
    2. Vectorized Load (float4)
    3. Prefetching (data prefetching)

L1_PRESSURE_BOUND

IF l1tex_throughput > 80% AND dram_throughput < 30%:
    Diagnosis: L1_PRESSURE_BOUND (confidence: HIGH)

    Optimization strategies:
    1. Shared Memory Padding
    2. Data Transpose
    3. Fragment Caching

LATENCY_BOUND

IF sm_busy < 50% AND occupancy > 60%:
    Diagnosis: LATENCY_BOUND (confidence: HIGH)

    Optimization strategies:
    1. Double Buffering
    2. Instruction-level Parallelism
    3. Loop Unrolling

COMPUTE_BOUND

IF roofline > 60% AND sm_busy > 80%:
    Diagnosis: COMPUTE_BOUND (confidence: HIGH)

    Optimization strategies:
    1. Use FMA instructions
    2. Reduce precision (FP32 -> FP16/TF32)
    3. Tensor Cores

OCCUPANCY_BOUND

IF occupancy < 30% AND sm_busy > 70%:
    Diagnosis: OCCUPANCY_BOUND (confidence: HIGH)

    Optimization strategies:
    1. Reduce register usage
    2. Adjust block size
    3. Use __launch_bounds__
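The five rules above can be written as a table of predicates and evaluated together. This is a sketch, not the skill's implementation: it transcribes the conditions exactly as listed, treats missing metrics as 0, and returns every rule that matches. The names `RULES`, `KEYS`, and `matching_rules` are hypothetical.

```python
# (name, predicate) pairs transcribed from the rules above; values in percent.
RULES = [
    ("DRAM_MEMORY_BOUND",
     lambda m: m["dram_throughput"] > 70 and m["roofline_ratio"] < 30),
    ("L1_PRESSURE_BOUND",
     lambda m: m["l1tex_throughput"] > 80 and m["dram_throughput"] < 30),
    ("LATENCY_BOUND",
     lambda m: m["sm_busy"] < 50 and m["occupancy"] > 60),
    ("COMPUTE_BOUND",
     lambda m: m["roofline_ratio"] > 60 and m["sm_busy"] > 80),
    ("OCCUPANCY_BOUND",
     lambda m: m["occupancy"] < 30 and m["sm_busy"] > 70),
]

KEYS = ("roofline_ratio", "dram_throughput", "l1tex_throughput",
        "sm_busy", "occupancy")


def matching_rules(metrics):
    """Return the names of all rules whose condition holds, in listed order."""
    m = {key: metrics.get(key, 0) for key in KEYS}
    return [name for name, matches in RULES if matches(m)]


# Hypothetical metrics: high DRAM traffic with idle SMs at good occupancy.
sample = {"roofline_ratio": 20, "dram_throughput": 85, "l1tex_throughput": 40,
          "sm_busy": 40, "occupancy": 70}
print(matching_rules(sample))
```

Returning all matches makes it visible that the listed conditions are not mutually exclusive: one kernel can satisfy several rules, or none.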

🎯 Optimization Strategy Quick Reference

| Bottleneck type | Immediate action | Code example | Expected gain |
|---------|---------|---------|---------|
| DRAM_MEMORY_BOUND | Block Tiling | `__shared__ float As[BM][BK];` | 3-5x |
| L1_PRESSURE_BOUND | Padding | `As[BM][BK+1]` | 1.2-2x |
| LATENCY_BOUND | Double Buffer | `As[2][BM*BK]` | 1.2-1.5x |
| COMPUTE_BOUND | FMA | `fmaf(a, b, c)` | 1.1-1.3x |
| OCCUPANCY_BOUND | Adjust block size | `__launch_bounds__(256, 2)` | 1.2-2x |

📚 Complete NCU Command Reference

Report Operations

# View the summary
ncu --import report.ncu-rep --print-summary per-kernel

# View details
ncu --import report.ncu-rep --page details

# Export CSV
ncu --import report.ncu-rep --page raw --csv > metrics.csv

# Compare two reports
ncu --diff report1.ncu-rep report2.ncu-rep
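A before/after comparison can also be scripted on metrics that have been extracted into dictionaries, for example from the CSV export. A minimal sketch with a hypothetical helper `compare_metrics` and made-up values:

```python
def compare_metrics(before, after):
    """Return {metric: (before, after, delta)} for metrics present in both dicts."""
    return {
        name: (before[name], after[name], after[name] - before[name])
        for name in sorted(before.keys() & after.keys())
    }


# Hypothetical values in percent, before and after an optimization.
before = {"roofline_ratio": 20.0, "dram_throughput": 85.0, "sm_busy": 40.0}
after = {"roofline_ratio": 55.0, "dram_throughput": 35.0, "occupancy": 62.0}

for name, (old, new, delta) in compare_metrics(before, after).items():
    print(f"{name}: {old} -> {new} ({delta:+.1f})")
```

Metrics that appear in only one of the two inputs are left out, so a missing value is never reported as a change.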

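⚠️ Common Pitfalls

High throughput does not mean high efficiency
- If Compute and Memory Throughput are both high but Roofline is low, the GPU is "busy waiting".
Low DRAM throughput can be a good sign
- A drop in DRAM throughput after optimization indicates that data is being reused from cache.
Higher occupancy is not always better
- The goal is the minimum occupancy that is sufficient to hide latency.

🔗 Related Resources

Automation scripts: examples/
GitHub: https://github.com/maxiaosong1124/ncu-cuda-profiling-skill

This Skill supports a complete automated NCU profiling workflow, including full collection and persistent storage.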

Contract & API

Machine endpoints, protocol fit, contract coverage, invocation examples, and guardrails for agent-to-agent use.

Missing · GITHUB OPENCLEW

Endpoints

Dossier API, Snapshot API, Contract API, Trust API

Contract coverage

Status

missing

Auth

None

Streaming

Data region

Unspecified

Protocol support

OpenClaw: self-declared

Requires: none

Forbidden: none

Guardrails

Operational confidence: low

No positive guardrails captured.

Invocation examples

curl -s "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/snapshot"

curl -s "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract"

curl -s "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust"
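The same three requests can be issued from Python. The sketch below builds the URLs from the agent slug and includes a small fetch helper. The helper names are hypothetical, and `fetch_json` assumes the endpoints return JSON, which this page does not state explicitly; the example only prints the URLs and makes no network call.

```python
import json
import urllib.request

BASE_URL = "https://xpersona.co/api/v1/agents"
SLUG = "maxiaosong1124-ncu-cuda-profiling-skill"


def endpoint_urls(slug=SLUG):
    """Build the snapshot, contract, and trust URLs for an agent slug."""
    return {name: f"{BASE_URL}/{slug}/{name}"
            for name in ("snapshot", "contract", "trust")}


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


for name, url in endpoint_urls().items():
    print(name, url)
```

Calling `fetch_json(endpoint_urls()["contract"])` would perform the actual request.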

Reliability & Benchmarks

Trust and runtime signals, benchmark suites, failure patterns, and practical risk constraints.

Missing · runtime-metrics

Trust signals

Handshake

UNKNOWN

Confidence

unknown

Attempts 30d

unknown

Fallback rate

unknown

Runtime metrics

Observed P50

unknown

Observed P95

unknown

Rate limit

unknown

Estimated cost

unknown

Do not use if

Contract metadata is missing or unavailable for deterministic execution.

No benchmark suites or observed failure patterns are available.

Media & Demo

Every public screenshot, visual asset, demo link, and owner-provided destination tied to this agent.

Missing · no-media

No screenshots, media assets, or demo links are available.


Machine Appendix

Contract JSON

{
  "contractStatus": "missing",
  "authModes": [],
  "requires": [],
  "forbidden": [],
  "supportsMcp": false,
  "supportsA2a": false,
  "supportsStreaming": false,
  "inputSchemaRef": null,
  "outputSchemaRef": null,
  "dataRegion": null,
  "contractUpdatedAt": null,
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}
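A caller can use this document as a gate before execution, in line with the page's guidance not to use the skill when contract metadata is missing. A minimal sketch: the function name is hypothetical, and treating every status other than "missing" as published is an assumption, because the other possible status values are not listed here.

```python
def contract_allows_execution(contract):
    """Return True only when the contract status is something other than 'missing'."""
    # An absent field is treated the same as an explicit "missing" status.
    return contract.get("contractStatus", "missing") != "missing"


# Subset of the Contract JSON shown above.
contract = {"contractStatus": "missing", "authModes": [], "supportsStreaming": False}
print(contract_allows_execution(contract))
```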

Invocation Guide

{
  "preferredApi": {
    "snapshotUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/snapshot",
    "contractUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract",
    "trustUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust"
  },
  "curlExamples": [
    "curl -s \"https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/snapshot\"",
    "curl -s \"https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract\"",
    "curl -s \"https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust\""
  ],
  "jsonRequestTemplate": {
    "query": "summarize this repo",
    "constraints": {
      "maxLatencyMs": 2000,
      "protocolPreference": [
        "OPENCLEW"
      ]
    }
  },
  "jsonResponseTemplate": {
    "ok": true,
    "result": {
      "summary": "...",
      "confidence": 0.9
    },
    "meta": {
      "source": "GITHUB_OPENCLEW",
      "generatedAt": "2026-04-17T04:03:26.113Z"
    }
  },
  "retryPolicy": {
    "maxAttempts": 3,
    "backoffMs": [
      500,
      1500,
      3500
    ],
    "retryableConditions": [
      "HTTP_429",
      "HTTP_503",
      "NETWORK_TIMEOUT"
    ]
  }
}
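One way to read the `retryPolicy` above is sketched below. It is an interpretation, not a documented client: it assumes the backoff at index i is waited after failed attempt i, that no wait follows the final attempt (so with three attempts the 3500 ms value is never used), and that a request reports its outcome as an `(ok, value)` pair. The names `call_with_retry` and `fetch` are hypothetical.

```python
import time

RETRYABLE = {"HTTP_429", "HTTP_503", "NETWORK_TIMEOUT"}
BACKOFF_MS = [500, 1500, 3500]
MAX_ATTEMPTS = 3


def call_with_retry(fetch, sleep=time.sleep):
    """Call fetch() up to MAX_ATTEMPTS times.

    fetch() returns (True, payload) on success or (False, condition) on
    failure. Only conditions in RETRYABLE trigger another attempt.
    """
    last = None
    for attempt in range(MAX_ATTEMPTS):
        ok, value = fetch()
        if ok:
            return value
        last = value
        if value not in RETRYABLE or attempt == MAX_ATTEMPTS - 1:
            break
        sleep(BACKOFF_MS[attempt] / 1000.0)
    raise RuntimeError(f"request failed: {last}")


# Demo with a fake request and a recording sleep, so nothing actually waits.
outcomes = iter([(False, "HTTP_503"), (False, "HTTP_429"), (True, "done")])
waits = []
result = call_with_retry(lambda: next(outcomes), sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the policy testable without real delays; production code would leave the default `time.sleep`.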

Trust JSON

{
  "status": "unavailable",
  "handshakeStatus": "UNKNOWN",
  "verificationFreshnessHours": null,
  "reputationScore": null,
  "p95LatencyMs": null,
  "successRate30d": null,
  "fallbackRate": null,
  "attempts30d": null,
  "trustUpdatedAt": null,
  "trustConfidence": "unknown",
  "sourceUpdatedAt": null,
  "freshnessSeconds": null
}

Capability Matrix

{
  "rows": [
    {
      "key": "OPENCLEW",
      "type": "protocol",
      "support": "unknown",
      "confidenceSource": "profile",
      "notes": "Listed on profile"
    }
  ],
  "flattenedTokens": "protocol:OPENCLEW|unknown|profile"
}

Facts JSON

[
  {
    "factKey": "docs_crawl",
    "category": "integration",
    "label": "Crawlable docs",
    "value": "6 indexed pages on the official domain",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  },
  {
    "factKey": "vendor",
    "category": "vendor",
    "label": "Vendor",
    "value": "Maxiaosong1124",
    "href": "https://github.com/maxiaosong1124/ncu-cuda-profiling-skill",
    "sourceUrl": "https://github.com/maxiaosong1124/ncu-cuda-profiling-skill",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T02:16:10.379Z",
    "isPublic": true
  },
  {
    "factKey": "protocols",
    "category": "compatibility",
    "label": "Protocol compatibility",
    "value": "OpenClaw",
    "href": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract",
    "sourceUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/contract",
    "sourceType": "contract",
    "confidence": "medium",
    "observedAt": "2026-04-15T02:16:10.379Z",
    "isPublic": true
  },
  {
    "factKey": "traction",
    "category": "adoption",
    "label": "Adoption signal",
    "value": "85 GitHub stars",
    "href": "https://github.com/maxiaosong1124/ncu-cuda-profiling-skill",
    "sourceUrl": "https://github.com/maxiaosong1124/ncu-cuda-profiling-skill",
    "sourceType": "profile",
    "confidence": "medium",
    "observedAt": "2026-04-15T02:16:10.379Z",
    "isPublic": true
  },
  {
    "factKey": "handshake_status",
    "category": "security",
    "label": "Handshake status",
    "value": "UNKNOWN",
    "href": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust",
    "sourceUrl": "https://xpersona.co/api/v1/agents/maxiaosong1124-ncu-cuda-profiling-skill/trust",
    "sourceType": "trust",
    "confidence": "medium",
    "observedAt": null,
    "isPublic": true
  }
]

Change Events JSON

[
  {
    "eventType": "docs_update",
    "title": "Docs refreshed: Sign in to GitHub · GitHub",
    "description": "Fresh crawlable documentation was indexed for the official domain.",
    "href": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceUrl": "https://github.com/login?return_to=https%3A%2F%2Fgithub.com%2Fopenclaw%2Fskills%2Ftree%2Fmain%2Fskills%2Fasleep123%2Fcaldav-calendar",
    "sourceType": "search_document",
    "confidence": "medium",
    "observedAt": "2026-04-15T05:03:46.393Z",
    "isPublic": true
  }
]
