
Fundamentals

Table of Contents

  1. Why Skills Exist
  2. A Basic Introduction to Skills
  3. Deployment Locations and Use Cases
  4. Advanced Structure: Wrapper Scripts and Supporting Docs
  5. Progressive Disclosure: An Elegant Answer to AI Context Limits

1. Why Skills Exist

1.1 The Context Problem of AI Coding Assistants

Large language models (LLMs) are excellent at code generation and review, but they have one fundamental limitation: the context window is a shared and finite resource. Every instruction, document, and message in the conversation consumes part of that window. As projects grow, teams need to pass more standards, workflows, and templates to the AI, and the strategy of "put everything into CLAUDE.md" quickly stops working:

  • Once CLAUDE.md grows past 200 lines, instruction-following starts to degrade
  • Different tasks need different domain knowledge, but loading everything wastes tokens
  • Team members keep their own prompt snippets, so knowledge does not accumulate or get reused

1.2 From Prompt to Reusable Knowledge Unit

Skills exist to solve this exact problem: they package reusable domain knowledge and workflows into separate, on-demand modules. They follow the Agent Skills Open Standard and are not tied to a single platform.

Compared with traditional prompt engineering, the key differences are:

| Dimension | Traditional Prompt | Skill |
|---|---|---|
| Lifecycle | One-time use | Persistent, version-controlled, shareable across a team |
| Loading model | Fully loaded every time | Loaded on demand, so unused knowledge costs no context |
| Testability | Cannot be verified | Can have contract tests and regression tests |
| Scope of reuse | Personal clipboard | Enterprise → personal → project deployment layers |
| Iteration model | Improved from memory | Reviewed, committed, and validated like code |

2. A Basic Introduction to Skills

2.1 What a Skill Looks Like

In its simplest form, a skill only needs one file: SKILL.md.

my-skill/
└── SKILL.md

SKILL.md has two parts:

---
name: my-skill
description: Describe what the skill does and when it should trigger. Claude uses this text to decide whether to auto-load it.
---

# My Skill

This is the instruction body Claude follows when the skill runs.
  • YAML frontmatter (between ---) tells Claude when to use the skill
  • Markdown body tells Claude how to do the work

2.2 Two Classification Dimensions

By content type:

| Type | Purpose | Typical Scenarios | Invocation Control |
|---|---|---|---|
| Knowledge skill | Transfer domain knowledge, coding standards, and review criteria | Code review, security checks, performance guidance | Default: Claude decides whether to load it |
| Task skill | Define concrete steps for workflows with side effects | Commit code, create PRs, deploy releases | Recommended: `disable-model-invocation: true` |

By use domain (from Anthropic's official guide):

| Category | Purpose | Typical Skills | Key Technique |
|---|---|---|---|
| Docs & asset creation | Produce consistent, high-quality docs, decks, designs, or code | frontend-design, docx, pptx, xlsx | Embedded style guides, template structures, quality checklists |
| Workflow automation | Multi-step processes that need a consistent method | skill-creator, git-commit | Step-by-step workflows, validation gates, iterative refinement loops |
| MCP enhancement | Add workflow guidance on top of MCP tool access | sentry-code-review | Coordinating multiple MCP calls, embedding domain expertise, handling errors |

These two dimensions are orthogonal and complementary: knowledge/task describes the nature of the content, while the three domains describe the kind of problem it solves.

2.3 How a Skill Is Invoked

  • User-triggered: type /skill-name in Claude Code
  • Auto-loaded by Claude: Claude checks whether the current task matches the description in frontmatter
  • Arguments: supports $ARGUMENTS (all arguments), $0, and $1 (positional arguments)
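
For example, a hypothetical skill that uses positional arguments might look like this (the skill name and wording are invented purely for illustration):

```markdown
---
name: release-notes
description: Draft release notes for a given version tag.
---

# Release Notes

Draft release notes for version $0.
The full argument string the user passed is: $ARGUMENTS
```

Invoking /release-notes v1.4.0 would substitute v1.4.0 for $0 before the body reaches Claude.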

2.4 Frontmatter Field Reference

Required fields:

| Field | Purpose | Example |
|---|---|---|
| name | Display name and `/` command name. Must be kebab-case: no spaces, uppercase letters, or underscores | go-code-reviewer |
| description | The most important field, because Claude uses it for auto-triggering. Maximum 1024 characters. XML tags (`< >`) are not allowed | Review Go code with a defect-first approach... |

Optional fields:

| Field | Purpose | Example |
|---|---|---|
| disable-model-invocation | Set to true to prevent Claude from auto-triggering the skill | Useful for side-effecting operations such as /commit or /deploy |
| allowed-tools | Limits which tools can be used while the skill is active | Read, Grep, Glob, Bash |
| context | Set to fork to run in an isolated sub-agent | Useful for research-style tasks that need independent context |
| license | Open-source license | MIT, Apache-2.0 |
| compatibility | Environment requirements (1-500 chars) | Requires Node.js 18+, network access for API calls |
| metadata | Custom key-value pairs such as author, version, or MCP dependency | author: John, version: 1.0.0, mcp-server: linear |

Security restrictions:

  • XML angle brackets (< >) are forbidden in description because frontmatter is injected into the system prompt, and malicious content could turn into instructions
  • Skill names cannot include claude or anthropic (reserved prefixes)
  • The skill folder name must match name and use kebab-case
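
Putting several optional fields together, the frontmatter of a hypothetical side-effecting skill might look like this (all names and values are illustrative, not taken from a real skill):

```yaml
---
name: deploy-staging
description: Deploy the current branch to the staging environment. Use only when the user explicitly asks for a deployment.
disable-model-invocation: true
allowed-tools: Read, Bash
license: MIT
metadata:
  author: jane
  version: 1.0.0
---
```

Because disable-model-invocation is true, this skill would only run when someone types /deploy-staging explicitly.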


3. Deployment Locations and Use Cases

Skills support four deployment levels. Higher-priority levels override lower-priority ones:

| Priority | Level | Path | Best Use |
|---|---|---|---|
| 1 (highest) | Enterprise | Distributed through managed settings | Organization-wide security review standards, compliance checks |
| 2 | Personal | `~/.claude/skills/<name>/SKILL.md` | Personal writing style, common workflows |
| 3 | Project | `.claude/skills/<name>/SKILL.md` | Project-specific coding standards, CI setup |
| 4 (lowest) | Plugin | `<plugin>/skills/<name>/SKILL.md` | General capabilities reused across projects |

How to choose:

  • Security review, coding standards → enterprise deployment so everyone follows the same rules
  • Personal productivity tools (such as tech-doc-writer, google-search) → personal deployment
  • Project-specific workflows (such as a particular CI workflow or PR template) → project deployment committed to Git
  • Shared general-purpose tools for a team → plugin distribution

3.1 When You Do Not Need a Skill

Not all knowledge should be packaged as a skill. In the following cases, another mechanism is a better fit:

| Scenario | Use This Instead of a Skill | Why |
|---|---|---|
| Fewer than 50 lines and needed in every session | CLAUDE.md | Full loading is simpler, and on-demand loading adds little value |
| Rules only apply to certain file types (for example, .proto coding standards) | `.claude/rules/` with paths | Rules can match files precisely with globs, unlike skill triggering via description |
| A step must run 100% of the time and cannot be "forgotten" by the AI (for example, run lint before commit) | Hook | Hooks run deterministically with zero context cost; skills are prompts and may be skipped |
| You only need external API access (for example, query GitHub issues or send email) | MCP server | MCP provides tools; skills provide knowledge and workflow. Do not reimplement API logic in a skill |
| The instruction is truly one-off and will not be reused | Say it directly in the conversation | A temporary instruction is not worth turning into a permanent module |

Rule of thumb: if a piece of knowledge (1) is reused repeatedly, (2) is longer than 50 lines, and (3) is not needed in every session, then it is a good candidate for a skill. If any of these three conditions is not met, prefer a lighter-weight mechanism.

3.2 Build Your First Skill: Iterate First, Extract Later

Use "check Go code formatting" as an example to show the full path from a normal conversation to a reusable skill. The process has two stages: first, manual iteration to produce a working draft; then, hardening with skill-creator to bring it to production quality.

Step 1: Solve the same task repeatedly in normal chat

Do not rush to write SKILL.md. Start by simply asking in Claude Code:

> Help me check whether the current project has any improperly formatted Go files, and fix them if it does

Claude will run gofmt -l ., find problem files, and fix them. But you may notice the behavior is not ideal. For example, it may edit files under vendor/, fail to prefer goimports, or skip a second verification pass.

So you keep correcting it in chat:

> Do not touch the vendor directory. If goimports is available, prefer it. After fixing, run it again to verify

Repeat this two or three times until the workflow feels right. At that point, you already have a validated process in your head.

Step 2: Extract the successful method into SKILL.md

Now create the skill. You are no longer inventing instructions from scratch. You are turning a proven prompt into a reusable artifact:

mkdir -p ~/.claude/skills/fmt-check

Write this to ~/.claude/skills/fmt-check/SKILL.md:

---
name: fmt-check
description: >
  Check and fix Go code formatting issues. Triggers when the user asks
  to format code, check formatting, or fix style issues in Go files.
---

# Format Check

## Workflow

1. Run `gofmt -l .` to list files with formatting issues.
2. If no files found, report "All files properly formatted."
3. If files found, run `gofmt -w <file>` for each file.
4. Run `gofmt -l .` again to verify all issues are fixed.
5. Report which files were modified.

## Rules

- Never modify files outside the current Go module.
- If `goimports` is available, prefer it over `gofmt` (it also handles imports).

Every rule in this SKILL.md comes from the real corrections in Step 1. "Do not touch vendor" becomes a boundary rule. "Prefer goimports" becomes a tool-selection rule. "Verify after fixing" becomes Step 4 in the workflow.
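
The five workflow steps can also be sketched as a plain shell function. This is a hypothetical illustration, not part of the skill itself: the skill is prose that Claude interprets, but the same logic is easy to express deterministically, which becomes relevant in Section 4.

```shell
# fmt_check: a sketch of the fmt-check workflow as a POSIX shell function.
# Hypothetical helper; the real skill is the Markdown above, followed by Claude.
fmt_check() (
    set -eu
    # Rule: prefer goimports over gofmt when it is installed.
    fmt=gofmt
    command -v goimports >/dev/null 2>&1 && fmt=goimports
    if ! command -v "$fmt" >/dev/null 2>&1; then
        echo "no Go formatter found; skipping"
        exit 0
    fi
    # Step 1: list files with formatting issues (crude vendor/ exclusion).
    files=$("$fmt" -l . 2>/dev/null | grep -v 'vendor/' || true)
    # Step 2: nothing to do.
    if [ -z "$files" ]; then
        echo "All files properly formatted."
        exit 0
    fi
    # Step 3: fix each file in place.
    for f in $files; do "$fmt" -w "$f"; done
    # Step 4: run the check again to verify everything is clean.
    remaining=$("$fmt" -l . 2>/dev/null | grep -v 'vendor/' || true)
    [ -z "$remaining" ] || { echo "still unformatted: $remaining" >&2; exit 1; }
    # Step 5: report what changed.
    echo "Fixed: $files"
)
```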

Step 3: Harden with skill-creator

A manually written SKILL.md works, but it has blind spots:

  • The description may not trigger for all valid phrasings (e.g., a user says "check go formatting" instead of "format code")
  • Two or three rounds of conversation corrections cover only a limited set of edge cases
  • There is no objective data showing whether the skill actually improves AI output

Anthropic's official skill-creator automates the hardest parts of this process — the parts that manual creation most easily skips:

  1. Interview-driven gap analysis — systematically asks about trigger scenarios, expected output format, edge cases, and test cases, reducing omissions caused by limited experience
  2. Automatic eval generation and execution — creates test scenarios and runs with-skill vs without-skill comparisons in parallel, replacing guesswork with data
  3. Description optimization loop — generates 20 should-trigger and should-not-trigger queries, then runs an optimization loop with train/test split. This is the hardest part to do manually and has the biggest impact on whether the skill actually gets used
  4. Visual eval viewer — a browser-based UI for reviewing outputs and benchmark comparisons, closing the feedback loop

Continuing with the fmt-check example, invoke it directly in Claude Code:

> Use skill-creator to evaluate and improve the fmt-check skill

skill-creator might discover that the description misses the common phrasing "check go formatting" (causing trigger failures), that multi-module monorepos are an unhandled edge case, and that "format my Python code" incorrectly triggers this skill. These issues are very hard to catch through manual creation alone.

For the full three-dimensional evaluation methodology (trigger accuracy, task performance, token cost-effectiveness) and real case studies, see Chapter 10.

Step 4: Use it and keep iterating

In Claude Code, you can now:

  • Type /fmt-check to invoke it directly
  • Or simply say "help me check the code formatting", and Claude will auto-load the skill based on the description

After using it a few times, you may discover new improvements:

  • Need support for goimports-reviser → add it to the Rules section
  • Need the same workflow in CI → move the skill from ~/.claude/skills/ to the project's .claude/skills/ and commit it to Git
  • After significant changes, re-run the skill-creator evaluation to verify the changes did not introduce regressions

When is manual creation enough?

  • Personal use, simple workflow, low stakes — Steps 1-2 are sufficient
  • Shared with a team, broad trigger surface, complex edge cases — run it through skill-creator. Even if you choose to create manually, at minimum do two things: run a quick eval with 2-3 test cases (exposes hidden issues) and run one round of description optimization (small effort, high impact). An under-triggering skill is a dead skill

This is the core creation path for a skill: iterate in conversation → extract into a skill → harden with skill-creator → keep improving through real use. The full lifecycle also includes quantitative evaluation (Chapter 10) and workflow integration (Chapter 12): build → evaluate → improve → integrate → monitor. The later chapters cover best practices for each step.

3.3 Distribution Channels: From Local to API

Beyond the four local deployment levels above (§3), skills can also be distributed more broadly:

| Channel | Best Use | Notes |
|---|---|---|
| Upload to Claude.ai | Individual users | Go to Settings > Capabilities > Skills and upload a zip file |
| Claude Code directory | Developers | Put the skill under `~/.claude/skills/` or `.claude/skills/` |
| Organization-wide deployment | Enterprise teams | Centrally distributed by admins, with auto-update and centralized management (launched in Dec 2025) |
| Skills API | Programmatic integration | Use the `/v1/skills` endpoint and inject via the `container.skills` parameter; supported by the Claude Agent SDK |
| GitHub hosting | Community sharing | Public repository + README (keep README.md at the repo root, not inside the skill folder) |

API vs interactive use: use Claude.ai or Claude Code for day-to-day development and manual testing; use the API for production deployment, automation pipelines, or agent systems. The Skills API requires the Code Execution Tool beta.

Skills follow the Agent Skills Open Standard, which gives them cross-platform portability by default. The same skill can run on Claude.ai, Claude Code, and the API without modification.


4. Advanced Structure: Wrapper Scripts and Supporting Docs

Once a skill becomes too complex for a single file, you need a richer directory structure. Using the go-ci-workflow skill (rated 9.5/10) as an example:

go-ci-workflow/
├── SKILL.md                           # Entry point: 236-line operating framework
├── agents/
│   └── openai.yaml                    # UI metadata
├── scripts/
│   ├── discover_ci_needs.sh           # Repo shape discovery script
│   ├── run_regression.sh              # Regression test runner
│   └── tests/
│       ├── COVERAGE.md                # Test coverage matrix
│       ├── test_skill_contract.py     # 44 contract tests
│       ├── test_golden_scenarios.py   # 17 golden-scenario tests
│       └── golden/                    # 8 golden scenario JSON files
│           ├── 001_single_module_service.json
│           ├── ...
│           └── 008_service_containers_integration.json
└── references/
    ├── workflow-quality-guide.md           # 16-section CI pattern guide
    ├── golden-examples.md                  # 4 complete workflow YAML examples
    ├── github-actions-advanced-patterns.md # 9 sections of advanced patterns
    ├── repository-shapes.md                # Modeling 6 repository shapes
    ├── pr-checklist.md                     # PR review checklist
    └── fallback-and-scaffolding.md         # Degradation strategy

4.1 What Each Directory Is For

| Directory | Purpose | When It Is Loaded |
|---|---|---|
| SKILL.md | Operating framework and decision flow | Loaded when the skill triggers |
| references/ | Detailed domain knowledge, split by topic | Loaded on demand; irrelevant files stay unloaded |
| scripts/ | Deterministic logic such as discovery or validation scripts | Called during execution, without loading into context |
| assets/ | Output templates, images, and other resources | Used for output generation, not loaded into context |

4.2 Why Wrapping Logic in Scripts Matters

Put deterministic logic into scripts instead of prompt text for three reasons:

  1. Lower token usage: the output of a script is usually much shorter than the script code itself
  2. Determinism: the same input always gives the same output, without depending on LLM reasoning
  3. Testability: scripts can be tested independently in CI

For example, discover_ci_needs.sh scans a repository and outputs structured TSV data. Claude makes decisions based on that deterministic output instead of guessing the repo structure.
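
As a minimal illustration of the same idea (this is not the real discover_ci_needs.sh, whose contents are not shown here), a discovery function can emit one TSV row per Go module and let the model reason over that fixed-format output:

```shell
# discover_go_modules: a hypothetical, minimal stand-in for a discovery script.
# Emits one TSV row per Go module: <module dir> TAB <has tests: yes|no>.
discover_go_modules() {
    find . -name go.mod -not -path '*/vendor/*' | while IFS= read -r mod; do
        dir=$(dirname "$mod")
        # A module "has tests" if any *_test.go file exists outside vendor/.
        if find "$dir" -name '*_test.go' -not -path '*/vendor/*' | grep -q .; then
            tests=yes
        else
            tests=no
        fi
        printf '%s\t%s\n' "$dir" "$tests"
    done
}
```

Because the output format is fixed, it is cheap in tokens, identical on every run, and testable in CI without any LLM in the loop.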


5. Progressive Disclosure: An Elegant Answer to AI Context Limits

5.1 Core Idea

Progressive disclosure is the most important design pattern in high-quality skills. It splits knowledge into three layers and loads them one step at a time, only when needed:

┌─────────────────────────────────────────┐
│ L1: Metadata (name + description)       │  ← Always in context (~50 words)
├─────────────────────────────────────────┤
│ L2: SKILL.md body                       │  ← Loaded when the skill triggers (<500 lines)
├─────────────────────────────────────────┤
│ L3: references/ + scripts/              │  ← Loaded on demand (no hard limit)
└─────────────────────────────────────────┘

Key constraint: keep the main body of SKILL.md under 500 lines. Anything beyond that should be split into references/.

5.2 Selective Loading Table

High-quality skills do not just list reference files. They explain when each file should be loaded:

## Load References Selectively

- `references/workflow-quality-guide.md`
  Baseline job templates and Go/GitHub Actions patterns.
- `references/repository-shapes.md`
  Use for monorepo, multi-module, library decisions.
- `references/github-actions-advanced-patterns.md`
  Use for permissions, fork PR security, service containers.

This means a single-module service only loads workflow-quality-guide.md, while a monorepo also loads repository-shapes.md. Each conversation only loads the knowledge it actually needs.

5.3 Add a Table of Contents to Long Files

Any reference file longer than 100 lines should have a table of contents at the top, so Claude can quickly jump to the relevant section:

# Go CI Workflow Quality Guide

## Table of Contents

1. [Job Set](#1-job-set)
2. [Trigger Strategy](#2-trigger-strategy)
...
16. [Validation Checklist](#16-validation-checklist)