Fundamentals
Table of Contents¶
- Why Skills Exist
- A Basic Introduction to Skills
- Deployment Locations and Use Cases
- Advanced Structure: Wrapper Scripts and Supporting Docs
- Progressive Disclosure: An Elegant Answer to AI Context Limits
1. Why Skills Exist¶
1.1 The Context Problem of AI Coding Assistants¶
Large language models (LLMs) are excellent at code generation and review, but they have one fundamental limitation: the context window is a shared and finite resource. Every instruction, document, and message in the conversation consumes part of that window. As projects grow, teams need to pass more standards, workflows, and templates to the AI, and the strategy of "put everything into CLAUDE.md" quickly stops working:
- Once
CLAUDE.mdgrows past 200 lines, instruction-following starts to degrade - Different tasks need different domain knowledge, but loading everything wastes tokens
- Team members keep their own prompt snippets, so knowledge does not accumulate or get reused
1.2 From Prompt to Reusable Knowledge Unit¶
Skills exist to solve this exact problem: they package reusable domain knowledge and workflows into separate, on-demand modules. They follow the Agent Skills Open Standard and are not tied to a single platform.
Compared with traditional prompt engineering, the key differences are:
| Dimension | Traditional Prompt | Skill |
|---|---|---|
| Lifecycle | One-time use | Persistent, version-controlled, shareable across a team |
| Loading model | Fully loaded every time | Loaded on demand, so unused knowledge costs no context |
| Testability | Cannot be verified | Can have contract tests and regression tests |
| Scope of reuse | Personal clipboard | Enterprise → personal → project deployment layers |
| Iteration model | Improved from memory | Reviewed, committed, and validated like code |
2. A Basic Introduction to Skills¶
2.1 What a Skill Looks Like¶
In its simplest form, a skill only needs one file: SKILL.md.
SKILL.md has two parts:
---
name: my-skill
description: Describe what the skill does and when it should trigger. Claude uses this text to decide whether to auto-load it.
---
# My Skill
This is the instruction body Claude follows when the skill runs.
- YAML frontmatter (between
---) tells Claude when to use the skill - Markdown body tells Claude how to do the work
2.2 Two Classification Dimensions¶
By content type:
| Type | Purpose | Typical Scenarios | Invocation Control |
|---|---|---|---|
| Knowledge skill | Transfer domain knowledge, coding standards, and review criteria | Code review, security checks, performance guidance | Default: Claude decides whether to load it |
| Task skill | Define concrete steps for workflows with side effects | Commit code, create PRs, deploy releases | Recommended: disable-model-invocation: true |
By use domain (from Anthropic's official guide):
| Category | Purpose | Typical Skills | Key Technique |
|---|---|---|---|
| Docs & asset creation | Produce consistent, high-quality docs, decks, designs, or code | frontend-design, docx, pptx, xlsx | Embedded style guides, template structures, quality checklists |
| Workflow automation | Multi-step processes that need a consistent method | skill-creator, git-commit | Step-by-step workflows, validation gates, iterative refinement loops |
| MCP enhancement | Add workflow guidance on top of MCP tool access | sentry-code-review | Coordinating multiple MCP calls, embedding domain expertise, handling errors |
These two dimensions are orthogonal and complementary: knowledge/task describes the nature of the content, while the three domains describe the kind of problem it solves.
2.3 How a Skill Is Invoked¶
- User-triggered: type
/skill-namein Claude Code - Auto-loaded by Claude: Claude checks whether the current task matches the
descriptionin frontmatter - Arguments: supports
$ARGUMENTS(all arguments),$0, and$1(positional arguments)
2.4 Frontmatter Field Reference¶
Required fields:
| Field | Purpose | Example |
|---|---|---|
name | Display name and / command name. Must be kebab-case. No spaces, uppercase letters, or underscores | go-code-reviewer |
description | The most important field because Claude uses it for auto-triggering. Maximum 1024 characters. XML tags (< >) are not allowed | Review Go code with a defect-first approach... |
Optional fields:
| Field | Purpose | Example |
|---|---|---|
disable-model-invocation | Set to true to prevent Claude from auto-triggering the skill | Useful for side-effecting operations such as /commit or /deploy |
allowed-tools | Limits which tools can be used while the skill is active | Read, Grep, Glob, Bash |
context | Set to fork to run in an isolated sub-agent | Useful for research-style tasks that need independent context |
license | Open-source license | MIT, Apache-2.0 |
compatibility | Environment requirements (1-500 chars) | Requires Node.js 18+, network access for API calls |
metadata | Custom key-value pairs such as author, version, or MCP dependency | author: John, version: 1.0.0, mcp-server: linear |
Security restrictions: - XML angle brackets (< >) are forbidden in description because frontmatter is injected into the system prompt, and malicious content could turn into instructions - Skill names cannot include claude or anthropic (reserved prefixes) - The skill folder name must match name and use kebab-case
3. Deployment Locations and Use Cases¶
Skills support four deployment levels. Higher-priority levels override lower-priority ones:
| Priority | Level | Path | Best Use |
|---|---|---|---|
| 1 (highest) | Enterprise | Distributed through managed settings | Organization-wide security review standards, compliance checks |
| 2 | Personal | ~/.claude/skills/<name>/SKILL.md | Personal writing style, common workflows |
| 3 | Project | .claude/skills/<name>/SKILL.md | Project-specific coding standards, CI setup |
| 4 (lowest) | Plugin | <plugin>/skills/<name>/SKILL.md | General capabilities reused across projects |
How to choose:
- Security review, coding standards → enterprise deployment so everyone follows the same rules
- Personal productivity tools (such as
tech-doc-writer,google-search) → personal deployment - Project-specific workflows (such as a particular CI workflow or PR template) → project deployment committed to Git
- Shared general-purpose tools for a team → plugin distribution
3.1 When You Do Not Need a Skill¶
Not all knowledge should be packaged as a skill. In the following cases, another mechanism is a better fit:
| Scenario | Use This Instead of a Skill | Why |
|---|---|---|
| Fewer than 50 lines and needed in every session | CLAUDE.md | Full loading is simpler, and on-demand loading adds little value |
Rules only apply to certain file types (for example, .proto coding standards) | .claude/rules/ with paths | Rules can match files precisely with globs, unlike skill triggering via description |
| A step must run 100% of the time and cannot be "forgotten" by AI (for example, run lint before commit) | Hook | Hooks run deterministically with zero context cost; skills are prompts and may be skipped |
| You only need external API access (for example, query GitHub issues or send email) | MCP server | MCP provides tools; skills provide knowledge and workflow. Do not reimplement API logic in a skill |
| The instruction is truly one-off and will not be reused | Say it directly in the conversation | A temporary instruction is not worth turning into a permanent module |
Rule of thumb: if a piece of knowledge (1) is reused repeatedly, (2) is longer than 50 lines, and (3) is not needed in every session, then it is a good candidate for a skill. If all three conditions are not met, prefer a lighter-weight mechanism.
3.2 Build Your First Skill: Iterate First, Extract Later¶
Use "check Go code formatting" as an example to show the full path from a normal conversation to a reusable skill. The process has two stages: first, manual iteration to produce a working draft; then, hardening with skill-creator to bring it to production quality.
Step 1: Solve the same task repeatedly in normal chat
Do not rush to write SKILL.md. Start by simply asking in Claude Code:
> Help me check whether the current project has any improperly formatted Go files, and fix them if it does
Claude will run gofmt -l ., find problem files, and fix them. But you may notice the behavior is not ideal. For example, it may edit files under vendor/, fail to prefer goimports, or skip a second verification pass.
So you keep correcting it in chat:
> Do not touch the vendor directory. If goimports is available, prefer it. After fixing, run it again to verify
Repeat this two or three times until the workflow feels right. At that point, you already have a validated process in your head.
Step 2: Extract the successful method into SKILL.md
Now create the skill. You are no longer inventing instructions from scratch. You are turning a proven prompt into a reusable artifact:
Write this to ~/.claude/skills/fmt-check/SKILL.md:
---
name: fmt-check
description: >
Check and fix Go code formatting issues. Triggers when the user asks
to format code, check formatting, or fix style issues in Go files.
---
# Format Check
## Workflow
1. Run `gofmt -l .` to list files with formatting issues.
2. If no files found, report "All files properly formatted."
3. If files found, run `gofmt -w <file>` for each file.
4. Run `gofmt -l .` again to verify all issues are fixed.
5. Report which files were modified.
## Rules
- Never modify files outside the current Go module.
- If `goimports` is available, prefer it over `gofmt` (it also handles imports).
Every rule in this SKILL.md comes from the real corrections in Step 1. "Do not touch vendor" becomes a boundary rule. "Prefer goimports" becomes a tool-selection rule. "Verify after fixing" becomes Step 4 in the workflow.
Step 3: Harden with skill-creator
A manually written SKILL.md works, but it has blind spots:
- The description may not trigger for all valid phrasings (e.g., a user says "check go formatting" instead of "format code")
- Two or three rounds of conversation corrections cover only a limited set of edge cases
- There is no objective data showing whether the skill actually improves AI output
Anthropic's official skill-creator automates the hardest parts of this process — the parts that manual creation most easily skips:
- Interview-driven gap analysis — systematically asks about trigger scenarios, expected output format, edge cases, and test cases, reducing omissions caused by limited experience
- Automatic eval generation and execution — creates test scenarios and runs with-skill vs without-skill comparisons in parallel, replacing guesswork with data
- Description optimization loop — generates 20 should-trigger and should-not-trigger queries, then runs an optimization loop with train/test split. This is the hardest part to do manually and has the biggest impact on whether the skill actually gets used
- Visual eval viewer — a browser-based UI for reviewing outputs and benchmark comparisons, closing the feedback loop
Continuing with the fmt-check example, invoke it directly in Claude Code:
skill-creator might discover that the description misses the common phrasing "check go formatting" (causing trigger failures), that multi-module monorepos are an unhandled edge case, and that "format my Python code" incorrectly triggers this skill. These issues are very hard to catch through manual creation alone.
For the full three-dimensional evaluation methodology (trigger accuracy, task performance, token cost-effectiveness) and real case studies, see Chapter 10.
Step 4: Use it and keep iterating
In Claude Code, you can now: - Type /fmt-check to invoke it directly - Or simply say "help me check the code formatting", and Claude will auto-load the skill based on the description
After using it a few times, you may discover new improvements: - Need support for goimports-reviser → add it to the Rules section - Need the same workflow in CI → move the skill from ~/.claude/skills/ to the project's .claude/skills/ and commit it to Git - After significant changes, re-run skill-creator evaluation to verify the changes did not introduce regressions
When is manual creation enough?
- Personal use, simple workflow, low stakes — Steps 1-2 are sufficient
- Shared with a team, broad trigger surface, complex edge cases — run it through skill-creator. Even if you choose to create manually, at minimum do two things: run a quick eval with 2-3 test cases (exposes hidden issues) and run one round of description optimization (small effort, high impact). An under-triggering skill is a dead skill
This is the core creation path for a skill: iterate in conversation → extract into a skill → harden with skill-creator → keep improving through real use. The full lifecycle also includes quantitative evaluation (Chapter 10) and workflow integration (Chapter 12): build → evaluate → improve → integrate → monitor. The later chapters cover best practices for each step.
3.3 Distribution Channels: From Local to API¶
Beyond the four local deployment levels above (§3), skills can also be distributed more broadly:
| Channel | Best Use | Notes |
|---|---|---|
| Upload to Claude.ai | Individual users | Go to Settings > Capabilities > Skills and upload a zip file |
| Claude Code directory | Developers | Put the skill under ~/.claude/skills/ or .claude/skills/ |
| Organization-wide deployment | Enterprise teams | Centrally distributed by admins, with auto-update and centralized management (launched in Dec 2025) |
| Skills API | Programmatic integration | Use the /v1/skills endpoint and inject via the container.skills parameter; supported by the Claude Agent SDK |
| GitHub hosting | Community sharing | Public repository + README (note: do not put README.md inside the skill folder; keep it at repo root) |
API vs interactive use: use Claude.ai or Claude Code for day-to-day development and manual testing; use the API for production deployment, automation pipelines, or agent systems. The Skills API requires the Code Execution Tool beta.
Skills follow the Agent Skills Open Standard, which gives them cross-platform portability by default. The same skill can run on Claude.ai, Claude Code, and the API without modification.
4. Advanced Structure: Wrapper Scripts and Supporting Docs¶
Once a skill becomes too complex for a single file, you need a richer directory structure. Using the go-ci-workflow skill (rated 9.5/10) as an example:
go-ci-workflow/
├── SKILL.md # Entry point: 236-line operating framework
├── agents/
│ └── openai.yaml # UI metadata
├── scripts/
│ ├── discover_ci_needs.sh # Repo shape discovery script
│ ├── run_regression.sh # Regression test runner
│ └── tests/
│ ├── COVERAGE.md # Test coverage matrix
│ ├── test_skill_contract.py # 44 contract tests
│ ├── test_golden_scenarios.py # 17 golden-scenario tests
│ └── golden/ # 8 golden scenario JSON files
│ ├── 001_single_module_service.json
│ ├── ...
│ └── 008_service_containers_integration.json
└── references/
├── workflow-quality-guide.md # 16-section CI pattern guide
├── golden-examples.md # 4 complete workflow YAML examples
├── github-actions-advanced-patterns.md # 9 sections of advanced patterns
├── repository-shapes.md # Modeling 6 repository shapes
├── pr-checklist.md # PR review checklist
└── fallback-and-scaffolding.md # Degradation strategy
4.1 What Each Directory Is For¶
| Directory | Purpose | When It Is Loaded |
|---|---|---|
SKILL.md | Operating framework and decision flow | Loaded when the skill triggers |
references/ | Detailed domain knowledge, split by topic | Loaded on demand; irrelevant files stay unloaded |
scripts/ | Deterministic logic such as discovery or validation scripts | Called during execution, without loading into context |
assets/ | Output templates, images, and other resources | Used for output generation, not loaded into context |
4.2 Why Wrapping Logic in Scripts Matters¶
Put deterministic logic into scripts instead of prompt text for three reasons:
- Lower token usage: the output of a script is usually much shorter than the script code itself
- Determinism: the same input always gives the same output, without depending on LLM reasoning
- Testability: scripts can be tested independently in CI
For example, discover_ci_needs.sh scans a repository and outputs structured TSV data. Claude makes decisions based on that deterministic output instead of guessing the repo structure.
5. Progressive Disclosure: An Elegant Answer to AI Context Limits¶
5.1 Core Idea¶
Progressive disclosure is the most important design pattern in high-quality skills. It splits knowledge into three layers and loads them one step at a time, only when needed:
┌─────────────────────────────────────────┐
│ L1: Metadata (name + description) │ ← Always in context (~50 words)
├─────────────────────────────────────────┤
│ L2: SKILL.md body │ ← Loaded when the skill triggers (<500 lines)
├─────────────────────────────────────────┤
│ L3: references/ + scripts/ │ ← Loaded on demand (no hard limit)
└─────────────────────────────────────────┘
Key constraint: keep the main body of SKILL.md under 500 lines. Anything beyond that should be split into references/.
5.2 Selective Loading Table¶
High-quality skills do not just list reference files. They explain when each file should be loaded:
## Load References Selectively
- `references/workflow-quality-guide.md`
Baseline job templates and Go/GitHub Actions patterns.
- `references/repository-shapes.md`
Use for monorepo, multi-module, library decisions.
- `references/github-actions-advanced-patterns.md`
Use for permissions, fork PR security, service containers.
This means a single-module service only loads workflow-quality-guide.md, while a monorepo also loads repository-shapes.md. Each conversation only loads the knowledge it actually needs.
5.3 Add a Table of Contents to Long Files¶
Any reference file longer than 100 lines should have a table of contents at the top, so Claude can quickly jump to the relevant section:
# Go CI Workflow Quality Guide
## Table of Contents
1. [Job Set](#1-job-set)
2. [Trigger Strategy](#2-trigger-strategy)
...
16. [Validation Checklist](#16-validation-checklist)