yt-dlp-downloader is a skill for generating and running yt-dlp download commands. It suits single videos, playlists, audio extraction, subtitle downloads, SponsorBlock, resolution limits, and authenticated downloads. It has three standout strengths. First, it is probe-first: it uses format lists and subtitle info to decide command combinations instead of guessing parameters. Second, it applies safe defaults (--no-playlist, retries, output naming, and a download archive) that reduce accidental full-playlist downloads, re-downloads, and runaway commands. Third, it produces structured execution reports, which make complex combined requests easier to reuse, review, and adjust.
This evaluation reviews the yt-dlp-downloader skill along two axes: actual task performance and token cost-effectiveness. Three yt-dlp command-generation scenarios of increasing complexity were designed: single video download; audio extraction + subtitles; and playlist + resolution + SponsorBlock + subtitles. Each scenario was run in both with-skill and without-skill configurations, giving 3 scenarios × 2 configs = 6 independent subagent runs, scored against 40 assertions.
| Dimension | With Skill | Without Skill | Delta |
|---|---|---|---|
| Assertion pass rate | 40/40 (100%) | 18/40 (45.0%) | +55.0 pp |
| Output Contract structured report | 3/3 correct | 0/3 | Skill-only |
| Probe decision compliance | 3/3 correct | 0/3 | Skill-only |
| Safety guard (--no-playlist) | 3/3 (including correct --yes-playlist in playlist scenario) | | |

| | | | |
|---|---|---|---|
| | | | %(playlist_index)s without zero-padding leads to wrong sort order |
| Missing --merge-output-format mp4 | 1 | 3 | Output format uncertain (may be mkv/webm) |
| Missing --write-subs with --embed-subs | 1 | 3 | --embed-subs requires subtitles to be downloaded first |
3.4 Trend: Skill Advantage Increases with Scenario Complexity
| Scenario complexity | With-Skill advantage |
|---|---|
| Eval 1 (simple single video) | +58.3% (7 failures) |
| Eval 2 (medium dual-scenario) | +46.2% (6 failures) |
| Eval 3 (complex four-scenario overlay) | +60.0% (9 failures) |
Unlike the go-makefile-writer evaluation, where "Skill advantage decreases with complexity", this skill shows its largest advantage in the most complex scenario (+60.0%), although the trend is not monotonic: the advantage dips to +46.2% in the medium scenario. The likely reason is that yt-dlp command combinations carry many implicit rules (--write-subs with --embed-subs, playlist template zero-padding, the SponsorBlock ffmpeg dependency, etc.), and the base model omits more of these details when multiple scenarios are stacked.
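The implicit rules named above can be made concrete with a small command builder. This is a minimal sketch, not the skill's actual implementation: the function name `build_args`, its parameters, and the `03d` index width are assumptions chosen for illustration, while the flags themselves are standard yt-dlp options.

```python
def build_args(url, *, playlist=False, embed_subs=False,
               sub_langs="en", remove_sponsors=False):
    """Assemble a yt-dlp argument list (hypothetical helper, illustration only)."""
    args = ["yt-dlp"]
    if playlist:
        # Explicit playlist intent; a fixed-width index keeps files in order.
        args += ["--yes-playlist",
                 "-o", "%(playlist_index)03d - %(title).200s.%(ext)s"]
    else:
        # Single-video intent; truncate long titles to 200 characters.
        args += ["--no-playlist", "-o", "%(title).200s.%(ext)s"]
    if embed_subs:
        # Pairing rule as stated in the evaluation: request subtitles, then embed.
        args += ["--write-subs", "--sub-langs", sub_langs, "--embed-subs"]
    if remove_sponsors:
        # Segment removal relies on ffmpeg; its presence is not checked here.
        args += ["--sponsorblock-remove", "sponsor"]
    # Pin the merged container so the output extension is predictable.
    args += ["--merge-output-format", "mp4", url]
    return args
```

Calling `build_args(url, playlist=True, embed_subs=True, remove_sponsors=True)` stacks all the options at once, which is the kind of combined request where the evaluation saw the base model drop details.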
| | | |
|---|---|---|
| | Full copy-paste command + table of reasons per flag | Command + brief param notes |
| 5. Execution status | "Not run in this environment" | No explicit declaration |
| 6. Output location | Expected file path pattern | Brief save location |
| 7. Next step | Ordered follow-up action list | Brief hint |
Practical value: the Output Contract enables:
- Auditable command recommendations (know why specific flags were chosen)
- Transparent Probe decisions (whether probe was skipped and why)
- Clear next steps for users (no guessing)
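One way to picture such a structured report is as a small record type. The sketch below is an assumption for illustration only: the class name `ExecutionReport`, its field names, and the rendered layout are hypothetical, and it models only the command, execution status, output location, and next-step sections that appear in this excerpt.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ExecutionReport:
    """Hypothetical container for part of an Output Contract style report."""
    command: list                      # argv-style command, ready to copy
    flag_reasons: dict                 # flag -> why it was chosen
    execution_status: str = "Not run in this environment"
    output_pattern: str = ""           # expected file path pattern
    next_steps: list = field(default_factory=list)

    def render(self):
        """Return the report as plain text, one item per line."""
        lines = ["Command: " + shlex.join(self.command)]
        lines += [f"  {flag}: {why}" for flag, why in self.flag_reasons.items()]
        lines.append("Execution status: " + self.execution_status)
        lines.append("Output location: " + self.output_pattern)
        lines += [f"Next step {i}: {step}"
                  for i, step in enumerate(self.next_steps, 1)]
        return "\n".join(lines)
```

Keeping the reason for each flag next to the command is what makes the recommendation auditable: a reviewer can challenge a single flag without re-deriving the whole command.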
| | | | |
|---|---|---|---|
| | Prevent watch URL from accidentally triggering full playlist download | Eval 1 ✅ / Eval 2 ✅ | ❌ / ❌ |
| --yes-playlist | Explicitly declare playlist intent | Eval 3 ✅ | ❌ |
| --download-archive | Prevent re-download | 3/3 ✅ | 0/3 ❌ |
| --retries/--fragment-retries | Network resilience | 3/3 ✅ | 1/3 |
| %(title).200s | Prevent long title path overflow | 3/3 ✅ | 0/3 ❌ |
Omitting --no-playlist is the highest-risk safety gap. When a YouTube watch URL includes &list=, a command without --no-playlist downloads the entire playlist instead of the single video, potentially causing tens of GB of accidental downloads. Skill Anti-Example #3 addresses this case explicitly.
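The risk described above can be detected before a command is run. The following is a minimal sketch under stated assumptions: `playlist_guard` is a hypothetical helper, not part of the skill or of yt-dlp, and it only inspects the URL's query string for a `list` parameter.

```python
from urllib.parse import parse_qs, urlparse


def playlist_guard(url, want_playlist=False):
    """Return (flag, risky) for a URL (hypothetical helper).

    flag  -- the playlist flag to pass to yt-dlp
    risky -- True when the URL carries list= but only one video is wanted,
             i.e. the case where leaving out --no-playlist would hurt
    """
    has_list = "list" in parse_qs(urlparse(url).query)
    flag = "--yes-playlist" if want_playlist else "--no-playlist"
    return flag, has_list and not want_playlist
```

For example, `playlist_guard("https://www.youtube.com/watch?v=abc&list=PL123")` returns `("--no-playlist", True)`: the flag is the same safe default either way, and the second value marks this URL as one where the default actually matters.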
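With-skill’s bv* selector is preferable to bestvideo: the * lets the selector match pre-merged formats (video and audio in one file) in addition to video-only streams, which matters because some sites offer only pre-merged formats. Without-skill’s bestvideo matches video-only streams and so misses them.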
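A selector of this kind is usually combined with an audio stream and a fallback, and optionally a resolution cap. The sketch below shows one way to build such a string; the helper name `format_selector` is an assumption for illustration, while the `bv*`, `ba`, `b`, and `[height<=N]` pieces follow yt-dlp's format selection syntax.

```python
def format_selector(max_height=None):
    """Build a yt-dlp -f selector string (hypothetical helper).

    bv*+ba : best format containing video, merged with best audio-only
    /b     : fall back to the best single pre-merged format
    """
    limit = f"[height<={max_height}]" if max_height else ""
    return f"bv*{limit}+ba/b{limit}"
```

`format_selector(1080)` yields `bv*[height<=1080]+ba/b[height<=1080]`, and `format_selector()` yields `bv*+ba/b`. Applying the same height filter to both branches keeps the fallback from exceeding the requested resolution.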
| | |
|---|---|
| | Outstanding: ~2,370 tokens contain all high-leverage rules |
| High-leverage token share | ~35% (820/2,370), directly contributes to 20/22 assertion deltas |
| Low-leverage token share | ~6% (150/2,370), no incremental contribution in this eval |
| Reference cost-effectiveness | Good: indirectly improves command completeness and technical correctness |
5.7 Comparison with Other Skills’ Cost-Effectiveness
| Metric | yt-dlp-downloader | go-makefile-writer | tdd-workflow |
|---|---|---|---|
| SKILL.md tokens | ~2,370 | ~1,960 | ~2,100 |
| Total load tokens | ~5,100–5,730 | ~4,100–4,600 | ~3,600–4,800 |
| Pass-rate gain | +55.0% | +31.0% | +46.2% |
| Tokens per 1% (SKILL.md) | ~43 tok | ~63 tok | ~45 tok |
| Tokens per 1% (full) | ~95 tok | ~149 tok | ~92 tok |
yt-dlp-downloader has the best token cost-effectiveness of the three skills on the SKILL.md basis (~43 tokens per 1%) and is roughly level with tdd-workflow on full load (~95 vs ~92 tokens per 1%). Three factors explain this:
1. The base model has a weaker grasp of yt-dlp’s implicit rules (45% baseline vs 69% for go-makefile-writer), leaving more room for improvement.
2. The skill’s high-leverage rules are compact: safe defaults, probe gate, and output contract take only ~820 tokens.
3. Conditional loading of references is well designed; simple scenarios do not load everything.
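The "Tokens per 1% (SKILL.md)" row can be reproduced from the two rows above it by dividing SKILL.md tokens by the pass-rate gain. The sketch below does this arithmetic with the approximate figures from the table; the variable and function names are illustrative.

```python
# (SKILL.md tokens, pass-rate gain in points), approximate values from the table
skills = {
    "yt-dlp-downloader": (2370, 55.0),
    "go-makefile-writer": (1960, 31.0),
    "tdd-workflow": (2100, 46.2),
}


def tokens_per_point(tokens, gain):
    """Tokens of SKILL.md spent per point of pass-rate gain."""
    return tokens / gain


costs = {name: round(tokens_per_point(*v)) for name, v in skills.items()}
cheapest = min(costs, key=costs.get)
```

With these inputs `costs` comes out as 43, 63, and 45 tokens per point respectively, matching the table, and `cheapest` is `yt-dlp-downloader`.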
The command technical correctness increment is scored lower because the without-skill core commands are technically sound: the base model has a good grasp of basic yt-dlp usage. The skill’s core value lies in safety guards, Probe discipline, and structured reports.