create-pr Skill Design Rationale¶
create-pr is a delivery framework for opening pull requests to the main branch. Its core idea is: before a change enters team review, branch convention and sync checks, quality evidence, security risk, compatibility notes, and delivery state should already be made explicit. That is why the skill turns authentication checks, branch sync, risk classification, quality verification, security scanning, documentation checks, commit hygiene, and PR content into a fixed process.
1. Definition¶
create-pr is a structured delivery skill for opening PRs to the main branch. It requires a series of mandatory Gates to run before PR creation, uses a standardized PR title and PR body template to organize the result, and only then decides whether the PR should be draft or ready. Its output includes a PR URL together with a delivery report containing Gate outcomes, uncovered risks, PR metadata, and next actions.
2. Background and Problems¶
The skill addresses a common PR-delivery problem: many PRs enter team review with major uncertainty already embedded in them, but that uncertainty is never written down explicitly.
Without a clear process, problems usually cluster into six categories:
| Problem | Typical consequence |
|---|---|
| Branch convention and sync checks are missing | PR opened from main, branch is behind, working tree is dirty, or conflict markers remain |
| Quality evidence is incomplete | Tests, lint, or build were never run, or were run without leaving visible evidence in the PR |
| Security checks are skipped | Secrets, credentials, vulnerable dependencies, or high-confidence security findings enter review unnoticed |
| Documentation and compatibility notes are missing | Reviewers cannot tell whether behavior changed, whether the change is breaking, or how to roll it back |
| Draft / ready is decided by feel | A PR with large verification gaps is still marked ready |
| PR content is poorly organized | Title is inconsistent, body misses critical sections, and risk/evidence are scattered |
The design goal of create-pr is to move these "open it first, explain later" failures forward into explicit Gates, each with a checkable conclusion.
3. Comparison with Common Alternatives¶
Before looking at the details, it helps to compare the skill with a few common alternatives:
| Dimension | create-pr skill | Manually using gh pr create | Relying on CI / reviewers to catch everything |
|---|---|---|---|
| Repository preflight before PR creation | Strong | Weak | Weak |
| Branch convention and sync checks | Strong | Weak | Weak |
| Explicit quality evidence | Strong | Medium | Weak |
| Security scanning | Strong | Weak | Medium |
| Draft / ready decision quality | Strong | Weak | Weak |
| PR body structure | Strong | Weak | Weak |
| Risk and rollback communication | Strong | Medium | Weak |
| Reviewer friendliness | Strong | Medium | Weak |
The skill does not replace CI, code review, or GitHub itself. It standardizes what should already be true before a PR enters review.
4. Core Design Rationale¶
4.1 It Uses a Fixed Process vs. Free-Form Execution¶
create-pr fixes the overall workflow into thirteen steps and breaks the most important parts into Gates A through H. This is necessary because PR creation is a step that both humans and models tend to underestimate.
It is easy to think: "the code is already done; opening the PR is just the last small step." But the quality of team review often depends on whether this "last step" was handled properly:
- Is the branch actually synced to the latest
origin/main? - Were the checks really run?
- Is security clean, or was it never checked?
- Is this a low-risk patch or a high-risk change touching auth, migrations, concurrency, or public APIs?
- Should this PR really be
ready, or should it still bedraft?
The fixed process turns these easily skipped judgments into a delivery chain that is clear, reviewable, and much less dependent on ad hoc improvisation.
4.2 Authentication and Repository Preflight Come First¶
Gate A checks GitHub authentication, remote setup, default branch information, permissions, and branch protection status.
The point is not technical complexity. The point is to avoid pushing the rest of the workflow forward when the basic operating assumptions are wrong. For example:
ghis not logged in, but the author is already drafting a PR body.- The remote does not actually use
main, so every later comparison target is wrong. - The current account does not have permission, so PR creation cannot succeed.
- Branch protection status cannot be queried and needs to be recorded as an uncovered gap.
These are not code problems, but if the premise is wrong, the rest of the Gates may no longer mean anything. That is why this step is placed first and treated as a blocker.
4.3 Branch Convention and Sync Checks Get Their Own Gate¶
Gate B is dedicated to branch health: the PR must not come from main, the working tree must be clean, conflict markers must be absent, the branch must include the latest origin/main, and branch naming should be checked for convention.
This is an important layer in create-pr, because many PR problems are not implementation defects at all. The branch itself is not in a reviewable state. Examples include:
- Opening a PR from a dirty working tree, so local-only edits are mixed with committed changes.
- Letting the branch drift far behind main, so reviewers see a stale integration context.
- Using a branch name that no longer reflects the actual change, making later tracking harder.
- Leaving conflict markers in files for CI or reviewers to discover later.
Moving these checks into Gate B means the skill first verifies whether the branch is fit to become a review object.
4.4 It Classifies Change Risk vs. Only Listing the Diff¶
Gate C does more than list changed files. It identifies high-risk areas and classifies the PR by total changed lines:
≤ 400lines: normal401–800lines: warning; consider splitting> 800lines: strong warning unless the change is inherently atomic
This solves a very practical review problem: not every PR deserves the same review expectation.
Two PRs may both be "a pull request," but they are not equal:
- A small patch may mainly need correctness and regression review.
- A small but high-risk change touching auth, payment, migration, concurrency, public APIs, or infra config needs much clearer risk and rollback notes.
- A huge PR may still be logically correct, but review quality itself becomes weaker as change size grows.
So create-pr does not treat the diff as a display artifact. It treats the diff as an input that determines how much explanation and caution the PR should carry.
4.5 Quality Evidence Must Be Explicitly Recorded¶
Gate D requires project-standard checks to run first, with fallback to language defaults, and it requires the exact commands and pass/fail results to be recorded.
The design principle here is straightforward: if something was not run, the PR must not read as though it was verified.
This solves two common failures:
- The PR says "tested," but no one knows what was actually run.
- Commands failed in a terminal, but the PR body contains no visible trace of that fact.
The skill is not asking only for checks to run. It is asking for check results to be converted into evidence a reviewer can directly consume.
4.6 Security Checking Is Its Own Gate and Multi-Tool Validation Matters¶
Gate E groups filename risk scans, content risk scans, and Go-specific tools such as gosec and govulncheck.
This is one of create-pr's strongest design choices. A PR is the last point before risky code enters shared team attention. If a high-confidence security issue is not stopped here, then by the time a reviewer catches it, team time has already been wasted.
Scenario 3 in the evaluation report makes this concrete: with-skill did not merely notice a hardcoded ghp_ token. It built a stronger evidence chain through regex scanning and tooling, and explicitly advised against pushing or creating the PR. Without the skill, the issue could still be recognized, but the result looked more like a normal review comment than a formal blocking security conclusion.
This design improves:
- discovery of high-risk issues,
- clarity around whether the issue is confirmed or merely suspicious,
- the evidence basis for draft / ready decisions.
4.7 Documentation and Compatibility Must Be Clear Before Review Starts¶
Gate F checks whether docs, README, or changelog updates are needed, and requires compatibility and breaking-change implications to be made explicit.
This rule corrects a common misunderstanding: many teams treat documentation updates and compatibility notes as something to clean up after merge, but reviewers often need that information during review:
- Does this change alter external behavior?
- Is there migration cost?
- If rollout fails, how do we roll back?
- If the change is breaking, what exactly is affected?
Without this information, reviewers see only a code diff, not a complete delivery unit. Gate F makes clear that a PR is not just code for review. It is also a change explanation for review.
4.8 Commit Hygiene and PR Title Are Checked Together¶
Gate G checks both the commit set and the PR title, and requires both to follow Conventional Commits. It especially enforces:
<type>(<scope>): <subject>formatting,- subject length of at most 50 characters,
- imperative mood, no trailing period.
This has a very practical reason: in many teams, squash merge turns the PR title into the final commit message on main. If the PR title is left uncontrolled, the team can care deeply about commit quality beforehand and still end up with poor main-branch history.
So the skill effectively extends commit hygiene from the commit level to the PR level. It prevents situations such as:
- commits are acceptable, but the PR title is vague or sloppy,
- unrelated commits are mixed into the range,
- reviewers have to infer what the PR should have been called.
4.9 It Introduces a confirmed / likely / suspected Confidence Model¶
One of the most important parts of create-pr is that it does not jump directly to a binary draft vs ready. It first assigns one of three confidence levels:
confirmedlikelysuspected
and then uses that intermediate layer to decide the PR state.
This matters because real PR states are not always binary:
- Some PRs have all major Gates executed and passed. They can confidently be marked ready.
- Some PRs are almost ready but still have one non-blocking gap.
- Some PRs have multiple unverified Gates or unresolved high-risk issues and clearly should stay draft.
Without a confidence layer, the process easily collapses into subjective judgment: "it probably looks fine; let us mark it ready." With confidence, draft / ready becomes an evidence-based conclusion instead of a mood.
4.10 The PR Body Structure Is Fixed¶
create-pr requires the PR body to contain at least eight sections:
- Problem/Context
- What Changed
- Why This Approach
- Risk and Rollback Plan
- Test Evidence
- Security Notes
- Breaking Changes / Migration Notes
- Reviewer Checklist
This is not about making PR descriptions longer. It is about making information predictable. Reviewers should be able to find a known type of information in a known location.
The design problem it solves is not "can someone write a PR description?" It is "does the PR description support a high-quality review?" It groups the exact categories reviewers care about:
- the background,
- what changed,
- why it changed that way,
- where the risk is,
- what was tested,
- whether security concerns exist,
- whether the change is breaking,
- what reviewers should pay special attention to.
The evaluation report shows that this is one of the largest gaps between with-skill and without-skill output. Without the skill, descriptions may still be readable, but they often omit risk, rollback, security, or compatibility sections that heavily affect review quality.
4.11 Post-Create Verification Is Still Worth Keeping¶
Gate H verifies after the PR is created that:
- the base is really
main, - the head is really the current branch,
- the title and body rendered correctly,
- the draft / ready state matches the actual Gate results.
Many workflows stop once gh pr create succeeds. create-pr prefers one more verification pass, because successful creation is not the same as correct creation:
- the base branch may be wrong,
- the title or body may render incorrectly because of escaping or formatting,
- a PR that should be draft may end up ready.
It is worth noting that Gate H is softer than Gates A-G: in the quick reference it is marked informational, and gh pr checks is explicitly optional and non-blocking. Its purpose is to give the delivery a final consistency check between the intended result and the actual result, not to introduce a whole new class of hard blockers after PR creation.
4.12 Required Uncovered Risk List¶
The skill does not allow unverifiable areas to pass silently. It requires them to be listed explicitly in Uncovered Risk List.
This is essential because real projects are not always in an ideal state:
- a tool may not be installed,
- branch protection status may be unavailable,
- a test environment may not be runnable at the moment,
- an equivalent upstream control may exist, but local direct evidence may still be missing.
Without Uncovered Risk List, these gaps either disappear or get buried in free-form prose. Once they are explicit, the team can actually discuss:
- whether the risk is acceptable,
- who owns the follow-up,
- why the PR should or should not be marked ready before the gap is closed.
5. Problems This Design Addresses¶
Cross-referencing SKILL.md and the evaluation report, the skill mainly addresses the following engineering problems:
| Problem | Corresponding design | Practical effect |
|---|---|---|
| PR creation starts from false assumptions | Gate A authentication and repository preflight | Prevents continuing from the wrong repo state or permission context |
| Branch state is not reviewable | Gate B branch convention and sync checks | Prevents dirty trees, stale branches, and unresolved conflicts from entering review |
| Review burden does not match change size | Gate C risk classification and line-count tiers | Helps the team decide whether splitting or extra explanation is needed |
| Quality verification has no visible evidence | Gate D quality evidence recording | Reviewers can directly see what was run and what passed or failed |
| High-confidence security issues enter review | Gate E multi-tool security scanning | Stops risky changes before they consume reviewer time |
| Documentation and compatibility context is missing | Gate F docs and compatibility checks | Reviewers can understand external impact and rollback expectations |
| Commit history and squash result drift apart | Gate G commit hygiene + PR title rules | Keeps main-branch history aligned with PR intent |
| Draft / ready is decided by intuition | Confidence model + Gate-driven decision | Makes PR state evidence-based rather than subjective |
| The created PR can still be wrong after success | Gate H post-create verification | Ensures the created result matches the intended one |
| Verification gaps disappear silently | Uncovered Risk List | Makes residual risk visible, assignable, and trackable |
The evaluation data supports this clearly: create-pr passed all 34 assertions across 3 scenarios, while without-skill strict pass rate was only 29%; even on the 12 substantive checks that do not depend on flow-structure assertions, with-skill was 100% and without-skill only 58%. So the value is not just "better organization." It is a major increase in completeness and reliability at the PR creation stage.
6. Key Highlights¶
6.1 It Turns "Opening a PR" into a Delivery Process¶
This is the skill's central strength. It does not treat PR creation as a final click. It treats it as a formal delivery action before team review begins.
6.2 It Uses Gates to Lock Down the Steps Most Commonly Skipped¶
Many baseline behaviors can perform some of these checks part of the time. The advantage of create-pr is not that any single rule is magical. It is that auth, branch convention and sync checks, quality evidence, security, documentation, commit hygiene, and post-create verification are connected into one reliable chain.
6.3 The Confidence Model Reduces State Misclassification¶
The confirmed / likely / suspected layer is especially valuable. It prevents draft / ready from collapsing into a rough binary switch and instead turns state into a conclusion with an evidence gradient.
6.4 It Prepares the Information Reviewers Actually Need¶
The fixed PR body and output contract both serve the same purpose: reducing reviewer comprehension cost. Reviewers do not have to hunt through prose to answer:
- why this change exists,
- what changed,
- where the risk is,
- what was tested,
- what remains uncovered.
The structure puts those answers in predictable places.
6.5 It Adds the Most Value Where the Baseline Is Weakest¶
The evaluation report gives create-pr the largest pass-rate delta of the evaluated skills (+71 pp). That is a strong signal that PR creation is not a naturally strong area for the baseline model and benefits heavily from a dedicated structured workflow.
7. When to Use It — and When Not To¶
| Scenario | Suitable | Reason |
|---|---|---|
Opening a formal PR to main | Yes | This is the core use case |
| Changes touching security, migrations, concurrency, public APIs, or infra config | Especially yes | Risk classification, Gate flow, and body structure materially improve delivery quality |
| Team uses squash merge | Especially yes | PR title quality directly affects the final main-branch commit |
| You only want a quick incomplete draft link and do not care about evidence or explanation | Not really | That runs against the skill's design goal |
| The task is writing code, fixing a bug, or reviewing code | No | Those belong to other skills or workflows |
8. Conclusion¶
The real strength of create-pr is not that it can call gh pr create. It is that it systematizes the decisions around PR creation that are easiest to skip or handle by feel. Through its Gates, it turns repository assumptions, branch convention and sync checks, quality evidence, security checks, compatibility notes, commit discipline, PR structure, and final state into a checkable delivery loop.
From a design perspective, the skill is a clear example of production-grade delivery principles: confirm the operating assumptions first, then organize the review material; make the risks explicit before asking others to review; state what is verified and what is still uncovered before deciding whether the PR is ready. That is why it creates such a large quality gain in a workflow area where the baseline model is especially weak.
9. Document Maintenance¶
This document should be updated when:
- the Gate flow, confidence model, non-negotiables, or output format in
skills/create-pr/SKILL.mdchanges, - the PR template, checklists, or configuration examples in
skills/create-pr/references/*.mdchange, - key data in
evaluate/create-pr-skill-eval-report.mdthat supports the claims here changes, - the project's conventions around PRs to main, PR title rules, breaking-change disclosure, or rollout / rollback expectations change.
Review quarterly; review immediately if the create-pr skill undergoes substantial refactoring.
10. Further Reading¶
skills/create-pr/SKILL.mdskills/create-pr/references/pr-body-template.mdskills/create-pr/references/create-pr-checklists.mdskills/create-pr/references/create-pr-config.example.yamlskills/create-pr/references/merge-strategy-guide.mdskills/create-pr/references/bundled-script-guide.mdevaluate/create-pr-skill-eval-report.md