api-integration-test Skill Design Rationale¶
api-integration-test is a safety-oriented execution framework for internal API integration tests. Its core idea is: integration testing should first establish whether the test type is correct, whether the available configuration is sufficient, whether the runtime path is safe, and whether the resulting test can be understood, executed, and audited by the team over time. That is why the skill turns scope validation, Go-version adaptation, configuration completeness, execution mode, safety gates, build-tag isolation, and structured reporting into a fixed workflow.
1. Definition¶
api-integration-test is a skill for gated integration tests against internal Go HTTP / gRPC APIs. It is designed to create, maintain, and run integration tests with real configuration, focusing on contract verification, failure diagnosis, and safe controlled execution. Its output is not just test code. It also includes execution mode, degradation level, required environment variables, exact run commands, and a structured result report.
2. Background and Problems¶
The skill is not solving "people do not know how to write Go tests." It is solving "many so-called integration tests are not reliable in scope, safety, executability, or long-term maintainability."
Without a clear framework, problems usually fall into six categories:
| Problem | Typical consequence |
|---|---|
| Wrong scope classification | Third-party API tests, unit tests, and internal API integration tests get mixed together and the whole approach is wrong |
| Incomplete configuration is still written as if runnable | Tests contain guessed hosts, credentials, or IDs that only look executable |
| Integration tests enter normal `go test ./...` by default | Missing build tags increase CI compile time or even fail in environments without the right dependencies |
| Production protection is too weak | Mis-set environment variables cause tests to hit production services |
| Timeout, retry, and failure-path rules are too loose | Tests hang, retry forever, or only prove success but never prove that failure modes behave correctly |
| Output is not a real deliverable | A few code snippets exist, but no one knows the prerequisites, degradation reason, missing inputs, or exact reproduction command |
The design goal of api-integration-test is to decide first whether the task should be done, to what degree it can be done, and whether doing it is safe, before deciding what test code should be generated.
3. Comparison with Common Alternatives¶
Before looking at the details, it helps to compare the skill with a few common alternatives:
| Dimension | api-integration-test skill | Asking a model to "write an integration test" | Treating integration tests like ordinary unit tests |
|---|---|---|---|
| Scope recognition | Strong | Weak | Weak |
| Configuration completeness checks | Strong | Weak | Weak |
| Production safety protection | Strong | Weak | Weak |
| Build-tag isolation | Strong | Weak | Weak |
| Real-config executability | Strong | Medium | Weak |
| Auditability of output | Strong | Weak | Weak |
| Timeout / retry discipline | Strong | Medium | Weak |
| CI integration friendliness | Strong | Medium | Weak |
The skill does not replace general testing ability. It turns "real integration tests against internal APIs" from a vague request into a controlled workflow that can be safely adopted.
4. Core Design Rationale¶
4.1 Scope Validation Comes Before Test Authoring¶
The first Mandatory Gate is Scope Validation. It requires the task to be classified before any test code is written.
This matters because many testing requests sound similar on the surface but require fundamentally different handling:
- Internal HTTP / gRPC API integration tests: in scope for this skill
- Pure unit tests: should be redirected to `unit-test`
- Third-party vendor API tests: should be redirected to `thirdparty-api-integration-test`
- Browser end-to-end journeys: not part of this workflow at all
If this step is skipped, every later best practice may be applied to the wrong problem. Eval 2 in the evaluation report shows why this matters: with-skill correctly identified the GitHub API as a third-party case, but the current hard-stop wording was still not strong enough to fully prevent test generation. That edge case makes the role of the Scope Gate especially visible.
4.2 Go Version Is Its Own Gate¶
The second Mandatory Gate is the Go Version Gate. It requires reading `go.mod` and adapting the test implementation to the actual Go version.
This gate is not about showing off modern Go features. It is about making sure the generated test style really fits the project. For example:
- `t.Setenv` requires Go 1.17+
- before Go 1.22, range-variable closure capture still needs explicit care
- before Go 1.24, mixing `t.Parallel()` and `t.Setenv()` in the same subtest is unsafe
Without this step, the generated code may look idiomatic in isolation but still be mismatched to the target repository or subtly unsafe in test behavior.
4.3 Configuration Completeness Matters More Than "Just Write Something"¶
The third Mandatory Gate is Configuration Completeness. It splits the outcome into three levels:
`Full`, `Scaffold`, and `Blocked`.
This is one of the most important design choices in the skill, because one of the biggest integration-test failures is writing something that looks runnable when the necessary config is not actually known.
The skill refuses to do that:
- if configuration is complete, generate full tests;
- if some inputs are missing, generate a scaffold with `t.Skip(...)` and `// TODO:` markers;
- if critical inputs are unknown, stop and output only the variable checklist and setup guidance.
This has three practical benefits:
- it avoids guessed endpoints, credentials, or test identifiers,
- it makes missing prerequisites explicit,
- it keeps the generated artifact aligned with its true execution status.
The skill would rather produce an honest scaffold than a fake-complete test that cannot really run.
4.4 Execution Mode Is Selected Up Front¶
The fourth Mandatory Gate is Execution Mode. It automatically classifies the task into Smoke, Standard, or Comprehensive.
This aligns test depth with scenario risk:
| Mode | Typical use | Design purpose |
|---|---|---|
| Smoke | Connectivity checks, first-time environment validation, one read-only endpoint | Answer "does this path work at all?" |
| Standard | Most normal internal API integration tests | Balance success-path, failure-path, and business-level assertions |
| Comprehensive | Broad coverage, migration validation, high-risk APIs | Add concurrency, large payloads, pagination, rate limiting, and fuller path coverage |
Without mode selection, the workflow usually distorts in one of two ways:
- simple checks become too heavy to use,
- risky APIs are reduced to a shallow "200 OK" smoke check.
4.5 Production Safety Must Be a Hard Gate¶
The fifth Mandatory Gate is Production Safety. It requires:
- `ENV=prod`/`ENV=production` to default to `t.Skip`,
- explicit opt-in through `INTEGRATION_ALLOW_PROD=1`,
- an additional explicit gate such as `INTEGRATION_ALLOW_DESTRUCTIVE=1` for destructive operations.
This is the strongest safety layer in the skill. The evaluation report also identifies it as a skill-only capability: with-skill was correct in all three scenarios, while without-skill missed it in all three.
The problem it solves is direct and serious: many integration tests can become production incidents if a gate variable is accidentally enabled. Without this layer, a "test" may become a real write path into production.
And the protection is not a single check. It is defense in depth:
- one gate controls whether integration tests are allowed to run at all,
- another gate ensures that even when running is allowed, production is still blocked by default.
That is why the evaluation report treats it as a safety-critical dimension.
4.6 Build-Tag Isolation Is Not a Minor Detail¶
The skill requires every integration test file to include:
- `//go:build integration`
- `// +build integration`
These two lines may look like a small file-header detail, but they are a major design feature.
The evaluation report shows that without-skill missed them in all three scenarios. The consequences are concrete:
- `go test ./...` still compiles the integration test files,
- CI compile time increases,
- environments without the right integration dependencies may fail before tests even reach the skip path.
So build-tag isolation is not "slightly better style." It is an explicit declaration in code that this test is not part of the default test path and must be deliberately enabled.
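For illustration, a guarded file header might look like this (the file and package names are hypothetical). Such a file is invisible to a plain `go test ./...` and only compiles when the tag is passed, e.g. `go test -tags=integration ./...`:

```go
//go:build integration
// +build integration

// orders_integration_test.go — a hypothetical integration test file.
// Both build-constraint lines above must appear before the package
// clause, separated from it by a blank line.
package orders_test
```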
4.7 The Skill Insists on Real Transport Paths vs. Mocking the Core Client¶
The skill explicitly rejects "integration tests" that mock out the client or transport being integrated against. Once the real `http.Client` or gRPC connection is replaced with a fake, the task has effectively turned back into a unit test.
The design principle here is: the value of an integration test is to validate the real transport path, real config resolution, and real protocol contract, not merely a business function's return value.
That is why the skill requires:
- a real client instance, or a real handler plus an `httptest` server,
- request construction through production code paths,
- both protocol-level and business-level assertions.
This gives the skill a clear separation from pseudo-integration tests that only imitate the shape of integration.
4.8 Timeouts and Retries Are Strictly Bounded¶
The skill is very explicit about `context.WithTimeout`, bounded retry policies, and checking `ctx.Done()`.
The goal is not "make the test more persistent." It is "make the test more stable without hiding real failures."
If retries and timeouts are left loose, common failures include:
- infinite retries that mask the actual defect,
- hanging tests where it is unclear whether the service is stuck or the test is,
- use of convenience helpers like `http.Post` / `http.Get` that do not accept a context and make cancellation behavior opaque.
Eval 3 in the evaluation report illustrates this clearly: without-skill used `http.Post` directly and lost unified timeout control; with-skill preserved consistent context-based timeouts.
4.9 The Skill Requires Both Protocol-Level and Business-Level Assertions¶
api-integration-test is not satisfied with HTTP 200 or `require.NoError` alone. It requires both:
- protocol-level contract checks: status codes, gRPC codes, required response fields;
- business-level invariants: identity mapping, state transitions, numeric constraints, and similar semantics.
This matters because one of the most common distortions in API integration testing is: "the request succeeded, therefore the test passed." But a successful request does not prove that the returned data or behavior is semantically correct.
The value of this design is that it upgrades the goal from simple connectivity to contract-and-semantics verification.
4.10 Structured Output¶
The skill explicitly requires the shared Output Contract to be loaded when reporting results, and the evaluation report shows that with-skill outputs consistently included information such as:
- Execution Mode
- Degradation Level
- Gate Variables
- Exact Commands
- Timeout / Retry Policy
- Result Summary
- Quality Scorecard
This solves a very practical problem: many teams may get test code, but still have no idea:
- which environment variables it depends on,
- whether the result is full or scaffold-only,
- whether anything actually ran,
- why something was skipped,
- what exact command should be used to reproduce it.
The purpose of structured output is to turn "some test code" into an actual testing deliverable. That is why the evaluation report treats Output Contract as a skill-only differentiator.
4.11 Degradation Exists vs. a Binary "Can Write / Cannot Write"¶
The skill does not reduce the world to "full success" or "total failure." It deliberately supports Full / Scaffold / Blocked.
This is a mature design choice because real projects often start from partial information. If the model is given only two outcomes, it tends to fall into one of two bad behaviors:
- writing a fake-complete test when key information is missing,
- stopping entirely at the first missing input and producing nothing useful.
The degradation model allows a third path: an honest, intermediate artifact that can still be completed later. In practice, that is much more valuable than a polished output built on guesses.
4.12 CI Integration Comes Last, Not First¶
The skill does not begin with GitHub Actions, Make targets, or Docker Compose. It first defines scope, safety, configuration requirements, and execution constraints, and only then talks about CI integration.
That ordering reflects a strong design choice: make sure the test itself is correctly defined before deciding how to wire it into CI.
Otherwise it is easy to invert the whole process:
- the CI step exists,
- the test appears to run,
- but the test still lacks build tags, prod safety, or explicit gate variables.
Seen this way, CI is an amplifier. The earlier safety and boundary decisions are the foundation.
5. Problems This Design Addresses¶
Cross-referencing SKILL.md and the evaluation report, the skill mainly addresses the following engineering problems:
| Problem | Corresponding design | Practical effect |
|---|---|---|
| Wrong test-type classification | Scope Validation Gate | Distinguishes internal APIs, third-party APIs, and unit-test scenarios |
| Generated test style does not fit project Go version | Go Version Gate | Avoids patterns that are invalid or unsafe for the target Go version |
| Incomplete configuration presented as runnable | Configuration Completeness Gate + degradation levels | Keeps output aligned with true executability |
| Integration tests leak into default test runs | Build-tag isolation | Prevents ordinary `go test ./...` from compiling and triggering them |
| Tests accidentally hit production | Production Safety Gate | Blocks production execution by default and requires explicit override |
| Timeout / retry behavior is uncontrolled | `context.WithTimeout` + bounded retry rules | Improves stability without masking real defects |
| Tests only prove "it responds," not "it is correct" | Protocol-level + business-level assertions | Brings tests closer to real contract verification |
| Outputs are not reusable or auditable | Output Contract | Makes prerequisites, result state, gaps, and reproduction commands explicit |
The evaluation data supports this clearly: with-skill overall pass rate was 94.7%, while without-skill was only 57.9%; for build-tag isolation, Production Safety Gate, and Output Contract, with-skill was 3/3 and without-skill was 0/3. So the value is not merely "more complete test code." It is turning integration testing into a safer, clearer, and more maintainable delivery artifact.
6. Key Highlights¶
6.1 It Asks "Should This Test Be Written?" Before "How Should It Be Written?"¶
This is the skill's most important strength. Scope validation comes first, which is intended to prevent effort from being invested in the wrong test category. The current evaluation also shows that scope recognition is already strong, even though the hard-stop semantics can still be tightened.
6.2 Its Safety Gates Are Unusually Strong¶
Production Safety Gate, destructive-operation gating, and build-tag isolation work together to make the skill much safer than the baseline in terms of preventing accidental production impact.
6.3 It Handles Incomplete Information Honestly¶
The Full / Scaffold / Blocked model is extremely practical. It avoids both extremes: stopping immediately whenever data is missing, and pretending everything is fine when it is not.
6.4 The Goal Is Not Just to Run Tests, but to Make Them Diagnosable¶
Output Contract, Exact Commands, Timeout / Retry Policy, and the Quality Scorecard all show that the skill is not merely trying to generate a test file. It is trying to generate something the team can run, skip, diagnose, and audit.
6.5 Its Biggest Advantages Appear Exactly Where the Baseline Is Weakest¶
The evaluation report shows that most of the skill's unique value appears in the places the baseline most often misses:
- build tags,
- production safety,
- scope recognition,
- structured reporting.
So the skill does not win mainly by "knowing a bit more HTTP / gRPC." It wins by systematically controlling the delivery boundary of integration testing.
7. When to Use It — and When Not To¶
| Scenario | Suitable | Reason |
|---|---|---|
| Writing integration tests against internal HTTP / gRPC APIs with real configuration | Yes | This is the core use case |
| Triage of failing internal API integration tests | Yes | Structured output and execution mode selection help with diagnosis |
| Introducing gated integration tests into CI | Yes | Build tags, environment gates, and structured reporting fit this workflow well |
| Pure unit tests | No | Redirect to `unit-test` |
| Third-party vendor API tests | No | Redirect to `thirdparty-api-integration-test` |
| Browser end-to-end flows | No | This is outside API integration-test scope |
8. Conclusion¶
The real strength of api-integration-test is not that it can write a request and an assertion. It is that it systematizes the parts of internal API integration testing that most often go wrong: validate scope first, confirm Go version and config state, choose the right execution mode, and then constrain the final artifact with production safety, build tags, timeout / retry rules, contract assertions, and structured reporting.
From a design perspective, the skill is a strong example of production-grade integration-testing principles: make sure the test type and runtime assumptions are correct before chasing coverage; set the safety boundary before allowing execution; make missing coverage explicit before calling the result "done." That is why it is especially well suited to real internal API environments rather than as a generic replacement for all testing work.
9. Document Maintenance¶
This document should be updated when:
- the Mandatory Gates, Execution Modes, Safety Rules, or Output Contract in `skills/api-integration-test/SKILL.md` change,
- the scope gate, output format, mode examples, or advanced patterns in `skills/api-integration-test/references/*.md` change,
- key data in `evaluate/api-integration-test-skill-eval-report.md` that supports the claims here changes,
- the project's conventions for internal API integration tests, production protection, CI execution, or gate environment variables change.
Review quarterly; review immediately if the api-integration-test skill undergoes substantial refactoring.
10. Further Reading¶
- `skills/api-integration-test/SKILL.md`
- `skills/api-integration-test/references/common-integration-gate.md`
- `skills/api-integration-test/references/common-output-contract.md`
- `skills/api-integration-test/references/checklists.md`
- `skills/api-integration-test/references/internal-api-patterns.md`
- `skills/api-integration-test/references/advanced-patterns.md`
- `evaluate/api-integration-test-skill-eval-report.md`