E2E test¶
Use this skill to create E2E coverage that is deterministic, evidence-backed, and maintainable in real repositories.
Use This Skill For¶
- selecting high-value journeys for E2E coverage
- creating or updating Playwright tests
- using Agent Browser to explore, debug, and reproduce flows
- triaging flaky tests with traces and artifacts
- designing CI gates for critical journeys
- automating browser tasks where reproducibility matters
Do not use this skill for:
- visual design review with no automated journey value
- performance/load testing
- tests that require guessed secrets, endpoints, or private accounts
Load References Selectively¶
Always read (regardless of framework):
- references/checklists.md: pre-run, coverage, flaky triage, quarantine, and result checklists.
- references/environment-and-dependency-gates.md: environment readiness for local, preview, staging, or CI.
Read only for Playwright / JS projects (skip for Go, Python, or other non-JS):
- references/playwright-patterns.md: selector, wait, assertion, config baseline, and version/platform gate rules.
- references/playwright-deep-patterns.md: auth state, fixtures, data isolation, mocking, serial/parallel, and CI engineering.
- references/anti-examples.md: common Playwright mistakes with corrected alternatives.
Read when using Agent Browser:
- references/agent-browser-workflows.md: exploration, failure reproduction, flow-to-code conversion, and command starters.
Read when shaping reports or triaging flakes:
- references/golden-examples.md: full output contract examples for Playwright and Go HTTP E2E tasks.
Run before gate decisions to collect repository facts:
- scripts/discover_e2e_needs.sh: detects Playwright, Node.js, Go, framework, existing tests, env vars, and CI platform.
Runner Strategy¶
Use both tools intentionally, not interchangeably.
- Agent Browser first:
  - journey discovery
  - repro of flaky or environment-specific UI behavior
  - fast semantic interaction and screenshot capture
- Playwright preferred for code:
  - committed E2E tests
  - CI suites
  - repeated local validation
  - multi-browser or matrix execution
If a task starts in Agent Browser and the flow is valuable long-term, convert the learned steps into Playwright coverage.
Operating Model¶
1. Classify the task:
   - new journey coverage
   - flaky triage
   - failed CI investigation
   - exploratory browser reproduction
   - test architecture or CI gate design
2. Run `scripts/discover_e2e_needs.sh` to collect repository facts (Playwright version, framework, existing tests, env vars, CI platform). Use its structured output for gate decisions instead of guessing.
3. Run the environment and configuration gate.
4. Choose the runner path:
   - Agent Browser for exploration or reproduction
   - Playwright for maintainable automated coverage
   - both when discovery should become code
   - non-JS projects: use the project's native test framework (see Runner Selection Guidance below)
5. Produce only the strongest deliverable the environment can actually support:
   - runnable test
   - guarded scaffold with explicit skips
   - triage report with repro commands
Runner Selection Guidance¶
If the project has no Node.js / Playwright (e.g., Go, Python, Rust web apps):
- Use the project's native test framework (Go `net/http`, Python `requests`/`httpx`, etc.)
- Do NOT install Playwright into a project that has no JavaScript toolchain
- Follow the project's existing E2E test conventions if they exist
- Document the runner selection rationale in the Output Contract
- All 5 mandatory gates still apply regardless of runner choice
Mandatory Gates¶
1) Configuration Gate¶
Before generating or updating runnable tests:
- scan repository config, scripts, env files, docs, and existing tests
- list required variables, accounts, feature flags, and service dependencies
- mark each as `available`, `missing`, or `unknown`
If required values are missing:
- do not invent them
- generate placeholder-only scaffolding with explicit TODOs and skip guards when code output is still useful
- otherwise stop and report the exact blockers
In every result include:
- required variable list
- example export block or config shape
- missing variables
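The classification step above can be sketched as a small pure function. This is a minimal illustration, not a prescribed implementation: the names `Requirement`, `classify`, and `configGate`, the `confirmed` flag, and the variable `E2E_FEATURE_FLAG` in the usage are hypothetical, and treating "unconfirmed and unset" as `unknown` is an assumption about how the three labels are meant to be applied.

```typescript
// Hypothetical helper names; only the three status labels come from the gate itself.
type Status = "available" | "missing" | "unknown";

interface Requirement {
  name: string;
  // Assumption: true when repo config, docs, or existing tests confirm the need.
  confirmed: boolean;
}

type Env = { [key: string]: string | undefined };

function classify(req: Requirement, env: Env): Status {
  const value = env[req.name];
  if (value !== undefined && value !== "") return "available";
  // A confirmed requirement with no value is a blocker; an unconfirmed one needs investigation.
  return req.confirmed ? "missing" : "unknown";
}

function configGate(reqs: Requirement[], env: Env) {
  const statuses = reqs.map((r) => ({ name: r.name, status: classify(r, env) }));
  const missing = statuses.filter((s) => s.status === "missing").map((s) => s.name);
  // Placeholder-only export block: real values are never invented or echoed.
  const exportBlock = reqs.map((r) => "export " + r.name + '="<TODO>"').join("\n");
  return { statuses, missing, exportBlock };
}
```

Passing the environment in as a plain object (rather than reading a global) keeps the gate easy to check against a fixture, and the export block always contains placeholders so no guessed value can leak into a report.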
2) Environment Gate¶
Before claiming a test is runnable, determine:
- target environment: local, preview, staging, CI
- base URL and auth flow
- whether seed/reset is deterministic
- whether third-party dependencies can be stubbed or must be live
- whether test accounts and permissions are available
Read references/environment-and-dependency-gates.md whenever environment readiness is uncertain.
3) Execution Integrity Gate¶
Never claim a suite or repro was executed unless it actually ran.
If commands were not run, output:
- `Not run in this environment`
- reason
- exact commands to run next
If commands were run, report:
- command(s)
- target environment
- pass/fail status
- artifact locations
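One way to make the two reporting branches above hard to mix up is a discriminated union, so a "run" report cannot be built without commands, status, and artifacts. The sketch below is illustrative only; the type name `ExecutionReport`, its field names, and the rendered line labels are assumptions rather than part of the skill's contract.

```typescript
// Hypothetical report shape: the `executed` flag decides which fields must exist.
type ExecutionReport =
  | { executed: false; reason: string; nextCommands: string[] }
  | {
      executed: true;
      commands: string[];
      environment: string;
      status: "pass" | "fail";
      artifacts: string[];
    };

function renderReport(report: ExecutionReport): string {
  if (report.executed === false) {
    // Nothing ran: say so first, then give the reason and the exact next commands.
    return ["Not run in this environment", "Reason: " + report.reason, "Run next:"]
      .concat(report.nextCommands.map((c) => "  " + c))
      .join("\n");
  }
  return [
    "Commands: " + report.commands.join(" && "),
    "Target environment: " + report.environment,
    "Status: " + report.status,
    "Artifacts: " + report.artifacts.join(", "),
  ].join("\n");
}
```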
4) Stability Gate¶
Do not treat a single pass as proof of reliability for critical paths or flaky failures.
Use repeat runs, traces, screenshots, and environment evidence before concluding:
- the bug is fixed
- the test is stable
- the failure is infra-only
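As a rough sketch of the repeat-run rule, the function below turns a list of run outcomes into a verdict and refuses to say "stable" on too little evidence. The name `assessStability`, the verdict labels, and the default of five runs are assumptions chosen for illustration; the right repeat count depends on the journey and the CI budget.

```typescript
type RunResult = "pass" | "fail";
type Verdict = "stable" | "flaky" | "failing" | "insufficient-evidence";

// Assumption: minRuns = 5 is an illustrative default, not a value prescribed by the gate.
function assessStability(results: RunResult[], minRuns: number = 5): Verdict {
  const passes = results.filter((r) => r === "pass").length;
  const failures = results.length - passes;
  // Mixed outcomes under the same code and environment indicate flakiness.
  if (passes > 0 && failures > 0) return "flaky";
  if (results.length > 0 && passes === 0) return "failing";
  // All passes, but too few runs to conclude anything (a single pass lands here).
  if (results.length < Math.max(1, minRuns)) return "insufficient-evidence";
  return "stable";
}
```

The verdict only summarizes outcomes; traces, screenshots, and environment evidence are still needed before attributing a failure to infrastructure.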
5) Side-Effect Gate¶
Default to safe behavior for real-world side effects:
- avoid production data mutation
- avoid real-money or irreversible flows unless explicitly configured for safe test execution
- require explicit approval or isolation for destructive actions
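The defaults above could be encoded as a guard that runs before any mutating step. This is a sketch under stated assumptions: `PlannedAction`, its fields, and `sideEffectGate` are hypothetical names, and blocking every production mutation outright is a stricter reading than "avoid".

```typescript
type TargetEnv = "local" | "preview" | "staging" | "ci" | "production";

interface PlannedAction {
  description: string;
  environment: TargetEnv;
  mutatesData: boolean;
  irreversible: boolean; // e.g. a real-money charge or account deletion
  isolated: boolean; // runs against a sandbox or throwaway data
  approved: boolean; // explicit approval was recorded
}

function sideEffectGate(action: PlannedAction): { allowed: boolean; reason: string } {
  if (!action.mutatesData && !action.irreversible) {
    return { allowed: true, reason: "read-only action" };
  }
  // Assumption: this sketch blocks production mutations even when approved.
  if (action.environment === "production") {
    return { allowed: false, reason: "production data mutation is blocked" };
  }
  if (action.irreversible && !action.isolated && !action.approved) {
    return { allowed: false, reason: "irreversible action needs approval or isolation" };
  }
  return { allowed: true, reason: "mutation is confined to a non-production target" };
}
```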
Version and Platform Gate¶
Before recommending Playwright code or config, adapt to the repository's actual platform:
| Signal | Adaptation |
|---|---|
| Playwright < 1.27 | Prefer `locator` and stable attribute selectors. Do not assume `getByRole`, `getByLabel`, `getByTestId`, or `getByPlaceholder` are available. |
| Playwright < 1.30 | Be conservative with newer trace and snapshot ergonomics; keep config minimal and explicit. |
| Playwright < 1.35 | Avoid assuming newer helper APIs without checking the installed version first. |
| Node < 16 | Treat as upgrade-required for modern Playwright usage; do not present a "ready to run" claim. |
| Node < 18 | Avoid assuming newer Web API defaults and modern runner ergonomics without verification. |
Framework adaptation checklist:
- Next.js: verify baseURL, server startup, and auth/session bootstrapping strategy.
- SPA: prefer explicit waits on route or API completion, not arbitrary sleeps.
- SSR: assert server-rendered and hydrated states separately when needed.
- Monorepo: locate the owning package, config file, and CI entrypoint before generating commands.
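The version thresholds in the table can be checked mechanically once the installed versions are known. The following is a minimal sketch that covers only the 1.27 selector threshold and the two Node thresholds; the function names and returned labels are hypothetical, and the parser deliberately reads only the major and minor components.

```typescript
// Reads "1.26.1", "^1.40.0", or "v14.21.3" as [major, minor]; returns null if no version is found.
function parseMajorMinor(version: string): number[] | null {
  const m = /^\D*(\d+)\.(\d+)/.exec(version);
  return m ? [parseInt(m[1], 10), parseInt(m[2], 10)] : null;
}

function isBelow(version: string, major: number, minor: number): boolean | null {
  const v = parseMajorMinor(version);
  if (v === null) return null;
  return v[0] < major || (v[0] === major && v[1] < minor);
}

type SelectorStrategy =
  | "locator-with-stable-attributes"
  | "role-and-label-locators"
  | "verify-version-first";

function selectorStrategy(playwrightVersion: string): SelectorStrategy {
  const below = isBelow(playwrightVersion, 1, 27);
  // An unparseable version (for example "latest") is a reason to check, not to guess.
  if (below === null) return "verify-version-first";
  return below ? "locator-with-stable-attributes" : "role-and-label-locators";
}

type NodeGate = "upgrade-required" | "verify-web-apis" | "ok" | "verify-version-first";

function nodeGate(nodeVersion: string): NodeGate {
  const below16 = isBelow(nodeVersion, 16, 0);
  if (below16 === null) return "verify-version-first";
  if (below16) return "upgrade-required";
  return isBelow(nodeVersion, 18, 0) ? "verify-web-apis" : "ok";
}
```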
Playwright-First Engineering Rules¶
Use references/playwright-deep-patterns.md whenever generating or refactoring Playwright code.
At minimum:
- prefer reusable fixtures and domain helpers over copy-pasted flows
- use stable auth setup such as `storageState` when appropriate
- isolate data per test or per worker
- define what can be mocked and what must stay real
- choose serial vs parallel execution intentionally
- keep retries, trace, screenshot, and video policies aligned with CI needs
If the repository lacks the needed config or fixtures, generate the smallest honest scaffold rather than pseudo-runnable code.
Anti-Examples¶
1) Unconditional waitForTimeout in assertions¶
BAD:
GOOD:
2) Fragile CSS selector chains¶
BAD:
GOOD:
3) Shared mutable data across tests¶
BAD:
```ts
const sharedEmail = "e2e-user@example.com";
test("profile update", async ({ page }) => { /* mutates same account */ });
```
GOOD:
```ts
test("profile update", async ({ page }) => {
  // test.info() is only valid while a test is running, so build the address inside the test body.
  const email = `e2e-${test.info().parallelIndex}-${Date.now()}@example.com`;
  /* isolated data per test */
});
```
4) Guessing env values or credentials¶
BAD:
```ts
await page.goto("http://staging");
await page.fill("#email", "fake-user@test.com");
await page.fill("#password", "password123");
```
GOOD:
```ts
test.skip(!process.env.E2E_BASE_URL || !process.env.E2E_USER, "explicit TODOs until config exists");
await page.goto(process.env.E2E_BASE_URL!);
```
5) Silently serializing entire suite¶
BAD:
GOOD:
```ts
test.describe("checkout funnel", () => {
  test.describe.configure({ mode: "serial" }); // justified by irreversible payment sandbox state
});
```
6) Repeating login instead of storageState¶
BAD:
```ts
test.beforeEach(async ({ page }) => {
  await page.goto("/login");
  await page.fill("#email", process.env.E2E_USER!);
});
```
7) Pseudo-runnable scaffold without test.skip¶
BAD:
GOOD:
```ts
test.skip(!process.env.E2E_BASE_URL, "missing base URL");
// TODO: wire payment sandbox account before enabling this journey
```
Agent Browser Bridge¶
Use Agent Browser to discover or reproduce, then convert findings into durable code.
Required bridge steps:
- capture the environment and entry URL
- record the exact command sequence
- save milestone screenshots
- note the selectors or semantic targets that proved stable
- translate the validated flow into Playwright assertions and helpers
Read references/agent-browser-workflows.md when using Agent Browser.
Flaky Test Policy¶
See references/checklists.md §Flaky Triage Template for the complete template.
Key rules:
- A test is flaky only if it behaves non-deterministically under unchanged code and environment.
- Required sequence: reproduce (`--repeat-each=N` for Playwright or `-count=N` for `go test`) → classify root cause → fix → quarantine with deadline.
- Root cause categories: selector instability, async race, test-data coupling, network instability, environment drift, application defect.
- Quarantine only with an owner, a tracking issue, and a removal deadline.
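The quarantine rule lends itself to a simple check that can run in review or CI. The sketch below assumes a hypothetical `QuarantineEntry` record with an ISO `removeBy` date; the field names and problem messages are illustrative and not taken from references/checklists.md.

```typescript
interface QuarantineEntry {
  test: string;
  owner?: string;
  trackingIssue?: string;
  removeBy?: string; // ISO date, YYYY-MM-DD (assumed format)
}

// Returns the reasons a quarantine is not acceptable; an empty list means it may proceed.
function quarantineProblems(entry: QuarantineEntry, today: string): string[] {
  const problems: string[] = [];
  if (!entry.owner) problems.push("missing owner");
  if (!entry.trackingIssue) problems.push("missing tracking issue");
  if (!entry.removeBy || !/^\d{4}-\d{2}-\d{2}$/.test(entry.removeBy)) {
    problems.push("missing or malformed removal deadline");
  } else if (entry.removeBy < today) {
    // ISO dates in this format compare correctly as plain strings.
    problems.push("removal deadline has passed");
  }
  return problems;
}
```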
CI Strategy¶
For PR automation, separate blocking critical journeys from broader nightly coverage:
- Blocking PR gate:
  - run `playwright install --with-deps chromium` during setup when browsers are not pre-baked
  - use secret injection for base URL, auth state bootstrap, and sandbox-only credentials
  - use `upload-artifact` for trace, screenshot, video, and HTML report on failure
  - keep retries and timeout values explicit in config and CI job output
- Nightly / extended lane:
  - run broader browser matrix, accessibility sweeps, and visual regression
  - increase retries only for known infra volatility, not to hide product bugs
Output Contract¶
For any E2E task, return:
- `Task type`
- `Runner choice`
- `Environment gate`
- `Config/dependency status`
- `Covered journey` or `Failure under triage`
- `Executed commands`
- `Execution status`
- `Artifacts`
- `Next actions`
If code was generated, also include:
- files created or updated
- skip conditions or TODO markers if scaffolding only
Machine-Readable Summary (JSON)¶
When the output will be consumed by CI or downstream tooling, append:
```json
{
  "task_type": "new_journey_coverage",
  "runner": "playwright",
  "environment": "local",
  "execution_status": "pass",
  "tests_passed": 3,
  "tests_failed": 0,
  "tests_skipped": 0,
  "artifacts": ["playwright-report/index.html", "test-results/"],
  "scorecard": { "critical": "PASS", "standard": "5/6", "hygiene": "4/4" },
  "blockers": [],
  "next_actions": ["add password-reset edge case"]
}
```
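For consumers written in TypeScript, the summary can be given a type and a basic consistency check before it is emitted. This sketch mirrors only the fields shown in the example; the interface name `E2ESummary` and the function `summaryInconsistencies` are hypothetical, and the checks cover only the `"pass"` status because no other status value is documented here.

```typescript
interface E2ESummary {
  task_type: string;
  runner: string;
  environment: string;
  execution_status: string;
  tests_passed: number;
  tests_failed: number;
  tests_skipped: number;
  artifacts: string[];
  scorecard: { critical: string; standard: string; hygiene: string };
  blockers: string[];
  next_actions: string[];
}

// Flags summaries that claim a pass while their own counts or blockers say otherwise.
function summaryInconsistencies(s: E2ESummary): string[] {
  const issues: string[] = [];
  if (s.execution_status !== "pass") return issues;
  if (s.tests_failed > 0) issues.push("status is pass but tests_failed > 0");
  if (s.tests_passed === 0) issues.push("status is pass but no test passed");
  if (s.blockers.length > 0) issues.push("status is pass but blockers are listed");
  return issues;
}
```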
Quality Scorecard¶
For non-Playwright runners (Go HTTP, Python requests, etc.), mark Playwright-specific items as N/A. Count only applicable items when computing pass rates.
Critical (any FAIL → overall FAIL)¶
| # | Item | PASS rule |
|---|---|---|
| C1 | No unconditional waitForTimeout in assertions | Zero instances outside diagnostic comments |
| C2 | Data isolation explicit | Each test owns its data or has deterministic cleanup |
| C3 | No guessed secrets or URLs | All external values from env/config with skip guard |
| C4 | All 5 mandatory gates addressed | Configuration, Environment, Execution Integrity, Stability, Side-Effect |
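Item C1 can be approximated with a text scan over test sources. The function below is a coarse sketch with a hypothetical name: it reports every `waitForTimeout(` call that is not inside a `//` line comment, cannot tell conditional from unconditional use, and ignores block comments, so hits still need a human look.

```typescript
// Returns 1-based line numbers of waitForTimeout calls outside "//" comments.
function findWaitForTimeoutCalls(source: string): number[] {
  const hits: number[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    // Naive comment stripping: drops everything after "//", which can also cut
    // string literals that contain "//" (for example URLs).
    const code = lines[i].split("//")[0];
    if (/\bwaitForTimeout\s*\(/.test(code)) hits.push(i + 1);
  }
  return hits;
}
```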
Standard (≥ 4/6 PASS)¶
| # | Item | PASS rule |
|---|---|---|
| S1 | Selectors use getByRole/getByLabel/getByTestId | ≥ 90% of interactions use accessible selectors |
| S2 | Auth strategy explicit | storageState reuse or justified in-test login |
| S3 | Assertions after major interactions | Every user-visible state change has an expect |
| S4 | Artifact policy configured | trace, screenshot, video settings present |
| S5 | Serial vs parallel justified | Serial only with documented reason |
| S6 | Mock boundaries documented | Each mocked dependency has rationale |
Hygiene (≥ 3/4 PASS)¶
| # | Item | PASS rule |
|---|---|---|
| H1 | Reusable fixtures/helpers | Shared flows extracted, not copy-pasted |
| H2 | Descriptive test names | Name describes user journey, not implementation |
| H3 | CI strategy present | Blocking gate vs nightly split documented |
| H4 | Repeat-run validation | Critical paths validated with --repeat-each |
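The three tiers above combine into one result roughly as follows. This is a sketch, not the canonical scoring code: the function names and the `overall` field are hypothetical, and applying the 4/6 and 3/4 thresholds as ratios once N/A items are excluded is an assumption, since the adjustment for non-Playwright runners is not spelled out.

```typescript
type ItemResult = "PASS" | "FAIL" | "N/A";

function tally(results: ItemResult[]): { passed: number; applicable: number } {
  const applicable = results.filter((r) => r !== "N/A").length;
  const passed = results.filter((r) => r === "PASS").length;
  return { passed, applicable };
}

function scorecard(critical: ItemResult[], standard: ItemResult[], hygiene: ItemResult[]) {
  // Any critical FAIL fails the whole scorecard.
  const criticalOk = critical.filter((r) => r === "FAIL").length === 0;
  const s = tally(standard);
  const h = tally(hygiene);
  // Assumption: with N/A items excluded, thresholds are applied as ratios (4/6 and 3/4).
  // Cross-multiplication avoids floating-point comparison.
  const standardOk = s.passed * 6 >= 4 * s.applicable;
  const hygieneOk = h.passed * 4 >= 3 * h.applicable;
  return {
    critical: criticalOk ? "PASS" : "FAIL",
    standard: s.passed + "/" + s.applicable,
    hygiene: h.passed + "/" + h.applicable,
    overall: criticalOk && standardOk && hygieneOk ? "PASS" : "FAIL",
  };
}
```

The `critical`, `standard`, and `hygiene` strings follow the format used in the machine-readable summary, so the result can be dropped into its `scorecard` field.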