
E2E test

Use this skill to create E2E coverage that is deterministic, evidence-backed, and maintainable in real repositories.

Use This Skill For

  • selecting high-value journeys for E2E coverage
  • creating or updating Playwright tests
  • using Agent Browser to explore, debug, and reproduce flows
  • flaky test triage with traces and artifacts
  • CI gate design for critical journeys
  • browser automation tasks where reproducibility matters

Do not use this skill for:

  • visual design review with no automated journey value
  • performance/load testing
  • tests that require guessed secrets, endpoints, or private accounts

Load References Selectively

Always read (regardless of framework):

  • references/checklists.md — pre-run, coverage, flaky triage, quarantine, and result checklists.
  • references/environment-and-dependency-gates.md — environment readiness for local, preview, staging, or CI.

Read only for Playwright / JS projects (skip for Go, Python, or other non-JS):

  • references/playwright-patterns.md — selector, wait, assertion, config baseline, and version/platform gate rules.
  • references/playwright-deep-patterns.md — auth state, fixtures, data isolation, mocking, serial/parallel, and CI engineering.
  • references/anti-examples.md — common Playwright mistakes with corrected alternatives.

Read when using Agent Browser:

  • references/agent-browser-workflows.md — exploration, failure reproduction, flow-to-code conversion, and command starters.

Read when shaping reports or triaging flakes:

  • references/golden-examples.md — full output contract examples for Playwright and Go HTTP E2E tasks.

Run before gate decisions to collect repository facts:

  • scripts/discover_e2e_needs.sh — detects Playwright, Node.js, Go, framework, existing tests, env vars, and CI platform.

Runner Strategy

Use both tools intentionally, not interchangeably.

  • Agent Browser first:
      • journey discovery
      • repro of flaky or environment-specific UI behavior
      • fast semantic interaction and screenshot capture
  • Playwright preferred for code:
      • committed E2E tests
      • CI suites
      • repeated local validation
      • multi-browser or matrix execution

If a task starts in Agent Browser and the flow is valuable long-term, convert the learned steps into Playwright coverage.

Operating Model

  1. Classify the task:
      • new journey coverage
      • flaky triage
      • failed CI investigation
      • exploratory browser reproduction
      • test architecture or CI gate design

  2. Run scripts/discover_e2e_needs.sh to collect repository facts (Playwright version, framework, existing tests, env vars, CI platform). Use its structured output for gate decisions instead of guessing.

  3. Run the environment and configuration gate.

  4. Choose the runner path:
      • Agent Browser for exploration or reproduction
      • Playwright for maintainable automated coverage
      • both when discovery should become code
      • non-JS projects: use the project's native test framework (see Runner Selection Guidance below)

  5. Produce only the strongest deliverable the environment can actually support:
      • a runnable test
      • a guarded scaffold with explicit skips
      • a triage report with repro commands

Runner Selection Guidance

If the project has no Node.js / Playwright (e.g., Go, Python, Rust web apps):

  • Use the project's native test framework (Go net/http, Python requests/httpx, etc.)
  • Do NOT install Playwright into a project that has no JavaScript toolchain
  • Follow the project's existing E2E test conventions if they exist
  • Document the runner selection rationale in the Output Contract
  • All 5 mandatory gates still apply regardless of runner choice

Mandatory Gates

1) Configuration Gate

Before generating or updating runnable tests:

  • scan repository config, scripts, env files, docs, and existing tests
  • list required variables, accounts, feature flags, and service dependencies
  • mark each as:
      • available
      • missing
      • unknown

If required values are missing:

  • do not invent them
  • generate placeholder-only scaffolding with explicit TODOs and skip guards when code output is still useful
  • otherwise stop and report the exact blockers

In every result include:

  • required variable list
  • example export block or config shape
  • missing variables
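The required/missing split above can be sketched as a small guard. The variable names below (E2E_BASE_URL, E2E_USER, E2E_PASSWORD) are hypothetical placeholders for whatever the repository scan actually finds:

```typescript
// Variable names are hypothetical placeholders for the scan's findings.
const REQUIRED_VARS = ["E2E_BASE_URL", "E2E_USER", "E2E_PASSWORD"];

// Returns the subset of required variables that are absent or empty,
// so the caller can emit a blocker report instead of inventing values.
function missingConfig(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]);
}
```

A scaffold would pass process.env here and feed the result into a skip guard rather than guessing values.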

2) Environment Gate

Before claiming a test is runnable, determine:

  • target environment: local, preview, staging, CI
  • base URL and auth flow
  • whether seed/reset is deterministic
  • whether third-party dependencies can be stubbed or must be live
  • whether test accounts and permissions are available

Read references/environment-and-dependency-gates.md whenever environment readiness is uncertain.
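One way to keep the base-URL decision explicit is to resolve it from the declared target environment and never guess remote hosts. A minimal sketch, assuming hypothetical names (E2E_ENV as the environment label, E2E_BASE_URL as the override) and a localhost default that is only an illustration:

```typescript
// Sketch only: the override convention and the localhost default are assumptions.
function resolveBaseUrl(env: string | undefined, override?: string): string | null {
  if (override) return override; // an explicit E2E_BASE_URL always wins
  // Only local has a guessable default; preview/staging/CI hosts must be
  // injected, never invented.
  if (env === "local") return "http://localhost:3000";
  return null; // null means "blocked: report the missing value"
}
```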

3) Execution Integrity Gate

Never claim a suite or repro was executed unless it actually ran.

If commands were not run, output:

  • Not run in this environment
  • reason
  • exact commands to run next

If commands were run, report:

  • command(s)
  • target environment
  • pass/fail status
  • artifact locations
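The ran / not-run distinction can be made machine-checkable with a small record type. The shape below is illustrative, not a mandated schema:

```typescript
// Illustrative shape; field names are not a required schema.
interface ExecutionReport {
  executed: boolean;
  commands: string[];      // what ran, or the exact commands to run next
  environment?: string;
  status?: "pass" | "fail";
  reason?: string;         // required when executed is false
  artifacts?: string[];
}

// An honest "Not run in this environment" result: reason plus next commands.
function notRun(reason: string, nextCommands: string[]): ExecutionReport {
  return { executed: false, reason, commands: nextCommands };
}
```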

4) Stability Gate

Do not treat a single pass as proof of reliability for critical paths or flaky failures.

Use repeat runs, traces, screenshots, and environment evidence before concluding:

  • the bug is fixed
  • the test is stable
  • the failure is infra-only

5) Side-Effect Gate

Default to safe behavior for real-world side effects:

  • avoid production data mutation
  • avoid real-money or irreversible flows unless explicitly configured for safe test execution
  • require explicit approval or isolation for destructive actions
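A simple opt-in guard can enforce this default. The flag name E2E_ALLOW_DESTRUCTIVE is a hypothetical convention, not an established standard:

```typescript
// Flag name is hypothetical; the point is an explicit, auditable opt-in.
function guardDestructive(
  action: string,
  env: Record<string, string | undefined>,
): void {
  if (env.E2E_ALLOW_DESTRUCTIVE !== "1") {
    throw new Error(
      `destructive action "${action}" blocked: set E2E_ALLOW_DESTRUCTIVE=1 only in an isolated test environment`,
    );
  }
}
```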

Version and Platform Gate

Before recommending Playwright code or config, adapt to the repository's actual platform:

| Signal | Adaptation |
| --- | --- |
| Playwright < 1.27 | Prefer locator and stable attribute selectors. Do not assume getByRole, getByLabel, getByTestId, or getByPlaceholder are available. |
| Playwright < 1.30 | Be conservative with newer trace and snapshot ergonomics; keep config minimal and explicit. |
| Playwright < 1.35 | Avoid assuming newer helper APIs without checking the installed version first. |
| Node < 16 | Treat as upgrade-required for modern Playwright usage; do not present a "ready to run" claim. |
| Node < 18 | Avoid assuming newer Web API defaults and modern runner ergonomics without verification. |

Framework adaptation checklist:

  • Next.js: verify baseURL, server startup, and auth/session bootstrapping strategy.
  • SPA: prefer explicit waits on route or API completion, not arbitrary sleeps.
  • SSR: assert server-rendered and hydrated states separately when needed.
  • Monorepo: locate the owning package, config file, and CI entrypoint before generating commands.

Playwright-First Engineering Rules

Use references/playwright-deep-patterns.md whenever generating or refactoring Playwright code.

At minimum:

  • prefer reusable fixtures and domain helpers over copy-pasted flows
  • use stable auth setup such as storageState when appropriate
  • isolate data per test or per worker
  • define what can be mocked and what must stay real
  • choose serial vs parallel execution intentionally
  • keep retries, trace, screenshot, and video policies aligned with CI needs

If the repository lacks the needed config or fixtures, generate the smallest honest scaffold rather than pseudo-runnable code.
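For the data-isolation rule, a reusable helper can mint a per-worker, per-run identity instead of sharing one account. Names and the domain default here are illustrative:

```typescript
// Illustrative helper: unique identity per worker and per run,
// so parallel tests never mutate the same account.
function uniqueUser(workerIndex: number, domain = "example.com") {
  const stamp = Date.now();
  return {
    email: `e2e-${workerIndex}-${stamp}@${domain}`,
    name: `E2E User ${workerIndex}-${stamp}`,
  };
}
```

In Playwright this would typically be wired into a fixture so every test receives its own user rather than calling the helper ad hoc.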

Anti-Examples

1) Unconditional waitForTimeout in assertions

BAD:

```ts
await page.waitForTimeout(3000);
await expect(page.getByText("Order confirmed")).toBeVisible();
```

GOOD:

```ts
await expect(page.getByText("Order confirmed")).toBeVisible();
```

2) Fragile CSS selector chains

BAD:

```ts
await page.locator(".app > div:nth-child(2) .cta.primary").click();
```

GOOD:

```ts
await page.getByRole("button", { name: "Continue" }).click();
```

3) Shared mutable data across tests

BAD:

```ts
const sharedEmail = "e2e-user@example.com";
test("profile update", async ({ page }) => { /* mutates same account */ });
```

GOOD:

```ts
const email = `e2e-${test.info().parallelIndex}-${Date.now()}@example.com`;
test("profile update", async ({ page }) => { /* isolated data per test */ });
```

4) Guessing env values or credentials

BAD:

```ts
await page.goto("http://staging");
await page.fill("#email", "fake-user@test.com");
await page.fill("#password", "password123");
```

GOOD:

```ts
test.skip(!process.env.E2E_BASE_URL || !process.env.E2E_USER, "explicit TODOs until config exists");
await page.goto(process.env.E2E_BASE_URL!);
```

5) Silently serializing entire suite

BAD:

test.describe.configure({ mode: "serial" });
GOOD:
test.describe("checkout funnel", () => {
  test.describe.configure({ mode: "serial" }); // justified by irreversible payment sandbox state
});

6) Repeating login instead of storageState

BAD:

test.beforeEach(async ({ page }) => {
  await page.goto("/login");
  await page.fill("#email", process.env.E2E_USER!);
});
GOOD:
test.use({ storageState: "playwright/.auth/user.json" });

7) Pseudo-runnable scaffold without test.skip

BAD:

```ts
test("checkout", async ({ page }) => {
  await page.goto(process.env.E2E_BASE_URL!);
});
```

GOOD:

```ts
test.skip(!process.env.E2E_BASE_URL, "missing base URL");
// TODO: wire payment sandbox account before enabling this journey
```

Agent Browser Bridge

Use Agent Browser to discover or reproduce, then convert findings into durable code.

Required bridge steps:

  1. capture the environment and entry URL
  2. record the exact command sequence
  3. save milestone screenshots
  4. note the selectors or semantic targets that proved stable
  5. translate the validated flow into Playwright assertions and helpers

Read references/agent-browser-workflows.md when using Agent Browser.

Flaky Test Policy

See references/checklists.md §Flaky Triage Template for the complete template.

Key rules:

  • A test is flaky only with non-deterministic behavior under unchanged code and environment.
  • Required sequence: reproduce (--repeat-each=N or -count=N) → classify root cause → fix → quarantine with deadline.
  • Root cause categories: selector instability, async race, test-data coupling, network instability, environment drift, application defect.
  • Quarantine only with an owner, tracking issue, and removal deadline.

CI Strategy

For PR automation, separate Blocking critical journeys from broader nightly coverage:

  • Blocking PR gate:
      • run playwright install --with-deps chromium during setup when browsers are not pre-baked
      • inject secrets for the base URL, auth state bootstrap, and sandbox-only credentials
      • upload trace, screenshot, video, and HTML report artifacts on failure
      • keep retries and timeout values explicit in config and CI job output
  • Nightly / extended lane:
      • run the broader browser matrix, accessibility sweeps, and visual regression
      • increase retries only for known infra volatility, not to hide product bugs
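The "explicit retries and artifact policy" point can be expressed directly in config. A minimal sketch of a playwright.config.ts, with values that are illustrative rather than prescriptive:

```typescript
// playwright.config.ts — illustrative values; align with your CI lanes.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,          // explicit, visible in CI job output
  use: {
    trace: "on-first-retry",                // trace artifact only when a retry occurs
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
  reporter: [["html", { open: "never" }]],  // uploaded by a CI artifact step on failure
});
```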

Output Contract

For any E2E task, return:

  1. Task type
  2. Runner choice
  3. Environment gate
  4. Config/dependency status
  5. Covered journey or Failure under triage
  6. Executed commands
  7. Execution status
  8. Artifacts
  9. Next actions

If code was generated, also include:

  • files created or updated
  • skip conditions or TODO markers if scaffolding only

Machine-Readable Summary (JSON)

When the output will be consumed by CI or downstream tooling, append:

```json
{
  "task_type": "new_journey_coverage",
  "runner": "playwright",
  "environment": "local",
  "execution_status": "pass",
  "tests_passed": 3,
  "tests_failed": 0,
  "tests_skipped": 0,
  "artifacts": ["playwright-report/index.html", "test-results/"],
  "scorecard": { "critical": "PASS", "standard": "5/6", "hygiene": "4/4" },
  "blockers": [],
  "next_actions": ["add password-reset edge case"]
}
```

Quality Scorecard

For non-Playwright runners (Go HTTP, Python requests, etc.), mark Playwright-specific items as N/A. Count only applicable items when computing pass rates.

Critical (any FAIL → overall FAIL)

| # | Item | PASS rule |
| --- | --- | --- |
| C1 | No unconditional waitForTimeout in assertions | Zero instances outside diagnostic comments |
| C2 | Data isolation explicit | Each test owns its data or has deterministic cleanup |
| C3 | No guessed secrets or URLs | All external values from env/config with skip guard |
| C4 | All 5 mandatory gates addressed | Configuration, Environment, Execution Integrity, Stability, Side-Effect |

Standard (≥ 4/6 PASS)

| # | Item | PASS rule |
| --- | --- | --- |
| S1 | Selectors use getByRole/getByLabel/getByTestId | ≥ 90% of interactions use accessible selectors |
| S2 | Auth strategy explicit | storageState reuse or justified in-test login |
| S3 | Assertions after major interactions | Every user-visible state change has an expect |
| S4 | Artifact policy configured | trace, screenshot, video settings present |
| S5 | Serial vs parallel justified | Serial only with documented reason |
| S6 | Mock boundaries documented | Each mocked dependency has rationale |

Hygiene (≥ 3/4 PASS)

| # | Item | PASS rule |
| --- | --- | --- |
| H1 | Reusable fixtures/helpers | Shared flows extracted, not copy-pasted |
| H2 | Descriptive test names | Name describes user journey, not implementation |
| H3 | CI strategy present | Blocking gate vs nightly split documented |
| H4 | Repeat-run validation | Critical paths validated with --repeat-each |