
E2E test

Use this skill to create E2E coverage that is deterministic, evidence-backed, and maintainable in real repositories.

Use This Skill For

  • selecting high-value journeys for E2E coverage
  • creating or updating Playwright tests
  • using Agent Browser to explore, debug, and reproduce flows
  • flaky test triage with traces and artifacts
  • CI gate design for critical journeys
  • browser automation tasks where reproducibility matters

Do not use this skill for:

  • visual design review with no automated journey value
  • performance/load testing
  • tests that require guessed secrets, endpoints, or private accounts

Load References Selectively

Always read (regardless of framework):

  • references/checklists.md — pre-run, coverage, flaky triage, quarantine, and result checklists.
  • references/environment-and-dependency-gates.md — environment readiness for local, preview, staging, or CI.

Read only for Playwright / JS projects (skip for Go, Python, or other non-JS):

  • references/playwright-patterns.md — selector, wait, assertion, config baseline, and version/platform gate rules.
  • references/playwright-deep-patterns.md — auth state, fixtures, data isolation, mocking, serial/parallel, and CI engineering.
  • references/anti-examples.md — common Playwright mistakes with corrected alternatives.

Read when using Agent Browser:

  • references/agent-browser-workflows.md — exploration, failure reproduction, flow-to-code conversion, and command starters.

Read when shaping reports or triaging flakes:

  • references/golden-examples.md — full output contract examples for Playwright and Go HTTP E2E tasks.

Run before gate decisions to collect repository facts:

  • scripts/discover_e2e_needs.sh — detects Playwright, Node.js, Go, framework, existing tests, env vars, and CI platform.

Runner Strategy

Use both tools intentionally, not interchangeably.

  • Agent Browser first:
      • journey discovery
      • repro of flaky or environment-specific UI behavior
      • fast semantic interaction and screenshot capture
  • Playwright preferred for code:
      • committed E2E tests
      • CI suites
      • repeated local validation
      • multi-browser or matrix execution

If a task starts in Agent Browser and the flow is valuable long-term, convert the learned steps into Playwright coverage.

Operating Model

  1. Classify the task:
      • new journey coverage
      • flaky triage
      • failed CI investigation
      • exploratory browser reproduction
      • test architecture or CI gate design

  2. Run scripts/discover_e2e_needs.sh to collect repository facts (Playwright version, framework, existing tests, env vars, CI platform). Use its structured output for gate decisions instead of guessing.

  3. Run the environment and configuration gate.

  4. Choose the runner path:
      • Agent Browser for exploration or reproduction
      • Playwright for maintainable automated coverage
      • both when discovery should become code
      • non-JS projects: use the project's native test framework (see Runner Selection Guidance below)

  5. Produce only the strongest deliverable the environment can actually support:
      • a runnable test
      • a guarded scaffold with explicit skips
      • a triage report with repro commands

Runner Selection Guidance

If the project has no Node.js / Playwright (e.g., Go, Python, Rust web apps):

  • Use the project's native test framework (Go net/http, Python requests/httpx, etc.)
  • Do NOT install Playwright into a project that has no JavaScript toolchain
  • Follow the project's existing E2E test conventions if they exist
  • Document the runner selection rationale in the Output Contract
  • All 5 mandatory gates still apply regardless of runner choice

Mandatory Gates

1) Configuration Gate

Before generating or updating runnable tests:

  • scan repository config, scripts, env files, docs, and existing tests
  • list required variables, accounts, feature flags, and service dependencies
  • mark each as:
      • available
      • missing
      • unknown

If required values are missing:

  • do not invent them
  • generate placeholder-only scaffolding with explicit TODOs and skip guards when code output is still useful
  • otherwise stop and report the exact blockers

In every result include:

  • required variable list
  • example export block or config shape
  • missing variables
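The required/missing split above can be sketched as a small guard. The variable names below (E2E_BASE_URL, E2E_USER, E2E_PASSWORD) are hypothetical placeholders for whatever the repository scan actually finds:

```typescript
// Variable names are hypothetical placeholders for the scan's findings.
const REQUIRED_VARS = ["E2E_BASE_URL", "E2E_USER", "E2E_PASSWORD"];

// Returns the subset of required variables that are absent or empty,
// so the caller can emit a blocker report instead of inventing values.
function missingConfig(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]);
}
```

A scaffold would pass process.env here and feed the result into a skip guard rather than guessing values.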

2) Environment Gate

Before claiming a test is runnable, determine:

  • target environment: local, preview, staging, CI
  • base URL and auth flow
  • whether seed/reset is deterministic
  • whether third-party dependencies can be stubbed or must be live
  • whether test accounts and permissions are available

Read references/environment-and-dependency-gates.md whenever environment readiness is uncertain.
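One way to keep the base-URL decision explicit is to resolve it from the declared target environment and never guess remote hosts. A minimal sketch, assuming hypothetical names (E2E_ENV as the environment label, E2E_BASE_URL as the override) and a localhost default that is only an illustration:

```typescript
// Sketch only: the override convention and the localhost default are assumptions.
function resolveBaseUrl(env: string | undefined, override?: string): string | null {
  if (override) return override; // an explicit E2E_BASE_URL always wins
  // Only local has a guessable default; preview/staging/CI hosts must be
  // injected, never invented.
  if (env === "local") return "http://localhost:3000";
  return null; // null means "blocked: report the missing value"
}
```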

3) Execution Integrity Gate

Never claim a suite or repro was executed unless it actually ran.

If commands were not run, output:

  • Not run in this environment
  • reason
  • exact commands to run next

If commands were run, report:

  • command(s)
  • target environment
  • pass/fail status
  • artifact locations
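The ran / not-run distinction can be made machine-checkable with a small record type. The shape below is illustrative, not a mandated schema:

```typescript
// Illustrative shape; field names are not a required schema.
interface ExecutionReport {
  executed: boolean;
  commands: string[];      // what ran, or the exact commands to run next
  environment?: string;
  status?: "pass" | "fail";
  reason?: string;         // required when executed is false
  artifacts?: string[];
}

// An honest "Not run in this environment" result: reason plus next commands.
function notRun(reason: string, nextCommands: string[]): ExecutionReport {
  return { executed: false, reason, commands: nextCommands };
}
```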

4) Stability Gate

Do not treat a single pass as proof of reliability for critical paths or flaky failures.

Use repeat runs, traces, screenshots, and environment evidence before concluding:

  • the bug is fixed
  • the test is stable
  • the failure is infra-only

5) Side-Effect Gate

Default to safe behavior for real-world side effects:

  • avoid production data mutation
  • avoid real-money or irreversible flows unless explicitly configured for safe test execution
  • require explicit approval or isolation for destructive actions
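A simple opt-in guard can enforce this default. The flag name E2E_ALLOW_DESTRUCTIVE is a hypothetical convention, not an established standard:

```typescript
// Flag name is hypothetical; the point is an explicit, auditable opt-in.
function guardDestructive(
  action: string,
  env: Record<string, string | undefined>,
): void {
  if (env.E2E_ALLOW_DESTRUCTIVE !== "1") {
    throw new Error(
      `destructive action "${action}" blocked: set E2E_ALLOW_DESTRUCTIVE=1 only in an isolated test environment`,
    );
  }
}
```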

Version and Platform Gate

Before recommending Playwright code or config, adapt to the repository's actual platform:

| Signal | Adaptation |
| --- | --- |
| Playwright < 1.27 | Prefer locator and stable attribute selectors. Do not assume getByRole, getByLabel, getByTestId, or getByPlaceholder are available. |
| Playwright < 1.30 | Be conservative with newer trace and snapshot ergonomics; keep config minimal and explicit. |
| Playwright < 1.35 | Avoid assuming newer helper APIs without checking the installed version first. |
| Node < 16 | Treat as upgrade-required for modern Playwright usage; do not present a "ready to run" claim. |
| Node < 18 | Avoid assuming newer Web API defaults and modern runner ergonomics without verification. |

Framework adaptation checklist:

  • Next.js: verify baseURL, server startup, and auth/session bootstrapping strategy.
  • SPA: prefer explicit waits on route or API completion, not arbitrary sleeps.
  • SSR: assert server-rendered and hydrated states separately when needed.
  • Monorepo: locate the owning package, config file, and CI entrypoint before generating commands.

Playwright-First Engineering Rules

Use references/playwright-deep-patterns.md whenever generating or refactoring Playwright code.

At minimum:

  • prefer reusable fixtures and domain helpers over copy-pasted flows
  • use stable auth setup such as storageState when appropriate
  • isolate data per test or per worker
  • define what can be mocked and what must stay real
  • choose serial vs parallel execution intentionally
  • keep retries, trace, screenshot, and video policies aligned with CI needs

If the repository lacks the needed config or fixtures, generate the smallest honest scaffold rather than pseudo-runnable code.
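For the data-isolation rule, a reusable helper can mint a per-worker, per-run identity instead of sharing one account. Names and the domain default here are illustrative:

```typescript
// Illustrative helper: unique identity per worker and per run,
// so parallel tests never mutate the same account.
function uniqueUser(workerIndex: number, domain = "example.com") {
  const stamp = Date.now();
  return {
    email: `e2e-${workerIndex}-${stamp}@${domain}`,
    name: `E2E User ${workerIndex}-${stamp}`,
  };
}
```

In Playwright this would typically be wired into a fixture so every test receives its own user rather than calling the helper ad hoc.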

Anti-Examples

1) Unconditional waitForTimeout in assertions

BAD:

```ts
await page.waitForTimeout(3000);
await expect(page.getByText("Order confirmed")).toBeVisible();
```

GOOD:

```ts
await expect(page.getByText("Order confirmed")).toBeVisible();
```

2) Fragile CSS selector chains

BAD:

```ts
await page.locator(".app > div:nth-child(2) .cta.primary").click();
```

GOOD:

```ts
await page.getByRole("button", { name: "Continue" }).click();
```

3) Shared mutable data across tests

BAD:

```ts
const sharedEmail = "e2e-user@example.com";
test("profile update", async ({ page }) => { /* mutates same account */ });
```

GOOD:

```ts
const email = `e2e-${test.info().parallelIndex}-${Date.now()}@example.com`;
test("profile update", async ({ page }) => { /* isolated data per test */ });
```

4) Guessing env values or credentials

BAD:

```ts
await page.goto("http://staging");
await page.fill("#email", "fake-user@test.com");
await page.fill("#password", "password123");
```

GOOD:

```ts
test.skip(!process.env.E2E_BASE_URL || !process.env.E2E_USER, "explicit TODOs until config exists");
await page.goto(process.env.E2E_BASE_URL!);
```

5) Silently serializing entire suite

BAD:

test.describe.configure({ mode: "serial" });
GOOD:
test.describe("checkout funnel", () => {
  test.describe.configure({ mode: "serial" }); // justified by irreversible payment sandbox state
});

6) Repeating login instead of storageState

BAD:

test.beforeEach(async ({ page }) => {
  await page.goto("/login");
  await page.fill("#email", process.env.E2E_USER!);
});
GOOD:
test.use({ storageState: "playwright/.auth/user.json" });

7) Pseudo-runnable scaffold without test.skip

BAD:

```ts
test("checkout", async ({ page }) => {
  await page.goto(process.env.E2E_BASE_URL!);
});
```

GOOD:

```ts
test.skip(!process.env.E2E_BASE_URL, "missing base URL");
// TODO: wire payment sandbox account before enabling this journey
```

Agent Browser Bridge

Use Agent Browser to discover or reproduce, then convert findings into durable code.

Required bridge steps:

  1. capture the environment and entry URL
  2. record the exact command sequence
  3. save milestone screenshots
  4. note the selectors or semantic targets that proved stable
  5. translate the validated flow into Playwright assertions and helpers

Read references/agent-browser-workflows.md when using Agent Browser.

Flaky Test Policy

See references/checklists.md §Flaky Triage Template for the complete template.

Key rules:

  • A test is flaky only with non-deterministic behavior under unchanged code and environment.
  • Required sequence: reproduce (--repeat-each=N or -count=N) → classify root cause → fix → quarantine with deadline.
  • Root cause categories: selector instability, async race, test-data coupling, network instability, environment drift, application defect.
  • Quarantine only with an owner, tracking issue, and removal deadline.

CI Strategy

For PR automation, separate Blocking critical journeys from broader nightly coverage:

  • Blocking PR gate:
      • run playwright install --with-deps chromium during setup when browsers are not pre-baked
      • inject secrets for the base URL, auth state bootstrap, and sandbox-only credentials
      • upload trace, screenshot, video, and HTML report artifacts on failure
      • keep retries and timeout values explicit in config and CI job output
  • Nightly / extended lane:
      • run the broader browser matrix, accessibility sweeps, and visual regression
      • increase retries only for known infra volatility, not to hide product bugs
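The "explicit retries and artifact policy" point can be expressed directly in config. A minimal sketch of a playwright.config.ts, with values that are illustrative rather than prescriptive:

```typescript
// playwright.config.ts — illustrative values; align with your CI lanes.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  retries: process.env.CI ? 2 : 0,          // explicit, visible in CI job output
  use: {
    trace: "on-first-retry",                // trace artifact only when a retry occurs
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
  reporter: [["html", { open: "never" }]],  // uploaded by a CI artifact step on failure
});
```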

Output Contract

For any E2E task, return:

  1. Task type
  2. Runner choice
  3. Environment gate
  4. Config/dependency status
  5. Covered journey or Failure under triage
  6. Executed commands
  7. Execution status
  8. Artifacts
  9. Next actions

If code was generated, also include:

  • files created or updated
  • skip conditions or TODO markers if scaffolding only

Machine-Readable Summary (JSON)

When the output will be consumed by CI or downstream tooling, append:

```json
{
  "task_type": "new_journey_coverage",
  "runner": "playwright",
  "environment": "local",
  "execution_status": "pass",
  "tests_passed": 3,
  "tests_failed": 0,
  "tests_skipped": 0,
  "artifacts": ["playwright-report/index.html", "test-results/"],
  "scorecard": { "critical": "PASS", "standard": "5/6", "hygiene": "4/4" },
  "blockers": [],
  "next_actions": ["add password-reset edge case"]
}
```

Quality Scorecard

For non-Playwright runners (Go HTTP, Python requests, etc.), mark Playwright-specific items as N/A. Count only applicable items when computing pass rates.

Critical (any FAIL → overall FAIL)

| # | Item | PASS rule |
| --- | --- | --- |
| C1 | No unconditional waitForTimeout in assertions | Zero instances outside diagnostic comments |
| C2 | Data isolation explicit | Each test owns its data or has deterministic cleanup |
| C3 | No guessed secrets or URLs | All external values from env/config with skip guard |
| C4 | All 5 mandatory gates addressed | Configuration, Environment, Execution Integrity, Stability, Side-Effect |

Standard (≥ 4/6 PASS)

| # | Item | PASS rule |
| --- | --- | --- |
| S1 | Selectors use getByRole/getByLabel/getByTestId | ≥ 90% of interactions use accessible selectors |
| S2 | Auth strategy explicit | storageState reuse or justified in-test login |
| S3 | Assertions after major interactions | Every user-visible state change has an expect |
| S4 | Artifact policy configured | trace, screenshot, video settings present |
| S5 | Serial vs parallel justified | Serial only with documented reason |
| S6 | Mock boundaries documented | Each mocked dependency has rationale |

Hygiene (≥ 3/4 PASS)

| # | Item | PASS rule |
| --- | --- | --- |
| H1 | Reusable fixtures/helpers | Shared flows extracted, not copy-pasted |
| H2 | Descriptive test names | Name describes user journey, not implementation |
| H3 | CI strategy present | Blocking gate vs nightly split documented |
| H4 | Repeat-run validation | Critical paths validated with --repeat-each |