Introducing the playwright-pro agent skill

Picture the test suite you actually want. A hundred tests deep, and it still runs fast. A new starter can open the folder and instantly see where everything lives. CI comes back green in minutes, not coffee breaks, and when something does break, the failure is real, not flake. That suite is not a fantasy. It is mostly a question of architecture, and architecture is exactly the thing nobody remembers to design until it is too late.

Here is the uncomfortable truth about Playwright: your tests are probably fine. It is the structure around them that quietly rots. Files pile up in one folder. Every test logs in from scratch. Config gets copy-pasted until the environments silently disagree. Retries get cranked up to bury flakiness instead of fixing it. None of that is a test-writing problem, and no amount of writing better tests will save you from it.

That is the gap the playwright-pro skill exists to close. It is prescriptive about the boring, load-bearing parts (layout, config, fixtures, auth state, and CI execution) because those are the parts that decide whether your suite still makes sense a year from now.

The 11pm Problem You Already Recognise

Writing your first Playwright test is a joy. Keeping a hundred of them fast, isolated, and readable is a slow-motion nightmare, and it always shows up in the same places:

  • every test logs in from scratch instead of reusing saved auth state
  • specs, page objects, and fixtures all live in one flat folder
  • retries are cranked up to hide flakiness instead of fixing it
  • CI runs everything on one worker and takes far too long
  • config is copy-pasted between environments until it quietly diverges

If reading that list made you wince, you already know the feeling: 11pm, a red CI run, and a suite that fights you instead of helping. These are architecture problems, and they are exactly what this skill targets.

How It Actually Works

The skill starts by scanning the project: it reads package.json and checks whether a playwright.config.ts file and a tests/ directory already exist. From there it branches into one of three modes:

  • New project. No config, no tests. It scaffolds the full recommended structure: config, directory layout, auth setup, and .gitignore entries.
  • Existing project. Config or tests already present. It runs an evidence-driven scorecard against the current suite and reports back, without changing anything unless you ask.
  • Targeted request. You name one concern, such as “add CI sharding” or “wire up fixtures”, and it goes straight to that without touching the rest.
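The branching above can be pictured as a small decision function. This is only a sketch: the names ProjectScan, Mode, and chooseMode are hypothetical, and the precedence (a named concern wins over everything else) is an assumption rather than documented behaviour of the skill.

```typescript
// Hypothetical names for illustration only; not the skill's real interface.
type Mode = "new-project" | "existing-project" | "targeted-request";

interface ProjectScan {
  hasConfig: boolean;       // a playwright.config.ts file was found
  hasTestsDir: boolean;     // a tests/ directory was found
  targetedConcern?: string; // e.g. "add CI sharding", if the user named one
}

function chooseMode(scan: ProjectScan): Mode {
  // Assumption: a named concern short-circuits the other two modes.
  if (scan.targetedConcern !== undefined && scan.targetedConcern.trim() !== "") {
    return "targeted-request";
  }
  // Config or tests already present: audit instead of scaffolding.
  if (scan.hasConfig || scan.hasTestsDir) {
    return "existing-project";
  }
  // Nothing there yet: scaffold the full structure.
  return "new-project";
}
```

For example, a repository with a tests/ directory but no config would land in the existing-project mode and get the scorecard, not a fresh scaffold.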

And here is the bit that earns trust: it asks before overwriting an existing config and only creates the files a project is actually missing. It does not barge in and regenerate work that already looks correct. Your good decisions stay yours.

You Point, It Builds

You do not drive it file by file. You point it at a project and state your intent:

  • “Set up Playwright for this app” triggers a full scaffold with a scalable tests/ layout, an auth setup project, and a sensible multi-browser config.
  • “Audit my Playwright suite” runs the architecture scorecard and hands back a graded review with concrete findings.
  • “Split the run across CI shards” or “add storage-state auth” jumps straight to that one job.
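As a rough illustration of what "split the run across CI shards" comes down to, Playwright's CLI accepts a --shard=current/total flag with a 1-based index. The helper below, shardArgs, is a hypothetical name and not something the skill is known to generate; it only shows how a CI job might build the arguments for one shard.

```typescript
// Hypothetical helper: builds the CLI arguments for one shard of a run.
// Playwright's --shard flag is 1-based, e.g. --shard=2/4 is the second of four.
function shardArgs(shardIndex: number, shardTotal: number): string[] {
  if (!Number.isInteger(shardIndex) || !Number.isInteger(shardTotal)) {
    throw new RangeError("shard index and total must be integers");
  }
  if (shardTotal < 1 || shardIndex < 1 || shardIndex > shardTotal) {
    throw new RangeError(`invalid shard ${shardIndex}/${shardTotal}`);
  }
  return ["playwright", "test", `--shard=${shardIndex}/${shardTotal}`];
}
```

A CI matrix with four jobs would call this with indices 1 to 4 and pass the result to npx, so each job runs roughly a quarter of the suite.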

The output is real files and real config, not a cheerful suggestion to go and read the docs. After scaffolding it stops, lists what it created, and asks whether anything needs adjusting before you build on top of it.

What You Actually Walk Away With

Most of these are the kind of wins you feel later, on the day the suite would otherwise have started hurting:

  • A layout that scales. Feature-scoped tests, shared page objects, a fixture barrel, and separate folders for e2e, API, visual, and public-surface tests. Six months from now, you still know where things go.
  • Auth state done once. A setup project saves storage state so the rest of the suite skips the login flow entirely. This is often the single biggest speed win, and you get it for free.
  • CI-ready execution. Parallelism, sharding, retry policy, and trace/video artifacts configured deliberately instead of guessed at.
  • Retries that do not lie. A policy that survives real flakiness without masking genuine failures, so a green run means the suite actually passed.
  • An honest audit. For existing suites, a scorecard that ties every finding back to a file or config line rather than vague stylistic judgement. You can act on it, and you can trust it.
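To make the auth, CI, and retry points concrete, here is a minimal playwright.config.ts sketch. It is not the file the skill generates. The directory names, the playwright/.auth/user.json path, the worker count, and the retry numbers are all assumptions chosen for illustration; the config options themselves (projects, dependencies, storageState, retries, workers, trace, video) are standard Playwright settings.

```typescript
import { defineConfig, devices } from "@playwright/test";

// Assumed location for the saved login state; keep it out of version control.
const authFile = "playwright/.auth/user.json";

export default defineConfig({
  testDir: "./tests",
  fullyParallel: true,
  // Fail the CI run if a stray test.only was committed.
  forbidOnly: !!process.env.CI,
  // Assumed policy: a small retry budget on CI, none locally,
  // so flaky tests surface during development instead of being hidden.
  retries: process.env.CI ? 2 : 0,
  // Assumed worker count for CI; undefined lets Playwright pick locally.
  workers: process.env.CI ? 4 : undefined,
  use: {
    // Collect a trace only when a test is retried, video only on failure.
    trace: "on-first-retry",
    video: "retain-on-failure",
  },
  projects: [
    // Runs first: a setup file logs in once and saves storage state to authFile.
    { name: "setup", testMatch: /.*\.setup\.ts/ },
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"], storageState: authFile },
      dependencies: ["setup"],
    },
    {
      name: "firefox",
      use: { ...devices["Desktop Firefox"], storageState: authFile },
      dependencies: ["setup"],
    },
  ],
});
```

The design choice worth noticing is the dependencies entry: the browser projects wait for the setup project, then start every test already signed in by loading the saved storage state.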

Where It Really Pays Off

The skill earns its keep at the two ends of a suite’s life. At the start, it prevents the structural debt that is agony to unwind once fifty tests already depend on the wrong shape. Later, it hands you a way to inspect a suite you inherited and say precisely what is weak, whether that is slow auth, no sharding, or config drift, with evidence attached instead of a shrug.
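One way to picture "with evidence attached" is a check that returns findings tied to a specific file and setting. The sketch below is purely illustrative: SuiteConfig, Finding, and auditConfig are hypothetical names, the thresholds are assumptions, and none of this reflects the skill's real scorecard format.

```typescript
// Hypothetical shapes for illustration; not the skill's actual scorecard.
interface SuiteConfig {
  file: string;                // where the setting was read from
  retries: number;
  workers: number | undefined; // undefined means "let Playwright decide"
  usesStorageState: boolean;
}

interface Finding {
  concern: string;
  evidence: string; // points back at a file and setting, not a vague opinion
}

function auditConfig(config: SuiteConfig): Finding[] {
  const findings: Finding[] = [];
  // Assumed threshold: more than two retries suggests flakiness is being masked.
  if (config.retries > 2) {
    findings.push({
      concern: "retries may be hiding flakiness",
      evidence: `${config.file}: retries = ${config.retries}`,
    });
  }
  if (config.workers === 1) {
    findings.push({
      concern: "run is not parallelised",
      evidence: `${config.file}: workers = 1`,
    });
  }
  if (!config.usesStorageState) {
    findings.push({
      concern: "slow auth: every test logs in from scratch",
      evidence: `${config.file}: no storageState configured`,
    });
  }
  return findings;
}
```

Each finding carries the file it came from, which is the difference between a review you can act on and a stylistic shrug.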

The Honest Catch

Fair is fair, so here is what it will not do: it will not write your individual tests for you, and it will not turn a bad test into a good one.

That boundary is the point, not a weakness. It fixes the layer you cannot easily fix later and leaves the craft of writing individual tests where it belongs, with you.

Bottom Line

The playwright-pro skill is at its best when the question is not “how do I write this test” but “how should this whole suite be shaped so it stays fast and maintainable”. It scaffolds that shape for new projects and grades it for the ones you have already got.

It will not turn a bad test into a good one. What it will do is make sure the suite around your tests is designed on purpose, so all the effort you pour into writing them does not get buried under a structure that cannot scale.

So try it on the suite you already dread. Point the playwright-pro skill at your messiest project, run the audit, and read the scorecard. Worst case, you learn exactly where the bodies are buried. Best case, you get the fast, sane, hundred-test suite you actually wanted.
