How to Write a CLAUDE.md (and AGENTS.md) for Test Automation

The first time I let a coding agent write a test for me, it handed back something that broke nearly every rule my suite has. A CSS-class locator, the kind I had banned years ago. A cy.wait(3000) dropped on a race condition. No test ID, the selectors sitting inline in the spec, and an assertion that the page loaded and nothing more.

It was not being careless. It simply did not know any of my conventions, because I had never told it. A coding agent writes whatever the codebase nudges it toward, and if your standards live in your head, it cannot follow them.

So I stopped re-explaining myself every session. I wrote my standards down once, in the one file the agent reads before it writes a line. For Claude that file is CLAUDE.md. Other agents read AGENTS.md, the cross-tool open standard that Cursor, Codex, and Copilot look for. Same idea, different name, so wherever I say CLAUDE.md, read it as “the file your agent loads.”

That one file is the difference between generated tests I have to rebuild and tests I can actually review. Below is the whole thing I work from, ready to copy. But the file is the easy half. The part worth your time is how you build it, because you do not write this file by hand. You build it with the agent.

Two rules for the file itself

Before the how, two rules decide whether this file does anything at all:

Keep it tight. The agent loads the whole file at the start of every session, before you ask for anything, so every line costs context. A long, rambling file buries the real rules in noise and the agent follows them less. The official guidance is to keep it under roughly 200 lines. My first version was too thin and the tests stayed generic; the next was too long and the agent missed rules in the middle. Tight is the target.
Make every rule checkable. “Write good locators” tells an agent nothing. “Use role, aria-label, or a data-* hook, never a CSS class or an nth-child chain” tells it exactly what passes and what fails. Write rules the agent can hold its own work against.
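
To show what "checkable" means in practice, here is a minimal sketch of the locator rule written as a function. `checkSelector` and its patterns are my own illustration, not part of any framework, and it treats any nth-child as a failure, which is stricter than "chain":

```typescript
type SelectorVerdict = { passes: boolean; reason: string };

// Hypothetical helper: holds one selector string against the rule
// "use role, aria-label, or a data-* hook, never a CSS class or an nth-child chain".
function checkSelector(selector: string): SelectorVerdict {
  // Drop attribute selectors first so a dot inside [data-test="a.b"]
  // is not mistaken for a CSS class.
  const outsideAttributes = selector.replace(/\[[^\]]*\]/g, "");

  if (/:nth-child\(/.test(outsideAttributes)) {
    return { passes: false, reason: "nth-child" };
  }
  if (/\.[A-Za-z_-]/.test(outsideAttributes)) {
    return { passes: false, reason: "CSS class" };
  }
  if (/\[(role|aria-label|data-[\w-]+)(=|\]|[~|^$*]=)/.test(selector)) {
    return { passes: true, reason: "stable hook" };
  }
  return { passes: false, reason: "no role, aria-label, or data-* hook" };
}
```

A vague rule cannot be written this way at all, which is the point: if you cannot imagine the function, the agent cannot apply the rule either.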

One thing matters even more than the file: how much real context the agent can see. The file sets your conventions, but the agent still needs your code, the spec, and the running app to test what is really built, not what it assumes. The companion to this piece, “Give your AI real context for QA,” covers that, and it is worth setting up first.

How I actually build it

Most write-ups hand you a blank file and tell you to fill it in. I never do that. The agent that is going to read this file already knows what a good rules file looks like, because it was trained on thousands of them. My job is to bring the judgment of what good looks like, then direct it to draft, prove, and tune. None of this is Claude-only; the same loop runs in Cursor, Codex, or Copilot.

The order I actually work in is five steps:

1. Talk it out with a conversational AI first.
2. Mine your corrections.
3. Have the agent read your suite and draft the file.
4. Make it prove the file by writing and running real tests.
5. Fold the fixes back in as new rules.

Here is each step, with the exact prompt I use. Copy them.

1. Talk it out before you touch the repo. Before I create a file, I think the whole thing through with a conversational assistant, usually ChatGPT, on what I want the suite to enforce. I talk, it pushes back, and I get clear on what the file needs to say. This step writes nothing. It just gets me specific, so the agent is not guessing later.

2. Mine your corrections. The things you keep retyping are your highest-value rules, and the agent already remembers them:

Based on everything you know about how I work and every correction I have given you, list the things I correct you on the most. Group them so I can turn each into a checkable rule.

That surfaces the rules that live in your head as “not this again,” the ones you would never have thought to write down.

3. Have it read your suite, then draft. Point the coding agent at the real code and make it work out your conventions before it proposes a single rule:

Read this test suite before you write anything. Open the specs, the page objects, the drivers, the fixtures, and the config, and work out the conventions already in use: how locators are written, how interactions are wrapped, where selectors and test data live, how tests are named and isolated, and where the suite contradicts itself. Then draft a CLAUDE.md at the repo root that turns those conventions into rules an agent can check itself against, each with a one-line example of what passes and what fails. Keep it under 200 lines. Show me the draft and the biggest inconsistencies you found, and do not change a test yet.

The draft comes back specific to your suite, not a generic template, because it read your real code.

4. Make it prove the file. A draft is a guess until you watch it generate against the rules, so make it use the file the moment it exists:

Using only the rules in that CLAUDE.md, write three tests and run them: one simple case, one hard case with real setup, and one edge case I keep meaning to cover. Every test must assert what the user actually sees, not just that the page loaded. Let each test fail for one clear reason; if a test checks two behaviors, split it. Run them until they pass, and show me the output.

Two failures show up here constantly: the test that does all the work and then asserts almost nothing, and the test that quietly checks three things at once so you cannot tell which one broke. Name them both. It is not finished until you have watched it pass. A generated test nobody ran is a guess in a green wrapper.

5. Fold the fixes back in. When you give a correction, have the agent add it to the file as a new rule. The correction you give once is now enforced on every test the agent writes after it:

Me → Claude Code

Based on every correction I gave you this session, add the general ones to CLAUDE.md as checkable rules, each with a one-line example of what passes and what fails. Keep it under 200 lines; if it would push past that, tell me what to move to a skill or a path-scoped rule.

Claude Code

Added four rules from this session: drivers only, no fixed sleeps, page-object selectors, stable test IDs. Each has a pass/fail example. The file is now 168 lines. Your "how to scaffold a new page object" correction is a multi-step procedure, so I left it out and flagged it as a skill instead. Want me to draft that skill?

That is the loop that pays off: the corrections you make this week become the defaults next week.

The whole file: copy this

Here is what that loop produced for me, genericized. It is drawn from a Cypress suite but written so the principles carry to Playwright or WebdriverIO, and it is the complete thing in one block: how to run, the layout, the rules, and the review fence at the bottom. Drop it at your repo root, edit the bracketed parts, and commit it.

# Test suite standards (read before writing or changing any test)

## How to run
- Install: `npm ci`
- Run all: `npx cypress run`        # swap for your runner
- Run one: `npx cypress run --spec [path]`
- Lint: `npm run lint`

## Layout
- Specs in `cypress/e2e/`, page objects in `cypress/pages/`,
  user-action drivers in `cypress/support/drivers/`.

## Rules (every one is checkable)
- Driver-only interactions. Specs never call a raw framework command
  (`cy.get`, `page.click`). Every interaction goes through a named driver
  that is a USER ACTION (`openFirstRowDrawer()`), not a primitive
  (`clickButton(selector)`). If no driver exists, add one. Do not inline.
- Page Object Model. All selectors, visible text, and URL params live in
  page object files, never inline in a spec. One place to change per screen.
- Self-documenting specs. No inline comments. The test title and the driver
  names say what is happening; if you reach for a comment, the name is wrong.
- Stable selectors only. Use role, aria-label, or a data-* hook. Never a
  CSS class, a DOM hierarchy path, or an nth-child chain.
- No arbitrary waits. Never sleep for a fixed time. Wait on a real signal:
  an intercepted API response, an element becoming visible, or a content
  assertion. Intercept the call BEFORE the action that triggers it.
- Every test has a stable ID. Each test starts with a non-empty ID tag
  (the existing case ID, or a feature prefix + zero-padded number).
- Test isolation. Every test passes in any order. Set up all state in
  `beforeEach`; cache login for speed but navigate fresh. No test reads
  another test's leftover state.
- Resilient to live data. No exact-count or specific-name assertions
  against real environments. Use "at least" assertions and regex; guard a
  possibly-empty table with a synchronous DOM check, not a retrying query.
- Assert what the user sees. Validate visible text, element presence, and
  URL state, not implementation details. Prefer positive assertions;
  reserve negatives for access-denied and permission tests.

## Never generate (send it back if you see it in a spec)
- a fixed-time sleep of any kind (`cy.wait(3000)`)
- a raw framework command in a spec (use a driver)
- an inline selector or inline text (it belongs in a page object)
- a brittle selector (long nth-child, positional, or generated class)
- hardcoded data that rots when the environment resets
- a test that depends on another test's state
- an exact-count assertion against live data
- an empty test ID

That is the entire file. The top half is the rules, the things the agent should do. The Never generate block at the bottom is the review fence, the things that should never appear in a generated test. I keep the fence as an explicit list because it doubles as my review checklist: when a spec comes back, I scan for those eight things first.

That fence is not theoretical for me. I once spent real effort turning a suite people had stopped trusting back into one they did, and the single biggest fix was tearing out the fixed sleeps and waiting on a real signal instead: the actual network response, the element appearing, the call intercepted before the action that triggers it. A fixed-time sleep is the first line of that list because it is the first thing I had to remove by hand, hundreds of times. Putting it in the fence means the agent never writes the flake I would only catch later.
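
The ordering that replaced those sleeps can be sketched as one driver. Everything named here is hypothetical (`profilePage`, `saveProfile`, the route, the selector), and the `cy` object is a stand-in I declare only so the sketch runs outside Cypress; in a real suite you would delete the stub and use the Cypress global, whose `intercept`, `as`, `get`, `click`, and `wait` commands it imitates:

```typescript
// Stand-in for the Cypress `cy` global so this sketch runs on its own.
// It only records the order of calls. Not needed inside a real Cypress suite.
const recordedCalls: string[] = [];
const cy = {
  intercept(method: string, url: string) {
    return {
      as(alias: string): void {
        recordedCalls.push(`intercept ${method} ${url} as ${alias}`);
      },
    };
  },
  get(selector: string) {
    return {
      click(): void {
        recordedCalls.push(`click ${selector}`);
      },
    };
  },
  wait(alias: string): void {
    recordedCalls.push(`wait ${alias}`);
  },
};

// Hypothetical page object: the selector and the route live here, not in the spec.
const profilePage = {
  saveButton: '[data-test="save-profile"]',
  saveRoute: { method: "PUT", url: "/api/profile", alias: "saveProfile" },
};

// Hypothetical driver: one user action, no fixed sleep anywhere.
function saveProfile(): void {
  const { method, url, alias } = profilePage.saveRoute;
  cy.intercept(method, url).as(alias); // register the intercept BEFORE the click
  cy.get(profilePage.saveButton).click(); // the action that triggers the call
  cy.wait(`@${alias}`); // wait on the response itself, not on a timer
}
```

The spec then calls `saveProfile()` and asserts what the user sees. If the intercept were registered after the click, the request could finish before anything was listening, which is exactly the race a fixed sleep papers over.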

Notice what is not in the file: no step-by-step walkthrough of setting up a page object, no API-test-only convention. Those are procedures and corner cases, and they belong in their own homes so this file stays short enough that the agent reads all of it. A multi-step procedure you run on demand is better off as a skill, which gets a piece of its own: “How to build a Claude skill for QA.”

If the full file is too much, start with six lines

You do not have to begin with the whole thing. If you want the shortest possible starting point, drop this at your repo root and grow it from there:

# AGENTS.md - how AI should work in this test suite
- Test runner is Playwright. Specs live in `tests/`, named `verify-<behaviour>.spec.ts`.
- Locate elements by role and visible text, never by auto-generated class names.
- Use the seeded accounts in `tests/fixtures/`. Never use real customer data.
- A test must fail for one clear reason. If it can fail for two, split it.
- Before adding a test, check `tests/` for one to extend instead.
- Run `npx playwright test --repeat-each=5` on any new test before opening a PR.

Six lines, every one checkable. Start there, then layer in the rules and the fence as your suite earns them.

One gotcha: Claude does not read AGENTS.md

This is the one people get wrong. Claude Code reads CLAUDE.md. It does not read AGENTS.md. So if your repo already has an AGENTS.md, do not copy its contents into a CLAUDE.md and let the two drift apart: you will fix a rule in one and forget the other. Instead, make the first line of your CLAUDE.md an import of the shared file, @AGENTS.md, then add anything Claude-specific below it. Or, if you have nothing Claude-specific to add, symlink the two so there is one real file: ln -s AGENTS.md CLAUDE.md. One more bit of good news: run /init in a repo that already has an AGENTS.md and Claude Code folds it into the CLAUDE.md it generates, so you are not starting from scratch.
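
If you go the import route, the drift risk shrinks to one line: someone deleting or burying the import. A minimal sketch of a check for that, assuming you read the file contents yourself and pass them in as a string. `importsSharedRules` is a hypothetical helper, not something Claude Code ships:

```typescript
// Hypothetical check: does CLAUDE.md pull in the shared rules before anything else?
// Leading blank lines are tolerated; any other content ahead of the import is not.
function importsSharedRules(claudeMd: string): boolean {
  const firstContentLine = claudeMd
    .split(/\r?\n/)
    .find((line) => line.trim() !== "");
  return firstContentLine?.trim() === "@AGENTS.md";
}
```

It would return true for a file that opens with @AGENTS.md and false for one where the import sits under a heading or is missing.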

The honest limit: it is context, not a gate

A rules file shapes behaviour. It does not enforce anything. It is context the agent reads and is strongly nudged by, not a wall it cannot climb over. Most of the time the nudge is enough, and a good file changes the default output dramatically. But “strongly nudged” is not “blocked,” so do not treat it as a guarantee.

For the non-negotiable few (no fixed waits, no XPath, every test isolated), back the rule with a hook, a lint rule, or a CI check so a spec that violates it cannot merge. A hook is a command the harness runs at a fixed point in the lifecycle, like before every commit, whether the agent feels like it or not. The file says “please don’t”; the gate says “you can’t.” A rule you are not willing to let block a merge was never really a rule. It was a preference.
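
A gate does not have to be elaborate. Here is a minimal sketch of the text-level part, assuming Cypress-style specs passed in as strings. The names and patterns are my own illustration: `cy.xpath` is the command added by the cypress-xpath plugin, and the quoted-`//` pattern is a rough guess that will also flag protocol-relative URLs. Test isolation is not visible in source text, so it needs a different kind of check and is left out:

```typescript
type Violation = { line: number; rule: string; text: string };

// Illustrative patterns for non-negotiables that can be seen in source text.
const gateRules: { rule: string; pattern: RegExp }[] = [
  { rule: "no fixed waits", pattern: /\bcy\.wait\(\s*\d+\s*\)/ },
  { rule: "no XPath", pattern: /\bcy\.xpath\(|["'`]\/\// },
  { rule: "no nth-child selectors", pattern: /:nth-child\(/ },
];

// Scan one spec's source, line by line, and collect every rule it breaks.
function findViolations(specSource: string): Violation[] {
  const violations: Violation[] = [];
  specSource.split(/\r?\n/).forEach((text, index) => {
    for (const { rule, pattern } of gateRules) {
      if (pattern.test(text)) {
        violations.push({ line: index + 1, rule, text: text.trim() });
      }
    }
  });
  return violations;
}

// A pre-commit hook or CI step would exit with this code: non-zero blocks the merge.
function gateExitCode(violations: Violation[]): number {
  return violations.length === 0 ? 0 : 1;
}
```

An aliased wait such as cy.wait("@saveProfile") passes, because only a numeric argument counts as a fixed sleep. In most suites a proper lint rule is the sturdier home for this; the sketch only shows how little it takes to turn a preference into something that fails a build.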

Start this week

Talk it out, then mine your corrections. Shape what the suite should enforce with a conversational AI first, then ask your coding agent to list what you correct it on the most.
Have the agent draft and prove it. Point it at your suite, have it draft the file from your real conventions, then make it write a simple, a hard, and an edge-case test and run them until they pass with real assertions.
Handle AGENTS.md. If the repo already has one, make @AGENTS.md the first line of your CLAUDE.md, or symlink the two so they never drift.
Commit it, then evolve it. It is a team-shared file, so every teammate and every agent inherits the same rules. Each time you catch yourself retyping a correction, add it as a checkable rule and keep the file tight.

Do that and the next test the agent writes comes back in your conventions, with your locators, your drivers, your IDs, ready to review instead of rebuild. The file does the onboarding you used to do by hand, every session, for free. The ready-to-paste AI test automation standards file will save you the typing.

Julia Pottinger
