How to Write a CLAUDE.md (and AGENTS.md) for Test Automation

Point a coding agent at your test suite, ask for a new test, and watch what comes back. A CSS-class locator, the kind your team banned two years ago. A cy.wait(3000) to make a race condition go away. A test with no ID, inline selectors in the spec, and an assertion that checks only that the page loaded.

It ignored every convention your suite has. Not out of malice. It simply did not know, because nobody told it, and a coding agent writes whatever the codebase nudges it toward.

The fix is not to re-explain your conventions every session until you are tired. Write them down once, in a file the agent reads before it writes a line.

For Claude that file is CLAUDE.md; the cross-tool version other agents read is AGENTS.md, an open standard Codex, Cursor, and others use. Same idea, different names, and nearly everything here applies to both, so wherever I say CLAUDE.md, read it as “the file your agent loads.”

I work from a real one for a Cypress end-to-end suite. The gap between a thin file and a serious one is the gap between generated tests I have to rebuild and tests I can actually review.

I have a shorter take in “Using AI to generate tests and test data,” where the file is one step in a bigger loop. This is the deeper version, the one I work from when the file does the onboarding I used to do by hand.

First, one thing moves output quality more than this file does: how much real context the agent can see. A CLAUDE.md sets your conventions, but the agent still needs your actual code, the spec, the tickets, and the running app to test what is really built, not what it assumes.

Conventions and context are two halves of one job. “Give your AI real context for QA” is the companion to this piece, and it is worth setting up first.

What a CLAUDE.md is, and why it has to stay tight

A CLAUDE.md is a plain markdown file of persistent instructions. Claude Code loads it in full at the start of every session, before you ask for anything. That is the whole point and the whole constraint.

You own what goes in it, but owning it does not mean typing it from a blank page. The best way to build one is to have the agent draft and prove it while you bring the judgment, which I come back to below.

People mix this up with the other ways an agent gets context, so it is worth being precise:

| Mechanism | When it loads | Who writes it |
| --- | --- | --- |
| CLAUDE.md | In full, every session, at launch | You own it (the agent can draft it) |
| A skill | On demand, only when the task calls for it | You, as a procedure |
| Auto memory | When relevant, pulled in as needed | The agent, about itself |

Because the whole file loads every session, every line costs context. The official Claude Code docs say to keep it under roughly 200 lines, and that number is not arbitrary: a long, rambling file eats the context window, and worse, buries the real rules in noise so the agent follows them less.

My first version was too thin and the tests stayed generic; the next was too long and the agent started missing rules in the middle. Tight and specific is the target.

Specific is the other half of tight. “Write good locators” tells an agent nothing. “Use role, aria-label, or a data-* hook for locators, never a CSS class or an nth-child chain” tells it exactly what passes and what fails. Write rules an agent can check itself against.
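
The line budget is itself a checkable rule. Here is a minimal sketch of a checker, assuming you pass in the file's text yourself; `checkLineBudget` and `BudgetReport` are hypothetical names of my own, and treating the 200-line figure as an inclusive limit is an assumption, since the guidance is only "roughly 200".

```typescript
// Hypothetical helper: checks an always-loaded rules file against a line budget.
interface BudgetReport {
  lines: number;
  budget: number;
  withinBudget: boolean;
}

function checkLineBudget(fileText: string, budget: number = 200): BudgetReport {
  // Trailing newlines (and trailing blank lines) are not counted as lines.
  const trimmed = fileText.replace(/\n+$/, "");
  const lines = trimmed === "" ? 0 : trimmed.split("\n").length;
  // Assumption: the budget is inclusive, so exactly `budget` lines still passes.
  return { lines, budget, withinBudget: lines <= budget };
}

const sample =
  "# Test suite standards\n- No arbitrary waits.\n- Stable selectors only.\n";
console.log(checkLineBudget(sample)); // lines: 3, budget: 200, withinBudget: true
```

A check like this only measures length. Whether each line is specific enough to act on is still a judgment call.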

The four scopes, and where your rules belong

CLAUDE.md is not one file. It lives at four scopes, which Claude Code loads broad to specific, then concatenates.

The one closest to your working directory is read last, so on a genuine conflict the most local file wins. This matters the moment your personal preference disagrees with the team’s committed standard.

Loaded broad to specific, then concatenated. The file closest to your work wins.

  1. Managed policy: set by your organisation. Org-wide; loads first.
  2. User: ~/.claude/CLAUDE.md. Every project; just you.
  3. Project: ./CLAUDE.md, in version control. The whole team; your standards live here.
  4. Local: CLAUDE.local.md. Gitignored; just you; wins on conflict.

Nested files load on demand: a CLAUDE.md in a subfolder loads only when the agent reads files there.

Here is what goes where, for a test suite specifically:

  • Managed policy is org-wide and set above you. You will rarely touch it. It is for rules an organisation wants on every repo, like “never paste real customer data into an AI tool.”
  • User, at ~/.claude/CLAUDE.md, follows you across every project on your machine. Put your personal cross-project habits here: how you like commits worded, that you prefer getByRole over getByTestId when both fit. Teammates never see this file.
  • Project, at ./CLAUDE.md or ./.claude/CLAUDE.md, is checked into version control and shared with the whole team. This is the one that matters most for a test suite. Your locator policy, driver pattern, and “never do this” fence all live here, committed, so every teammate and every agent that clones the repo gets the same rules. When I say “write a CLAUDE.md for your suite,” I mean this file.
  • Local, at ./CLAUDE.local.md, is gitignored and personal to you in this one repo. Good for a scratch note (“the staging seed script is flaky on my box, rerun it”) nobody else needs.

A fifth, quieter scope: a CLAUDE.md inside a subdirectory loads on demand only when the agent reads files in that folder. If your API tests have different conventions from your UI tests, a small CLAUDE.md in the API test folder keeps those rules out of the global file and out of context until they are needed.
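
To make "broad to specific, then concatenated" concrete, here is a toy model of the ordering. It is an illustration only, not Claude Code's implementation; `RulesFile`, `LOAD_ORDER`, and `assembleContext` are hypothetical names, and the file contents are made up.

```typescript
// Illustrative model only: this is not how Claude Code is implemented.
type Scope = "managed" | "user" | "project" | "local";

interface RulesFile {
  scope: Scope;
  text: string;
}

// Broad to specific, matching the order described above.
const LOAD_ORDER: Scope[] = ["managed", "user", "project", "local"];

function assembleContext(files: RulesFile[]): string {
  return files
    .slice() // copy first so the caller's array is not reordered
    .sort((a, b) => LOAD_ORDER.indexOf(a.scope) - LOAD_ORDER.indexOf(b.scope))
    .map((file) => file.text)
    .join("\n\n");
}

const context = assembleContext([
  { scope: "local", text: "Rerun the staging seed script if it fails." },
  { scope: "project", text: "Use role, aria-label, or a data-* hook." },
  { scope: "managed", text: "Never paste real customer data into an AI tool." },
]);
console.log(context);
```

Nothing is dropped in this model: every file ends up in the context, and the most local text simply comes last.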

What to put in it: facts, not procedures

The rule of thumb is simple. CLAUDE.md holds the facts every session needs and cannot infer from the code alone: build and test commands, project layout, naming and structure conventions, the “always do this” and “never do that” rules.

What does not belong is any multi-step procedure or anything that only matters in one corner of the codebase; that has a better home, which I will get to.

For a test suite, the facts every session needs are your standards. This is the genericized version of the file I work from, drawn from a Cypress E2E suite but written so the principles carry to Playwright or WebdriverIO. Drop it at your repo root, edit the bracketed parts, and commit.

# Test suite standards (read before writing or changing any test)

## How to run
- Install: `npm ci`
- Run all: `npx cypress run`        # swap for your runner
- Run one: `npx cypress run --spec [path]`
- Lint: `npm run lint`

## Layout
- Specs in `cypress/e2e/`, page objects in `cypress/pages/`,
  user-action drivers in `cypress/support/drivers/`.

## Rules (every one is checkable)
- Driver-only interactions. Specs never call a raw framework command
  (`cy.get`, `page.click`). Every interaction goes through a named driver
  that is a USER ACTION (`openFirstRowDrawer()`), not a primitive
  (`clickButton(selector)`). If no driver exists, add one. Do not inline.
- Page Object Model. All selectors, visible text, and URL params live in
  page object files, never inline in a spec. One place to change per screen.
- Stable selectors only. Use role, aria-label, or a data-* hook. Never a
  CSS class, a DOM hierarchy path, or an nth-child chain.
- No arbitrary waits. Never sleep for a fixed time. Wait on a real signal:
  an intercepted API response, an element becoming visible, or a content
  assertion. Intercept the call BEFORE the action that triggers it.
- Every test has a stable ID. Each test starts with a non-empty ID tag
  (the existing case ID, or a feature prefix + zero-padded number).
- Test isolation. Every test passes in any order. Set up all state in
  `beforeEach`; cache login for speed but navigate fresh. No test reads
  another test's leftover state.
- Resilient to live data. No exact-count or specific-name assertions
  against real environments. Use "at least" assertions and regex; guard a
  possibly-empty table with a synchronous DOM check, not a retrying query.
- Assert what the user sees. Validate visible text, element presence, and
  URL state, not implementation details. Prefer positive assertions;
  reserve negatives for access-denied and permission tests.

See @./docs/locator-standards.md for the full locator order.

Notice what is not in there: no step-by-step walkthrough of setting up a new page object, no API-test-only convention. Those are procedures and corner cases. They get their own homes so this file stays short enough that the agent reads all of it.
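
The driver and page-object rules are easier to hold an agent to when it can see their shape. What follows is a framework-agnostic sketch, not the code from my suite: the `Ui` interface is a hypothetical stand-in for Cypress or Playwright so the example runs on its own, and the selectors and `ordersPage` are invented for illustration.

```typescript
// Hypothetical stand-in for the framework, so the sketch runs on its own.
// In a real suite the driver would call Cypress or Playwright here.
interface Ui {
  clickFirst(selector: string): void; // click the first element matching selector
  expectVisible(selector: string): void; // wait on a real signal, not a sleep
}

// Page object: every selector for the screen lives here, never in a spec.
const ordersPage = {
  rowOpenButton: '[data-test="order-row-open"]',
  detailsDrawer: '[aria-label="Order details"]',
};

// Driver: named after the user action, not after a primitive like clickButton.
function openFirstRowDrawer(ui: Ui): void {
  ui.clickFirst(ordersPage.rowOpenButton);
  ui.expectVisible(ordersPage.detailsDrawer);
}

// A recording fake that logs calls instead of touching a browser.
function createRecordingUi(): { ui: Ui; log: string[] } {
  const log: string[] = [];
  const ui: Ui = {
    clickFirst: (selector) => {
      log.push(`clickFirst ${selector}`);
    },
    expectVisible: (selector) => {
      log.push(`expectVisible ${selector}`);
    },
  };
  return { ui, log };
}

// What a spec body reduces to: one driver call, no selectors, no raw commands.
const { ui, log } = createRecordingUi();
openFirstRowDrawer(ui);
console.log(log);
```

The spec never sees a selector, so a markup change is a one-line edit in the page object.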

The review fence: what an agent should never generate

The rules above tell the agent what to do. The fence tells it, and you, what should never appear in a generated test.

I keep it as an explicit list because it doubles as my review checklist: when a spec comes back, I scan for these. The fastest tell that an agent ignored the file is a raw command or a fixed sleep in a spec.

Put this block straight into the CLAUDE.md, under the rules:

## Never generate (send it back if you see it in a spec)
- a fixed-time sleep of any kind (`cy.wait(3000)`)
- a raw framework command in a spec (use a driver)
- an inline selector or inline text (it belongs in a page object)
- a brittle selector (long nth-child, positional, or generated class)
- hardcoded data that rots when the environment resets
- a test that depends on another test's state
- an exact-count assertion against live data
- an empty test ID

This is the part that changes your review. Without the fence, you re-derive every objection by hand on every run, catching the same eight things every time.

With it, the agent stops producing most of them, and your attention moves up to the questions that need a human: does this test prove what it claims, and would it go red if the feature broke?

That is the line I draw in the QA control layer for AI-assisted development, and it is why turning your repeated corrections into a committed file matters more than any single rule in it.

I keep this exact file ready to copy and download as a Cypress AI test automation standards resource, as both a CLAUDE.md and an AGENTS.md. Which brings me to the trap.

Imports, and the AGENTS.md gotcha people get wrong

Two things will save you from a bloated file and a duplicated one.

The first is imports. A CLAUDE.md can pull in another file with the @path/to/file syntax, the way the standards file above references @./docs/locator-standards.md. Relative and absolute paths both work, and imports chain up to five hops deep.

They organise; they do not save context, because they load at launch right along with the main file. Use them to keep one tidy top-level file that pulls in your longer locator doc, not to smuggle a thousand lines of standards in through the back door.
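
A toy model shows why imports do not shrink the bill. This is a sketch under loud assumptions: it is not Claude Code's resolver, it looks paths up verbatim in an in-memory map instead of resolving them on disk, and the hop limit is a parameter you pass in. `FileMap` and `totalLoadedLines` are hypothetical names.

```typescript
// Toy model of @path imports; not Claude Code's actual resolver.
// Keys are used verbatim as paths; no relative-path resolution is attempted.
type FileMap = Record<string, string>;

function totalLoadedLines(
  path: string,
  files: FileMap,
  maxHops: number,
  hop: number = 0
): number {
  const text = files[path];
  if (text === undefined) return 0; // unknown file: contributes nothing
  let total = text.split("\n").length;
  if (hop >= maxHops) return total; // hop limit reached: stop following imports
  // Match "@something.md" at the start of the text or after whitespace.
  const importPattern = /(?:^|\s)@(\S+\.md)/g;
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(text)) !== null) {
    total += totalLoadedLines(match[1], files, maxHops, hop + 1);
  }
  return total;
}

const files: FileMap = {
  "CLAUDE.md":
    "# Standards\nSee @./docs/locator-standards.md for the full locator order.",
  "./docs/locator-standards.md": "1. role\n2. aria-label\n3. data-* hook",
};

// Hop limit of 2 is chosen for the demo only.
console.log(totalLoadedLines("CLAUDE.md", files, 2)); // 5 (2 lines + 3 imported)
```

The two-line top-level file still costs five lines in this model, because the imported file is counted along with it.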

The second is the one I watch people get wrong constantly. Claude Code reads CLAUDE.md. It does not read AGENTS.md. AGENTS.md is the cross-tool open standard, read by more than twenty other agents; Cursor, Codex, and others look for it. Claude Code is not one of them.

So if your repo already has an AGENTS.md, do not copy its contents into a CLAUDE.md and let the two drift apart. You will fix a rule in one and forget the other, and three months later your tools disagree about your standards.

You have two clean ways to avoid the drift:

  • Import it. Make the very first line of your CLAUDE.md an import of the whole shared file, then add anything Claude-specific below it:
@AGENTS.md

## Claude Code notes
- When no driver exists for an action, add one in
  `cypress/support/drivers/` rather than inlining a raw command.
  • Symlink it. If you have nothing Claude-specific to add, point CLAUDE.md at AGENTS.md with a symlink, so there is one real file and one name for it: ln -s AGENTS.md CLAUDE.md.

One more piece of good news: run /init in a repo that already has an AGENTS.md and Claude Code folds it into the CLAUDE.md it generates, so you are not starting from scratch.

If you want the shortest possible starting point rather than the full Cypress file above, here is a starter AGENTS.md you can drop at your repo root and grow from:

# AGENTS.md - how AI should work in this test suite
- Test runner is Playwright. Specs live in `tests/`, named `verify-<behaviour>.spec.ts`.
- Locate elements by role and visible text, never by auto-generated class names.
- Use the seeded accounts in `tests/fixtures/`. Never use real customer data.
- A test must fail for one clear reason. If it can fail for two, split it.
- Before adding a test, check `tests/` for one to extend instead.
- Run `npx playwright test --repeat-each=5` on any new test before opening a PR.

Six lines, every one checkable. Start there, then layer in the scopes, the fence, and the imports covered above as your suite earns them.

When a rule should be a skill or a path-scoped rule instead

Not everything you want the agent to do belongs in CLAUDE.md, and stuffing it all in there is exactly how the file bloats past the point where it works. Two questions tell you when something should move out.

Is it a multi-step procedure? Then it is a skill. CLAUDE.md is for facts and standing rules, the things that shape every session. A procedure, like “generate a test plan from a requirements doc in our exact format, with an Open Questions section,” is a sequence of steps you run on demand, not a fact you want loaded all day.

A skill loads only when the task calls for it, so it costs nothing the rest of the time. I have a custom skill for exactly this, and I walk through building one in how to build a Claude skill for QA. The rule of thumb: if you would write it as numbered steps, it is probably a skill.

Does it only matter in one part of the codebase? Then it is a path-scoped rule. Put a markdown file in .claude/rules/ with a paths: glob in its frontmatter, and it loads only when the agent touches files matching that glob. So your API-test conventions can live in a rule scoped to cypress/e2e/api/** and stay out of context when the agent works on UI tests.

---
paths:
  - "cypress/e2e/api/**"
---
# API test rules
- Assert on status code and response schema, not on UI state.
- Seed and tear down through the API, never through the interface.
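
To see what a glob like that selects, here is a toy matcher. It handles only `*` and `**` and is written for illustration; how Claude Code actually matches `paths:` globs is not something this sketch claims to reproduce. `globToRegExp`, `ScopedRule`, and `rulesFor` are hypothetical names.

```typescript
// Toy glob matcher for illustration; it supports only "*" and "**".
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      pattern += ".*"; // "**" crosses directory boundaries
      i++; // skip the second "*"
    } else if (ch === "*") {
      pattern += "[^/]*"; // "*" stays inside one path segment
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`);
}

interface ScopedRule {
  name: string;
  paths: string[];
}

// Returns the names of the rules whose globs match the file being touched.
function rulesFor(filePath: string, rules: ScopedRule[]): string[] {
  return rules
    .filter((rule) => rule.paths.some((glob) => globToRegExp(glob).test(filePath)))
    .map((rule) => rule.name);
}

const rules: ScopedRule[] = [
  { name: "API test rules", paths: ["cypress/e2e/api/**"] },
];
console.log(rulesFor("cypress/e2e/api/users.cy.ts", rules)); // ["API test rules"]
console.log(rulesFor("cypress/e2e/ui/login.cy.ts", rules)); // []
```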

The principle behind both is the same: protect the context budget of your main CLAUDE.md. Every standing fact goes in it; every procedure becomes a skill; every corner-case convention becomes a path-scoped rule. That keeps the always-loaded file under 200 lines and still encodes everything your team knows.

A rule you want the agent to follow:

  • A standing fact every session needs → CLAUDE.md (loaded every session, under 200 lines)
  • A multi-step procedure, run on demand → a skill (loaded on demand, free when idle)
  • Only matters in one folder → a path-scoped rule (.claude/rules/, loads on matching paths)

You do not write this file by hand. You build it with the agent

Most write-ups hand you a blank CLAUDE.md and tell you to fill it in. I never do that. The agent that will read this file already knows what a good rules file looks like, because it was trained on thousands of them.

My job is not to type the rules from a blank page. It is to bring the judgment of what good looks like, then direct the agent to draft, prove, and tune. I use AI to guide AI.

That judgment is the half you cannot hand off. I know what a stable locator looks like, how a page object should be shaped, why test data belongs apart from the test, because I wrote suites by hand for years before any of this existed.

The sections above are that knowledge written down. You bring it; the agent fills in the mechanical detail and reads the conventions already in your code. None of this is Claude-only. The same loop runs in Cursor, Codex, or Copilot. Only the file name changes.

Here is the loop I actually run.

Talk it out before you touch the repo. Before I create a file, I think the whole thing through with a conversational assistant, usually ChatGPT, on what I want the suite to enforce. It is the best medium I have found for turning a vague want into something specific: I talk, it pushes back.

This step writes nothing. It gets me clear on what the file needs to say, so the agent is not guessing later.

Mine your own corrections. The corrections you keep retyping are the highest-value rules you own, and the agent already has a memory of them. Before you draft anything, ask it:

Based on everything you know about how I work and every correction I have given you, list the things I correct you on the most. Group them so I can turn each into a checkable rule.

This surfaces the rules you would never have thought to write down, the ones that live in your head as “not this again.”

Have the agent investigate, then draft. Point the coding agent at the suite and make it read how it really works before it proposes a single rule:

Read this test suite before you write anything. Open the specs, the page objects, the drivers, the fixtures, and the config, and work out the conventions already in use: how locators are written, how interactions are wrapped, where selectors and test data live, how tests are named and isolated, and where the suite contradicts itself. Then draft a CLAUDE.md at the repo root that turns those conventions into rules an agent can check itself against, each with a one-line example of what passes and what fails. Keep it under 200 lines. If a rule is really a multi-step procedure, leave it out and tell me to make it a skill; if it only matters in one folder, tell me to make it a path-scoped rule. Show me the draft and the biggest inconsistencies you found, and do not change a test yet.

You bring the standard; it brings the reading of your real code. The draft that comes back is specific to your suite, not a template.

Let it plan before it writes. For anything bigger than a one-line change, I have the agent lay out its approach before it generates a line, which on Claude Code is what plan mode is for. A plan you can read is a plan you can correct, and it is far cheaper to fix a wrong approach in a paragraph than in a finished spec.

This is also where the agent should stop and ask when it is unsure instead of inventing a selector and pressing on. A question now saves a rebuild later.

Make it prove the file, then tune from what breaks. A draft is a guess until you watch it generate against the rules, so make the agent use the file the moment it exists. This is where you actually tune it:

Using only the rules in that CLAUDE.md, write three tests and run them: one simple case, one hard case with real setup, and one edge case I keep meaning to cover. Every test must assert what the user actually sees, not just that the page loaded. Assert each thing that matters on its own, and let each test fail for one clear reason; if a test checks two behaviors, split it into two. Run them until they pass, and show me the output.

Two failures show up here constantly, and you have to name them or the agent hands them straight back to you.

  • The test that asserts almost nothing. It does all the work, then makes one weak check that the page exists. Tell it what a real assertion is: visible text, the element, URL state, every outcome that matters, each checked on its own.
  • The test that verifies three things at once. When it goes red you cannot tell which one broke. One behavior per test, every time.

It is not finished until it has run the test and I have watched it pass. I run it myself and make it show me the real output, not a sentence claiming it passed. A generated test nobody ran is a guess in a green wrapper.

Then you give it the fix steps, and this compounds: it does not just patch the three tests, it folds your fixes back into the CLAUDE.md as new rules. The correction you gave once is now enforced on every test the agent writes after. The same prove-and-tune loop sharpens a skill, too.

Then make it turn on its own work. Green tests are where I start, not where I stop. I ask the agent the same things I would in a review: where are the gaps, the bugs, what would make this more maintainable, more readable, and I tell it to go deep, not skim.

When it gets something right, I tell it to remember the path that worked and fold that into the file or skill, so good is repeatable, not a one-off. Your corrections teach it what to stop doing; this teaches it what to keep doing.

If you would rather start from a machine-generated skeleton, /init reads your repo and writes a first CLAUDE.md with the build commands, layout, and conventions it can see in the code; if one already exists it suggests improvements instead of overwriting.

What /init cannot do is read your mind, so you still bring your locator policy, driver pattern, and the three things your team never does. Bootstrap, then tune with the loop above.

Then keep refining from the corrections you make as you work. When you catch yourself typing the same correction you typed last session, it belongs in the file. After a few rounds you can hand the writing to the agent and have it police its own length:

Me → Claude Code
Based on every correction I gave you this session, add the general ones to CLAUDE.md as checkable rules, each with a one-line example of what passes and what fails. Keep it under 200 lines; if it would push past that, tell me what to move to a skill or a path-scoped rule.
Claude Code
Added four rules from this session: drivers only, no fixed sleeps, page-object selectors, stable test IDs. Each has a pass/fail example. The file is now 168 lines. Your "how to scaffold a new page object" correction is a multi-step procedure, so I left it out and flagged it as a skill instead. Want me to draft that skill?

To capture a one-off, tell the agent “remember that…” or use the # shortcut, which saves to auto memory; /memory lists and edits everything currently loaded.

But a real standard, the kind every teammate should inherit, goes in the committed CLAUDE.md, not your personal memory. The deeper reason this loop matters lives in the QA control layer; I will not re-argue it here.

The honest limit: it is context, not a gate

Here is the thing the marketing around these files tends to skip. A CLAUDE.md shapes behaviour; it does not enforce anything. It is context the agent reads and is strongly nudged by, not a wall it cannot climb over.

Most of the time the nudge is enough, and a good file changes the default output dramatically. But “strongly nudged” is not “blocked,” so do not treat it as a guarantee.

So if a rule absolutely must run at a fixed point, every time, no exceptions, a CLAUDE.md line is the wrong tool. Use a hook: a command the harness runs at a defined moment in the lifecycle, like before every commit or after each edit, whether the agent feels like it or not.

“Format every file after an edit” or “block a commit if a spec contains cy.wait with a number” is hook work, not CLAUDE.md work. The file says “please don’t”; the hook says “you can’t.”

Some rules are not optional, and those need to live somewhere that can actually fail. No XPath, no fixed wait, every test isolated: put a rule like that where it breaks the build, a lint rule or a CI check, so a spec that violates it cannot merge.

A rule you are not willing to let block a merge was never really a rule; it was a preference. Keep the CLAUDE.md for the standards that shape the work, and back the non-negotiable few with a gate that does not have a bad day.
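
As a sketch of what such a gate checks, here is a crude regex scanner for three of the fence items. It is an assumption-heavy illustration, not a replacement for a real lint rule: `scanSpec`, `CHECKS`, and the patterns are my own, it only knows Cypress-style `cy.` calls, and it will miss anything split across lines.

```typescript
interface Violation {
  line: number;
  rule: string;
  text: string;
}

// Three fence items expressed as line-level patterns (deliberately crude).
const CHECKS: { rule: string; pattern: RegExp }[] = [
  { rule: "fixed-time sleep", pattern: /cy\.wait\(\s*\d+/ },
  { rule: "raw framework command in a spec", pattern: /\bcy\.get\(/ },
  { rule: "brittle selector", pattern: /:nth-child\(/ },
];

function scanSpec(source: string): Violation[] {
  const violations: Violation[] = [];
  source.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of CHECKS) {
      if (pattern.test(text)) {
        violations.push({ line: index + 1, rule, text: text.trim() });
      }
    }
  });
  return violations;
}

const sampleSpec = [
  'it("[ORD-001] opens the drawer", () => {',
  "  cy.wait(3000);",
  "  cy.get('.table > tr:nth-child(2)').click();",
  "  cy.wait('@orders');", // waiting on an alias is a real signal, so it passes
  "});",
].join("\n");

const found = scanSpec(sampleSpec);
console.log(found.length); // 3
console.log(found.map((v) => `${v.line}: ${v.rule}`));
```

In CI, a wrapper would fail the job whenever the list is non-empty. For fixed waits specifically, eslint-plugin-cypress ships a `no-unnecessary-waiting` rule, which is a sturdier option than a regex.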

Knowing which rules need a hard gate and which are fine as guidance is its own judgment call, the kind that stays with you. Speed without judgment is just faster guessing, and a rules file is no different: it makes the agent fast, not right.

Start this week

You do not need a perfect file to get value. You need a real one, committed, that you improve as you go. Here is the whole thing in five steps:

  1. Talk it out, then mine your corrections. Shape what the suite should enforce with a conversational AI first, then ask your coding agent to list what you correct it on the most.
  2. Have the agent draft and prove it. Point it at the suite, have it draft the file from your real conventions (hold it to the standards and fence above), then make it write a simple, a hard, and an edge-case test and run them until they pass with real assertions.
  3. Handle AGENTS.md. If the repo already has one, make @AGENTS.md the first line of your CLAUDE.md or symlink the two, so they never drift.
  4. Commit it. It is a team-shared project file. The point is that every teammate and every agent inherits the same rules.
  5. Evolve it. Each time you retype a correction, add it as a checkable rule. When a rule turns into a procedure, move it to a skill; when it only matters in one folder, move it to a path-scoped rule. Keep the always-loaded file tight.

Do that and the next test the agent writes comes back in your conventions, with your locators, your drivers, your IDs, ready to review instead of rebuild. The file does the onboarding you used to do by hand, every session, for free.

For the bigger workflow, using AI to generate tests and test data is the next read, and the ready-to-paste AI test automation standards file will save you the typing.

Written by

Julia Pottinger

Hi, I'm Julia. I've been in QA for over a decade. I spend my days testing software and my own time building apps and games, and I write here to share what I learn, the practical, honest lessons you can actually use.