Give your AI real context.
AI is only as good as the context you give it. An agent that cannot see your code, your spec, and the conversations around the work is guessing. Connect the real sources through Model Context Protocol (MCP), and know exactly where human judgment stays non-negotiable.
An agent working blind is just guessing
You would not ask a new tester to find bugs without showing them the app, the spec, and where the team made its decisions. An AI agent is no different. Most disappointing AI output is not the model being weak. It is the model working blind, asked to reason about a product it has never been allowed to see. Give it what a good teammate would already have, and the quality of what comes back changes completely.
Real access does not move the judgment. It moves the grunt work. The agent reads, analyses, and drafts against your actual system. You keep the risk call, the visual sense, and the decision to ship.
Your real context → AI agent (reads, analyses, drafts) → You (risk, visual sense, exploratory testing, and the call to ship)
What to connect, and what each one gives QA
These are real MCP servers: the sources a good teammate already has. Switch on the ones that fit your stack, starting with your code.
| Switch on | What it gives QA | Cost |
|---|---|---|
| Filesystem / Git | Reads your actual repo, so it tests what is implemented and not what it assumes. Pair it with a CLAUDE.md or AGENTS.md so it follows your conventions and knows your build and test commands. | free |
| GitHub | Searches code, reads and opens issues, and manages pull requests. This is the find-it, fix-it, open-a-PR loop. | free |
| Atlassian (Jira, Confluence) | The official Rovo server reads the spec and the acceptance criteria, and files or updates tickets, all with your own permissions. Tests get grounded in intended behaviour, not a guess. | tiered |
| Slack | Reads the channels and threads where the real decisions and edge cases were worked out, the context that never reaches the ticket. | free |
| Google Drive / Docs | Turns a meeting note into a structured ticket or bug in your own format, filed where it belongs. | tiered |
| Playwright | Drives a real browser through the accessibility tree rather than screenshots, so the agent exercises the UI the way a user, and a screen reader, would. | free |
The repo is the one to connect first. Playwright for the running app comes a close second, because it lets the agent exercise the real UI instead of reasoning about screenshots. The rest you add as the work needs them.
Your first ten minutes
Connecting a server is a small file you commit or a single command. Here is the whole path, and what each step actually looks like on your screen.
1. Give it your code. Drop a .mcp.json file at your repo root. That one file points the agent at your real frontend and backend, and gives it a browser to drive.
2. Add your hosted tools. Connect GitHub, Jira, Confluence, and Slack with one command each. You sign in as yourself, so the agent only ever sees what you can see.
3. Check it connected. Run /mcp and confirm every server says connected. A server that is missing is a source the agent is still guessing about.
4. Prove it has your context. Ask one question that only works if the wiring is real, and tell it to cite its sources. Real file paths and ticket numbers mean you are good to go.
Step 1. Drop this one file at your repo root. It hands the agent your real code and a browser to drive:
```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

The same file is part of the starter kit at https://juliapottinger.com/resources/give-ai-real-context/.
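The Step 1 file is plain JSON, so you can catch a missing field before you ever run /mcp. Below is a minimal TypeScript sketch. It assumes each entry either launches a local process (`command` plus optional `args`) or points at a hosted server (`url`). That split is an assumption about the schema, so check your client's docs for the exact fields it accepts. The function names are illustrative only.

```typescript
// Minimal shape of an .mcp.json file, as used in Step 1.
// Assumption: an entry is either local ("command" + "args") or hosted ("url").
interface ServerEntry {
  command?: string;
  args?: string[];
  url?: string;
}

interface McpConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Names of servers that give the client nothing to launch or connect to.
function findUnlaunchableServers(config: McpConfig): string[] {
  return Object.keys(config.mcpServers).filter((name) => {
    const entry = config.mcpServers[name];
    return !entry.command && !entry.url;
  });
}

// One line per server: local process, hosted endpoint, or misconfigured.
function describeServers(config: McpConfig): string[] {
  return Object.keys(config.mcpServers).map((name) => {
    const entry = config.mcpServers[name];
    if (entry.command) {
      return `${name}: local (${entry.command} ${(entry.args || []).join(" ")})`;
    }
    if (entry.url) return `${name}: hosted (${entry.url})`;
    return `${name}: misconfigured`;
  });
}

// The Step 1 file, parsed exactly as written.
const starter: McpConfig = JSON.parse(`{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}`);
```

For the Step 1 file, `findUnlaunchableServers(starter)` returns an empty array. An entry with neither field shows up by name, which is exactly the blind spot /mcp would later report as not connected.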
Steps 2 and 3. Add each hosted server with one command, then run /mcp to watch them connect. Check each provider's MCP docs for the current URL:
```text
$ claude mcp add --transport http github <the GitHub MCP url>
✓ Added github. A browser opened for you to sign in.
> /mcp
● filesystem   connected   reads your repo
● playwright   connected   drives the browser
● github       connected   code, issues, PRs
● atlassian    connected   Jira and Confluence
● slack        connected   channels and threads
```
Every server you care about should say connected. A missing one is a blind spot: a source the agent will guess about instead of read.
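Spotting blind spots is mechanical enough to script. Here is a rough TypeScript sketch. It assumes status lines shaped like the mockup above (`● <name> <status> <description>`). The real /mcp screen is interactive and may lay things out differently, so treat the parsing as an assumption and adapt it.

```typescript
// Parse status lines shaped like the mockup: "● <name> <status> <description>".
// Assumption: name and status are the first two whitespace-separated words.
function parseStatus(output: string): Record<string, string> {
  const status: Record<string, string> = {};
  const lines = output.split("\n");
  for (let i = 0; i < lines.length; i++) {
    // Drop the bullet, then split on runs of whitespace.
    const parts = lines[i].replace("●", "").trim().split(/\s+/);
    if (parts.length >= 2 && parts[0] !== "") {
      status[parts[0]] = parts[1];
    }
  }
  return status;
}

// Every expected source that is missing or not "connected" is a blind spot.
function blindSpots(expected: string[], output: string): string[] {
  const status = parseStatus(output);
  return expected.filter((name) => status[name] !== "connected");
}

const sample = [
  "● filesystem connected reads your repo",
  "● playwright connected drives the browser",
  "● github connected code, issues, PRs",
].join("\n");

// blindSpots(["filesystem", "github", "slack"], sample) → ["slack"]
```

The list of expected sources is the useful part. It makes explicit which systems this project needs the agent to read, so a missing one is caught rather than silently guessed around.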
Step 4. Prove it. Ask one thing that only works if the wiring is real, and tell it to cite sources:
Show me what you can see for this project. List the repo, the issue tracker, the chat workspace, and the running app, and for each one name one specific real thing: a file path, an open ticket, a channel. If you cannot reach one, tell me.
If it answers with specifics, you are connected. If it stays vague, something is not wired up yet. Pair all of this with a CLAUDE.md at the repo root for your conventions, and the agent has both your standards and your live systems.
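"Answers with specifics" can be made a little more concrete. Below is a rough TypeScript heuristic that scans an answer for the three kinds of evidence the prompt asks for. The patterns are assumptions about your tools' naming: a Jira-style key (such as the made-up `CART-412`), a Slack-style `#channel`, and a file path with an extension. A match only means the answer looks specific. You still open the file, ticket, or channel to confirm it is real.

```typescript
// Assumed patterns for "one specific real thing"; adjust to your own naming.
const specifics: Record<string, RegExp> = {
  ticket: /\b[A-Z][A-Z0-9]+-\d+\b/, // Jira-style key, e.g. CART-412
  channel: /(^|\s)#[a-z0-9][a-z0-9_-]*/, // Slack-style channel, e.g. #support
  filePath: /\b[\w.-]+\/[\w./-]*\.[A-Za-z]{1,5}\b/, // e.g. cart/cleanup-job.ts
};

// Returns which kinds of specific evidence appear in the agent's answer.
function citedKinds(answer: string): string[] {
  return Object.keys(specifics).filter((kind) => specifics[kind].test(answer));
}
```

An empty result is the "stays vague" case: nothing in the answer points at a real file, ticket, or channel.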
The workflows it unlocks
Here is what this looks like once it is really set up, and it is the part I did not expect to love. With my repo open in the editor, the front end and the back end both in view, plus Slack, Confluence, and the tracker connected, the agent can reach everything I can. So the tedious automation work mostly disappears. I do not go hunting for a locator; it reads the frontend and finds a stable one. It finds the API calls. It finds bugs. It goes past what I asked and does the deep research in seconds, and because it is reading my actual application and everything I have built, what it gives back is connected and specific to my system, not a generic answer. That is the whole reason to give it real access.
Within reason: scope access to the task, prefer read access and test data, and keep a human reviewing every change. Real access speeds the work up. It does not move the release decision.
One workflow, worked through: conversation to a filed bug
Here is the most common one in practice. A bug surfaces in a chat channel, gets half-reproduced and guessed at, then lost because everyone is busy. Give an agent access to that channel, the repo, and the issue tracker, and one instruction turns the mess into a filed bug you can act on:
Read the cart-bug discussion in the #support channel, then turn it into a single filed bug in our issue tracker. Pull the reproduction details, the affected browsers, and the free-plan account type out of the conversation, and confirm the likely cause by checking the cart retention and cleanup code in the repo, especially anything that changed in the 1.42 release. Write clear numbered steps, a specific expected versus actual, and list the concrete evidence to attach, including the failing cart request and the relevant server log line. Set severity to High and assign it to the cart team.
After: the filed bug
Saved cart is emptied for free-plan users after the session expires
Severity: High. Nyam Box web app, cart and checkout area. Reproduced on Firefox 126 and Chrome 125, signed-in free-plan accounts. Build 1.42, which shortened the retention window for free users.
Steps to reproduce
1. Sign in with a free-plan account.
2. Add three or more items to the cart.
3. Close the browser tab and leave the account idle for longer than the free-plan retention window, which is now 12 hours in build 1.42.
4. Open the app again and sign back in with the same account.
5. Open the cart.
Expected
The cart still holds the three items that were added. Saved carts are meant to persist independently of the account retention window, so changing that window should not clear them.
Actual
The cart shows zero items. Everything the user added is gone, with no message explaining why.
Evidence
Attach a screen recording of the empty cart after sign-in, plus the network capture (HAR file) of the cart load request. The response from GET /api/v1/cart returns an empty items array with a 200 status. Also include the server log line where the cart cleanup job runs against the free-plan retention timestamp instead of the cart timestamp.
Impact
Free-plan users only, both Firefox and Chrome. Started with build 1.42. Likely affecting a meaningful share of free accounts, since the retention window is short enough that normal overnight gaps trigger it. These are the users we most want to keep through the upgrade decision, so the impact on conversion matters as much as the broken behaviour.
The agent did not invent any of this. It read the conversation, confirmed the cause against the code, and wrote it up in your format. You review the filed bug, not the messy thread. The structure it follows is the one in the bug report template.
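The structure is regular enough to write down as a type. Below is a TypeScript sketch of the report above, plus a plain-text renderer that numbers the steps and a check for the gaps a reviewer would bounce back. The field names and severity levels are illustrative, not any tracker's API. Map them onto your own template.

```typescript
// Illustrative shape of the filed bug; not a tracker's real schema.
type Severity = "Low" | "Medium" | "High" | "Critical";

interface BugReport {
  title: string;
  severity: Severity;
  environment: string;
  steps: string[];
  expected: string;
  actual: string;
  evidence: string[];
  impact: string;
}

// Renders the report in the same order as the filed bug, with numbered steps.
function renderBug(bug: BugReport): string {
  const steps = bug.steps.map((step, i) => `${i + 1}. ${step}`);
  const evidence = bug.evidence.map((item) => `- ${item}`);
  return [
    bug.title,
    `Severity: ${bug.severity}`,
    bug.environment,
    "Steps to reproduce",
    ...steps,
    "Expected",
    bug.expected,
    "Actual",
    bug.actual,
    "Evidence",
    ...evidence,
    "Impact",
    bug.impact,
  ].join("\n");
}

// Flags the gaps that make a bug unactionable.
function missingPieces(bug: BugReport): string[] {
  const gaps: string[] = [];
  if (bug.steps.length === 0) gaps.push("steps");
  if (!bug.expected.trim() || bug.expected === bug.actual) gaps.push("expected vs actual");
  if (bug.evidence.length === 0) gaps.push("evidence");
  return gaps;
}
```

Keeping the check separate from the renderer mirrors the split in the text. The agent drafts, and you review against a fixed bar before the bug counts as filed.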
The same bug, taken to a pull request
Filing it is half the loop. Because the agent already has the repo, you can hand the confirmed bug straight back and ask it to fix it, prove the fix, and open a pull request you review. One instruction:
The saved-cart bug is confirmed: the cleanup job is keying off the free-plan retention timestamp instead of the cart timestamp. Find the exact code in the repo, make the smallest fix that keeps saved carts independent of the account retention window, and add a test that fails on the old behaviour and passes on the new one. Open a pull request with a clear description and the test output. Change nothing else.
fix/saved-cart-retention
Keep saved carts independent of the free-plan retention window
What changed
- cart/cleanup-job.ts: compare each cart against its own updatedAt, not the account retention timestamp.
- cart/cleanup-job.test.ts: new test that a free-plan saved cart survives past the account retention window.
The test it added
cart cleanup > keeps a free-plan saved cart past the account retention window: passing. The same test fails on the previous code, so it locks the bug out.
Left to you
Review the diff, confirm this is the retention rule the product actually wants, and decide whether it ships as a hotfix or in the next release.
This is the part to hold onto. The agent did the finding, the fixing, and the proving, and it brought evidence: a test that fails on the old code and passes on the new one. The merge decision is still yours. That is what real access looks like in practice. It speeds the work up and leaves the judgment where it belongs.
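To make "fails on the old code, passes on the new one" concrete, here is a TypeScript sketch. The real cart/cleanup-job.ts is not shown, so every name, the data shapes, and the 30-day cart retention value are hypothetical. Only the 12-hour free-plan window comes from the bug. The test replays the overnight gap from the report against both versions of the rule.

```typescript
// Hypothetical shapes and values; only the 12-hour window is from the bug.
const HOUR_MS = 60 * 60 * 1000;
const FREE_ACCOUNT_RETENTION_MS = 12 * HOUR_MS; // free-plan window in build 1.42
const CART_RETENTION_MS = 30 * 24 * HOUR_MS; // assumed cart-level rule

interface Account {
  plan: "free" | "paid";
  lastActiveAt: number; // epoch ms; drives the account retention window
}

interface Cart {
  itemCount: number;
  updatedAt: number; // epoch ms; last change to this cart
}

type CleanupRule = (account: Account, cart: Cart, now: number) => boolean;

// Old behaviour: keyed off the account retention timestamp (the bug).
const shouldDeleteCartOld: CleanupRule = (account, _cart, now) =>
  account.plan === "free" && now - account.lastActiveAt > FREE_ACCOUNT_RETENTION_MS;

// New behaviour: each cart is judged by its own updatedAt only.
const shouldDeleteCartNew: CleanupRule = (_account, cart, now) =>
  now - cart.updatedAt > CART_RETENTION_MS;

// Regression test: free-plan user, three items, a 13-hour overnight gap.
function savedCartSurvivesOvernight(shouldDelete: CleanupRule): boolean {
  const start = Date.UTC(2024, 0, 1, 20); // 8 pm
  const account: Account = { plan: "free", lastActiveAt: start };
  const cart: Cart = { itemCount: 3, updatedAt: start };
  const nextMorning = start + 13 * HOUR_MS; // past the 12-hour window
  return !shouldDelete(account, cart, nextMorning);
}

// savedCartSurvivesOvernight(shouldDeleteCartOld) → false
// savedCartSurvivesOvernight(shouldDeleteCartNew) → true
```

Passing both rules through the same test function is the point. It proves the test can tell the two behaviours apart, which is what makes it a real guard rather than a test that would pass either way.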
Where AI stops and you start
AI is not one thing. Hand it what it is good at, and keep what needs a human, especially anything visual. I am building a game right now, and this is exactly where it bites: the agent cannot see when something is visually off: a button too big, an element a few pixels out of line, a screen that falls apart on a real device. Across real screen sizes and on mobile, that judgment is still my eyes, not the model's.
Hand it to AI
- Contracts and integrations between services
- Combing through large amounts of data
- Spotting the backend is not returning what it should
- Generating web test scenarios, drawing on broad training data
- Summarising logs and clustering failures
Keep it human
- Seeing a UI is off-centre or a button is too big
- Visual and game testing across real devices and screen sizes
- Unified end-to-end user-flow testing
- Exploratory testing and finding the real gaps
- The business decisions and the call to ship
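"Clustering failures" can sound abstract, so here is the idea in its simplest form. This TypeScript sketch groups failure messages by a normalised signature, replacing quoted values, hex ids, and standalone numbers with placeholders. It is a deliberately simple stand-in to show what the task is, not how any agent or tool actually does it.

```typescript
// Reduce a failure message to a signature: quoted values, hex ids, and
// standalone numbers become placeholders, so near-duplicates line up.
function signature(message: string): string {
  return message
    .replace(/"[^"]*"/g, '"<str>"')
    .replace(/\b0x[0-9a-f]+\b/gi, "<hex>")
    .replace(/\b\d+\b/g, "<n>")
    .trim();
}

interface Cluster {
  signature: string;
  count: number;
  example: string; // first raw message seen with this signature
}

// Group messages by signature, largest cluster first.
function clusterFailures(messages: string[]): Cluster[] {
  const byKey: Record<string, Cluster> = Object.create(null);
  const order: string[] = [];
  for (let i = 0; i < messages.length; i++) {
    const key = signature(messages[i]);
    if (!byKey[key]) {
      byKey[key] = { signature: key, count: 0, example: messages[i] };
      order.push(key);
    }
    byKey[key].count += 1;
  }
  return order.map((key) => byKey[key]).sort((a, b) => b.count - a.count);
}
```

The grouping is the grunt work. Deciding which cluster is a real product bug and which is a flaky environment is still a human call.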
Keep it safe
Real access is powerful, so scope it the way you would for any new teammate. Three rules cover almost everything.
- Scope access to the task. An agent does not need blanket write access to production to read a flow and reason about it.
- Prefer read access and test data. Be deliberate about credentials and customer records, the same way you would for a new hire.
- A human reviews every change before it merges or ships. Real access speeds the work up. It does not move the release decision.
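The three rules can also be read as a policy check that runs before an agent's tool call. Below is a TypeScript sketch with hypothetical tool names and an assumed read/write and test/production split. Where your client offers them, this belongs in its own permission settings and in each server's credentials, not in hand-rolled code.

```typescript
// Hypothetical policy check, run before an agent's tool call is executed.
type Access = "read" | "write";
type Environment = "test" | "production";
type Decision = "allow" | "needs-human-review" | "deny";

interface ToolCall {
  tool: string; // hypothetical tool name, e.g. "repo.read_file"
  access: Access;
  environment: Environment;
}

interface TaskPolicy {
  allowedTools: string[]; // rule 1: only the tools this task needs
  allowWrites: boolean; // rule 2: default to false and prefer reads
}

function decide(call: ToolCall, policy: TaskPolicy): Decision {
  // Rule 1: anything outside the task's scope is refused outright.
  if (policy.allowedTools.indexOf(call.tool) === -1) return "deny";
  if (call.access === "read") {
    // Rule 2: prefer test data; production reads get a deliberate human yes.
    return call.environment === "test" ? "allow" : "needs-human-review";
  }
  // No write access to production, and no writes unless the task allows them.
  if (call.environment === "production" || !policy.allowWrites) return "deny";
  // Rule 3: a permitted write still waits for a human before it lands.
  return "needs-human-review";
}
```

Note that no path returns "allow" for a write. In this sketch, real access speeds up reading and drafting, but every change still stops at a person.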