You just wrote 500 lines of React with your coding agent. It compiled. That means it’s time to approve and merge, right?
You probably know the answer. Many AI-generated features land somewhere between “code that runs” and “code that works,” and closing that gap costs a lot of manual (human) review time.
You can give your agents better processes, just as you do with your human engineers. For example, you can define a “test-first development” workflow using config files and the Playwright MCP. With that in place, your agent can write, run, and validate its own tests, which leads to higher code quality.
Where AI agent-written code falls short
AI agents are proficient at writing syntactically correct code. Without a lot of context or prompting, though, they struggle to tell whether that code actually solves your problem (or works as expected). Here’s what typically happens:
- You ask Claude Code to add a user auth flow
- It generates beautiful code with proper error handling
- You push it to staging
- The login button doesn’t actually submit the form. There’s an event handler conflict with your existing code
- An engineer spends 30 minutes debugging the code that the agent wrote in 30 seconds
The problem here isn’t that agents write bad code; it’s that they don’t iterate enough. You can fix that by giving them feedback loops (e.g. instructing them to click through your app until the change actually works).
Better AI agent feedback loops with Playwright MCP
The Playwright MCP server lets your coding agent (e.g. Claude Code or Codex CLI) drive the browser, the same way human engineers do as part of their dev/test loop. With it, the agent can write tests, run them, see them fail, fix the code, and run them again.
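If you’re using Claude Code, one way to register the server (a sketch; check your agent’s docs, since config locations vary by tool) is a project-level .mcp.json entry that launches the Playwright MCP via npx:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}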
Of course, your agent can “guess” at tests purely from the context of your codebase, but (in our experience) those tests often break.
Let’s say your agent writes this form:
function LoginForm() {
  // Illustrative handler (our assumption); the agent's real implementation may differ
  async function handleLogin(event) {
    event.preventDefault();
    const data = new FormData(event.currentTarget);
    await fetch('/api/login', { method: 'POST', body: data });
  }

  return (
    <form onSubmit={handleLogin}>
      <input name="email" type="email" />
      <input name="password" type="password" />
      <button type="submit">Login</button>
    </form>
  );
}
The agent can then visit the page through the Playwright MCP to view the layout or grab the DOM, and use that information to write accurate tests for the feature:
test('user can log in with valid credentials', async ({ page }) => {
  await page.goto('/login');

  await page.fill('input[name="email"]', 'test@example.com');
  await page.fill('input[name="password"]', 'testpass123');
  await page.click('button[type="submit"]');

  // Verify redirect to dashboard
  await expect(page).toHaveURL('/dashboard');
  await expect(page.locator('h1')).toContainText('Welcome back');
});
The agent now has a solid feedback loop: it writes a test, runs it, sees the results, and iterates on both the test and the feature until everything passes.
Setting up test-first development for agents
Define your test specs before asking the agent to implement a feature. Just as test-driven development is a great approach for human engineers, spec-first prompting is a reliable way to get high-quality results from agents.
1. Write behavioral specs
Instead of asking “add a search feature,” outline user stories and/or feature requirements (a sketch of the component these specs imply follows the list):
## Search feature requirements
Users should be able to:
- See search results as they type (debounced by 300ms)
- Filter results by category using checkboxes
- Clear all filters with one button
- See "No results found" when search returns empty
- Navigate results with keyboard (arrow keys + enter)
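The debounce requirement is worth pausing on, because it shapes the tests in the next step (they wait 400ms against a 300ms debounce). Here is a rough sketch of the kind of component the agent will eventually implement; the SearchBox and onSearch names are ours, not part of the spec:

import { useEffect, useState } from 'react';

function SearchBox({ onSearch }) {
  const [query, setQuery] = useState('');

  // Only fire the search 300ms after the user stops typing (debounce)
  useEffect(() => {
    const timer = setTimeout(() => onSearch(query), 300);
    return () => clearTimeout(timer);
  }, [query, onSearch]);

  return (
    <input
      id="search-input"
      value={query}
      onChange={(e) => setQuery(e.target.value)}
    />
  );
}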
2. Convert specs to Playwright tests
Your agent can now convert these requirements into Playwright tests.
For true test-driven development, have the agent write the tests BEFORE implementing the feature: run them against your current build, watch them fail (expected!), then implement the feature to make them pass.
Use the Playwright MCP to cross-reference the tests with the new feature. As in classic TDD, when something fails, change the feature code rather than rewriting the tests to match it.
test.describe('Search Feature', () => {
  test('displays results as user types', async ({ page }) => {
    await page.goto('/search');
    await page.fill('#search-input', 'react');

    // Wait for the 300ms debounce to flush
    await page.waitForTimeout(400);

    // Verify at least one result appears
    const results = page.locator('.search-result');
    await expect(results.first()).toBeVisible();
  });

  test('filters by category', async ({ page }) => {
    await page.goto('/search');
    await page.fill('#search-input', 'tutorial');
    await page.check('input[value="video"]');

    // All results should be videos
    const results = page.locator('.search-result');
    const count = await results.count();
    for (let i = 0; i < count; i++) {
      await expect(results.nth(i)).toHaveAttribute('data-category', 'video');
    }
  });
});
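The remaining spec items translate the same way. A sketch for the empty-state and keyboard requirements (the .no-results selector and the post-Enter assertion are assumptions about your app):

test('shows "No results found" for an empty search', async ({ page }) => {
  await page.goto('/search');
  await page.fill('#search-input', 'zzz-no-match');
  await page.waitForTimeout(400); // let the debounce flush

  await expect(page.locator('.no-results')).toContainText('No results found');
});

test('navigates results with the keyboard', async ({ page }) => {
  await page.goto('/search');
  await page.fill('#search-input', 'react');
  await page.waitForTimeout(400);

  // Arrow down to the first result, then open it with Enter
  await page.keyboard.press('ArrowDown');
  await page.keyboard.press('Enter');

  // Placeholder assertion: where Enter should land depends on your app
  await expect(page).not.toHaveURL('/search');
});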
3. Connect to ephemeral environments
Ephemeral environments let you run more agent workflows concurrently. Pair the Shipyard MCP server with your agent to close the feedback loop against a real deployed build (a config sketch follows the list). The workflow might look like this:
- Your agent pushes code changes to a branch
- It waits for Shipyard to spin up the environment
- Agent uses the Shipyard MCP server to get the env link
- It runs Playwright tests against that environment
- It reads the results, gets immediate feedback on what works and what doesn’t, and iterates with the Playwright MCP if anything breaks
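The glue is just a base URL the agent can swap per run. One simple pattern (the BASE_URL variable name is our assumption, not a Shipyard convention) is to read it from the environment in playwright.config.js:

// playwright.config.js
const { defineConfig } = require('@playwright/test');

module.exports = defineConfig({
  use: {
    // The agent sets BASE_URL to the ephemeral environment link it fetched via the Shipyard MCP server
    baseURL: process.env.BASE_URL || 'http://localhost:3000',
  },
});

With baseURL set, relative navigations like page.goto('/search') in the tests above resolve against the ephemeral environment, so the same specs run locally and against each preview build.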
Configure your agent’s behavior (e.g. CLAUDE.md)
Your CLAUDE.md (or equivalent agent config) file lets you customize your agent’s processes. Give it general steps and rules so it can solve problems without needing your intervention. For example, you can spell out the testing workflow you expect it to follow:
## Testing requirements
### Test-driven development (TDD) workflow
When implementing new features or fixing bugs, follow TDD:
1. Write failing tests first that define the expected behavior
2. Run tests to confirm they fail (red phase)
3. Write minimal code to make tests pass (green phase)
4. Use Playwright MCP to cross reference tests with webpage
5. Refactor while keeping tests green
6. Never skip directly to implementation
### Test coverage rules
- Every new function must have at least 1 test
- Every bug fix must include a regression test that would have caught the bug
- UI components need both unit tests and Playwright integration tests
- API endpoints require both success and error case tests
### Running tests before committing
Always run tests before pushing code:
npm test # Unit tests
npm run test:integration # API tests
npx playwright test # E2E tests
If any tests fail, fix them before committing. Do NOT comment out failing tests.
## Test file locations
- Unit tests: Same directory as source file, named `.test.js`
- Integration tests: `/tests/integration/`
- E2E tests: `/tests/e2e/` with Playwright specs
## Writing tests for agents
- Test names should describe business requirements, not implementation
- Include comments explaining WHY a test expects certain behavior
- Use consistent data-testid attributes: `data-testid="verb-noun"` format
- Helper functions should have descriptive names: `createUserWithPremiumSubscription()` not `setupTest()`
## Test data
- Use predictable test data from `/tests/fixtures/`
- Reset database state between test runs using migrations
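The “reset database state” rule above needs plumbing your agent can’t guess on its own. One option, assuming a hypothetical db:reset npm script that re-runs migrations and seeds the fixtures from /tests/fixtures/, is Playwright’s globalSetup hook (point globalSetup at this file in playwright.config.js):

// tests/global-setup.js (runs once before the whole Playwright suite)
const { execSync } = require('node:child_process');

module.exports = async () => {
  // db:reset is a hypothetical script: re-apply migrations and seed fixtures
  execSync('npm run db:reset', { stdio: 'inherit' });
};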
Making your tests agent-friendly
Your agents can locate your tests faster (and consume fewer tokens) if you follow a few basic patterns.
Use data-testid attributes consistently:
<button data-testid="submit-order">Place Order</button>
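Playwright picks these up with its built-in test-id locator, so the agent doesn’t have to reverse-engineer CSS selectors:

await page.getByTestId('submit-order').click();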
Write descriptive test names that explain the behavior they verify:
// Bad: test('checkout works')
// Good: test('applies free shipping for orders over $50')
Include comments that fill in anything that isn’t obvious:
// Complete checkout
await page.goto('/checkout');
await fillTestPaymentDetails(page); // Helper function with test card: 4242...
await page.click('[data-testid="place-order"]');
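The helper itself can stay small. A hypothetical sketch (the payment-field selectors are assumptions for illustration):

// Fills a standard test card so checkout specs stay readable
async function fillTestPaymentDetails(page) {
  await page.fill('[data-testid="card-number"]', '4242 4242 4242 4242');
  await page.fill('[data-testid="card-expiry"]', '12/34');
  await page.fill('[data-testid="card-cvc"]', '123');
}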
Help your agents help themselves
Just as TDD helps engineers build better features, going test-first with your agents and the Playwright MCP gives them tighter feedback loops. By validating everything they write through tests and real page visits, agents can iterate as many times as it takes to get a feature right.
It’ll take some time to land on the right test workflow for your agents. Start by setting rules in your agent config file, then keep tweaking until your agent reliably refines its own output through that feedback loop.
Good luck, and happy testing!