How P2ER built a high-trust environment for agentic coding with Chrome DevTools for agents

Peer Weidner

José Luis Zapata

Published: Jun 22, 2026

P2ER, a digital solutions agency, uses Chrome DevTools for agents to ensure that only verified, working software is passed to humans for final review. By transforming their workflow into an agentic infrastructure, they have empowered AI agents to perform empirical UI verification, increasing deployment frequency from once a week to multiple times per day.

The challenge: Scale quality in existing applications

P2ER delivers high-end digital experiences for global brands, including car manufacturers, watch brands, and hospitality companies. Their primary challenge, as it is for many companies, was working within complex, existing applications. As the team adopted agentic coding, they faced three major hurdles:

Brittle UI testing. Standard test suites struggled with dynamic data, such as fluctuating hotel prices or seasonal offerings in some of P2ER's projects. Mock data often hid integration flaws that a human tester would find immediately.
Agent reliability issues. Without explicit instructions, AI agents sometimes claimed a task was complete without actually verifying it.
Loss of context. Broad tasks and model timeouts caused agents to lose track of session goals. This made it difficult for developers to trace and continue work an agent had started.

The solution: Build infrastructure for craftsmanship

P2ER built an infrastructure that treats AI as a "sparring partner" that can also handle the repetitive aspects of development. This approach allows the team to scale craftsmanship by focusing on architecture and creative problem solving.

Enforce empirical verification with DevTools for agents' MCP server

To ensure reliability, P2ER established a Mandatory Empirical Verification rule. This engineering mandate, codified in the project's AGENTS.md file, states:

All claims regarding service availability and component rendering
MUST be empirically verified (log output, dev compiler, browser/devtools inspection)
before asserting to the user.

Instead of taking the agent's word for it, the team uses Chrome DevTools for agents to give agents a safe environment to navigate the application visually and interactively.

These "testing agents" perform several key tasks that standard static tests miss:

Dynamic data testing: Even in a staging environment, agents test against real, fluctuating data (like changing hotel prices across seasons) to experience the application exactly as a user would. This is enabled by DevTools for agents' interaction tools like new_page, navigate_page, fill, click, and hover, called out in their github-issue-test skill, allowing the agent to dynamically authenticate and simulate a realistic user click path.
Visual audits: Agents identify visual discrepancies between Figma layouts and the actual implementation. By using the take_screenshot tool from DevTools for agents, their figma-validate skill captures high-resolution screenshots of live Storybook renders to perform a side-by-side comparison with Figma exports.
Usability checks: Agents catch missing translations or usability errors that automated scripts often overlook. By interacting directly with the accessibility tree and reviewing visual snapshots, retrieved through take_snapshot and take_screenshot, agents actively scan for UI anomalies like MISSING_MESSAGE strings as explicitly instructed in their automated verification workflows.
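
The translation check in the last item can be approximated with a small text scan. The following is a minimal sketch, not P2ER's actual skill: it assumes the accessibility snapshot from take_snapshot is already available as plain text with one node per line, and the helper name findMissingMessages and the sample snapshot content are hypothetical.

```typescript
// Hypothetical helper: scans snapshot text for untranslated-message markers.
// Assumption: the snapshot is plain text with one UI node per line.
interface MissingMessageHit {
  line: number; // 1-based line number within the snapshot text
  text: string; // trimmed content of the offending line
}

function findMissingMessages(
  snapshotText: string,
  marker: string = "MISSING_MESSAGE",
): MissingMessageHit[] {
  const found: MissingMessageHit[] = [];
  snapshotText.split("\n").forEach((raw, index) => {
    if (raw.includes(marker)) {
      found.push({ line: index + 1, text: raw.trim() });
    }
  });
  return found;
}

// Example with made-up snapshot content.
const sampleSnapshot = [
  'button "Book now"',
  'heading "MISSING_MESSAGE: offers.summer.title"',
  'link "Contact"',
].join("\n");

const missingHits = findMissingMessages(sampleSnapshot);
console.log(missingHits);
```

An agent could treat a non-empty result as a failed verification and attach the offending lines to its report instead of claiming the page is complete.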

Decompose and persist subtasks

To combat session timeouts and context loss, P2ER strictly compartmentalizes work through subagents. They instruct their agents to act as orchestrators with a rule like this:

Rather than executing everything in the main thread, you must decompose large
or complex objectives into modular subtasks that can be delegated
to specialized subagents.

To keep human product owners informed in this process, the team integrated a custom skill for agents to track their work in GitHub issues. This ensures that every subagent task and its results are persisted and documented as a sub-issue using the GitHub API, creating a clear audit trail and persistent context that other developers can pick up.
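
To make the persistence step concrete, here is a minimal sketch of how a subagent result might be turned into a sub-issue draft. The types, field names, and the "agent-subtask" label are hypothetical and do not come from P2ER's repository. The sketch only builds the title, body, and labels fields that GitHub's create-issue endpoint accepts; sending the request and linking the new issue to its parent are left out.

```typescript
// Hypothetical shapes; none of these names come from P2ER's repository.
interface SubtaskResult {
  parentIssue: number; // number of the human-authored issue
  summary: string; // one-line description of the delegated subtask
  subagent: string; // which specialized subagent did the work
  verification: string[]; // evidence gathered (logs, screenshots, snapshots)
  done: boolean;
}

interface IssueDraft {
  title: string;
  body: string;
  labels: string[];
}

function draftSubIssue(result: SubtaskResult): IssueDraft {
  // Render each piece of evidence as a list item, or note that none exists.
  const evidence =
    result.verification.length > 0
      ? result.verification.map((item) => `- ${item}`).join("\n")
      : "- (none recorded)";
  return {
    title: `[#${result.parentIssue}] ${result.summary}`,
    body: [
      `Parent issue: #${result.parentIssue}`,
      `Subagent: ${result.subagent}`,
      `Status: ${result.done ? "done" : "in progress"}`,
      "",
      "Verification evidence:",
      evidence,
    ].join("\n"),
    labels: ["agent-subtask"], // illustrative label only
  };
}

const subIssueDraft = draftSubIssue({
  parentIssue: 123,
  summary: "Verify booking form renders seasonal prices",
  subagent: "ui-verifier",
  verification: ["dev server log shows no errors", "screenshot of /booking"],
  done: true,
});
console.log(subIssueDraft.title);
console.log(subIssueDraft.body);
```

Keeping the evidence in the issue body means a developer who picks up the work later can see what was verified, not only that a task was marked done.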

Isolate environments for parallel execution

To scale their development process so multiple agents run code in parallel, P2ER mandates isolated environments per task for their agents. This prevents state conflicts and network issues during UI verification.

The technical setup for this isolation includes:

Isolated Git worktrees: To prevent file collisions and workspace pollution when multiple agents operate in parallel, tasks are executed within isolated Git worktrees. Each agent gets a dedicated file system space where environment variables are copied and dependencies are symlinked, ensuring file changes never overwrite each other.
Unique environments: Each agent and task runs its Next.js development server on a unique isolated port. According to their project rules, servers are started dynamically with npx next dev -p <custom_port> --turbopack to ensure parallel execution without network conflicts.
Database clones: To prevent data collisions during parallel testing, P2ER programmatically duplicates the main database into a task-specific schema at agent startup. After the agent completes its verification and the task is approved, an automated cleanup process drops the isolated database. This lifecycle ensures that every agent operates in a pristine workspace and leaves no dangling data behind.
Targeted testing: All browser testing through Chrome DevTools for agents must target the exact custom port allocated to that specific agent instance. Their testing mandate prohibits hardcoding default ports, requiring test target URLs like http://localhost:<custom_port>.
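
The isolation rules above can be summarized as a per-task plan. The sketch below is an illustration under stated assumptions, not P2ER's setup script: the port formula, directory layout, branch naming, and schema naming are all hypothetical. Only the npx next dev -p <custom_port> --turbopack command and the http://localhost:<custom_port> URL pattern are taken from the article.

```typescript
// Hypothetical per-task plan; naming schemes and the port formula are assumptions.
interface TaskEnvironment {
  port: number;
  worktreePath: string;
  branch: string;
  schema: string; // task-specific database schema name
  baseUrl: string; // the only URL browser tests should target
  commands: string[]; // shell commands an agent would run, in order
}

function planTaskEnvironment(
  issueNumber: number,
  basePort: number = 3100,
): TaskEnvironment {
  if (!Number.isInteger(issueNumber) || issueNumber < 0) {
    throw new RangeError("issueNumber must be a non-negative integer");
  }
  if (!Number.isInteger(basePort) || basePort < 1024 || basePort > 64000) {
    throw new RangeError("basePort must be an integer between 1024 and 64000");
  }
  // Assumption: fewer than 1000 tasks run concurrently, so the modulo
  // offset does not collide in practice. A real setup should also check
  // that the port is actually free.
  const port = basePort + (issueNumber % 1000);
  const slug = `issue-${issueNumber}`;
  const worktreePath = `../worktrees/${slug}`;
  const branch = `agent/${slug}`;
  return {
    port,
    worktreePath,
    branch,
    schema: `task_${issueNumber}`,
    baseUrl: `http://localhost:${port}`,
    commands: [
      `git worktree add -b ${branch} ${worktreePath}`,
      `npx next dev -p ${port} --turbopack`,
    ],
  };
}

const plannedEnv = planTaskEnvironment(123);
console.log(plannedEnv);
```

Deriving the port, worktree, and schema from one task identifier keeps the three resources in sync and makes cleanup predictable once the task is approved.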

Impact: A 10x increase in development velocity while keeping quality

The shift to agentic coding with high-trust guardrails transformed P2ER's output. These changes were originally necessary to ensure the agent performed reliably, but they also benefited the entire development lifecycle:

10x faster work cycles: Most issues are now closed within a single day, compared to the previous 1–3 day average. Deployment frequency jumped from once per week to multiple times per day.
Strategic focus for QA teams: Because agents now catch basic regressions and "low-hanging fruit," the human testing team can focus on more in-depth, complex test scenarios.
Robust implementations for stakeholders: Implementations are now more resilient because testing moves beyond the programmer's standard "happy path".
Clearer communication and traceability: By enforcing a "human issue to implementation subissue" rule, stakeholders receive clear instructions on what logical improvements were made and how to test them, instead of reading through tickets bloated with technical implementation details.

As an example of how this impacts development velocity, P2ER built a new platform in six months that would have taken many years using their established methods. The human remains the final quality gate, reviewing Pull Requests that have already been validated by agents.

Technical insights for teams

While building this workflow, P2ER identified several strategies that helped them transition from experimentation to a mature, agent-assisted development model.

These strategies can help other teams refine their own agentic implementations:

Optimize token usage with script injection and CLI batching

MCP servers can become token-intensive during long development sessions if agents rely solely on step-by-step navigation (for example, taking a snapshot, finding an ID, filling an input, and waiting). To minimize this overhead, P2ER uses a two-pronged approach:

Inline script injection: For targeted interactions, such as authenticating through complex React forms, agents use the evaluate_script tool to inject vanilla JavaScript directly into the browser. This bypasses built-in setter overrides and executes multiple actions at once, saving numerous conversational turns.
CLI script batching: When agents hit a "snag" or encounter an exceedingly long, repetitive browser flow, they switch to a CLI batching fallback. Instead of spending tokens on repeated, individual MCP tool calls or writing custom automation scripts from scratch, P2ER prompts agents to use the Chrome DevTools CLI to persist and batch browser actions. This allows agents to execute entire multi-step flows in one go, drastically reducing the overhead of constant model-to-tool communication.
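
As an illustration of the inline script injection idea, the sketch below builds the source of a single browser-side function that fills several inputs and clicks a submit button in one call. It is a hedged example, not P2ER's script: the helper name buildFillScript, the selectors, and the sample values are hypothetical, and passing the resulting string to evaluate_script as a function source is an assumption about how the tool is invoked. The browser-side code calls the native value setter from HTMLInputElement.prototype and then dispatches a bubbling input event, a common way to make React-controlled inputs register a programmatic change.

```typescript
// Hypothetical helper; selectors and values below are made up.
interface FieldFill {
  selector: string; // CSS selector of an <input> element
  value: string;
}

function buildFillScript(fields: FieldFill[], submitSelector: string): string {
  // The returned string is the source of an arrow function meant to run in
  // the page. Selectors and values are embedded as JSON so quoting stays safe.
  return `() => {
  // Native setter, so a framework's override on the element is bypassed.
  // Only covers <input>; <textarea> would need HTMLTextAreaElement.prototype.
  const setValue = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, "value").set;
  for (const { selector, value } of ${JSON.stringify(fields)}) {
    const input = document.querySelector(selector);
    if (!input) return { ok: false, missing: selector };
    setValue.call(input, value);
    input.dispatchEvent(new Event("input", { bubbles: true }));
  }
  const submit = document.querySelector(${JSON.stringify(submitSelector)});
  if (!submit) return { ok: false, missing: ${JSON.stringify(submitSelector)} };
  submit.click();
  return { ok: true };
}`;
}

const loginScript = buildFillScript(
  [
    { selector: "#email", value: "agent@example.com" },
    { selector: "#password", value: "not-a-real-password" },
  ],
  'button[type="submit"]',
);
console.log(loginScript);
```

One injected function replaces a snapshot, two fill calls, and a click, which is where the saving in conversational turns comes from. Returning a small result object lets the agent see which selector was missing when the flow fails.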

Automate performance tracking with Trace Analysis

Instead of relying purely on human perception, P2ER created a review-performance skill that uses DevTools for agents to run automated Lighthouse audits and performance traces.

Agents use the performance_start_trace and performance_analyze_insight tools to capture and investigate Core Web Vitals (LCP, INP, CLS) and identify main thread bottlenecks or layout shifts. To round out the quality gate, agents can run a full lighthouse_audit to specifically guard against regressions in Accessibility (a11y), SEO, and general web best practices, ensuring only high-quality code is submitted for a Pull Request.
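
A quality gate like this ultimately compares measured values with limits. The sketch below assumes the three Core Web Vitals have already been extracted from a trace into plain numbers; the output format of the trace tools is not modeled here, and the type and function names are hypothetical. The default limits are the commonly published "good" thresholds for Core Web Vitals (LCP 2.5 s, INP 200 ms, CLS 0.1), not values confirmed as P2ER's own.

```typescript
// Hypothetical shape for metrics already extracted from a trace.
interface VitalsSample {
  lcpMs: number; // Largest Contentful Paint, milliseconds
  inpMs: number; // Interaction to Next Paint, milliseconds
  cls: number; // Cumulative Layout Shift, unitless score
}

// Published "good" thresholds for Core Web Vitals.
const GOOD_VITALS: VitalsSample = { lcpMs: 2500, inpMs: 200, cls: 0.1 };

function findVitalsRegressions(
  sample: VitalsSample,
  limits: VitalsSample = GOOD_VITALS,
): string[] {
  const failures: string[] = [];
  if (sample.lcpMs > limits.lcpMs) {
    failures.push(`LCP ${sample.lcpMs} ms exceeds ${limits.lcpMs} ms`);
  }
  if (sample.inpMs > limits.inpMs) {
    failures.push(`INP ${sample.inpMs} ms exceeds ${limits.inpMs} ms`);
  }
  if (sample.cls > limits.cls) {
    failures.push(`CLS ${sample.cls} exceeds ${limits.cls}`);
  }
  return failures;
}

// Made-up measurements for illustration.
const vitalsRegressions = findVitalsRegressions({ lcpMs: 3100, inpMs: 150, cls: 0.05 });
console.log(vitalsRegressions);
```

An empty array means the sample passes the gate; any entries give the agent a concrete reason to investigate further before opening a Pull Request.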

Enhance verification with Chrome DevTools for agents

In addition to their custom skills, P2ER uses the core capabilities of the Chrome DevTools for agents MCP server to perform functional verification. This includes using the server to emulate different devices and test for responsiveness, making sure that the user interface works across different screen sizes and devices.

By using the MCP server to navigate the application, agents can spot visual discrepancies between layouts and the actual implementation, catching errors that static tests often overlook.

Resources

To dig deeper into P2ER's use case, explore all the mentioned skills in their related GitHub repository.

To get started yourself and learn more about implementing similar workflows with DevTools for agents, explore these resources: