Published: Jun 22, 2026
P2ER, a digital solutions agency, uses Chrome DevTools for agents to ensure that only verified, working software is passed to humans for final review. By transforming their workflow into an agentic infrastructure, they have empowered AI agents to perform empirical UI verification, increasing deployment frequency from once a week to multiple times per day.
The challenge: Scale quality in existing applications
P2ER delivers high-end digital experiences for global brands, including car manufacturers, watch brands, and hospitality companies. Their primary challenge, as it is for many companies, was working within complex, existing applications. As the team adopted agentic coding, they faced three major hurdles:
- Brittle UI testing. Standard test suites struggled with dynamic data, such as fluctuating hotel prices or seasonal offerings in some of P2ER's projects. Mock data often hid integration flaws that a human tester would find immediately.
- Agent reliability issues. Without explicit instructions, AI agents sometimes claimed a task was complete without actually verifying it.
- Loss of context. Broad tasks and model timeouts caused agents to lose track of session goals. This made it difficult for developers to trace and continue work an agent had started.
The solution: Build infrastructure for craftsmanship
P2ER built an infrastructure that treats AI as a "sparring partner" that can also handle the repetitive aspects of development. This approach allows the team to scale craftsmanship by focusing on architecture and creative problem solving.
Enforce empirical verification with DevTools for agents' MCP server
To ensure reliability, P2ER established a Mandatory Empirical Verification rule.
This engineering mandate, codified in the project's AGENTS.md file, states:
All claims regarding service availability and component rendering
MUST be empirically verified (log output, dev compiler, browser/devtools inspection)
before asserting to the user.
Instead of taking the agent's word for it, the team uses Chrome DevTools for agents to give agents a safe environment to navigate the application visually and interactively.
These "testing agents" perform several key tasks that standard static tests miss:
- Dynamic data testing: Even in a staging environment, agents test against real, fluctuating data (like changing hotel prices across seasons) to experience the application exactly as a user would. This is enabled by DevTools for agents' interaction tools like new_page, navigate_page, fill, click, and hover, called out in their github-issue-test skill, allowing the agent to dynamically authenticate and simulate a realistic user click path.
- Visual audits: Agents identify visual discrepancies between Figma layouts and the actual implementation. By using the take_screenshot tool from DevTools for agents, their figma-validate skill captures high-resolution screenshots of live Storybook renders to perform a side-by-side comparison with Figma exports.
- Usability checks: Agents catch missing translations or usability errors that automated scripts often overlook. By interacting directly with the accessibility tree and reviewing visual snapshots, retrieved through take_snapshot and take_screenshot, agents actively scan for UI anomalies like MISSING_MESSAGE strings as explicitly instructed in their automated verification workflows.
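The MISSING_MESSAGE scan from the usability check can be pictured as a small text filter over a page snapshot. The following is only a sketch: the one-node-per-line snapshot format, the sample text, and the findMissingMessages helper are assumptions made for illustration, not P2ER's skill or the exact output of take_snapshot.

```typescript
// Hypothetical helper: scan snapshot text for untranslated-message markers.
// The one-node-per-line snapshot format is an assumption for illustration.
interface Finding {
  line: number; // 1-based line number within the snapshot text
  text: string; // trimmed content of the offending line
}

function findMissingMessages(snapshot: string, marker = "MISSING_MESSAGE"): Finding[] {
  return snapshot
    .split("\n")
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((entry) => entry.text.includes(marker));
}

// Sample snapshot text, made up for this example.
const sampleSnapshot = [
  'heading "Book your stay"',
  'button "MISSING_MESSAGE: booking.submit"',
  'link "Offers"',
].join("\n");

// One finding is expected here: the button on line 2.
const findings = findMissingMessages(sampleSnapshot);
```

An agent could report each finding's line and text back to the orchestrator instead of asserting that translations are complete.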
Decompose and persist subtasks
To combat session timeouts and context loss, P2ER strictly compartmentalizes work through subagents. They instruct their agents to act as orchestrators like this:
Rather than executing everything in the main thread, you must decompose large
or complex objectives into modular subtasks that can be delegated
to specialized subagents.
To keep human product owners informed in this process, the team integrated a custom skill for agents to track their work in GitHub issues. This ensures that every subagent task and its results are persisted and documented as a sub-issue using the GitHub API, creating a clear audit trail and persistent context that other developers can pick up.
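As a rough sketch of what persisting a subtask might look like, the snippet below renders a subagent's task and result into an issue title and body. The SubtaskRecord fields, the "[subtask]" prefix, and the body layout are all hypothetical. The snippet deliberately stops short of calling the GitHub API, and P2ER's actual skill may work differently.

```typescript
// Hypothetical record of one delegated subtask; field names are assumptions.
interface SubtaskRecord {
  parentIssue: number; // number of the human-authored parent issue
  title: string; // short description of the delegated subtask
  status: "pending" | "done" | "failed";
  evidence: string[]; // e.g. log excerpts or screenshot file names
}

interface IssueDraft {
  title: string;
  body: string;
}

// Render a subtask into an issue draft that a later session can resume from.
function toIssueDraft(task: SubtaskRecord): IssueDraft {
  const evidenceLines =
    task.evidence.length > 0
      ? task.evidence.map((item) => `- ${item}`)
      : ["- (none recorded)"];
  const body = [
    `Parent issue: #${task.parentIssue}`,
    `Status: ${task.status}`,
    "",
    "Evidence:",
    ...evidenceLines,
  ].join("\n");
  return { title: `[subtask] ${task.title}`, body };
}

// Example values are placeholders.
const draft = toIssueDraft({
  parentIssue: 42,
  title: "Verify booking form renders",
  status: "done",
  evidence: ["dev server log: compiled successfully", "screenshot: booking-form.png"],
});
```

Keeping the evidence in the issue body means a developer or another agent can pick up the work without access to the original session.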
Isolate environments for parallel execution
To scale their development process so multiple agents run code in parallel, P2ER mandates isolated environments per task for their agents. This prevents state conflicts and network issues during UI verification.
The technical setup for this isolation includes:
- Isolated Git worktrees: To prevent file collisions and workspace pollution when multiple agents operate in parallel, tasks are executed within isolated Git worktrees. Each agent gets a dedicated file system space where environment variables are copied and dependencies are symlinked, ensuring file changes never overwrite each other.
- Unique environments: Each agent and task runs its Next.js development server on a unique isolated port. According to their project rules, servers are started dynamically with npx next dev -p <custom_port> --turbopack to ensure parallel execution without network conflicts.
- Database clones: To prevent data collisions during parallel testing, P2ER programmatically duplicates the main database into a task-specific schema at agent startup. After the agent completes its verification and the task is approved, an automated cleanup process drops the isolated database. This lifecycle ensures that every agent operates in a pristine workspace and leaves no dangling data behind.
- Targeted testing: All browser testing through Chrome DevTools for agents must target the exact custom port allocated to that specific agent instance. Their testing mandate prohibits hardcoding default ports, requiring test target URLs like http://localhost:<custom_port>.
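One way to picture the isolation rules above is a single function that derives everything a task needs from its ID. This is a minimal sketch under stated assumptions: the base port, port range, directory layout, branch naming, and schema naming are invented for illustration. Only the shape of the next dev command and the localhost target URL come from the setup described above.

```typescript
// Hypothetical per-task environment plan. The base port, directory layout,
// and naming schemes are assumptions; only the `next dev` command shape and
// the localhost URL pattern come from the article.
interface TaskEnvironment {
  worktreePath: string;
  branch: string;
  port: number;
  dbSchema: string;
  devServerCommand: string;
  testTargetUrl: string;
}

const BASE_PORT = 3100; // assumed start of the port range reserved for agents
const PORT_RANGE = 800; // assumed number of ports in that range

function planTaskEnvironment(taskId: number): TaskEnvironment {
  if (!Number.isInteger(taskId) || taskId < 0) {
    throw new Error(`taskId must be a non-negative integer, got ${taskId}`);
  }
  // Map the task ID into the reserved range; IDs that differ by PORT_RANGE
  // collide, so a real setup would also check that the port is free.
  const port = BASE_PORT + (taskId % PORT_RANGE);
  return {
    worktreePath: `../worktrees/task-${taskId}`,
    branch: `agent/task-${taskId}`,
    port,
    dbSchema: `task_${taskId}`,
    devServerCommand: `npx next dev -p ${port} --turbopack`,
    testTargetUrl: `http://localhost:${port}`,
  };
}

const taskEnv = planTaskEnvironment(17);
```

Deriving the test target URL from the same value as the dev server port keeps the "no hardcoded default ports" rule from depending on the agent remembering it.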
Impact: A 10x increase in development velocity while keeping quality
The shift to agentic coding with high-trust guardrails transformed P2ER's output. These changes were originally necessary to ensure the agent performed reliably, but they also benefited the entire development lifecycle:
- 10x faster work cycles: Most issues are now closed within a single day, compared to the previous 1–3 day average. Deployment frequency jumped from once per week to multiple times per day.
- Strategic focus for QA teams: Because agents now catch basic regressions and "low-hanging fruit," the human testing team can focus on more in-depth, complex test scenarios.
- Robust implementations for stakeholders: Implementations are now more resilient because testing moves beyond the programmer's standard "happy path".
- Clearer communication and traceability: Because the team enforces a "human issue to implementation subissue" rule, stakeholders receive clear descriptions of what logical improvements were made and how to test them, instead of reading through tickets bloated with technical implementation details.
As an example of how this impacts development velocity, P2ER built a new platform in six months that would have taken many years using their established methods. The human remains the final quality gate, reviewing Pull Requests that have already been validated by agents.
Technical insights for teams
While building this workflow, P2ER identified several strategies that helped them transition from experimentation to a mature, agent-assisted development model.
These strategies can help other teams refine their own agentic implementations:
Optimize token usage with script injection and CLI batching
MCP servers can become token-intensive during long development sessions if agents rely solely on step-by-step navigation (for example, taking a snapshot, finding an ID, filling an input, and waiting). To minimize this overhead, P2ER uses a two-pronged approach:
- Inline script injection: For targeted interactions, such as authenticating through complex React forms, agents use the evaluate_script tool to inject vanilla JavaScript directly into the browser. This bypasses built-in setter overrides and executes multiple actions at once, saving numerous conversational turns.
- CLI script batching: When agents hit a "snag" or encounter an exceedingly long, repetitive browser flow, they switch to a CLI batching fallback. Instead of spending tokens on repeated, individual MCP tools or writing custom automation scripts from scratch, P2ER prompts the Chrome DevTools CLI to persist and batch browser actions. This allows agents to programmatically execute entire multi-step flows in one go, drastically reducing the overhead of constant model-to-tool communication.
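To illustrate the inline script injection idea, the sketch below builds the source of one function that fills several inputs and clicks a submit button in a single call. It uses a widely known pattern for React-controlled inputs: call the native HTMLInputElement value setter, then dispatch an input event. The selectors, values, and the buildFillScript helper are placeholders, and the way the resulting string is passed to evaluate_script is an assumption to check against the tool's documentation.

```typescript
// Hypothetical field description; selectors and values are placeholders.
interface FieldFill {
  selector: string;
  value: string;
}

// Build the source of one function that fills several inputs in a single call.
// The generated code calls the native HTMLInputElement value setter and then
// dispatches an "input" event so React-controlled inputs register the change.
function buildFillScript(fields: FieldFill[], submitSelector: string): string {
  return `() => {
  const fields = ${JSON.stringify(fields)};
  const setter = Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, "value").set;
  for (const field of fields) {
    const input = document.querySelector(field.selector);
    if (!input) return { ok: false, missing: field.selector };
    setter.call(input, field.value);
    input.dispatchEvent(new Event("input", { bubbles: true }));
  }
  const submit = document.querySelector(${JSON.stringify(submitSelector)});
  if (!submit) return { ok: false, missing: ${JSON.stringify(submitSelector)} };
  submit.click();
  return { ok: true };
}`;
}

// Example values are placeholders, not real credentials.
const fillScript = buildFillScript(
  [
    { selector: "#email", value: "agent@example.test" },
    { selector: "#password", value: "placeholder-password" },
  ],
  "button[type=submit]",
);
```

Returning a small result object from the injected function lets the agent verify the outcome from one tool response rather than taking a follow-up snapshot.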
Automate performance tracking with Trace Analysis
Instead of relying purely on human perception, P2ER created a review-performance skill that uses DevTools for agents to run automated Lighthouse audits and performance traces.
Agents use the performance_start_trace and performance_analyze_insight tools to capture and investigate Core Web Vitals (LCP, INP, CLS) and identify main thread bottlenecks or layout shifts. To round out the quality gate, agents can run a full lighthouse_audit to specifically guard against regressions in Accessibility (a11y), SEO, and general web best practices, ensuring only high-quality code is submitted for a Pull Request.
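A quality gate over Core Web Vitals might look like the following sketch. The VitalsSample shape and the failingVitals helper are hypothetical and do not represent the output format of any DevTools for agents tool. The limits are the commonly published "good" thresholds: 2.5 seconds for LCP, 200 milliseconds for INP, and 0.1 for CLS.

```typescript
// Hypothetical metrics shape; not the actual output of any DevTools tool.
interface VitalsSample {
  lcpMs: number; // Largest Contentful Paint, in milliseconds
  inpMs: number; // Interaction to Next Paint, in milliseconds
  cls: number; // Cumulative Layout Shift, unitless
}

// Published "good" thresholds for Core Web Vitals.
const GOOD_THRESHOLDS: VitalsSample = { lcpMs: 2500, inpMs: 200, cls: 0.1 };

// Return the names of the metrics that exceed their "good" threshold.
function failingVitals(sample: VitalsSample): string[] {
  const failures: string[] = [];
  if (sample.lcpMs > GOOD_THRESHOLDS.lcpMs) failures.push("LCP");
  if (sample.inpMs > GOOD_THRESHOLDS.inpMs) failures.push("INP");
  if (sample.cls > GOOD_THRESHOLDS.cls) failures.push("CLS");
  return failures;
}

// Sample values are made up: only LCP is over its threshold here.
const vitalsFailures = failingVitals({ lcpMs: 3100, inpMs: 180, cls: 0.02 });
```

An agent could treat a non-empty result as a reason to investigate the trace further before opening a Pull Request.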
Enhance verification with Chrome DevTools for agents
In addition to their custom skills, P2ER uses the core capabilities of the Chrome DevTools for agents MCP server to perform functional verification. This includes using the server to emulate different devices and test for responsiveness, making sure that the user interface works across different screen sizes and devices.
By using the MCP server to navigate the application, agents can spot visual discrepancies between layouts and the actual implementation, catching errors that static tests often overlook.
Resources
To dig deeper into P2ER's use case, see all of the skills mentioned here in their related GitHub repository.
To get started yourself and learn more about implementing similar workflows with DevTools for agents, explore these resources: