Awesome-Agentic-Engineering

🌐 Browser and Desktop Agents

Last reviewed: April 2026.

Computer-use and browser agents operate on GUI surfaces (DOM, pixels, or accessibility trees) rather than through APIs alone. Evidence tags follow the Benchmark and Evidence Policy. Entries that were marketing-only or no longer actively developed were removed in this review.
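Agents in this category generally run the same control loop: capture an observation of the GUI surface, ask a model for the next action, execute it, and repeat. The sketch below illustrates that loop in Python under stated assumptions. `Observation`, `Action`, `FakeBrowser`, and `run_agent` are hypothetical names, and a scripted policy stands in for the model. It is not the API of any product listed here.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal

@dataclass
class Observation:
    # Hypothetical: which GUI surface was captured, plus a serialized payload.
    kind: Literal["dom", "pixels", "a11y_tree"]
    payload: str

@dataclass
class Action:
    name: str          # hypothetical action set: "click", "type", "done"
    target: str = ""
    text: str = ""

@dataclass
class FakeBrowser:
    """Toy stand-in for a real browser or desktop environment."""
    log: list = field(default_factory=list)

    def observe(self) -> Observation:
        # A real agent would serialize the DOM, take a screenshot, or read the accessibility tree.
        return Observation("a11y_tree", f"step={len(self.log)}")

    def execute(self, action: Action) -> None:
        self.log.append(action)

Policy = Callable[[Observation], Action]

def run_agent(env: FakeBrowser, policy: Policy, max_steps: int = 10) -> list:
    """Observe, decide, act until the policy returns "done" or the step budget runs out."""
    for _ in range(max_steps):
        obs = env.observe()
        action = policy(obs)
        if action.name == "done":
            break
        env.execute(action)
    return env.log

# Usage: a scripted policy replaces the model call.
script = iter([Action("click", "#search"), Action("type", "#search", "flights"), Action("done")])
trace = run_agent(FakeBrowser(), lambda obs: next(script))
print([a.name for a in trace])  # ['click', 'type']
```

The `max_steps` cap is the minimal guard against a policy that never signals completion; production agents also need error recovery and a much richer action space.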

Consumer Products

| Agent | Description | Evidence |
|-------|-------------|----------|
| OpenAI Operator | Autonomous web agent in ChatGPT that pauses at human checkpoints; built on the CUA (Computer-Using Agent) model. | [official] |
| Claude Computer Use | Anthropic's desktop and browser control through screenshots and a tool-use loop. | [official] |
| Claude for Chrome | Anthropic browsing agent that runs inside Chrome. | [official] |
| Google Project Mariner | Gemini-powered browser agent that can run multiple tasks in the user’s browser context. | [official] |
| ChatGPT Atlas | OpenAI’s AI-native browser with Agent Mode. | [official] |
| Dia Browser | AI-native browser from The Browser Company (acquired by Atlassian). | [official] |
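Human checkpoints, as described for Operator, can be pictured as a gate in front of the executor: sensitive actions run only after the user approves them. A minimal Python sketch, assuming a hypothetical `SENSITIVE` action set and hypothetical `execute`/`confirm` callbacks; it is not how OpenAI or Anthropic actually implement confirmations.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of action kinds that require human approval.
SENSITIVE = {"submit_payment", "send_email", "delete"}

@dataclass
class Action:
    kind: str
    detail: str

def gated_execute(actions: list, execute: Callable[[Action], None],
                  confirm: Callable[[Action], bool]) -> list:
    """Run actions in order, asking the human before each sensitive one.

    Stops at the first sensitive action the human declines and returns
    the actions that were executed up to that point.
    """
    done = []
    for action in actions:
        if action.kind in SENSITIVE and not confirm(action):
            break  # hand control back to the user instead of proceeding
        execute(action)
        done.append(action)
    return done

plan = [
    Action("fill_form", "shipping address"),
    Action("submit_payment", "$42.00"),
    Action("fill_form", "order notes"),
]
executed = gated_execute(plan, execute=lambda a: None, confirm=lambda a: False)
print([a.kind for a in executed])  # ['fill_form']
```

Stopping on the first declined action, rather than skipping it and continuing, keeps the agent from acting on a plan whose premise the user just rejected.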

Developer Infrastructure

| Tool | Description | Evidence |
|------|-------------|----------|
| Browser Use | Open-source browser agent library combining DOM and vision inputs; widely embedded in other agent stacks. | [official] |
| Skyvern | Vision-driven browser automation that uses multimodal LLMs to navigate without hand-coded selectors. | [official] |
| UI-TARS | ByteDance's open-source native GUI agent model, plus a desktop app for end-to-end computer use. | [official] · [benchmark] paper |
| Agent S2 (Simular) | Open-source compositional GUI automation framework that pairs generalist and specialist models. | [official] · [benchmark] |
| Browserbase | Cloud browser infrastructure for agents: headless Chrome at scale with session persistence. | [official] |
| Amazon Nova Act | AWS browser automation research preview aimed at enterprise-grade reliability. | [official] |
| Playwright MCP | Official MCP server wrapping Playwright for agent-driven browser automation. | [official] |
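One common way to combine DOM and vision inputs while avoiding hand-written selectors is to number the page's interactive elements and let the model refer to them by index, sending a screenshot alongside. The Python sketch below covers only the indexing half, on a hypothetical simplified DOM made of `tag`/`text`/`children` dicts. `index_interactive` and `render_for_prompt` are illustrative names, not the API of Browser Use, Skyvern, or any other tool listed above.

```python
from typing import Iterator

# Hypothetical choice of which tags count as interactive.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}

def walk(node: dict) -> Iterator[dict]:
    """Depth-first traversal of a simplified DOM node."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)

def index_interactive(root: dict) -> dict[int, dict]:
    """Number interactive elements so a model can say 'click [1]' instead of writing a CSS selector."""
    elements = [n for n in walk(root) if n["tag"] in INTERACTIVE_TAGS]
    return dict(enumerate(elements))

def render_for_prompt(index: dict[int, dict]) -> str:
    # Compact text view; a hybrid agent would pair this with a screenshot.
    return "\n".join(f"[{i}] <{n['tag']}> {n.get('text', '')}".rstrip() for i, n in index.items())

page = {"tag": "body", "children": [
    {"tag": "h1", "text": "Checkout"},
    {"tag": "input", "text": "Email"},
    {"tag": "div", "children": [{"tag": "button", "text": "Pay now"}]},
]}
idx = index_interactive(page)
print(render_for_prompt(idx))
# [0] <input> Email
# [1] <button> Pay now
```

When the model answers with an index, the agent looks the element up in `idx` and acts on it directly, so no selector ever has to be generated or maintained.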

Benchmarks & Evaluation

| Benchmark | Description | Evidence |
|-----------|-------------|----------|
| OSWorld | Benchmark of open-ended tasks for multimodal agents in real computer environments. | [official] · [benchmark] |
| WebArena / VisualWebArena | Reproducible web agent benchmark built on snapshots of real websites. | [official] · [benchmark] |
| WindowsAgentArena | Benchmark for Windows desktop agents across real applications. | [official] · [benchmark] |
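OSWorld and WebArena grade a run by checking the resulting environment state (execution-based or functional-correctness evaluation) rather than by comparing the agent's clicks to a reference trajectory. A minimal Python sketch of that scoring idea, assuming a hypothetical state dict, hypothetical per-task checker lambdas, and a toy agent; real benchmarks ship their own evaluation scripts and environment setup.

```python
from typing import Callable

# Hypothetical task format: an instruction plus a checker over the final state.
Task = tuple[str, Callable[[dict], bool]]

def success_rate(tasks: list[Task], run: Callable[[str], dict]) -> float:
    """Fraction of tasks whose final environment state passes the task's checker."""
    if not tasks:
        return 0.0
    passed = sum(1 for instruction, check in tasks if check(run(instruction)))
    return passed / len(tasks)

tasks = [
    ("rename report.txt to final.txt", lambda s: "final.txt" in s.get("files", [])),
    ("turn on dark mode", lambda s: s.get("theme") == "dark"),
]

def toy_agent(instruction: str) -> dict:
    # Pretend agent that leaves the same final state for every task.
    return {"files": ["final.txt"], "theme": "light"}

print(success_rate(tasks, toy_agent))  # 0.5
```

Checking outcomes instead of action sequences lets an agent reach the goal by any valid path, which matters for open-ended GUI tasks with many correct solutions.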