Benchmark

This page documents the full methodology behind the Deep Research Benchmark.

What Are We Comparing?

Three fundamentally different approaches to AI-powered research:

Perplexity Deep Research: a black-box hosted service (sonar-deep-research). You send one prompt, it autonomously plans, searches, reads, iterates and synthesises. ~50 sources, 2-5 minutes.

Extended Search Pipeline: a custom 7-step pipeline where we control every stage. Each step is a separate LLM call in a separate sub-agent. Full transparency, full control.

Single-Agent Pipeline: the same 7 steps, the same prompts, but executed in one continuous session without spawning sub-agents. Context grows, but there’s zero spawn overhead.

We also tested optimisations (cheap models, lightContext, minimal steps) that turned out to be dead ends. ...
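To make the structural difference between the Extended Search Pipeline and the Single-Agent Pipeline concrete, here is a minimal sketch in Python. It is not the benchmark's actual code. The step names (`step_1` to `step_7`), the `fake_llm` stand-in and both function names are hypothetical. The sketch also assumes that each sub-agent receives only the previous step's output, which this page does not specify. It only illustrates why context stays small in one design and grows in the other.

```python
from typing import Callable, List

# Hypothetical placeholder names: the 7 steps are not named on this page.
STEPS: List[str] = [f"step_{i}" for i in range(1, 8)]

LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a short deterministic string."""
    return f"<output for {len(prompt)} chars of context>"


def run_with_subagents(question: str, llm: LLM = fake_llm) -> List[int]:
    """Sub-agent style: every step is a fresh call.

    Assumption: each step sees only the previous step's output.
    Returns the prompt size (in characters) seen by each step.
    """
    context_sizes = []
    previous = question
    for step in STEPS:
        prompt = f"[{step}] {previous}"
        context_sizes.append(len(prompt))
        previous = llm(prompt)
    return context_sizes


def run_single_session(question: str, llm: LLM = fake_llm) -> List[int]:
    """Single-agent style: all steps share one growing transcript.

    Returns the transcript size (in characters) seen by each step.
    """
    context_sizes = []
    transcript = question
    for step in STEPS:
        transcript += f"\n[{step}]"
        context_sizes.append(len(transcript))
        transcript += "\n" + llm(transcript)
    return context_sizes


if __name__ == "__main__":
    print("sub-agents:    ", run_with_subagents("What is X?"))
    print("single session:", run_single_session("What is X?"))
```

In this toy setup the sub-agent prompt sizes stay roughly flat while the single-session sizes increase at every step. That is the trade-off described above: a single session avoids spawn overhead but carries a growing context.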