How the audit works
This is not a sentiment check. Not a prompt-and-pray approach. It is a mechanistic measurement pipeline. You feed it a URL. It returns a scored dossier showing exactly how AI systems see, rank, and cite the brand behind that URL. Every conclusion traces back to a specific signal.
The measurement thesis
The problem with asking an LLM “what do you think of this brand?” is that the answer is a function of the prompt, the temperature, the decoding strategy, and the model's mood. You get a different answer every time. That is not measurement; it is anecdote with a confidence interval the size of a continent.
Our approach is different. We measure the signals that determine whether your brand can appear in AI-generated answers, independent of any single query. We measure across the full pipeline: training corpus presence, retrieval index inclusion, entity graph status, token-level probability, co-occurrence statistics, knowledge-layer decomposition. These are the inputs to the probability distribution the model produces. If the inputs are weak, no prompt engineering will save you. If the inputs are strong, the model reaches for you automatically.
The 5-stage pipeline
Each audit passes through five discrete stages. Stages execute sequentially; modules within a stage run in parallel where possible. The full pipeline completes in approximately 30 seconds for a single URL.
Playwright launches a real Chromium instance. Not an HTTP fetch. Not a meta-tag scrape. A full browser with JavaScript execution, cookie consent handling, and client-side rendering completion. Two viewport passes: desktop (1280x720) and mobile (390x844). Both produce screenshots for the visual audit.
Cheerio parses the rendered DOM. 182 signals extracted per page: title, meta description, all h1-h6 elements, internal and external link graph, image alt text, JSON-LD blocks (parsed and validated against schema.org), OpenGraph tags, canonical URL, word count, reading level (Flesch-Kincaid), robots directives, sitemap presence, TTFB, HTTP status codes, and security headers.
AI bot access checks run against robots.txt for five user-agents: GPTBot, ClaudeBot, Google-Extended, PerplexityBot, Applebot-Extended. Blocking any of these is a scored penalty.
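The check itself is a robots.txt group scan. A simplified sketch — it flags a bot as blocked when its user-agent group (or the wildcard group) carries a blanket `Disallow: /`; real matching also handles path specificity, `Allow` overrides, and bot-specific groups overriding `*`:

```typescript
const AI_BOTS = [
  "GPTBot",
  "ClaudeBot",
  "Google-Extended",
  "PerplexityBot",
  "Applebot-Extended",
];

// Returns the AI crawlers that a robots.txt file blocks entirely.
function blockedAiBots(robotsTxt: string): string[] {
  const disallowAll = new Set<string>();
  let agents: string[] = [];
  let inDirectives = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split("#")[0].trim();
    const m = line.match(/^([^:]+):\s*(.*)$/);
    if (!m) continue;
    const key = m[1].trim().toLowerCase();
    const value = m[2].trim();
    if (key === "user-agent") {
      // A User-agent line after directives starts a new group.
      if (inDirectives) {
        agents = [];
        inDirectives = false;
      }
      agents.push(value.toLowerCase());
    } else {
      inDirectives = true;
      if (key === "disallow" && value === "/") {
        for (const a of agents) disallowAll.add(a);
      }
    }
  }
  // Simplification: a wildcard block counts against every bot.
  return AI_BOTS.filter(
    (bot) => disallowAll.has(bot.toLowerCase()) || disallowAll.has("*")
  );
}
```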
A single Gemini 2.5 Flash call with structured JSON output. The classifier returns: brand_name, category, top_3_competitors (with URLs), audience_description, site_type, page_intent, awareness_level.
This is the context injection point. Every downstream module keys off this classification, not URL-derived heuristics. The classification is cached on the audit context object; no double calls. If classification fails, the pipeline falls back to URL-derived guesses, but scores a confidence penalty.
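The fallback path is mechanical: brand identity is guessed from the hostname and the audit carries a penalty for the lower-confidence inputs. A rough sketch of that fallback — the penalty value and the `confidence_penalty` field are illustrative, not our exact schema:

```typescript
interface Classification {
  brand_name: string;
  category: string;
  confidence_penalty: number; // nonzero when we fell back to URL heuristics
}

// URL-derived fallback used only when the Gemini classification call fails.
// The penalty of 10 points is an assumed figure for illustration.
function fallbackClassification(url: string): Classification {
  const host = new URL(url).hostname.replace(/^www\./, "");
  const brand = host.split(".")[0];
  return {
    brand_name: brand.charAt(0).toUpperCase() + brand.slice(1),
    category: "unknown",
    confidence_penalty: 10,
  };
}
```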
17 measurement modules dispatched via Promise.allSettled. Each returns a DimensionScore (0 to 100). If a module throws, the dispatcher records a neutral score of 50 for that dimension and logs the failure. No single module can crash the pipeline.
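That containment contract fits in a few lines. A sketch of the dispatcher, assuming each module is an async function resolving to a 0-to-100 score:

```typescript
interface DimensionScore {
  score: number; // 0-100
  failed: boolean;
}

// Run all modules in parallel. Promise.allSettled never rejects, so one
// throwing module cannot take down the batch; it settles to a neutral 50.
async function runModules(
  modules: Array<() => Promise<number>>
): Promise<DimensionScore[]> {
  const results = await Promise.allSettled(modules.map((m) => m()));
  return results.map((r) =>
    r.status === "fulfilled"
      ? { score: r.value, failed: false }
      : { score: 50, failed: true } // failure is logged in the real pipeline
  );
}
```

The neutral 50 keeps a failed module from dragging the overall score in either direction while still flagging the gap in the receipts.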
This is where the hard signals live: tokenizer tax across two encoding schemes, entity resolution across three knowledge graphs, retrieval pool cosine similarity in pgvector, token-level log-probabilities with bootstrap confidence intervals, pointwise mutual information against category anchors, seven-layer knowledge-source decomposition across three providers, chunk-gap clustering, hallucination persistence tracking, feedback loop velocity, counterfactual impact prediction, and more. The 15 mechanism capabilities (detailed below) all execute here or in the post-processing stage that follows.
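To make one of those signals concrete: pointwise mutual information asks whether a brand co-occurs with its category anchors more often than chance would predict. A minimal version computed from document-level counts — the counts and corpus size here are stand-ins, not real data:

```typescript
// PMI(brand, anchor) = log2( p(brand, anchor) / (p(brand) * p(anchor)) )
// Probabilities are estimated from document counts over a corpus of size n.
// Positive PMI: the brand and anchor co-occur more than chance; negative: less.
function pmi(
  brandCount: number,
  anchorCount: number,
  jointCount: number,
  n: number
): number {
  if (brandCount === 0 || anchorCount === 0 || jointCount === 0) {
    return -Infinity; // never observed together
  }
  const pBrand = brandCount / n;
  const pAnchor = anchorCount / n;
  const pJoint = jointCount / n;
  return Math.log2(pJoint / (pBrand * pAnchor));
}
```

A brand mentioned in 100 of 10,000 documents, with an anchor term in 200, co-occurring in 50, yields a strongly positive PMI: the association is far above chance.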
A 10-step sequential LLM pipeline. Model routing is deliberate: Gemini 2.5 Flash for classification (fast, cheap, structured JSON output), DeepSeek Chat for all analysis tasks (roughly 1/30th the cost of a frontier model), and Claude Sonnet for premium synthesis when the audit tier warrants it.
Weighted average across all dimensions produces an overall score (0 to 100) and a letter grade (A+ to F). 15 mechanism capabilities account for 66% of the score. The judge pipeline contributes 20%. Traditional SEO signals contribute the remaining 14%.
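With that split, the arithmetic is straightforward. A sketch using the 66/20/14 weights — the letter-grade cut points below are illustrative assumptions, not the report's published bands:

```typescript
// Published weight split: mechanism 66%, judge 20%, traditional SEO 14%.
const WEIGHTS = { mechanism: 0.66, judge: 0.2, seo: 0.14 }; // sums to 1.0

function overallScore(mechanism: number, judge: number, seo: number): number {
  return (
    mechanism * WEIGHTS.mechanism + judge * WEIGHTS.judge + seo * WEIGHTS.seo
  );
}

// Assumed grade bands for illustration only.
function letterGrade(score: number): string {
  if (score >= 97) return "A+";
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}
```

Note the asymmetry this creates: a site can ace every traditional SEO signal and still cap out around 14 points if the mechanism capabilities and judge pipeline score zero.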
Issues are extracted per module with severity, category, and recommended fix. Judge output persists as structured JSONB. A cache entry is written with a 48-hour TTL. The full audit is reconstructable from the database: every prompt, every response, every signal, every intermediate calculation.
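The cache behaviour is plain time-based expiry. A sketch of the 48-hour TTL check — the entry shape and field names are assumptions:

```typescript
const CACHE_TTL_MS = 48 * 60 * 60 * 1000; // 48-hour TTL

interface CacheEntry {
  auditId: string;
  createdAt: number; // epoch milliseconds
}

// Serve the cached audit only while it is younger than the TTL;
// a stale entry triggers a fresh pipeline run.
function isFresh(entry: CacheEntry, now: number = Date.now()): boolean {
  return now - entry.createdAt < CACHE_TTL_MS;
}
```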
The 15+1 mechanism capabilities
Each capability is a distinct measurement module with its own flag in the database. They run in parallel. Each produces a score from 0 to 100. Each can be enabled or disabled per audit without affecting the others. The +1 is C16 (AI Overview), gated behind an API key.
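Per-audit gating reduces to a flag lookup plus one environment check for C16. A sketch — the capability IDs follow the C1..C16 naming above, but the `AIO_API_KEY` variable name is an assumption:

```typescript
interface AuditFlags {
  enabled: Set<string>; // e.g. "C1" .. "C16", toggled per audit
}

// C16 (AI Overview) additionally requires its API key to be configured;
// every other capability is governed by its flag alone.
function activeCapabilities(
  flags: AuditFlags,
  env: Record<string, string | undefined>
): string[] {
  const all = Array.from({ length: 16 }, (_, i) => `C${i + 1}`);
  return all.filter((id) => {
    if (!flags.enabled.has(id)) return false;
    if (id === "C16" && !env["AIO_API_KEY"]) return false; // API-key gate
    return true;
  });
}
```

Because each capability is checked independently, disabling one never changes which of the others run — the isolation property the paragraph above describes.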
The data-source waterfall
Each audit has 7 data needs. Each need has an ordered stack of sources, tried free-to-paid. The dispatcher enforces 5 gates per attempt: env-key present, paid-allowed flag, daily budget ceiling, per-request budget, per-source quota plus circuit breaker. Every attempt (success or skip) writes one row to audit_source_calls so the entire fallback chain is reconstructable.
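The gate logic composes as a short-circuit chain: the first gate that fails names the skip reason, and that reason is what lands in the audit_source_calls row. A sketch of one attempt's check — field names mirror the description above, not the actual schema:

```typescript
interface Source {
  name: string;
  isPaid: boolean;
  envKey?: string;      // required API-key env variable, if any
  costPerCall: number;  // dollars
  quotaRemaining: number;
  circuitOpen: boolean; // breaker tripped after repeated failures
}

interface Budget {
  paidAllowed: boolean;
  dailyRemaining: number;
  perRequestMax: number;
}

// Returns null when the source may be tried, or the name of the first
// failing gate -- the skip reason recorded for reconstructability.
function gateCheck(
  src: Source,
  budget: Budget,
  env: Record<string, string | undefined>
): string | null {
  if (src.envKey && !env[src.envKey]) return "env_key_missing";
  if (src.isPaid && !budget.paidAllowed) return "paid_not_allowed";
  if (src.isPaid && src.costPerCall > budget.dailyRemaining) return "daily_budget";
  if (src.isPaid && src.costPerCall > budget.perRequestMax) return "per_request_budget";
  if (src.quotaRemaining <= 0 || src.circuitOpen) return "quota_or_breaker";
  return null;
}
```

The free-to-paid ordering means the gates rarely fire: free sources skip the three budget gates entirely, so paid calls happen only when cheaper sources have already failed or been exhausted.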
Scoring weights
The overall score is a weighted average. Weights sum to 1.0. Two-thirds of the score comes from the 15 mechanism capabilities. The reasoning: traditional SEO is table stakes. Every tool measures it. AI visibility is determined by signals that SEO tools do not measure, and those signals are what determine whether a model cites you. The weights reflect this.
The receipts
Every prompt sent to every model. Every response received. Every citation extracted. Every classifier output. The full crawl signal sheet. Desktop and mobile screenshots. The complete data-source waterfall trace (which sources were tried, which succeeded, which were skipped, and why). The full AIO simulation (both passes, expandable). Entity probes with raw API responses. Seven-source matrix cells. Token probability distributions. PMI calculations. A JSON download button for the entire audit payload.
The audit is not a black box. The Receipts tab ships with every report. You can verify every conclusion against the evidence that produced it. If you disagree with a score, you can trace it to the specific signal and the specific measurement that generated it.
Architecture
Frontend: Next.js 16, React 19, Tailwind CSS. Worker: Node.js + Playwright (Docker on Railway). Queue: BullMQ + Upstash Redis. Database: Supabase PostgreSQL with pgvector. Embeddings: MiniLM-L6-v2 (local, zero API cost). Tokenizer: js-tiktoken. Counterfactual sidecar: LightGBM on Railway.
What we do not do
We do not prompt-hack. We do not pay for placement. We do not manipulate model outputs. We do not run adversarial attacks against AI systems to inflate your visibility.
We measure the structural signals that determine your position in the probability distribution. The audit tells you where you stand and what to change. The changes are yours to make.