Methodology

How the audit works.

This is not a sentiment check. Not a prompt-and-pray approach. It is a mechanistic measurement pipeline. You feed it a URL. It returns a scored dossier showing exactly how AI systems see, rank, and cite the brand behind that URL. Every conclusion traces back to a specific signal.

The measurement thesis

The problem with asking an LLM “what do you think of this brand?” is that the answer is a function of the prompt, the temperature, the decoding strategy, and the model's mood. You get a different answer every time. That is not measurement; it is anecdote with a confidence interval the size of a continent.

Our approach is different. We measure the signals that determine whether your brand can appear in AI-generated answers, independent of any single query. We measure across the full pipeline: training corpus presence, retrieval index inclusion, entity graph status, token-level probability, co-occurrence statistics, knowledge-layer decomposition. These are the inputs to the probability distribution the model produces. If the inputs are weak, no prompt engineering will save you. If the inputs are strong, the model reaches for you automatically.

The 5-stage pipeline

Each audit passes through five discrete stages. Stages execute sequentially; modules within a stage run in parallel where possible. The full pipeline completes in approximately 30 seconds for a single URL.

01 · Crawl

Playwright launches a real Chromium instance. Not an HTTP fetch. Not a meta-tag scrape. A full browser with JavaScript execution, cookie consent handling, and client-side rendering completion. Two viewport passes: desktop (1280x720) and mobile (390x844). Both produce screenshots for the visual audit.

Cheerio parses the rendered DOM. 182 signals extracted per page: title, meta description, all h1-h6 elements, internal and external link graph, image alt text, JSON-LD blocks (parsed and validated against schema.org), OpenGraph tags, canonical URL, word count, reading level (Flesch-Kincaid), robots directives, sitemap presence, TTFB, HTTP status codes, and security headers.

AI bot access checks run against robots.txt for five user-agents: GPTBot, ClaudeBot, Google-Extended, PerplexityBot, Applebot-Extended. Blocking any of these is a scored penalty.
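
The robots.txt gate can be sketched with Python's standard-library parser, as a simplified stand-in for the crawler's actual check (the user-agent list is the one named above; the sample robots.txt is invented):

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot", "Applebot-Extended"]

def blocked_ai_bots(robots_txt: str, bots: list[str] = AI_BOTS) -> list[str]:
    """Return the AI crawlers that this robots.txt blocks from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, "/")]

# A site that blocks GPTBot but allows everyone else:
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(blocked_ai_bots(sample))  # → ['GPTBot']
```

Any non-empty result here is a scored penalty in the audit.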

1.5 · Classify

A single Gemini 2.5 Flash call with structured JSON output. The classifier returns: brand_name, category, top_3_competitors (with URLs), audience_description, site_type, page_intent, awareness_level.

This is the context injection point. Every downstream module keys off this classification, not URL-derived heuristics. The classification is cached on the audit context object, so it is never requested twice. If classification fails, the pipeline falls back to URL-derived guesses and incurs a confidence penalty.

02 · Measure

17 measurement modules dispatched via Promise.allSettled. Each returns a DimensionScore (0 to 100). If a module throws, it settles with score 50 and logs the failure. No single module can crash the pipeline.
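
The dispatch pattern can be sketched in Python, with asyncio.gather standing in for the Node worker's Promise.allSettled (the module names and the logging are illustrative):

```python
import asyncio

FALLBACK_SCORE = 50  # a module that throws settles at 50 instead of crashing the audit

async def run_modules(modules) -> list[int]:
    """Dispatch measurement modules concurrently and absorb failures,
    mirroring the worker's Promise.allSettled pattern."""
    results = await asyncio.gather(*(m() for m in modules), return_exceptions=True)
    scores = []
    for module, result in zip(modules, results):
        if isinstance(result, Exception):
            # Log and fall back; no single module can sink the pipeline.
            print(f"{module.__name__} failed: {result!r}")
            scores.append(FALLBACK_SCORE)
        else:
            scores.append(result)
    return scores

async def tokenizer_tax():      # stand-in module that succeeds
    return 72

async def entity_resolution():  # stand-in module that fails
    raise RuntimeError("knowledge-graph timeout")

scores = asyncio.run(run_modules([tokenizer_tax, entity_resolution]))
print(scores)  # → [72, 50]
```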

This is where the hard signals live: tokenizer tax across two encoding schemes, entity resolution across three knowledge graphs, retrieval pool cosine similarity in pgvector, token-level log-probabilities with bootstrap confidence intervals, pointwise mutual information against category anchors, seven-layer knowledge-source decomposition across three providers, chunk-gap clustering, hallucination persistence tracking, feedback loop velocity, counterfactual impact prediction, and more. The 15 mechanism capabilities (detailed below) all execute here or in the post-processing stage that follows.

03 · Judge

A 10-step sequential LLM pipeline. Model routing is deliberate: Gemini 2.5 Flash for classification (fast, cheap, structured JSON output), DeepSeek Chat for all analysis tasks (roughly one-thirtieth the cost of frontier models), Claude Sonnet for premium synthesis when the audit tier warrants it.

The 10 steps:

1. Site classification (Gemini Flash)
2. Fan-out: 7 category-specific queries generated
3. E-E-A-T assessment (4 axes, parallel)
4. Content depth analysis
5. Conversion walkthrough (CTA analysis, objection mapping)
6. Persona narrative (ideal vs. current user)
7. AIO simulation pass 1: simulate the AI Overview
8. AIO simulation pass 2: critique pass 1, fix errors
9. Strategic synthesis (executive summary + prioritized actions)
10. Multi-LLM consensus (premium tier only)

04 · Score & Store

Weighted average across all dimensions produces an overall score (0 to 100) and a letter grade (A+ to F). 15 mechanism capabilities account for 66% of the score. The judge pipeline contributes 20%. Traditional SEO signals contribute the remaining 14%.

Issues are extracted per module with severity, category, and recommended fix. Judge output persists as structured JSONB. A cache entry is written with a 48-hour TTL. The full audit is reconstructable from the database: every prompt, every response, every signal, every intermediate calculation.

The 15+1 mechanism capabilities

Each capability is a distinct measurement module with its own flag in the database. They run in parallel. Each produces a score from 0 to 100. Each can be enabled or disabled per audit without affecting the others. The +1 is C16 (AI Overview), gated behind an API key.

C1 · Tokenizer Tax
Real tiktoken measurement using cl100k_base and o200k_base encodings. Computes token count for your brand name, 3 competitors, and 10 common vocabulary terms in the same category. A brand that costs 3 tokens is structurally disadvantaged against a 1-token competitor; the model must allocate more probability mass just to emit your name.
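
A minimal sketch of the relative-cost arithmetic, assuming the token counts have already been measured with tiktoken; the ratio-to-median-competitor formula here is illustrative, not the module's exact scoring:

```python
from statistics import median

def tokenizer_tax(brand_tokens: int, competitor_tokens: list[int]) -> float:
    """Token cost of the brand name relative to the median competitor.
    A ratio above 1.0 means the model must spend more subword tokens,
    and therefore more probability mass, to emit the brand name."""
    return brand_tokens / median(competitor_tokens)

# A 3-token brand name against 1-, 1- and 2-token competitors:
print(tokenizer_tax(3, [1, 1, 2]))  # → 3.0
```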

C2 · Entity Resolution
3 entity probes across 3 knowledge providers: Google Knowledge Graph API, Wikidata SPARQL, and DBpedia Spotlight. Plus a schema audit of your on-page JSON-LD (parsed, validated, checked for @graph completeness and schema.org conformance). The question: do the machines know you exist as a distinct entity with typed properties, or are you an unresolved string?

C3 · Retrieval Pool
Owned content and Wikipedia chunks embedded via MiniLM-L6-v2 (384 dimensions), stored in pgvector. 5 category-specific prompts run against the index; top-20 results retrieved by cosine similarity per prompt. Measures whether your content falls within the retrieval window that RAG pipelines would actually surface. If your nearest chunk ranks 47th, you are invisible to retrieval-augmented generation.
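
The retrieval-window check reduces to ranking by cosine similarity. A toy sketch with 2-dimensional vectors standing in for the 384-dimensional MiniLM embeddings (chunk IDs and vectors are invented):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def brand_rank(query_vec, brand_ids, pool, window=20):
    """Rank every chunk in the pool against the prompt embedding; return the
    best 1-based rank a brand-owned chunk achieves, or None if none lands
    inside the retrieval window."""
    ranked = sorted(pool, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    for rank, chunk in enumerate(ranked[:window], start=1):
        if chunk["id"] in brand_ids:
            return rank
    return None

pool = [
    {"id": "wiki-1",  "vec": [1.0, 0.0]},
    {"id": "brand-1", "vec": [0.9, 0.1]},
    {"id": "wiki-2",  "vec": [0.0, 1.0]},
]
print(brand_rank([1.0, 0.0], {"brand-1"}, pool))  # → 2
```

A return of None is the "ranks 47th" failure mode: the content exists but retrieval never surfaces it.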

C4 · Token Probability
OpenAI and DeepSeek logprobs endpoints. 5 prompt templates, 10 samples per template, 50 total completions. Extracts position-specific log-probabilities for the brand token appearing in context. Computes bootstrap confidence intervals (n=1000 resamples) for the mean log-probability. This is the raw measurement of how likely a model is to generate your brand name when the context calls for it.
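
The percentile-bootstrap step can be sketched in Python (the sample log-probs below are invented; the real module draws them from the providers' logprobs endpoints):

```python
import random
from statistics import mean

def bootstrap_ci(logprobs, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean log-probability:
    resample with replacement, take each resample's mean, read off quantiles."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(logprobs, k=len(logprobs))) for _ in range(n_resamples)
    )
    lo = means[int(n_resamples * alpha / 2)]
    hi = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi

# 10 hypothetical brand-token log-probs from one prompt template:
samples = [-2.1, -1.8, -2.4, -2.0, -1.9, -2.2, -2.3, -1.7, -2.0, -2.1]
lo, hi = bootstrap_ci(samples)
print(f"mean {mean(samples):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```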

C5 · PMI Co-occurrence
Pointwise mutual information between your brand and 5 category anchor terms, benchmarked against 3 competitors. Currently runs on a seed corpus; production target is a Common Crawl slice. PMI measures whether the brand and category co-occur more than chance would predict, which directly influences the token co-occurrence statistics in the training data.
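
PMI itself is a one-line formula over co-occurrence counts. A sketch, assuming the window counts have already been tallied from the corpus:

```python
import math

def pmi(co_count: int, brand_count: int, anchor_count: int, total_windows: int) -> float:
    """Pointwise mutual information, in bits, between brand and anchor over
    co-occurrence windows. Positive: the pair appears together more often
    than independence would predict; zero: exactly at chance."""
    p_joint = co_count / total_windows
    p_brand = brand_count / total_windows
    p_anchor = anchor_count / total_windows
    return math.log2(p_joint / (p_brand * p_anchor))

# Brand in 10 of 1,000 windows, anchor in 100, together in 4:
score = pmi(4, 10, 100, 1000)  # ≈ 2.0 bits: four times the co-occurrence chance predicts
```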

C6 · Seven-Source Analysis
3 LLM providers queried across 7 knowledge layers: pretraining, retrieval, instruction tuning, RLHF, system prompt, safety layer, fine-tune. Maps which layer of each model's stack has information about you. A brand present in pretraining but absent from retrieval has a different failure mode than one present in retrieval but absent from pretraining.

C7 · Aggregator Presence
DuckDuckGo HTML search against a 30-aggregator registry (G2, Capterra, TrustRadius, Product Hunt, Crunchbase, etc.) plus URL citation extraction from all 5 LLM providers. Aggregators are the secondary training corpus; models learn about brands through the aggregators that index them. Zero aggregator presence means zero secondary signal.

C8 · Chunk Gaps
Brand and competitor content fetched, chunked (512-token windows), embedded via MiniLM-L6-v2, clustered with k-means. Gap severity measures semantic regions where competitors have content and you do not. Each gap is a topic cluster that RAG systems can serve from a competitor but not from you.
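
Both halves of the module, fixed-window chunking and competitor-only clusters, can be sketched briefly (the cluster assignments below are invented; the real module derives them from k-means over embeddings):

```python
def chunk_windows(tokens: list[str], window: int = 512) -> list[list[str]]:
    """Split a token stream into fixed-size, non-overlapping windows."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]

def coverage_gaps(assignments: dict) -> list[int]:
    """assignments maps chunk_id -> (owner, cluster). A gap is a cluster
    where a competitor has content and the brand has none."""
    brand = {c for owner, c in assignments.values() if owner == "brand"}
    others = {c for owner, c in assignments.values() if owner != "brand"}
    return sorted(others - brand)

chunks = chunk_windows(["tok"] * 1100, window=512)
print([len(c) for c in chunks])  # → [512, 512, 76]

assignments = {
    "b-1": ("brand", 0),
    "c-1": ("competitor", 0),  # shared cluster: no gap
    "c-2": ("competitor", 2),  # competitor-only cluster: a gap
}
print(coverage_gaps(assignments))  # → [2]
```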

C9 · Hallucination Log
Reads fact-archaeology output (DeepSeek fact-checks against your crawled content), tracks per-brand factual persistence across providers. Are the AIs fabricating facts about you? How consistently? A brand with high hallucination rates has a trust problem that will surface in any AI-generated recommendation.

C11 · Loop Thresholds
5 feedback loops measured: citation-drives-traffic, traffic-drives-content, content-drives-indexing, indexing-drives-retrieval, retrieval-drives-citation. Velocity computed from prior audit deltas. Tells you whether your AI visibility is in a reinforcing cycle (each citation drives more) or a decaying one (each absence compounds).

C12 · Counterfactual
LightGBM sidecar model (hosted on Railway) predicts: what would your score be if you fixed this specific issue? Per-issue impact estimation. The sidecar takes the current feature vector, flips one feature, and returns the predicted delta. Mock fallback when the sidecar is unavailable. Model currently trained on n=29 audits; accuracy improves with corpus size.

C13 · Query Intent
Rule-based intent classifier weights mention rate by query type. Informational, navigational, transactional, and commercial intents measured separately. A brand that appears only in informational queries but never in transactional ones has a different optimization path than the reverse.

C14 · Perception
Lexicon-based sentiment analysis plus attribute extraction, per provider. Extracts the exact adjectives each model uses when describing your brand. Sentiment alone is insufficient; the specific attributes determine positioning. Being described as "affordable" vs. "enterprise-grade" changes which queries surface you.

C15 · Competitive Decomposition
6-component gap waterfall per competitor. Breaks the score difference into segments: tokenizer advantage, entity resolution gap, retrieval pool delta, probability differential, co-occurrence spread, knowledge-layer coverage. Tells you which specific capabilities you are losing on, not just that you are behind.
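
The waterfall is a signed decomposition whose segments sum to the overall gap. A sketch with hypothetical component scores (the component names here are shorthand for the six segments listed above):

```python
COMPONENTS = [
    "tokenizer", "entity", "retrieval",
    "probability", "cooccurrence", "knowledge_layers",
]

def gap_waterfall(brand: dict, competitor: dict):
    """Decompose a competitor's lead into signed per-capability segments.
    A positive segment means the competitor leads on that component."""
    segments = {k: competitor[k] - brand[k] for k in COMPONENTS}
    return segments, sum(segments.values())

brand      = {"tokenizer": 60, "entity": 40, "retrieval": 55,
              "probability": 50, "cooccurrence": 45, "knowledge_layers": 70}
competitor = {"tokenizer": 70, "entity": 65, "retrieval": 50,
              "probability": 60, "cooccurrence": 55, "knowledge_layers": 75}

segments, total = gap_waterfall(brand, competitor)
print(segments["entity"], total)  # → 25 55
```

Here the brand trails by 55 points overall but actually leads on retrieval (-5); the entity gap (+25) is where the loss concentrates.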

C16 · AI Overview (Oracle)
SerpAPI scrapes real Google AI Overviews for 5 category queries and compares them to the audit's predictions. This is the ground truth validation layer. Gated behind SERPAPI_KEY. When active, it measures whether the audit's model of your visibility matches what Google's AI Overview actually produces.

The data-source waterfall

Each audit has 7 data needs. Each need has an ordered stack of sources, tried in order from free to paid. The dispatcher enforces 5 gates per attempt: env-key present, paid-allowed flag, daily budget ceiling, per-request budget, and per-source quota with circuit breaker. Every attempt, success or skip, writes one row to audit_source_calls so the entire fallback chain is reconstructable.

Need          Free sources                              Paid fallbacks
entity        google-kg, wikidata, dbpedia-spotlight    (none) + generator
serp          duckduckgo-html, searxng                  apify-google-serp
page_fetch    basic-fetch, jina-reader                  firecrawl, apify-web-scraper
reddit        reddit-public-json, pullpush              apify-reddit
news          gdelt, google-news-rss, newsdata          (none)
reviews       trustpilot-html                           apify-trustpilot, apify-g2
ai_overview   (none)                                    serpapi-ai-overview (only path)
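
A sketch of the per-attempt gating logic (field names are illustrative; the real dispatcher also persists each decision to audit_source_calls):

```python
def gate_source(source: dict, env: dict, budget: dict):
    """Run the five gates in order; the first failure wins and becomes the
    logged skip reason."""
    if source["env_key"] and source["env_key"] not in env:
        return False, "missing env key"
    if source["paid"] and not budget["paid_allowed"]:
        return False, "paid sources disabled"
    if budget["spent_today"] + source["cost"] > budget["daily_ceiling"]:
        return False, "daily budget ceiling"
    if source["cost"] > budget["per_request_cap"]:
        return False, "per-request budget"
    if source["recent_failures"] >= source["breaker_threshold"]:
        return False, "circuit breaker open"
    return True, "ok"

free = {"env_key": None, "paid": False, "cost": 0.0,
        "recent_failures": 0, "breaker_threshold": 3}
paid = {"env_key": "APIFY_TOKEN", "paid": True, "cost": 0.002,
        "recent_failures": 0, "breaker_threshold": 3}
budget = {"paid_allowed": False, "spent_today": 0.0,
          "daily_ceiling": 1.0, "per_request_cap": 0.01}

print(gate_source(free, {}, budget))                    # → (True, 'ok')
print(gate_source(paid, {"APIFY_TOKEN": "x"}, budget))  # → (False, 'paid sources disabled')
```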

Scoring weights

The overall score is a weighted average. Weights sum to 1.0. Two-thirds of the score comes from the 15 mechanism capabilities. The reasoning: traditional SEO is table stakes. Every tool measures it. AI visibility is determined by signals that SEO tools do not measure, and those signals are what determine whether a model cites you. The weights reflect this.

Traditional SEO              technical (0.07) + on-page (0.07)                            0.14
Judge pipeline               content, UX, conversion, AI readiness, AIO sim (0.04 each)   0.20
15 mechanism capabilities    C1 through C15 (0.044 each)                                  0.66
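
The weighted average can be reproduced directly from the weights listed above (the dimension keys are illustrative shorthand; the grade cut-offs are a hypothetical ladder, since the audit's real thresholds are not published here):

```python
WEIGHTS = {"technical": 0.07, "on_page": 0.07}
WEIGHTS.update({k: 0.04 for k in ("content", "ux", "conversion", "ai_readiness", "aio_sim")})
WEIGHTS.update({f"C{i}": 0.044 for i in range(1, 16)})  # 15 capabilities x 0.044 = 0.66

def overall_score(dimensions: dict) -> float:
    """Weighted average over all dimension scores (each 0-100)."""
    return sum(WEIGHTS[k] * dimensions[k] for k in WEIGHTS)

def letter_grade(score: float) -> str:
    """Hypothetical grade ladder mapping 0-100 to A+ through F."""
    for cutoff, grade in [(97, "A+"), (90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return grade
    return "F"

perfect = {k: 100 for k in WEIGHTS}
print(round(sum(WEIGHTS.values()), 3))   # the weights sum to 1.0
print(letter_grade(overall_score(perfect)))
```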

Key numbers

182       signals per page
9         frontier models
15+1      mechanism capabilities
~$0.003   cost per audit
30s       to dossier
48h       cache TTL
5         LLM providers
13        data sources

The receipts

Every prompt sent to every model. Every response received. Every citation extracted. Every classifier output. The full crawl signal sheet. Desktop and mobile screenshots. The complete data-source waterfall trace (which sources were tried, which succeeded, which were skipped, and why). The full AIO simulation (both passes, expandable). Entity probes with raw API responses. Seven-source matrix cells. Token probability distributions. PMI calculations. A JSON download button for the entire audit payload.

The audit is not a black box. The Receipts tab ships with every report. You can verify every conclusion against the evidence that produced it. If you disagree with a score, you can trace it to the specific signal and the specific measurement that generated it.

Architecture

Next.js (Vercel) --> BullMQ (Upstash Redis) --> Worker (Railway) --> Supabase (Postgres + pgvector)

Stage flow: Crawl --> Classify (Gemini Flash) --> Measure (17 modules, parallel) --> Post-process (8 modules) --> Judge (10 steps, sequential) --> Score --> Store (JSONB + issues table + cache entry)

Frontend: Next.js 16, React 19, Tailwind CSS. Worker: Node.js + Playwright (Docker on Railway). Queue: BullMQ + Upstash Redis. Database: Supabase PostgreSQL with pgvector. Embeddings: MiniLM-L6-v2 (local, zero API cost). Tokenizer: js-tiktoken. Counterfactual sidecar: LightGBM on Railway.

What we do not do

We do not prompt-hack. We do not pay for placement. We do not manipulate model outputs. We do not run adversarial attacks against AI systems to inflate your visibility.

We measure the structural signals that determine your position in the probability distribution. The audit tells you where you stand and what to change. The changes are yours to make.

© 2026 ResourceAI · Bangalore · New York