The 15 Mechanism Capabilities
Reference documentation for every measurement capability in the audit pipeline. Each capability is an independent module with its own score, implementation, and cost.
The audit measures 15 distinct capabilities (labeled C1 through C16, with C10 unused). Each capability runs as an independent module during Stage 2 (Measure) or Stage 3.5 (Post-processing). Each produces a score from 0 to 100. Each has an independent feature flag in the database so it can be enabled or disabled without affecting the rest of the pipeline.
Together, the 15 capabilities account for 66% of the overall audit score (0.044 weight each). The remaining 34% comes from traditional SEO (14%) and the judge pipeline (20%).
Quick Reference
Detailed Reference
How many tokens your brand name costs in the two dominant tokenizer vocabularies (cl100k_base and o200k_base) compared to your competitors. A brand name that consumes 3 tokens is structurally disadvantaged against a 1-token competitor in every LLM context window, every completion, and every retrieval ranking calculation.
Uses js-tiktoken to encode the brand name plus up to 3 competitors and 10 common vocabulary terms in both cl100k_base and o200k_base. The score is derived from the ratio of your brand's token count to the category average. Single-token brands score near 100; brands exceeding the average token count score proportionally lower.
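The exact normalization curve is not specified; a minimal sketch, assuming a linear penalty around the category average (token counts would come from js-tiktoken in the real pipeline):

```typescript
// Sketch: score a brand's token cost against the category average.
// The curve below is an illustrative assumption; the pipeline only
// guarantees that single-token brands score near 100 and brands above
// the category average score proportionally lower.
function tokenCostScore(brandTokens: number, categoryAvgTokens: number): number {
  if (brandTokens <= 1) return 100; // single-token brands: best case
  const ratio = brandTokens / categoryAvgTokens; // >1 means costlier than average
  const raw = 100 / ratio; // linear penalty, 100 at the category average
  return Math.max(0, Math.min(100, Math.round(raw)));
}
```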
Token cost is a permanent structural factor. You cannot change how a tokenizer encodes your name, but you can understand the disadvantage and compensate in other dimensions.
Whether AI systems recognize your brand as a distinct entity with structured attributes, or whether it is just an unresolved string. This capability probes knowledge graphs, runs schema audits, and checks for Wikidata presence.
Sends 3 entity probes across 5 knowledge providers (Google Knowledge Graph, Wikidata, DBpedia Spotlight, plus two LLM-based resolvers). Each probe asks a differently phrased question about the brand. Results are cross-referenced with Wikidata for structured entity data and with the site's JSON-LD for schema completeness. The v6.2 data source waterfall cascades through the entity stack with confidence scoring per source.
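A minimal sketch of per-source confidence scoring: each provider's result carries a confidence weight, and the aggregate reflects how much of that weight resolved the entity. The weights and the aggregation rule are illustrative assumptions, not the pipeline's actual values:

```typescript
// Sketch: combine entity-resolution hits from multiple providers into a
// single confidence-weighted signal in [0, 1].
type ProviderHit = { provider: string; resolved: boolean; confidence: number };

function entityConfidence(hits: ProviderHit[]): number {
  const total = hits.reduce((s, h) => s + h.confidence, 0);
  if (total === 0) return 0; // no usable signal from any provider
  const resolvedWeight = hits
    .filter((h) => h.resolved)
    .reduce((s, h) => s + h.confidence, 0);
  return resolvedWeight / total; // share of total confidence that resolved
}
```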
Entity resolution is the gateway to AI citation. If a model does not recognize your brand as a distinct entity, it cannot accurately attribute facts, distinguish you from similarly named entities, or include you in structured comparisons.
Whether your content lands in the retrieval window when AI systems search for information in your category. Tests both your owned content and third-party sources (Wikipedia, and optionally Reddit and Hacker News in the v6.2 deep tier).
Collects owned content from the crawled site and relevant Wikipedia articles. Content is chunked, then embedded via MiniLM-L6-v2 (384 dimensions, runs locally with zero API cost). Embeddings are stored in pgvector. Five category-specific prompts (generated from the Stage 1.5 classification) are embedded and compared against the stored chunks using top-20 cosine similarity. The score reflects the proportion of top-20 results that come from owned content vs. competitor or third-party sources.
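The top-20 comparison can be sketched as follows. The real embeddings are 384-dimensional MiniLM vectors stored in pgvector; the toy vectors and in-memory sort here stand in for that:

```typescript
// Sketch: top-k cosine retrieval over stored chunk embeddings, scoring the
// share of owned content among the results.
type Chunk = { owned: boolean; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function ownedShare(query: number[], chunks: Chunk[], k = 20): number {
  const top = [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
  return top.filter((c) => c.owned).length / top.length;
}
```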
Retrieval-augmented generation (RAG) is how modern AI systems incorporate current information. If your content is not in the retrieval pool for your category, you will not be cited regardless of how well you score on other dimensions.
The raw probability that a frontier model will generate your brand name when prompted with category-relevant context. This is the most direct measurement of AI brand recall available.
Runs 5 prompt templates across 10 samples each on both OpenAI and DeepSeek, requesting logprobs. The logprobs at the brand-name token position are extracted (using multi-position scanning to avoid the whitespace-token false negative). Bootstrap confidence intervals are computed from the pooled samples for each template. The final score is a weighted average across templates, normalized to the 0-100 scale.
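The bootstrap step can be sketched as a standard percentile bootstrap over per-sample logprobs. The resample count, percentile bounds, and seeded PRNG (mulberry32, for reproducibility) are assumptions:

```typescript
// Sketch: percentile bootstrap over brand-token logprobs from one template.
function bootstrapCI(
  logprobs: number[],
  resamples = 1000,
  seed = 42,
): { lo: number; hi: number } {
  // Tiny deterministic PRNG (mulberry32) so results are reproducible.
  let s = seed;
  const rand = () => {
    s |= 0; s = (s + 0x6d2b79f5) | 0;
    let t = Math.imul(s ^ (s >>> 15), 1 | s);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
  const means: number[] = [];
  for (let r = 0; r < resamples; r++) {
    let sum = 0;
    for (let i = 0; i < logprobs.length; i++) {
      sum += logprobs[Math.floor(rand() * logprobs.length)]; // resample with replacement
    }
    means.push(sum / logprobs.length);
  }
  means.sort((a, b) => a - b);
  return {
    lo: means[Math.floor(resamples * 0.025)], // 2.5th percentile
    hi: means[Math.floor(resamples * 0.975)], // 97.5th percentile
  };
}
```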
Token probability is the purest signal of whether a model has internalized your brand. Low probability means the model is unlikely to spontaneously mention you; high probability means you are part of the model's default vocabulary for your category.
The statistical association between your brand name and category-relevant anchor terms, benchmarked against 3 competitors. Pointwise mutual information (PMI) quantifies whether your brand co-occurs with important terms more or less than chance would predict.
Computes PMI between the brand name and 5 category anchor terms using a text corpus. The same calculation runs for each of the 3 competitors. Scores are normalized relative to the competitor set. Currently uses a seed SQLite corpus; the production target is a Common Crawl slice for broader coverage.
Co-occurrence patterns in training data directly influence how models associate brands with topics. Strong PMI means the model has seen your brand mentioned alongside the right terms frequently enough to form a statistical association.
Which specific layers of each AI model's knowledge stack are aware of your brand. The seven layers are: pretraining data, retrieval augmentation, instruction tuning, RLHF alignment, system prompts, safety filters, and fine-tuning.
Probes 3 providers (OpenAI, Anthropic, DeepSeek) across all 7 knowledge layers using prompts crafted to activate each layer independently. Perplexity is marked N/A on 3 layers where its architecture does not separate them. Results are assembled into a 3x7 matrix showing presence/absence/strength per cell.
Knowing which layer knows about you tells you where to focus. If you are present in retrieval but absent from pretraining, you need more training-data exposure (publications, Wikipedia, structured data). If you are in pretraining but blocked by safety filters, that is a different problem entirely.
Whether the data aggregators that feed AI training pipelines are aware of your brand. Also measures whether AI providers actually cite your URLs in their responses.
Two components. First: DuckDuckGo HTML search against a registry of 30 known aggregators (Crunchbase, G2, Capterra, Product Hunt, etc.) to check if your brand appears. Second: URL citation extraction from all 5 provider responses (Perplexity, OpenAI, Anthropic, Gemini, DeepSeek), parsing raw JSON payloads for cited URLs. The v6.2 waterfall cascades from DuckDuckGo to SearXNG to Apify Google SERP when DuckDuckGo returns silent-empty challenge pages.
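The silent-empty cascade can be sketched as a simple first-non-empty fallback. The source ordering mirrors the text; the `SearchFn` signature is a hypothetical stand-in for the real DuckDuckGo/SearXNG/Apify clients:

```typescript
// Sketch: cascade through search sources until one returns results.
// A silent-empty challenge page surfaces as an empty result list.
type SearchFn = (query: string) => string[]; // returns result URLs

function searchWaterfall(query: string, sources: SearchFn[]): string[] {
  for (const search of sources) {
    const results = search(query);
    if (results.length > 0) return results; // first non-empty source wins
  }
  return []; // every tier returned a silent-empty page
}
```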
Aggregators are high-trust sources for AI training data. If Crunchbase, G2, and Wikipedia all describe your brand consistently, models internalize that as ground truth. Citation extraction tells you whether models are actually linking to your content when they mention you.
Content coverage gaps between your brand and competitors. Identifies specific topic clusters where competitor content exists but yours does not, creating retrieval blind spots.
Fetches content from your site and competitor sites (using the v6.2 page-fetch waterfall to handle Cloudflare/Akamai-gated pages via Jina Reader or Firecrawl). Content is chunked and embedded via MiniLM-L6-v2. K-means clustering groups chunks by topic. Gap severity is calculated as the distance between your content clusters and competitor clusters that have no matching content from your site.
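A minimal sketch of the severity calculation: the distance from a competitor topic centroid to the nearest owned centroid. Euclidean distance is an assumption here; the real module may use cosine distance in the MiniLM embedding space:

```typescript
// Sketch: gap severity as nearest-owned-centroid distance. A large value
// means the competitor owns a topic cluster with no matching owned content.
function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0));
}

function gapSeverity(competitorCentroid: number[], ownedCentroids: number[][]): number {
  if (ownedCentroids.length === 0) return Infinity; // no owned content at all
  return Math.min(...ownedCentroids.map((c) => euclidean(competitorCentroid, c)));
}
```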
Retrieval systems serve the best-matching chunk for a query. If a competitor has content covering a topic and you do not, the competitor's chunk will be served every time that topic comes up. Chunk gaps are specific and fixable: you know exactly which topics to cover.
Whether AI systems fabricate facts about your brand, and how consistently. Tracks per-brand factual persistence across multiple probes and providers.
Reads the output from the fact-archaeology measurement module, which uses DeepSeek to fact-check statements about the brand. Each fact is tracked for persistence (does the same hallucination appear across multiple providers?) and severity (is it a minor attribute error or a fundamental misidentification?). The score inversely reflects hallucination rate and severity.
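A minimal sketch of an inverse score from rate and severity. The severity weights, persistence cap, and combination formula are illustrative assumptions; the text only specifies that the score falls as hallucination rate and severity rise:

```typescript
// Sketch: penalize hallucinations by severity, amplified by how many
// providers repeat the same fabrication.
type Hallucination = { persistentProviders: number; severity: "minor" | "major" };

function hallucinationScore(factsChecked: number, hallucinations: Hallucination[]): number {
  if (factsChecked === 0) return 0; // nothing verified: no basis for a score
  const penalty = hallucinations.reduce((s, h) => {
    const sev = h.severity === "major" ? 2 : 1;              // major errors count double
    const persistence = Math.min(h.persistentProviders, 5);  // cap cross-provider boost
    return s + sev * persistence;
  }, 0);
  return Math.max(0, Math.round(100 * (1 - penalty / (factsChecked * 2))));
}
```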
Hallucinations erode trust in the AI's output about your brand. A model that fabricates your founding year, misattributes your product category, or confuses you with a competitor is actively harming your brand. Tracking hallucination patterns tells you which facts need reinforcement through structured data and authoritative sources.
Whether your AI visibility is in a reinforcing or decaying cycle. Measures velocity across 5 feedback loops by comparing the current audit to prior audit deltas.
Defines 5 feedback loops (entity recognition to citation to training data to retrieval to recommendation). For each loop, calculates a velocity metric from the delta between the current audit and the most recent prior audit for the same URL. Positive velocity means the loop is reinforcing (getting stronger); negative velocity means decay. When no prior audit exists, velocity defaults to a neutral baseline.
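The velocity calculation can be sketched directly from that description, assuming the neutral baseline is 0 (the text does not give its value):

```typescript
// Sketch: per-loop velocity from current-vs-prior audit deltas.
const LOOPS = ["entity", "citation", "trainingData", "retrieval", "recommendation"] as const;
type LoopScores = Record<(typeof LOOPS)[number], number>;

function loopVelocities(current: LoopScores, prior: LoopScores | null): LoopScores {
  const out = {} as LoopScores;
  for (const loop of LOOPS) {
    // No prior audit: neutral baseline. Otherwise positive = reinforcing,
    // negative = decaying.
    out[loop] = prior === null ? 0 : current[loop] - prior[loop];
  }
  return out;
}
```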
AI visibility compounds. A brand that gets cited more gets included in more training data, which increases future citation probability. Conversely, a brand losing citations enters a decay spiral. Loop velocity tells you whether your trajectory is positive or negative, independent of your current absolute score.
The predicted score impact of fixing each identified issue. Answers the question: if you fixed this specific problem, how much would your overall score improve?
For each issue in the audit, calls the LightGBM sidecar (deployed on Railway) with the current feature vector and a simulated fix. The model predicts the resulting score change. When the sidecar is unavailable (503), the module falls back to a heuristic estimate based on issue severity and dimension weight. The model is currently trained on n=29 audits (R-squared is negative, i.e. it still predicts worse than a mean baseline; accuracy will improve as the corpus grows beyond 100 audits).
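The 503 fallback path can be sketched as below. The severity multipliers are invented; the text specifies only that the heuristic combines issue severity with dimension weight:

```typescript
// Sketch: heuristic impact estimate used when the LightGBM sidecar is down.
type Severity = "critical" | "major" | "minor";

function heuristicImpact(severity: Severity, dimensionWeight: number): number {
  const multiplier = { critical: 100, major: 40, minor: 10 }[severity]; // assumed values
  return +(multiplier * dimensionWeight).toFixed(2); // predicted score-point gain
}
```

With the 0.044 per-capability weight from the intro, a critical issue in one capability would estimate at 4.4 points under these assumed multipliers.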
Not all issues are equal. Fixing a critical entity resolution gap might add 8 points; fixing a minor schema attribute might add 0.3 points. Counterfactual predictions let you prioritize by expected impact rather than by severity alone.
How your brand performs across different query intent types: informational, navigational, transactional, and commercial. Weights mention rate by intent category.
A rule-based classifier categorizes each prompt variant from the phrasing sensitivity module by intent type. Mention rates are calculated per intent category. The score weights transactional and commercial intent more heavily than informational (because these drive revenue), while still penalizing informational absence (because it indicates weak brand awareness in the model).
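A minimal sketch of the weighting. The specific weights are illustrative assumptions; the text only fixes their ordering (transactional and commercial above informational, with informational still non-zero):

```typescript
// Sketch: intent-weighted mention score from per-intent mention rates (0..1).
type Intent = "informational" | "navigational" | "transactional" | "commercial";

const INTENT_WEIGHTS: Record<Intent, number> = {
  informational: 0.15, // still penalizes absence (weak brand awareness)
  navigational: 0.15,
  transactional: 0.35, // revenue-driving intents weigh more
  commercial: 0.35,
};

function intentScore(mentionRates: Record<Intent, number>): number {
  let score = 0;
  for (const intent of Object.keys(INTENT_WEIGHTS) as Intent[]) {
    score += INTENT_WEIGHTS[intent] * mentionRates[intent];
  }
  return Math.round(score * 100); // 0..100
}
```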
A brand might be well-known for informational queries ("what is X?") but invisible for commercial queries ("best X for Y"). Intent-weighted measurement reveals whether the model associates you with the queries that actually drive business outcomes.
The exact sentiment and attributes each AI provider associates with your brand. Extracts the adjectives, qualities, and framing that models use when describing you.
Runs lexicon-based sentiment analysis on each provider's raw text output about the brand. Extracts attribute terms (adjectives, descriptive phrases) per provider. Results are compared across providers to identify consensus attributes (mentioned by 3+ providers) vs. outliers. The score reflects sentiment positivity, attribute consistency, and alignment with the brand's stated positioning.
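The consensus-vs-outlier split can be sketched as a cross-provider count. The 3-provider threshold comes from the text; the lowercase normalization is a simplification of real attribute matching:

```typescript
// Sketch: attributes mentioned by >= threshold providers are consensus;
// the rest are outliers.
function consensusAttributes(
  byProvider: Record<string, string[]>,
  threshold = 3,
): { consensus: string[]; outliers: string[] } {
  const counts = new Map<string, number>();
  for (const attrs of Object.values(byProvider)) {
    // De-duplicate within a provider so one provider can't vote twice.
    for (const attr of new Set(attrs.map((a) => a.toLowerCase()))) {
      counts.set(attr, (counts.get(attr) ?? 0) + 1);
    }
  }
  const consensus: string[] = [];
  const outliers: string[] = [];
  for (const [attr, n] of counts) {
    (n >= threshold ? consensus : outliers).push(attr);
  }
  return { consensus, outliers };
}
```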
Knowing that AI systems perceive your brand as "affordable but outdated" vs. "innovative but expensive" is critical for positioning. Perception analysis shows you the exact words models use, which tells you what to reinforce or correct through content and structured data.
A 6-component gap waterfall for each competitor. Breaks the total score difference into segments showing exactly which capabilities drive the gap between you and each competitor.
Takes the scores from all 15 capabilities for your brand and each of the 3 competitors identified in classification. Computes a 6-component decomposition: entity/knowledge, content/retrieval, probability/co-occurrence, structural (tokenizer, schema), behavioral (loops, intent), and external (aggregators, citations). Each component shows its contribution to the overall gap, positive or negative.
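A minimal sketch of the decomposition: per-capability deltas are summed into their parent component. The capability-to-component mapping below is abbreviated and partly assumed from the component descriptions, not taken from the pipeline:

```typescript
// Sketch: fold capability-level score deltas into the 6-component waterfall.
// Mapping is illustrative (e.g. C1 tokenizer -> structural, C7 aggregators -> external).
const COMPONENT_OF: Record<string, string> = {
  C2: "entity-knowledge", C6: "entity-knowledge",
  C3: "content-retrieval", C8: "content-retrieval",
  C4: "probability-cooccurrence", C5: "probability-cooccurrence",
  C1: "structural",
  C9: "behavioral", C12: "behavioral",
  C7: "external",
};

function decomposeGap(
  yours: Record<string, number>,
  theirs: Record<string, number>,
): Record<string, number> {
  const gap: Record<string, number> = {};
  for (const [cap, component] of Object.entries(COMPONENT_OF)) {
    const delta = (theirs[cap] ?? 0) - (yours[cap] ?? 0); // positive = competitor leads
    gap[component] = (gap[component] ?? 0) + delta;
  }
  return gap;
}
```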
A raw score comparison tells you that a competitor scores 12 points higher. Decomposition tells you that 8 of those points come from retrieval pool advantage and 4 from better entity resolution, while you actually lead on token probability. This specificity makes the gap actionable.
Ground truth validation. Scrapes real Google AI Overviews for category-specific queries and compares them to the audit's predictions. Tests whether the audit's analysis matches what Google is actually showing users.
Uses SerpAPI to scrape Google search results for 5 category queries (derived from the Stage 1.5 classification). For each query that triggers an AI Overview, extracts the overview text, cited sources, brand mention position, and sentiment. Compares these against the audit's AIO simulation predictions. Persists raw data to the ai_overviews table. This capability is gated behind the SERPAPI_KEY environment variable; without the key, C16 auto-skips.
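The gate itself is simple; a minimal sketch of the auto-skip check (the function name is hypothetical, but the `SERPAPI_KEY` variable comes from the text):

```typescript
// Sketch: C16 runs only when a non-empty SERPAPI_KEY is present.
function shouldRunC16(env: Record<string, string | undefined>): boolean {
  return typeof env.SERPAPI_KEY === "string" && env.SERPAPI_KEY.length > 0;
}
```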
All other capabilities measure inputs to AI systems. C16 measures the output. It is the only capability that checks whether Google's AI Overview actually cites your brand, providing a direct feedback signal for the entire audit model.