The 15 Mechanism Capabilities

Reference documentation for every measurement capability in the audit pipeline. Each capability is an independent module with its own score, implementation, and cost.

The audit measures 15 distinct capabilities (labeled C1 through C16, with C10 unused). Each capability runs as an independent module during Stage 2 (Measure) or Stage 3.5 (Post-processing). Each produces a score from 0 to 100. Each has an independent feature flag in the database so it can be enabled or disabled without affecting the rest of the pipeline.

Together, the 15 capabilities account for 66% of the overall audit score (0.044 weight each). The remaining 34% comes from traditional SEO (14%) and the judge pipeline (20%).

Quick Reference

ID    Capability                 Cost
C1    Tokenizer Tax              $0.00
C2    Entity Resolution          ~$0.001
C3    Retrieval Pool             $0.00 (local embeddings)
C4    Token Probability          ~$0.001
C5    PMI Co-occurrence          $0.00 (local computation)
C6    Seven-Source Analysis      ~$0.002
C7    Aggregator Presence        $0.00 (free search APIs)
C8    Chunk Gaps                 $0.00 (local embeddings)
C9    Hallucination Log          ~$0.001 (via fact-archaeology)
C11   Loop Thresholds            $0.00
C12   Counterfactual             $0.00 (sidecar call)
C13   Query Intent               $0.00 (rule-based)
C14   Perception                 $0.00 (lexicon-based)
C15   Competitive Decomposition  $0.00
C16   AI Overview                ~$0.01 per query (SerpAPI)

Detailed Reference

C1: Tokenizer Tax ($0.00)
What it measures

How many tokens your brand name costs in the two dominant tokenizer vocabularies (cl100k_base and o200k_base) compared to your competitors. A brand name that consumes 3 tokens is structurally disadvantaged against a 1-token competitor in every LLM context window, every completion, and every retrieval ranking calculation.

How it works

Uses js-tiktoken to encode the brand name plus up to 3 competitors and 10 common vocabulary terms in both cl100k and o200k. The score is derived from the ratio of your brand's token count to the category average. Single-token brands score near 100; brands exceeding the average token count score proportionally lower.
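As a sketch of that normalization, the function below scores a brand's token count against the category average; the exact clamping and scaling formula is an illustrative assumption, not the pipeline's documented math:

```python
def tokenizer_tax_score(brand_tokens: int, competitor_tokens: list[int]) -> float:
    """Illustrative score from the ratio of the brand's token count
    to the category average. Counts at or below the average clamp
    to 100; above-average counts score proportionally lower."""
    counts = [brand_tokens] + competitor_tokens
    category_avg = sum(counts) / len(counts)
    return min(100.0, 100.0 * category_avg / brand_tokens)
```

For example, a 4-token brand name in a category where competitors encode as 1-2 tokens would score around half of a competitor sitting at the average.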

Why it matters

Token cost is a permanent structural factor. You cannot change how a tokenizer encodes your name, but you can understand the disadvantage and compensate in other dimensions.

C2: Entity Resolution (~$0.001)
What it measures

Whether AI systems recognize your brand as a distinct entity with structured attributes, or whether it is just an unresolved string. This capability probes knowledge graphs, runs schema audits, and checks for Wikidata presence.

How it works

Sends 3 entity probes across 5 knowledge providers (Google Knowledge Graph, Wikidata, DBpedia Spotlight, plus two LLM-based resolvers). Each probe asks a differently phrased question about the brand. Results are cross-referenced with Wikidata for structured entity data and with the site's JSON-LD for schema completeness. The v6.2 data source waterfall cascades through the entity stack with confidence scoring per source.
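A minimal sketch of a confidence-scored provider waterfall in that spirit; the provider tuple shape and the `match`/`id` fields are hypothetical, not the pipeline's actual schema:

```python
def resolve_entity(brand, providers, threshold=0.6):
    """Cascade through knowledge providers in priority order.

    `providers` is a list of (name, lookup_fn, source_confidence).
    Returns the first result whose combined confidence clears the
    threshold; otherwise the best result seen across all sources."""
    best = None
    for name, lookup, source_conf in providers:
        hit = lookup(brand)  # e.g. {"match": 0.8, "id": "Q42"} or None
        if hit is None:
            continue
        confidence = source_conf * hit["match"]
        candidate = {"source": name, "id": hit["id"], "confidence": confidence}
        if confidence >= threshold:
            return candidate
        if best is None or confidence > best["confidence"]:
            best = candidate
    return best
```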

Why it matters

Entity resolution is the gateway to AI citation. If a model does not recognize your brand as a distinct entity, it cannot accurately attribute facts, distinguish you from similarly named entities, or include you in structured comparisons.

C3: Retrieval Pool ($0.00, local embeddings)
What it measures

Whether your content lands in the retrieval window when AI systems search for information in your category. Tests both your owned content and third-party sources (Wikipedia, and optionally Reddit and Hacker News in v6.2 deep tier).

How it works

Collects owned content from the crawled site and relevant Wikipedia articles. Content is chunked, then embedded via MiniLM-L6-v2 (384 dimensions, runs locally with zero API cost). Embeddings are stored in pgvector. Five category-specific prompts (generated from the Stage 1.5 classification) are embedded and compared against the stored chunks using top-20 cosine similarity. The score reflects the proportion of top-20 results that come from owned content vs. competitor or third-party sources.
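The top-k owned-content share can be sketched as below; `retrieval_pool_share` is a hypothetical name, and the tiny 2-dimensional vectors stand in for the 384-dimensional MiniLM embeddings stored in pgvector:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieval_pool_share(prompt_vec, chunks, k=20):
    """chunks: list of (embedding, is_owned). Returns the share of
    the top-k most similar chunks that come from owned content."""
    ranked = sorted(chunks, key=lambda c: cosine(prompt_vec, c[0]), reverse=True)
    top = ranked[:k]
    return sum(1 for _, owned in top if owned) / len(top)
```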

Why it matters

Retrieval-augmented generation (RAG) is how modern AI systems incorporate current information. If your content is not in the retrieval pool for your category, you will not be cited regardless of how well you score on other dimensions.

C4: Token Probability (~$0.001)
What it measures

The raw probability that a frontier model will generate your brand name when prompted with category-relevant context. This is the most direct measurement of AI brand recall available.

How it works

Runs 5 prompt templates with 10 samples each on both OpenAI and DeepSeek, requesting logprobs. The logprobs at the brand-name token position are extracted (using multi-position scanning to avoid the whitespace-token false negative). Bootstrap confidence intervals are computed from the pooled samples for each template. The final score is a weighted average across templates, normalized to the 0-100 scale.
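A percentile-bootstrap interval over per-sample probabilities might look like this sketch; the resample count and fixed seed are arbitrary choices for illustration, not the pipeline's settings:

```python
import random

def bootstrap_ci(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of a
    set of per-sample brand-token probabilities."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # resample with replacement, same size as the original set
        resample = [rng.choice(samples) for _ in samples]
        means.append(sum(resample) / len(resample))
    means.sort()
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```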

Why it matters

Token probability is the purest signal of whether a model has internalized your brand. Low probability means the model is unlikely to spontaneously mention you; high probability means you are part of the model's default vocabulary for your category.

C5: PMI Co-occurrence ($0.00, local computation)
What it measures

The statistical association between your brand name and category-relevant anchor terms, benchmarked against 3 competitors. Pointwise mutual information (PMI) quantifies whether your brand co-occurs with important terms more or less than chance would predict.

How it works

Computes PMI between the brand name and 5 category anchor terms using a text corpus. The same calculation runs for each of the 3 competitors. Scores are normalized relative to the competitor set. Currently uses a seed SQLite corpus; the production target is a Common Crawl slice for broader coverage.
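The PMI formula itself is standard; here is a sketch estimated from document-level counts (document-level tallying is an assumption about how the corpus is counted):

```python
import math

def pmi(n_brand: int, n_term: int, n_both: int, n_docs: int) -> float:
    """Pointwise mutual information between a brand and an anchor term:

        PMI = log2( P(brand, term) / (P(brand) * P(term)) )

    Positive values mean the pair co-occurs more often than chance
    predicts; zero means statistical independence."""
    p_brand = n_brand / n_docs
    p_term = n_term / n_docs
    p_both = n_both / n_docs
    return math.log2(p_both / (p_brand * p_term))
```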

Why it matters

Co-occurrence patterns in training data directly influence how models associate brands with topics. Strong PMI means the model has seen your brand mentioned alongside the right terms frequently enough to form a statistical association.

C6: Seven-Source Analysis (~$0.002)
What it measures

Which specific layers of each AI model's knowledge stack are aware of your brand. The seven layers are: pretraining data, retrieval augmentation, instruction tuning, RLHF alignment, system prompts, safety filters, and fine-tuning.

How it works

Probes 3 providers (OpenAI, Anthropic, DeepSeek) across all 7 knowledge layers, using prompts crafted to activate each layer independently. Perplexity, when probed, is marked N/A on the 3 layers its architecture does not separate. Results are assembled into a 3x7 matrix showing presence, absence, or strength per cell.

Why it matters

Knowing which layer knows about you tells you where to focus. If you are present in retrieval but absent from pretraining, you need more training-data exposure (publications, Wikipedia, structured data). If you are in pretraining but blocked by safety filters, that is a different problem entirely.

C7: Aggregator Presence ($0.00, free search APIs)
What it measures

Whether the data aggregators that feed AI training pipelines are aware of your brand. Also measures whether AI providers actually cite your URLs in their responses.

How it works

Two components. First: DuckDuckGo HTML search against a registry of 30 known aggregators (Crunchbase, G2, Capterra, Product Hunt, etc.) to check if your brand appears. Second: URL citation extraction from all 5 provider responses (Perplexity, OpenAI, Anthropic, Gemini, DeepSeek), parsing raw JSON payloads for cited URLs. The v6.2 waterfall cascades from DuckDuckGo to SearXNG to Apify Google SERP when DuckDuckGo returns silent-empty challenge pages.
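The cascade logic can be sketched as below; the backend names and the empty-list convention for silent-empty challenge pages are illustrative assumptions:

```python
def search_with_waterfall(query, backends):
    """Try each search backend in priority order; fall through when
    a backend errors out or returns a silent-empty challenge page.

    `backends` is a list of (name, search_fn) pairs, e.g. DuckDuckGo,
    then SearXNG, then an Apify Google SERP actor."""
    for name, search in backends:
        try:
            results = search(query)
        except Exception:
            continue  # hard failure: try the next backend
        if results:  # silent-empty pages come back as [] and fall through
            return name, results
    return None, []
```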

Why it matters

Aggregators are high-trust sources for AI training data. If Crunchbase, G2, and Wikipedia all describe your brand consistently, models internalize that as ground truth. Citation extraction tells you whether models are actually linking to your content when they mention you.

C8: Chunk Gaps ($0.00, local embeddings)
What it measures

Content coverage gaps between your brand and competitors. Identifies specific topic clusters where competitor content exists but yours does not, creating retrieval blind spots.

How it works

Fetches content from your site and competitor sites (using the v6.2 page-fetch waterfall to handle Cloudflare/Akamai-gated pages via Jina Reader or Firecrawl). Content is chunked and embedded via MiniLM-L6-v2. K-means clustering groups chunks by topic. Gap severity is calculated as the distance between your content clusters and competitor clusters that have no matching content from your site.
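One way to express that severity measure, sketched with Euclidean distance (the pipeline may well use cosine distance on the MiniLM embeddings instead):

```python
import math

def gap_severity(competitor_centroid, owned_embeddings):
    """Severity of one competitor topic cluster: the distance from
    the cluster centroid to the nearest owned chunk. No owned
    content at all means a total blind spot."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if not owned_embeddings:
        return float("inf")
    return min(dist(competitor_centroid, e) for e in owned_embeddings)
```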

Why it matters

Retrieval systems serve the best-matching chunk for a query. If a competitor has content covering a topic and you do not, the competitor's chunk will be served every time that topic comes up. Chunk gaps are specific and fixable: you know exactly which topics to cover.

C9: Hallucination Log (~$0.001, via fact-archaeology)
What it measures

Whether AI systems fabricate facts about your brand, and how consistently. Tracks per-brand factual persistence across multiple probes and providers.

How it works

Reads the output from the fact-archaeology measurement module, which uses DeepSeek to fact-check statements about the brand. Each fact is tracked for persistence (does the same hallucination appear across multiple providers?) and severity (is it a minor attribute error or a fundamental misidentification?). The score inversely reflects hallucination rate and severity.
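An illustrative inverse-scoring rule in that shape, with made-up weighting (severity and persistence both on a 0-to-1 scale; the real penalty constants are not documented here):

```python
def hallucination_score(facts):
    """facts: list of (is_hallucinated, severity, persistence) per
    tracked fact. Starts at 100 and deducts for each hallucination
    in proportion to its severity and cross-provider persistence."""
    score = 100.0
    for hallucinated, severity, persistence in facts:
        if hallucinated:
            score -= 100.0 * severity * persistence / max(1, len(facts))
    return max(0.0, score)
```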

Why it matters

Hallucinations erode trust in the AI's output about your brand. A model that fabricates your founding year, misattributes your product category, or confuses you with a competitor is actively harming your brand. Tracking hallucination patterns tells you which facts need reinforcement through structured data and authoritative sources.

C11: Loop Thresholds ($0.00)
What it measures

Whether your AI visibility is in a reinforcing or decaying cycle. Measures velocity across 5 feedback loops by comparing the current audit to prior audit deltas.

How it works

Defines 5 feedback loops (entity recognition to citation to training data to retrieval to recommendation). For each loop, calculates a velocity metric from the delta between the current audit and the most recent prior audit for the same URL. Positive velocity means the loop is reinforcing (getting stronger); negative velocity means decay. When no prior audit exists, velocity defaults to a neutral baseline.
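A delta-based velocity calculation under those rules might be sketched as follows; the loop names and a neutral baseline of 0 are assumptions:

```python
def loop_velocities(current, prior, neutral=0.0):
    """Velocity per feedback loop: the delta between the current
    audit's loop score and the prior audit's. With no prior audit,
    every loop defaults to the neutral baseline."""
    if prior is None:
        return {loop: neutral for loop in current}
    return {loop: current[loop] - prior.get(loop, current[loop])
            for loop in current}
```

Positive values indicate a reinforcing loop, negative values decay.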

Why it matters

AI visibility compounds. A brand that gets cited more gets included in more training data, which increases future citation probability. Conversely, a brand losing citations enters a decay spiral. Loop velocity tells you whether your trajectory is positive or negative, independent of your current absolute score.

C12: Counterfactual ($0.00, sidecar call)
What it measures

The predicted score impact of fixing each identified issue. Answers the question: if you fixed this specific problem, how much would your overall score improve?

How it works

For each issue in the audit, calls the LightGBM sidecar (deployed on Railway) with the current feature vector and a simulated fix. The model predicts the resulting score change. When the sidecar is unavailable (503), falls back to a heuristic estimate based on issue severity and dimension weight. The model is currently trained on n=29 audits (R-squared is negative; accuracy will improve as the corpus grows beyond 100 audits).
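The sidecar-with-fallback logic, sketched with hypothetical severity point values (the real heuristic's constants are not documented here); `None` models an unavailable sidecar:

```python
def predict_impact(issue, sidecar_delta, dimension_weight):
    """Predicted score gain from fixing one issue. `sidecar_delta`
    is the LightGBM sidecar's predicted delta, or None when the
    sidecar is unavailable (e.g. a 503); then a severity-times-
    dimension-weight heuristic is used instead."""
    if sidecar_delta is not None:
        return sidecar_delta
    severity_points = {"critical": 10.0, "major": 5.0, "minor": 1.0}
    return severity_points[issue["severity"]] * dimension_weight
```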

Why it matters

Not all issues are equal. Fixing a critical entity resolution gap might add 8 points; fixing a minor schema attribute might add 0.3 points. Counterfactual predictions let you prioritize by expected impact rather than by severity alone.

C13: Query Intent ($0.00, rule-based)
What it measures

How your brand performs across different query intent types: informational, navigational, transactional, and commercial. Weights mention rate by intent category.

How it works

A rule-based classifier categorizes each prompt variant from the phrasing sensitivity module by intent type. Mention rates are calculated per intent category. The score weights transactional and commercial intent more heavily than informational (because these drive revenue), while still penalizing informational absence (because it indicates weak brand awareness in the model).
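Sketching the intent weighting (the weight values below are illustrative, not the pipeline's actual split):

```python
def intent_weighted_score(mention_rates):
    """mention_rates: intent -> fraction of prompts in that intent
    category that mention the brand. Transactional and commercial
    intent carry more weight than informational and navigational."""
    weights = {"informational": 0.15, "navigational": 0.15,
               "transactional": 0.35, "commercial": 0.35}
    total = sum(weights[i] * mention_rates.get(i, 0.0) for i in weights)
    return 100.0 * total
```

A brand mentioned only on informational prompts therefore caps out well below one mentioned consistently on commercial prompts.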

Why it matters

A brand might be well-known for informational queries ("what is X?") but invisible for commercial queries ("best X for Y"). Intent-weighted measurement reveals whether the model associates you with the queries that actually drive business outcomes.

C14: Perception ($0.00, lexicon-based)
What it measures

The exact sentiment and attributes each AI provider associates with your brand. Extracts the adjectives, qualities, and framing that models use when describing you.

How it works

Runs lexicon-based sentiment analysis on each provider's raw text output about the brand. Extracts attribute terms (adjectives, descriptive phrases) per provider. Results are compared across providers to identify consensus attributes (mentioned by 3+ providers) vs. outliers. The score reflects sentiment positivity, attribute consistency, and alignment with the brand's stated positioning.
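The consensus-versus-outlier split can be sketched as below (function and parameter names are illustrative):

```python
from collections import Counter

def consensus_attributes(provider_attrs, min_providers=3):
    """provider_attrs: provider -> set of attribute terms extracted
    from that provider's description of the brand. Terms named by
    at least `min_providers` providers count as consensus; the rest
    are outliers."""
    counts = Counter(term for attrs in provider_attrs.values()
                     for term in set(attrs))
    consensus = {t for t, c in counts.items() if c >= min_providers}
    outliers = {t for t, c in counts.items() if c < min_providers}
    return consensus, outliers
```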

Why it matters

Knowing that AI systems perceive your brand as "affordable but outdated" vs. "innovative but expensive" is critical for positioning. Perception analysis shows you the exact words models use, which tells you what to reinforce or correct through content and structured data.

C15: Competitive Decomposition ($0.00)
What it measures

A 6-component gap waterfall for each competitor. Breaks the total score difference into segments showing exactly which capabilities drive the gap between you and each competitor.

How it works

Takes the scores from all 15 capabilities for your brand and each of the 3 competitors identified in classification. Computes a 6-component decomposition: entity/knowledge, content/retrieval, probability/co-occurrence, structural (tokenizer, schema), behavioral (loops, intent), and external (aggregators, citations). Each component shows its contribution to the overall gap, positive or negative.
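The decomposition reduces to summing per-capability score deltas within each component group; a sketch with a hypothetical grouping:

```python
def gap_waterfall(brand_scores, competitor_scores, components):
    """components: component name -> list of capability IDs it
    covers. Each component's contribution is the summed score
    difference (competitor minus brand) over its capabilities;
    positive values are ground the brand is losing."""
    return {
        name: sum(competitor_scores[c] - brand_scores[c] for c in caps)
        for name, caps in components.items()
    }
```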

Why it matters

A raw score comparison tells you that a competitor scores 12 points higher. Decomposition tells you that 8 of those points come from retrieval pool advantage and 4 from better entity resolution, while you actually lead on token probability. This specificity makes the gap actionable.

C16: AI Overview (~$0.01 per query, SerpAPI)
What it measures

Ground truth validation. Scrapes real Google AI Overviews for category-specific queries and compares them to the audit's predictions. Tests whether the audit's analysis matches what Google is actually showing users.

How it works

Uses SerpAPI to scrape Google search results for 5 category queries (derived from the Stage 1.5 classification). For each query that triggers an AI Overview, extracts the overview text, cited sources, brand mention position, and sentiment. Compares these against the audit's AIO simulation predictions. Persists raw data to the ai_overviews table. This capability is gated behind the SERPAPI_KEY environment variable; without the key, C16 auto-skips.

Why it matters

All other capabilities measure inputs to AI systems. C16 measures the output. It is the only capability that checks whether Google's AI Overview actually cites your brand, providing a direct feedback signal for the entire audit model.

© 2026 ResourceAI · Bangalore · New York