Buyer's Guide

Decoding Your AI Visibility Report (What the Numbers Actually Mean)

If you are a dealer principal, GM, or marketing director receiving a monthly “AI visibility” PDF from a vendor, this page is the decoder. What each headline metric sounds like. What it actually measures. What question to ask before the next invoice. And the one-page report you should be asking for instead.

Quick Answer
Last updated April 2026

Most monthly "AI visibility" PDFs circulating in automotive are built from vendor-invented composite scores derived from non-deterministic LLM outputs. This is a metric-by-metric decoder for the typical report. "Brand Mention Score" is a weighted sum of brand appearances across a vendor-chosen prompt panel. "AI Citation Index" is a count with a vendor-chosen denominator. "Answer Engine Rank" is a position inside a list the vendor’s prompt happened to generate on one run. Under the branding, nearly every vendor AI visibility product belongs to one of three archetypes (prompt panel, simulated browser, citation aggregator), and each has a unique failure mode. The substitute is a six-metric first-party one-page report derived from GA4, server logs, and Google Search Console.

  • No industry-standard "AI rank" exists. If two vendor reports disagree there is no neutral arbiter.

  • Every headline metric can be moved in the vendor's preferred direction by changing the prompt panel, the model, the temperature, or the trial count.

  • The three archetypes (prompt panel, simulated browser, citation aggregator) each have their own fundamental failure mode.

  • The minimum transparency demand: prompts, model, temperature, trial count, geography, date.

  • Replace or supplement vendor PDFs with a six-metric first-party one-page report. The data you already own is the honest number.

TL;DR
  • Most monthly “AI visibility” PDFs circulating in automotive are built from vendor-invented composite scores derived from non-deterministic LLM outputs.
  • There is no industry-standard “AI rank” today. If one vendor’s report and another vendor’s report disagree, there is no neutral arbiter.
  • Every headline metric in these reports can be moved in a vendor's preferred direction by changing the prompt panel, the model version, the temperature, or the trial count.
  • Replace, or at minimum supplement, the PDF with the six-metric first-party report on AI visibility measurement.
Part 1

Anatomy of a Typical Vendor AI Report

The reports follow a predictable structure. They open with a branded “visibility score” rendered as a single number or index. They show a trend line versus the prior month. They list example prompts the AI answered. They include competitor benchmarks without disclosing how the competitive set was chosen. They close with recommendations the vendor’s own product happens to deliver.

None of that is inherently dishonest. What makes it misleading is the absence of methodology. A benchmark report from a research institution would disclose model versions, trial counts, variance, prompt selection protocol, and timing. These reports almost never do. The result is a document that looks like measurement but behaves like marketing.

Below is the decoder for the metrics you are most likely to see, and the question to ask for each.

Part 2

Metric-by-Metric Decoder

Brand Mention Score
What it sounds like

A quantified measure of how often your dealership is mentioned in AI answers, usually rendered as a single number or percentage.

What it actually measures

A weighted sum of brand appearances across a vendor-chosen prompt panel, on a vendor-chosen cadence, run through a vendor-chosen LLM. Change any of those inputs and the number moves. The units are the vendor’s, not the industry’s.

Ask the vendor

Which exact prompts were run, on which models, at which temperature, on which dates, from which IP geography? Can we reproduce the score ourselves?
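To make the sensitivity concrete, here is a minimal sketch in Python of how a weighted mention score of this kind is typically assembled. The prompts, weights, and mention counts are invented for illustration, not any vendor's actual formula; the point is that every input is a vendor choice.

```python
# Hypothetical illustration: how a vendor-style "Brand Mention Score" is assembled.
# Prompts, weights, and mention counts below are invented for the example.

panel = {
    # prompt: (weight chosen by vendor, times brand appeared, number of runs)
    "best honda dealer near me":   (3.0, 4, 10),
    "most reliable honda service": (2.0, 1, 10),
    "where to buy a used cr-v":    (1.0, 0, 10),
}

def brand_mention_score(panel):
    weighted = sum(w * (hits / runs) for w, hits, runs in panel.values())
    total_weight = sum(w for w, _, _ in panel.values())
    return round(100 * weighted / total_weight, 1)

print(brand_mention_score(panel))  # one number, shaped entirely by the panel and weights

# Re-weight a single prompt and the "score" moves,
# even though nothing about the dealership changed.
panel["best honda dealer near me"] = (1.0, 4, 10)
print(brand_mention_score(panel))
```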

AI Citation Index
What it sounds like

How frequently AI tools cite your website as a source, normalized to a comparable index.

What it actually measures

Usually a count of how many times the vendor’s prompt panel produced a response that either linked to your domain or (worse) mentioned your brand name in any form. The index denominator is whatever the vendor says it is.

Ask the vendor

What counts as a citation? A linked citation, a brand mention in the answer text, or both? What is the denominator and why?

Answer Engine Rank
What it sounds like

Your rank in AI answers, by analogy with Google ranking.

What it actually measures

There is no such thing as a stable AI rank. See the primer on non-determinism. The number is usually a position within a list that the vendor’s prompt happened to generate on one run, and the list reshuffles on the next run.

Ask the vendor

How many trials were run per prompt? What is the variance? If the answer is one trial, the number is noise, not signal.
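A quick way to test the vendor's answer is to run one of their prompts yourself several times and look at the spread. The sketch below assumes the OpenAI Python client and an API key; the prompt, model choice, and dealership name are placeholders, and any LLM API would show the same instability.

```python
# Minimal sketch: run one prompt N times and measure how often a brand shows up.
# Assumes the OpenAI Python client and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
PROMPT = "What are the best Honda dealers in Columbus, Ohio?"  # placeholder prompt
BRAND = "Example Honda of Columbus"                            # placeholder dealership
TRIALS = 10

mentioned = 0
for _ in range(TRIALS):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # model choice changes the result
        temperature=1.0,       # so does temperature
        messages=[{"role": "user", "content": PROMPT}],
    )
    answer = resp.choices[0].message.content or ""
    mentioned += BRAND.lower() in answer.lower()

print(f"Mentioned in {mentioned}/{TRIALS} runs ({mentioned / TRIALS:.0%}).")
# A single run reports either 0% or 100%. Only repeated trials expose the
# variance that a one-run "rank" hides.
```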

LLM Panel Results
What it sounds like

Results from a curated panel of AI engines run in a lab setting.

What it actually measures

Often a headless, incognito session run by a server that does not resemble any real shopper’s context. No personalization, no memory, no follow-up, no prior search history. The least representative query possible.

Ask the vendor

How does this map to a real shopper in our market? What is the evidence that the “panel” predicts customer behavior?

Share of Voice in AI
What it sounds like

The share of AI-generated answers that feature you versus competitors.

What it actually measures

A ratio of vendor-panel mentions (you) to vendor-panel mentions (competitors the vendor chose to include). Competitor selection changes the number before anything else does.

Ask the vendor

Who is the competitive set? Why those brands? What happens to our score if we swap one competitor out?
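A back-of-the-envelope sketch with invented mention counts shows how much the competitive set alone controls the ratio:

```python
# Invented mention counts from a hypothetical prompt panel.
mentions = {"You": 12, "Competitor A": 20, "Competitor B": 8, "Competitor C": 2}

def share_of_voice(mentions, included):
    total = sum(mentions[name] for name in included)
    return round(100 * mentions["You"] / total, 1)

# Same underlying data, different competitive set, different "share of voice".
print(share_of_voice(mentions, ["You", "Competitor A", "Competitor B"]))  # 30.0
print(share_of_voice(mentions, ["You", "Competitor B", "Competitor C"]))  # 54.5
```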

AI Sentiment Score
What it sounds like

Whether AI answers talk about you positively or negatively.

What it actually measures

LLM outputs graded by another LLM for sentiment. A stacked non-deterministic pipeline where both layers drift. Sentiment scoring at the sentence level is also a known weak spot for LLMs.

Ask the vendor

Is the sentiment classifier itself audited? How does it score a human-written benchmark set? What is the inter-run agreement?

Visibility Trend (index over time)
What it sounds like

A time-series showing your AI visibility rising or falling.

What it actually measures

A time-series of the same unstable measurement. If the underlying metric has high per-trial variance and model versions change under the vendor’s feet, the trend line is mostly noise with a smoothing filter applied.

Ask the vendor

Was the prompt panel held constant? Were models held constant? How do you attribute trend movement to our actions versus vendor drift?
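A small simulation makes the point: twelve months of a purely random "score" with no underlying trend, passed through a three-month moving average, will still draw what looks like momentum. Pure Python, invented numbers:

```python
import random

random.seed(7)
# Twelve months of a "visibility score" that is pure noise around 50 -- no real trend.
monthly = [random.gauss(50, 15) for _ in range(12)]

# Three-month moving average, the kind of smoothing a trend chart applies.
smoothed = [sum(monthly[i - 2 : i + 1]) / 3 for i in range(2, len(monthly))]

print([round(m, 1) for m in monthly])
print([round(s, 1) for s in smoothed])
# The smoothed series shows apparent "momentum" up or down across several
# months, even though the underlying process has no trend at all.
```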

Part 3

The Three Report Archetypes

Under the branding, almost every vendor AI visibility product belongs to one of three archetypes. Each has a fundamentally different measurement approach and its own unique failure mode.

Prompt Panel
Mechanics

Vendor runs a fixed list of prompts (“best Honda dealer in [city]”, “most reliable [brand] service”, etc.) through one or more LLM APIs, then aggregates mentions of your brand.

Why it fails

The prompt list is the methodology. A different prompt list produces a different score. There is no industry-standard panel, so cross-vendor comparison is impossible and year-over-year continuity depends on the vendor never adjusting the list.

Simulated Browser
Mechanics

Vendor scripts a headless browser to visit chatgpt.com, perplexity.ai, gemini.google.com, etc., run prompts in the UI, and parse the resulting answer for citations or mentions.

Why it fails

Headless runs explicitly strip out the conditions that shape real answers: account memory, personalization, browser fingerprint, geography, time of day. The answers are the ones a robot gets from a data-center IP in whatever region the vendor runs from, not the ones your customers get.

Citation Aggregator
Mechanics

Vendor scrapes AI-generated pages (Perplexity pages, Bing/Copilot answer cards, Brave summaries) across the public web and counts how often your domain or brand appears in citation footnotes.

Why it fails

The most honest of the three archetypes, but still flawed: the corpus is whatever the vendor managed to scrape, deduplication is fragile, and the denominator is rarely disclosed. It also captures only linked citations, not unlinked brand mentions in the answer body.

Part 4

The One-Page Report to Request Instead

Every metric below is reproducible from first-party data you already own or can readily access. No LLM API calls, no prompt panel, no proprietary index. If your vendor cannot produce this page, the measurement playbook shows how to build it yourself in an afternoon.

AI referrer sessions by source (GA4)

Total sessions from chatgpt.com, perplexity.ai, copilot.microsoft.com, gemini.google.com, claude.ai, you.com. Current month, month over month, and trailing twelve months.
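If your analytics team wants to pull this directly rather than through the GA4 interface, a sketch along these lines works. It assumes Google's GA4 Data API Python client (google-analytics-data) and application default credentials; the property ID is a placeholder.

```python
# Sketch: sessions by source from the GA4 Data API, filtered to AI referrers.
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

AI_SOURCES = {
    "chatgpt.com", "perplexity.ai", "copilot.microsoft.com",
    "gemini.google.com", "claude.ai", "you.com",
}

client = BetaAnalyticsDataClient()
request = RunReportRequest(
    property="properties/123456789",                 # placeholder GA4 property ID
    dimensions=[Dimension(name="sessionSource")],
    metrics=[Metric(name="sessions")],
    date_ranges=[DateRange(start_date="30daysAgo", end_date="today")],
)
response = client.run_report(request)

for row in response.rows:
    source = row.dimension_values[0].value
    if source in AI_SOURCES:
        print(source, row.metric_values[0].value)
```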

Top AI-referred landing pages

Which pages are earning the referral traffic. This is what AI systems are actually citing you for.

AI crawler hit counts (server logs)

Monthly total fetches from GPTBot, OAI-SearchBot, ChatGPT-User, PerplexityBot, Perplexity-User, ClaudeBot. Raw counts, not ranks.
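Counting crawler hits is a simple pass over your raw access logs. A standard-library sketch, assuming the user-agent strings appear verbatim in each line (true for common Apache/Nginx combined log formats); the log path is a placeholder:

```python
# Sketch: AI crawler hit counts from a web server access log.
from collections import Counter

AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "PerplexityBot", "Perplexity-User", "ClaudeBot",
]

counts = Counter()
with open("/var/log/nginx/access.log") as log:   # placeholder path
    for line in log:
        for bot in AI_CRAWLERS:
            if bot in line:
                counts[bot] += 1
                break

for bot in AI_CRAWLERS:
    print(f"{bot}: {counts[bot]}")
```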

Branded search impressions (Google Search Console)

GSC impressions and clicks for queries containing your brand name variants, trend over prior period.
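Branded impressions and clicks can be pulled from the Search Console API and filtered against your brand-name variants. A sketch using Google's API client; the site URL, dates, and brand variants are placeholders, and authentication is omitted for brevity.

```python
# Sketch: branded-query impressions and clicks from the Search Console API.
from googleapiclient.discovery import build

# In practice: build("searchconsole", "v1", credentials=creds)
service = build("searchconsole", "v1")
BRAND_VARIANTS = ("example honda", "example-honda")   # placeholder brand variants

response = service.searchanalytics().query(
    siteUrl="https://www.example-dealer.com/",        # placeholder property
    body={
        "startDate": "2026-03-01",
        "endDate": "2026-03-31",
        "dimensions": ["query"],
        "rowLimit": 25000,
    },
).execute()

impressions = clicks = 0
for row in response.get("rows", []):
    query = row["keys"][0]
    if any(v in query for v in BRAND_VARIANTS):
        impressions += row["impressions"]
        clicks += row["clicks"]

print(f"Branded impressions: {impressions}, branded clicks: {clicks}")
```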

Structured data coverage

What percentage of VDPs, model pages, service pages, and core content has valid JSON-LD. First-party number, verified against Rich Results Test or GSC Enhancements.
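Coverage can be spot-checked with a small script that looks for JSON-LD blocks on a sample of pages. A sketch using the requests library; the URL list is a placeholder, and this checks presence only, not validity.

```python
# Sketch: what share of a sample of pages carries at least one JSON-LD block.
# Feed it your VDPs, model pages, and service pages (e.g. from your sitemap).
import requests

PAGES = [
    "https://www.example-dealer.com/new-inventory/honda-cr-v/",  # placeholders
    "https://www.example-dealer.com/service/",
]

with_jsonld = 0
for url in PAGES:
    html = requests.get(url, timeout=10).text
    if 'application/ld+json' in html:
        with_jsonld += 1

print(f"{with_jsonld}/{len(PAGES)} sampled pages have JSON-LD "
      f"({100 * with_jsonld / len(PAGES):.0f}%)")
# Validate the markup itself with the Rich Results Test or GSC Enhancements,
# as noted above.
```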

Content shipped and indexed

Count of new resource, model, or service pages published in the month, and how many are indexed in Google. Ties outputs to the measurement.

Vendor Report vs First-Party Report

| Dimension | Typical vendor AI report | First-party one-page report |
| --- | --- | --- |
| Data source | LLM API outputs, scraped AI pages | Your GA4, your server logs, your GSC |
| Reproducibility | Non-deterministic; different per run | Same data every time; auditable |
| Methodology disclosed | Rarely, sometimes partially | Fully, by definition |
| Cross-vendor comparable | No | Yes (industry-standard analytics) |
| Ties to business outcomes | No documented evidence | Leads, VDP views, service traffic |
| Survives vendor change | No; score dies with the contract | Yes; you own the data |
FAQ

Questions Dealers Ask Before Confronting a Vendor

Are you saying every vendor AI visibility report is worthless?

No. A small number of vendors are honest about methodology, disclose prompt panels, report variance, and avoid invented metrics. Those reports are still proxies, but they are defensible proxies. The problem is that the majority of PDFs circulating in the automotive space assemble invented composite scores from non-deterministic outputs and present them as if they were rankings. This page is a decoder for those reports.

Is there any legitimate use for a prompt-panel-based score?

Yes, as a directional supplement, not a primary metric. A prompt panel can be useful when the panel is fixed for 12+ months, variance is reported, model versions are disclosed, and the panel is paired with the first-party signals in the GA4 and server-log playbook. On its own, with none of those safeguards, it is a number shaped primarily by the vendor’s methodology choices.

How do I introduce the substitute report to a vendor without blowing up the relationship?

Frame it as wanting parity, not replacement. Ask the vendor to deliver their existing report plus the six first-party metrics on one page in the same cadence. Within 60 to 90 days it becomes obvious which report moves with your investment and which moves with vendor methodology. Many vendor account managers will welcome the first-party data because it gives them something defensible to show when the vendor-generated score dips.

What if the vendor refuses to disclose methodology?

That is itself a finding. Reproducibility is the minimum bar for a metric anyone pays for. A vendor who will not disclose prompt panel, model, temperature, trial count, or geography is selling a narrative, not a measurement. See the four transparency questions to present on record.

Do any of these scores correlate with real business outcomes?

We have not seen evidence that vendor-invented AI composite scores correlate with leads, sales, or service revenue. First-party signals (AI referrer traffic, crawler logs, branded search) do trend with content and schema investment, and those are the signals a dealer can act on. If a vendor has rigorous evidence of correlation between their score and dealer revenue, ask for the methodology, the sample size, and the confidence interval. The silence tends to be instructive.

Don't Wait

Build Before You Need To

The teams gaining ground aren't reacting faster. They're building a content system that works for them even when they're not working on it.

That advantage grows every month.

Start Free

We Rise Together.