Most AI search teams built their reporting around a set of metrics that made sense through 2025. Referral traffic from LLM platforms was growing. Citation counts were climbing. Visibility scores were going in the right direction. Leadership understood the story because the numbers were clean and directional.
That story is getting harder to sustain. The reason has less to do with what your team is executing and more to do with the fact that the behavior these metrics were designed to measure is changing faster than the measurement frameworks can keep up.
The Referral Traffic Number Was Useful Until It Wasn’t
LLM-referred sessions peaked around July 2025 at roughly 498K total sessions. Since then, they’ve declined about 25%.
If your reporting is anchored to AI referral traffic as a primary KPI, that trend puts you in a difficult position. The number is going down, but that doesn’t necessarily mean AI’s role in how your buyers make decisions went down with it.
There are several possible explanations for the decline, and we don’t have the data to confirm which one is doing the most work.
- AI models may be including fewer outbound links than they did six months ago.
- The links that do get shared may be opening in embedded browsers or internal navigation that analytics platforms don’t pick up as referral traffic.
- The way people use AI may have shifted away from research-oriented queries that produce clicks and toward more operational use cases where the interaction stays inside the AI interface entirely.
The industry-level data makes the pattern more interesting. Between November 2025 and March 2026, the verticals that had been leading in AI traffic penetration saw the sharpest declines.
Finance went from 1.43% to 0.85%. Legal fell from 1.34% to 0.83%. Health dropped from 0.49% to 0.19%.
At the same time, SMB grew from 0.4% to 0.76%, and SaaS nearly quadrupled from 0.14% to 0.54%.
Finance, legal, and health are categories where AI queries tend to be consultative and research-heavy. Those are the kinds of questions that, quite honestly, the LLM might be able to answer on its own, without the user ever clicking through.
SMB and SaaS are more operationally oriented. People in those categories are more likely to be using AI to get work done than to browse for solutions.
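To put the vertical figures above on a common footing, here is a minimal Python sketch that converts each before-and-after share into a relative change. The dictionary layout and the `relative_change` helper are our own illustration, not the output of any analytics tool; the numbers are the ones quoted above.

```python
# AI referral share of sessions, in percent, as quoted above
# (November 2025 -> March 2026).
ai_share = {
    "Finance": (1.43, 0.85),
    "Legal": (1.34, 0.83),
    "Health": (0.49, 0.19),
    "SMB": (0.40, 0.76),
    "SaaS": (0.14, 0.54),
}


def relative_change(before: float, after: float) -> float:
    """Percent change relative to the starting value."""
    return (after - before) / before * 100


# Relative change per vertical, rounded to one decimal place.
changes = {
    vertical: round(relative_change(before, after), 1)
    for vertical, (before, after) in ai_share.items()
}

# Print from the sharpest decline to the strongest growth.
for vertical, pct in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{vertical:<8} {pct:+.1f}%")
```

Read this way, health lost roughly 61% of its AI referral share and finance roughly 41%, while SaaS grew by roughly 286%. A single blended number would average those movements away.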
If your team is reporting AI referral traffic as a flat number without looking at how the behavior behind it is segmenting, you’re probably missing the more useful story sitting underneath.
Visibility Scores Are Helpful, but They’re Being Treated Like Something They’re Not
The AI visibility tracking category has grown fast. There are now several tools available that let you track how often your brand shows up across AI platforms.
They’re filling a gap that genuinely needed to be filled, and they surface data that didn’t exist two years ago.
The issue is how those outputs get treated once they reach an executive audience.
AI visibility scores work by sending prompts to AI models, recording whether your brand appears in the response, and calculating a score from those results. That score is modeled from controlled testing.
No AI platform shares what real users are actually asking, which means no third-party tool has access to actual query volumes or real user sessions. The score is an estimate, and it’s built from the tool’s own prompt set and methodology.
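As a rough illustration of that mechanism, the Python sketch below computes a mention-rate style score from a set of recorded responses. It is an assumption about the general shape of the calculation, not any vendor's actual methodology; the function names, the brand "Acme CRM", and the sample responses are all hypothetical.

```python
import re


def mentions_brand(response: str, brand: str) -> bool:
    """True if the brand name appears as a whole phrase, case-insensitively."""
    pattern = rf"\b{re.escape(brand)}\b"
    return re.search(pattern, response, re.IGNORECASE) is not None


def visibility_score(responses: list[str], brand: str) -> float:
    """Percent of recorded responses that mention the brand."""
    if not responses:
        raise ValueError("need at least one recorded response")
    hits = sum(mentions_brand(r, brand) for r in responses)
    return 100 * hits / len(responses)


# Hypothetical responses collected from a tool's own prompt set.
sample = [
    "Popular options include Acme CRM and two competitors.",
    "Most teams start with a spreadsheet.",
    "Acme CRM is often recommended for small teams.",
    "Consider OtherCo for enterprise use.",
]

score = visibility_score(sample, "Acme CRM")
print(f"Visibility score: {score:.0f}%")
```

The denominator here is the tool's prompt set, not real user queries. Swap in a different set of prompts and the same brand, with the same content, can land on a different score.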
When that score shows up in a quarterly report, leadership tends to interpret it the same way they’d interpret an improvement in organic traffic or pipeline velocity. But the data behind it is fundamentally different.
A score could shift because the tool changed its prompt set, because the model updated its retrieval sources, or because your content genuinely improved. It’s often hard to tell which of those things is actually driving the change.
Rand Fishkin’s research at SparkToro highlighted this from a different angle.
They ran nearly 3,000 prompts across AI platforms asking for brand recommendations. Fewer than 1 in 100 runs produced the same list of brands. Fewer than 1 in 1,000 produced the same list in the same order.
The outputs are probabilistic. There’s no fixed position in AI the way there’s a fixed ranking on a search results page. There’s a probability that your brand appears in any given response, and that probability moves every time.
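One way to report an appearance rate honestly is to attach an interval to it. The Python sketch below uses the Wilson score interval, a standard method for binomial proportions; that choice is ours and is not something the SparkToro research prescribes. The counts (35 appearances in 50 runs) are hypothetical, and the interval assumes the runs are independent repeats of the same prompt.

```python
import math


def wilson_interval(hits: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an appearance rate (z=1.96 is about 95%)."""
    if runs <= 0 or not 0 <= hits <= runs:
        raise ValueError("need runs > 0 and 0 <= hits <= runs")
    p = hits / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical: the brand appeared in 35 of 50 runs of the same prompt.
low, high = wilson_interval(hits=35, runs=50)
print(f"Observed 70%, plausible range {low:.0%} to {high:.0%}")
```

With 50 runs, an observed 70% is compatible with a true appearance rate anywhere from roughly 56% to 81%, which is why a few points of movement between reports rarely means much on its own.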
That doesn’t make tracking it pointless. It means the tracking works best when it’s used as competitive intelligence and directional signal rather than a performance metric you hold a team accountable to in the same way you’d hold them to pipeline numbers.
The More Useful Metric Might Be Accuracy and Not Frequency
Across our own client work, the most consistently useful signal hasn’t been how often a brand shows up in AI responses. It has been whether what AI says about that brand is correct.
A brand that appears in 70% of relevant AI responses but gets described with outdated pricing, inaccurate feature comparisons, or framing pulled from a competitor’s content is in a worse position than a brand that appears less often but gets described accurately.
Most current dashboards are built to track frequency: mentions, citations, visibility scores. Very few are set up to evaluate whether the information in those mentions is actually right.
The teams we’ve seen get the most value from their AI measurement are the ones running qualitative audits alongside their quantitative tracking.
They open ChatGPT, Claude, and Perplexity. They run the prompts their buyers actually run. They read the full responses and check whether the model describes their product the way they’d describe it themselves. They look at what sources the model is pulling from and whether any of those sources are outdated, inaccurate, or owned by a competitor.
That kind of audit doesn’t produce a single number for a slide. But it consistently produces insights that change what teams prioritize, and those priorities tend to have a more direct connection to outcomes than a visibility score moving a few points in either direction.
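If you want audit findings to stay comparable from one quarter to the next, a lightweight log can help. The Python sketch below is one possible structure, not a standard: the `AuditEntry` fields, the `summarize` helper, and the sample entries are all hypothetical. It keeps the written findings alongside two separate rates, so frequency and accuracy are never collapsed into one score.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuditEntry:
    platform: str                    # e.g. "ChatGPT", "Claude", "Perplexity"
    prompt: str                      # a prompt your buyers actually run
    mentioned: bool                  # did the brand appear at all?
    accurate: Optional[bool] = None  # reviewer's judgment; None if not mentioned
    issues: list[str] = field(default_factory=list)  # written findings


def summarize(entries: list[AuditEntry]) -> dict:
    """Report frequency and accuracy separately, plus the distinct issues."""
    mentioned = [e for e in entries if e.mentioned]
    accurate = [e for e in mentioned if e.accurate]
    return {
        "mention_rate": len(mentioned) / len(entries) if entries else 0.0,
        "accuracy_when_mentioned": (
            len(accurate) / len(mentioned) if mentioned else None
        ),
        "issues": sorted({issue for e in mentioned for issue in e.issues}),
    }


# Hypothetical audit of one buyer prompt set.
audit = [
    AuditEntry("ChatGPT", "best CRM for small teams", True, False,
               ["outdated pricing"]),
    AuditEntry("Claude", "best CRM for small teams", True, True),
    AuditEntry("Perplexity", "best CRM for small teams", True, False,
               ["competitor-owned source"]),
    AuditEntry("Perplexity", "CRM with invoicing", False),
]

summary = summarize(audit)
print(summary)
```

In this made-up example the brand shows up in three of four responses but is described correctly in only one of those three, and the issues list tells the team what to fix first.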
How to Think About Reporting Going Forward
The AI search teams that are going to have the hardest conversations with leadership are the ones locked into reporting frameworks that were designed for a behavior pattern that peaked in mid-2025.
Referral traffic is declining across most verticals, and visibility scores are directional but can’t be read with the same confidence as traditional search metrics.
All of these metrics tell you something. But the story they tell as a package is getting less complete as AI usage evolves and the behaviors that generate measurable signals shift.
The more productive approach is to be clear about what the data can and can’t show right now. Track referral traffic, but segment it by the kind of buyer behavior your category generates and put the trend in context. Use visibility tools for what they’re good at, which is surfacing competitive intelligence and identifying where your brand does and doesn’t show up. Monitor citations, but pair them with the qualitative audits that tell you whether AI is saying the right things about you.
And be honest with leadership about the measurement gap. The influence AI has on how buyers research and evaluate products is almost certainly growing. The portion of that influence that shows up in our dashboards is getting smaller as usage patterns evolve. The gap between those two things is where the real strategic risk sits, and the teams that acknowledge it openly are going to make better decisions than the ones trying to tell a clean story from metrics that can’t fully support it anymore.