Recognition ≠ Selection — The B2Ai Thesis v2.6.1 by Ernest Dwane Johnson

ToastDeck Research  ·  Working Paper  ·  v2.6.1  ·  June 2026
Recognition ≠ Selection
The Upstream Business Layer Between Humans and AI Systems
Visibility to AI is not the same as being selected by AI.
Ernest D. Johnson
Author & Researcher · ToastDeck Research

1. Executive Thesis

In one of ToastDeck's early field observations, an AI system was asked to recommend trusted providers in a service category. The system correctly identified a national platform operating in that category, described it accurately, and placed it in the right industry. Then it weakened the recommendation with a caveat. The hedge was not a hallucination. The system had interpreted the business correctly — and that very correctness softened the recommendation.

The hedge in that observation was not random. The model had sufficient evidence to recognize the provider — but insufficient evidence to recommend it without qualification. The likely causes are familiar to anyone who has diagnosed a B2Ai failure: a thin corroboration graph, inconsistent category signals across platforms, or a representation that described the business accurately but not specifically enough to justify confidence against better-attested competitors. The caveat was not a malfunction. It was the model doing exactly what it is designed to do — expressing calibrated uncertainty when the evidence does not support a clean recommendation. That is why visibility alone is not enough. A business can be known and still be passed over, not because the model cannot find it, but because the model cannot sufficiently justify it.

The urgency of the B2Ai thesis is compounded by the directional shift from conversational AI to agentic AI. In conversational interfaces, a user asks a question and the AI generates an answer — the human still clicks, calls, or decides. In agentic interfaces, the AI is given a task and executes it: book the service, route the order, find and contact the provider, complete the transaction. In that environment, the human may never evaluate the shortlist. The model selects, acts, and the business either receives the customer or does not. The B2Ai thesis is relevant now, before the conversational phase ends, because the selection conditions being established today — which entities the model knows, trusts, and can justify — are the same conditions that will govern agentic routing tomorrow. Businesses that address the recognition-to-selection gap now will be positioned for the agentic environment. Businesses that wait until agents are the norm will be correcting a deficit that has already compounded. Even before full agentic commerce matures, the same selection conditions already shape AI-generated recommendations today; the agentic shift extends the argument's urgency but is not its only proof.

Businesses are entering a new commercial environment.

For the last two decades, the dominant question was whether a business could be found by people through search, social platforms, directories, marketplaces, paid ads, and content discovery. That question still matters. But it is no longer sufficient.

AI systems are becoming commercial intermediaries. They do not merely retrieve links; they interpret a business, weigh it against competitors, summarize its reputation, and decide whether to recommend it. Sometimes they hedge that recommendation. Sometimes they leave the business out of the answer altogether.

The new question is not only: Can customers find us?
The new question is: Can AI systems accurately understand, represent, trust, and select us when customers ask for help?

That shift creates the need for a new commercial layer: B2Ai, or Business-to-AI. It is the emerging upstream layer where businesses must become legible, trustworthy, and selectable to AI systems before they are presented to humans, agents, or downstream decision workflows.

The central thesis is: Recognition is not selection.

The AI system may know that a business exists and still not recommend it. A business may be visible but not chosen. A company may appear in a response but be hedged, caveated, misunderstood, miscategorized, displaced by competitors, or excluded when the model must make a recommendation.

That difference between being known and being selected is the business gap B2Ai is built to study.

2. From Visibility to Selection

Traditional search created a ranked-choice environment. A user searched. A platform returned a list of links. The user evaluated options, clicked, compared, and decided. In that environment, visibility meant appearing where the user could find you.

AI systems change the shape of the decision environment. A user may ask:

  • Who is the best senior care provider near Cleveland?
  • Which insurance company is most trustworthy for small business coverage?
  • What law firm should I contact for this type of issue?
  • Which product is best for this use case?
  • Compare these providers and recommend one.
  • Find me a local company that can handle this.

The AI system does not only retrieve. It interprets the task, judges which entities are relevant, and weighs the signals it can find before selecting what the user sees.

That means businesses are now competing inside an AI-mediated selection process. The user rarely sees the full candidate environment, and almost never learns which entities were left out or why one was chosen over another. The AI system becomes a selection intermediary.

This creates a new set of business risks:

  • The AI recognizes the business but does not select it.
  • The AI selects a competitor instead.
  • The AI describes the business inaccurately.
  • The AI confuses the business with another entity.
  • The AI adds caveats that weaken trust.
  • The AI cannot justify the business clearly.
  • The AI lacks enough corroborating evidence to recommend the business confidently.
  • The AI treats the business as generic even when it has specific expertise.

The new commercial problem is not merely ranking. It is representation, trust, selection, justification, and the resolution of ambiguity.

Figure (conceptual illustration): Three rooms sit on one foundation. The AI Selection Room is the gap most businesses are not measuring.
Figure (conceptual illustration): By the time a buyer reaches your website, the decision may have already been shaped somewhere else.

3. Recognition Is Not Selection

Recognition means the AI system knows an entity exists. Selection means the AI system chooses, recommends, includes, ranks, compares, or justifies the entity in response to a user's request.

These are different outcomes.

For the purposes of this thesis, selection encompasses any AI system output that routes, recommends, ranks, or delegates to a specific external entity in response to a user's decision context. This includes: a direct recommendation in a conversational interface ("I recommend X for this"); inclusion in a short list returned to a user making a comparison decision; priority routing by an agentic system executing a task on the user's behalf; and preference assignment in an automated workflow where the model must choose between competing providers. The common property across all of these is that the model is making or shaping a consequential choice, not merely surfacing information. Different interfaces trigger selection differently — a chat assistant weights narrative justifiability, an agent weights API reliability and transactability, a comparison surface weights structured attribute matching — but the underlying condition is the same: the model is committing to an entity rather than presenting options for the human to evaluate.

A business can be recognized but not selected. A brand can be described accurately but not recommended. A company can appear in an answer and still not be trusted enough to win the recommendation.

The distinction matters because AI systems compress the field of choice. In traditional search, being on the first page may still create opportunity — the user can scan multiple options. In AI-generated answers, the system may return a single recommendation, a short list, or a summary that frames the user's entire perception.

The business that is selected receives disproportionate attention. The business that is omitted may never enter the decision.

This is the core B2Ai problem: the commercial value does not come from recognition alone. It comes from being selected under model constraints. ToastDeck's research focuses on that gap.

3.1 Formal Grounding: Recognition ≠ Selection

Recognition ≠ Selection is not just a slogan. It maps onto the candidate-generation-versus-ranking architecture of large-scale recommender systems — an architecture that applies directly to AI answer engines and business selection.

Covington, Adams, and Sargin's 2016 YouTube recommendations paper frames its system as what the authors call the classic two-stage information retrieval dichotomy: a candidate generation model followed by a separate ranking model.1

B2Ai applies that architecture to business selection, but decomposes the admission problem more finely. In recommender systems, candidate generation often bundles together retrieval, availability, and candidate admission into one stage. In B2Ai, Layer 0 handles the machine path to the entity: can the system reach it, parse it, and find it? Recognition then asks whether the system correctly identifies and resolves the entity once it is available as a candidate.

Selection corresponds to the ranking problem: among available and recognized candidates, what gets surfaced, recommended, routed to, or chosen under constraint?

Key implication: Downstream ranking cannot surface an entity that never entered the candidate environment. Many businesses interpret AI invisibility as a ranking problem when it may be an admission problem. If the entity was never reached, parsed, discovered, or correctly recognized, optimization at the ranking layer cannot fix the failure. This is why Layer 0 matters.
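The admission-before-ranking point can be sketched as a toy two-stage pipeline. This is an illustration only, not a description of how any AI system works internally: the `Entity` fields, the `admit` and `rank` functions, and the scores are all hypothetical, and the sketch ignores the case where a system still knows an entity through third-party sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    reachable: bool    # 0A: the system may retrieve the surface
    parseable: bool    # 0B: meaning arrives without running code
    discovered: bool   # 0C: the surface is found as a candidate
    score: float       # hypothetical ranking-stage score


def admit(entities):
    """Stage 1: only entities passing all three Layer 0 conditions become candidates."""
    return [e for e in entities if e.reachable and e.parseable and e.discovered]


def rank(candidates, k=1):
    """Stage 2: order the admitted candidates by score and keep the top k."""
    return sorted(candidates, key=lambda e: e.score, reverse=True)[:k]


entities = [
    # Higher score, but its content is not parseable, so it is never admitted.
    Entity("LargeCo", reachable=True, parseable=False, discovered=True, score=0.95),
    Entity("SmallCo", reachable=True, parseable=True, discovered=True, score=0.60),
]

selected = rank(admit(entities))
print([e.name for e in selected])  # ['SmallCo']
```

However high LargeCo's score is, no change at the ranking stage can surface it, because it never entered the candidate list.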

Recommender-systems research also supports the consequence of candidate-set composition. Bower et al. study two-step recommender systems where ranking is applied only to the already-produced candidate set, showing that candidate-set imbalance shapes downstream exposure outcomes.2 The keystone for the Recognition ≠ Selection distinction is Covington; Bower supports the consequence.

The same two-stage structure is now visible from the practitioner side as well. Generative-engine-optimization practitioners increasingly describe AI answer engines as running retrieval-augmented generation in two steps — a retrieval step that pulls candidate sources from an index, followed by a generation step that synthesizes those sources and chooses which to attribute. That the recognition-vs-selection split arises independently in both the academic recommender-systems literature and applied GEO practice strengthens the claim that it is a real structural property of these systems, not a framing peculiar to this thesis.

A common objection to the B2Ai thesis is that AI systems are stochastic and constantly updated — so how can a business reliably engineer selectability? The answer is that B2Ai does not claim to guarantee selection. It claims that the conditions which make selection more probable are knowable, testable, and improvable. The goal is not to control the model. The goal is to ensure the entity's ground truth — its actual identity, category, location, credentials, and corroborating evidence — is legible, consistent, and well-sourced enough that the model can justify a recommendation when the query warrants one. A business that is ambiguous, inconsistently represented, or poorly corroborated gives the model a reason to hedge or displace. A business that is clear, consistent, and well-attested gives the model what it needs to select confidently. That is not gaming the system. That is meeting the system's evidentiary standard.

Figure (conceptual illustration): Recognition is not Selection. The system knows the entity exists vs. the system chooses the entity.

4. B2Ai in Context

B2Ai means Business-to-AI. It describes the commercial relationship between businesses and AI systems that interpret, represent, recommend, exclude, or transact with them. ToastDeck does not claim to have coined B2Ai; this thesis focuses on the upstream selection layer within the broader Business-to-AI commerce shift.

B2Ai is not a replacement for B2B or B2C. It is a layer upstream of both. Before a customer sees a business through an AI system, the AI system must first form a representation of that business. If that representation is weak, inconsistent, outdated, or ambiguous, the business may lose selection before the human buyer ever evaluates it.

B2Ai in relation to SEO, GEO, and AEO

SEO handles search visibility — whether a business appears in ranked results. GEO and AEO address answer-surface visibility — whether a business appears in AI-generated answers, citation surfaces, and answer engines. That work matters and remains foundational for the platforms that operate within its scope.

B2Ai studies the upstream cross-system layer: how AI systems form business representations, and how they use those representations to compare, justify, caveat, and ultimately select or exclude an entity across multiple systems at once. A business can appear in AI-generated answers — visible, cited, present — and still fail the B2Ai test.

B2Ai begins where platform-specific optimization is no longer enough: when multiple AI systems form, compare, and act on representations of an entity outside the control of any single search index.

The two-layer environment

In May 2026, Google published official guidance stating that optimization for generative AI features in Google Search remains continuous with traditional SEO.3,4 Google explicitly frames GEO and AEO as still SEO from Google Search's perspective, and names tactics site owners can ignore for Google Search: llms.txt files, chunking content, AI-specific rewriting, inauthentic mentions, and special schema for generative AI search.

That position is important, but it is bounded. Google's guidance applies to Google Search experiences. It does not resolve the broader multi-system environment in which businesses are interpreted, compared, selected, caveated, excluded, or eventually transacted with by AI systems across ChatGPT, Claude, Perplexity, Gemini, Meta AI, Grok, vertical assistants, browser agents, and emerging agentic interfaces.5

The market is therefore splitting into two related but distinct layers. Layer one is platform-specific AI search optimization — for Google Search, the answer is still SEO. Layer two is cross-system AI interpretation and selection — this is the layer B2Ai studies.

Why selection is commercially meaningful

Google has clarified that its spam policies apply to generative AI responses in Google Search, including attempts to manipulate AI-generated responses.6 That clarification confirms that AI-generated recommendations are now commercially important enough to require explicit anti-manipulation enforcement.

In 2026, Google removed AI Overviews for specific health-related queries following an external investigation into misleading medical summaries.7 The important point is not that Google removed an entire health category. It is that the platform selectively intervened when generated answers created unacceptable risk in a sensitive domain. AI selection has become commercially meaningful.

Published research reinforces the instability of the current environment. A 2023 Stanford-affiliated study of four generative search engines found that on average only 51.5% of generated sentences were fully supported by their citations and only 74.5% of citations supported their associated statements.8 A 2026 Yelp / Morning Consult survey of 2,202 U.S. adults found that while 65% had used an AI-powered search tool in the prior six months, only 15% trusted that information "a lot," 63% double-checked AI search results against other sources, and 72% said AI platforms should always show where their information comes from.9

Related work and market position

AI-visibility monitoring tools. A growing set of commercial products — among them Profound, Brandlight, and the AI-visibility features now offered within established SEO platforms such as Ahrefs, Conductor, and others — measure whether and how often a brand appears in AI-generated answers across systems. This is recognition measurement. B2Ai asks the next question: when a user requests a recommendation, is the entity selected, caveated, displaced, or excluded, and which upstream condition explains the outcome.

Reputation and review platforms. Platforms oriented around reviews and reputation increasingly report on AI-surface presence. Their signal feeds Authority Resolution (Layer 4) but is one input among several; B2Ai treats reputation as a contributor to authority, not as the selection outcome itself.

GEO/AEO consulting and content tooling. Generative- and answer-engine-optimization practice focuses on producing content and structure that improves answer-surface visibility. B2Ai is the cross-system layer above it: how representations form, compete, and resolve to selection across systems no single optimization program controls.

The boundary in one line: If a tool tells you whether you appear, it is doing recognition work. B2Ai begins at the question recognition cannot answer: among the entities that appear, why is one selected and another excluded — and what, upstream, made the difference?

Mapping GEO/AEO tactics onto the B2Ai layers

The peer-reviewed root of the field is the Princeton GEO study, which found that a small set of content modifications can raise citation visibility by roughly 30–40%, with the strongest being the addition of citations, credible quotations, and verifiable statistics.10

Where each proven tactic lives in B2Ai. Statistics, citations, and credible quotations strengthen Layer 4 (Authority Resolution). Answer-first structure, comparison tables, and FAQ formatting strengthen Layer 2 (Representation) and the discovery-citation function of 0C. Schema markup and clear entity identity serve Layer 1 (Recognition) and Layer 2. Cross-platform presence is a Layer 3 (Consistency) tactic. Recency and active updating serve Layer 7 (Monitoring/Freshness).

Why the tactics are necessary but not sufficient. Most of the tactics above primarily optimize recognition-side conditions. The B2Ai claim is that they are not sufficient: an entity can execute every GEO tactic, become highly citable, and still lose the recommendation when the model must choose under constraint — because selection (Layer 5) is governed by competitor justifiability, ordering effects, and trust calibration that citation-optimization does not address. Recognition is not selection.

5. Selection vs. Substitution

Substitution does not describe a degraded selection outcome; it describes the disappearance of the selection event itself. B2Ai studies how AI systems interpret external entities, weigh them against one another, and decide whether to recommend, caveat, exclude, or transact with them inside commercial workflows. In some cases, however, the AI system no longer needs an external entity for a specific task layer of the workflow; at that point the commercial problem changes from selection to substitution.11

Selection occurs when the AI system mediates between competing external entities. Substitution occurs when the AI system performs the task directly rather than selecting a provider to perform it. The distinction is task-specific, not category-specific: AI systems rarely replace entire businesses at once; they absorb discrete functions, outputs, or workflow layers while surrounding layers remain dependent on external parties.12

The deciding variable is accountability: if the model can itself become the accountable party for the deliverable (execution, warranty, liability, compliance, continuity), it substitutes; where it cannot, it must still select. This accountability axis explains why substitution appears first in narrow, digitally native, low-liability tasks and remains constrained in trust-sensitive, regulated, relational, or physical-world domains.

This boundary condition does not weaken the B2Ai thesis; it sharpens its scope. Where substitution is feasible, the AI becomes the deliverable; where accountability prevents substitution, the AI remains a selector, mediator, and justificatory intermediary. B2Ai is the study of that mediation wherever external entities must be interpreted, compared, trusted, and selected.

Figure (conceptual illustration): Selection Failure Map. The six failure modes: why businesses are recognized but not selected.

6. Operating Layers: The Eight-Layer Framework

ToastDeck's working model separates B2Ai into operating layers. These layers are not final doctrine. They are a research structure for studying how AI systems interpret and select entities.

B2Ai is organized across eight operating layers. Layer 0, the Machine Path Layer, contains three co-equal pre-selection conditions: Access (0A), Ingestibility (0B), and Discovery (0C).

Layer 0 — Machine Path Layer (Access, Ingestibility, Discovery)
Layer 1 — Recognition / Existence Resolution
Layer 2 — Representation
Layer 3 — Consistency
Layer 4 — Authority Resolution
Layer 5 — Selection Behavior
Layer 6 — Justification / Output Mediation
Layer 7 — Monitoring / Freshness / Re-grounding

The boundaries are drawn to be diagnostically separable: a failure isolated to Layer 0 (blocked access) has different causes, owners, and remedies than a Layer 5 selection displacement, and the value of the structure is that it tells a diagnostician which kind of failure they are looking at.
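One way to picture diagnostic separability is a lookup that reports the lowest-numbered layer observed to fail. This is a minimal sketch under assumptions: the `lowest_failing_layer` helper and the `audit` data are hypothetical, and checking layers in strict numeric order is a simplification (the framework states only that Layer 0 must be diagnosed before the higher layers).

```python
LAYERS = {
    0: "Machine Path Layer",
    1: "Recognition / Existence Resolution",
    2: "Representation",
    3: "Consistency",
    4: "Authority Resolution",
    5: "Selection Behavior",
    6: "Justification / Output Mediation",
    7: "Monitoring / Freshness / Re-grounding",
}


def lowest_failing_layer(observations):
    """Return (number, name) of the lowest-numbered failing layer, or None.

    `observations` maps a layer number to True (passed) or False (failed).
    Layers missing from the mapping are treated as not yet assessed.
    """
    for number in sorted(LAYERS):
        if observations.get(number) is False:
            return number, LAYERS[number]
    return None


# Hypothetical audit: a selection displacement was observed at Layer 5,
# but a blocked machine path at Layer 0 is reported first.
audit = {0: False, 4: True, 5: False}
print(lowest_failing_layer(audit))  # (0, 'Machine Path Layer')
```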

An important boundary on what these layers claim. The eight layers are not a claim about the internal architecture of any AI system. ToastDeck does not assert knowledge of how OpenAI, Google, Anthropic, or any provider internally retrieves, ranks, or selects entities. The layers are diagnostic layers: a model of observable failure modes and their distinct causes, owners, and remedies. They describe outcomes that can be measured from the outside, not mechanisms inferred about the inside.

A note on "resolution." Resolution is an act that occurs at several layers, not a standalone layer. Entity resolution occurs in Layer 1; authority resolution occurs in Layer 4; selection resolution occurs across Layers 5 and 6. The word remains in the framework; it is not counted as its own layer.

Figure (conceptual illustration): The B2Ai Eight-Layer Framework. Layer 5 Selection Behavior is the core layer.

Layer 0: The Machine Path Layer

The operating layers above Layer 0 begin with Recognition — whether an AI system correctly identifies an entity. Those layers carry an unstated assumption: that the entity's own surfaces — its website, its pages, its self-published evidence — can be reached, read, and found by the systems forming the representation.

That assumption does not always hold. Before recognition, representation, authority, or selection can be assessed, an entity must pass three co-equal pre-selection conditions. Layer 0 asks: when a machine seeks this entity, can it reach the surface, parse the surface, and find the surface as a candidate?

This is the trap: a human browser executes a page's scripts automatically, so a page that depends on them looks complete to its builder. But the builder is the one observer whose judgment does not govern machine selection.

Layer 0 is the only layer in the framework that is falsifiable before it is interpretive. It makes testable predictions that single observations can confirm or disconfirm. For that reason, the three Layer 0 conditions must be diagnosed before scoring the higher layers. If Layer 0 fails, the system may still know the entity through third-party sources — directories, reviews, knowledge panels, or prior training data — but the entity loses control over its own primary evidence, and any assessment based on its self-published surfaces begins from a false premise.

0A — Access

Access asks whether the requesting system is permitted to retrieve the entity's surfaces at all. Modern sites increasingly sit behind infrastructure intermediaries — content delivery networks, web application firewalls, and bot-management layers — that decide which automated clients may retrieve a page.

Failure mode: Access Exclusion — the entity's content is present and well-formed, but the AI system is denied retrieval by an infrastructure layer before the content is served.

This is no longer an edge case. On July 1, 2025, Cloudflare announced a permission-based default for AI crawler access on new domains, requiring site owners to choose whether AI crawlers may retrieve their content.13 Cloudflare's own measurements put OpenAI's crawl-to-referral ratio at roughly 1,700 to 1 and Anthropic's at roughly 73,000 to 1 as of mid-2025.14 Infrastructure providers now offer per-crawler allowlists, purpose declarations, and paid-access flows as opt-in controls.15,16

Diagnostic signature — 0A. Access Exclusion returns full content to an ordinary browser request but a refusal specifically to a request identifying as an AI crawler. It is detected only by varying the requesting identity.
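The "vary the requesting identity" check can be sketched with the Python standard library. This is a rough sketch under stated assumptions: `CRAWLER_UA` is a made-up placeholder (a real audit would use the user-agent strings each AI vendor publishes), and treating 401, 403, or 429 as a refusal is an assumption. Some bot-management layers verify crawlers by IP range or answer with a challenge page and a 200 status, which this sketch would not detect.

```python
import urllib.error
import urllib.request

BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)
# Hypothetical crawler identity, for illustration only.
CRAWLER_UA = "ExampleAIBot/1.0"

# Assumption: these status codes are read as "refused".
REFUSAL_CODES = (401, 403, 429)


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server gives to this requesting identity."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def access_exclusion_suspected(browser_status, crawler_status):
    """0A signature: the browser identity is served, the crawler identity is refused."""
    return 200 <= browser_status < 300 and crawler_status in REFUSAL_CODES
```

A usage pattern would be to call `fetch_status(url, BROWSER_UA)` and `fetch_status(url, CRAWLER_UA)` for the same URL and pass both results to `access_exclusion_suspected`. A positive result is a prompt for closer inspection of the infrastructure layer, not proof of exclusion.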

0B — Ingestibility

Ingestibility asks whether, once a surface is reached, the meaning actually arrives in a form the consuming system can parse without executing code. The operating question is prior to quality and prior to recognition: When a machine requests this entity's primary surfaces, does the meaning arrive in the response itself, or only after a program runs?

The mechanism: client-side rendering and the rendering gap. In client-side rendering, the initial response is a near-empty shell plus a script bundle; the content is assembled afterward by executing that script in a browser. A 2024 study by Vercel and MERJ found that none of the major AI crawlers it measured render JavaScript — including OpenAI's crawlers, Anthropic's ClaudeBot, Meta's external agent, ByteDance's Bytespider, and PerplexityBot.17 The failure is specific to client-side rendering: content that resolves only in the post-execution document.18

Failure mode: Scale Inversion — a larger entity may have more resources, stronger operations, better reviews, and a broader real-world footprint, yet still lose selection to a smaller competitor whose site resolves cleanly in the initial response. The smaller entity is not necessarily more authoritative — it is simply more available to the machine.

Independent practitioner testing has documented this signature in the field. In a 2025 case study, ChatGPT, Perplexity, and Claude each failed to retrieve the site's content and in several instances stated explicitly that the content could not be read because it required JavaScript.19 Affected URLs were demoted, pushed to non-primary citation surfaces and stripped of snippets.20

Figure (conceptual illustration): Layer 0 Three Conditions: 0A Access, 0B Ingestibility, 0C Discovery. Reach + Parse + Discover = Candidate Admission.
Figure (conceptual illustration): Same URL, Different Document. Human browser sees full content. AI crawler sees a shell. Before AI can recommend content, it has to retrieve it.

Diagnostic signature — 0B. Substrate Invisibility returns a 200 response with an empty or shell document to every client. Detected by reading the body: the entity's own site, viewed as raw HTML before script execution, returns little or no substantive content.
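Reading the body before script execution can be approximated by counting the words present in the raw HTML once script and style contents are excluded. The sketch below uses only the standard library. The `VisibleText` class, the helper names, and the `min_words` threshold are illustrative assumptions, not a standard measure of ingestibility.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text present in the raw HTML, ignoring script and style bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script/style elements.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(raw_html):
    """Count words a non-rendering client could read from the initial response."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


def looks_like_shell(raw_html, min_words=50):
    """min_words is an arbitrary illustrative threshold."""
    return visible_word_count(raw_html) < min_words


shell = (
    "<html><head><title>Acme</title>"
    "<script>var app = 'content is assembled later';</script></head>"
    "<body><div id='root'></div></body></html>"
)
print(visible_word_count(shell))  # 1
```

In the example, the only word available without running code is the page title; everything else would exist only after the script executes.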

0C — Discovery

Discovery asks whether the entity surfaces as a candidate at all — whether it can be found, crawled, indexed, or retrieved into the candidate environment before recognition begins. A surface can be both reachable (0A) and parseable (0B) and still never be discovered as a candidate.

Layer 0C uses discovery citations, not authority citations. A discovery citation helps a machine system find, crawl, index, or retrieve an entity. An authority citation helps justify why an entity should be trusted, cited, recommended, or selected in an answer. Same surface form, different function, different layer.

Failure mode: Discovery Absence — the surface is reachable and parseable, but it is never assembled into the candidate environment for the queries it should match, so it cannot be recognized, represented, or selected no matter how strong its higher-layer signals are.

Discovery operates through more than one pathway: maintained index, live search queries, link and citation graph traversal, and query fan-out. An entity can be discoverable along one pathway and absent from another. This is why Discovery is a condition in its own right and not a restatement of Access.

Why Layer 0 is a B2Ai concern and not merely a technical SEO note

The Machine Path Layer is not a tactic; it is the substrate on which every higher layer depends. When an entity's own surfaces are unreachable, unreadable, or undiscoverable, the entity surrenders the one input over which it has full control — its own framing, its own claims, its own specific differentiation — and cedes its representation entirely to third-party sources. This makes the Machine Path Layer an upstream cause of failure modes including generic description, Category Drift, Location Drift, and Competitor Displacement.

A necessary boundary on the evidence: the principal measurement is vendor-affiliated research.21 The methodology is nonetheless sound and the sample large — over a billion crawler fetches — and the underlying measurement is a factual observation independent of the recommendation drawn from it. The rendering gap documented in 2024 has been corroborated in independent practitioner testing through early 2026,22 but is subject to change and belongs to Monitoring (Layer 7) as much as to a one-time audit. Browser-based agents may not share this gap.23

Layer 1: Recognition / Existence Resolution

Does the AI system correctly identify what the entity is, and resolve it as the intended business, source, or organization? This is the baseline interpretive layer. Discovery (0C) means the entity can be found as a candidate; Recognition means the system correctly understands what the candidate is. If the entity is unknown or misresolved — confused with another entity, or identified as the wrong thing — it cannot be accurately represented or selected.

Layer 2: Representation

How does the AI system describe the entity? This includes name, category, location, services, products, affiliations, reputation, and role in the market. Recognition asks "what is it?" Representation asks "how is it described?" Representation errors include: wrong category, wrong location, wrong services, wrong ownership, wrong credentials, outdated information, competitor confusion, and generic or incomplete descriptions.

Layer 3: Consistency

Is the entity represented consistently across AI systems, prompts, and retrieval conditions? An entity may be described one way by ChatGPT, another way by Claude, another way by Gemini, and another way by Perplexity. Model disagreement is not noise. It is a signal of representation instability.
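Model disagreement can be turned into a simple number by comparing how several systems report one attribute. The sketch below computes the share of systems that agree with the most common value. The `agreement_rate` helper, the system labels, and the sample categories are hypothetical, and the string normalization is deliberately crude.

```python
from collections import Counter


def agreement_rate(answers):
    """Share of systems whose reported value matches the most common value."""
    normalized = [value.strip().lower() for value in answers.values()]
    if not normalized:
        raise ValueError("no answers to compare")
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


# Hypothetical category reported for the same business by four systems.
category_by_system = {
    "system_a": "Home care agency",
    "system_b": "home care agency",
    "system_c": "Staffing platform",
    "system_d": "Home care agency ",
}
print(agreement_rate(category_by_system))  # 0.75
```

A rate below 1.0 flags the attribute for review; which value is correct is a separate question that this measure does not answer.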

Layer 4: Authority Resolution

Which sources, claims, entities, or signals does the AI system treat as trustworthy when forming an answer? Authority may come from: official website clarity, schema, reviews, listings, directories, third-party mentions, press, institutional signals, and source consistency. Representation describes the entity; Authority Resolution decides what should be trusted about it relative to competing signals. Authority is not just "popularity." It is the model's ability to justify why this entity belongs in the answer.

Authority signals are not uniform in kind. For diagnostic and remediation purposes, B2Ai separates them into two classes. Verifiable signals are deterministic facts the model can confirm or falsify against structured sources: business registration, licensed credentials, physical address, service area, accreditation, years in operation, regulatory standing. These either resolve or they do not — they are treated under the deterministic Evidence Class. Subjective signals are probabilistic inputs that the model weights but cannot confirm: customer sentiment aggregated from reviews, brand tone inferred from content, perceived expertise derived from publication history, trust inferred from mention volume and source quality. These vary by source, recency, and retrieval context, and are reported under the probabilistic Evidence Class. The practical consequence is that conflicting signals — strong verifiable facts paired with weak or negative subjective signals — do not cancel each other out cleanly. A business with verified credentials and poor review sentiment may still receive a caveated recommendation, because the model's authority resolution is a weighted synthesis, not a binary pass/fail. Understanding which signal class is causing a selection failure determines the remediation path: verifiable failures require factual correction; subjective failures require sustained evidence accumulation over time.

Authority Signal Classes: B2Ai Diagnostic & Remediation Reference

| Signal Class | Type | Examples | Diagnostic Path |
| --- | --- | --- | --- |
| Verifiable | Deterministic (Pass / Fail) | Business registration, licensing, physical address, service area, accreditation, years in operation, regulatory standing | Requires factual correction or source-data update. Addressable by the entity directly. |
| Subjective | Probabilistic (Weighted) | Review sentiment, brand tone, perceived expertise, mention volume and quality, publication history | Requires sustained evidence accumulation over time. Cannot be corrected in a single action. |

See §7 Evidence Class doctrine for the deterministic/probabilistic reporting rules that govern how these signal classes are measured and reported in B2Ai diagnostics.
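The mapping from signal class to remediation path can be expressed as a small lookup. This is an illustrative sketch only: the `SignalClass`, `Finding`, and `remediation_plan` names are hypothetical, and the `healthy` flag stands in for whatever diagnostic result a real audit would produce.

```python
from dataclasses import dataclass
from enum import Enum

class SignalClass(Enum):
    VERIFIABLE = "deterministic"  # resolves pass/fail against structured sources
    SUBJECTIVE = "probabilistic"  # weighted by the model, cannot be confirmed outright

# Remediation paths as stated in the Authority Signal Classes reference.
REMEDIATION = {
    SignalClass.VERIFIABLE: "factual correction or source-data update",
    SignalClass.SUBJECTIVE: "sustained evidence accumulation over time",
}

@dataclass
class Finding:
    signal: str
    signal_class: SignalClass
    healthy: bool  # hypothetical diagnostic outcome

def remediation_plan(findings):
    """Return (signal, remediation path) for every unhealthy finding."""
    return [(f.signal, REMEDIATION[f.signal_class]) for f in findings if not f.healthy]

findings = [
    Finding("licensed credentials", SignalClass.VERIFIABLE, healthy=True),
    Finding("physical address", SignalClass.VERIFIABLE, healthy=False),
    Finding("review sentiment", SignalClass.SUBJECTIVE, healthy=False),
]
for signal, path in remediation_plan(findings):
    print(f"{signal}: {path}")
```

Keeping the class on each finding, rather than a single blended score, preserves the distinction the text draws: a verifiable failure and a subjective failure on the same entity call for different work on different timescales.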

Layer 5: Selection Behavior

Does the AI system select the entity when the user poses a recommendation, comparison, or decision-oriented question? This is the core layer; the thesis is named for it. Authority Resolution is trust evaluation; Selection is the choice made after or during that evaluation. Trusted does not always mean selected.

The mechanism of selection instability. A 2025 study by Bito, Ren, and He provides the first in-depth investigation of position bias in LLM-based recommendation, reporting that LLM-based recommendation models are highly sensitive to the order in which candidate items appear in a prompt: minor changes in the appearance order of candidates can disproportionately change which entity the model recommends.24 The instability is architectural, not incidental.

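Selection instability is the measurable variation in how AI systems recognize, compare, recommend, exclude, justify, caveat, or displace an entity across platforms, prompts, constraints, competitor sets, and time. It is distinct from stochastic variance: stochastic variance is run-level noise, whereas selection instability is a pattern-level signal that changes business meaning, trust position, or competitive framing.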

The measurement method mirrors the B2Ai diagnostic approach. Bito et al. quantify position bias by generating a ranking from a shuffled candidate list, then a second ranking from a reversed list, and measuring the similarity between the two.25 A recommendation that depends on prompt serialization rather than the entity's merits is a recommendation a competitor can dislodge.
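The shuffled-versus-reversed comparison can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: `toy_ranker` is a hypothetical stand-in for an LLM call, and pairwise order agreement is one reasonable similarity measure chosen here for simplicity; the paper's own metric may differ.

```python
import random
from itertools import combinations

def pairwise_agreement(rank_a, rank_b):
    """Fraction of item pairs ordered the same way in both rankings."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    same = sum((pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]) for x, y in pairs)
    return same / len(pairs)

def toy_ranker(candidates, merit, position_weight):
    """Hypothetical stand-in for a model: merit minus a penalty for appearing later."""
    scored = [(merit[c] - position_weight * i, c) for i, c in enumerate(candidates)]
    return [c for _, c in sorted(scored, reverse=True)]

def position_sensitivity(candidates, ranker, seed=0):
    """Rank a shuffled list and its reverse, then compare the two rankings."""
    shuffled = candidates[:]
    random.Random(seed).shuffle(shuffled)
    forward = ranker(shuffled)
    backward = ranker(shuffled[::-1])
    return pairwise_agreement(forward, backward)

merit = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6}
unbiased = position_sensitivity(list(merit), lambda c: toy_ranker(c, merit, 0.0))
biased = position_sensitivity(list(merit), lambda c: toy_ranker(c, merit, 1.0))
print(unbiased, biased)  # 1.0 0.0
```

A ranker that depends only on merit returns the same ranking for both input orders (agreement 1.0). A ranker dominated by input position returns two opposite rankings (agreement 0.0). Real systems would fall somewhere between, which is why the result is reported as a measured value and not assumed.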

Layer 6: Justification / Output Mediation

When an AI system selects or excludes an entity, how does it explain, frame, support, caveat, or mediate that decision? A weak justification can reduce trust. A strong justification can increase conversion. A caveated justification may signal unresolved trust or quality concerns.

Layer 6 is also the accounting home for a distinction that spans the whole framework: whether a given selection failure is addressable by the entity or structural to the platform — arising from model priors, platform policy, retrieval design, or commercial incentives.26,27

This layer is also where Source Boundary Failure mechanically occurs — the conflation of facts across sources during generation, in which the model attributes to one entity material drawn from an adjacent source.

Layer 7: Monitoring / Freshness / Re-grounding

Does the entity's AI representation and selection behavior remain stable, current, and evidence-grounded over time? B2Ai is not a one-time audit problem. AI systems change. Sources change. Competitors change. Model behavior changes.

Freshness is grounded in the Temporal Information Retrieval literature. The canonical survey by Campos, Dias, Jorge, and Jatowt distinguishes recency-sensitive queries from time-sensitive queries with explicit temporal constraints.28 A 2025 worked example applies a recency prior to RAG retrieval, illustrating how a freshness term can be combined with relevance at retrieval time.29 Independent evidence also indicates that the substrate of AI answers is itself temporally unstable: cited web sources can become inaccessible or change after the fact, motivating longitudinal tracking rather than one-time audits.30
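To illustrate how a freshness term can be combined with relevance at retrieval time, the sketch below blends a relevance score with an exponential-decay recency term. The linear blend, the `alpha` weight, the 30-day half-life, and the document records are all assumptions made for illustration; they are not taken from the cited worked example.

```python
def fused_score(relevance, age_days, half_life_days=30.0, alpha=0.7):
    """Blend relevance with a freshness term that halves every half_life_days.

    alpha weights relevance and (1 - alpha) weights freshness. Both defaults
    are illustrative assumptions.
    """
    freshness = 0.5 ** (age_days / half_life_days)
    return alpha * relevance + (1 - alpha) * freshness

# Hypothetical retrieved documents describing the same entity.
docs = [
    {"id": "old_listing", "relevance": 0.90, "age_days": 365},
    {"id": "current_listing", "relevance": 0.85, "age_days": 7},
]
ranked = sorted(docs, key=lambda d: fused_score(d["relevance"], d["age_days"]), reverse=True)
print([d["id"] for d in ranked])  # ['current_listing', 'old_listing']
```

Under these assumed weights, the slightly less relevant but current source outranks the stale one. For an entity, this means an outdated listing can lose its place in the evidence set even when its wording matches the query well.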

7. Diagnostic Architecture

Evidence Class: how findings are reported

Deterministic findings are reported as facts. Some conditions resolve to a single, reproducible observation that any party can confirm — whether an entity's primary surface returns substantive content in raw HTML (Layer 0B), or whether a crawler identity receives a 200 or a 403 (Layer 0A).
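Both deterministic checks named above reduce to simple, reproducible tests. The sketch below is a minimal illustration using only the standard library and no network calls: it counts visible words in raw HTML without executing scripts, and maps an observed status code to a finding. The 50-word threshold, the function names, and the sample pages are assumptions, not B2Ai-defined values.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def raw_html_word_count(html):
    """Words a non-rendering crawler could read from the initial response."""
    parser = _TextExtractor()
    parser.feed(html)
    return sum(len(chunk.split()) for chunk in parser.chunks)

def ingestible(html, min_words=50):
    """Layer 0B check; min_words is an illustrative threshold."""
    return raw_html_word_count(html) >= min_words

def classify_access(status_code):
    """Layer 0A check for one crawler identity, based on the observed status."""
    if status_code == 200:
        return "served"
    if status_code == 403:
        return "access exclusion"
    return "inconclusive"

spa_shell = '<html><body><div id="root"></div><script>renderApp()</script></body></html>'
server_rendered = "<html><body><h1>Acme Home Care</h1><p>" + "licensed caregivers " * 30 + "</p></body></html>"
print(raw_html_word_count(spa_shell), ingestible(spa_shell))  # 0 False
print(ingestible(server_rendered), classify_access(403))      # True access exclusion
```

In practice the status check would be repeated with different requesting identities, since Access Exclusion is detected by varying the identity, and each observation would be logged as a fact with its timestamp.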

Probabilistic findings are reported as distributions. Selection Behavior (Layer 5) does not resolve to a single value. The same entity, under the same query, can be selected in one run and not the next. Reporting any single run as "the" result is a category error. These findings are reported as distributions — across runs, orderings, competitor sets, platforms, and time — with their variability stated, not hidden.

Conflating the two classes is the most common way AI-visibility measurement misleads. A probabilistic selection result presented as a deterministic fact overstates a single favorable draw. The same result presented honestly — "the model selected you in 6 of 10 runs, displaced by a specific competitor in the other 4" — is both more truthful and more actionable.
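The "6 of 10 runs" style of reporting can be produced mechanically from run-level results. The sketch below is one possible summary, assuming each run has been reduced to the entity the model recommended first. The Wilson score interval is a standard way to state uncertainty for a proportion and is an addition made here for illustration; the names `summarize_runs`, `you`, and `competitor_x` are hypothetical.

```python
from collections import Counter
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def summarize_runs(entity, top_picks):
    """top_picks holds the entity each run recommended first."""
    n = len(top_picks)
    selected = sum(pick == entity for pick in top_picks)
    displaced_by = Counter(pick for pick in top_picks if pick != entity)
    return {
        "selected": selected,
        "runs": n,
        "interval": wilson_interval(selected, n),
        "displaced_by": dict(displaced_by),
    }

runs = ["you"] * 6 + ["competitor_x"] * 4
report = summarize_runs("you", runs)
low, high = report["interval"]
print(f'selected in {report["selected"]} of {report["runs"]} runs '
      f'(interval {low:.2f} to {high:.2f}); displaced by {report["displaced_by"]}')
```

With only ten runs the interval is wide, roughly 0.31 to 0.83 here, which is the point of stating variability: a single favorable run says very little about the underlying selection rate.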

Perturbation testing

Perturbation testing is the diagnostic method used to test whether selection behavior holds when surrounding conditions change. The core entity or decision question remains fixed while the test varies candidate order, competitor set, prompt framing, or comparison structure.

This method is grounded in LLM recommendation and evaluation research. Shi et al. extend the same concern into LLM-as-a-Judge settings, introducing repetition stability, position consistency, and preference fairness across pairwise and list-wise evaluations.31 Zheng et al. likewise document position, verbosity, and self-enhancement biases and recommend answer-order swapping as a mitigation.32
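The perturbation procedure described in this section can be laid out as a test grid before any model is queried. This is a minimal sketch, assuming rotations as the order perturbation so that the target entity appears once in every position; the function names, entity names, and prompt framings are hypothetical.

```python
from itertools import product

def rotations(items):
    """Every cyclic rotation, so each item appears once in each position."""
    return [items[i:] + items[:i] for i in range(len(items))]

def perturbation_grid(target, competitor_sets, framings):
    """Hold the target fixed; vary competitor set, prompt framing, and order."""
    cases = []
    for competitors, framing in product(competitor_sets, framings):
        for order in rotations([target, *competitors]):
            cases.append({"framing": framing, "candidates": order})
    return cases

cases = perturbation_grid(
    target="Entity",
    competitor_sets=[["Rival A", "Rival B"], ["Rival A", "Rival C"]],
    framings=["Who is the best provider of X?", "Compare these providers of X."],
)
print(len(cases))  # 2 competitor sets x 2 framings x 3 orders = 12
```

Each case would then be run several times and the outcomes reported as a distribution. Generating the grid up front keeps the procedure deterministic and repeatable even though the behavior it reveals is probabilistic.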

Entity-aware diagnostics

One mistake in the current market is treating every subject as a "brand." AI systems do not evaluate every entity through the same signal structure. A local business, product, person, organization, and brand each have different selection triggers, authority signals, representation risks, and resolution paths. The diagnostic layer identifies the type of entity and the selection problem.

The resolution discipline

Diagnosis exposes the recognition-to-selection gap; it does not close it. Closing it is a separate discipline: continuous, evidence-bound resolution as AI systems, sources, and competitors shift over time.

ToastDeck's resolution discipline is SOMAR: Selection, Output Mediation & Authority Resolution. In this thesis, SOMAR is treated as the operational bridge between diagnosis and evidence-bound correction. The proprietary operating methods are outside the scope of this paper. To be clear about what SOMAR is not: it does not prescribe manipulation of AI systems. It organizes evidence correction, authority strengthening, representation repair, and longitudinal monitoring around the specific failure layer a diagnosis identifies.

8. Field Study Foundation

ToastDeck's B2Ai thesis is not only theoretical. It is grounded in field-study work observing how AI systems represent and select entities across systems, industries, and prompts.

The first signed-consent case study supporting this research is Senior Sitters Club LLC, a pre-launch non-medical caregiver registry in Northeast Ohio. The Senior Sitters Club case predates the formal articulation of Layer 0. Its findings remain relevant to representation, selection, and failure-mode behavior, while substrate-level Machine Path diagnostics are now treated as a prior gate in the framework.

Field Study Snapshot: Senior Sitters Club LLC

Scope of evidence. This is a working thesis and diagnostic framework, grounded in bounded field observations — not a population-level prevalence study. The findings below describe observed behavior in a specific signed-consent case across four AI systems at a specific point in time. They illustrate the framework's failure modes; they do not establish market-wide rates.

The source audit was the VisibilityIQ Full Audit: Senior Sitters Club, generated May 8, 2026. The audit window was April 21–28, 2026. It ran 24 queries across ChatGPT, Claude, Gemini, and Perplexity, with each query executed five times to assess consistency.

Aggregate scores. Overall AI Visibility 64/100 (Mixed). Layer scores: Representation 75/100 (strongest), Selection 49/100 (weakest), Resolution 73/100.

The central finding. The clearest selection failure appeared in the category-level query "Best senior care providers in Cleveland, Ohio." ChatGPT consistently named Visiting Angels, Home Instead, and Comfort Keepers before Senior Sitters Club, in 5 of 5 runs. The dominant failure mode is therefore Recognized but Not Selected: the entity was known and describable, but competitors won the buyer-facing recommendation frame.

Distributional findings (reported per the Evidence Class rule). ChatGPT ranked the three national competitors ahead of SSC in 5 of 5 category-level runs. Claude framed SSC as a "sitting service" in 5 of 5 runs (Representation Instability: category wording weakening trust-sensitive positioning). Gemini relied on a Yelp-only citation pattern in 3 of 5 runs (Authority Displacement: a narrow citation graph). ChatGPT returned correct disambiguation in 4 of 5 runs and conflated the entity with an unrelated brand in 1 of 5 (a Recognition-layer instability). Perplexity returned stable service and service-area descriptions in 5 of 5 runs, the strongest baseline.
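For readers who want the distributional findings in tabular form, the sketch below restates the counts reported above as per-system rates. It adds no data beyond what this section states; the total-response figure assumes every query was run on every system five times, as the audit description indicates.

```python
# Counts as reported in the distributional findings (k of n runs).
findings = [
    ("ChatGPT", "national competitors ranked ahead of SSC", 5, 5),
    ("Claude", 'framed SSC as a "sitting service"', 5, 5),
    ("Gemini", "Yelp-only citation pattern", 3, 5),
    ("ChatGPT", "correct disambiguation", 4, 5),
    ("Perplexity", "stable service and service-area descriptions", 5, 5),
]
for system, behavior, k, n in findings:
    print(f"{system:<11}{behavior:<46}{k}/{n} = {k / n:.0%}")

# Audit scale, assuming each query ran on each system five times.
queries, systems, runs_per_query = 24, 4, 5
total_responses = queries * systems * runs_per_query
print("total responses:", total_responses)  # 480
```

Five runs per query is enough to separate stable patterns (5 of 5) from unstable ones (3 of 5 or 4 of 5), but not enough to treat any rate as a precise estimate, which is consistent with the scope-of-evidence note above.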

Industry research is also beginning to document the instability and platform divergence of AI visibility measurement. Birdeye's 2026 AI visibility research frames AI search visibility as cross-platform and affected by differences in how systems surface, cite, and recommend businesses.33

Field Study Snapshot: Senior Sitters Club — ToastDeck Research
Figure (illustrative summary): Senior Sitters Club B2Ai audit pattern — strong representation, weak selection. Authoritative scores are those stated in the text of this section (Overall 64/100; Representation 75/100; Selection 49/100; Resolution 73/100); any rounded values shown in the graphic are illustrative only.

Observed failure modes

  • Scale Inversion — A larger, more established entity is disadvantaged because the system resolves authority through signals that favor smaller, more specific, better-structured, or more machine-legible competitors.
  • Location Drift — An entity is associated with the wrong city, market, headquarters, or service area.
  • Category Drift — An entity is classified under the wrong or an overly broad category.
  • Competitor Displacement — The AI system selects competitors not because the entity is unknown, but because competitors are easier to justify.
  • Platform Caveat Penalty — Marketplaces or aggregators are described accurately but weakened by caveats such as "quality may vary by provider."
  • Representation Instability — Different AI systems describe the same entity differently, revealing unresolved ambiguity.
  • Substrate Invisibility — An entity is omitted or misrepresented because its primary self-published surface is unreadable to the systems forming the representation. Scoped to 0B.
  • Access Exclusion — An entity's content is present and well-formed, but AI retrieval is denied by an infrastructure layer. Scoped to 0A.
  • Discovery Absence — An entity is reachable and parseable but never assembled into the candidate environment. Scoped to 0C.

Correction Resilience and Source Boundary Failure

Source Boundary Failure occurs when the model attributes to one entity facts, claims, or identity drawn from an adjacent source, conflating two distinct things into one representation. This is documented in the source-attribution literature. A 2026 study of LLM deep research agents reported that the factual accuracy of citations drops by approximately 42% on average as the number of tool calls scales from 2 to 150.34

First field instance. On June 1, 2026, a generative system conflated an unrelated commercial entity into its representation of this thesis, drawing identity material from an adjacent source rather than the canonical document. Under direct challenge, the system corrected cleanly. This is recorded as the first Source Boundary Failure observed and logged under this framework.

9. Trust-Sensitive Domains

B2Ai carries the highest stakes in industries where trust and accuracy are the product: insurance, healthcare, legal, finance, senior care, local service businesses, education, public-facing organizations, and products with safety, trust, or compliance implications. These industries cannot afford AI systems inventing or misstating credentials, licenses, services, locations, disclosures, affiliations, claims, reviews, safety information, eligibility, coverage, or professional qualifications.

For these entities, AI misrepresentation is not only a marketing issue. It can create reputational, legal, compliance, and customer-trust risk.

Empirical stakes in high-stakes domains

The risk in these domains is not hypothetical. A 2026 physician-led red-teaming study published in npj Digital Medicine evaluated four public chatbots on 222 patient-posed primary-care questions and found problematic-response rates ranging from 21.6% to 43.2% and unsafe-response rates from 5% to 13%.35 A 2024 Nature Medicine study using 2,400 real patient cases found that state-of-the-art models diagnose significantly worse than physicians and degrade further when they must gather information themselves rather than receiving it pre-curated.36

These findings sharpen why caveats matter in trust-sensitive selection. Models are systematically overconfident when they verbalize certainty, so the confidence a model expresses is not a reliable signal of the confidence it should hold.37 In a high-stakes category, a caveat may therefore be miscalibrated in either direction — hedging a sound provider or failing to hedge an unsound recommendation. Accurate representation and justified, well-calibrated selection are not marketing niceties in these domains; they are the difference between a recommendation that helps and one that harms.

10. Implications for Businesses

Businesses need to stop asking only: Are we visible? They also need to ask:

  • Are we represented accurately?
  • Are we selected when relevant?
  • Are competitors selected instead?
  • Are we trusted enough to be recommended?
  • Are we caveated in a way that weakens conversion?
  • Are our services clear enough for AI systems to classify?
  • Are our locations and service areas unambiguous?
  • Are third-party signals strong enough to justify recommendation?
  • Are AI systems confusing us with another entity?
  • Are we improving or drifting over time?

This requires a new operating discipline: the business must manage its machine-facing identity. The goal is not to game the model. The goal is to make the truth about the entity legible enough to be selected.

What good looks like: A strong B2Ai profile is not simply an entity with a lot of content. It is an entity that AI systems can accurately understand, consistently represent, and confidently select. The strongest entities are not necessarily the loudest. They are the most legible under model constraints.

11. Implications for AI Systems

AI systems are increasingly making or shaping commercial judgments. They are not neutral mirrors. They are interpretive systems that compress source environments into answers, decide which entities are relevant, generate explanations, and shape user trust — and they may become connected to agents that take action. This makes AI selection behavior a business-critical object of study.

AI selection will also be shaped by commercial arrangements, not legibility alone. In monetized environments, platform incentives, placement economics, and commerce partnerships may influence which entities are surfaced, favored, or transacted with.38,39 B2Ai therefore concerns the layer a business can actually influence — the entity's clarity, corroboration, and machine-readable trust. That layer does not guarantee selection, but in a monetized environment it is the precondition for competing at all.

12. Final Thesis

The market is moving from search visibility to AI selection readiness.

SEO remains foundational for Google Search. AI visibility remains useful for understanding where an entity appears. But the deeper commercial question is selection. AI systems now interpret entities, compare them, and decide whether to recommend, caveat, exclude, or act on them — all before a human buyer makes a decision. That creates a new upstream commercial layer:

B2Ai: Business-to-AI. The cross-system commercial layer where businesses and entities are interpreted, trusted, selected, caveated, excluded, or eventually transacted with by AI systems across the broader generative and agentic environment.

Recognition is not selection.
Visibility is not trust.
Mention is not recommendation.
Citation is not resolution.

The entities that win in this layer will be the ones AI systems can understand, verify, justify, and select under constraint. The next commercial frontier is not only whether people can find a business. It is whether AI systems can correctly understand it, accurately represent it, and confidently select it. That is the B2Ai thesis.

Appendix A — Canonical Glossary

These definitions are canonical for the B2Ai framework as of v2.6.1. Adjacency notes mark the boundary between a term and its nearest neighbor to prevent conceptual drift.

B2Ai (Business-to-AI)

The upstream commercial layer where businesses must become legible, trustworthy, and selectable to AI systems before they are presented to humans, agents, or downstream decision workflows. A layer upstream of B2B and B2C, not a replacement for either.

SOMAR (Selection, Output Mediation & Authority Resolution)

ToastDeck's resolution discipline: the operational bridge between diagnosis and evidence-bound correction. It identifies whether a selection failure is addressable by the entity, structural to the platform, or caused by unstable authority and representation signals. Proprietary operating methods are outside the scope of this paper.

Machine Path Layer (Layer 0)

The first of the eight operating layers. One layer containing three co-equal pre-selection conditions — Access (0A), Ingestibility (0B), and Discovery (0C) — that together determine whether an entity can enter the machine-mediated decision environment at all.

Access (0A)

The Layer 0 condition testing whether the requesting system is permitted to retrieve the entity's surfaces. Failure mode: Access Exclusion — content present and well-formed, but retrieval refused by an infrastructure layer before content is served. Detected by varying the requesting identity.

Ingestibility (0B)

The Layer 0 condition testing whether substrate legibility holds under real machine retrieval — whether meaning arrives in the response itself or only after a program runs. Failure mode: Substrate Invisibility.

Substrate Legibility

The property of a page or entity surface being machine-readable and usable. The property that Ingestibility checks for.

Substrate Invisibility

The Layer 0B failure mode where the entity may exist and may even be reachable, but the machine cannot meaningfully parse, extract, or use it. Diagnostic signature: raw HTML before script execution returns little or no substantive content. Scoped strictly to 0B.

Discovery (0C)

The Layer 0 condition testing whether the entity surfaces as a candidate at all — whether it can be found, crawled, indexed, or retrieved into the candidate environment before recognition begins.

Discovery Citation

A citation whose function is findability: it helps a machine system find, crawl, index, or retrieve an entity. Operates at Layer 0C. Adjacency: Same surface form as an authority citation, but a different function at a different layer.

Authority Citation

A citation whose function is trust and persuasion: it helps justify why an entity should be trusted, cited, recommended, or selected in an answer. Operates at the higher layers. Adjacency: Discovery citations make an entity findable; authority citations make an entity persuasive.

Recognition

The condition where an AI system correctly identifies what an entity is. Adjacency: Discovery means the entity can be found as a candidate. Recognition means the system correctly understands what the candidate is.

Selection

The condition where an AI system chooses, recommends, ranks, routes to, or relies on an entity under a user constraint or decision context. Adjacency: Authority Resolution is trust evaluation. Selection is the choice made after or during that evaluation. Trusted does not always mean selected.

Selection Instability

The measurable variation in how AI systems recognize, compare, recommend, exclude, justify, caveat, or displace an entity across platforms, prompts, constraints, competitor sets, and time. Adjacency: Stochastic variance is run-level noise. Selection instability is pattern-level signal that changes business meaning, trust position, or competitive framing.

Evidence Class

The rule governing how a finding may be reported, determined by whether the finding is deterministic or probabilistic. Deterministic findings are reported as facts. Probabilistic findings are reported as distributions across runs, orderings, competitor sets, platforms, and time. Adjacency: Not a style preference but a condition of framework integrity.

Perturbation Testing

The diagnostic method for probabilistic findings: hold the entity or decision question fixed while varying candidate order, competitor set, prompt framing, or comparison structure, and observe whether the selection outcome holds. Adjacency: The procedure is deterministic and repeatable; the behavior it reveals is probabilistic and is therefore reported as a distribution, not a single verdict.

Scale Inversion

The condition where a larger, more established, or more visible business is disadvantaged in AI-mediated selection because the system resolves authority through signals that favor smaller, more specific, better-structured, or more machine-legible competitors. Adjacency: Not simple underperformance. It is the inversion of expected advantage when scale or market presence fails to translate into AI selection.

Substitution vs. Selection

The boundary between tasks an AI system can perform directly and decisions where it must choose an accountable external entity. Adjacency: Substitution means the AI does the work. Selection means the AI chooses who or what should do the work.

Appendix B — Research Grounding Map

This appendix maps load-bearing external claims to their verified primary sources. Tier 1 sources anchor an architectural claim; Tier 2 sources support a consequence or mechanism.

| B2Ai Claim | Anchor Source | Tier |
| --- | --- | --- |
| Recognition ≠ Selection (candidate generation vs. ranking) | Covington, Adams & Sargin (2016), RecSys — "classic two-stage IR dichotomy" | Tier 1 |
| Candidate-set composition shapes downstream exposure | Bower et al. (2022), arXiv:2209.05000 | Tier 2 |
| Selection instability / Layer 5 mechanism | Bito, Ren & He (2025), arXiv:2508.02020 — position bias via shuffled-vs-reversed perturbation | Tier 1 (L5) |
| Source Boundary Failure / conflation during synthesis | Onweller et al. (2026), arXiv:2605.06635 — ~42% drop in factual accuracy at scale | Tier 1 (§8) |
| Freshness / Layer 7 — temporal IR field anchor | Campos, Dias, Jorge & Jatowt (2014), ACM Computing Surveys 47(2) Art. 15 | Tier 1 (L7) |
| Freshness / Layer 7 — RAG recency-prior example | Grofsky (2025), arXiv:2509.19376 — single-domain worked example | Tier 2 |
| Caveat Penalty mechanism (calibration / overconfidence) | Xiong et al. (2024), arXiv:2306.13063, ICLR 2024 — systematic overconfidence | Tier 1 (§9) |
| Documented unsafe-advice rates (medical) | Draelos et al. (2026), npj Digital Medicine 9:241 — problematic 21.6–43.2%, unsafe 5–13% | Tier 1 (§9) |
| Degradation toward real practice (clinical) | Hager et al. (2024), Nature Medicine 30(9):2613–2622 | Tier 1 (§9) |
| Perturbation testing / LLM-as-a-Judge position bias | Shi et al. (2025), IJCNLP-AACL — repetition stability, position consistency | Tier 2 |
| LLM-as-a-Judge position-bias mitigation | Zheng et al. (2023), arXiv:2306.05685 / NeurIPS — answer-order swapping | Tier 2 |
| GEO tactic→layer mapping (Authority/Representation) | Aggarwal et al. (2024), KDD, arXiv:2311.09735 — ~30–40% citation visibility increase | Tier 1 (§4) |
| Senior Sitters Club field-study snapshot | VisibilityIQ Full Audit: Senior Sitters Club, May 8, 2026; 24 queries × 4 systems × 5 runs | Primary field source |

Notes & Citations
Paul Covington, Jay Adams, and Emre Sargin. "Deep Neural Networks for YouTube Recommendations." Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16), pp. 191–198. DOI: 10.1145/2959100.2959190. The paper describes its system as split according to the classic two-stage information retrieval dichotomy: a candidate generation model followed by a separate ranking model.
Amanda Bower, Kristian Lum, Tomo Lazovich, Kyra Yee, and Luca Belli. "Random Isn't Always Fair: Candidate Set Imbalance and Exposure Inequality in Recommender Systems." arXiv:2209.05000, 2022. Presented at the FAccTRec Workshop, RecSys 2022. https://arxiv.org/abs/2209.05000
Google Search Central, "Optimizing your website for generative AI features on Google Search." Developer documentation, last updated May 15, 2026. https://developers.google.com/search/docs/fundamentals/ai-optimization-guide
Google Search Central Blog, "A new resource for optimizing for generative AI in Google Search." Announcement post, May 15, 2026. https://developers.google.com/search/blog/2026/05/a-new-resource-for-optimizing
Google for Developers, "Getting started with Universal Commerce Protocol on Google." UCP is described as an open standard for commerce enabling agentic actions on AI Mode in Google Search and Gemini, including direct buying. https://developers.google.com/merchant/ucp
Google Search Central, "Spam policies for Google web search." Includes attempts to manipulate generative AI responses in Google Search. https://developers.google.com/search/docs/essentials/spam-policies
Anthony Ha, "Google removes AI Overviews for certain medical queries." TechCrunch, January 11, 2026. https://techcrunch.com/2026/01/11/google-removes-ai-overviews-for-certain-medical-queries/
Nelson F. Liu, Tianyi Zhang, and Percy Liang. "Evaluating Verifiability in Generative Search Engines." Findings of the ACL: EMNLP 2023. arXiv:2304.09848. Reported 51.5% citation recall and 74.5% citation precision on average across 1,450 queries. https://arxiv.org/abs/2304.09848
Yelp and Morning Consult, "Americans Use AI But Don't Trust It." Published April 14, 2026. Survey of 2,202 U.S. adults, Feb 26–28, 2026, margin of error approx. ±2 points. https://blog.yelp.com/news/americans-use-ai-but-dont-trust-it-thats-a-problem-worth-solving/
Pranjal Aggarwal et al. "GEO: Generative Engine Optimization." KDD '24, Barcelona, August 2024. arXiv:2311.09735. https://arxiv.org/abs/2311.09735
S. Nageeb Ali, Nicole Immorlica, Meena Jagadeesan, and Brendan Lucier. "Flattening Supply Chains: When do Technology Improvements lead to Disintermediation?" arXiv:2502.20783, March 2025. https://arxiv.org/abs/2502.20783
California Management Review (Insights), "The Rise of AI Intermediaries: How Agentic Systems Are Rewiring Customer Relationships." UC Berkeley Haas, April 29, 2026. https://cmr.berkeley.edu/2026/04/the-rise-of-ai-intermediaries-how-agentic-systems-are-rewiring-customer-relationships/
Cloudflare, "Cloudflare Just Changed How AI Crawlers Scrape the Internet-at-Large." Press release, July 1, 2025. https://www.cloudflare.com/press/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/
Matthew Prince (Cloudflare), "Content Independence Day: no AI crawl without compensation!" Cloudflare Blog, July 1, 2025. Reports OpenAI crawl-to-referral ratio approx. 1,700:1 and Anthropic approx. 73,000:1 as of June 2025. https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/
Cloudflare, "Control content use for AI training." Cloudflare documentation, 2025. https://developers.cloudflare.com/bots/concepts/bot/verified-bots/ai-crawlers/
Cloudflare, "Introducing pay per crawl." Cloudflare Blog, 2025. Documents paid-access flows for crawlers, including HTTP 402 (Payment Required) responses. https://blog.cloudflare.com/introducing-pay-per-crawl/
Giacomo Zecchini, Alice Roberts, Malte Ubl, and Ryan Siddle (Vercel and MERJ), "The rise of the AI crawler." Vercel Blog, December 17, 2024. Reports that no major AI crawler measured renders JavaScript (OpenAI: OAI-SearchBot, ChatGPT-User, GPTBot; Anthropic: ClaudeBot; Meta-ExternalAgent; ByteDance: Bytespider; PerplexityBot). https://vercel.com/blog/the-rise-of-the-ai-crawler
Zecchini, Roberts, Ubl, and Siddle (Vercel and MERJ), "The rise of the AI crawler," re-cited for the boundary condition that content present in the initial response remains interpretable even when JavaScript frameworks are used. https://vercel.com/blog/the-rise-of-the-ai-crawler
Glenn Gabe (G-Squared Interactive), "AI Search and JavaScript Rendering [Case Study]." August 11, 2025. Independent replication on a fully client-side-rendered production site. https://www.gsqi.com/marketing-blog/ai-search-javascript-rendering/
Glenn Gabe (G-Squared Interactive), "AI Search and JavaScript Rendering [Case Study]," re-cited for observed URL demotion, non-primary citation placement, and stripped snippets. https://www.gsqi.com/marketing-blog/ai-search-javascript-rendering/
Zecchini, Roberts, Ubl, and Siddle (Vercel and MERJ), "The rise of the AI crawler," re-cited for the vendor-affiliated billion-fetch measurement caveat. https://vercel.com/blog/the-rise-of-the-ai-crawler
Glenn Gabe (G-Squared Interactive), "AI Search and JavaScript Rendering [Case Study]," re-cited as independent practitioner corroboration of the rendering gap through 2025/early 2026. https://www.gsqi.com/marketing-blog/ai-search-javascript-rendering/
OpenAI, "Introducing ChatGPT Atlas." Product announcement, October 21, 2025. Establishes the architectural distinction: indexing crawlers do not execute JavaScript, whereas the agentic browser Atlas operates a full browser environment. https://openai.com/index/introducing-chatgpt-atlas/
Ethan Bito, Yongli Ren, and Estrid He. "Evaluating Position Bias in Large Language Model Recommendations." arXiv:2508.02020, August 2025. Reports LLM-based recommendation models are highly sensitive to the order of candidate items in a prompt, measured by comparing rankings generated from shuffled versus reversed candidate lists. https://arxiv.org/abs/2508.02020
Bito, Ren, and He, "Evaluating Position Bias in Large Language Model Recommendations," re-cited for the shuffled-versus-reversed candidate-list measurement method. https://arxiv.org/abs/2508.02020
European Commission, Case AT.39740 — Google Search (Shopping), decision of June 27, 2017, fine approx. EUR 2.42 billion; upheld on appeal by the EU Court of Justice in 2024. https://ec.europa.eu/commission/presscorner/detail/en/IP_17_1784
International Center for Law & Economics, "The Case for Self-Preferencing." ICLE research spotlight, 2026. https://laweconcenter.org/spotlights/self-preferencing/
Ricardo Campos, Gaël Dias, Alípio M. Jorge, and Adam Jatowt. "Survey of Temporal Information Retrieval and Related Applications." ACM Computing Surveys 47(2), Article 15, July 2014. DOI: 10.1145/2619088. Canonical Temporal IR survey establishing the distinction between recency-sensitive and time-sensitive queries.
J. Grofsky. "Solving Freshness in RAG: A Simple Recency Prior." arXiv:2509.19376, September 2025. A single-author preprint demonstrating a recency-prior scoring method for retrieval-augmented generation, tested on one domain dataset; cited here as a recent worked example, not a field-wide result.
Hailey Onweller, Elias Lumer, Austin Huber, Pia Ramchandani, Vamse Kumar Subbiah, and Corey Feld. "Cited but Not Verified: Parsing and Evaluating Source Attribution in LLM Deep Research Agents." arXiv:2605.06635, May 2026 (PwC Commercial Technology and Innovation Office). https://arxiv.org/abs/2605.06635
Lin Shi, Chiyu Ma, Wenhua Liang, Xingjian Diao, Weicheng Ma, and Soroush Vosoughi. "Judging the Judges: A Systematic Study of Position Bias in LLM-as-a-Judge." Proceedings of the 14th IJCNLP and 4th AACL, pp. 292–314, 2025. DOI: 10.18653/v1/2025.ijcnlp-long.18. https://aclanthology.org/2025.ijcnlp-long.18/
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, et al. "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena." arXiv:2306.05685; NeurIPS 2023 Datasets and Benchmarks Track. https://arxiv.org/abs/2306.05685
Birdeye, "AI visibility in 2026: The secrets behind how AI picks winners." February 18, 2026. https://birdeye.com/blog/ai-search-visibility-study/
Onweller et al., "Cited but Not Verified," re-cited for the approximately 42% average decline in factual citation accuracy as tool calls scale from 2 to 150. https://arxiv.org/abs/2605.06635
Rachel L. Draelos, Samina Afreen, Barbara Blasko, et al. "Large language models provide unsafe answers to patient-posed medical questions." npj Digital Medicine 9:241 (2026). DOI: 10.1038/s41746-026-02428-5. Physician-led red-teaming of four public chatbots on 222 patient-posed primary-care questions; problematic-response rates ranged from 21.6% to 43.2% and unsafe-response rates from 5% to 13%.
Paul Hager, Friederike Jungmann, Robbie Holland, et al. "Evaluation and mitigation of the limitations of large language models in clinical decision-making." Nature Medicine 30(9):2613–2622 (2024). DOI: 10.1038/s41591-024-03097-1.
Miao Xiong, Zhiyuan Hu, Xinyang Lu, Yifei Li, Jie Fu, Junxian He, and Bryan Hooi. "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs." arXiv:2306.13063; accepted at ICLR 2024. Benchmarks confidence elicitation across five LLMs and finds that models are systematically overconfident when verbalizing certainty.
European Commission, Case AT.39740 — Google Search (Shopping), re-cited for platform incentives, placement economics, and commercial surfacing concerns. https://ec.europa.eu/commission/presscorner/detail/en/IP_17_1784
International Center for Law & Economics, "The Case for Self-Preferencing," re-cited for the commercial-incentive boundary around platform placement and self-preferencing. https://laweconcenter.org/spotlights/self-preferencing/