
Embeddings in Advertising.

Vector representations are becoming the semantic infrastructure layer for agentic advertising, contextual targeting, lookalikes, clean rooms, and cross-media decisioning.

Embeddings turn words, pages, shows, ads, users, households, products, and behaviors into numerical vectors that can be compared by similarity. In advertising, that means audiences, contexts, creatives, and signals can be matched by meaning — not only by IDs, keywords, or fixed taxonomies.

[Figure: Embeddings in Advertising. Objects (audience, context, creative, content, product, household, signal, campaign) are encoded as vectors in an embedding space (vector, similarity, distance, clusters, centroids, cosine). The space drives advertising decisions (lookalike, contextual match, suppression, creative fit, signal activation, measurement, agent recommendation). A governance rail sits under every comparison: consent, provenance, freshness, output policy, evaluation, audit.]
Objects become vectors; vectors drive advertising decisions — under a governance rail.

Embeddings are not a privacy shortcut. They are a semantic matching layer that still needs governance, provenance, consent, evaluation, and standards.

Fast read

What it is
A topic hub for embeddings, vectors, semantic similarity, and their role in advertising infrastructure.
Best for
AdTech, MarTech, data, clean-room, AI, DSP, SSP, publisher, measurement, and product leaders trying to understand how embeddings change targeting, activation, and agentic workflows.
Core idea
Embeddings make similarity computable. Advertising can use that for audiences, contexts, creative, signals, and measurement.
Main risk
Treating embeddings as anonymous, portable, or explainable by default.
Where it connects
Agentic Audiences / UCP, AdCP, Signal Containerization, Enterprise Data Collaboration, Semantic Infrastructure, DSP / Agentic Buying, and BI / MMM.
Best next read
Embeddings: The Next Frontier in Advertising?
In one view

Embeddings in one view.

The simplest way to understand embeddings: they turn messy real-world objects into vectors so machines can compare similarity at scale.

[Figure: Embeddings in one view. Objects (content, audience, creative) are each encoded as a vector and placed in one shared embedding space where distance encodes similarity: similar (near in space), adjacent (related, not the same), far (distant, not lost). Outputs: match (surface the nearest neighbours), exclude (push away by distance), recommend (rank by similarity to what already worked), inspect (read clusters and gaps to see what the model learned).]
Near does not mean identical. Far does not mean irrelevant. Similarity is useful only when the training data, model, and objective are understood.
Definition

What embeddings are.

An embedding is a vector representation of an object — text, an image, an audio signal, a user profile, a content page, a product, a show, a household behavior pattern, or a campaign. Similar objects tend to sit closer together in the embedding space, which makes similarity computable.

Plain English

Embedding = meaning compressed into numbers.

Technical

A learned vector representation where distance or angle can approximate relatedness, similarity, or context — depending on the model and training objective.

Embeddings are not always interpretable dimension by dimension. They are useful because the overall geometry can preserve relationships — not because each coordinate has an obvious human label.

Concept | Plain meaning | Advertising example
Vector | list of numbers | household viewing profile encoded as numbers
Embedding space | where vectors are compared | sports fans cluster near sports content
Cosine similarity | angle-based similarity | creative and page context are close
Centroid | average point for a group | seed-audience summary
Nearest neighbor | closest items | most similar audiences / contexts
Translation layer | maps one vector space to another | publisher vectors to buyer vectors
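
The concepts in the table can be made concrete with a few lines of code. The following is a minimal sketch in plain Python, not a production implementation: the three-dimensional vectors are made-up illustrative values rather than real model output, and the function and variable names are assumptions introduced here.

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Average point for a group of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def nearest_neighbors(query, items, k=2):
    """Names of the k items whose vectors are most similar to the query."""
    ranked = sorted(
        items,
        key=lambda name: cosine_similarity(query, items[name]),
        reverse=True,
    )
    return ranked[:k]

# Toy content vectors (illustrative values only).
content = {
    "football_recap": [0.9, 0.1, 0.0],
    "tennis_preview": [0.8, 0.2, 0.1],
    "mortgage_guide": [0.0, 0.1, 0.9],
}

# A seed audience summarized as one centroid, then matched to content.
seed_audience = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
seed_centroid = centroid(seed_audience)
closest = nearest_neighbors(seed_centroid, content, k=2)
print(closest)
```

In this toy space the two sports items rank ahead of the mortgage guide. With a real model the same calls apply, but the dimensionality and the meaning of "close" depend on the model and its training objective.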
Why it matters

Why embeddings matter in advertising.

  • They make meaning computable

    A buyer can move beyond exact keywords and fixed categories into semantic relationships.

  • They reduce dependence on brittle IDs

    Where identity is limited, embeddings can support contextual, cohort, or signal-based similarity. They do not eliminate privacy obligations.

  • They connect different media surfaces

    Text, video, audio, CTV viewing, app behavior, commerce signals, and creative assets can be represented in comparable forms — if the system is designed properly.

  • They support agentic workflows

    Agents need objects they can discover, compare, activate, and evaluate. Embeddings help agents reason over meaning and similarity.

  • They improve retrieval and recommendation

    Search, ranking, content matching, creative selection, and signal discovery all benefit from vector-based retrieval.

  • They create new governance questions

    Vectors can encode sensitive patterns, inferred traits, and behavioral signals. They need policy, retention, access control, and evaluation.

Use cases

Core advertising use cases.

  • Lookalike audiences without direct ID matching

    Seed a group of converters, watchers, buyers, or high-value users; represent their behavior as vectors; find similar users, contexts, or households where permitted.

    Watch-out: Similarity is not consent. Validate data rights and avoid sensitive-inference risk.

  • Contextual targeting 3.0

    Match creative, page, video, app, show, or bid context by semantic similarity rather than only keyword match.

    Watch-out: Contextual does not automatically mean low-risk if the context reveals sensitive categories.

  • Cross-media similarity

    Represent web, app, CTV, audio, commerce, and content signals in ways that support comparison across media surfaces.

    Watch-out: Vector spaces may not align without common models or translation layers.

  • Creative-to-context matching

    Embed ad creative, landing pages, product metadata, and publisher content to choose better-fit creative or environments.

    Watch-out: Creative fit should be evaluated against outcome and brand safety, not just proximity.

  • Signal discovery

    Use embeddings to search signal catalogs by meaning — a natural-language brief can retrieve related audience, content, and contextual signals.

    Watch-out: Returned signals need provenance, freshness, and policy metadata.

  • Suppression and exclusion

    Identify users, contexts, or inventory that are semantically misaligned with a campaign or likely to waste budget.

    Watch-out: Wrong thresholds can suppress valuable reach or introduce bias.

  • Clean-room output interpretation

    Compare aggregated outputs, audience descriptions, product categories, or campaign cohorts without exposing raw data.

    Watch-out: Clean-room governance still applies; embeddings should not become an uncontrolled export channel.

  • Agentic activation

    Agents can use embeddings to compare brief intent with available signals, contexts, inventory, and measurement outputs.

    Watch-out: Agents still need permissions, output policy, human approval, and audit.
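
The lookalike and suppression use cases above share one mechanic: score candidates against a seed centroid, then apply thresholds. A minimal sketch follows, assuming a hypothetical `permitted` flag that stands in for upstream data-rights and consent checks, and threshold values chosen purely for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    vector: list
    permitted: bool  # hypothetical flag: data rights and consent cleared upstream

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def seed_centroid(seeds):
    return [sum(column) / len(seeds) for column in zip(*seeds)]

def split_candidates(seeds, candidates, include_at=0.9, suppress_below=0.3):
    """Sort candidates into lookalikes, suppressed, and skipped (not permitted)."""
    center = seed_centroid(seeds)
    lookalikes, suppressed, skipped = [], [], []
    for candidate in candidates:
        if not candidate.permitted:
            skipped.append(candidate.name)  # similarity is not consent
            continue
        score = cosine(center, candidate.vector)
        if score >= include_at:
            lookalikes.append((candidate.name, score))
        elif score < suppress_below:
            suppressed.append(candidate.name)
        # Scores between the two thresholds are left undecided on purpose.
    lookalikes.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in lookalikes], suppressed, skipped

# Toy two-dimensional vectors (illustrative values only).
seeds = [[0.9, 0.1], [0.7, 0.3]]
candidates = [
    Candidate("hh_a", [0.85, 0.15], permitted=True),
    Candidate("hh_b", [0.0, 1.0], permitted=True),
    Candidate("hh_c", [0.8, 0.2], permitted=False),
    Candidate("hh_d", [0.5, 0.5], permitted=True),
]
lookalikes, suppressed, skipped = split_candidates(seeds, candidates)
print(lookalikes, suppressed, skipped)
```

The middle band between the two thresholds is where the watch-outs bite: moving either threshold changes reach and can introduce bias, so the values should be evaluated against outcomes rather than set once.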

[Figure: Embeddings use-case map. One shared embedding layer feeds eight advertising decisions: lookalikes (nearest neighbours), contextual match (page, intent), creative fit (asset to audience), signal discovery (cluster, surface), suppression (exclude, dedupe), clean-room outputs, agentic activation (agent-readable), and measurement (lift, attribution).]
One representation layer; many advertising decisions around it.
Privacy

The privacy reality: embeddings are not anonymization by default.

Embeddings can reduce raw-data movement, but they do not automatically remove privacy risk. A vector can still encode patterns about people, households, content, behavior, or sensitive interests. Whether it is personal data depends on context, identifiability, linkage risk, and how the vector is used. Under EU/UK GDPR concepts, pseudonymized data is still personal data, and anonymization requires that re-identification is no longer reasonably likely — a high, evidence-based bar.

Claim | Better framing | Why it matters
Embeddings are anonymous | Embeddings may be lower-risk than raw data, but they are not anonymous by default. | They can still support inference, linkage, or profiling.
No IDs means no privacy issue | Privacy risk can exist without direct identifiers. | Behavioral vectors can still describe or single out people or households.
Vectors are safe to share | Vector sharing needs purpose limits, access control, retention rules, and re-identification-risk review. | Vectors can leak meaning or enable similarity matching across datasets.
Embedding equals pseudonymization | Embedding and pseudonymization are different concepts, and pseudonymized data is still personal data. | Both can still require GDPR / privacy compliance depending on context.
Contextual is always safe | Contextual targeting can still be sensitive when context reveals protected or sensitive categories. | Policy and brand safety still apply.

These are EU/UK GDPR concepts applied to embeddings by analogy; regulators have not ruled on embeddings specifically. US frameworks (CCPA/CPRA) and Apple's ATT are separate regimes — embeddings do not bypass any of them.
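
Linkage risk is easy to underestimate, so a small illustration may help. The sketch below assumes two hypothetical datasets that share no identifiers and hold toy vectors with made-up values. It shows how nearest-neighbor matching alone could pair rows across them when the vectors are close enough.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def link_records(dataset_a, dataset_b, threshold=0.99):
    """Pair each row in A with its most similar row in B if the score clears the threshold."""
    links = {}
    for key_a, vec_a in dataset_a.items():
        best_key, best_score = None, -1.0
        for key_b, vec_b in dataset_b.items():
            score = cosine(vec_a, vec_b)
            if score > best_score:
                best_key, best_score = key_b, score
        if best_score >= threshold:
            links[key_a] = best_key
    return links

# Two datasets with unrelated row keys and no shared IDs (illustrative values).
publisher_rows = {
    "row_1": [0.90, 0.10, 0.30],
    "row_2": [0.10, 0.80, 0.20],
}
retailer_rows = {
    "r_17": [0.11, 0.79, 0.21],
    "r_42": [0.89, 0.12, 0.29],
    "r_63": [0.30, 0.30, 0.90],
}
links = link_records(publisher_rows, retailer_rows)
print(links)
```

Both publisher rows find a near-identical partner in the retailer data even though no identifier was exchanged. That is why vector sharing calls for a re-identification and linkage review rather than an assumption of anonymity.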

Governance checklist

  • Source data rights
  • Consent / lawful basis
  • Purpose limitation
  • Sensitive-category review
  • Re-identification risk
  • Linkage risk
  • Retention policy
  • Vector deletion / update
  • Access control
  • Model provenance
  • Query audit
  • Output policy
  • Explainability + appeal path where needed
[Figure: Embedding privacy is a pipeline, not a property. Six stages: 01 raw data (highest sensitivity, identifiable at source), 02 embedding (lower-risk than raw data but not anonymous; vectors can be inverted), 03 vector store (persistence and inversion risk make this the control hot-spot), 04 query (queries can leak intent and re-identify), 05 activation (retrieved vectors feed downstream systems, widening exposure), 06 outcome (the last place a leak surfaces). Controls at every stage: consent, minimization, access, retention, policy, audit.]
Privacy is a property of the whole pipeline, not of the vector alone.
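
Several checklist items (consent, purpose limitation, retention, vector deletion, query audit) can be enforced at the storage layer. The following is a rough sketch under stated assumptions: `VectorRecord` and `GovernedStore` are hypothetical names, the fields are a simplification, and a real system would sit on an actual vector index with proper access control.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class VectorRecord:
    vector: list
    purpose: str          # purpose the source data was collected for
    consent: bool         # whether a consent / lawful basis is recorded
    created: date
    retention_days: int

@dataclass
class GovernedStore:
    records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def put(self, key, record):
        self.records[key] = record

    def delete(self, key):
        """Honor a deletion request by removing the vector itself."""
        self.records.pop(key, None)
        self.audit_log.append(("delete", key))

    def readable(self, key, purpose, today):
        """A record is readable only with consent, matching purpose, and live retention."""
        record = self.records.get(key)
        if record is None:
            return False
        expired = today > record.created + timedelta(days=record.retention_days)
        return record.consent and record.purpose == purpose and not expired

    def query(self, caller, purpose, today):
        """Return the keys the caller may read, and log the query for audit."""
        allowed = [key for key in self.records if self.readable(key, purpose, today)]
        self.audit_log.append(("query", caller, purpose, len(allowed)))
        return allowed

store = GovernedStore()
store.put("a", VectorRecord([0.1, 0.9], "measurement", True, date(2026, 1, 1), 90))
store.put("b", VectorRecord([0.2, 0.8], "measurement", False, date(2026, 1, 1), 90))
store.put("c", VectorRecord([0.3, 0.7], "measurement", True, date(2025, 1, 1), 90))

today = date(2026, 2, 1)
first = store.query("analyst", "measurement", today)
print(first, store.audit_log)
```

Only record `a` is returned: `b` has no consent recorded and `c` is past its retention window. The checks run at query time, so the same vector can be readable for one purpose and blocked for another.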
Interoperability

The portability problem.

Embeddings are powerful inside one system. They get harder when multiple companies need to exchange or compare them: a vector from one model may not mean the same thing as a vector from another. Model choice, dimensionality, training data, normalization, and distance metric all matter — embedding spaces are usually not directly fungible, so translation layers, benchmark sets, and common standards may be needed.

Problem | What breaks | Possible solution
Different model | vectors do not align | shared model, adapter, projection
Different dimensions | cannot compare directly | transformation / projection layer
Different training data | semantic drift | benchmark and calibration
Different objectives | similarity means different things | document objective and use case
Different privacy rules | unsafe exchange | output policy and governance
Different freshness | stale vectors | timestamp and refresh policy
[Figure: The portability problem. A publisher vector space (A) and a buyer vector space (B) are separated by a gap; the same audience lands at different coordinates in each. Bridge options: common model (one shared encoder; strong but rarely realistic across firms), translation layer (learned mapping; lossy, needs paired data), benchmark set (shared evaluation probes both sides embed), signal container (a portable envelope that carries the signal alongside the vector), Agentic Audiences / UCP (protocol contract), governance policy (rules and provenance for borrowed vectors).]
Bridging two vector spaces: common model, translation layer, benchmark set, signal container, Agentic Audiences / UCP, or governance policy.
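
A translation layer can be illustrated with a deliberately tiny case. The sketch below assumes two-dimensional spaces, a buyer space that is simply a rotated copy of the publisher space, and exactly two benchmark items embedded by both sides, so the mapping can be solved in closed form. Real translation layers are typically fitted over many paired examples and are lossy. All names and values here are illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def matvec(matrix, vector):
    return [sum(row[i] * vector[i] for i in range(len(vector))) for row in matrix]

def fit_2d_translation(pub_pair, buy_pair):
    """Solve W so that W applied to each publisher benchmark vector gives its buyer vector."""
    (a1, a2), (b1, b2) = pub_pair, buy_pair
    # A holds the publisher vectors as columns; invert it in closed form.
    det = a1[0] * a2[1] - a2[0] * a1[1]
    if det == 0:
        raise ValueError("benchmark vectors must be linearly independent")
    a_inv = [[a2[1] / det, -a2[0] / det], [-a1[1] / det, a1[0] / det]]
    b = [[b1[0], b2[0]], [b1[1], b2[1]]]
    # W = B * A^-1
    return [[sum(b[r][k] * a_inv[k][c] for k in range(2)) for c in range(2)] for r in range(2)]

# Shared benchmark items, embedded once by each side (illustrative values).
publisher_benchmark = ([1.0, 0.0], [1.0, 1.0])
buyer_benchmark = ([0.0, 1.0], [-1.0, 1.0])
W = fit_2d_translation(publisher_benchmark, buyer_benchmark)

# The same audience as each side would embed it.
publisher_vec = [2.0, 1.0]
buyer_vec = [-1.0, 2.0]

raw_score = cosine(publisher_vec, buyer_vec)                      # compared across spaces
translated_score = cosine(matvec(W, publisher_vec), buyer_vec)    # compared after mapping
print(raw_score, translated_score)
```

Compared directly, the two vectors look unrelated even though they describe the same audience. After the mapping they line up. The benchmark set is what makes the mapping possible, which is why shared probe items and documented model settings matter for exchange.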
Agentic

Embeddings in agentic advertising.

Agents need more than natural language. They need a way to compare meaning across briefs, signals, contexts, creative, inventory, and outcomes. Embeddings can become the similarity layer that lets agents reason over advertising objects.

  1. Buyer brief
  2. Parse intent
  3. Search signal catalog
  4. Retrieve similar signals
  5. Inspect provenance and policy
  6. Activate signal
  7. Monitor status
  8. Evaluate outcome
  9. Improve the next brief

Signal containerization packages embeddings with the missing layers: provenance, policy, activation path, allowed outputs, and evaluation logic.

[Figure: Embeddings in agentic advertising, drawn as a loop. 01 agent prompt (intent in natural language), 02 embedding query (meaning to vector), 03 signal discovery (nearest audiences and context), 04 governance check (policy, provenance, consent before anything spends), 05 activation (buy, place, serve), 06 measurement (outcomes, attribution), 07 feedback (re-embed, refine).]
Agent prompt → embedding query → signal discovery → governance check → activation → measurement → feedback.
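
The first steps of this flow can be sketched end to end. The code below is an illustration under heavy assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `SignalContainer` is a hypothetical structure (not a published schema), and activation is reduced to returning the approved signal names.

```python
import math
from dataclasses import dataclass

VOCAB = ["sports", "football", "finance", "mortgage", "travel"]  # toy vocabulary

def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]

def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

@dataclass
class SignalContainer:
    name: str
    description: str
    provenance: str       # where the signal came from
    allowed_uses: tuple   # output policy: what the signal may be used for
    fresh: bool           # whether the signal is within its refresh window

def run_brief(brief, catalog, use, min_score=0.5):
    query = embed(brief)                                        # parse intent
    scored = [(cosine(query, embed(s.description)), s) for s in catalog]  # search catalog
    similar = sorted(
        (pair for pair in scored if pair[0] >= min_score),
        key=lambda pair: pair[0],
        reverse=True,
    )                                                           # retrieve similar signals
    # Inspect provenance and policy before anything is activated.
    return [s.name for _, s in similar if s.provenance and s.fresh and use in s.allowed_uses]

catalog = [
    SignalContainer("football_fans", "sports football viewers", "publisher_first_party", ("targeting", "measurement"), True),
    SignalContainer("sports_readers", "sports news readers", "publisher_first_party", ("measurement",), True),
    SignalContainer("mortgage_intent", "finance mortgage shoppers", "retailer_first_party", ("targeting",), True),
    SignalContainer("stale_football", "football sports", "publisher_first_party", ("targeting",), False),
]
for_targeting = run_brief("reach football and sports fans", catalog, use="targeting")
for_measurement = run_brief("reach football and sports fans", catalog, use="measurement")
print(for_targeting, for_measurement)
```

Similarity narrows the catalog, but the policy fields decide what survives: the stale signal and the measurement-only signal are both semantically close to the brief and still excluded from targeting. Human approval, monitoring, and outcome evaluation would follow in a real workflow.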
Standards

Standards and protocols.

Embeddings only become market infrastructure when systems agree on how they are represented, exchanged, governed, and evaluated.

Agentic Audiences / UCP

IAB Tech Lab's Agentic Audiences, formerly the User Context Protocol, defines how agents exchange identity, contextual, and reinforcement signals. It uses embeddings — officially described as dense vectors of 256–1024 dimensions. As of June 2026 it is an initial proposal / draft; validate the current version against official IAB Tech Lab documentation.

AdCP

A separate, non-IAB agentic workflow layer (over MCP): discovery, activation, status, governance, creative, and media buying. Embeddings can support signal discovery and semantic matching inside AdCP-style workflows; AdCP does not mandate embeddings.

AAMP

IAB Tech Lab's broader agentic-advertising management initiative — foundations (ARTF, in public comment), protocols (incl. Agentic Audiences), and trust — built on existing standards (OpenRTB, AdCOM, OpenDirect, Deals API).

Signal containerization

A practical way to package embeddings with semantic meaning, provenance, policy, activation logic, and evaluation.

OpenRTB / bidstream

Embeddings may support bidstream scoring, contextual alignment, or signal enrichment, but real-time use depends on meeting latency budgets and on governance and protocol design.

Clean rooms

A governed collaboration environment. Vector-based analysis and its outputs still sit under clean-room policy.

Layer | Role | Embeddings connection
Agentic Audiences / UCP | signal exchange | embeddings as a compact signal representation
AdCP | workflow / tasks | discovery, activation, status, governance
AAMP | standards umbrella | agentic protocols, trust, runtime
Signal Containerization | product / operating model | packaging vectors with policy and activation
OpenRTB | real-time transaction | potential bidstream scoring / signal extension
Clean rooms | collaboration environment | governed output and vector-safe analysis
Operating model

Embedding infrastructure operating model.

A serious embedding program is not just a model call and a vector database. It needs data rights, model choice, storage, retrieval, governance, evaluation, and business ownership.

  1. Source — content, behavior, CRM, campaign, CTV, app, commerce, creative, product metadata
  2. Embedding — model, dimensionality, normalization, distance metric, version
  3. Storage / retrieval — vector index, metadata, filters, freshness, deletion
  4. Governance — consent, access, retention, sensitive category, output policy, audit
  5. Activation — DSP, SSP, clean room, CDP, BI, agent, recommendation system
  6. Evaluation — precision, recall, lift, relevance, bias, waste reduction, revenue outcome
[Figure: Embedding infrastructure operating model. A six-layer stack closed by evaluation feedback: 01 source, 02 embedding, 03 storage / retrieval, 04 governance (marked as the layer most teams under-build), 05 activation, 06 evaluation, with results fed back to the source.]
Source → embedding → retrieval → governance → activation → evaluation → feedback.
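
The evaluation layer can start with standard retrieval metrics before moving to business outcomes. A minimal sketch follows, assuming a hand-labeled set of relevant signals and using relative lift over a control group as one common definition. The signal names and rates are made up for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Share of the top-k retrieved items that are relevant."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    """Share of all relevant items that appear in the top-k."""
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / len(relevant)

def relative_lift(exposed_rate, control_rate):
    """Relative improvement of the exposed group over the control group."""
    return (exposed_rate - control_rate) / control_rate

# Ranked output of a vector search, and the signals a reviewer marked relevant.
retrieved = ["s1", "s4", "s2", "s7", "s3"]
relevant = {"s1", "s2", "s3"}

p3 = precision_at_k(retrieved, relevant, k=3)
r3 = recall_at_k(retrieved, relevant, k=3)
r5 = recall_at_k(retrieved, relevant, k=5)
lift = relative_lift(exposed_rate=0.03, control_rate=0.02)
print(p3, r3, r5, lift)
```

Precision and recall describe retrieval quality only. Lift, bias, and waste reduction need outcome data, which is why evaluation sits at the end of the stack and feeds back to the source.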
Sources

Sources and validation.

Embeddings, privacy, and agentic standards evolve quickly. Validate official documentation, standards versions, and legal guidance before implementation.

Primary sources checked (18 sources)
  • No Fluff Advisory · Evgeny Popov · checked 2026-06-07 · Primary

    The originating essay — why vector representations matter for advertising and where they create value and risk. Supports: POV, Use cases.

  • No Fluff Advisory · Evgeny Popov · checked 2026-06-07 · Primary

    Why agents need shared representations and protocols to interoperate across the advertising stack. Supports: Agentic framing.

  • No Fluff Advisory · Evgeny Popov · checked 2026-06-07 · Primary

    How AdCP, UCP / Agentic Audiences, and related efforts fit together as layers. Supports: Standards map.

  • No Fluff Advisory · Evgeny Popov · checked 2026-06-07 · Primary

    Packaging signals (including embeddings) with provenance, policy, activation path, and evaluation. Supports: Operating model, Governance.

  • No Fluff Advisory · checked 2026-06-07 · Supporting

    The No Fluff reference page for the AdCP agentic workflow layer. Supports: Standards.

  • No Fluff Advisory · checked 2026-06-07 · Supporting

    The No Fluff reference page for AAMP, ARTF, Agentic Audiences, and the Agent Registry. Supports: Standards.

  • OpenAI · checked 2026-06-07 · Primary

    An embedding is a vector of floating-point numbers; distance measures relatedness; cosine similarity for retrieval; model dimensions (e.g. 1536 / 3072) are vendor-specific and adjustable. Supports: Definition, Cosine similarity, Dimensionality.

  • Google for Developers · checked 2026-06-07 · Primary

    Embedding = vector representation in a lower-dimensional space; distance interpreted as relative similarity; word embeddings often 256 / 512 / 1024 dimensions. Supports: Definition, Embedding space, Dimensionality.

  • Microsoft Learn · checked 2026-06-07 · Primary

    A vector of floating-point numbers whose distance correlates with semantic similarity; cosine similarity often used; powers vector similarity search. Supports: Definition, Cosine similarity, Vector search.

  • Amazon Web Services · checked 2026-06-07 · Primary

    Numerical representations of real-world objects learned via neural networks; text, image, and graph embeddings; cross-modal matching (text ↔ image). Supports: Definition, Modalities.

  • IBM · checked 2026-06-07 · Supporting

    Embeddings learned from data; cosine / Euclidean / dot-product metrics; nearest-neighbor vector search; per-dimension features usually implicit, not human-labeled; text/image/audio + multimodal. Supports: Metrics, Vector search, Interpretability.

  • European Data Protection Board (EDPB) · checked 2026-06-07 · Primary

    Pseudonymised data remains personal data and stays in scope; identifiability assessed on means reasonably likely to be used; singling-out and linkage are re-identification vectors. Supports: Pseudonymisation != anonymisation, Re-identification.

  • AEPD + EDPS · checked 2026-06-07 · Primary

    Anonymisation is not automatic and rarely zero-risk; pseudonymisation is not anonymisation; removing direct identifiers is masking only; inference of sensitive traits is possible. Supports: Anonymity bar, Overclaims to avoid.

  • European Data Protection Supervisor (EDPS) · checked 2026-06-07 · Supporting

    Pseudonymised data qualifies as personal data under the GDPR; pseudonymisation mitigates risk but does not remove obligations. Supports: Pseudonymisation status.

  • Agentic Audiences (formerly UCP) ↗ Official standards page

    IAB Tech Lab · checked 2026-06-07 · Primary

    Formerly UCP; donated by LiveRamp; encodes identity, contextual, and reinforcement signals as dense vectors officially described as 256–1024 dimensions; status is an initial proposal / draft. Supports: Agentic Audiences, Embeddings in standards.

  • agentic-audiences (GitHub) ↗ Official standards page

    IAB Tech Lab · checked 2026-06-07 · Primary

    README: "formerly the User Context Protocol"; "initial proposal"; embeddings encode identity/contextual/reinforcement signals; 256–1024 dims vs thousands of raw features. Supports: Status caution, Signal types.

  • Agentic Advertising and AI / AAMP ↗ Official standards page

    IAB Tech Lab · checked 2026-06-07 · Primary

    AAMP umbrella across foundations (ARTF — public comment), protocols (incl. Agentic Audiences), and trust (Agent Registry), built on OpenRTB / AdCOM / OpenDirect / Deals API + taxonomies. Supports: AAMP framing, Status.

  • Ad Context Protocol (AdCP) ↗ Official standards page

    adcontextprotocol (project) · checked 2026-06-07 · Supporting

    Separate, non-IAB agentic workflow layer over MCP (discovery, media buy, creative, signals activation); can use embeddings for signal discovery but does not mandate them. Supports: AdCP separation.

Platform capabilities and naming change quickly. Last validated: June 7, 2026. Check current documentation before implementation.

Next step

Building semantic infrastructure for advertising?

Embeddings become useful when they are connected to data rights, signal design, activation paths, governance, and outcome measurement. That is where the operating model matters.