43% of major news publishers had their content cited in AI-generated summaries in 2025 without a single referral click, according to Reuters Institute research. The content did the work. The traffic did not follow. The IETF AIPREF working group met in Toronto this month to fix the infrastructure behind that problem, and the April 2026 session produced the clearest signal yet that a binding standard is coming.
Three draft protocols advanced to working group last call - the formal review stage that precedes IETF-wide last call and eventual RFC publication. The drafts cover three separate layers of the web stack: HTTP response headers, the robots.txt specification, and schema.org structured data. Together they give publishers specific, machine-readable ways to say what AI systems may and may not do with their content.
What the IETF AIPREF Working Group Is Building
The Internet Engineering Task Force formed the AIPREF (AI Preferences) working group in late 2024. The problem it addresses is precise: robots.txt was written in 1994 for a world where crawlers indexed pages for retrieval. It was never designed to distinguish between a search engine storing a URL and an AI system absorbing content permanently into model weights. Publishers needed a formal mechanism to express that distinction.
The working group includes representatives from browser vendors, CDN providers, major publishing groups, and AI laboratories. Progress has been deliberate. Achieving working group last call on three separate drafts in a single session is a significant milestone. RFC status - the point at which compliance becomes an industry expectation - is typically 12 to 18 months out from this stage.
The Three AIPREF Drafts That Advanced in Toronto
Each draft targets a different point where a publisher's preferences can be declared and read by a compliant crawler:
- draft-ietf-aipref-hints - Introduces two new HTTP response headers: Ai-Preference, a structured field that declares per-resource permissions for indexing, training, and summarization, and Ai-Preference-Policy, a URI pointing to a full machine-readable policy document hosted at the publisher's domain. These headers travel with every HTTP response, so any compliant crawler reads the publisher's declared preferences on the first request, before any content is processed.
- draft-ietf-aipref-robots-ext - A formal extension to the robots.txt specification. The draft introduces new directives - AI-Training: disallow, AI-Summarization: allow, AI-IndexOnly: allow - plus granular user-agent targeting for named crawlers including GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Path-level granularity is fully supported, meaning publishers can allow AI indexing of their marketing pages while blocking training access to proprietary research or paid content.
- draft-ietf-aipref-schema - A JSON-LD schema extension that adds a contentPreference property to schema.org's CreativeWork type. Publishers embed AI permissions directly in the structured data already present on their pages. Any site running Article, BlogPosting, or NewsArticle schema can add three additional fields and communicate the same permissions as the HTTP header, without server configuration changes.
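Based on the header names described above, a response carrying these preferences might look like the following. Treat the field syntax and permission tokens as illustrative - the draft has not been finalized, and the exact grammar may change before RFC publication:

```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Ai-Preference: index=allow, summarize=allow, train=disallow
Ai-Preference-Policy: https://example.com/.well-known/ai-policy.json
```

A compliant crawler would read these two fields on the first response and either honor the per-resource permissions directly or fetch the linked policy document for the full ruleset.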
Publishers who define their IETF AIPREF preferences in 2026 will have more control over how their content appears in AI-generated answers than those who wait for enforcement to arrive.
llms.txt Already Works - The IETF Work Goes Further
The llms.txt proposal, championed by Jeremy Howard in 2024 and adopted by Cloudflare, several major documentation platforms, and thousands of individual publishers, solves a real and immediate problem. A plain markdown file at the root of a domain tells AI agents what the site is, which pages matter, and what permissions are granted. It works today. OpenAI and Anthropic both reference it in their crawler documentation.
The gap is enforcement. llms.txt is voluntary with no standards-body backing. Any crawler can ignore it with no consequence. The IETF drafts, once ratified, put the same concept inside the formal infrastructure of the web - alongside HTTP headers and robots.txt - where crawler developers face industry pressure to comply. The two approaches are complementary. Implement llms.txt now. Add the IETF-specified headers when the drafts stabilize. The investment at each stage is small; the combined signal is strong.
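There is no rigid schema for llms.txt beyond plain markdown, but a minimal file for a hypothetical publisher might look like this (all names and URLs are placeholders):

```markdown
# Example Publishing
> Independent technology analysis and long-form research reports.

## Key pages
- [Research library](https://example.com/research): flagship reports
- [Blog](https://example.com/blog): weekly analysis posts

## Permissions
- Summarization and citation with attribution: allowed
- Use of /research content for model training: not permitted
```

Keeping the file short and declarative means an AI agent can parse the whole thing in a single pass before deciding how to treat the rest of the site.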
How IETF AIPREF Changes GEO Strategy for Publishers
Generative engine optimization has operated without a formal signaling layer. Publishers optimizing for citation in ChatGPT, Claude, and Gemini have had to infer what works from content structure, schema markup, and observed citation patterns. There was no standardized channel to communicate preferences directly to AI engines. AIPREF creates that channel.
When the drafts land as RFCs, publishers will be able to communicate the following preferences to any compliant AI system:
- Allow summarization but disallow training - protecting content from absorption into model weights while remaining eligible for citation in AI-generated answers
- Require attribution metadata - embedding source URL requirements in the policy header so compliant systems cite rather than paraphrase without reference
- Target specific crawlers by name - allowing Google-Extended, for example, while disallowing other named crawlers entirely; scrapers that ignore all opt-outs remain an enforcement gap the standard alone cannot close
- Set path-level permissions - different rules for a public blog, proprietary research, and gated content, all within a single robots.txt file
- Declare citation format preferences - specifying whether direct quotes, attributed paraphrase, or no citation at all is acceptable
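Several of the preferences above could live together in a single robots.txt file. The directive names below follow draft-ietf-aipref-robots-ext as described earlier; the syntax is illustrative until the RFC is final:

```text
# Illustrative pre-RFC syntax based on draft-ietf-aipref-robots-ext
User-agent: GPTBot
AI-Summarization: allow
AI-Training: disallow

User-agent: Google-Extended
AI-Training: disallow

# Standard path rules still apply alongside the new directives
User-agent: *
AI-IndexOnly: allow
Disallow: /research/
```

The structure deliberately reuses the existing robots.txt grammar, so a crawler that does not understand the AI-* directives still parses the file without error and honors the standard Disallow rules.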
This mirrors the evolution of search robots.txt between 1994 and 1999, when Netscape, Yahoo, AltaVista, and eventually Google converged on a single informal specification. That convergence took five years and happened without any formal standards body driving it. The IETF process is faster and more structured. The AI industry is also under considerably more regulatory scrutiny than early search engines ever were, which accelerates compliance.
The Attribution Debate
The thorniest issue in Toronto was attribution. Several publisher representatives pushed for a directive that would require AI systems to include a source URL whenever they draw on specific content. The concept is clear. The implementation is not.
Current AI responses are generative - the system produces new text rather than quoting directly - which makes a strict referrer model technically ambiguous when the output synthesizes dozens of sources. The working group's current resolution is a "citation-hint" directive that requests attribution in a standardized format without mandating the exact mechanism. AI labs that implement it fully get a clear signal of publisher intent. Those that do not are on record ignoring a standards-track preference, which matters as regulatory frameworks catch up.
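The working group has not settled a wire format for the citation-hint, but as a header-level preference it could sit alongside the other fields. This is a sketch of the idea, not draft text:

```http
Ai-Preference: summarize=allow, train=disallow, citation-hint=source-url
```

The value here is the signal itself: a system that honors it cites the source URL, and a system that strips it has ignored an explicit, machine-readable publisher preference.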
What to Implement Before the RFC Lands
You do not need to wait 18 months to act. Three steps are available right now and map directly onto the standards that are coming:
- Add llms.txt to your root domain. A markdown file at yourdomain.com/llms.txt describing your site's purpose, key pages, and content permissions. Keep it under 1,000 tokens so AI agents parse it on the first load. The format is simple enough to write in an hour and start delivering signal immediately.
- Add AI crawler directives to your robots.txt file. GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and Google-Extended all respect explicit robots.txt directives today. You can disallow training crawlers while allowing search bots with a few lines of standard syntax. This costs nothing and works right now without any server configuration changes.
- Implement complete structured data on every content page. Schema.org's Article, BlogPosting, and NewsArticle types are the existing mechanism AI systems use to understand what a page is and who produced it. A complete schema block with author, datePublished, publisher, and headline fields signals citable, attributable content. When draft-ietf-aipref-schema lands, you will add three fields to an existing block rather than building from nothing.
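A schema block of the kind step 3 describes might look like the following. The top-level fields are standard schema.org; the contentPreference object and its property names are hypothetical, sketched from the draft's description, and the names and values are placeholders:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "What the AIPREF Drafts Mean for Publishers",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "publisher": { "@type": "Organization", "name": "Example Publishing" },
  "datePublished": "2026-04-15",
  "contentPreference": {
    "aiTraining": "disallow",
    "aiSummarization": "allow",
    "attributionRequired": true
  }
}
```

The point of the structure: everything except contentPreference already earns citations from AI systems today, so when the draft stabilizes, the upgrade is three extra fields in a block you have already shipped.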
Our 2026 SEO and GEO guide covers the full structured data stack and how AI systems currently use it to select sources for citation. The technical overlap with AIPREF is direct. If you want an audit of your current AI crawler exposure and a roadmap for the AIPREF transition, our contact page is the fastest route to a conversation with our team.
At SARVAYA, we are already building llms.txt, full schema markup, and explicit AI crawler directives into every site we ship through our client projects. When the AIPREF HTTP headers reach stable draft status, adding them is a one-hour implementation on top of an already solid foundation. The sites that have that foundation in place will move faster and signal more credibly than those starting from scratch.
The AIPREF standard is not theoretical. Three working drafts are moving through the IETF process with active participation from browser vendors, CDN providers, and AI labs. The window to get ahead of it is open now. Once the RFC lands, these preferences become table stakes and the publishers who acted early keep their advantage in AI-generated search results.