Google's Helpful Content guidance now explicitly mentions that structured data "helps Google's AI understand your content." That single sentence changed the calculation on schema markup. It is no longer a rich-result optimization for traditional SERPs. It is a citation-extraction signal for AI Overviews, ChatGPT browsing, and Perplexity.
The catch is that most schema guides on the internet were written before AI search existed. They tell you to add Product schema to blog posts, which has never been correct, or they push BlogPosting markup with a single image and call it done. The 2026 reality is more selective. Six schema types do most of the citation work. The rest are noise.
Article - the foundation every blog post needs
Article schema (or its subtypes BlogPosting and NewsArticle) is the floor. Without it, AI agents must infer the author, publish date, and article body from the DOM, and they get it wrong often enough that they pass over your page for one that is more clearly marked up.
What matters in your Article block: headline, datePublished, dateModified, author with a real name and URL, publisher with logo, image, and articleSection. The most-skipped field is dateModified. AI engines bias toward fresh content, so for many evergreen queries a page modified last month tends to outrank one modified two years ago.
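A minimal sketch of that Article block, written as Python that emits JSON-LD. It assumes the BlogPosting subtype and uses placeholder values (the example.com URLs, names, and dates are hypothetical); the property names are standard schema.org vocabulary.

```python
import json
from datetime import date


def build_blog_posting(
    headline: str,
    published: date,
    modified: date,
    author_name: str,
    author_url: str,
    publisher_name: str,
    publisher_logo: str,
    image: str,
    section: str,
) -> dict:
    """Return a schema.org BlogPosting object covering the fields listed above."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "datePublished": published.isoformat(),
        # The most-skipped field: keep it in sync with real content edits.
        "dateModified": modified.isoformat(),
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "publisher": {
            "@type": "Organization",
            "name": publisher_name,
            "logo": {"@type": "ImageObject", "url": publisher_logo},
        },
        "image": [image],
        "articleSection": section,
    }


def to_script_tag(data: dict) -> str:
    """Serialize JSON-LD into a <script> tag for the page <head>."""
    # Escape "</" so a headline containing "</script>" cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


# Hypothetical example values.
post = build_blog_posting(
    headline="Which schema types matter for AI search",
    published=date(2025, 1, 10),
    modified=date(2025, 6, 2),
    author_name="Jane Doe",
    author_url="https://example.com/authors/jane-doe",
    publisher_name="Example Co",
    publisher_logo="https://example.com/logo.png",
    image="https://example.com/cover.jpg",
    section="SEO",
)
print(to_script_tag(post))
```

Generating the block from the same data that renders the page keeps dateModified honest: it changes when the template's content changes, not by hand.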
HowTo - the highest-value schema for tutorial content
HowTo schema is the type that drives the largest measurable citation gains in our internal tracking. It maps cleanly to the way ChatGPT and Perplexity answer "how do I do X" prompts. When your tutorial is wrapped in HowTo with explicit step, totalTime, and tool entries, the AI extraction becomes far more predictable. Without it, the model picks paragraph fragments more or less at random.
The trap is over-tagging. Marking every numbered list as a HowTo is wrong. Use it only for genuine procedural tutorials that produce a defined outcome. "How to write a cover letter" qualifies. "7 reasons your team should adopt Notion" does not.
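Here is what a HowTo block for the cover-letter example might look like, generated from Python. The steps, tool, and 45-minute totalTime are illustrative placeholders rather than a real tutorial; totalTime is an ISO 8601 duration.

```python
import json


def build_howto(name: str, total_time: str, tools: list[str], steps: list[tuple[str, str]]) -> dict:
    """Return a schema.org HowTo object.

    total_time is an ISO 8601 duration such as "PT45M".
    steps is an ordered list of (short name, instruction text) pairs.
    """
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "totalTime": total_time,
        "tool": [{"@type": "HowToTool", "name": tool} for tool in tools],
        "step": [
            {"@type": "HowToStep", "position": i, "name": step_name, "text": text}
            for i, (step_name, text) in enumerate(steps, start=1)
        ],
    }


# Hypothetical tutorial content.
howto = build_howto(
    name="How to write a cover letter",
    total_time="PT45M",
    tools=["Word processor"],
    steps=[
        ("Research the role", "Read the job posting and note the three skills it stresses most."),
        ("Draft the opening", "Name the role and say in one sentence why you fit it."),
        ("Match your evidence", "Give one concrete example for each skill you noted."),
        ("Close and proofread", "Ask for an interview, then check spelling and names."),
    ],
)
print(json.dumps(howto, indent=2))
```

Each HowToStep carries its own name and text, which is what lets an extractor lift step 3 on its own instead of guessing where it starts in the prose.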
FAQ schema - still useful even after the rich result deprecation
In August 2023 Google restricted the FAQ rich result to well-known, authoritative government and health sites, which for everyone else amounted to a deprecation. A lot of teams stripped FAQ schema from their sites in response. That was a mistake. The rich result is gone for most sites, but the structured data is still parsed by AI Overviews and by every major LLM crawler. A FAQ block at the bottom of a long-form page is one of the cheapest citation wins available.
Keep the questions concrete and the answers tight. AI engines extract the answer text directly. A 50-word answer to "What is llms.txt?" wrapped in Question and Answer blocks ends up as a verbatim citation in Perplexity. A vague three-sentence ramble does not.
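A sketch of a FAQPage builder that also enforces the "answers tight" rule. The 50-word ceiling is an assumption borrowed from the example above, not a documented limit from any AI engine, and the llms.txt answer is sample text.

```python
import json

# Assumed word budget, taken from the "50-word answer" example above.
MAX_ANSWER_WORDS = 50


def build_faq_page(pairs: list[tuple[str, str]]) -> dict:
    """Return a schema.org FAQPage; raise if an answer exceeds the word budget."""
    questions = []
    for question, answer in pairs:
        word_count = len(answer.split())
        if word_count > MAX_ANSWER_WORDS:
            raise ValueError(f"Answer to {question!r} has {word_count} words; tighten it.")
        questions.append(
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
        )
    return {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": questions}


faq = build_faq_page(
    [
        (
            "What is llms.txt?",
            "llms.txt is a proposed convention: a Markdown file served at /llms.txt "
            "that gives large language models a short, curated summary of a site "
            "and links to its most useful pages.",
        ),
    ]
)
print(json.dumps(faq, indent=2))
```

Failing the build on a long answer is a blunt rule, but it pushes writers toward the short, quotable form that gets extracted verbatim.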
Speakable - the voice and audio overview signal
Speakable schema marks specific paragraphs of your page as suitable for text-to-speech extraction. It started as a Google Assistant feature and stayed niche. Then NotebookLM-style audio overviews became a real distribution channel, and Speakable became relevant again. The schema.org spec is straightforward: a CSS selector or an XPath pointing at the paragraphs you would actually want read aloud, usually your TL;DR and the lead paragraph of each H2.
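A minimal Speakable sketch, following the WebPage plus SpeakableSpecification pattern. The .tldr and .section-lead class names are hypothetical; point the selectors at whatever markup wraps your own TL;DR and H2 lead paragraphs.

```python
import json


def build_speakable_page(name: str, url: str, css_selectors: list[str]) -> dict:
    """Return a WebPage whose speakable property points at read-aloud sections."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": name,
        "url": url,
        "speakable": {
            "@type": "SpeakableSpecification",
            # cssSelector and xpath are alternative ways to address content;
            # this sketch uses CSS selectors only.
            "cssSelector": css_selectors,
        },
    }


page = build_speakable_page(
    name="Which schema types matter for AI search",
    url="https://example.com/blog/schema-for-ai-search",
    # Hypothetical class names for the TL;DR and each H2's lead paragraph.
    css_selectors=[".tldr", ".section-lead"],
)
print(json.dumps(page, indent=2))
```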
Most sites do not need Speakable yet. Sites whose audience listens to AI-generated podcasts or uses voice search heavily should ship it. Audio overviews are a small share of total AI traffic but a high-attention share when they happen.
Author and Organization - the E-E-A-T proof layer
Two schema types do the work that E-E-A-T frameworks describe: Person and Organization. Both should be linked to each other by reference, not duplicated inline.
- Person schema on a real author profile page, with jobTitle, sameAs linking to LinkedIn and X, and knowsAbout covering the topics that author writes about. Google explicitly evaluates this when assessing first-hand expertise.
- Organization schema on the homepage and About page, with foundingDate, address, contactPoint, and sameAs. This anchors your brand entity in the knowledge graph.
- Cross-link them. Author pages reference the Organization. Organization references the founders as Person entities. AI knowledge graphs deduplicate brands by these relationships.
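The cross-linking can be sketched with @id references. For brevity both nodes sit in one @graph here; on a real site the Person node would live on the author page and the Organization node on the homepage, each pointing at the other by @id. The domain, names, address, and profile URLs are all placeholders.

```python
import json

SITE = "https://example.com"  # hypothetical domain

organization = {
    "@type": "Organization",
    "@id": f"{SITE}/#organization",
    "name": "Example Co",
    "url": SITE,
    "foundingDate": "2019",
    "address": {"@type": "PostalAddress", "addressLocality": "Springfield", "addressCountry": "US"},
    "contactPoint": {"@type": "ContactPoint", "contactType": "customer support", "email": "hello@example.com"},
    "sameAs": ["https://www.linkedin.com/company/example-co"],
    # Founders are referenced by @id instead of being duplicated inline.
    "founder": [{"@id": f"{SITE}/authors/jane-doe#person"}],
}

person = {
    "@type": "Person",
    "@id": f"{SITE}/authors/jane-doe#person",
    "name": "Jane Doe",
    "url": f"{SITE}/authors/jane-doe",
    "jobTitle": "Head of SEO",
    "sameAs": [
        "https://www.linkedin.com/in/jane-doe-example",
        "https://x.com/janedoe_example",
    ],
    "knowsAbout": ["Structured data", "Technical SEO"],
    # Points back at the Organization node by @id.
    "worksFor": {"@id": f"{SITE}/#organization"},
}

graph = {"@context": "https://schema.org", "@graph": [organization, person]}
print(json.dumps(graph, indent=2))
```

Stable @id values are the design choice that matters: any article can then cite its author with a one-line {"@id": ...} reference rather than repeating the whole Person object.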
Schema is the closest thing modern SEO has to a free win. The work is one-time, the cost is zero, and the AI extraction reliability gain is large enough to measure.
What to skip - and what to validate
Three schema types are commonly added to blog content where they do not belong. Product schema on non-product pages confuses crawlers. Review schema on first-party content (you reviewing your own service) violates Google's review snippet guidelines, which disallow self-serving reviews, and can draw a manual action. SoftwareApplication on a marketing landing page that is not actually a downloadable app is similarly off-topic. Add what fits the page, not what you wish the page were.
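A rough pre-publish lint for these mismatches, assuming each page is tagged with a hypothetical page_kind ("product", "app", or anything else) and a first_party flag. It walks the parsed JSON-LD, collects every @type including nested ones, and flags the three misplacements; it sketches the rule and does not replace Google's own review.

```python
def collect_types(node) -> set[str]:
    """Walk a parsed JSON-LD value and return every @type it declares."""
    found: set[str] = set()
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            found |= collect_types(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_types(item)
    return found


def misplaced_types(page_kind: str, schema_types: set[str], first_party: bool) -> list[str]:
    """Flag the three mismatches described above (page_kind labels are hypothetical)."""
    problems = []
    if "Product" in schema_types and page_kind != "product":
        problems.append("Product schema on a non-product page")
    if "Review" in schema_types and first_party:
        problems.append("Review schema on first-party content")
    if "SoftwareApplication" in schema_types and page_kind != "app":
        problems.append("SoftwareApplication schema on a page that is not an app")
    return problems


# A blog post that drifted into product and self-review markup.
blog_post = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "BlogPosting", "headline": "Our new pricing"},
        {
            "@type": "Product",
            "name": "Example Plan",
            "review": {"@type": "Review", "author": {"@type": "Organization", "name": "Example Co"}},
        },
    ],
}
print(misplaced_types("blog_post", collect_types(blog_post), first_party=True))
```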
Validation matters more than ever because AI engines silently ignore broken schema. Run every page through three checks before shipping:
- Google's Rich Results Test for fast feedback on which schema types Google recognizes.
- Schema.org's validator for stricter type-checking that catches issues Google's tool tolerates.
- A direct fetch with OpenAI's GPTBot user agent to confirm your schema is present in the server-rendered HTML, not injected by client-side JavaScript that many AI crawlers do not execute.
The third check catches more bugs than the first two combined. SPAs and partially hydrated React sites routinely render perfect schema in the browser and serve nothing to crawlers. If your content does not survive a curl request, AI agents will never see it.
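The third check can be scripted with the Python standard library: fetch the raw HTML with a GPTBot user-agent header, then parse out the JSON-LD without running any JavaScript. The user-agent string below is an assumption based on OpenAI's published crawler documentation; confirm the current value before relying on it. Sending the header only imitates the crawler, so CDN or bot-management rules keyed on IP ranges may still treat the real crawler differently.

```python
import json
import urllib.request
from html.parser import HTMLParser

# Assumed GPTBot user-agent string; check OpenAI's crawler docs for the current one.
GPTBOT_UA = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)"


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)  # script bodies may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    """Parse JSON-LD from raw HTML; a broken block raises json.JSONDecodeError."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [json.loads(block) for block in parser.blocks]


def fetch_as_gptbot(url: str) -> str:
    """Fetch server-rendered HTML with the GPTBot user agent; no JavaScript runs."""
    request = urllib.request.Request(url, headers={"User-Agent": GPTBOT_UA})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

If extract_jsonld(fetch_as_gptbot(url)) returns an empty list for a page that shows schema in the browser's dev tools, the markup is being injected client-side, which is exactly the SPA failure described above.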
Ship it once, maintain it forever
Schema is the closest thing modern SEO has to compounding interest. The work is one-time per template. The benefit accrues across every new page that uses the template. Every site we build at SARVAYA includes the six schemas above as part of the base template, validated through CI before deploy. If your existing site has BlogPosting markup and nothing else, adding the other five is the single highest-impact SEO change you can make this quarter. Talk to us if you want a structured-data audit of your current site.