How to write llms.txt for a SaaS site (with examples)
A practical guide to llms.txt and llms-full.txt for B2B SaaS sites. What they are, what AI engines actually do with them, the schema, common mistakes, and a copy-paste starter you can ship today.
— TL;DR
llms.txt is a curated markdown index of your most important pages with one-line summaries. llms-full.txt is the concatenated full content. ChatGPT, Perplexity, and several smaller engines fetch them. Keep llms.txt under 5KB and llms-full.txt under ~120k tokens. Both ship in 30 minutes and cost nothing to maintain.
If you're a B2B SaaS team that wants to be cited by ChatGPT, Claude, or Perplexity in 2026, llms.txt is one of the highest-leverage 30-minute tasks on your AEO checklist. It's also one of the most misunderstood. Most teams either skip it entirely or ship a broken version that does nothing.
This piece is the practical, copy-paste version. By the end you'll know what llms.txt is, what AI engines actually do with it, the format, the common mistakes, and a complete starter file you can adapt to your site.
## What llms.txt actually is
llms.txt is an emerging web standard proposed in late 2024. The pitch: provide a single markdown file at yourdomain.com/llms.txt that gives AI crawlers a curated, well-structured index of your site's most important content.
The structure is informal but consistent across the sites that have adopted it:
- A title line (`# Brand Name`)
- A one-line summary (a blockquote starting with `>`)
- An optional intro paragraph
- One or more sections with `##` headers and bulleted links
Each link follows the format:

`- [Page Title](https://domain.com/path): One-line summary of the page.`
That's it. No JSON, no schema, no special syntax. It's a markdown file because LLMs are trained on markdown and read it natively.
## What AI engines actually do with it
Adoption in early 2026 spans several major engines:
- ChatGPT (via the OAI-SearchBot and ChatGPT-User user agents) explicitly fetches `/llms.txt` when a user asks about your brand or domain
- Perplexity uses `llms.txt` as a hint about which pages on your site are canonical and worth citing (per the PerplexityBot docs)
- Claude (via ClaudeBot) reads it as part of its general-purpose web crawl
- Smaller engines (You.com, Phind, Cohere's Command-A search) increasingly check for it
- Google AI Overviews does not yet fetch it explicitly, but Google's crawlers index it as static content
The practical effect: pages listed in llms.txt with clear one-line summaries get cited more often than equally-good pages that aren't listed. The summary is doing real work. It's the snippet the AI uses to decide whether your page is relevant to the user's question.
## The starter file
Here's a complete llms.txt for a typical B2B SaaS marketing site. Copy it, adapt the content to your site, and ship it as `public/llms.txt` (or generate it dynamically, as we do further down).
```markdown
# Acme. AI workflow automation for B2B teams

> Acme is the workflow automation platform for B2B SaaS operations teams.
> Build, monitor, and scale internal automations without engineering bottlenecks.

A small team building since 2023. Used by 1,200+ B2B SaaS teams worldwide.

## Core pages

- [Home](https://acme.com/): The product overview. Automation, AI agents, dashboards.
- [Pricing](https://acme.com/pricing): Public pricing. Starter $99/mo, Team $399/mo, Enterprise custom.
- [About](https://acme.com/about): The team, the playbook, what we believe.
- [Contact](https://acme.com/contact): Sales calls, support, partnership inquiries.

## Product

- [Workflow builder](https://acme.com/product/workflows): Drag-and-drop automation editor with code escape hatches.
- [AI agents](https://acme.com/product/agents): LLM-powered task agents with built-in tool use.
- [Integrations](https://acme.com/integrations): 350+ first-party integrations including Salesforce, Slack, Notion.

## Resources

- [Documentation](https://docs.acme.com/): Setup, API reference, integration guides.
- [Cost calculator](https://acme.com/calculator): Estimate your monthly cost based on workflow volume.
- [Templates library](https://acme.com/templates): 200+ pre-built workflow templates.

## Journal

- [How to pick the right automation platform in 2026](https://acme.com/blog/picking-automation-platform-2026): Comparison of n8n, Zapier, Make, and Acme for SaaS teams.
- [The hidden cost of Zapier at scale](https://acme.com/blog/zapier-cost-at-scale): Real numbers from teams running 100k+ executions/month.
- [Building reliable AI agents in production](https://acme.com/blog/reliable-ai-agents): Architecture patterns for agent loops that don't fall over.

## Optional

- [llms-full.txt](https://acme.com/llms-full.txt): Full markdown of every canonical page concatenated for deeper context.
- [Sitemap](https://acme.com/sitemap.xml): Machine-readable site index.
```
## What goes in (and what doesn't)
The mistake most teams make is treating llms.txt like a sitemap and listing every URL. Don't do that. AI engines don't need every URL. They need the curated set of pages that represent canonical, authoritative answers to questions about your brand and product.
Include:
- Home, About, Pricing, Contact (always)
- Every flagship product or service page
- Documentation root (link to it; don't list every doc)
- 5–10 highest-value blog posts (your pillar content, not your changelog)
- Any "definitive guide" content that you want cited
Exclude:
- Marketing landing pages targeted at specific campaigns
- Login, signup, settings pages
- Author pages and tag pages
- Most blog posts (only your pillar content goes here)
- Legal pages (they're indexable but not citation-worthy)
- Anything paywalled, gated, or session-dependent
Aim for under 5KB total. An llms.txt over 5KB suggests you're treating it as a sitemap. Curate harder.
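If you generate the file, that 5KB budget is worth enforcing in CI so it never silently creeps back toward sitemap territory. A minimal sketch; the exact threshold (5 × 1024 bytes) and the function name are assumptions, not part of any standard:

```typescript
// CI-style sanity check for the 5KB curation budget discussed above.
const LLMS_TXT_BUDGET_BYTES = 5 * 1024

export function withinLlmsTxtBudget(content: string): boolean {
  // Measure byte length, not character count, so multi-byte UTF-8 counts fully.
  return Buffer.byteLength(content, "utf-8") <= LLMS_TXT_BUDGET_BYTES
}
```

Wire it into your build and fail loudly when the file grows, which is usually the moment someone starts listing every blog post.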
## How to write the one-line summaries
The summary is the load-bearing part of llms.txt. It's what the AI reads to decide whether to cite the page. Three rules:
### 1. Lead with what the page answers, not what it is
Bad: Our pricing page.
Good: Public pricing. Starter $99/mo, Team $399/mo, Enterprise custom.
The bad version says "this page exists." The good version says "if a user asks 'how much does Acme cost', cite this page."
### 2. Include specific facts when relevant
Bad: Comparison of automation platforms.
Good: Comparison of n8n, Zapier, Make, and Acme for SaaS teams.
The good version names the entities being compared. AI engines match on entity overlap. Listing the comparison subjects makes citation matching far more reliable.
### 3. Match the language your ICP uses
Bad: Leveraging AI for revenue operations workflow optimization.
Good: How RevOps teams cut manual work with AI agents.
The bad version is jargon-stuffed. The good version is the language a real RevOps person would type into ChatGPT.
## Generating llms.txt dynamically
For most SaaS sites with a content management system, the right pattern is to generate llms.txt from your content collection at request time. This way it never goes stale.
In a Next.js App Router project, that looks like:
```ts
// src/app/llms.txt/route.ts
import { listBlogPosts, listServices } from "@/lib/content"

// Site-level copy, defined here so the route has no undeclared references.
const tagline = "AI workflow automation for B2B teams"
const description =
  "Acme is the workflow automation platform for B2B SaaS operations teams."

export async function GET() {
  const [services, blog] = await Promise.all([
    listServices(),
    listBlogPosts(),
  ])

  const lines: string[] = [
    `# Acme · ${tagline}`,
    "",
    `> ${description}`,
    "",
    "## Core pages",
    `- [Home](https://acme.com/): ...`,
    `- [Pricing](https://acme.com/pricing): ...`,
    "",
    "## Services",
    ...services.map(
      (s) => `- [${s.title}](https://acme.com${s.href}): ${s.summary}`,
    ),
    "",
    "## Journal",
    ...blog.map(
      (p) => `- [${p.title}](https://acme.com/blog/${p.slug}): ${p.description}`,
    ),
  ]

  return new Response(lines.join("\n") + "\n", {
    headers: {
      "content-type": "text/plain; charset=utf-8",
      "cache-control": "public, max-age=3600, s-maxage=3600",
    },
  })
}
```
In Astro, the equivalent is an endpoint at `src/pages/llms.txt.ts`. In Hugo or Jekyll, it's a static file regenerated on every build.
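For Astro specifically, a minimal sketch of that endpoint, with a hard-coded placeholder body standing in for your generated index (the content here is illustrative, not from a real project):

```typescript
// src/pages/llms.txt.ts: Astro file-based endpoint. Astro routes GET
// requests for /llms.txt to this exported function.
export function GET(): Response {
  const body = [
    "# Acme · AI workflow automation for B2B teams",
    "",
    "> Acme is the workflow automation platform for B2B SaaS operations teams.",
    "",
    "## Core pages",
    "- [Home](https://acme.com/): The product overview.",
  ].join("\n") + "\n"

  // Same constraints as the Next.js version: plain text, UTF-8.
  return new Response(body, {
    headers: { "content-type": "text/plain; charset=utf-8" },
  })
}
```

In a real project you'd replace the hard-coded array with the same content-collection mapping shown in the Next.js route.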
The key constraints regardless of stack:
- Serve it as `text/plain; charset=utf-8` (not `text/markdown`)
- Cache it for an hour or so; AI crawlers don't re-fetch on every query
- Use absolute URLs in links (not relative ones)
- Keep the file readable as a markdown document; AI engines parse markdown semantically
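The absolute-URL constraint is the easiest one to break accidentally when link paths come from a CMS, so it's worth a lint. A sketch under the assumption that every link line follows the `- [Title](url): summary` shape described earlier (the regex and function name are illustrative):

```typescript
// Flag any "- [Title](url): summary" line whose URL is relative.
const LINK_LINE = /^- \[[^\]]+\]\(([^)]+)\):/

export function findRelativeLinks(llmsTxt: string): string[] {
  const offenders: string[] = []
  for (const line of llmsTxt.split("\n")) {
    const match = line.match(LINK_LINE)
    if (match && !/^https?:\/\//.test(match[1])) {
      offenders.push(match[1]) // collect the relative URL for reporting
    }
  }
  return offenders
}
```

Run it over the generated file in CI and fail the build if it returns anything.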
## What about llms-full.txt?
llms-full.txt is the long-form companion. It contains the full markdown of every canonical page concatenated in a stable order, so an AI engine can ingest the whole site without crawling page by page.
A typical llms-full.txt is 40–200KB and contains, in order:
- Home page content (hero, proof, services overview)
- About page (company narrative, team, principles)
- Process / how-it-works content
- Pricing (every tier, every offer)
- Each service page in detail (scope, FAQ, stack, pricing)
- Pillar blog posts (the 5–10 you also linked from `llms.txt`)
- Site-level FAQ
- Contact information
Cap it at roughly 120k tokens (~480KB) so it fits inside the context window of every major LLM. If you're over the cap, the right move is to drop blog posts first, then case studies, then service detail. Never drop home/about/pricing. Those are load-bearing.
```ts
// src/app/llms-full.txt/route.ts (sketch: each *Section() helper returns
// one page's content as a markdown string)
import { loadAllContent } from "@/lib/content"

const MAX_CHARS = 480_000 // roughly 120k tokens

export async function GET() {
  const { services, blog } = await loadAllContent()

  const sections: string[] = [
    homeSection(),
    aboutSection(),
    processSection(),
    pricingSection(),
    ...services.map(serviceSection),
    ...blog.map(blogSection),
    faqSection(),
    contactSection(),
  ]

  let body = sections.join("\n\n---\n\n") + "\n"
  if (body.length > MAX_CHARS) {
    body = body.slice(0, MAX_CHARS) + "\n\n[truncated]\n"
  }

  return new Response(body, {
    headers: { "content-type": "text/plain; charset=utf-8" },
  })
}
```
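Slicing at a character count is blunt: it can cut a section mid-sentence and ignores the drop priority described above. A sketch of priority-based trimming instead; the `Section` shape and the `droppable` ranking are assumptions for illustration:

```typescript
// Drop whole sections in priority order until the body fits the budget.
// Higher droppable rank = cut first (blog 3, case studies 2, service detail 1);
// rank 0 marks load-bearing sections (home/about/pricing) that are never cut.
type Section = { name: string; body: string; droppable: number }

export function trimToBudget(sections: Section[], maxChars: number): Section[] {
  const kept = [...sections]
  const totalSize = () => kept.reduce((sum, s) => sum + s.body.length, 0)

  while (totalSize() > maxChars) {
    // Find the most droppable remaining section.
    let cut = -1
    for (let i = 0; i < kept.length; i++) {
      if (
        kept[i].droppable > 0 &&
        (cut === -1 || kept[i].droppable > kept[cut].droppable)
      ) {
        cut = i
      }
    }
    if (cut === -1) break // only load-bearing sections left; stop trimming
    kept.splice(cut, 1)
  }
  return kept
}
```

Join the surviving sections exactly as in the route above; the result stays under budget without ever sacrificing home, about, or pricing.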
## Common mistakes
A few patterns we've seen wreck llms.txt adoption:
### Listing every blog post
Some teams treat llms.txt as a content marketing index and list 50+ blog posts. This dilutes the signal. AI engines have to parse a lot of mediocre content to find the canonical pages. Curate to your top 5–10 pillar posts.
### Marketing-speak summaries
Summaries written by marketers tend to be aspirational ("revolutionizing how teams collaborate") instead of descriptive ("Slack alternative for async-first teams"). AI engines match on entity and capability overlap, not vibes. Write summaries the way a developer would: what does it do, and who is it for?
### Forgetting llms-full.txt
llms.txt alone is useful but limited. It's just an index. The pages it links to still need to be crawled. llms-full.txt lets the AI ingest your site in one request, which dramatically improves citation reliability for content-heavy sites.
### Not linking from each other
llms.txt should explicitly link to llms-full.txt near the bottom (under an "Optional" or "More" section). Otherwise crawlers may not discover it.
### Hosting at the wrong path
It must live at `/llms.txt` at the root of the domain. Not `/docs/llms.txt`, not `/.well-known/llms.txt`. Crawlers don't search.
### Wrong content type
Serve it as `text/plain; charset=utf-8`. Some servers default to `text/html` for files without a known extension, and that breaks crawlers that expect plain markdown.
### Forgetting subdomain coverage
If you have app.yourdomain.com and docs.yourdomain.com, each subdomain needs its own llms.txt if you want it indexed. Most teams should focus all AEO investment on the marketing root domain and leave subdomains alone.
## Validating your llms.txt
After shipping, validate it:
- Curl as ChatGPT: `curl -A "GPTBot" https://yourdomain.com/llms.txt`. It should return your content with a 200 status.
- Same for Claude: `curl -A "ClaudeBot" https://yourdomain.com/llms.txt`
- Same for Perplexity: `curl -A "PerplexityBot" https://yourdomain.com/llms.txt`
- Check that it's served as plain text: the `Content-Type` header should be `text/plain; charset=utf-8`.
- Validate the markdown: open it in a markdown previewer (VS Code's preview is fine). Every link should resolve.
If any of those fail, fix before moving on.
## How long until you see results?
AI engines re-index more aggressively than Google's classic crawler. Expect citation behavior to change within 2–6 weeks of shipping a quality llms.txt. The change won't be dramatic (it's a gradual lift in citation share for the listed pages), but it compounds.
The biggest gains come 3–6 months in, after AI engines have ingested your llms-full.txt content several times and the embeddings of your canonical pages have stabilized.
## TL;DR
- Ship `llms.txt` at `yourdomain.com/llms.txt`. Use the starter above. Curate to 10–20 canonical pages with descriptive summaries.
- Ship `llms-full.txt` at `yourdomain.com/llms-full.txt`. Generate it from your content collection at request time. Cap it at ~120k tokens.
- Link them from your sitemap and from `robots.txt` (the `Sitemap:` directive supports any URL, but adding a comment with the `llms.txt` paths helps crawlers).
- Validate with curl and the major AI bot user agents. Verify the content type and accessibility.
- Don't over-engineer. A clean, curated `llms.txt` ships in 30 minutes. The marginal value of perfecting it is small compared to the cost of not shipping at all.
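The robots.txt hint mentioned above can be as simple as a comment block; robots.txt parsers ignore comments, so this is a discoverability convention rather than a directive (the URLs are placeholders):

```
# AI index files (convention, not a directive)
# https://acme.com/llms.txt
# https://acme.com/llms-full.txt

Sitemap: https://acme.com/sitemap.xml
```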
If you want this done as part of a broader AEO baseline (schema, robots, llms.txt, FAQ blocks, the technical baseline) that's exactly what our AEO Audit is for. Or, since this is a 30-minute job, just ship it yourself this week.
— Want this for your SaaS?
AEO and SEO for SaaS, done properly ↗
The schema, llms.txt, pillar content, and technical AEO infrastructure that gets your SaaS cited in ChatGPT, Perplexity, and Google AI Overviews. Not just ranked in classic search.