AEO / SEO

How to write llms.txt for a SaaS site (with examples)

A practical guide to llms.txt and llms-full.txt for B2B SaaS sites. What they are, what AI engines actually do with them, the schema, common mistakes, and a copy-paste starter you can ship today.

— TL;DR

llms.txt is a curated markdown index of your most important pages with one-line summaries. llms-full.txt is the concatenated full content. ChatGPT, Perplexity, and several smaller engines fetch them. Keep llms.txt under 5KB and llms-full.txt under ~120k tokens. Both ship in 30 minutes and cost nothing to maintain.

If you're a B2B SaaS team that wants to be cited by ChatGPT, Claude, or Perplexity in 2026, llms.txt is one of the highest-leverage 30-minute tasks on your AEO checklist. It's also one of the most misunderstood. Most teams either skip it entirely or ship a broken version that does nothing.

This piece is the practical, copy-paste version. By the end you'll know what llms.txt is, what AI engines actually do with it, the format, the common mistakes, and a complete starter file you can adapt to your site.

#What llms.txt actually is

llms.txt is an emerging web standard proposed in late 2024. The pitch: provide a single markdown file at yourdomain.com/llms.txt that gives AI crawlers a curated, well-structured index of your site's most important content.

The structure is informal but consistent across the sites that have adopted it:

  • A title line (# Brand Name)
  • A one-line summary (blockquote with > )
  • An optional intro paragraph
  • One or more sections with ## headers and bulleted links

Each link follows the format:

- [Page Title](https://domain.com/path): One-line summary of the page.

That's it. No JSON, no schema, no special syntax. It's a markdown file because LLMs are trained on markdown and read it natively.

#What AI engines actually do with it

Adoption in early 2026 spans several major engines:

  • ChatGPT (via OAI-SearchBot and ChatGPT-User user agents) explicitly fetches /llms.txt when a user asks about your brand or domain
  • Perplexity uses llms.txt as a hint about which pages on your site are canonical and worth citing (PerplexityBot docs)
  • Claude (via ClaudeBot) reads it as part of its general-purpose web crawl
  • Smaller engines (You.com, Phind, Cohere's Command-A search) increasingly check for it
  • Google AI Overviews does not yet fetch it explicitly, but Google's crawlers index it as static content

The practical effect: pages listed in llms.txt with clear one-line summaries get cited more often than equally-good pages that aren't listed. The summary is doing real work. It's the snippet the AI uses to decide whether your page is relevant to the user's question.

#The starter file

Here's a complete llms.txt for a typical B2B SaaS marketing site. Copy this, adapt the content to your site, and ship it as public/llms.txt (or generate it dynamically as we do further down).

# Acme · AI workflow automation for B2B teams

> Acme is the workflow automation platform for B2B SaaS operations teams.
> Build, monitor, and scale internal automations without engineering bottlenecks.

A small team building since 2023. Used by 1,200+ B2B SaaS teams worldwide.

## Core pages

- [Home](https://acme.com/): The product overview. Automation, AI agents, dashboards.
- [Pricing](https://acme.com/pricing): Public pricing. Starter $99/mo, Team $399/mo, Enterprise custom.
- [About](https://acme.com/about): The team, the playbook, what we believe.
- [Contact](https://acme.com/contact): Sales calls, support, partnership inquiries.

## Product

- [Workflow builder](https://acme.com/product/workflows): Drag-and-drop automation editor with code escape hatches.
- [AI agents](https://acme.com/product/agents): LLM-powered task agents with built-in tool use.
- [Integrations](https://acme.com/integrations): 350+ first-party integrations including Salesforce, Slack, Notion.

## Resources

- [Documentation](https://docs.acme.com/): Setup, API reference, integration guides.
- [Cost calculator](https://acme.com/calculator): Estimate your monthly cost based on workflow volume.
- [Templates library](https://acme.com/templates): 200+ pre-built workflow templates.

## Journal

- [How to pick the right automation platform in 2026](https://acme.com/blog/picking-automation-platform-2026): Comparison of n8n, Zapier, Make, and Acme for SaaS teams.
- [The hidden cost of Zapier at scale](https://acme.com/blog/zapier-cost-at-scale): Real numbers from teams running 100k+ executions/month.
- [Building reliable AI agents in production](https://acme.com/blog/reliable-ai-agents): Architecture patterns for agent loops that don't fall over.

## Optional

- [llms-full.txt](https://acme.com/llms-full.txt): Full markdown of every canonical page concatenated for deeper context.
- [Sitemap](https://acme.com/sitemap.xml): Machine-readable site index.

#What goes in (and what doesn't)

The mistake most teams make is treating llms.txt like a sitemap and listing every URL. Don't do that. AI engines don't need every URL. They need the curated set of pages that represent canonical, authoritative answers to questions about your brand and product.

Include:

  • Home, About, Pricing, Contact (always)
  • Every flagship product or service page
  • Documentation root (link to it; don't list every doc)
  • 5–10 highest-value blog posts (your pillar content, not your changelog)
  • Any "definitive guide" content that you want cited

Exclude:

  • Marketing landing pages targeted at specific campaigns
  • Login, signup, settings pages
  • Author pages and tag pages
  • Most blog posts (only your pillar content goes here)
  • Legal pages (they're indexable but not citation-worthy)
  • Anything paywalled, gated, or session-dependent

Aim for under 5KB total. An llms.txt over 5KB suggests you're treating it as a sitemap. Curate harder.
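That budget is easy to enforce mechanically. A minimal sketch of a build-time guard; the `MAX_BYTES` constant and `checkSize` helper are illustrative, not part of any standard:

```typescript
// Illustrative build-time guard: fail the build if the generated llms.txt
// drifts past the 5KB curation budget. Measure bytes, not characters,
// since summaries may contain multi-byte UTF-8.
const MAX_BYTES = 5 * 1024

function checkSize(body: string): { bytes: number; ok: boolean } {
  const bytes = new TextEncoder().encode(body).length
  return { bytes, ok: bytes <= MAX_BYTES }
}

// e.g. after generating the file:
// const { bytes, ok } = checkSize(llmsTxt)
// if (!ok) throw new Error(`llms.txt is ${bytes} bytes; curate harder`)
```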

#How to write the one-line summaries

The summary is the load-bearing part of llms.txt. It's what the AI reads to decide whether to cite the page. Three rules:

#1. Lead with what the page answers, not what it is

Bad: Our pricing page. Good: Public pricing. Starter $99/mo, Team $399/mo, Enterprise custom.

The bad version says "this page exists." The good version says "if a user asks 'how much does Acme cost', cite this page."

#2. Include specific facts when relevant

Bad: Comparison of automation platforms. Good: Comparison of n8n, Zapier, Make, and Acme for SaaS teams.

The good version names the entities being compared. AI engines match on entity overlap. Listing the comparison subjects makes citation matching far more reliable.

#3. Match the language your ICP uses

Bad: Leveraging AI for revenue operations workflow optimization. Good: How RevOps teams cut manual work with AI agents.

The bad version is jargon-stuffed. The good version is the language a real RevOps person would type into ChatGPT.
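All three rules are checkable in a lint pass. A hypothetical sketch; the buzzword list and the 140-character threshold are made-up illustrations, not a standard:

```typescript
// Hypothetical lint for llms.txt summary lines. Flags marketing buzzwords
// and over-long summaries; extend the list to taste.
const BUZZWORDS = ["revolutionizing", "leveraging", "synergy", "cutting-edge"]

function lintSummary(summary: string): string[] {
  const issues: string[] = []
  if (summary.length > 140) issues.push("too long: keep it to one line")
  for (const word of BUZZWORDS) {
    if (summary.toLowerCase().includes(word)) issues.push(`buzzword: ${word}`)
  }
  return issues
}
```

Running it on the bad example above flags "leveraging"; the good version passes clean.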

#Generating llms.txt dynamically

For most SaaS sites with a content management system, the right pattern is to generate llms.txt from your content collection at request time. This way it never goes stale.

In a Next.js App Router project, that looks like:

// src/app/llms.txt/route.ts
import { listBlogPosts, listServices } from "@/lib/content"

const tagline = "AI workflow automation for B2B teams"
const description =
  "Acme is the workflow automation platform for B2B SaaS operations teams."

export async function GET() {
  const [services, blog] = await Promise.all([
    listServices(),
    listBlogPosts(),
  ])

  const lines: string[] = [
    `# Acme · ${tagline}`,
    "",
    `> ${description}`,
    "",
    "## Core pages",
    `- [Home](https://acme.com/): ...`,
    `- [Pricing](https://acme.com/pricing): ...`,
    "",
    "## Services",
    ...services.map(
      (s) => `- [${s.title}](https://acme.com${s.href}): ${s.summary}`,
    ),
    "",
    "## Journal",
    ...blog.map(
      (p) => `- [${p.title}](https://acme.com/blog/${p.slug}): ${p.description}`,
    ),
  ]

  return new Response(lines.join("\n") + "\n", {
    headers: {
      "content-type": "text/plain; charset=utf-8",
      "cache-control": "public, max-age=3600, s-maxage=3600",
    },
  })
}

In Astro, the equivalent is an endpoint at src/pages/llms.txt.ts. In Hugo or Jekyll, a static file regenerated on every build.
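Whichever stack you use, the core is the same pure string-building step, which you can unit test independently of the framework wrapper. A sketch; the `Post` shape and the hard-coded header lines are assumptions, wire them to your own content source:

```typescript
// Framework-agnostic core: build the llms.txt body from your content.
// Call this from a Next.js route handler, an Astro endpoint, or a build
// script; only the wrapper changes. Field names here are assumptions.
type Post = { title: string; slug: string; description: string }

function renderLlmsTxt(posts: Post[]): string {
  const lines = [
    "# Acme · AI workflow automation for B2B teams",
    "",
    "> Acme is the workflow automation platform for B2B SaaS operations teams.",
    "",
    "## Journal",
    ...posts.map(
      (p) => `- [${p.title}](https://acme.com/blog/${p.slug}): ${p.description}`,
    ),
  ]
  return lines.join("\n") + "\n"
}
```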

The key constraints regardless of stack:

  • Serve as text/plain; charset=utf-8 (not text/markdown)
  • Cache for an hour or so. AI crawlers don't re-fetch on every query
  • Use absolute URLs in links (not relative)
  • Keep the file readable as a markdown document; AI engines parse markdown semantically

#What about llms-full.txt?

llms-full.txt is the long-form companion. It contains the full markdown of every canonical page concatenated in a stable order, so an AI engine can ingest the whole site without crawling page by page.

A typical llms-full.txt is 40–200KB and contains, in order:

  1. Home page content (hero, proof, services overview)
  2. About page (company narrative, team, principles)
  3. Process / how-it-works content
  4. Pricing (every tier, every offer)
  5. Each service page in detail (scope, FAQ, stack, pricing)
  6. Pillar blog posts (the 5–10 you also linked from llms.txt)
  7. Site-level FAQ
  8. Contact information

Cap it at roughly 120k tokens (~480KB) so it fits inside the context window of every major LLM. If you're over the cap, the right move is to drop blog posts first, then case studies, then service detail. Never drop home/about/pricing. Those are load-bearing.

// src/app/llms-full.txt/route.ts (sketch)
// homeSection(), aboutSection(), etc. stand in for functions that render
// each page's content as markdown; wire them to your own content source.
import { loadAllContent } from "@/lib/content"

const MAX_CHARS = 480_000

export async function GET() {
  const { services, blog } = await loadAllContent()

  const sections: string[] = [
    homeSection(),
    aboutSection(),
    processSection(),
    pricingSection(),
    ...services.map(serviceSection),
    ...blog.map(blogSection),
    faqSection(),
    contactSection(),
  ]

  let body = sections.join("\n\n---\n\n") + "\n"
  if (body.length > MAX_CHARS) {
    body = body.slice(0, MAX_CHARS) + "\n\n[truncated]\n"
  }

  return new Response(body, {
    headers: { "content-type": "text/plain; charset=utf-8" },
  })
}
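The byte-slice fallback above is blunt: it can cut mid-sentence. The drop-priority rule from earlier (blog first, then case studies, then service detail; never home/about/pricing) can be sketched at the section level instead. The `priority` scheme here is an illustrative convention:

```typescript
// Sketch of section-level truncation: drop whole expendable sections
// (highest priority number first) until the document fits, instead of
// slicing mid-sentence. priority 0 marks load-bearing sections that are
// never dropped (home, about, pricing).
type Section = { priority: number; body: string }

const MAX_CHARS = 480_000
const SEP = "\n\n---\n\n"

function assemble(sections: Section[], maxChars = MAX_CHARS): string {
  const kept = [...sections]
  const size = () => kept.map((s) => s.body).join(SEP).length
  while (kept.length > 1 && size() > maxChars) {
    // Find the most expendable remaining section.
    let idx = 0
    for (let i = 1; i < kept.length; i++) {
      if (kept[i].priority > kept[idx].priority) idx = i
    }
    if (kept[idx].priority === 0) break // only load-bearing sections remain
    kept.splice(idx, 1)
  }
  return kept.map((s) => s.body).join(SEP) + "\n"
}
```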

#Common mistakes

A few patterns we've seen wreck llms.txt adoption:

#Listing every blog post

Some teams treat llms.txt as a content marketing index and list 50+ blog posts. This dilutes the signal. AI engines have to parse a lot of mediocre content to find the canonical pages. Curate to your top 5–10 pillar posts.

#Marketing-speak summaries

Summaries written by marketers tend to be aspirational ("revolutionizing how teams collaborate") instead of descriptive ("Slack alternative for async-first teams"). AI engines match on entity and capability overlap, not vibes. Write summaries the way a developer would. What does it do, who is it for.

#Forgetting llms-full.txt

llms.txt alone is useful but limited. It's just an index. The pages it links to still need to be crawled. llms-full.txt lets the AI ingest your site in one request, which dramatically improves citation reliability for content-heavy sites.

#Not linking from each other

llms.txt should explicitly link to llms-full.txt near the bottom (under an "Optional" or "More" section). Otherwise crawlers may not discover it.

#Hosting at the wrong path

It must be at /llms.txt at the root of the domain. Not /docs/llms.txt, not /.well-known/llms.txt. Crawlers don't search.

#Wrong content type

Serve as text/plain; charset=utf-8. Some servers default to text/html for files without a known extension and that breaks crawlers that expect plain markdown.
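If you ship llms.txt as a static file and your host picks the wrong type, most platforms let you override headers per path. A hedged Next.js sketch (next.config.ts, supported in recent Next.js versions); on other hosts the equivalent knob lives in vercel.json, netlify.toml, or your CDN rules:

```typescript
// next.config.ts: force the correct content-type for a statically
// served llms.txt. Adapt the path if you also ship llms-full.txt.
import type { NextConfig } from "next"

const config: NextConfig = {
  async headers() {
    return [
      {
        source: "/llms.txt",
        headers: [
          { key: "Content-Type", value: "text/plain; charset=utf-8" },
        ],
      },
    ]
  },
}

export default config
```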

#Forgetting subdomain coverage

If you have app.yourdomain.com and docs.yourdomain.com, each subdomain needs its own llms.txt if you want it indexed. Most teams should focus all AEO investment on the marketing root domain and leave subdomains alone.

#Validating your llms.txt

After shipping, validate it:

  1. Curl as ChatGPT: curl -A "GPTBot" https://yourdomain.com/llms.txt. Should return your content with a 200 status.
  2. Same for Claude: curl -A "ClaudeBot" https://yourdomain.com/llms.txt
  3. Same for Perplexity: curl -A "PerplexityBot" https://yourdomain.com/llms.txt
  4. Check it's served as plain text: the Content-Type header should be text/plain; charset=utf-8.
  5. Validate the markdown: open it in a markdown previewer (VS Code's preview is fine). Every link should resolve.

If any of those fail, fix before moving on.
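If you'd rather script this than run curl by hand, here is a sketch of the same checks (Node 18+ for global fetch; the pass/fail rule mirrors steps 1–4 above):

```typescript
// Sketch of the validation checks as a script. Node 18+ (global fetch).
const AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

// The pass/fail rule: 200 status and a text/plain content-type.
function checkResponse(status: number, contentType: string | null): boolean {
  return status === 200 && (contentType ?? "").startsWith("text/plain")
}

async function validate(url: string): Promise<boolean> {
  for (const ua of AGENTS) {
    const res = await fetch(url, { headers: { "user-agent": ua } })
    const type = res.headers.get("content-type")
    if (!checkResponse(res.status, type)) {
      console.error(`${ua}: status=${res.status}, content-type=${type}`)
      return false
    }
  }
  return true
}

// usage: validate("https://yourdomain.com/llms.txt").then(console.log)
```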

#How long until you see results?

AI engines re-index more aggressively than Google's classic crawler. Expect citation behavior to change within 2–6 weeks of shipping a quality llms.txt. The change won't be dramatic (expect a gradual lift in citation share for the listed pages) but it compounds.

The biggest gains come 3–6 months in, after AI engines have ingested your llms-full.txt content several times and the embeddings of your canonical pages have stabilized.

#TL;DR

  1. Ship llms.txt at yourdomain.com/llms.txt. Use the starter above. Curate to 10–20 canonical pages with descriptive summaries.
  2. Ship llms-full.txt at yourdomain.com/llms-full.txt. Generate from your content collection at request time. Cap at ~120k tokens.
  3. Link them from your sitemap and from robots.txt (the Sitemap: directive is meant for sitemap URLs, but a comment listing your llms.txt paths gives crawlers a discovery hint).
  4. Validate with curl + the major AI bot UAs. Verify content-type and accessibility.
  5. Don't over-engineer. A clean, curated llms.txt ships in 30 minutes. The marginal value of perfecting it is small compared to the cost of not shipping at all.

If you want this done as part of a broader AEO baseline (schema, robots, llms.txt, FAQ blocks, the technical baseline) that's exactly what our AEO Audit is for. Or, since this is a 30-minute job, just ship it yourself this week.

— Want this for your SaaS?

AEO and SEO for SaaS, done properly

The schema, llms.txt, pillar content, and technical AEO infrastructure that gets your SaaS cited in ChatGPT, Perplexity, and Google AI Overviews. Not just ranked in classic search.
