How to Create an llms.txt File for AI Discovery


To create an llms.txt file, choose the few pages that best explain your site, describe them in plain Markdown, publish the file at /llms.txt, test that it is publicly readable, and keep it accurate over time. Treat it as a curated guide for LLMs and AI agents, not as a guaranteed way to earn citations, recommendations or visibility in ChatGPT, Google AI Overviews, Perplexity or any other AI search surface. The useful version is small, factual and maintained; the risky version is a keyword-stuffed sitemap dump pretending to be an AI visibility lever.

The Short Answer

A basic llms.txt file is a Markdown index for AI discovery workflows. It usually sits at the root of the site, such as /llms.txt, and lists the most important pages with short descriptions that explain what each page contains and when it is useful.
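At its smallest, the file can look like this minimal sketch (the name and paths are placeholders; a fuller pattern appears later in this guide):

```markdown
# Example Co

> Example Co is a placeholder product. These links cover the product, documentation and policies.

## Core Pages

- [Documentation](/docs/): Setup, configuration and troubleshooting guidance.
- [Pricing](/pricing/): Current plans and billing terms.
```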

Use this workflow:

  1. Select stable, canonical pages that help someone understand the site, product, documentation, policies or evergreen expertise.
  2. Group those pages into clear sections, such as Product, Documentation, Support, Policies or Guides.
  3. Write factual descriptions for each link in Markdown.
  4. Put lower-priority resources in an Optional section when they can be skipped.
  5. Publish the file at /llms.txt for the whole site, or inside a subdirectory when it only covers a specific documentation area.
  6. Test the URL, HTTP status, content readability, linked pages, canonical paths and private URL exposure.
  7. Monitor logs, crawler behavior, AI answer citations, brand mentions and source gaps before claiming impact.

The file is low-cost to create when your site already has useful pages. It is not a substitute for crawlability, indexing, internal links, structured data, good documentation, clear product pages or recurring AI visibility measurement.

Decision rule: create llms.txt when it can become a clean source map for important content. Skip or deprioritize it when the site lacks useful pages, cannot maintain the file, or expects a guaranteed AI citation boost from one text file.

What llms.txt Does And Does Not Do

The llms.txt proposal is commonly associated with Jeremy Howard and the llmstxt.org specification. Its core idea is practical: many LLMs and agents can read Markdown more easily than complex navigation, JavaScript-heavy pages or huge sitemaps. A short Markdown file can point them toward the pages that matter most.

That does not make llms.txt a crawl-control file. It does not allow or block crawlers. It does not replace robots.txt. It does not replace sitemap.xml. It is also not an AI training opt-out, a legal consent layer or a private instruction file. It does not give Google, OpenAI, Perplexity or any other platform a binding instruction to cite your site.

| File | Main job | What it should decide | What it does not do |
| --- | --- | --- | --- |
| llms.txt | Gives LLMs and agents a curated Markdown guide to important resources | Which pages deserve concise context and descriptions | It does not control crawling, indexing, ranking or citation eligibility |
| robots.txt | Tells compliant crawlers which URL paths they may access | Which bots can crawl which areas of the site | It does not describe page meaning and should not be used as a private content vault |
| sitemap.xml | Lists URLs for search discovery and indexing workflows | Which URLs should be discoverable by search systems | It does not prioritize pages by usefulness for an LLM context window |
| Structured data | Clarifies entities and page facts that are visible on the page | Which facts need machine-readable support | It does not replace visible content or guarantee AI citations |

The common phrase "robots.txt for AI" is misleading unless it is heavily qualified. robots.txt is about access rules for crawlers. llms.txt is about context guidance. If a vendor says llms.txt blocks AI crawlers, allows AI crawlers, opts you into AI answers or works as a ranking signal, ask for platform-specific documentation before acting on that claim.

As of May 6, 2026, Google Search guidance for AI features emphasizes normal Search eligibility and says there are no new AI text files or special markup required for AI Overviews or AI Mode. OpenAI crawler documentation describes robots.txt controls for OAI-SearchBot and GPTBot. Perplexity crawler documentation describes PerplexityBot for search results and Perplexity-User for user-initiated fetches, including access and WAF considerations. Those documents are useful for crawler access decisions, but they are not proof that any major AI search platform treats llms.txt as a ranking or citation file.
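If the goal is access control rather than context guidance, robots.txt is the right file. A minimal sketch using the documented OpenAI and Perplexity user agents (the /private/ path is a placeholder, not a recommendation):

```
# Access rules for compliant crawlers (a separate job from llms.txt)
User-agent: GPTBot
Disallow: /private/

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```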

Red flag: do not call llms.txt "robots.txt for AI" in a technical brief unless the next sentence explains that it does not block or allow crawlers and does not affect indexing or ranking. The distinction prevents bad implementation decisions.

Choose The Pages First

The quality of an llms.txt file is mostly decided before you write any Markdown. Page selection matters more than the exact wording of the file. A useful file is a curated map. A weak file is a copy of the sitemap with marketing labels.

Start with pages that are stable, canonical, publicly accessible and useful to someone trying to understand the site. For a software, service, ecommerce, documentation or content site, that often means:

- Core product or service pages that explain what you offer and for whom
- Documentation hubs, setup guides or API references
- Current pricing, policy and support pages
- A small set of evergreen guides that answer recurring questions

Do not include pages only because they target keywords. Include them because they help interpret the site. A page that is thin, outdated, duplicated, blocked, non-canonical or mainly promotional will not become more useful because it appears in llms.txt.

| Page situation | Include it? | Why |
| --- | --- | --- |
| Canonical product page with clear positioning and current facts | Yes | It helps an AI agent understand the core entity and offer |
| Stable documentation hub or API reference | Yes | It is a strong candidate for machine-readable discovery |
| Pricing or policy page that is public and current | Usually | It answers factual questions and reduces ambiguity |
| Blog post with a durable answer and clear evidence | Sometimes | Include only if it helps a real decision or recurring question |
| Tag archives, pagination, search pages or thin category pages | Usually no | They add noise without useful context |
| Staging, admin, private, checkout or account URLs | No | They create security and trust risk |
| Every URL from sitemap.xml | No | That turns the file into a bloated duplicate of a different tool |

Generators can be useful for the first draft, especially on documentation sites. They should not make the final decision. Manually verify every URL, title and description before publishing. If the generated file includes old URLs, canonical conflicts, faceted pages, private paths or keyword-stuffed descriptions, fix the source list instead of shipping it quickly.
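Part of that verification can be scripted. The sketch below, assuming Python with the `requests` library and placeholder domain and paths, flags listed pages whose declared canonical URL differs from the URL in the file:

```python
"""Sketch: flag llms.txt links whose page declares a different canonical URL.
Assumes Python with `requests`; the domain and paths are placeholders, and a
real check should use an HTML parser rather than this simple regex."""
import re
import requests

BASE = "https://example.com"
LISTED = ["/pricing/", "/docs/", "/privacy/"]  # illustrative paths from llms.txt

for path in LISTED:
    url = BASE + path
    html = requests.get(url, timeout=10).text
    # Naive match; assumes rel="canonical" appears before href in the tag.
    match = re.search(r'<link[^>]*rel="canonical"[^>]*href="([^"]+)"', html)
    canonical = match.group(1) if match else "(none declared)"
    note = "" if canonical == url else "  <- mismatch: list the canonical URL instead"
    print(f"{url} -> {canonical}{note}")
```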

Decision rule: if a URL would not help a human understand the site, do not include it for an AI agent. The file should reduce context noise, not expose every possible path.

Use The Correct Markdown Format

The proposed llms.txt format is intentionally simple. It uses Markdown because it is readable by people, parsable by tools and familiar to LLM workflows. The usual site-wide location is /llms.txt, although a scoped file can also live in a subdirectory when it covers only that section, such as a documentation area.

The minimum structure is:

  1. One H1 heading with the site or project name.
  2. A blockquote summary of what the site or project is.
  3. Optional short notes that add context.
  4. One or more H2 sections containing Markdown link lists, with a short description after each link.

Here is a compact pattern. Replace the paths with real canonical URLs from your own site before publishing:

```markdown
# Product Or Site Name

> A concise summary of what the site, product or project is, who it serves, and what information the linked resources cover.

Important notes:
- Use the canonical domain and public URLs.
- Treat pricing, availability and policy pages as time-sensitive.

## Core Pages

- [Product overview](/): Explains the product category, primary use cases and main audience.
- [Pricing](/pricing/): Lists current plan structure, limits and billing information.
- [Documentation](/docs/): Provides setup, configuration and troubleshooting guidance.

## Policies

- [Privacy policy](/privacy/): Explains data collection, processing and user rights.
- [Terms of service](/terms/): Defines product usage rules and account obligations.

## Optional

- [Blog guides](/blog/): Evergreen educational articles for users who need more background context.
```

The Optional section has a special meaning in the proposal. It tells a reader or tool that these resources are secondary and can be skipped when a shorter context is needed. Use it for helpful but nonessential material, not for pages you secretly want promoted.

For most websites, the first version should be compact. A useful llms.txt file might contain a few dozen links, not thousands. Documentation-heavy sites can justify more structure, but they still need clear sections and descriptions.

Practical check: read the file from top to bottom as if you had never seen the website. If the file does not explain what the site is, which resources matter and how the sections differ, the structure needs another pass.

Write Descriptions That Help Discovery

Descriptions should explain the page, not advertise it. The best descriptions tell an AI agent what the page contains, who it is for and when it is useful. They should be short, factual and consistent with the linked page.

Avoid superlatives, unsupported claims and keyword stuffing. Do not write "the best AI visibility solution for every business" unless the page itself proves a narrower claim and that wording is defensible. Do not repeat the same keyword in every link. Do not describe a page as a guide if it is actually a sales page.

Compare weak and useful descriptions:

| Weak description | Useful description |
| --- | --- |
| Best platform for AI rankings and visibility. | Explains the product, supported monitoring surfaces and the visibility signals users can track. |
| Ultimate pricing page for AI search success. | Lists plans, included prompt capacity, refresh cadence and billing terms. |
| Everything about technical SEO, AI and marketing. | Covers crawlability, indexing and source-readiness checks for AI search monitoring. |
| Contact us now for amazing results. | Provides sales, support and general contact options for users who need help choosing a plan. |

Good descriptions are especially important when the file includes similar resources. If you list five guides, the note after each link should make the difference obvious. One might explain implementation. Another might explain measurement. Another might explain risks or troubleshooting. If every description sounds interchangeable, the file is not guiding discovery.

Also check for claim drift. If the linked page says a feature is available only on selected plans, the llms.txt note should not imply universal availability. If a policy page changes, the description should not preserve old wording. If a guide is dated, the note should not present it as current unless it has been reviewed.

Red flag: descriptions that read like meta descriptions written for keyword density. llms.txt should help a machine choose useful context, not repeat search copy with more buzzwords.

Publish And Test The File

For a site-wide implementation, publish the file at the root path: /llms.txt. For a scoped documentation implementation, a subdirectory file can make sense, but only when the scope is clear. A docs file should describe docs. A product file should describe the product. Do not scatter multiple files around the site unless someone can maintain them and explain which one is authoritative.

After deployment, test the file like a small technical release:

  1. Open /llms.txt in a browser and confirm it loads without authentication.
  2. Check that the URL returns a 200 status, not a redirect loop, soft 404, login page or CDN challenge.
  3. Confirm the response is plain text or Markdown-readable, not an HTML template wrapped around the content.
  4. Verify that every listed URL resolves successfully.
  5. Confirm that every URL is canonical and uses the preferred protocol, host and trailing-slash pattern.
  6. Remove staging, preview, admin, account, checkout and private URLs.
  7. Check that the file is not blocked by hosting rules, CDN bot protection or accidental access controls.
  8. Store ownership for updates so the file does not become stale after launches, migrations or pricing changes.

Content type is less important than readability, but do not serve a broken HTML page or a file download that agents cannot easily inspect. A clean text/plain; charset=utf-8 response is the safest approach when your hosting stack lets you set it. Also make sure compression, caching and redirects do not hide errors during testing.
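Several of these checks can run as one small script. A minimal smoke test, assuming Python with the `requests` library and a placeholder domain:

```python
"""Minimal llms.txt smoke test. A sketch assuming Python with the
`requests` library; example.com is a placeholder domain."""
import re
import requests

BASE = "https://example.com"

# Fetch without following redirects so a 301/302 shows up as a failure.
resp = requests.get(f"{BASE}/llms.txt", timeout=10, allow_redirects=False)
assert resp.status_code == 200, f"expected 200, got {resp.status_code}"
assert "text/" in resp.headers.get("Content-Type", ""), "not served as readable text"

# Pull Markdown link targets like [Pricing](/pricing/) and check each one.
for path in re.findall(r"\]\(([^)]+)\)", resp.text):
    url = path if path.startswith("http") else BASE + path
    check = requests.get(url, timeout=10)
    print(("OK    " if check.ok else f"FAIL {check.status_code}  ") + url)
```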

The security check matters. llms.txt is public. If you list a private dashboard path, staging URL, unpublished documentation route, internal file name or customer-specific page, you have exposed a discovery hint. The file should contain only information you are comfortable making public.

Decision rule: publish only after a human has clicked every link and checked the file as a public visitor. If the file cannot pass a simple public access test, do not promote it as an AI discovery asset.

Should You Add llms-full.txt

llms-full.txt is a companion idea: instead of only listing important resources, it can provide a fuller Markdown corpus for a project, documentation set or technical reference. It is most useful when the content is structured, stable and likely to be consumed as context by agents or developer workflows.

That does not mean every website needs it. A full corpus creates maintenance, size and freshness problems. It can also duplicate content, expose private material or become too large to be useful in a context window.

| Situation | Use llms-full.txt? | Reason |
| --- | --- | --- |
| Developer docs, API references or framework guides with clean Markdown versions | Consider it | Agents may need enough context to answer implementation questions |
| Small product site with a few public pages | Usually no | A concise llms.txt file is enough |
| Large ecommerce catalog with changing inventory | Usually no | Freshness, duplication and size can become unmanageable |
| Site with private, gated or account-specific content | No | The exposure risk is higher than the discovery value |
| Mature docs site with automated Markdown generation and review | Consider it | Automation can keep the corpus current if ownership is clear |

If you create llms-full.txt, define its scope before generating it. Decide whether it covers only docs, only API pages, only guides or the entire public site. Set a maximum size target. Exclude duplicate templates, navigation text, legal boilerplate where irrelevant, internal comments, private paths and outdated versions. Then regenerate it whenever the source content changes.
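As a sketch of that scoping discipline, the snippet below assembles a corpus from a hypothetical docs/ folder of clean Markdown files. The folder name, size cap and exclusion list are illustrative, not part of the llms.txt proposal:

```python
"""Sketch: assemble llms-full.txt from a hypothetical docs/ folder of clean
Markdown files. The size cap, folder names and exclusions are illustrative."""
from pathlib import Path

MAX_CHARS = 2_000_000             # rough size target; tune for your context budget
EXCLUDE = {"drafts", "internal"}  # skip private or unfinished material

parts, total = [], 0
for path in sorted(Path("docs").rglob("*.md")):
    if EXCLUDE & set(path.parts):
        continue
    block = f"\n\n---\nSource: /{path.as_posix()}\n\n{path.read_text(encoding='utf-8')}"
    if total + len(block) > MAX_CHARS:
        break  # enforce the cap instead of shipping an oversized dump
    parts.append(block)
    total += len(block)

Path("llms-full.txt").write_text("# Docs corpus" + "".join(parts), encoding="utf-8")
```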

Red flag: copying an entire website into one stale, oversized llms-full.txt file. More content is not more useful if the result is duplicated, outdated or impossible to fit into a practical context window.

Measure Impact Without Guesswork

The most common measurement mistake is treating publication as proof. A crawler request to /llms.txt does not prove that an AI answer used the file. A brand mention does not prove your website was cited. A citation in one answer does not prove a durable visibility gain.

Measure the file as a supporting asset. Watch for signals that can be inspected, repeated and tied to decisions:

- Requests to /llms.txt in server or CDN logs, split by user agent
- Cited URLs that appear in AI answers for prompts you track
- Brand mentions and recommendations across AI surfaces over time
- Source gaps where answers rely on third-party pages instead of yours

Keep the evidence separate. Log crawler access as crawler access. Log citations as visible cited URLs. Log mentions as mentions. Log recommendations as recommendations. These signals can move together, but they do not mean the same thing. When the goal is to prove whether your site is appearing as source evidence, track AI citations at the URL level instead of relying on screenshots or raw crawler hits.
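Logging crawler access precisely is easy to script. A sketch that counts /llms.txt requests per user agent in an access log; the file name and bot substrings are illustrative, not a complete list:

```python
"""Sketch: count /llms.txt requests per user agent in an access log.
The log file name and bot substrings are illustrative, not a complete list."""
from collections import Counter

BOT_HINTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "Perplexity-User", "Googlebot")

hits = Counter()
with open("access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "/llms.txt" not in line:
            continue
        agent = next((b for b in BOT_HINTS if b in line), "other")
        hits[agent] += 1

for agent, count in hits.most_common():
    print(f"{agent:>16}  {count}")
```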

This is also where the work leaves simple file setup and becomes AI visibility monitoring. If the question is "Did we publish the file correctly?", manual checks are enough. If the question is "Are AI systems mentioning, citing or recommending us across prompts and competitors over time?", the evidence needs recurring prompt-level tracking, cited URL history and source-gap analysis.

Decision rule: treat llms.txt as a supporting diagnostic asset unless repeated prompt-level evidence shows a change in mentions, citations, cited URLs or source gaps. Do not report impact from one crawler hit or one favorable answer.

Common Mistakes To Avoid

Most bad llms.txt implementations fail for predictable reasons. They either overstate what the file can do, or they publish a file that is too noisy to help.

Avoid these patterns:

- Dumping the entire sitemap into the file with marketing labels
- Stuffing descriptions with keywords or unsupported superlatives
- Listing staging, private or non-canonical URLs
- Treating the file as a confirmed ranking, citation or crawl-control signal
- Publishing once and letting the file go stale after launches, migrations or pricing changes

The practical fix is simple: keep the file small enough to review, useful enough to guide context and boring enough to be trusted. The best llms.txt file usually reads like well-maintained documentation, not like a growth experiment.

The Bottom Line

Create an llms.txt file when you already have stable pages that deserve a curated Markdown map. Put it at /llms.txt, use the proposed structure, describe each resource factually, test public access and maintain it after site changes. Consider llms-full.txt only when a fuller Markdown corpus is useful and manageable.

Do not treat the file as a guaranteed AI discovery, ranking or citation lever. It is a low-cost context file with uncertain platform adoption, not a magic layer above weak pages. The highest-value next step after publishing is measurement: check whether AI systems actually access the file, cite your pages, mention your brand, use third-party sources or leave source gaps that need separate work.

Frequently Asked Questions

What is an llms.txt file?
An llms.txt file is a proposed Markdown file, usually published at /llms.txt, that gives LLMs and AI agents a curated map of important site resources. It normally includes a site name, a short summary, grouped links and factual notes about each linked page.
Does llms.txt help websites appear in ChatGPT, Google AI Overviews or Perplexity?
It may help some discovery or agent workflows understand which pages matter, but it should not be treated as a confirmed ranking, recommendation or citation signal. As of May 6, 2026, major AI search platforms have not made llms.txt a guaranteed visibility requirement.
Is llms.txt the same as robots.txt or sitemap.xml?
No. llms.txt is a context and resource guide. robots.txt manages crawler access rules for compliant bots. sitemap.xml lists URLs for search discovery and indexing workflows. They can coexist, but they solve different problems.
Should every website create an llms-full.txt file?
No. llms-full.txt is useful mainly when a compact full Markdown corpus is genuinely helpful, such as documentation, API references or technical product guides. Avoid it when it would become a stale, oversized dump of duplicated or private content.
