Crawling and indexing are not the same thing
Robots.txt controls crawling — whether a search engine is allowed to fetch a URL at all. A noindex tag controls indexing — whether a page that has been crawled is allowed to appear in search results. They act at different stages, which is why they are not interchangeable.
Getting this distinction right prevents the most common SEO own-goals, where a page meant to be hidden stays visible, or a page meant to be visible quietly disappears.
When to use robots.txt
Use robots.txt to stop crawlers from wasting time on sections that do not need to be in search at all, such as internal search results, faceted filter URLs, admin areas, or large generated paths. It is a site-level traffic rule, and its Sitemap directive makes it a good place to point crawlers to your sitemap.
Robots.txt is not a privacy control. The file is publicly readable, so every path you disallow is visible to anyone who opens it. Disallowing a URL also does not guarantee that it stays out of the index: if other pages link to it, the bare URL can still be listed.
- Block large or irrelevant sections from crawling
- Point crawlers to your sitemap
- Do not rely on it to hide sensitive pages
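As a rough illustration of how these rules behave, the sketch below feeds a small robots.txt to Python's standard urllib.robotparser and asks whether a few URLs may be crawled. The domain, the disallowed paths, and the is_crawlable helper are assumptions made up for this example, and the standard-library parser does simple prefix matching, so real search engines may interpret some rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths and domain are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /admin/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://www.example.com/search?q=shoes"))  # False
print(is_crawlable("https://www.example.com/admin/users"))     # False
print(is_crawlable("https://www.example.com/products/shoes"))  # True

# Sitemap lines are collected separately from the crawl rules (Python 3.8+).
print(parser.site_maps())  # ['https://www.example.com/sitemap.xml']
```

Note that nothing here says anything about indexing: a False result only means a well-behaved crawler will not fetch the URL.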
When to use noindex
Use a noindex directive, either a robots meta tag in the page's HTML or an X-Robots-Tag HTTP response header, when you want a specific page kept out of search results: thank-you pages, thin tag pages, duplicate printer-friendly versions, and the like. It tells search engines that even though the page can be crawled, it should not be listed.
The key requirement is that the page stays crawlable. If a search engine cannot fetch the page, it never sees the noindex instruction, and the tag has no effect.
- Keep a specific page out of results
- Leave the page crawlable so the tag is seen
- Good for thin, duplicate, or utility pages
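To make the two forms of noindex concrete, here is a minimal sketch that checks a fetched page for the directive, looking at both a robots meta tag and an X-Robots-Tag response header. The has_noindex function, the RobotsMetaParser class, and the sample markup are hypothetical names and data for illustration. The check is simplified: it ignores crawler-specific meta names and user-agent prefixes in the header.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str, headers: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the meta tag or the X-Robots-Tag header says noindex."""
    meta = RobotsMetaParser()
    meta.feed(html)

    header_directives = set()
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_directives.update(p.strip().lower() for p in value.split(","))

    return "noindex" in meta.directives or "noindex" in header_directives


# Hypothetical thank-you page that should stay out of results.
THANK_YOU = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Thanks!</body></html>"
)

print(has_noindex(THANK_YOU))                                      # True
print(has_noindex("<html><head><title>Home</title></head></html>"))  # False
print(has_noindex("", {"X-Robots-Tag": "noindex"}))                # True
```

The header form is the usual choice for non-HTML files such as PDFs, which have no place to put a meta tag. Either way, the function needs the response in hand, which mirrors the requirement that the page stays crawlable.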
The classic mistake
The trap is using both at once: disallowing a URL in robots.txt and adding a noindex tag to it. Because robots.txt blocks the crawl, the engine never reads the noindex — so the page can still appear in results as a bare link with no description, which is the opposite of what you wanted.
If you need a page gone from search, allow it to be crawled and add noindex. Once it has dropped out of the index, you can block it in robots.txt later if you want to save crawl budget.
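The interaction between the two mechanisms can be sketched as a small audit function. This is only an illustration under stated assumptions: audit, its verdict strings, and the example URL are invented here, and the has_noindex flag is assumed to come from your own knowledge of the page template, since a blocked page cannot be fetched to check.

```python
from urllib.robotparser import RobotFileParser


def audit(url: str, robots_txt: str, has_noindex: bool, user_agent: str = "*") -> str:
    """Describe what robots.txt plus a noindex flag imply for one URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, url)

    if not crawlable and has_noindex:
        # The classic mistake: the crawl is blocked, so noindex is never read.
        return "conflict: noindex cannot be seen while the URL is disallowed"
    if not crawlable:
        return "blocked from crawling; the bare URL may still be listed"
    if has_noindex:
        return "crawlable with noindex; should drop out of results"
    return "crawlable and indexable"


BLOCKING_RULES = "User-agent: *\nDisallow: /thank-you\n"
URL = "https://www.example.com/thank-you"  # hypothetical page

print(audit(URL, BLOCKING_RULES, has_noindex=True))
# conflict: noindex cannot be seen while the URL is disallowed

print(audit(URL, "", has_noindex=True))
# crawlable with noindex; should drop out of results
```

Running a check like this over the URLs you intend to hide is one way to catch the conflict before a search engine does.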