An XML sitemap is one of those technical SEO elements that sounds more intimidating than it is. At its core, it's just a list of URLs on your site that you want search engines to know about, formatted in a way that crawlers can read efficiently. Done right, it's a reliable way to help Google discover and consider all your important pages for ranking. Done wrong, or left bloated with the wrong URLs, it can actively mislead crawlers and waste the crawl budget that should be spent on pages that matter.

This guide explains what XML sitemaps actually do, when they make a real difference to your rankings, what to include and exclude, and how to validate that yours is in good shape.

What an XML Sitemap Actually Does

When Googlebot crawls the web, its primary method of discovering pages is following links — from your homepage to category pages, from category pages to individual posts, and so on. For most well-structured sites with strong internal linking, Google will eventually find most important pages this way.

An XML sitemap is a shortcut. Instead of relying entirely on link discovery, you hand Google a complete list of URLs and say "here's everything I want you to look at." This is particularly valuable in three situations:

  • New or recently launched sites with few external links pointing to them. Without inbound links, Googlebot has less reason to visit and crawl deeply. A sitemap submitted to Google Search Console gets pages into the crawl queue faster.
  • Large sites with thousands of pages where some content is difficult to reach through normal link crawling — deep category pages, archived content, or pages that aren't well linked internally.
  • Sites with frequently updated content like news publishers, blogs, or e-commerce stores with rotating inventory. The <lastmod> tag in a sitemap signals to Google when a page was last updated, potentially prompting faster recrawling of changed content.

What a sitemap does not do: guarantee indexing. Submitting a URL in your sitemap is a request, not a command. Google will crawl sitemap URLs at its own pace and make its own decision about whether to index each one. A page in your sitemap that has thin content or duplicate content problems can still be left out of the index, and a page with a noindex directive will be excluded regardless.

What a Basic XML Sitemap Looks Like

XML sitemaps follow a standardized format defined at sitemaps.org. A minimal valid sitemap looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yourdomain.com/page-one/</loc>
    <lastmod>2025-11-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://yourdomain.com/page-two/</loc>
    <lastmod>2025-10-01</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>

A quick note on the optional tags: <lastmod> is genuinely useful when it is accurate, because it helps Google prioritize recrawling updated content. <changefreq> and <priority> are not worth spending time on: Google's sitemap documentation states that it ignores both values, so they have no effect on how often your pages are crawled.
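
If your CMS does not generate a sitemap for you, the format above is simple enough to produce with a short script. The following is a minimal sketch in Python using only the standard library. The build_sitemap function and the sample pages are hypothetical, and it writes only <loc> and <lastmod>, on the assumption that the other optional tags are not needed.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (as bytes) for an iterable of (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & when it serializes the text.
        ET.SubElement(entry, "loc").text = url
        if last_modified is not None:
            # YYYY-MM-DD, the same date format used in the example above.
            ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


# Hypothetical pages, for illustration only.
pages = [
    ("https://yourdomain.com/page-one/", date(2025, 11, 15)),
    ("https://yourdomain.com/page-two/", None),  # unknown date: omit <lastmod>
]
sitemap_xml = build_sitemap(pages)
```

Leaving <lastmod> out when the real modification date is unknown is a deliberate choice here: an omitted date is safer than an inaccurate one.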

What to Include in Your Sitemap

The goal is a sitemap that contains only canonical, indexable, working pages. That means:

  • Include: Your homepage, key landing pages, blog posts, product pages, category pages, and any other pages you actively want Google to index and rank.
  • Exclude: Pages with noindex directives — including them in the sitemap sends a contradictory signal (you're simultaneously asking Google to look at the page and not index it). Google will typically respect the noindex, but the contradiction wastes crawl budget.
  • Exclude: URLs that redirect (301, 302, and similar responses). Your sitemap should list the final destination URLs, not URLs that redirect to them. Redirect URLs in sitemaps waste crawl budget and signal poor site hygiene.
  • Exclude: Pages that return 404 or other error codes. A sitemap full of broken URLs tells Google your site is poorly maintained.
  • Exclude: Duplicate content pages, pagination pages (usually), tag archives with thin content, and any URL that isn't the canonical version of its content.

Sitemap size limits: A single sitemap file can contain a maximum of 50,000 URLs and must be under 50 MB uncompressed. Sites with more URLs than this need a sitemap index file (a sitemap of sitemaps) that references multiple individual sitemap files. Most CMS plugins handle this automatically once you cross the threshold.
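
The include and exclude rules above can be expressed as a simple filter over crawl data. This is a rough sketch in Python, assuming you already have a crawl export with each URL's status code, noindex flag, and canonical URL. The CrawledPage record and its field names are hypothetical, not the output format of any particular crawler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    """Hypothetical record from a site crawl; field names are illustrative."""
    url: str
    status: int
    noindex: bool = False
    canonical: Optional[str] = None  # None means no canonical tag was found


def belongs_in_sitemap(page):
    """Keep only working, indexable, canonical URLs."""
    if page.status != 200:  # drops redirects (3xx) and errors (4xx, 5xx)
        return False
    if page.noindex:  # avoids the contradictory "crawl this, don't index it" signal
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False  # a different URL is the canonical version
    return True


# Hypothetical crawl results, for illustration only.
crawl = [
    CrawledPage("https://yourdomain.com/", 200, canonical="https://yourdomain.com/"),
    CrawledPage("https://yourdomain.com/old-page/", 301),
    CrawledPage("https://yourdomain.com/missing/", 404),
    CrawledPage("https://yourdomain.com/thank-you/", 200, noindex=True),
    CrawledPage("https://yourdomain.com/shoes/?sort=price", 200,
                canonical="https://yourdomain.com/shoes/"),
]
sitemap_urls = [page.url for page in crawl if belongs_in_sitemap(page)]
```

With this sample data only the homepage survives the filter. Judgment calls such as thin tag archives or pagination pages cannot be detected from these fields alone and still need a manual decision.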

Sitemap Index Files: When You Need One

Larger sites often split their sitemap into multiple files organized by content type — one for blog posts, one for product pages, one for category pages — and reference them all from a sitemap index. This makes it easier to diagnose indexing issues by content type in Google Search Console, since you can see coverage statistics for each sitemap file separately.

A sitemap index file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://yourdomain.com/sitemap-posts.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://yourdomain.com/sitemap-pages.xml</loc>
  </sitemap>
</sitemapindex>
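
For sites that generate their own sitemaps, splitting a long URL list and building the index can be sketched as follows. This is an illustrative Python example using the standard library. The function names, the generated product URLs, and the sitemap-products-N.xml file names are assumptions, and the sketch enforces only the 50,000-URL limit, not the file size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locations):
    """Return sitemap index XML (as bytes) referencing each sitemap file URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for location in sitemap_locations:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = location
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical site with 120,000 product URLs, for illustration only.
urls = [f"https://yourdomain.com/product-{n}/" for n in range(120_000)]
chunks = chunk_urls(urls)

# One sitemap file per chunk; the naming scheme is an assumption.
locations = [
    f"https://yourdomain.com/sitemap-products-{i}.xml"
    for i in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(locations)
```

Each chunk would then be written out as its own <urlset> file at the matching location, and only the index URL needs to be submitted.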

How to Submit Your Sitemap to Google

There are two ways to make sure Google knows about your sitemap:

Google Search Console. Go to Search Console → Sitemaps, enter your sitemap URL, and submit. This is the most direct method and gives you visibility into how many URLs Google has discovered and indexed from the sitemap, plus any errors it encountered.

robots.txt. Add a Sitemap: directive to your robots.txt file pointing to your sitemap URL. As covered in the guide to robots.txt and how to test it, this means any crawler that checks your robots.txt — even before you've submitted the sitemap manually — will find and process it automatically.

Using both methods together is the best approach: submit via Search Console for immediate processing and monitoring, and reference the sitemap in robots.txt as a reliable fallback for all crawlers.
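
To confirm that a robots.txt file actually declares your sitemap, you can read it the way a crawler would. The sketch below uses Python's standard urllib.robotparser module (the site_maps() method requires Python 3.8 or later). The robots.txt content shown is a made-up example, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
robots_txt = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns None when no Sitemap: line is present.
declared_sitemaps = parser.site_maps() or []
print(declared_sitemaps)
```

To check a live site instead of a string, the same parser can fetch the file with set_url() followed by read().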

How to Validate Your Sitemap

A sitemap that contains errors — malformed XML, incorrect namespace declarations, URLs with special characters that aren't properly encoded, or references to redirecting/broken URLs — can fail silently. Google may process part of it, skip problematic entries, or reject the file entirely, all without obvious notification.

The XML Sitemap Validator fetches your live sitemap and checks it for structural errors, URL issues, and common problems that prevent Google from processing it correctly. Run it any time you make significant changes to your site — after a migration, after adding new content types, or after updating your CMS — to confirm the sitemap is clean and Google is getting accurate information about your pages.
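
For a rough local pre-check of the structural problems described above, a few lines of Python can catch the most common ones before you run a full validation. This is a simplified sketch, not a substitute for a proper validator. The check_sitemap function and the sample input are hypothetical, and it does not fetch URLs, so it cannot detect redirects or broken pages.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
DATE_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}")


def tag(name):
    """Return the namespace-qualified tag name ElementTree uses internally."""
    return "{%s}%s" % (SITEMAP_NS, name)


def check_sitemap(xml_bytes):
    """Return a list of problems found in a <urlset> sitemap (empty if none)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Covers malformed XML, including unescaped & characters in URLs.
        return [f"malformed XML: {exc}"]

    problems = []
    if root.tag != tag("urlset"):
        problems.append(f"unexpected root element or namespace: {root.tag}")

    entries = root.findall(tag("url"))
    if len(entries) > 50_000:
        problems.append(f"too many URLs in one file: {len(entries)}")

    for entry in entries:
        loc = (entry.findtext(tag("loc")) or "").strip()
        parts = urlsplit(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute http(s) URL: {loc!r}")
        lastmod = entry.findtext(tag("lastmod"))
        if lastmod is not None and not DATE_PREFIX.match(lastmod.strip()):
            problems.append(f"lastmod is not a YYYY-MM-DD date: {lastmod!r}")
    return problems


# Hypothetical sitemap with two deliberate mistakes, for illustration only.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>/page-one/</loc><lastmod>15 Nov 2025</lastmod></url>
  <url><loc>https://yourdomain.com/page-two/</loc></url>
</urlset>"""

for problem in check_sitemap(sample):
    print(problem)
```

The sample reports a relative <loc> and a badly formatted <lastmod>, while a clean file returns an empty list.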

Keeping your sitemap accurate is one of the simplest ongoing SEO maintenance tasks. A clean sitemap, combined with a correctly configured robots.txt file and solid crawlability across the site, gives Google a clear picture of what your site contains and what you want it to index.

For the full technical SEO checklist that ties all of these elements together, the guide to what technical SEO covers is the place to start.