What Is Crawl Budget and Does Your Site Have a Problem?

Google doesn't crawl every page on the internet every day. It can't. Its crawlers operate under practical constraints, and every website gets an allocation of crawl attention based on how valuable and well-maintained Google considers that site to be. That allocation is what the SEO industry calls crawl budget, and for smaller or newer sites, it's one of the less-discussed reasons why pages that should be indexed sometimes aren't.

This guide explains what crawl budget actually is, which sites genuinely need to worry about it, and what you can do, for free, to make sure Google is spending its crawl time on the pages that matter most.

What Is Crawl Budget?

Crawl budget is the number of pages Googlebot will crawl on your site within a given timeframe. Google's documentation breaks it into two components that work together: crawl rate limit (which Google's current documentation calls the crawl capacity limit) and crawl demand.

Crawl rate limit is the maximum rate at which Googlebot will crawl your site without overloading your server. Google adjusts this automatically based on how your server responds. If your server is fast and stable, Googlebot crawls more aggressively. If it's slow or returns errors, Googlebot backs off to avoid causing further problems. Search Console used to include a setting for lowering this rate manually, but Google retired that tool in early 2024, and almost no site owners had a reason to use it anyway.

Crawl demand is how much Google wants to crawl your site based on perceived value. Pages that are popular, frequently updated, and well-linked tend to get revisited more often. Newly discovered pages generate immediate crawl demand. Pages that are thin, rarely linked, or haven't changed in months see their crawl demand decline over time.

Your effective crawl budget is roughly the intersection of these two: how fast Google can crawl you without overloading your server, scaled by how much Google thinks it's worth doing so.

Does Your Site Actually Have a Crawl Budget Problem?

Crawl budget is a real constraint for large sites with tens of thousands of pages. For a typical blog or small business site with a few hundred pages, it's much less likely to be a meaningful bottleneck. That said, certain conditions can create crawl budget problems on sites of any size.

Your site may have a crawl budget problem if you're seeing any of these:

New pages take weeks or months to get indexed. If you publish a new post and it still doesn't appear in Google Search Console's coverage report after four to six weeks, Googlebot may not be visiting your site frequently enough to pick it up promptly. For comparison, a site with strong crawl demand typically sees new pages indexed within a few days to a week.

A significant percentage of your pages are not indexed. Open Search Console and check the Coverage report (or the Indexing report, depending on your view). If a large share of your pages sit in the "Discovered, currently not indexed" or "Crawled, currently not indexed" categories, Googlebot has either queued them and not gotten to them yet, or it has visited and decided not to index them. The distinction matters. "Discovered, not indexed" usually points to a crawl budget or priority issue. "Crawled, not indexed" points to a content quality issue.

Your crawl stats show Googlebot visiting mostly low-value pages. In Search Console, the Settings area includes a Crawl Stats report that shows you which pages Googlebot has visited recently, how often, and at what response times. If Googlebot is spending the bulk of its visits on URL parameter pages, paginated archives, or other low-value URLs instead of your core content, crawl budget is being wasted.
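If you have access to your server's access logs, you can cross-check the Crawl Stats picture yourself. The sketch below assumes logs in the common "combined" format and uses made-up bucketing rules (a query string means a parameter URL, a path ending in /page/N means pagination). The sample lines are invented for illustration, and matching on the user-agent string alone can be fooled by spoofed bots, so treat the counts as approximate.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def bucket(path):
    """Assign a URL path to a rough category (hypothetical rules)."""
    if "?" in path:
        return "parameter"
    if re.search(r"/page/\d+/?$", path):
        return "pagination"
    return "content"

def googlebot_hits(lines):
    """Count requests per bucket for lines whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[bucket(match.group("path"))] += 1
    return counts

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

def sample_line(path, agent):
    """Build an invented log line for the demonstration."""
    return f'203.0.113.7 - - [10/Jan/2025:10:00:00 +0000] "GET {path} HTTP/1.1" 200 5120 "-" "{agent}"'

sample = [
    sample_line("/blog/post-a", GOOGLEBOT),
    sample_line("/shop?sort=price", GOOGLEBOT),
    sample_line("/shop?color=red&sort=price", GOOGLEBOT),
    sample_line("/blog/page/3/", GOOGLEBOT),
    sample_line("/blog/post-a", BROWSER),  # ignored: not Googlebot
]

counts = googlebot_hits(sample)
print(dict(counts))
```

If the "parameter" and "pagination" buckets dwarf "content" on your real logs, that matches the wasted-budget pattern described above.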

What Wastes Crawl Budget?

Several common site patterns cause Googlebot to spend crawl time on URLs that add no value, leaving less budget for the pages that matter.

URL Parameters

Faceted navigation, session IDs, tracking parameters, and sorting or filtering options often generate thousands of near-duplicate URLs that all serve the same content. An e-commerce site with fifty product listings and ten filter combinations can produce 500 distinct URLs for what is essentially the same fifty pages of content. Googlebot will attempt to crawl many of these unless they're explicitly blocked.
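To get a feel for how many of your crawled URLs are really the same page, you can strip the parameters that don't change content and count what's left. This is a minimal sketch: the parameter names in IGNORED_PARAMS and the example.com URLs are assumptions chosen for illustration, and your own site's list will differ.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that change the URL but not the content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}

def canonicalize(url):
    """Drop ignored parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

crawled = [
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?sort=name&utm_source=newsletter",
    "https://example.com/shoes?sessionid=abc123",
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/shoes?sort=price&color=red",
]

unique = {canonicalize(url) for url in crawled}
print(len(crawled), "crawled URLs,", len(unique), "distinct pages")
```

Here five crawled URLs collapse to two pages: the unfiltered listing and the red-only listing.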

Redirect Chains and Broken Links

Every hop in a redirect chain consumes crawl budget without contributing a successfully indexed page. A URL that redirects to a second URL that redirects to a third means Googlebot spent three crawl requests to index one page. At scale, this adds up. Broken links are similarly wasteful: Googlebot follows the link, receives a 404, and logs a wasted request. The Redirect & Header Checker lets you trace redirect chains on any URL, and the Broken Link Checker finds dead links on any page before Googlebot does.
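The request counting is easy to see in code. The sketch below does not touch the network: the responses dictionary is a hypothetical stand-in for what a server would return, mapping each URL to a status code and a redirect target.

```python
REDIRECT_CODES = (301, 302, 307, 308)

def trace(url, responses, max_hops=10):
    """Return every URL requested while following redirects from `url`.

    `responses` maps a URL to a (status, location) tuple and stands in
    for real HTTP responses. Unknown URLs are treated as 404s.
    """
    chain = [url]
    while len(chain) <= max_hops:
        status, location = responses.get(chain[-1], (404, None))
        if status not in REDIRECT_CODES or location is None:
            break
        if location in chain:  # redirect loop, stop following
            break
        chain.append(location)
    return chain

# Invented example: /old -> /newer -> /newest
responses = {
    "/old": (301, "/newer"),
    "/newer": (301, "/newest"),
    "/newest": (200, None),
}

chain = trace("/old", responses)
print(" -> ".join(chain), f"({len(chain)} requests for one page)")
```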

Thin or Duplicate Content

Pages with little unique content, such as tag archive pages with one or two posts, paginated pages beyond page two, or automatically generated near-duplicate pages, attract crawl visits without earning index slots. Google may crawl these pages repeatedly, "deciding" not to index them each time, while spending less time on your high-value content. The pattern shows up in Search Console as pages stuck in "Crawled, currently not indexed."

Noindexed Pages Without Robots.txt Blocking

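A common misconception: if a page has a noindex directive, you might assume Googlebot won't visit it. That's wrong. A noindex tag tells Google not to add the page to the index, but Googlebot still has to crawl the page to read that directive. If you have a large number of pages you never want indexed (admin pages, search result pages, thin utility pages), blocking them in robots.txt prevents Googlebot from even requesting them, which preserves crawl budget for pages that matter. One caveat: a robots.txt block also stops Googlebot from seeing any noindex tag on the page, so a blocked URL that other sites link to can still show up in results without a description. For pages that must stay out of the index entirely, leave them crawlable and rely on noindex instead. The Robots.txt Tester lets you verify your rules are written correctly before they go live.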
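You can also sanity-check a draft robots.txt locally with Python's standard library. The rules and example.com paths below are illustrative assumptions, not a recommended configuration. Python's parser does simple prefix matching and may not mirror every Googlebot extension (wildcards, for example), so treat this as a quick check and not a substitute for a dedicated tester.

```python
from urllib.robotparser import RobotFileParser

# Draft rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ("/admin/settings", "/search?q=shoes", "/blog/crawl-budget"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "allowed" if allowed else "blocked")
```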

How to Check Your Crawl Budget Health (Free)

Start with Google Search Console. The Coverage or Indexing report shows you how many pages are indexed, how many are excluded, and the specific reason for each exclusion. This is your most direct window into whether Google is finding and indexing your important pages.

The Crawl Stats report, under Settings in Search Console, shows Googlebot's activity on your site over the past 90 days. Look at the "By response" section: a high percentage of 404 responses suggests too many broken links. A high count of "Not modified" responses is normal. A large share of redirect responses suggests internal links or sitemap entries that point at redirected URLs, and possibly redirect chains worth collapsing.

Check your XML sitemap using the Sitemap Validator. Your sitemap is a direct signal to Googlebot about which pages you want crawled and indexed. Including URLs that return 404s, redirect, or have noindex tags in your sitemap sends conflicting signals and wastes crawl requests on URLs that shouldn't be there.
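The same check can be sketched in a few lines. The example below parses a small sitemap and flags every URL whose status is anything other than 200. The sitemap contents and the OBSERVED_STATUS table are invented; in practice you would fill that table by actually requesting each URL.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-post</loc></url>
  <url><loc>https://example.com/deleted</loc></url>
</urlset>"""

# Hypothetical crawl results: URL -> HTTP status seen when fetched.
OBSERVED_STATUS = {
    "https://example.com/": 200,
    "https://example.com/old-post": 301,
    "https://example.com/deleted": 404,
}

def sitemap_problems(xml_text, statuses):
    """Return (url, status) pairs for sitemap entries that don't return 200."""
    root = ET.fromstring(xml_text)
    problems = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = loc.text.strip()
        status = statuses.get(url)  # None means the URL was never checked
        if status != 200:
            problems.append((url, status))
    return problems

problems = sitemap_problems(SITEMAP_XML, OBSERVED_STATUS)
for url, status in problems:
    print(status, url)
```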

Run an Indexability Check on your key pages to confirm they're not inadvertently blocked by a noindex tag or an X-Robots-Tag header. It's more common than you'd expect: pages get noindexed accidentally during development and the tag never gets removed.
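For a sense of what such a check looks for, here is a minimal sketch that inspects both places a noindex can hide: a robots meta tag in the HTML and an X-Robots-Tag response header. It works on an HTML string and a headers dictionary that you supply, and it is deliberately simple (it does not handle directives scoped to a specific crawler inside the header, for instance).

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot"):
            self.directives.append((attributes.get("content") or "").lower())

def is_noindexed(html, headers):
    """Return True if the page carries noindex in a meta tag or X-Robots-Tag header."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value.lower()
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_value or any("noindex" in d for d in parser.directives)

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(is_noindexed(page, {}))
```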

How to Improve Crawl Budget Efficiency

Block genuinely valueless URLs in robots.txt. Any URL that you'd never want Google to index and that generates no ranking value should be blocked at the robots.txt level: admin paths, search result pages, login pages, URL parameter combinations. This keeps Googlebot focused on content that can actually rank.

Fix or consolidate redirect chains. Wherever you have A redirecting to B redirecting to C, update the original URL to redirect directly to C. Every intermediate hop is a wasted crawl request.
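If your redirects live in a rule table (a plugin export or a server config, for example), collapsing chains is a mechanical rewrite. The sketch below assumes a simple dictionary of source-to-target rules, which is a hypothetical format; adapt it to however your redirects are actually stored.

```python
def flatten(redirects):
    """Point every redirecting URL straight at its final destination.

    `redirects` maps a source path to the path it redirects to.
    Following stops if a loop is detected, so loops still need a manual fix.
    """
    flat = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        while target in redirects and target not in seen:
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat

rules = {"/a": "/b", "/b": "/c"}
flat_rules = flatten(rules)
print(flat_rules)
```

After flattening, both /a and /b point directly at /c, so no request passes through an intermediate hop.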

Improve your internal linking to surface important pages. Googlebot follows internal links to discover and revisit pages. A page with no internal links pointing to it, an "orphaned page," is crawled infrequently even if it's important content. Make sure every page you want crawled regularly is linked from at least one other indexed page. An internal linking strategy that connects related content also helps Google understand your site's topical structure, which tends to increase crawl demand for your best content.
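Finding orphaned pages comes down to comparing the pages you have against the pages something links to. A minimal sketch, assuming you already have a map of each page's outgoing internal links (from a crawler export, say); the paths here are invented:

```python
def find_orphans(links, pages):
    """Return pages that no other page links to.

    `links` maps each page to the internal pages it links out to.
    Self-links are ignored, since they don't help discovery.
    """
    linked = set()
    for source, targets in links.items():
        linked.update(target for target in targets if target != source)
    return sorted(page for page in pages if page not in linked)

links = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-a", "/"],
    "/blog/post-a": ["/blog"],
    "/blog/post-b": ["/blog"],
}
pages = ["/", "/about", "/blog", "/blog/post-a", "/blog/post-b"]

orphans = find_orphans(links, pages)
print(orphans)
```

In this example /blog/post-b links out to the blog index, but nothing links to it, so it is the one orphan.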

Improve page speed. Googlebot's crawl rate is partially governed by how quickly your server responds. Fast servers get crawled more. The Page Speed Optimizer runs a real Lighthouse check via Google's API and shows you exactly where your pages are losing speed, with prioritized recommendations to fix the biggest issues first.

Keep your sitemap clean and current. Submit a sitemap to Search Console that contains only the URLs you want indexed. Remove URLs that return errors, have been redirected, or are intentionally noindexed. A clean sitemap helps Googlebot prioritize your most important pages.

Crawl Budget and Indexing Lag

It's worth separating crawl budget problems from indexing problems, because the fixes differ. A page can be crawled but not indexed (a content quality or duplicate content issue), or it can sit in the crawl queue for weeks before Googlebot gets to it (a crawl budget issue). Search Console's coverage statuses help distinguish between these: "Discovered, currently not indexed" means the URL is in the queue but hasn't been crawled yet. "Crawled, currently not indexed" means Googlebot visited but chose not to index.
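If you export your non-indexed URLs from Search Console, a quick tally by status tells you which kind of problem dominates. The sketch below assumes a list of (URL, status) pairs with the status labels spelled as they are in this article; the exact labels and format of a real export may differ, so adjust the keys to match.

```python
from collections import Counter

# Likely cause for each status, following the distinction drawn above.
LIKELY_CAUSE = {
    "Discovered, currently not indexed": "crawl budget or priority",
    "Crawled, currently not indexed": "content quality or duplication",
}

def triage(rows):
    """Count non-indexed URLs by likely cause. `rows` holds (url, status) pairs."""
    return Counter(LIKELY_CAUSE.get(status, "other") for _, status in rows)

# Invented sample rows.
rows = [
    ("/a", "Discovered, currently not indexed"),
    ("/b", "Crawled, currently not indexed"),
    ("/c", "Discovered, currently not indexed"),
]

summary = triage(rows)
print(dict(summary))
```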

Understanding how search engines crawl and index pages, across the full pipeline from discovery through rendering and indexing, helps you diagnose which stage of that process is creating the delay. Crawl budget is just one variable in a longer chain.

For new or recently relaunched sites, indexing lag is often a patience issue as much as a technical one. Google takes time to build trust in a new domain. That said, making sure your technical foundation is clean, your sitemap is submitted, your internal links are in good shape, and your pages load quickly gives Googlebot every reason to prioritize your site and return frequently.
