Rogerbot

Frequently Asked Questions

  • Does Rogerbot crawl from a set IP range? No, we do not have an IP range for Rogerbot as we do not use a static IP address or range of IP addresses. The best option available is to identify our crawler by User-agent: rogerbot.
  • Does Rogerbot obey robots.txt? Yes, Rogerbot is built to obey robots.txt files; read more on how to do this below.
  • Can I put my full domain in a disallow directive? No. A full domain in a disallow directive is parsed as a full Disallow: / (see the example after this list).
  • How do I spot Rogerbot in my server logs? Keep an eye out for rogerbot/1.2 when digging through your logs.
  • Which robots.txt record will Rogerbot follow? Bots look for records that match their user-agent. If they don’t find one, they fall back to the User-agent: * record, which applies to all bots. To help them find their own record, make sure it sits above User-agent: *.
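
For example, here is the full-domain point in robots.txt terms. This is only a sketch: example.com and /private/ are placeholder names, so swap in your own path.

    # Don't do this: a full domain is parsed as Disallow: / and blocks everything
    # Disallow: https://www.example.com/private/
    #
    # Do this instead: use a path relative to the root of the site
    User-agent: rogerbot
    Disallow: /private/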

What's Covered?

In this guide you’ll learn more about our crawler, Rogerbot, which is used to crawl your site for your Campaign Site Crawl as well as On-Demand Crawl. For more information regarding our index crawler, Dotbot, please see our Dotbot guide.

The Moz Site Audit Crawler

Rogerbot is the Moz crawler for Moz Pro Campaign site audits. It is different from Dotbot, which is our web crawler that powers our Links index. Rogerbot accesses the code of your site to deliver reports back to your Moz Pro Campaign. This helps you learn about your site and teaches you how to fix problems that might be affecting your rankings. Rogerbot serves up data for your Site Crawl report, On-Demand Crawl, Page Optimisation report and On-Page Grader.

Telling Rogerbot What To Do With Your Robots.txt File

Rogerbot is built to obey robots.txt files. You can use this marvellous file to inform bots of how they should behave on your site. It's a bit like a code of conduct: you know, take off your shoes, stay out of the dining room, and get those elbows off the table, gosh darnit! That sort of thing.

Every site should have a robots.txt file. You can check this is in place by going to yoursite.com/robots.txt. You can also check the robots.txt file of any other site, just for kicks. For example: moz.com/robots.txt, facebook.com/robots.txt, and yes, even google.com/robots.txt. Anyone can see your robots.txt file as well; it's publicly available, so bear that in mind.

If your site doesn't have a robots.txt file, or your robots.txt file fails to load or returns an error, we may have trouble crawling your site. This can also cause errors that bloat your server logs. You will want to have some content in the file, as a blank file might confuse someone checking whether your site is set up correctly; they may think it's an error. A file configured with some content is preferable, even if you're not blocking any bots.
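
As a minimal example, a robots.txt file that simply allows every crawler to access everything looks like this (add your own rules as needed):

    User-agent: *
    Disallow: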

The Rogerbot User-agent

To talk directly to rogerbot, or our other crawler, dotbot, you can call them out by their name, also called the User-agent. These are our crawlers: User-agent: rogerbot and User-agent: dotbot. So far, so good.
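
If you want to give each of our crawlers its own instructions, you can address them in separate records within the same robots.txt file. Here's a sketch, where /private/ stands in for whatever folder you'd like to keep them out of:

    User-agent: rogerbot
    Disallow: /private/

    User-agent: dotbot
    Disallow: /private/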

Allowing Rogerbot

To tell rogerbot that it can crawl all the pages on your site, you need to say "user-agent: rogerbot, there are no pages on this site you are not privy to, go wild!"

If the field after disallow: is blank, that specifically means no URLs should be blocked.

It looks like this in bot language:

          
    User-agent: rogerbot
    Disallow:

Block Rogerbot From Crawling Your Site

If you’re sick of rogerbot crawling your site, you can block the crawler by adding a slash ("/") after the disallow directive in your robots.txt file. That's saying: "Rogerbot, you can't get to any of these pages, all pages on this site are not for you, stay away, buddy boy."

Blocking rogerbot with your robots.txt file looks like this:

          
    User-agent: rogerbot
    Disallow: /

Note the slash denoting the root of the site. Adding this code will prevent Rogerbot from being able to crawl your website.

You can also exclude rogerbot from parts of your site, like subfolders.

          
    User-agent: rogerbot
    Disallow: */marketplace/*

This syntax tells only our Rogerbot crawler not to crawl any pages whose URL contains /marketplace/, such as www.example.com/marketplace/.

We also recommend checking your robots.txt file with a handy robots.txt checker tool once you make changes, to avoid any nasty surprises.

Crawl Delay To Slow Down Rogerbot

We want to crawl your site as fast as we can, so we can complete a crawl in good time, without causing issues for your human visitors.

If you want to slow rogerbot down, you can use the Crawl Delay directive. The following directive would only allow rogerbot to access your site once every 10 seconds:

          
    User-agent: rogerbot
    Crawl-delay: 10

Bear in mind that there are 86,400 seconds in a day, so a 10-second crawl delay would allow Rogerbot to access at most 8,640 pages in a single day. If your site has a lot of pages to crawl, it could take a while to finish.

We don't recommend adding a crawl delay larger than 30 seconds, or rogerbot might not be able to finish the crawl of your site.
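
Crawl-delay can also be combined with other directives in the same record, so you can slow Rogerbot down and keep it out of a section of the site at the same time. A sketch, with /search/ as a hypothetical folder:

    User-agent: rogerbot
    Crawl-delay: 10
    Disallow: /search/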

IP Range for Rogerbot

Unfortunately we do not use a static IP address or range of IP addresses, as we have designed our crawler to have a dynamic approach. This means we use thousands of dynamic IP addresses which will change each time we run a crawl. This approach gives us the best dynamic view of the web, but it can make us incompatible with some servers or hosting providers.

The best option available is to identify our crawler by User-agent: rogerbot.

Block Rogerbot From Dynamic Pages

The best way to prevent our crawler from accessing dynamically generated pages, such as those created by tags or URL parameters, is to block it from accessing them using the Disallow directive in your robots.txt file. It would look something like this:

          
    User-agent: rogerbot
    Disallow: /TAG TYPE

Repeat this, swapping in each parameter or tag, until you have blocked all of the URL patterns that are causing the issue. You can also use the wildcard User-agent: * to block all crawlers from those pages, if you prefer.
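
As a concrete sketch, if the dynamic pages are generated by query parameters such as ?tag= or ?sort= (hypothetical names, so substitute the parameters your site actually uses), the record might look like this:

    User-agent: rogerbot
    Disallow: /*?tag=
    Disallow: /*?sort=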

Block All Bots Except Rogerbot

Make sure the rogerbot-specific record sits above the catch-all User-agent: * record, so bots reading the file find their own instructions first.

          
    User-agent: rogerbot
    Disallow:

    User-agent: *
    Disallow: /

Does Rogerbot Support The Allow Directive?

Yes, rogerbot does support the 'allow' directive.

To allow pages to be crawled within a directory, while disallowing rogerbot from the rest of the directory, you can add something like this to your robots.txt file:

          
    User-agent: rogerbot
    Allow: /category/illustration/
    Disallow: /category/
