Rogerbot
What's Covered?
In this guide you’ll learn more about our crawler, Rogerbot, which is used to crawl your site for your Campaign Site Crawl as well as On-Demand Crawl. For more information regarding our index crawler, Dotbot, please see our Dotbot guide.
The Moz Site Audit Crawler
Rogerbot is the Moz crawler for Moz Pro Campaign site audits. It is different from Dotbot, which is our web crawler that powers our Links index. Rogerbot accesses the code of your site to deliver reports back to your Moz Pro Campaign. This helps you learn about your site and teaches you how to fix problems that might be affecting your rankings. Rogerbot serves up data for your Site Crawl report, On-Demand Crawl, Page Optimisation report and On-Page Grader.
Telling Rogerbot What To Do With Your Robots.txt File
Rogerbot is built to obey robots.txt files. You can use this marvellous file to inform bots of how they should behave on your site. It's a bit like a code of conduct: you know, take off your shoes, stay out of the dining room, and get those elbows off the table, gosh darnit! That sort of thing.
Every site should have a robots.txt file. You can check this is in place by going to yoursite.com/robots.txt. You can also check the robots.txt file of any other site, just for kicks. For example: moz.com/robots.txt, facebook.com/robots.txt, and yes, even google.com/robots.txt. Anyone can see your robots.txt file as well; it's publicly available, so bear that in mind.
If your site doesn't have a robots.txt file, or your robots.txt file fails to load or returns an error, we may have trouble crawling your site. This can also generate errors that bloat your server logs. You'll want to have some content in the file, as a blank file might confuse someone checking whether your site is set up correctly; they may think it's an error. A file configured with some content is preferable, even if you're not blocking any bots.
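As an illustration, a minimal, permissive robots.txt might look like the example below. The Sitemap line is optional and assumes your sitemap lives at that URL, so adjust it for your own site.

User-agent: *
Disallow:

Sitemap: https://yoursite.com/sitemap.xml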
The Rogerbot User-agent
To talk directly to rogerbot, or to our other crawler, dotbot, you can call them out by name, also known as the User-agent. These are our crawlers: User-agent: rogerbot and User-agent: dotbot. So far, so good.
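For example, if you wanted to give each of our crawlers its own rule in the same robots.txt file, the blocks might look something like this (the disallowed folders here are just placeholders for whatever you want each crawler to skip):

User-agent: rogerbot
Disallow: /private/

User-agent: dotbot
Disallow: /archive/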
Allowing Rogerbot
To tell rogerbot that it can crawl all the pages on your site, you need to say "user-agent: rogerbot, there are no pages on this site you are not privy to, go wild!"
If the field after disallow: is blank, that specifically means no URLs should be blocked.
It looks like this in bot language:
User-agent: rogerbot
Disallow:
Block Rogerbot From Crawling Your Site
If you’re sick of rogerbot crawling your site, you can block the crawler by adding a slash ("/") after the disallow directive in your robots.txt file. That's saying: "Rogerbot, you can't get to any of these pages, all pages on this site are not for you, stay away, buddy boy."
Blocking rogerbot with your robots.txt file looks like this:
User-agent: rogerbot
Disallow: /
Note the slash denoting the root of the site. Adding this code will prevent rogerbot from crawling your website.
You can also exclude rogerbot from parts of your site, like subfolders.
User-agent: rogerbot
Disallow: */marketplace/*
This syntax tells only our rogerbot crawler not to crawl any pages whose URL contains this string, such as www.example.com/marketplace/.
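If the section you want to exclude always sits directly under your root, you can also skip the wildcards and disallow the path itself, which blocks every URL beginning with that path. A minimal sketch:

User-agent: rogerbot
Disallow: /marketplace/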
We also recommend checking your robots.txt file in this handy Robots Checker Tool once you make changes to avoid any nasty surprises.
Crawl Delay To Slow Down Rogerbot
We want to crawl your site as fast as we can, so we can complete a crawl in good time, without causing issues for your human visitors.
If you want to slow rogerbot down, you can use the Crawl Delay directive. The following directive would only allow rogerbot to access your site once every 10 seconds:
User-agent: rogerbot
Crawl-delay: 10
Bear in mind that there are 86,400 seconds in a day, so this would allow Rogerbot to access at most 8,640 pages in a single day. That means it could take a while to crawl your site if you have many pages to crawl.
We don't recommend adding a crawl delay larger than 30 seconds. At 30 seconds, rogerbot can only fetch around 2,880 pages per day (86,400 ÷ 30), so it might not be able to finish the crawl of your site.
IP Range for Rogerbot
Unfortunately we do not use a static IP address or range of IP addresses, as we have designed our crawler to take a dynamic approach. This means we use thousands of dynamic IP addresses which change each time we run a crawl. This approach gives us the best dynamic view of the web, but it can make us incompatible with some servers or hosting providers.
The best option available is to identify our crawler by User-agent: rogerbot.
Block Rogerbot From Dynamic Pages
The best way to prevent our crawler from accessing these dynamically generated pages is to block them using the disallow directive in your robots.txt file. It would look something like this:
User-agent: rogerbot
Disallow: /TAG TYPE
Repeat this for each of the parameters or tags that may be causing these errors, until all of them are blocked. You can also use the wildcard user-agent (*) to block all crawlers from those pages, if you prefer.
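For instance, if session IDs or sort parameters were generating those errors, the rules might look something like this. The parameter names here are only examples, so swap in the ones your site actually uses.

User-agent: rogerbot
Disallow: /*?sessionid=
Disallow: /*?sort=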
Block All Bots Except Rogerbot
Make sure the user-agent-specific directive for rogerbot appears above the all-bots (User-agent: *) directive.
User-agent: rogerbot
Disallow:

User-agent: *
Disallow: /
Does rogerbot support the 'allow' directive?
Yes, rogerbot does support the 'allow' directive.
To allow pages to be crawled within a directory, while disallowing rogerbot from the rest of the directory, you can add something like this to your robots.txt file:
User-agent: rogerbot
Allow: /category/illustration/
Disallow: /category/