Search Engine Robots and Algorithms
Articles and resources about search engine robots, how to use robots.txt files and the behaviour of search engine algorithms.
Search Engine Robots - Archive
Archived articles and resources about search engine robots and how to use robots.txt files.
- Introducing smartphone Googlebot-Mobile
- Posted by Yoshikiyo Kato, Software Engineer. Google Webmaster Central Blog, Thursday, December 15, 2011. "With the number of smartphone users rapidly rising, we’re seeing more and more websites providing content specifically designed to be browsed on smartphones. Today we are happy to announce that Googlebot-Mobile now crawls with a smartphone user-agent in addition to its previous feature phone user-agents. This is to increase our coverage of smartphone content and to provide a better search experience for smartphone users."
- Submit URLs to Google with Fetch as Googlebot
- Written by Jonathan Simon & Susan Moskwa, Webmaster Trends Analysts. Google Webmaster Central Blog, Wednesday, August 3, 2011. "The Fetch as Googlebot feature in Webmaster Tools now provides a way to submit new and updated URLs to Google for indexing. After you fetch a URL as Googlebot, if the fetch is successful, you’ll now see the option to submit that URL to our index. When you submit a URL in this way Googlebot will crawl the URL, usually within a day. We’ll then consider it for inclusion in our index. Note that we don’t guarantee that every URL submitted in this way will be indexed; we’ll still use our regular processes—the same ones we use on URLs discovered in any other way—to evaluate whether a URL belongs in our index..."
- Controlling crawling and indexing now documented on code.google.com
- Posted by Jonathan Simon, Webmaster Trends Analyst. Google Webmaster Central Blog, Wednesday, November 24, 2010. "Do you know how Google's crawler, Googlebot, handles conflicting directives in your robots.txt file? Do you know how to prevent a PDF file from being indexed? Do you know Googlebot's favorite song? The answers to these questions (except for the last one :)), along with lots of other information about controlling the crawling and indexing of your site, are now available on code.google.com..."
- 301 Redirect or Rel=Canonical - Which One Should You Use?
- Posted by Paddy Moogan. SEOmoz - The Daily SEO Blog, November 14, 2010. "There has been quite a lot of discussion lately about the use of rel=canonical and we've certainly seen a decent amount of Q&A from SEOmoz members on the subject. Dr. Pete of course blogged about his rel-canonical experiment which had somewhat interesting results and Lindsay wrote a great guide to rel=canonical. Additionally, there seem to be a few common problems that are along the following lines - When should I use a rel canonical tag over a 301? Is there a way that the rel canonical tag can hurt me? When should I not use the canonical tag? What if I can't get developers to implement 301s? I'm going to attempt to answer these questions here..."
- Serious Robots.txt Misuse and High Impact Solutions
- Posted by Lindsay. SEOmoz - The Daily SEO Blog, October 11, 2010. "... The robots.txt protocol was established in 1994 as a way for webmasters to indicate which pages and directories should not be accessed by bots. To this day, respectable bots adhere to the entries in the file... but only to a point. Your Pages Could Still Show Up in the SERPs. Bots that follow the instructions of the robots.txt file, including Google and the other big guys, won't index the content of the page but they may still put the page in their index. We've all seen these limited listings in the Google SERPs..."
- Why You Should Prevent Certain Pages From Being Indexed
- by Mathieu Burgerhout. Search Engine Watch, October 7, 2010. "Every website has more important pages and less important pages. Unimportant pages are an unavoidable part of the hierarchy or structure of your website. It's only harmful when you don't recognize these kinds of pages. Here's how you can determine which of your pages are unimportant and prevent them from being indexed. The result will be a clean, mean, money-making online conversion machine..."
- Robots Exclusion Protocol Guide - in PDF format (204 KB)
- (This document requires the use of Adobe Acrobat Reader). Bruce Clay Australia, 2010. "The Robots Exclusion Protocol (REP) is a very simple but powerful mechanism available to webmasters and SEOs alike. Perhaps it is the simplicity of the file that means it is often overlooked and often the cause of one or more critical SEO issues. To this end, we have attempted to pull together tricks, tips and examples to assist with the implementation and management of your robots.txt file. As many of the non‐standard REP declarations supported by Google, Yahoo and Bing may change, we will be providing updates to this in the future..."
- Crawl, Index, Rank, Repeat: A Tactical SEO Framework (Part 2)
- By Adam Audette, Search Engine Watch, June 8, 2010. "... Indexation is the next priority in this discipline, and duplicate content is far and away the largest issue to be addressed. It's probably not an exaggeration to state that all large sites have some sort of duplication, either intentional or otherwise..."
- An Illustrated Guide to Matt Cutts' Comments on Crawling & Indexation
- Posted by randfish. SEOmoz Blog, March 17, 2010. Rand provides interpretations in cartoon and graphical form to illustrate the key points Matt Cutts made in an interview with Eric Enge about how Google crawls and indexes web content.
- Exclusive: How Google's Algorithm Rules the Web
- By Steven Levy. Wired Magazine, February 22, 2010. "Want to know how Google is about to change your life? Stop by the Ouagadougou conference room on a Thursday morning. It is here, at the Mountain View, California, headquarters of the world's most powerful Internet company, that a room filled with three dozen engineers, product managers, and executives figure out how to make their search engine even smarter. This year, Google will introduce 550 or so improvements to its fabled algorithm, and each will be determined at a gathering just like this one..."
- Tips for Rapid Crawling and Indexing
- By Erik Dafforn, ClickZ, November 25, 2009. "You can add content to your site in two different ways: update an existing page (URL), or add content on entirely new URLs. While new content should eventually be found during the engines' crawling cycles, you can do many things to help engines notice the new content or new URLs more quickly than they would on their own..."
- See What Googlebot Sees On Your Site
- by Vanessa Fox. Search Engine Land, October 13, 2009. "Google Webmaster Tools has just launched a 'labs' section, where you'll find new features that may be early in the development cycle and not quite as robust as the rest of the tools. The features available so far are Fetch as Googlebot, which lets you see exactly what Googlebot is served when it requests a URL from your server and Malware Details, which shows you malicious code snippets from your site if it's been flagged as containing malware..."
- Prevent a bot from getting 'lost in space' (SEM 101)
- by Webmaster Center team. Bing Community Blogs, August 20, 2009. Discusses how to make best use of the robots.txt file.
- Crawl delay and the Bing crawler, MSNBot
- by Webmaster Center team. Bing Community Blogs, August 10, 2009. "Search engines, such as Bing, need to regularly crawl websites not only to index new content, but also to check for content changes and removed content. Bing offers webmasters the ability to slow down the crawl rate to accommodate web server load issues..."
- Whiteboard Friday - Matt Cutts on NoFollow
- Posted by great scott! SEOmoz Blog, August 13, 2009. "This week we've got a special treat! Live from the halls of SES San Jose, our own Jen Lopez sits down with the one-and-only Matt Cutts to discuss NoFollow..."
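Several of the articles above deal with robots.txt rules and the Crawl-delay directive. As a rough illustration only, the sketch below uses Python's standard urllib.robotparser module to check a made-up robots.txt file. The file contents, the example.com URLs and the build_parser helper are all assumptions for the example, and the module is a generic parser that may not match how Googlebot or MSNBot actually interpret a given file. Note also that a Disallow rule only controls crawling; as the SEOmoz article above points out, a blocked page can still be listed in search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 5
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def build_parser(text):
    """Return a RobotFileParser loaded from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# msnbot matches its own record: /private/ is blocked, other paths are allowed.
print(parser.can_fetch("msnbot", "http://example.com/private/page.html"))  # False
print(parser.can_fetch("msnbot", "http://example.com/index.html"))         # True

# Any other robot falls back to the "*" record, which blocks /tmp/.
print(parser.can_fetch("Googlebot", "http://example.com/tmp/report.html")) # False

# Crawl delay (in seconds) requested for msnbot in this sample file.
print(parser.crawl_delay("msnbot"))                                        # 5
```

To check a live site instead of a string, the same class offers set_url() and read(), which fetch the file over the network before can_fetch() is called.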
This category last updated: 16 December 2011