Search Engines: Support, Tools, and Services

SEO involves using tools that help us do a good job. There are plenty of tools available and, surprisingly or not, some are provided by the search engines themselves, which reinforces the idea that this is a reciprocal relationship. The tools provided by search engines are particularly helpful, simply because search engines want webmasters to create easily accessible sites. Along with the tools, guidance and analytics are also provided for free; after all, without good sites, search engines would not exist. At the end of the day, optimal conditions are created for sites and search engines to exchange information, and unique business opportunities are born in the process.

Search Engines: Support and Services

The most important search engines, among which we find Google and Bing, support certain protocols that make the SEO relationship between search engines and webmasters work more effectively. Here we present the most common of those elements.

Sitemaps

The expression "site map" is well known and self-explanatory, but it refers to something different from the sitemap we are talking about here, so we should not confuse the two. A sitemap is simply a file that tells search engines how the content of a site is organized so that it can be crawled. These files are search engine friends: they help them find the content on the site and classify it. Search engines can discover pages by following links, but a sitemap makes that job much easier, which says a lot about its importance. Sitemaps come in different formats according to the content they want to highlight, and there are three main kinds: XML, RSS, and TXT. Full details regarding sitemaps for mobile, images, news, or video can be found at sitemaps.org, and if we want to build our own sitemaps we can do it at xml-sitemaps.com.

XML

  • XML stands for Extensible Markup Language and defines encoding rules for documents. Basically, it's a format that can be read by both humans and machines, people and search engines alike. One of its advantages is that it's the most widely accepted format for sitemaps.
  • Another advantage is that XML was designed to keep things simple and to improve the usability of the internet. This textual data format is focused on documents, but it's also widely used to represent the data structures used by web services.
  • The one disadvantage XML has is that it can generate very large files, because every element needs an opening and a closing tag. Naturally, the larger the file, the bigger the chance that something goes wrong. A minimal example of an XML sitemap is sketched right after this list.
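To make this more concrete, here is a minimal sketch of what an XML sitemap might look like. The example.com URLs and dates are made up for illustration; the tags used (urlset, url, loc, lastmod) follow the format documented at sitemaps.org.

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per page we want the search engines to know about -->
      <url>
        <loc>http://example.com/</loc>
        <lastmod>2017-01-15</lastmod>
      </url>
      <url>
        <loc>http://example.com/about</loc>
        <lastmod>2017-01-10</lastmod>
      </url>
    </urlset>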

RSS

  • RSS stands for Rich Site Summary, although it's often read as Really Simple Syndication, and it's nothing more than a web feed that lets users access standardized online content. RSS feeds are best known as the source for news aggregators that check for new content, a process known as web syndication.
  • An obvious advantage of RSS is that RSS sitemaps update automatically. As a disadvantage, they are seen as harder to manage precisely because they are always updating. A rough sketch of such a feed follows this list.
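As a rough sketch, an RSS 2.0 feed used as a sitemap might look like the following; the example.com URLs, titles, and date are hypothetical.

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>Example Site</title>
        <link>http://example.com/</link>
        <description>Latest pages from our hypothetical site</description>
        <!-- Each new page or post is announced as an <item> -->
        <item>
          <title>New page on the site</title>
          <link>http://example.com/new-page</link>
          <pubDate>Mon, 16 Jan 2017 10:00:00 GMT</pubDate>
        </item>
      </channel>
    </rss>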

TXT

  • TXT is actually an easy one, as it stands for text; more specifically, it's a plain text file like the ones we open in a notepad. The advantage of this kind of sitemap is that it is very simple, consisting of one URL per line, and each file supports up to 50,000 lines. The disadvantage, precisely because of that simplicity, is that it's not possible to add metadata to the pages. An example is shown right below.
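A text sitemap for a hypothetical example.com site would be nothing more than a plain list of URLs, one per line:

    http://example.com/
    http://example.com/about
    http://example.com/contact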

Robots.txt

Robots.txt is a file stored in the site’s root directory (let’s say http://casinoseu.net/robots.txt) that gives instructions to the web crawlers that come to our site to index it.

Webmasters have this tool to communicate with search engines. It’s basically a map that tells the bots where to crawl and where not to crawl.

The commands are given in the form of Disallow (which prevents bots, the compliant ones at least, from accessing certain pages), Sitemap (which shows the location of the sitemap or sitemaps existing on that site), and Crawl-Delay (which tells the bot how fast it is allowed to crawl the server). Exciting and simple stuff, actually! A sketch of such a file is shown below.
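Putting those commands together, a simple robots.txt might look like the sketch below. The paths and the delay value are hypothetical, and Crawl-delay in particular is only honored by some search engines.

    # Applies to all compliant bots
    User-agent: *
    Disallow: /admin/
    Disallow: /private/
    Crawl-delay: 10

    # Where our sitemap lives
    Sitemap: http://example.com/sitemap.xml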

When mentioning bots, we are talking about the good bots that follow the guidelines from search engines. Apart from search engine professionals, there are also people with bad intentions who build bots that not only ignore those directions and protocols, but whose very goal is to scrape information from our site, such as e-mail addresses they can later use for spam.

This is the reason why bots are only allowed to crawl certain sections of the site. Parts of the site where sensitive information is stored should not be publicly accessible in the first place, and are better left out of the robots.txt file, since that file can be read by anyone, including the bad bots.

Meta-Robots

Meta robots tags have the purpose of keeping search engines out of the sensitive content areas of the site; those meta-tags tell them to keep out. They are placed in the head section of HTML documents, which means they are instructions given page by page.

Meta robots tags can give robots several instructions, which can be generic for all search engines or specific to just some of them. For example, a meta robots tag can tell the bots noindex, nofollow, which means that the particular page should not be indexed and that the links on that page should not be followed.
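A page carrying that instruction would include, in its head section, a tag along these lines (a minimal sketch, not tied to any particular site):

    <head>
      <!-- Tells all compliant bots not to index this page or follow its links -->
      <meta name="robots" content="noindex, nofollow">
    </head>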

It's pretty cool what happens behind the pages we visit, all this hidden communication that makes search engines work.

As mentioned above, there are robots meta-tags that work for all search engines, like index, nofollow, noarchive, and so on, but there are also meta-tags specific to each search engine, such as noydir (for the Yahoo! Directory) and nosnippet (specific to Google). This is the language spoken between webmasters and search engine engineers, between sites and search engines.
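For the engine-specific case, the tag simply names the crawler instead of the generic robots value; a small sketch aimed at Google's bot could look like this:

    <!-- Only Googlebot is asked not to show a snippet for this page -->
    <meta name="googlebot" content="nosnippet">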

Rel=”Nofollow”

In Chapter 7 of this guide, we saw that links work pretty much like votes in a popularity contest or an election: they help build trust in the site. You can refer back to that chapter for more information.

Rel=nofollow basically allows a resource to be linked while removing the vote, meaning it doesn't count. The reason for this is that certain links deserve less value, and this attribute tells that to the search engine so the vote is not counted, even though the linked pages can still be crawled and indexed.
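In practice, the attribute is added directly to the link; the URL below is just a placeholder:

    <!-- The page can still be reached through this link, but no "vote" is passed -->
    <a href="http://example.com/some-page" rel="nofollow">Some page</a>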

Rel=”canonical”

Rel=”canonical” is a very important element for telling the search engines that some pages on our site are duplicates of each other; they just have different URLs.

Search engines see those copies as separate pages, and we know duplicate content is worth less to them than original content, so it's important to tell the bots which version of the page should count for indexing and ranking.
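The tag goes in the head section of each duplicate page and points to the version we want to count; the URLs here are illustrative only:

    <!-- Placed on http://example.com/page?ref=newsletter, pointing to the original -->
    <link rel="canonical" href="http://example.com/page">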

Search Engines: The Tools

We have already talked about the reciprocal relationship between sites and search engines, and another proof of it is the fact that search engines provide tools for webmasters to work on their SEO and on their search results. Naturally, search engines can only provide the tools; it's not their responsibility to do the SEO, and we also encourage webmasters and marketers to learn SEO on their own, beyond the tools provided by the search engines.

Google Search Console

The Google Search Console has some key features. One of them is Geographic Target. Every site targets certain users, and when the webmaster is targeting users in a specific location, they can give Google that information.

By doing this, it becomes easier to determine how well the site appears in searches made in that specific country, and also how to improve the results for that geographical area.

Another key feature of the Google Search Console is the Preferred Domain. This is the version of the domain (with or without the www prefix) that the webmaster wants Google to use when indexing the site's pages, and the setting tells Google exactly that.

Some other key features of the Google Console are URL Parameters (site parameters that help Google crawl more efficiently), Crawl Rate (the speed at which Google's robots crawl the site), and Malware (if malware is found, Google will let us know; not only is malware a terrible user experience, it can also hurt the site's ranking).

The last key features of this console are Crawl Errors (if the bots find 404 errors or other significant ones, we get a report with that information) and HTML Suggestions (search-unfriendly HTML code is also reported to the webmaster so that it can be fixed).

There are other services provided by the Google Search Console, and one of them is the set of statistics given by a feature called Your Site on the Web. These statistics offer webmasters insights (click-through rates, linking statistics, keyword impressions, top search results pages, and so on) that allow them to better understand the results of their SEO on Google.

Google also offers us something called Site Configuration, which allows us to easily submit robots.txt files and sitemaps, request a change of address when moving the site to a new domain, and adjust sitelinks.

+1 Metrics is essentially Google's effort to get everyone more involved with its social network, Google+. +1 Metrics annotates the sharing of links on Google+ in the search results, which can benefit the visibility of our site. This section is all about explaining the effect of +1 sharing on our site's search results performance.

Google Search Console has another section called Labs. As the name says, here we find experimental but nonetheless useful reports. Among them there is one called Site Performance, which indicates the loading times experienced by visitors, something that certainly shapes the quality of the site's usability.

Bing Webmaster Tools

Bing doesn't have a Search Console; it calls its equivalent Webmaster Tools. Here too we find some key features, including Sites Overview (an overview of the performance of all our sites in Bing search results, with metrics like indexed pages, clicks, pages crawled per site, and impressions) and Crawl Stats (where we can view reports on the number of crawled pages, as well as on the errors found on those pages; similarly to what happens with the Google Console, sitemaps can be submitted here to help Bing explore and organize our content).

The Index section is where webmasters can understand how Bing indexes their pages and organizes our content, add or remove URLs from the search results, and adjust other parameters. Traffic is where the data reports live. These reports combine data from Bing and Yahoo! searches, indicating the site's average position as well as cost estimates for those wanting to buy ads for specific keywords.

Given all the advantages we get from doing SEO, as well as all the tools available, there is really no excuse for our sites not to appear at the top of the search results.