When you enter a word into Google’s search engine, a whole set of tools and processes sets to work to find the information you want, even if the result is not spot on at the first attempt. These processes start long before you type your search term, and they finish when the results are displayed on screen.
A key part of this software is Googlebot, a system of Web ‘spiders’ or ‘crawlers’ that scan the Internet continuously in search of new pages to add to the immense library from which Google draws the results that best match your search. Its role is to include new references, register any changes and delete obsolete links.
To carry out their job, Google’s digital arachnids access and analyze the content of websites in just a few seconds. Theoretically, there’s nothing suspicious about any of this, but what if Google’s robot spiders are really imposters?
According to a study of Googlebot activity, which examined more than 50 million crawler visits to 10,000 websites every month, some 4% of the visitors claiming to be Googlebot are not what they claim to be.
Moreover, out of all these imposters, some 23.5% are being used by hackers to carry out distributed denial of service (DDoS) attacks. By masquerading as Googlebot, the fake spiders can access the servers hosting the Web files through the same port as the legitimate ones.
As with everything that circulates on the Web, these crawlers (whether they’re good or bad) enter the Web servers through a connection with a certain bandwidth.
The difference is that Google’s spiders access folders and files while taking care not to saturate the service, whereas those responsible for DDoS attacks do just the opposite: they send large amounts of data over a short period of time to use up the server’s full data transmission capacity and cause it to crash.
This recent research is just one indication that these types of attacks are becoming very common on the Web. One reason for this is that if the creator of a Web page wants to have any kind of impact on the Internet, it is impossible to avoid Google’s Web crawlers. If these spiders can’t access the content of a website, it will no longer be indexed in the search engine, and, as we have mentioned before, “if you’re not in Google, you don’t exist”.
If, however, a webmaster would still prefer to avoid such intrusion, it is possible to do so using a robots.txt file. By saving this file in the site’s root directory, you can block access by Google’s Web crawlers, though of course you then become practically invisible on the Web.
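As an illustration, a minimal robots.txt that asks Google’s crawler to stay away from an entire site would look like this (the directives follow the standard Robots Exclusion Protocol):

```
User-agent: Googlebot
Disallow: /
```

Using `User-agent: *` instead would address all well-behaved crawlers, not just Google’s. Note that robots.txt is only a request: legitimate spiders honor it, but the imposters described above have no reason to.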
It is also true, however, that there is now an increasing number of security tools that can identify genuine Google crawlers by cross-checking the crawler’s source IP address (a set of numbers that acts as a kind of ID card for each computer on the network). This helps establish whether a visit has really come from Google or is an attack from an unknown source.
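A minimal sketch of this cross-check, assuming the reverse-then-forward DNS confirmation that Google documents for its crawlers: the IP’s reverse DNS name should end in googlebot.com or google.com, and that name should resolve back to the same IP. The function and domain list below are illustrative, not part of any official tool:

```python
import socket

# Domains Google publishes for its crawler hostnames.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")

def hostname_is_google(hostname):
    """Return True if a reverse-DNS hostname falls under Google's crawler domains."""
    return hostname.rstrip(".").endswith(GOOGLE_CRAWLER_DOMAINS)

def verify_googlebot(ip):
    """Cross-check a visitor IP claiming to be Googlebot.

    1. Reverse DNS: look up the hostname for the IP.
    2. Check the hostname belongs to a Google crawler domain.
    3. Forward-confirm: the hostname must resolve back to the same IP
       (otherwise the reverse record could be forged).
    """
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False  # no reverse DNS record at all
    if not hostname_is_google(hostname):
        return False  # hostname is outside Google's crawler domains
    try:
        return ip in socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False  # hostname does not resolve back
```

A suspicious suffix like `googlebot.com.evil.net` fails the check because only a genuine `.googlebot.com` or `.google.com` ending is accepted; the forward-confirmation step then catches forged reverse-DNS records.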