Many people are still confused about what spider traps are, so the sections below explain how to identify the spider traps that occur most often and how to address them. These issues can be difficult to identify and fix, especially on large sites, but solutions do exist. If you work in SEO or web development, or simply want to maximize your search engine performance, this article is for you.
Keyword Search Spider Trap
Most websites have a search function, but many developers forget that search result pages should not be crawled and indexed by search engines. Left unchecked, this causes one of the worst spider trap issues, because it allows anyone to add indexable content to your website without even being logged in.
A search result page displays the searched term on the page and generates a unique URL for it. Anyone can therefore manipulate the data in the URL so that Google crawls and indexes a page containing text of their choosing.
Google itself doesn’t have to worry about search engine rankings, but this issue is a real concern for smaller businesses that care about their rankings on the SERPs. There is also a high risk that someone with malicious intent could exploit it to manipulate search engine rankings.
Besides being indexed deliberately by someone exploiting this spider trap, search result pages can also be indexed automatically by search engines without any human intervention.
How to Identify Keyword Search Spider Traps
When you are auditing an established website with a keyword search function, there is a high possibility that this problem may already exist. To identify a keyword search spider trap easily, you can use Google search operators.
During a site audit, check whether the website’s search function generates unique URLs, and whether a common character or phrase is included in those URLs.
For example, the word ‘search’ may appear in the URL after a search is carried out, in which case the query that you type into Google will look like this: ‘site:websiteaddress.com inurl:search’.
If the results are inconclusive, or if you still suspect there may be an issue, you can use Google Webmaster Tools to try to index a search result page.
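When auditing several sites or URL patterns, the operator query described above can be assembled programmatically. The following is only a small sketch: the function names are hypothetical, and the Google search URL format is assumed to be the familiar `https://www.google.com/search?q=...` form.

```python
from urllib.parse import quote_plus


def build_site_query(domain: str, url_fragment: str) -> str:
    """Return a Google query limited to `domain` whose URLs contain `url_fragment`."""
    return f"site:{domain} inurl:{url_fragment}"


def build_search_url(query: str) -> str:
    """Return a URL that opens the query in Google (assumed URL format)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = build_site_query("websiteaddress.com", "search")
    print(query)
    print(build_search_url(query))
```

Swap ‘search’ for whichever character or phrase the site's search URLs have in common.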
How to Fix a Keyword Search Spider Trap
Fortunately, in many cases this spider trap is easy to fix, and the techniques below will help:
- Add noindex, nofollow meta data to search result pages and get the site re-crawled. This should cause the offending pages to drop out of the search engine result pages. You also have the option of manually removing pages via Google Webmaster Tools.
- Once the site has been re-crawled and the offending pages have dropped out of the index, block the pages via robots.txt to prevent further crawling.
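The two steps above can be sketched as follows. The meta tag is the standard robots meta element. The robots.txt rules assume, purely for illustration, that search results live under a `/search` path, and `is_crawl_allowed` is a hypothetical helper that checks those rules with Python's standard-library parser before you deploy them.

```python
from urllib.robotparser import RobotFileParser

# Step 1: tag to place in the <head> of every search result page.
ROBOTS_META_TAG = '<meta name="robots" content="noindex, nofollow">'

# Step 2 (only after the pages have dropped out of the index):
# hypothetical robots.txt rules, assuming search results live under /search.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.websiteaddress.com/search?q=seo",
        "https://www.websiteaddress.com/shop/t-shirts",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, url))
```

The order matters: a crawler that is blocked by robots.txt cannot fetch the page, so it never sees the noindex tag. That is why the block is applied only after the pages have left the index.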
Dynamically Inserted Content Spider Trap
If a URL that should return a 404 instead returns a 200 OK status code, you likely have a dynamically inserted content spider trap. This can happen for many reasons and is often an oversight during the development process.
How to Identify a Dynamically Inserted Content Spider Trap
This spider trap tends to appear on websites that dynamically populate content based on the URL path. A telltale sign is extremely similar (boilerplate) content across a section of the website.
Here is an example of how it might look:
Website URL1: www.websiteaddress.com/shop/SEO/t-shirts
Content example: Shop online today for a range of SEO t-shirts at incredibly low prices.
Website URL2: www.websiteaddress.com/shop/PPC/t-shirts
Content example: Shop online today for a range of PPC t-shirts at incredibly low prices.
You can change the URL to whatever you want and the information in the URL will be dynamically inserted into the content area.
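One way to test for this behaviour is to request a URL containing a made-up path segment and see whether the server still answers 200. The sketch below assumes a URL template with a `{segment}` placeholder. Both function names are hypothetical, and `fetch_status` makes a real network request when called.

```python
import uuid
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_status(url: str) -> int:
    """Return the HTTP status code for `url` (performs a real network request)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is on the exception.
        return error.code


def looks_like_soft_404(
    url_template: str,
    get_status: Callable[[str], int] = fetch_status,
) -> bool:
    """Insert a random, made-up segment into the template and report
    whether the server still answers with 200."""
    probe_url = url_template.format(segment=uuid.uuid4().hex)
    return get_status(probe_url) == 200


# Example call (not executed here because it needs network access):
# looks_like_soft_404("https://www.websiteaddress.com/shop/{segment}/t-shirts")
```

A result of True means a page that cannot exist was served as if it were real, which is the symptom described above.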
How to Fix a Dynamically Inserted Content Spider Trap
Unfortunately, you may need a skilled developer to fix this. In general, you want to specify exactly which URLs should return a 200 status code and make sure that any pages that shouldn’t exist return a 404.
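The idea can be illustrated with an allowlist: only path segments that correspond to real categories produce a 200, and everything else produces a 404. This is a minimal sketch rather than a drop-in fix. The category set and the `/shop/<category>/t-shirts` URL shape are assumptions taken from the example URLs above, and `status_for_path` is a hypothetical helper that a real site would wire into its routing layer.

```python
from http import HTTPStatus

# Hypothetical list of categories that genuinely exist on the site.
VALID_CATEGORIES = {"seo", "ppc"}


def status_for_path(path: str) -> HTTPStatus:
    """Return 200 only for /shop/<known category>/t-shirts, otherwise 404."""
    parts = [part for part in path.strip("/").split("/") if part]
    is_shop_page = len(parts) == 3 and parts[0] == "shop" and parts[2] == "t-shirts"
    if is_shop_page and parts[1].lower() in VALID_CATEGORIES:
        return HTTPStatus.OK
    return HTTPStatus.NOT_FOUND


if __name__ == "__main__":
    for path in ("/shop/SEO/t-shirts", "/shop/anything-you-like/t-shirts"):
        print(path, int(status_for_path(path)))
```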
Product Category Filter Spider Traps
Spider traps are common on e-commerce websites, where they cause many problems from a search engine ranking perspective. For instance, they can result in search engine spiders crawling and indexing a huge volume of duplicate and near-duplicate pages, which wastes crawl equity and dilutes the amount of authority directed to important pages.
How to Identify Product Category Filter Spider Traps
Your website will likely have spider traps if your site has any type of filtering option that changes the page URL. Here is an example on the John Lewis website:
You can see how the ‘size’ filtering option is included in the URL, which is crawlable and has been indexed by Google. In the example below, the ‘international’ filter has also been included in the URL.
To identify this issue, work out how URLs are formed during the filtering process and then type a query such as ‘site:johnlewis.com inurl:size=10 inurl:size=11’ into Google. Because filters can be combined, a huge number of URL variations may be indexed by Google, which could have a detrimental effect on the website’s rankings.
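If you have a list of crawled URLs, for example from a crawler export or server logs, you can estimate the scale of the problem by counting how many filtered variations exist for each underlying page. The sketch below assumes filters are passed as query string parameters. The URLs in the example and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def count_filter_variations(urls: Iterable[str]) -> Counter:
    """Count, per host and path, how many crawled URLs carry a query string."""
    counts: Counter = Counter()
    for url in urls:
        parts = urlsplit(url)
        if parts.query:  # only URLs that include filter parameters
            counts[parts.netloc + parts.path] += 1
    return counts


if __name__ == "__main__":
    crawled = [
        "https://www.websiteaddress.com/shoes",
        "https://www.websiteaddress.com/shoes?size=10",
        "https://www.websiteaddress.com/shoes?size=11",
        "https://www.websiteaddress.com/shoes?size=10&size=11",
    ]
    for page, variations in count_filter_variations(crawled).most_common():
        print(page, variations)
```

Pages with the highest counts are the ones where filter combinations are multiplying fastest.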
How to Fix Product Category Filter Spider Traps
One thing you should know is that fixing the problem could actually cause some rankings to drop. Therefore, before making any changes, research carefully and measure the volume of traffic landing on filtered pages.
Here are some options for fixing this problem:
- The first option comes from Google, which suggests canonicalizing these pages using the rel canonical tag. However, some people find that this does not work and that the tag is ignored by Google, especially on e-commerce sites. This may be because of mixed signals being presented to Google, such as large volumes of internal links.
- Use noindex. This method has worked for many people, but bear in mind that on a large site the pages will still be crawled, which uses up crawl equity.
- The last option is to use robots.txt, but the URLs may still appear in the index. Blocking in robots.txt used to be a quick fix that produced positive results, but it seems Google has changed its opinion on blocking pages.
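For the canonical tag option, the canonical URL of a filtered page is usually the same URL with the filter parameters removed. Here is a minimal sketch of that idea. It assumes the filters are query string parameters named `size` and `international` (names inferred from the filters mentioned above, not confirmed), and both functions are hypothetical helpers.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of the query parameters that only filter a category page.
FILTER_PARAMS = {"size", "international"}


def canonical_url(url: str) -> str:
    """Return `url` with filter parameters (and any fragment) removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FILTER_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Return the <link rel="canonical"> element for the <head> of `url`."""
    return f'<link rel="canonical" href="{html.escape(canonical_url(url), quote=True)}">'


if __name__ == "__main__":
    print(canonical_link_tag("https://www.websiteaddress.com/shoes?size=10&size=11"))
```

Parameters that are not in `FILTER_PARAMS`, such as pagination, are kept, so decide deliberately which parameters count as filters on your site.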
Fixing spider traps may seem complex, especially on a well-established website, so it is best to tackle them during the development stage. Even so, fixing them on an existing site is worth the effort, as it can improve your search performance.