Preventing Duplicate Content in Search Engines With WordPress


I was recently looking at Google’s Webmaster Tools, and more specifically the robots.txt section. As I reviewed the robots.txt for my blog, I wondered whether I could improve it. For those who aren’t familiar with the file, it tells web robots which files and directories they shouldn’t access. Search engines use web robots to find the pages they add to their search results.

While reviewing the file, I started looking for a more efficient way to keep duplicate content out of the search results without having to modify robots.txt. After a quick search, I found a good way of handling it within WordPress.

Preventing Duplicate Content

A common concern for people who manage websites and blogs is keeping duplicate content out of the search engine results. On a WordPress blog, the content of a post can appear on the post's own page, in a category listing, and in an archive listing. Those concerned about duplication want only the actual post page to be indexed, and not the other pages that contain the same post.

As mentioned earlier, editing the robots.txt file is one method of preventing duplicate content. You can simply disallow web robots from crawling the category or archive pages. The drawback is maintenance: if you add a new date to the archive, you will need to remember to edit the robots.txt file. A better solution is to have WordPress do this for you.
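For illustration, a robots.txt that takes this approach might look like the fragment below. The paths are assumptions, not values from my blog; the real ones depend on your permalink settings.

```
# Hypothetical paths; adjust them to your own permalink structure.
User-agent: *
Disallow: /category/
Disallow: /2012/
Disallow: /2013/
```

With date-based archives like these, every new year needs its own Disallow line, which is the manual upkeep described above.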

Besides the robots.txt file, search engine web robots also read specific meta tags, located in the head section of a web page, to determine whether they can index that page. By editing the header.php file of your WordPress theme, you can allow only specific pages to be indexed by search engines.

Here is the code:

<?php
// Single posts, static pages, and the blog home page may be indexed.
if (is_single() || is_page() || is_home()) { ?>
<meta name="googlebot" content="index,follow" />
<meta name="robots" content="index,follow" />
<meta name="msnbot" content="index,follow" />
<?php
// Every other page type (category, archive, and so on) is not indexed,
// but its links are still followed.
} else { ?>
<meta name="googlebot" content="noindex,follow" />
<meta name="robots" content="noindex,follow" />
<meta name="msnbot" content="noindex,follow" />
<?php } ?>

As you can see, an if statement is used to determine the page type. In this case, it checks whether the current page is any of the following:

  1. A single post page (is_single()).
  2. A static web page (is_page()).
  3. The home page of the blog (is_home()).

If the current page matches any of the above page types, then the first set of HTML meta tags (the ones with content="index,follow") is written to the head section of the web page. These meta tags indicate that the page should be indexed and all of its links should be followed. You can see this in the content attribute. The name attribute identifies the web robot. The name "robots" is a generic catch-all value.

For pages that don’t meet the above criteria, such as the category and archive pages, the second set of meta tags (the ones in the else branch, with content="noindex,follow") is written to the web page instead. In this case, the web robots are told not to index the page, but to still follow its links.
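Since the two branches differ only in the index directive, the same logic can be sketched more compactly. This is a minimal sketch, not the code I actually use: robots_meta_tags() is a hypothetical helper name, not a WordPress function, and it assumes you would call it from header.php with the same three conditional tags.

```php
<?php
/**
 * Build the robots meta tags for a page.
 * robots_meta_tags() is a hypothetical helper, not part of WordPress.
 */
function robots_meta_tags(bool $indexable): string
{
    // Links are followed either way; only the index directive changes.
    $content = $indexable ? 'index,follow' : 'noindex,follow';

    $tags = '';
    foreach (['googlebot', 'robots', 'msnbot'] as $robot) {
        $tags .= sprintf('<meta name="%s" content="%s" />' . "\n", $robot, $content);
    }
    return $tags;
}

// In header.php, inside the head section, the call would look like this:
// echo robots_meta_tags(is_single() || is_page() || is_home());
```

Keeping the decision in a single boolean means the list of robots only has to be maintained in one place.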

After updating the header.php file with the above code, I verified the meta tag values on various pages. Static pages, post pages, and the home page all indicated that they should be indexed. All other pages told the robots to not index the page.
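I checked the pages by hand, but the same check could be scripted. The sketch below is one possible way to do it, assuming PHP's DOM extension is available; robots_directive() is a hypothetical helper name, and the sample HTML is made up for the example.

```php
<?php
/**
 * Return the content of the generic "robots" meta tag in an HTML document,
 * or null when the tag is missing.
 * robots_directive() is a hypothetical helper, not part of WordPress.
 */
function robots_directive(string $html): ?string
{
    $doc = new DOMDocument();
    // Suppress warnings caused by markup the parser does not recognise.
    @$doc->loadHTML($html);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'robots') {
            return $meta->getAttribute('content');
        }
    }
    return null;
}

// Example with an inline document; for a live page, pass in HTML you have fetched.
$archivePage = '<html><head><meta name="robots" content="noindex,follow" /></head><body></body></html>';
echo robots_directive($archivePage), "\n";
```

Running it against the source of a post page and an archive page should show the two different content values.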

Making a simple change like this allows me to control which pages are indexed by search engines, and at the same time keeps duplicate content from my blog out of the search results.

For Blogger users, I’ll write a post in the future that shows how to accomplish the same task with Blogger.

