Duplicate Content and SEO: The Complete Guide

What Is Duplicate Content?

Duplicate content is content that’s similar to, or an exact copy of, content on other websites or on other pages of the same website. Having large amounts of duplicate content on a website can negatively impact Google rankings.

In other words:

Duplicate content is content that’s word-for-word the same as content that appears on another page.

But “Duplicate Content” also applies to content that’s similar to other content… even if it’s slightly rewritten.

How Does Duplicate Content Impact SEO?

In general, Google doesn’t want to rank pages with duplicate content.

In fact, Google states that:

“Google tries hard to index and show pages with distinct information”.

So if you have pages on your site WITHOUT distinct information, it can hurt your search engine rankings.

Specifically, here are the three main issues that sites with lots of duplicate content run into.

Less Organic Traffic: This is pretty straightforward. Google doesn’t want to rank pages that use content copied from other pages in Google’s index.

(Including pages on your own website)

For example, let’s say that you have three pages on your site with similar content.

Google isn’t sure which page is the “original”. So all three pages will struggle to rank.

Penalty (Extremely Rare): Google has said that duplicate content can lead to a penalty or complete deindexing of a website.

However, this is super rare. And it’s only done in cases where a site is purposely scraping or copying content from other sites.

So if you have a bunch of duplicate pages on your site, you probably don’t need to worry about a “duplicate content penalty.”

Fewer Indexed Pages: This is especially important for websites with lots of pages (like ecommerce sites).

Sometimes Google doesn’t just downrank duplicate content. It actually refuses to index it.

So if you have pages on your site that aren’t getting indexed, it could be because your crawl budget is wasted on duplicate content.

Best Practices

Watch For Same Content on Different URLs

This is the most common reason that duplicate content issues pop up.

For example, let’s say that you run an ecommerce site.

And you have a product page that sells t-shirts.

If everything is set up right, every size and color of that t-shirt will still be on the same URL.

But sometimes you’ll find that your site creates a new URL for every different version of your product… which results in THOUSANDS of duplicate content pages.

Another example:

If your site has a search function, those search result pages can get indexed too. Again, this can easily add 1,000+ pages to your site. All of which contain duplicate content.
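
Here’s a rough sketch of how you might spot this in a list of crawled URLs. It assumes the variations show up as query parameters named size and color (hypothetical names, your site may use different ones), and it groups every URL that collapses onto the same base page.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only change the product variation.
VARIATION_PARAMS = {"size", "color"}


def normalize(url: str) -> str:
    """Drop variation-only query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VARIATION_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each normalized URL to the crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    # Only keep base pages that are reachable through more than one URL.
    return {base: found for base, found in groups.items() if len(found) > 1}


# Example crawl output (made-up URLs for illustration).
crawled = [
    "https://example.com/t-shirt?size=m&color=blue",
    "https://example.com/t-shirt?size=l&color=red",
    "https://example.com/t-shirt",
    "https://example.com/hoodie",
]
duplicates = group_duplicates(crawled)
```

In this made-up crawl, the three t-shirt URLs end up in one group, while the hoodie page is left alone because only one URL points to it.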

Check Indexed Pages

One of the easiest ways to find duplicate content is to take a look at the number of pages from your site that are indexed in Google.

You can do this by searching for site:example.com in Google.

Or check out your indexed pages in the Google Search Console.

Either way, this number should line up with the number of pages that you manually created.

For example, Backlinko has 112 pages indexed, which is the number of pages that we made.

If that number was 16,000 or 160,000 we’d know that lots of pages were getting added automatically. And those pages would likely contain significant amounts of duplicate content.
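
If you want to put a number on that comparison, here’s a minimal sketch. It assumes your XML sitemap lists exactly the pages you created (that won’t be true for every site), and that you type in the indexed count yourself from Google or the Search Console. The function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def count_sitemap_urls(sitemap_xml: str) -> int:
    """Count the <url><loc> entries in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url/sm:loc", SITEMAP_NS))


def index_bloat_ratio(indexed_pages: int, created_pages: int) -> float:
    """Return how many indexed pages exist per page you actually created."""
    if created_pages <= 0:
        raise ValueError("created_pages must be positive")
    return indexed_pages / created_pages


# Tiny made-up sitemap for illustration.
sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/t-shirt</loc></url>
</urlset>
""".strip()

created = count_sitemap_urls(sitemap)

# Using the numbers from the text: 16,000 indexed pages vs. 112 created pages.
ratio = index_bloat_ratio(indexed_pages=16_000, created_pages=112)
```

A ratio close to 1 means the two numbers line up. A ratio in the hundreds, like the 16,000 vs. 112 case, suggests lots of automatically generated pages.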

Make Sure Your Site Redirects Correctly

Sometimes you don’t just have multiple versions of the same page… but of the same SITE.

It’s rare, but I’ve seen it happen in the wild many times.

This issue crops up when the “WWW” version of your website doesn’t redirect to the “non-WWW” version.

(Or vice versa)

This can also happen if you switched your site over to HTTPS… and didn’t redirect the HTTP site.

In short: all the different versions of your site should end up in the same place.
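
Here’s a small sketch of that check. It builds the four common versions of a home page (HTTP/HTTPS, with and without WWW) and reports any that don’t land on your preferred version. The resolve argument is a hypothetical callable that follows redirects and returns the final URL. In the example it’s faked with a dictionary so the code runs without touching the network.

```python
from itertools import product


def site_variants(domain: str) -> list[str]:
    """Return the four common home page variants for a bare domain."""
    return [
        f"{scheme}://{host}{domain}/"
        for scheme, host in product(("http", "https"), ("", "www."))
    ]


def misdirected(domain: str, preferred: str, resolve) -> list[str]:
    """List variants whose final URL, as reported by `resolve`, is not `preferred`."""
    return [url for url in site_variants(domain) if resolve(url) != preferred]


# Simulated redirect results for illustration: the HTTP + WWW version is
# served as-is instead of redirecting to the preferred version.
fake_final_urls = {
    "http://example.com/": "https://example.com/",
    "http://www.example.com/": "http://www.example.com/",
    "https://example.com/": "https://example.com/",
    "https://www.example.com/": "https://example.com/",
}

problems = misdirected("example.com", "https://example.com/", fake_final_urls.get)
```

On a live site, one option is to pass a resolve function built on urllib.request.urlopen(url).geturl(), which returns the URL you end up on after redirects.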

Use 301 Redirects

301 redirects are the easiest way to fix duplicate content issues on your site.

(Besides deleting pages altogether)

So if you found a bunch of duplicate content pages on your site, redirect them back to the original.

Once Googlebot stops by, it will process the redirect and ONLY index the original content.

(Which can help that original page start to rank)
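
How you set up a 301 depends on your server or CMS. As one illustration, here’s a minimal sketch of a Python WSGI app that sends duplicate paths back to the original with a 301. The paths in REDIRECTS are hypothetical, and this isn’t meant as a production setup.

```python
# Hypothetical mapping from duplicate paths to the original page.
REDIRECTS = {
    "/t-shirt-blue": "/t-shirt",
    "/t-shirt-red": "/t-shirt",
}


def app(environ, start_response):
    """WSGI app that 301-redirects known duplicates and serves everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"Content for {path}".encode("utf-8")]
```

To try it locally, you could serve app with wsgiref.simple_server.make_server from the standard library and request one of the duplicate paths.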

Keep An Eye Out For Similar Content

Duplicate content doesn’t ONLY mean content that’s copied word-for-word from somewhere else.

In fact, Google defines duplicate content as content that either completely matches other content or is appreciably similar to it.

So even if your content is technically different than what’s out there, you can still run into duplicate content problems.

This isn’t an issue for most sites. Most sites have a few dozen pages. And they write unique stuff for every page.

But there are cases where “similar” duplicate content can crop up.

For example, let’s say you run a website that teaches people how to speak French.

And you serve the greater Boston area.

Well, you might have one services page optimized around the keyword: “Learn French Boston”.

And another page that’s trying to rank for “Learn French Cambridge”.

Sometimes the content will technically be different. For example, one page lists the address of the Boston location. And the other page has the Cambridge address.

But for the most part, the content is super similar.

That’s technically duplicate content.
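
To get a rough feel for how similar two pages are, you can compare their overlapping word sequences. The sketch below uses three-word “shingles” and Jaccard similarity. To be clear, this is a common text-comparison technique that I’m using for illustration, not how Google measures similarity, and the sample sentences are made up.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # Two texts too short to shingle are treated as identical.
    return len(sa & sb) / len(sa | sb)


# Made-up page copy that differs only in the city name.
boston = "Learn French in Boston with friendly native tutors and flexible weekly classes"
cambridge = "Learn French in Cambridge with friendly native tutors and flexible weekly classes"

score = jaccard(boston, cambridge)
```

A score near 1.0 means the two pages are nearly identical. Even a one-word change knocks out every shingle that contains that word, so on real pages with hundreds of words a swapped city name barely moves the score.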

Is it a pain to write 100% unique content for every page on your site? Yup. But if you’re serious about ranking every page on your site, it’s a must.

Use the Canonical Tag

The rel=canonical tag tells search engines:

“Yes, we have a bunch of pages with duplicate content. But THIS page is the original. You can ignore the rest”.

Google has said that a canonical tag is better than blocking pages with duplicate content.

(For example, blocking Googlebot using robots.txt or with a noindex tag in your web page HTML)

So if you find a bunch of pages on your site with duplicate content, you want to either:

  • Delete them
  • Redirect them
  • Use the canonical tag
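
The canonical tag itself is a link element with rel="canonical" in the page’s head. Here’s a minimal sketch that pulls the canonical URL out of a page’s HTML using Python’s standard library, so you can check which page each duplicate points to. The sample page and the find_canonical helper are made up for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Made-up duplicate page that points back to the original product page.
page = """
<html><head>
  <title>Blue T-Shirt</title>
  <link rel="canonical" href="https://example.com/t-shirt">
</head><body>...</body></html>
"""

canonical_url = find_canonical(page)
```

If a duplicate page comes back with None, or with its own URL, it isn’t pointing search engines at the original.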

Use a Tool

There are a handful of SEO tools that have features designed to spot duplicate content.

For example, Siteliner scans your website for pages that contain lots of duplicate content.
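
If you’d rather run a quick check yourself, here’s a minimal sketch that flags pages with identical text by hashing it. This is my own illustration, not how Siteliner or any other tool works under the hood, and it only catches exact copies (after ignoring case and extra whitespace). The pages dictionary is made-up sample data.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def exact_duplicates(pages: dict) -> list:
    """Return groups of URLs whose page text is identical after normalization."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Made-up page text keyed by URL.
pages = {
    "/t-shirt": "Soft cotton t-shirt in every size.",
    "/t-shirt-blue": "Soft cotton   T-shirt in every size.",
    "/hoodie": "Warm fleece hoodie for cold days.",
}

duplicate_groups = exact_duplicates(pages)
```

Here the two t-shirt pages land in the same group because they only differ in spacing and capitalization.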

Consolidate Pages

Like I mentioned, if you have lots of pages with straight up duplicate content, you probably want to redirect them to one page.

(Or use the canonical tag)

But what if you have pages with similar content?

Well, you can grind out unique content for every page… OR consolidate them into one mega page.

For example, let’s say that you have 3 blog posts on your site that are technically different… but the content is pretty much the same.

You can combine those 3 posts into one amazing blog post that’s 100% unique.

Because you removed some duplicate content from your site, that page should rank better than the other 3 pages combined.

Noindex WordPress Tag or Category Pages

If you use WordPress you might have noticed that it automatically generates tag and category pages.

These pages are HUGE sources of duplicate content.

So if they’re useful to users, I recommend adding the “noindex” tag to these pages. That way, they can exist without search engines indexing them.

You can also set things in WordPress up so these pages don’t get generated at all.
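
Once you’ve added the tag, it’s worth confirming that it actually shows up in the page’s HTML. Here’s a minimal sketch that looks for a robots meta tag containing noindex. The sample tag page and the is_noindexed helper are made up for this example, and it only checks the HTML (not HTTP headers).

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindexed(html: str) -> bool:
    """Return True if the page's robots meta tag includes noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Made-up tag page that should stay out of the index.
tag_page = """
<html><head>
  <title>Posts tagged "seo"</title>
  <meta name="robots" content="noindex, follow">
</head><body>...</body></html>
"""

tag_page_is_hidden = is_noindexed(tag_page)
```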

Learn More

How does Google handle duplicate content?: A video from Google’s Matt Cutts on how Google views duplicate content.

The myth of the duplicate content penalty: This post outlines why most people don’t need to worry about a “duplicate content penalty”.
