If you read SEO guides for blogging, you inevitably come across the term “duplicate content.”
For years, this term has been tossed around by bloggers and digital marketers alike. It’s also a term that’s frightening since it’s often associated with Google, our traffic overlord, penalizing blogs and tanking organic traffic.
But duplicate content and its potential SEO implications are widely misunderstood. In reality, there is a time and a place for duplicate content on the internet, provided it’s implemented correctly.
Time to learn exactly what duplicate content is, when it can harm your SEO efforts, and how to fix any duplicate content issues your blog has.
What Duplicate Content is NOT
Let's start with what duplicate content is not.
It's important to know that you do not create duplicate content when you write about the same topic. Let's say you have a post on your site called “10 Tips to Get Your Kids Ready for School” and you get an opportunity to write a guest post on how to get your kids ready for school.
Will this new article be duplicate content? No, not if you write a fresh post — even if you use basically the same tips. If you start from a blank page and write a new post it will not be duplicate content.
It would only be duplicate content if you copied and pasted your existing blog post as your guest post.
What Is Duplicate Content?
Duplicate content is content that’s identical or nearly identical to other content on the web. This can involve smaller blocks of content or even entire web pages that are duplicates of one another.
There are also two types of duplicate content:
- Internal: A single domain, like your blog, has multiple URLs where the page content is identical or nearly identical.
- External: Two or more domains host identical or nearly identical content.
What’s Considered Duplicate Content By Google?
According to Google’s duplicate content guidelines, “Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”
There are two takeaways here.
The first is the word “substantive.” Duplicate content isn’t just a few identical words or a sentence; it has to be a significant amount of exact or near-exact duplicate content. And it certainly isn't content that just happens to be on the same topic.
Secondly, Google states most duplicate content isn’t deceptive. I’m going to get into how and why duplicate content exists, but the bottom line is, you don’t have to worry about a duplicate content penalty from Google unless you’re gaming the system.
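To make the idea of “appreciably similar” concrete, here is a minimal sketch that compares two texts by their overlapping five-word phrases. This is only an illustration, not how Google measures similarity; the function names and the five-word window are assumptions chosen for the example.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 5) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Lay out clothes the night before so mornings run smoothly for everyone"
rewritten = "Pick tomorrow's outfit in the evening and your morning will feel far calmer"

print(similarity(original, original))   # a copy-paste scores 1.0
print(similarity(original, rewritten))  # same tip, fresh wording scores 0.0
```

A copied post scores at the top of the scale, while a fresh post that shares the same tip but not the same wording scores at the bottom, which matches the guest post example from earlier.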
Why Duplicate Content Hurts Your Website
You don’t have to worry about a direct Google penalty for the majority of duplicate content issues.
But, internal and external duplicate content can still hurt your blog traffic and SEO efforts, so it’s important to understand the potential risks.
1. Confuses Search Engines
When Google identifies duplicate content, it picks one of the URLs to show in search results.
But it’s often difficult for Google to tell which page is the original source or which page you want to rank in the first place. Ultimately, this means your pages tend to rank and perform worse since Google doesn’t know how to treat your content.
This can also result in fewer indexed pages; Google sometimes decides not to index pages with duplicate content at all, meaning impacted pages can’t get organic traffic.
The last thing Google wants is to rank a scam website that is just scraping legitimate content.
2. Eats Up Your Crawl Budget
Another SEO term you hear when reading about duplicate content is crawl budget.
According to Google, crawl budget is “the set of URLs that Googlebot can and wants to crawl.” In layman's terms, your crawl budget is how many URLs on your website Google crawls within a period of time.
Most bloggers don’t need to worry about their crawl budget. This is because Google usually identifies and indexes new content on your site quickly. If you published a blog post and it gets organic traffic and impressions a few days later, this is why.
But if you have an immense amount of duplicate content, your unique content can be indexed more slowly since Google has to spend time crawling duplicate pages.
3. Backlink Dilution
Another risk of having duplicate content on your website is diluting your backlinks.
Backlinks are a major ranking factor. So, if you want an article to rank well, you need stellar content that matches user intent and high-quality backlinks for that URL.
But problems occur if your website has duplicate content and other webmasters don’t know which page to link to. If two pages with duplicate content both start getting backlinks, they’re essentially cannibalizing one another’s efforts to gain authority.
This is backlink dilution since one of the pages could theoretically have every single backlink instead of splitting them.
4. Scraped and Syndicated Content Issues
Scraped content is when a website posts content from other websites on their own site. This is common for sites that aggregate news for certain blog niches.
Syndicated content is similar. With syndicated content, you repost your content on other websites. Some bloggers repost guest posts they write on their own blog, as an example.
In most cases, scraped and syndicated content don’t hurt. Google is usually smart enough to recognize scraped content and rank the original instead. Similarly, with syndicated content, you add a canonical tag to the duplicate version that tells Google “hey, this is duplicate content, don’t rank it and make sure you rank the original content instead.”
But in rare cases, scraped content can outrank your content if Google makes a mistake. Additionally, if you forget to add a canonical tag to syndicated content, Google has no idea what’s original and what’s syndicated.
For example, my friend Tom syndicates some of his blog content on Medium.com, which lets you import stories and adds a canonical tag to keep things tidy.
But after making an importing mistake, Medium started ranking the duplicate content, stealing organic traffic from Tom’s blog. Here's a screenshot he sent me:
Again, this is quite rare, but it’s a risk of duplicate content to be aware of.
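If you syndicate posts, it can be worth checking that the syndicated copy really points back to your original. The sketch below reads a page's HTML and returns the URL in its canonical tag, if there is one. The class and function names are made up for this example, and the HTML string stands in for a page you would fetch yourself.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in html, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical syndicated copy that correctly credits the original post.
syndicated_page = (
    "<html><head>"
    '<link rel="canonical" href="https://www.example.com/original-post/">'
    "</head><body>Reposted article</body></html>"
)
print(find_canonical(syndicated_page))
```

If the function returns None, or a URL on the syndication site instead of your own, that copy is competing with your original in search.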
5. Google Duplicate Content Penalties?
As Google states, you shouldn’t worry about a duplicate content penalty unless you’re trying to game the system.
But it’s always better to be safe than sorry. Besides, removing duplicate content from your website typically means a better user experience and better organic performance anyway.
Just note, if you somehow get dinged for duplicate content, you can submit a reconsideration request to have Google review its penalty.
Common Causes of Duplicate Content (& How to Fix Them)
So we know that duplicate content can hurt your indexing, ranking efforts, and how much organic traffic you get.
But what causes duplicate content in the first place, and how can you fix it?
1. Tags and Categories
Organizing posts with tags and categories can cause duplicate content, especially for beginner bloggers.
For example, let’s say your blog had two URLs like this:
- https://www.mysuperhealthyblog.com/tag/vegan/
- https://www.mysuperhealthyblog.com/category/desserts/
You publish a vegan brownie recipe in the desserts category. You also add the vegan tag. Since your site doesn’t have much content, both of these relatively empty pages now display the same link and snippet about your brownie recipe. This is duplicate content.
The fix: Either don’t use tags on your blog, or add a noindex directive to the tag and category pages you don’t want ranking.
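As a rough sketch of the noindex approach, the function below picks a robots meta tag for a page based on its type and how many posts it lists. The function name, the page type labels, and the five-post cutoff are all assumptions for illustration; in practice your SEO plugin or theme handles this for you.

```python
def robots_meta(page_type: str, post_count: int, min_posts: int = 5) -> str:
    """Return a robots meta tag, hiding thin tag and category pages from search."""
    # Assumption: an archive page listing fewer than min_posts posts is "thin".
    thin_archive = page_type in {"tag", "category"} and post_count < min_posts
    directive = "noindex, follow" if thin_archive else "index, follow"
    return f'<meta name="robots" content="{directive}">'


print(robots_meta("tag", 1))        # the near-empty /tag/vegan/ page
print(robots_meta("category", 12))  # a well-stocked category page
```

The "follow" part keeps search engines following the links on the page, so your brownie recipe can still be discovered even though the thin archive page itself stays out of the index.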
2. Poor Web Server Configuration
There are several variations of a website’s URL, including:
- https://yourblog.com
- https://www.yourblog.com
- http://www.yourblog.com
- http://yourblog.com
This is normal, but you want to redirect the versions you don’t want readers using to the main URL version. Otherwise, people can access the same content from multiple URLs, which is duplicate content.
The fix: Redirection. Adding an SSL certificate with your host usually redirects HTTP to HTTPS, but check that the www and non-www versions also redirect to a single version.
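The redirect logic itself is simple. Here is a minimal sketch that takes any of the four variations above and returns the one URL it should redirect to. It assumes the HTTPS www version is your main version, which is only an example choice, and the function name is hypothetical. Real redirects are normally configured on your web server or by your host, not in Python.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_SCHEME = "https"           # assumption: HTTPS is the main version
PREFERRED_HOST = "www.yourblog.com"  # assumption: the www host is the main version
KNOWN_HOSTS = {"yourblog.com", "www.yourblog.com"}


def redirect_target(url: str):
    """Return the main-version URL to redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        return None  # not one of our hostnames, so leave it alone
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the main version
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
    )


for variant in ("http://yourblog.com", "http://www.yourblog.com", "https://yourblog.com"):
    print(variant, "->", redirect_target(variant))
```

Every variation ends up at the same address, so readers and search engines only ever see one version of each page.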
3. Image URLs
WordPress can automatically create a separate page for every image attachment, and SEO plugins like Yoast include a setting that controls these pages.
In other words, every time you upload an image, your blog can gain a unique page that displays nothing but that image. In most scenarios, this is useless for readers and creates duplicate content.
Yoast had an image URL bug in 2018 that switched these attachment pages on for thousands of users, causing massive duplicate content issues.
As Yoast explains in its blog post, you need to ensure your settings redirect attachment URLs to the attachment itself.
The fix: Check your CMS’ settings to ensure you aren’t creating image attachment pages.
4. Comment Pagination
Another culprit of duplicate content is comment pagination, or creating unique pages for different comment sections.
Usually, this looks like:
- https://www.yourblogname.com/yourpost/
- https://www.yourblogname.com/yourpost/comments-page-2
- https://www.yourblogname.com/yourpost/comments-page-3
And, while the comment section is different on each page, your article is still duplicate content.
The fix: You can usually disable comment pagination with your comment plugin or noindex extra pages.
5. URL Variations
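Did you know that the path portion of a URL can be case sensitive?
This means search engines can treat the following URLs as two separate pages, which is duplicate content if both load the same page: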
- https://www.yourblogname.com/about/
- https://www.yourblogname.com/About/
Now, while it’s unlikely you’ll publish two about pages with duplicate content, be cautious when adding internal links in your posts and make sure you use the right case!
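If you want to audit your internal links for this, a small script can flag URLs that differ only in letter case. This is a sketch that assumes you already have a list of the links used across your posts; the function name and the sample list are made up for the example.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def case_conflicts(urls):
    """Group URLs that become identical once host and path are lowercased."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        key = (parts.netloc.lower(), parts.path.lower())
        groups[key].add(url)
    # Only groups with more than one spelling are a problem.
    return [sorted(group) for group in groups.values() if len(group) > 1]


internal_links = [
    "https://www.yourblogname.com/about/",
    "https://www.yourblogname.com/About/",
    "https://www.yourblogname.com/contact/",
]
print(case_conflicts(internal_links))
```

Each group in the result is a set of spellings you should collapse into one, usually the all-lowercase version.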
6. “Friendly” URLs
Bloggers who implement AMP, or accelerated mobile pages, need to check that they aren’t accidentally creating duplicate content.
For example, you don’t want myblog.com/amp/postname to be indexed and accessible alongside myblog.com/postname.
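The fix: Pair the two versions. Add a rel=amphtml link tag to your non-AMP page that points to the AMP page, and a rel=canonical link tag to your AMP page that points back to the non-AMP page. This AMP guide explains the process.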
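Pairing works in two directions: the regular page announces its AMP version with rel=amphtml, and the AMP page points back to the regular page with rel=canonical. The sketch below builds both tags for a pair of URLs. The function name and dictionary keys are hypothetical, and an AMP plugin would normally add these tags for you.

```python
from html import escape


def amp_pairing_tags(canonical_url: str, amp_url: str) -> dict:
    """Return the <link> tag that belongs in the <head> of each page version."""
    return {
        # Goes on the regular page and tells search engines where the AMP version lives.
        "non_amp_page": f'<link rel="amphtml" href="{escape(amp_url)}">',
        # Goes on the AMP page and credits the regular page as the original.
        "amp_page": f'<link rel="canonical" href="{escape(canonical_url)}">',
    }


tags = amp_pairing_tags(
    canonical_url="https://myblog.com/postname",
    amp_url="https://myblog.com/amp/postname",
)
print(tags["non_amp_page"])
print(tags["amp_page"])
```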
7. Product Descriptions
If you sell products on your blog, you naturally have pages with products and product descriptions.
However, if you aren’t careful, your product pages might be rife with duplicate content. This can occur when you post your product in multiple locations on your website and use the same product description.
It can also occur when you import a product SKU from a third-party seller, like Amazon or Alibaba, and keep the seller's description instead of writing your own.
This last case is a major cause of duplicate content. I mean, if thousands of bloggers are all selling the same Amazon products, there’s a decent chance many of them simply import the product snippet from Amazon for their posts.
The fix: Always write your own product descriptions.
8. URL Parameters
URL parameters are another cause of duplicate content, especially if you start email marketing or online advertising and try setting up parameters to track performance.
For example, many bloggers track the social media source visitors come from. In this case, URLs with duplicate content might look like:
- https://www.mysuperhealthyblog.com/brownie-recipe/
- https://www.mysuperhealthyblog.com/brownie-recipe/?utm_source=twitter
URL parameters are also used in other instances, like filtering price ranges, colors, sizes, and adding search terms to URLs.
According to Search Engine Journal, one fix is to add canonical tags to parameter versions of URLs to point to the original base URL.
The fix: Use canonical URLs.
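Here is a minimal sketch of what that looks like for tracking parameters. It strips UTM parameters from a URL to recover the base URL, then builds the canonical tag that the parameter version of the page should carry. The function names are hypothetical, and treating only utm_ parameters as tracking noise is an assumption; filters like size or color may deserve their own decision.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PREFIXES = ("utm_",)  # assumption: only UTM parameters are tracking noise


def canonical_url(url: str) -> str:
    """Return url without tracking parameters or fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Return the canonical <link> tag for the page served at url."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


tracked = "https://www.mysuperhealthyblog.com/brownie-recipe/?utm_source=twitter"
print(canonical_url(tracked))
print(canonical_tag(tracked))
```

Both the tracked and untracked versions of the brownie recipe end up declaring the same canonical URL, so search engines know to treat them as one page.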
How to Find Duplicate Content on Your Blog
One way to check if your blog has been copied by someone else is to use Copyscape.
Once you enter your URL, this free plagiarism checker scours the web to find duplicate content. If you find a copycat, you can file a takedown request with Google to remove the content.
You can also manually check articles by copy-pasting sections into Google Search and adding the “Verbatim” filter to only show exact-matching text.
I wouldn’t spend much time here unless you think someone is stealing your content since Google is smart enough to recognize that behavior.
Summary
Duplicate content issues sound scary, but the good news is, search engines are incredibly smart. You usually don’t have to worry about a manual penalty because of a few internal and external duplicate content issues.
Really, you should think about user experience first. What makes for a better reading experience, and how can you clean up your blog from empty pages, duplicate content, and articles that don’t really provide value?
If you focus on creating the best content possible, you can grow your blog. The technical stuff still matters, but in terms of duplicate content, I wouldn’t lose too much sleep.