Search Engine Optimisation

Some more on that guardian.co.uk duplicate content

I’m still trying to figure out what is going on here and why the Guardian is publishing all this duplicate content. Among major UK publishers, it appears to be on its own: nobody else seems to be doing this.

OK, here are some facts about it:

The peak for all this appears to be early in the morning, and it mainly involves content that appeared in the newspaper overnight. Most of this morning’s stories on the guardian.co.uk portal seem to have been published seven times on average. From mid-morning onwards, the average falls back to two or three.

The most heavily duplicated stories tend to be listed as main articles on Google News in the morning.

Duplicated articles carry the same Hitbox code. Hitbox is the statistics provider that tracks usage of guardian.co.uk’s content and is used to determine the number of uniques, page views and so on. Here’s the code on seven versions of the same story about Israel planning to free Palestinian prisoners:

What is going on here? Is this just a story taken from an RSS feed and duplicated repeatedly because of some quirk in the Guardian’s CMS? Or is it something more calculated, a tactic that has something to do with how a story is displayed in Google News? I wonder about the latter for two reasons: Google News is not nearly so fussy about duplicate content as the main Google search, and the higher frequency of guardian.co.uk’s republishing early in the morning seems to affect Google News rankings.

Perhaps somebody from guardian.co.uk would care to clear all this up?
