How Social News Sites Work… Probably!

A lot of people use social news sites like Digg, Reddit, DZone, HN, Slashdot blah blah blah, and I mean a lot! The basic premise of these sites is to allow users to contribute by submitting news articles, and the community then votes these up (or down in some cases), promoting the articles with the hope that more votes equal a quality article.

The holy grail for submitters of these articles, like myself, is usually to get onto the homepage of that site, giving your article maximum exposure. “The Digg Effect”, for example, is a well-known term which ultimately means your article got onto the homepage and Hammertime ensued… Hammertime being a few thousand hits in a very short space of time. Back in the early days of Digg (arguably, when content was of a little more quality than nowadays) it wasn’t unusual for an article to get promoted to the homepage and for The Digg Effect to bring the hosting server to its knees.

Nowadays though, The Digg Effect has little impact on the sites that make it to the Digg homepage. Most articles are from well-established web properties, usually with a hosting backbone that can support Hammertime with scope for more, yet people persist in submitting their articles to the likes of Digg with the vague hope of some homepage action. Digg is now starting to go to some lengths to get this kind of content back onto the homepage, so I thought I’d explain here how Digg and similar sites probably do this.

Probably?

I say probably simply because how any of these sites push content to the homepage based on user interactions is usually a closely guarded secret. Why? So that people cannot game it. As soon as people learn how an article can get to the homepage, they can manipulate that (as people did in the early days of Digg). That meant more and more mitigating actions had to be taken to stop it, ultimately making it so hard for good original content to get on the homepage that Digg ended up as it currently is.

Anyway, as I said, social news sites work on the basic premise that user interactions with the content indicate how good an article is (or isn’t). These interactions could be one or many of the following (I’ll go into detail on some further down):

  1. Voting up or down against an article, giving it a vote score
  2. Number of click-throughs an article achieved
  3. Number of comments an article received
  4. Voting up or down against said comments in (3)
  5. Time since article was submitted
  6. Number of times an article’s page with comments was viewed
  7. The power of the user doing the voting, commenting, clicking

Voting

Voting is prevalent in all of the social news sites. Most implement a vote up or down scenario. However, I have to say that I am totally against the down vote, at least until an article has a couple of up votes. Example: I submit an article to Reddit, and some cowboy comes along, decides he already hates my stuff, and votes me down instantly. All other potential readers see a nice big 0 and steer clear.

This scenario for me is pretty lame – and it happens, even to me on content I know is quality because other places are promoting it. The instant that article hits 0 it’s like a closed door – goodbye users. If you ask me, you should be able to vote down, but only as far as 1 and that’s it. Don’t let one person write off potentially great content because they’re having a bad day…
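To make that “vote down, but only as far as 1” idea concrete, here is a minimal Python sketch. The function name `apply_vote` and the `floor` parameter are my own inventions for illustration, and I’m assuming an article starts life at 1 (the submitter’s own vote). No real site is known to work exactly this way.

```python
def apply_vote(displayed_votes: int, direction: int, floor: int = 1) -> int:
    """Return the new displayed vote count after one vote.

    direction is +1 for an up vote and -1 for a down vote.
    The count is never allowed to drop below `floor` (assumed to be 1),
    so a single early down vote cannot push a new article to 0.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return max(floor, displayed_votes + direction)


# A brand new article at 1 vote stays at 1 when someone votes it down,
# but an article at 3 votes can still be pulled back to 2.
new_article = apply_vote(1, -1)
older_article = apply_vote(3, -1)
```

The down vote could still be recorded behind the scenes; the floor only affects what other readers see.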

Click-Throughs

Click-throughs are a difficult one. Whilst they are definitely valuable for seeing whether an article is attracting attention, they’re easily manipulated, so they need tight lock-down before they can be used as a measure. For me, click-throughs should only count from registered users, and only be recorded once per user. Unregistered users are a bit more of a game: you could count them if you’re happy that IP logging/locking out is a good enough step, but it’ll probably get messy against the determined.
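The “registered users only, once per user” rule could be sketched like this in Python. The `ClickTracker` class and its method names are hypothetical, and a real site would keep this in a database rather than in memory.

```python
class ClickTracker:
    """Counts at most one click-through per registered user per article."""

    def __init__(self):
        # article_id -> set of user_ids that have already clicked through
        self._seen = {}

    def record(self, article_id, user_id):
        """Record a click-through; return True only if it was counted."""
        if user_id is None:
            # Unregistered visitor: ignored in this sketch.
            return False
        users = self._seen.setdefault(article_id, set())
        if user_id in users:
            # Repeat click from the same user: not counted again.
            return False
        users.add(user_id)
        return True

    def count(self, article_id):
        """Number of distinct registered users who clicked through."""
        return len(self._seen.get(article_id, set()))


tracker = ClickTracker()
tracker.record("article-1", "alice")   # counted
tracker.record("article-1", "alice")   # ignored, already seen
tracker.record("article-1", None)      # ignored, not registered
tracker.record("article-1", "bob")     # counted
```

After those four calls, `tracker.count("article-1")` is 2.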

Comments

Comments are probably a small driver for measuring the quality of content. You could easily have a scenario where the comments on an article are all negative because the article is lame, so the number of comments is not necessarily a great measure of quality. Again, votes up and down on the comments themselves could also be used as a factor for or against an article.

User Power

The last factor, which was (and might still be) used at Digg, is the “power” of the user. For example, a long-time user who submits a lot of articles, writes a lot of comments, and has a large number of up votes against his/her name would perhaps be someone whose actions should have a heavier impact on the promotion of content through the site. They are a trusted source of quality (you know that by the number of up votes), so if they’re voting for something, it may be a good KPI!
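One way to picture user power is as a multiplier on that user’s vote. The Python below is purely a guess at the shape of such a thing: the function name, the “one extra vote of weight per 1000 up votes received” rule and the cap of 3.0 are all made-up numbers for illustration, not anything Digg has published.

```python
def vote_weight(upvotes_received: int, per_bonus: int = 1000, cap: float = 3.0) -> float:
    """Return how much one vote from this user counts for.

    Every user starts at 1.0. Each `per_bonus` up votes the user has
    earned adds 1.0 more, up to `cap`, so that no single power user
    can dominate. All three numbers are assumptions.
    """
    if upvotes_received < 0:
        raise ValueError("upvotes_received cannot be negative")
    return min(cap, 1.0 + upvotes_received / per_bonus)


newcomer = vote_weight(0)        # 1.0
regular = vote_weight(500)       # 1.5
power_user = vote_weight(10000)  # capped at 3.0
```

The cap matters: without it, a handful of very old accounts could decide the homepage on their own, which is exactly the kind of gaming these sites try to avoid.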

Mix it up

Ultimately though, all of these are great but cannot be used on their own or in equal measure. Sites like Digg have built complex algorithms over time which take a bit of this and a bit of that, whilst setting some key rules in place to measure the quality of an article.

Whilst the end users see the number of up votes, or Diggs or whatever, the reality is that those aren’t the main driver for this content being pushed to the homepage. Each user interaction likely holds its own “weight”. Therefore each article actually has:

  1. Number of votes
  2. Actual Score

The “Actual Score” is a value calculated from the weight of each user interaction. So votes up could have a weight of 1.0, votes down 0.8, click-throughs 0.2, comments against an article 0.1 etc. etc. – you get the idea. So the score of an article is probably quite different to the number of votes you see as an end user: something like 10 votes vs a score of 13.6, for example.
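Here is a rough Python sketch of that calculation, using the example weights above. Two things are my assumptions: that down votes subtract from the score rather than add to it, and the interaction counts in the usage example (10 up votes, 15 click-throughs, 6 comments), which I picked because they land on the 13.6 figure. The names `WEIGHTS` and `actual_score` are made up.

```python
# Example weights from the text; real sites keep theirs secret.
WEIGHTS = {"up": 1.0, "down": 0.8, "click": 0.2, "comment": 0.1}


def actual_score(up=0, down=0, clicks=0, comments=0):
    """Combine interaction counts into one back-end score.

    Assumption: down votes pull the score down, everything else
    pushes it up.
    """
    return (
        up * WEIGHTS["up"]
        - down * WEIGHTS["down"]
        + clicks * WEIGHTS["click"]
        + comments * WEIGHTS["comment"]
    )


# The end user sees "10 votes"; the back-end sees a score of about 13.6.
score = actual_score(up=10, clicks=15, comments=6)
print(round(score, 1))
```

Note that an article with fewer visible votes can outrank this one if it has more click-throughs and comments behind it.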

It’s the score that is measured by the back-end. Thinking about it, you could even measure the score against votes too… you can see it can get quite complex!

Over time (as in point (5) above), this score probably decays, and the weight of a new vote may well drop the longer it has been since submit time. Most sites that have a “Number 1” slot have to do this, otherwise some really popular articles would never disappear. So what they likely have is processes running in the background every x minutes, analyzing the scores of each article and dropping them down by y amount so that the charting system can work. This means that even articles with 2000 up votes will have their score depreciated and slowly drop down the list into nothing.
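The background job could look something like the Python below. It is only a sketch of the “every x minutes, drop by y” idea: the function names, the fixed drop of 100 per run and the starting scores are all invented, and a real site may well use something smarter than a flat subtraction.

```python
def decay_scores(scores, amount):
    """Return a new dict with every score dropped by `amount`, never below 0."""
    return {article: max(0.0, s - amount) for article, s in scores.items()}


def ranking(scores):
    """Article ids ordered from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


# An old hit starts with a huge score...
scores = {"old_hit": 2000.0}

# ...but after 19 runs of the job (100 points each) it is down to 100.
for _ in range(19):
    scores = decay_scores(scores, 100.0)

# A fresh article arriving now with a modest score takes the top slot.
scores["fresh"] = 150.0
top = ranking(scores)[0]
```

Because every article loses the same amount per run, older articles only fall down the chart as newer ones arrive with fresh scores, which is what keeps the “Number 1” slot moving.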

Conclusion

Anyway – I can’t profess to know it all on this subject; a lot of it is subjective and, as I said earlier, a bit of a secret. I thought it would be interesting to map it out so that people can understand how social news sites work. Things aren’t as obvious as they seem. That’s why you might see an article with only 2 votes on the homepage: it’s probably been up voted within a short space of time by a power user and had quite a few click-throughs already. Or maybe not :) Who knows….
