Subscriber Numbers Are Meaningless, Part Deux

I took a break from coding statzen this morning to read some blogs and saw a post from Dave Winer about Yet Another Feedburner Problem, the gist of which is described here. Basically, you can send a single HTTP request to Feedburner that will drastically inflate (or deflate) the subscriber count for a feed. The reason is that some web-based feed readers, like Bloglines and MyYahoo, put the number of subscribers a request represents in the User Agent string of their bots. Forge that request (with whatever subscriber number you want) and the count changes.
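
To make the problem concrete, here is a minimal sketch of how a tracker might pull that number out of a User Agent string. The `ExampleReader` name, its URL, and the "N subscribers" pattern are assumptions for illustration only. This is not Feedburner's actual parsing code, and real readers may format the count differently.

```python
import re
from typing import Optional

# Assumed format: the bot reports "<N> subscriber(s)" somewhere in its User Agent.
SUBSCRIBER_PATTERN = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def reported_subscribers(user_agent: str) -> Optional[int]:
    """Return the subscriber count a User Agent claims, or None if it claims none."""
    match = SUBSCRIBER_PATTERN.search(user_agent)
    return int(match.group(1)) if match else None


# A request from the (hypothetical) reader's bot.
genuine = "ExampleReader/1.0 (http://reader.example; 250 subscribers)"
# Anyone can send the same header with a different number.
forged = "ExampleReader/1.0 (http://reader.example; 1000000 subscribers)"

print(reported_subscribers(genuine))
print(reported_subscribers(forged))
```

Nothing in the string itself distinguishes the forged request from the genuine one, which is why a tracker that trusts it blindly can be gamed with a single request.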

If you look on the right of my blog you will see that I have a widget displaying (in part) the number of subscribers I have. That count is not coming from Feedburner but rather from statzen, a service I am getting awfully close to launching (some alpha users will get their passwords today). That number is calculated similarly to Feedburner’s and would suffer from the same problem (though there are some fairly easy fixes for the different ways this problem can manifest itself).

The thing is, subscriber numbers are meaningless as a single measure. I wrote about this a little last June in Why Feed Subscriber Numbers are Meaningless. There are some very interesting numbers related to subscriber count that would make great measures. A good example would be to look at subscriber numbers in relation to the average number of times an item in your feed is opened. Who cares how many people have subscribed to you in Bloglines or Google Reader if they never actually read anything you write?
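
As a rough sketch of that kind of measure, the snippet below divides the average number of opens per item by the subscriber count. The function name and the sample figures are made up for illustration, and this is not how statzen computes anything.

```python
from typing import Sequence


def opens_per_subscriber(subscribers: int, opens_per_item: Sequence[int]) -> float:
    """Average opens per feed item, divided by the subscriber count."""
    if subscribers <= 0 or not opens_per_item:
        return 0.0
    average_opens = sum(opens_per_item) / len(opens_per_item)
    return average_opens / subscribers


# Two hypothetical feeds with the same subscriber count but very different readership.
print(opens_per_subscriber(1000, [100, 200, 300]))
print(opens_per_subscriber(1000, [5, 10, 15]))
```

Both feeds would rank the same on a list sorted by subscriber count, even though the first is read far more often.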

Part of the problem is basing rankings on metrics that don’t really measure what they claim to, and part of the problem is that any time a single metric is used for a linear ranking system (e.g. Top 100 lists), that metric is going to be easy to game.

Lots of people want to become “The Measure” for blogs. I honestly don’t think most people who are trying are looking at the right data. Then again, I am biased.

So, back to the Feedburner problem. Want to fix it? Here is how.

  • Don’t count subscriber numbers reported from a User Agent until the service and IPs of the bot have been validated by someone at the company doing the tracking.
  • Don’t trust subscriber numbers from a known bot at an unverified IP address.
  • If developing a web-based feed reader, fetch the feed once per subscriber, including the internal subscriber ID (or some other consistent but unique identifier for the subscriber) in the user agent.
  • Take subscriber numbers with a large grain of salt. There are lots of inactive subscribers in Bloglines, Google Reader, NewsGator, etc. There is also lots of duplication across those services.
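
The first two points could be sketched roughly as follows. The `VERIFIED_BOTS` allowlist, the `ExampleReader` service name, and the IP ranges (taken from the blocks reserved for documentation) are all assumptions for illustration. A real tracker would fill the list in by hand after confirming each service and its addresses.

```python
import ipaddress
import re
from typing import Optional

# Hypothetical allowlist: bot name -> networks verified by a person at the tracker.
VERIFIED_BOTS = {
    "ExampleReader": [ipaddress.ip_network("192.0.2.0/24")],
}

# Assumed format: the bot reports "<N> subscriber(s)" somewhere in its User Agent.
SUBSCRIBER_PATTERN = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def trusted_subscriber_count(user_agent: str, remote_ip: str) -> Optional[int]:
    """Return the reported count only for a verified bot at a verified IP."""
    service = user_agent.split("/", 1)[0].strip()
    networks = VERIFIED_BOTS.get(service)
    if networks is None:
        return None  # unknown service: do not count it
    try:
        address = ipaddress.ip_address(remote_ip)
    except ValueError:
        return None  # malformed address: do not count it
    if not any(address in network for network in networks):
        return None  # known bot name, unverified IP: do not count it
    match = SUBSCRIBER_PATTERN.search(user_agent)
    return int(match.group(1)) if match else None


ua = "ExampleReader/1.0 (http://reader.example; 250 subscribers)"
print(trusted_subscriber_count(ua, "192.0.2.10"))   # verified bot, verified IP
print(trusted_subscriber_count(ua, "203.0.113.5"))  # same header, unverified IP
```

A forged request would then need to come from one of the verified addresses, not just copy the header.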