Interesting Information About UserAgent Distribution

I was doing some work on the statzen database tonight when I needed to look at how often individual UserAgent strings recur.

For the less technical readers out there, UserAgent strings are how browsers (or any HTTP clients) identify themselves to web servers. There are only a handful of common browsers (IE, Firefox, Safari, Opera, etc.). The UserAgent string also usually contains some information about your operating system (Windows, OS X, Linux, etc.). The browser and the operating system each declare their version (major and minor version numbers). Still, we are not talking about too many possibilities here. On top of that, various plugins and toolbars each add a little something else to the UserAgent string. At the end of the day you have a distribution that looks like this:

Sessions per UserAgent    Sessions    Unique UserAgents
10,000+                    151,698                    7
1,000 - 10,000             178,073                   74
100 - 999                  105,586                  368
10 - 99                     48,458                1,705
< 10                        38,545               22,436
Total                      522,360               24,590

That should be a fairly representative random sample. Fewer than 100 unique UserAgent strings (81 of them, less than 0.5% of the UserAgents in my sample) account for over 60% of the sessions, whereas more than 90% of the UserAgent strings are so rare that together they make up only about 7% of the sessions.
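Those percentages fall straight out of the table. Here is a quick Python sketch that recomputes them; the bucket labels and numbers are copied from the table, and "head" (the two most frequent buckets) and "tail" (the under-10 bucket) are just names I am using for illustration.

```python
# (sessions, unique UserAgents) per frequency bucket, copied from the table above.
TABLE = {
    "10,000+": (151_698, 7),
    "1,000 - 10,000": (178_073, 74),
    "100 - 999": (105_586, 368),
    "10 - 99": (48_458, 1_705),
    "< 10": (38_545, 22_436),
}

total_sessions = sum(s for s, _ in TABLE.values())  # 522,360
total_uas = sum(u for _, u in TABLE.values())  # 24,590


def share(labels):
    """Fraction of all sessions and of all unique UserAgents covered by the given buckets."""
    sessions = sum(TABLE[label][0] for label in labels)
    uas = sum(TABLE[label][1] for label in labels)
    return sessions / total_sessions, uas / total_uas


head_sessions, head_uas = share(["10,000+", "1,000 - 10,000"])
tail_sessions, tail_uas = share(["< 10"])
print(f"head: {head_uas:.2%} of UserAgents, {head_sessions:.1%} of sessions")
print(f"tail: {tail_uas:.1%} of UserAgents, {tail_sessions:.1%} of sessions")
```

This prints `head: 0.33% of UserAgents, 63.1% of sessions` and `tail: 91.2% of UserAgents, 7.4% of sessions`.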

The sample covers only people reading blog posts; it does not include feed bots from aggregators (which would exaggerate this skew even more).
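The table itself comes from counting how many sessions each distinct UserAgent string accounts for and then grouping the strings by that count. The statzen schema isn't described here, so this minimal Python sketch simply assumes you can pull one UserAgent string per session; the function name `ua_distribution` is hypothetical. Because the table's first two ranges overlap at exactly 10,000, the sketch assumes that value belongs to the top bucket.

```python
from collections import Counter

# (label, inclusive lower bound, exclusive upper bound or None) for each bucket.
# Assumption: a count of exactly 10,000 lands in the "10,000+" bucket.
BUCKETS = [
    ("10,000+", 10_000, None),
    ("1,000 - 10,000", 1_000, 10_000),
    ("100 - 999", 100, 1_000),
    ("10 - 99", 10, 100),
    ("< 10", 0, 10),
]


def ua_distribution(session_user_agents):
    """Return {bucket label: (sessions, unique UserAgents)}.

    session_user_agents: any iterable yielding one UserAgent string per session.
    """
    counts = Counter(session_user_agents)  # UserAgent string -> number of sessions
    result = {label: [0, 0] for label, _, _ in BUCKETS}
    for ua, n in counts.items():
        for label, lo, hi in BUCKETS:
            if n >= lo and (hi is None or n < hi):
                result[label][0] += n  # sessions contributed by this UserAgent
                result[label][1] += 1  # one more unique UserAgent in this bucket
                break
    return {label: tuple(v) for label, v in result.items()}
```

For example, twelve sessions from one string plus three and one sessions from two others put `(12, 1)` in the "10 - 99" row and `(4, 2)` in the "< 10" row.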

In case you are curious, here are those 7 most prevalent UserAgent strings:

  • Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
  • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
  • Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11
  • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
  • Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
  • Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)
  • Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)
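Several of these seven strings describe the same browser and operating system and differ only in extra tokens such as `SV1` or `.NET CLR ...`, which is exactly why the long tail is so long. The rough Python sketch below pulls the browser family, version, and Windows version out of strings like these with a few regular expressions. It is a hypothetical illustration that covers only the strings above; it is not the ParseUserAgent gem's API or a general-purpose parser.

```python
import re

# Ordered (family, pattern) rules; the first match wins. These rules are
# illustrative assumptions that only cover strings like the seven above.
BROWSER_RULES = [
    ("Yahoo! Slurp", re.compile(r"Yahoo! Slurp")),
    ("Internet Explorer", re.compile(r"MSIE (\d+\.\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+(?:\.\d+)*)")),
]

# "Windows NT 5.1" is Windows XP and "Windows NT 6.0" is Windows Vista.
WINDOWS_NT = {"5.1": "Windows XP", "6.0": "Windows Vista"}


def classify(ua):
    """Return (browser family, version or None, OS or None) for a UserAgent string."""
    family, version = "Unknown", None
    for name, pattern in BROWSER_RULES:
        m = pattern.search(ua)
        if m:
            family = name
            # Rules without a capture group (e.g. the crawler) carry no version.
            version = m.group(1) if m.groups() else None
            break
    nt = re.search(r"Windows NT (\d+\.\d+)", ua)
    os_name = WINDOWS_NT.get(nt.group(1), "Windows NT " + nt.group(1)) if nt else None
    return family, version, os_name
```

Run over the list above, the three IE 6 strings all collapse to `("Internet Explorer", "6.0", "Windows XP")`, even though the raw strings count as three distinct UserAgents.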

Yahoo! seems to have a real thirst for content.

Also, some of the data in this pool is older than IE7.

On a related note, this probably means I will get around to rolling another release of my ParseUserAgent Ruby gem.
