When you write something that connects to a web server, what user agent do you use?
Far too often have I seen things like:
curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
In my opinion you should always use a descriptive user agent that tells the server what your client is and what it is trying to do, or at the very least a unique identifier. Unless you’re building some kind of web scraping client (which probably contravenes a terms of service agreement somewhere, so I certainly don’t advocate that!), there is no reason not to provide a useful and descriptive UA string.
A good UA string from a little-known client should provide some way of contacting you. By little-known, I mean something like a new web app you’ve just written that queries Last.fm for user data. In this instance, I’d give a descriptive UA string with a contact e-mail address, e.g.:
curl_setopt($c, CURLOPT_USERAGENT, "MyLastFmClient (v0.1) myemail@address.com");
As your client becomes more widely used, or if your website already offers a decent way of contacting you, perhaps just put a URL:
curl_setopt($c, CURLOPT_USERAGENT, "MyLastFmClient (v1.2) www.address.com");
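If you set the UA in more than one place, the pattern above could be wrapped in a small helper so the name, version and contact details stay consistent. This is only a minimal sketch: `buildUserAgent()` is a hypothetical function name, `http://example.com/` is a placeholder URL, and the "Name (vX.Y) contact" layout simply mirrors the examples above rather than any standard.

```php
<?php
// Hypothetical helper: builds a UA string in the "Name (vX.Y) contact"
// shape used in the examples above. The layout is an assumption, not a standard.
function buildUserAgent(string $name, string $version, string $contact): string
{
    // Strip control characters (including CR/LF) so a stray newline
    // in any part cannot break the HTTP header line.
    $clean = static function (string $s): string {
        return trim(preg_replace('/[\x00-\x1F\x7F]+/', '', $s));
    };

    return sprintf('%s (v%s) %s', $clean($name), $clean($version), $clean($contact));
}

$ua = buildUserAgent('MyLastFmClient', '1.2', 'www.address.com');

// Only touch cURL if the extension is actually loaded.
if (function_exists('curl_init')) {
    $c = curl_init('http://example.com/'); // placeholder URL
    curl_setopt($c, CURLOPT_USERAGENT, $ua);
    // curl_exec($c) would now send the request with this User-Agent header.
}
```

Here `$ua` ends up as `MyLastFmClient (v1.2) www.address.com`, the same string as the hand-written version, but the version number and contact details now live in one place.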
Of course, when you’re Google, everyone knows who you are. A search for the UA string “Mediapartners-Google”, for example, yields 200k-odd results, revealing that this is the AdSense content bot.
Why do I think this is important? In most instances it lets the server’s operators identify you and help you. If your client goes wrong and gets stuck in a loop because, say, you forgot to increment $i, they can see that MyLastFmClient is spamming the server with 1,000+ requests a minute. They can then read your UA string and contact you about it.
Another reason is that some servers block access or serve different content depending on the client. I know that Google serves up a completely different search results page if you’re on IE5 than if you’re on IE8, for example. Another server might block all known browsers from accessing a web service (e.g. with the message “this page cannot be accessed using a web browser”). I’m not saying whether this is good or bad practice, as that is a WHOLE other kettle of fish, but it can happen, and if your client is pretending to be a browser, that sort of thing can be pretty hard to track down.
Although this all might seem pretty trivial, it is useful, and I think any HTTP(S) client should identify itself properly using a clear and descriptive user agent string. It’s no harder to do and it just makes everyone’s lives easier!
OK, so far all that I’ve managed to do is install it and have a dabble with the config pages and go “oooh that looks pretty”, so this isn’t a hardcore review or anything.