Why Parse the User-Agent?
November 22nd, 2014
tech
"UA strings need to die a horrible horrible death."
— UnoriginalGuy

"The UA string is flawed by itself. it shouldn't even be used anymore. The fact that browser manufacturers have to include all sorts of stuff is proof that this system doesn't work."
— guardian5x

"Well written sites use feature detection, not user-agent detection."
— Strom

"We'd all be better off if they just stopped sending the UA string altogether"
— nly
The modern advice is to use feature detection. Instead of the server interpreting the User-Agent header to guess at what features the browser supports, just run some JavaScript in the browser to see if the specific feature you need is supported. When this fits your situation it's great, but it's almost always slower. Often it's not enough slower to matter, just a few more lines of JavaScript, but let's look at a case where the performance cost is substantial.
Let's say I want to show you a picture of a kitten:
How big is that? [1] It depends how we encode it:

Encoding | Size
Unoptimized JPEG | 290.0 kB
Optimized JPEG | 38.6 kB
Optimized WebP | 20.2 kB
    <img id=img>
    <script>
      var img = document.getElementById("img");
      if (SupportsWebP()) {
        img.src = "image.webp";
      } else {
        img.src = "image.jpg";
      }
    </script>

First the browser downloads the HTML, then it runs the JavaScript, and then depending on the value of SupportsWebP() [2] it either loads image.webp or image.jpg. What's inefficient about this? How is this worse than just writing <img src="image.webp">?
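Footnote [2] hints at one wrinkle: detecting WebP support from JavaScript is awkward. Here's a minimal sketch of one common approach, which asks a canvas element to encode itself as a WebP data URL; an assumption on my part, and it only detects basic lossy WebP, and only in browsers that have canvas at all:

    function SupportsWebP() {
      var canvas = document.createElement("canvas");
      if (!canvas.getContext || !canvas.getContext("2d")) {
        return false;  // no canvas support at all
      }
      canvas.width = canvas.height = 1;
      // Browsers that can't encode WebP fall back to a PNG data URL here.
      return canvas.toDataURL("image/webp").indexOf("data:image/webp") === 0;
    }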
The problem is this breaks the preload scanner. Technically, the browser is supposed to make its way through the web page piece by piece, handling each bit as it comes to it. For example, if it gets to some external JavaScript, it's supposed to fetch and run that script before continuing on with anything else. To load your page faster, however, your browser cheats: while it's waiting for that script to load, it looks ahead through the rest of the page for resources it thinks it's going to need and fetches them. And, critically, that scanner doesn't run JavaScript.
Here are two example pages, each containing a single external script and an image: one loads the image with JavaScript and one uses an ordinary img tag. Running both through WebPageTest (1, 2), here are charts showing how the browser loaded each page:
You can see that in the JavaScript case the browser loaded everything in sequence, while with an img tag the two files could be loaded in parallel. [3]
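To make the comparison concrete, here's a sketch of what those two test pages could look like (the script name slow.js is an assumption; any blocking external script shows the effect):

    <!-- Version 1: image loaded by JavaScript. The image fetch can't
         start until slow.js has downloaded and run. -->
    <script src="slow.js"></script>
    <img id=img>
    <script>
      document.getElementById("img").src = "image.jpg";
    </script>

    <!-- Version 2: plain img tag. The preload scanner spots image.jpg
         while slow.js is still downloading, so both load in parallel. -->
    <script src="slow.js"></script>
    <img src="image.jpg">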
To emit HTML that references either a JPEG or a WebP depending on the browser, you need some way for the server to tell whether the browser supports WebP. Because this feature is so valuable, there is a standard way of indicating support for it: include image/webp in the Accept header. Unfortunately this doesn't quite work in practice. For example, Chrome v36 on iOS broke support for WebP images outside of data: URLs but was still sending Accept: image/webp. Similarly, Opera added image/webp to their Accept header before they supported WebP lossless. And no one indicates in their Accept header whether they support animated WebP.
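If the Accept header were reliable, the server-side check would be one line. A minimal Node.js sketch (the function name and handler shape are my own, not from any particular server):

    // Choose an image format based only on the Accept header.
    // As described above, this misclassifies some real browsers.
    function pickImageByAccept(req) {
      var accept = req.headers["accept"] || "";
      return accept.indexOf("image/webp") !== -1 ? "image.webp" : "image.jpg";
    }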
This leaves us having to look at the User-Agent header to figure out what the browser is, and then look up what features that browser supports. The header is ugly and I hate having to do this, but if we want to make pages fast we need to use the UA.
(The full gory details: kernel/http/user_agent_matcher.cc
.)
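As a rough illustration of the idea, and definitely not the actual PageSpeed logic, here's a sketch of that lookup. The Chrome version cutoffs below are assumptions for the sake of the example, and a real implementation needs many more special cases:

    // Guess the level of WebP support from the User-Agent string.
    // Unknown browsers get the safe answer: plain JPEG.
    function webpSupportFromUA(ua) {
      // Chrome v36 on iOS advertised image/webp but couldn't render it.
      if (/iPhone|iPad|iPod/.test(ua)) return "none";
      var m = /Chrome\/(\d+)/.exec(ua);
      if (m) {
        var v = parseInt(m[1], 10);
        if (v >= 32) return "lossy,lossless,animated";  // assumed cutoff
        if (v >= 23) return "lossy,lossless";           // assumed cutoff
        return "lossy";
      }
      return "none";
    }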
[1] I uploaded this picture to my server as a poorly optimized JPEG, but I'm running PageSpeed. You should be seeing WebP if your browser supports it, or an optimized JPEG if it doesn't.
[2] Which would be a bit of an awkward function.
[3] This is only a problem because of the external script reference. If there were nothing to block the regular parser then both versions would be just as good. (1, 2) Most pages do reference external scripts, however, so in practice the preload scanner helps a lot and you don't want to disable it.
Comment via: google plus, facebook, hacker news