Careful with Caching

August 7th, 2022
tech
A friend recently shared this graphic on Facebook:

(I've added the "wrong" overlay.)

This is clearly incorrect: there's no way Massachusetts has more prisons than colleges. (MA actually has the largest ratio of colleges to prisons in the US.) After putting a link to the original source in the Facebook discussion, however, we found something pretty weird: people on mobile were seeing the incorrect map, but people on desktop were seeing a corrected one:

It turns out that Facebook was appending a tracking parameter, ?fbclid=..., on desktop but not on mobile. Normally this wouldn't matter, because the site would ignore the parameter when deciding what page to return, but this site is apparently configured with a cache.

Many sites use caches to make pages cheaper to serve. When you ask for a page the site generates it (which might take a lot of work), gives it to you, and saves a copy. Then when someone else asks for the same page, it can return the saved copy instead of putting in all that work to regenerate it. Here the cache holds an outdated copy of /usa-prison-v-college, but since ?fbclid=... is always followed by a new token, those requests are never found in the cache, and they get the current, corrected page.
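Here's a minimal sketch in Python of what I'm guessing is happening. The NaivePageCache class, the site_content dictionary, and the token abc123 are all made up for illustration; I don't know what caching software the site actually runs. The only assumption that matters is that the cache uses the full URL, query string included, as its key:

```python
from typing import Callable, Dict


class NaivePageCache:
    """Hypothetical cache that keys on the full URL, query string included."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._store: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self._store:
            # Cache miss: the origin ignores the query string when rendering,
            # but the cache stores the result under the full URL.
            path = url.split("?", 1)[0]
            self._store[url] = self._render(path)
        return self._store[url]


# Stand-in for the site's content, which is corrected after first being cached.
site_content = {"/usa-prison-v-college": "incorrect map"}

naive_cache = NaivePageCache(lambda path: site_content[path])

naive_cache.get("/usa-prison-v-college")  # caches the incorrect map
site_content["/usa-prison-v-college"] = "corrected map"

mobile_view = naive_cache.get("/usa-prison-v-college")
desktop_view = naive_cache.get("/usa-prison-v-college?fbclid=abc123")

print(mobile_view)   # incorrect map
print(desktop_view)  # corrected map
```

The request without the parameter keeps hitting the stale entry, while each new fbclid value is a key the cache has never seen, so it falls through to the freshly generated page.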

If you have a cache, what do you do when you change the page, like they did here? There are two main approaches:

  • Have a way to tell the cache the page has changed and it should forget its copy.

  • Always cache for a short time. Even just one minute can take a lot of load off a server that is getting thousands of requests for a hot page.
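Both approaches can be sketched in a few lines of Python. This is an illustration under my own assumptions, not how any particular server or CDN does it: the TTLPageCache class, its invalidate method, and the fake clock are hypothetical names I picked for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Hypothetical cache with a short expiry and explicit invalidation."""

    def __init__(
        self,
        render: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps path -> (time the copy was saved, page body).
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve the saved copy
        body = self._render(path)  # missing or expired: regenerate
        self._store[path] = (now, body)
        return body

    def invalidate(self, path: str) -> None:
        """Approach 1: tell the cache the page changed so it forgets its copy."""
        self._store.pop(path, None)


# A fake clock so the example doesn't need to actually wait a minute.
fake_now = [0.0]
pages = {"/page": "v1"}
ttl_cache = TTLPageCache(lambda p: pages[p], ttl_seconds=60.0,
                         clock=lambda: fake_now[0])

served_first = ttl_cache.get("/page")        # generates and caches "v1"
pages["/page"] = "v2"
served_stale = ttl_cache.get("/page")        # within the TTL: still "v1"

ttl_cache.invalidate("/page")
served_after_purge = ttl_cache.get("/page")  # regenerated: "v2"

pages["/page"] = "v3"
fake_now[0] = 61.0                           # more than a minute later
served_after_expiry = ttl_cache.get("/page") # expired, regenerated: "v3"
```

With only the short expiry, a correction shows up within a minute at worst; with invalidation it shows up immediately, at the cost of needing some way for the publishing step to reach the cache.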

In this case they apparently didn't do either: we were running into this yesterday, and as of right now the site is still returning incorrect data.

(On the original question, comparing the number of colleges to the number of prisons is pretty silly: if one state runs large prisons and small colleges is that any better than a state that does the reverse? Comparing the number of people in prison vs college would make much more sense.)

Comment via: facebook, lesswrong

Allison (via fb):link

My bigger complaint is with Facebook, appending that annoying parameter to every link, than with poorly configured web servers! 😠

Andrew (via fb):link

MA used to have more colleges, but they all became universities when they were renamed by marketing departments.

Ben (via fb):link

I seem to recall Hawaii imports prison capacity from other states for detention, and obviously Massachusetts exports a lot of education. If all states had to be self-sufficient I'm guessing Hawaii would flip colour here, holding prison and college size constant?

Elizabeth (via fb):link

wait are they counting by institution and not headcount or capacity?

Jan (via fb):link

Insane, what we're being fed. Or conversely, very crafty.

Julia (via fb):link

At the bottom of the source page, it does have a graph of more like the thing you want: incarceration rate and degree rate (where MA is basically the best state on both, Maine might be a little lower on incarceration.) Ironically they only count bachelor's degrees and not associates' degrees, which seems kind of irrelevant given that 25% of people who have been incarcerated don't have a high school diploma or GED. If you care about reducing incarceration, better access to GED programs, vocational programs and associates' degrees are going to be much more useful than increasing the number of bachelor's degrees. https://www.prisonpolicy.org/reports/education.html

Kiran (via fb):link

The “original source” you link to has numbers for MA of 88 colleges and 107 prisons, which matches the info in the infographic.

Marcus (via fb):link

On a completely unrelated subject, the Globe had a piece this morning that said (among other things) that people with higher levels of education were more likely to make logical and mathematical errors than those with less education when the correct answer contradicted their political beliefs.

Paul (via fb):link

Marcus This summary is entirely backwards, if this is the research I think it is. They gave people data which on a quick skim appeared to support one position, but if you dig in you'll find it supports the opposite. When the quick skim contradict…


Marcus (via fb):link

Paul I think this was a different study. They had a math problem with gun laws and crime data. The higher the education level the more likely liberals were to get an incorrect answer that the gun laws reduced crime and conservatives incorrectly…


Jeff Kaufman (via fb):link

Marcus perhaps just link the study you're summarizing?

Marcus (via fb):link

Jeff Kaufman https://www.bostonglobe.com/.../us-vs-them-paradox.../ It didn't give enough information to link to the actual study unfortunately.

Marcus (via fb):link

I think it's this one based on Googling though: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2319992

Paul (via fb):link

Yep that is exactly the study I was talking about.

Marcus (via fb):link

Paul Why do you think it was deceptive? The questions seemed quite clear to me. Furthermore, if the problem was people digging into the data more carefully you would see the same phenomenon with the apolitical skin rash question.

Michael (via fb):link

I wouldn't call it deceptive (nor did Paul). Here's how they describe it: "Correctly interpreting the data was expected to be difficult. Doing so requires assessing not just the absolute number of subjects who experienced positive outcomes (“rash bett…


Marcus (via fb):link

Michael I don't understand how that disproves their point in any way though. It still shows that people who are otherwise good at solving this sort of math problem fail at doing so when confronted with data that contradicts their political belie…


Paul (via fb):link

Marcus Right, whereas people who are less good at it fail at solving it whether it contradicts their political beliefs or not.

Šime (via fb):link

I don’t understand this. The HTML document has a `cache-control: no-cache, no-store, must-revalidate` header, so why doesn’t the response contain the latest version? Where is the outdated document cached? Not in the browser because the browser fetched the document every time from the network. Then on Cloudflare’s edge servers? Why would Cloudflare cache document URLs with and without the query string separately?

Dagon (3y, via lw):link

Famously, the two hardest problems in computer science are cache invalidation and picking names for things.

I'm curious what's actually doing the caching here.  Most modern servers and CDNs are fairly sophisticated about what components of the URL go into the cache keys, and know that tracking IDs should be ignored.  

gjm (3y, via lw):link

No, famously the two hardest problems are cache invalidation, naming things, and off-by-one errors.

JBlack (3y, via lw):link

You're saying that Dagon was off by one problem?
