You may remember a short while ago Twitter had some downtime; connections seemed to drift into the ether and time out after a number of seconds (or even minutes).
Even the API was affected, and as a result so was I – pretty dramatically.
A while ago I threw together a quick little site idea called Tweetbars. It got some interest & coverage at the time but eventually I left the site to one side – happily serving Twitter status images to a few punters. At the moment about 500 bars are under active use with around 4000 hits per minute (these are rough figures – the stats tracking provided by my last host was rudimentary at best).
The code is simple and hacky: when a “bar” is displayed for the first time, a call is made to the public Twitter API to retrieve the requested user’s latest tweet. This is cached on disk and, when the image is requested again, the code checks the disk cache and either uses that or, if the cache has expired, refreshes from the API. Requests are made using PHP cURL.
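As a rough illustration of that check-the-cache-then-refresh flow (not the actual Tweetbars code), a minimal sketch might look like the following. The function name `getCachedTweet`, the cache path, and the maximum age are all assumptions made up for the example, and the API call is passed in as a callable so the cache logic can be read on its own.

```php
<?php
// Hypothetical helper: the name, parameters and cache lifetime are
// illustrative assumptions, not taken from the real Tweetbars code.
function getCachedTweet(string $cacheFile, int $maxAgeSeconds, callable $fetch): string
{
    // PHP caches stat() results, so clear them before checking the file age.
    clearstatcache(true, $cacheFile);

    // Serve from disk while the cached copy is still fresh.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    // Otherwise call the API (via $fetch) and rewrite the cache file.
    $tweet = $fetch();
    file_put_contents($cacheFile, $tweet, LOCK_EX);

    return $tweet;
}
```

Note that this version has the same weakness described in this post: once the cache expires, every request calls `$fetch()` until one of them succeeds.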
Unfortunately, in my hurried creation of Tweetbars I made a crucial engineering mistake: relying on Twitter’s API.
When Twitter started to time out so did my cURL connections – but very slowly. I’d never considered this scenario and so hadn’t set CURLOPT_TIMEOUT (the default for which is “never time out”). As a result some of those connections were hanging around for up to a minute, and any bar with an expired cache opened a new connection on every request. Ouch.
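For reference, here is a minimal sketch of a cURL request with explicit timeouts. The function name `fetchWithTimeout` and the 2 and 5 second limits are assumptions picked for the example; sensible values depend on how long you are prepared to keep a visitor waiting.

```php
<?php
// Hypothetical helper: the name and the default limits are assumptions.
// Returns the response body, or null if the request failed or timed out.
function fetchWithTimeout(string $url, int $connectTimeout = 2, int $totalTimeout = 5): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds allowed for the connection phase
        CURLOPT_TIMEOUT        => $totalTimeout,   // seconds allowed for the whole request
        CURLOPT_FAILONERROR    => true,            // treat HTTP status >= 400 as a failure
    ]);

    $body = curl_exec($ch); // false on any error, including a timeout

    return $body === false ? null : $body;
}
```

With both limits set, a hung API costs each request a few seconds at most rather than tying up a PHP process for a minute.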
To compound matters, for some inexplicable reason, my host hadn’t set up PHP to kill slow-executing pages.
Cue a locked-up server (the stats they showed me had about 3000 open connections and 89% CPU assigned to my PHP processes). Youch.
Understandably they shut down my account, taking 20 websites (including this one) with it. Which was a pretty major blow.
Never rely on a third-party API.
Twitter are in no way to blame here; it was entirely my fault. When Tweetbars was built I assumed performance and features would stay consistent with what I was seeing at that moment in time. An error caused this issue, but a number of other factors could have resulted in the same problem (for example, slight changes to how the API worked or returned data).
Always figure out a way for your app or site to fail gracefully when it hits unexpected problems – it is better to slightly annoy your users for a short while than to have to wait 12 hours for your provider to unlock a hosting account.
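One possible way to fail gracefully in a setup like this is to keep serving the stale cached tweet when the API call fails, and to back off before trying again. The sketch below assumes a hypothetical `tweetOrFallback` function and a fetch callable that returns `null` on failure; none of these names come from the original code.

```php
<?php
// Hypothetical helper: all names and the fallback policy are assumptions.
// $fetch must return the tweet text, or null if the API call failed.
function tweetOrFallback(string $cacheFile, int $maxAgeSeconds, callable $fetch, string $placeholder): string
{
    clearstatcache(true, $cacheFile);
    $hasCache = is_file($cacheFile);

    // A fresh cache means no API call at all.
    if ($hasCache && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    $fresh = $fetch();
    if ($fresh !== null) {
        file_put_contents($cacheFile, $fresh, LOCK_EX);
        return $fresh;
    }

    if ($hasCache) {
        // The API is struggling: mark the stale copy as fresh so later
        // requests wait a full cache lifetime before retrying.
        touch($cacheFile);
        return file_get_contents($cacheFile);
    }

    // Nothing cached and no API: show a harmless placeholder.
    return $placeholder;
}
```

Users see a slightly out-of-date tweet for a while, but an expired cache no longer turns every hit into a new outbound connection. A bar that has never been cached will still retry on each request, so a real version would probably want to record failures as well.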