Rachel Kroll

Feedback on feed stuff and those pesky blue screens

Wow, okay, things are cooking today. I posted last night with the results for the feed readers that were still actively pinging the test feed. I've heard from several feed reader authors, and they're cranking away at stuff! They obviously care a lot about doing the right thing.

One of them even put up a post already, and so I feel obliged to share it:

NetNewsWire and Conditional GET Issues

That post actually has a link to the report page, so if you want to see what a real batch of data looks like, there you go.

There are some non-feed-reader bits of feedback I'd like to address, too.

...

Hand-drawn analog clock pointing at 3:00 on Friday the 5th but with the 3 replaced with "SEV"

One reader asked for my take on the whole CrowdStrike thing. This is where they shipped some update to a bunch of Windows machines and nuked them. I get the impression that this was something that ran rather low in the stack and had the ability to pretty much take control of the whole machine... or kill it entirely, as it turned out.

When stuff like this happens, I fall back to pointing out something that we tried to make into a "thing" at FB in the fall of 2014. There had been *far* too many outages in a really short period of time. It had started with "Call the Cops" (10 years ago now - August 1st!), and a month later we were getting tired of outages every Friday at 3 PM - I started calling it "SEV O'Clock".

The notion we tried to get known far and wide was "nothing goes everywhere at once". This means that code pushes, config changes, and even flag flips should happen piecemeal. Don't go from 0 to 100% in a single step. Take a bit and space it out. If you can't space it out, ask yourself why, then see if you can make it happen.
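For the flag-flip case, here's a minimal sketch of what "piecemeal" can look like. It is not anything FB actually ran; the names (`bucket`, `is_enabled`, the `new-config` rollout, the `web000`-style hostnames) are all made up for illustration. The idea is to hash each host into one of 100 stable buckets, then turn the flag on for only the first N buckets, raising N a bit at a time.

```python
import hashlib


def bucket(host: str, rollout: str) -> int:
    """Map a host to a stable bucket in 0..99 for this particular rollout."""
    # Mixing in the rollout name means the same hosts aren't always the
    # first to receive every change.
    digest = hashlib.sha256(f"{rollout}:{host}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(host: str, rollout: str, percent: int) -> bool:
    """True if this host falls inside the first `percent` buckets."""
    return bucket(host, rollout) < percent


# Hypothetical fleet and stage list, purely for demonstration.
hosts = [f"web{n:03d}" for n in range(1000)]
for percent in (1, 5, 25, 50, 100):
    on = sum(is_enabled(h, "new-config", percent) for h in hosts)
    print(f"{percent:3d}% stage: {on} of {len(hosts)} hosts enabled")
```

Since a host's bucket never changes for a given rollout, raising the percentage only ever adds hosts. Anything that was enabled at 5% is still enabled at 25%, so each stage is a superset of the one before it.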

I've seen some amazing things almost happen. One company was in the quiet period prior to their IPO, when the execs know when it'll happen but the rest of us are in the dark. Basically, it could happen *any day*, and the bigwigs are out schmoozing with the press. It's the last time you want to tank the site, service, or whatever.

Once, I overheard some people saying they were going to convert the entire site's user database from goofy not-SQL db flavor A to goofy not-SQL db flavor B in one fell swoop on Sunday night. It was Thursday. I asked if they had a way to back out. Nope, it was a one-way trip. I asked if they could do it in stages. Nope, it was all or nothing.

"So you want to do something that will affect everyone at once, with no way back, on a weekend... during the IPO road show?"

They reconsidered.

To be clear, I have no idea if the CrowdStrike thing rolled out as fast as possible or if it had stages. It seems like if it had a phased rollout, then it might've nuked a few machines in a few places, but then the alarms would have gone off and they would have hit the big red STOP button... right? All of those hosts disappearing right after getting our update means something, doesn't it?
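To make that concrete, here's a rough sketch of a phased rollout with a STOP button wired in. I have no idea what CrowdStrike's pipeline looks like, so everything here is an assumption: `staged_rollout`, the `push` and `is_healthy` callbacks, the stage percentages, and the 5% failure threshold are all invented for the example.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stage_percents: Sequence[int],
    push: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.05,
) -> list[str]:
    """Push to growing slices of `hosts`; return the hosts that got the update."""
    updated: list[str] = []
    for percent in stage_percents:
        target = len(hosts) * percent // 100
        batch = list(hosts[len(updated):target])
        for host in batch:
            push(host)
        updated.extend(batch)
        # A real system would wait here (a "bake" period) before checking.
        dead = sum(1 for host in batch if not is_healthy(host))
        if batch and dead / len(batch) > max_failure_rate:
            print(f"STOP at {percent}%: {dead} of {len(batch)} hosts went dark")
            break
    return updated


# Simulate an update that kills every host it touches.
fleet = [f"host{n}" for n in range(1000)]
down: set[str] = set()
touched = staged_rollout(fleet, (1, 10, 50, 100), down.add, lambda h: h not in down)
print(f"{len(touched)} of {len(fleet)} hosts received the bad update")
```

With these made-up numbers, the bad update reaches the first 1% of the fleet (10 hosts out of 1000), the health check sees all of them gone, and the loop bails out before the other 990 are touched.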

Well, look at that. "Breach hull--all die." Even had it underlined.