> The title is pretty misleading. They're not even running Postgres, but AWS Aur...

scottlamb · on Jan 29, 2025

> If a query takes a bit longer to respond, I don't think that counts as downtime. From the perspective of the user, they couldn't distinguish this migration event from some blip of slightly slower queries.

It comes down to defining Service Level Objectives (SLOs) that are meaningful to your users. For one system I worked on, latency was important, so one SLO was "99.999% of <a certain class of> requests with a deadline >=1s should succeed with latency <1s". If this had affected more than 0.001% of requests in <time interval defined in our SLO>, we'd have called it an outage. But I've also worked on systems with looser SLOs where this would have been fine.
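
To make the arithmetic concrete, here is a rough sketch of checking a latency SLO of this shape over one measurement window. It assumes per-request latencies are available as a list of seconds; the function name, parameters, and sample numbers are hypothetical and not taken from the system described above.

```python
def violates_latency_slo(latencies_s, target=0.99999, threshold_s=1.0):
    """Return True if the share of requests faster than threshold_s
    falls below the target fraction for this window.

    latencies_s: observed latencies in seconds for one SLO window
                 (hypothetical input format).
    """
    if not latencies_s:
        return False  # no traffic in the window, nothing to violate
    fast = sum(1 for latency in latencies_s if latency < threshold_s)
    return fast / len(latencies_s) < target


# One million requests at 50 ms, a handful of which stalled for 5 s
# during a pause (illustrative numbers only).
quiet_window = [0.05] * 999_995 + [5.0] * 5
busy_window = [0.05] * 999_900 + [5.0] * 100
```

With a 99.999% target the error budget is 0.001% of requests, so in a window of a million requests, 5 stalled requests stay within budget while 100 do not. Whether a 5-second pause counts as an outage then depends on how much traffic lands inside it.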

nijave · on Jan 29, 2025

Not only that but I think you also need to take upstream systems into account. With a reasonably robust frontend that handles transient issues and retries reasonably, I think it's ok to say "no downtime"
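
As an illustration of what "retries reasonably" could mean, here is a minimal sketch of a client-side retry with exponential backoff. It assumes the transient failure surfaces as a `ConnectionError`; the function name, the attempt count, and the delays are hypothetical choices, not anything from the article.

```python
import time


def call_with_retries(operation, attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff.

    With the defaults, the waits between attempts are 0.5, 1, 2 and 4 seconds,
    7.5 seconds in total, which is longer than a pause of under 5 seconds.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay_s * 2 ** attempt)
```

A caller wrapped this way would see a slow response instead of an error during a short pause, which is the case where "no downtime" seems defensible.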

RadiozRadioz · on Jan 29, 2025

Completely depends on what the "user" is. Are they a human, or a machine that explicitly requires timings within a particular threshold?

lionkor · on Jan 29, 2025

It depends if it feels like an outage

awesome_dude · on Jan 29, 2025

> If a query takes a bit longer to respond, I don't think that counts as downtime

"We're sorry that your query took 7 hours to be responded to, but it wasn't an outage - honest"

stopachka · on Jan 29, 2025

We would count 7 hours as downtime too. Our pause was less than 5 seconds.

libraryofbabel · on Jan 29, 2025

Nice job, then! Technical downtime that’s virtually undetectable to users is a big win. In fact, “less than 5 seconds of downtime” in the title would actually make me want to read the article more as I tend to be suspicious of “zero downtime” claims for database upgrades, whereas <5s is clearly almost as good as zero and actually quantified :)

_flux · on Jan 30, 2025

On the other hand, "less than 5 seconds of downtime" might give the impression that new queries sent within that time period would be rejected, while zero implies this doesn't happen, i.e. that it's indistinguishable from normal operation for the client.

And being even more precise in the title would just make it less titley :).

awesome_dude · on Jan 29, 2025

Yeah - a quantifiable amount in the headline would change the likelihood of the article being taken seriously - it goes from "No downtime? I call BS" to "Less than 5 seconds, that seems reasonable, and worth investigating"

ElijahLynn · on Jan 29, 2025

Less than 5 seconds seems pretty reasonable to me to call it zero downtime.

tossandthrow · on Jan 29, 2025

A 5-second pause on queries would make our app server drop connections and throw errors under cyclical high load, which would result in an incident.

paulddraper · on Jan 29, 2025

Strong energy of "someone brushed up against me and that's assault" going on here