> The title is pretty misleading. They're not even running Postgres, but AWS Aurora, which is Postgres compatible, but is not Postgres.

For what it's worth, every command we ran works on normal Postgres. Hence we didn't think it mattered to mention Aurora specifically in the title.

> Also, pausing queries does count as downtime.

If a query takes a bit longer to respond, I don't think that counts as downtime. From the perspective of the user, they couldn't distinguish this migration event from some blip of slightly slower queries.



> If a query takes a bit longer to respond, I don't think that counts as downtime. From the perspective of the user, they couldn't distinguish this migration event from some blip of slightly slower queries.

It comes down to defining Service Level Objectives (SLOs) that are meaningful to your users. For one system I worked on, latency was important, and so one SLO was "99.999% of <a certain class of> requests with a deadline >=1s should succeed with latency <1s", so if this affected more than 0.001% of requests in <time interval defined in our SLO>, we'd have called it an outage. But I've also worked on systems with looser SLOs where this would have been fine.
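
To make that concrete, here is a rough sketch of how such a latency SLO check might look. The function name, the default threshold, and the sample numbers are all made up for illustration; a real system would compute this from its own request logs over the interval its SLO defines.

```python
def slo_violated(latencies_s, threshold_s=1.0, target=0.99999):
    """Return True if the share of requests faster than threshold_s is below target.

    latencies_s: observed request latencies, in seconds, for one SLO window.
    threshold_s and target are illustrative defaults, not values from any real system.
    """
    if not latencies_s:
        return False  # no traffic in the window, so nothing to violate
    fast_enough = sum(1 for t in latencies_s if t < threshold_s)
    return fast_enough / len(latencies_s) < target


# Hypothetical window: one million requests, a handful stalled by a 5 s pause.
few_stalled = [0.05] * 999_995 + [5.0] * 5
many_stalled = [0.05] * 999_980 + [5.0] * 20

print(slo_violated(few_stalled))   # False: 5 slow requests in a million is within target
print(slo_violated(many_stalled))  # True: 20 slow requests in a million is not
```

So whether a short pause counts as an outage under an SLO like this depends on how many requests happened to land inside the pause relative to the size of the window.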


Not only that, but you also need to take upstream systems into account. With a reasonably robust frontend that handles transient issues and retries sensibly, I think it's OK to say "no downtime".
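
As a rough illustration of what "retries sensibly" could mean, here is a minimal retry-with-backoff sketch. `TransientError`, `call_with_retries`, and the delay values are hypothetical names and numbers chosen for the example, not taken from the article or from any particular client library.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. a dropped connection."""


def call_with_retries(fn, attempts=5, base_delay_s=0.5, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff.

    With the illustrative defaults the waits are 0.5, 1, 2 and 4 seconds,
    so a pause of a few seconds is absorbed before the caller sees an error.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay_s * 2 ** attempt)
```

A caller would wrap the database query in a zero-argument function and pass it as `fn`. Anything that is not a `TransientError` propagates immediately, so real bugs are not hidden behind retries.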


Completely depends on what the "user" is. Are they a human, or a machine that explicitly requires timings within a particular threshold?


It depends on whether it feels like an outage.


> If a query takes a bit longer to respond, I don't think that counts as downtime

"We're sorry that your query took 7 hours to be responded to, but it wasn't an outage - honest"


We would count 7 hours as downtime too. Our pause was less than 5 seconds.


Nice job, then! Technical downtime that’s virtually undetectable to users is a big win. In fact, “less than 5 seconds of downtime” in the title would actually make me want to read the article more as I tend to be suspicious of “zero downtime” claims for database upgrades, whereas <5s is clearly almost as good as zero and actually quantified :)


On the other hand, "less than 5 seconds of downtime" might give the impression that new queries sent within that time period would be rejected, while zero implies this doesn't happen, i.e. that it's indistinguishable from normal operation for the client.

And being even more precise in the title would just make it less titley :).


Yeah - a quantifiable amount in the headline would change the likelihood of the article being taken seriously - it goes from "No downtime? I call BS" to "Less than 5 seconds, that seems reasonable, and worth investigating"


Less than 5 seconds seems pretty reasonable to me to call zero downtime.


A 5-second pause on queries would make our app server drop connections and throw errors under cyclical high load, which would result in an incident.


Strong energy of "someone brushed up against me and that's assault" going on here



