Revert the permanent slashing marks caused by jailings that resulted directly from the v2 emergency upgrade, given that no official emergency security alerting policy is yet in place.
Description:
An emergency security alerting policy has been suggested, but given that the chain is still young, one has yet to be implemented.
When the v2 security emergency upgrade was released, a few of the bigger validators were messaged privately, and only afterwards was it announced to the other validators. Because the allowed downtime on the dYdX chain is low, many validators who were not fortunate enough to be awake in their timezone for the announcement were jailed. This has caused all of these validators to be permanently marked as slashed in the x/distribution module.
Currently this only affects inclusion in LST sets such as stkDYDX from Persistence, but in the future it may also feed into various validator metrics, which would further affect fair competition.
This proposal seeks to remedy this by proposing to remove these specific v2-upgrade jailings from the state in the next binary upgrade.
There is already precedent for removing such marks when they are caused directly by upgrades, such as the untombstonings we saw on the Secret Network, Persistence, and Chihuahua chains.
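For illustration only (this is not part of the proposal text): the change could ship as a small migration inside the next upgrade handler. The sketch below assumes a standard pre-v0.50 Cosmos SDK upgrade handler and the classic x/distribution keeper API; the upgrade name, the App wiring, and the list of affected operator addresses are placeholders that the actual on-chain proposal would have to spell out.

```go
package app

import (
	sdk "github.com/cosmos/cosmos-sdk/types"
	"github.com/cosmos/cosmos-sdk/types/module"
	upgradetypes "github.com/cosmos/cosmos-sdk/x/upgrade/types"
)

// registerV2JailRevertHandler is a hypothetical helper: "App", "v3.0.0", and the
// keeper field names are placeholders, not the actual dYdX protocol wiring.
func registerV2JailRevertHandler(app *App, affected []sdk.ValAddress) {
	app.UpgradeKeeper.SetUpgradeHandler(
		"v3.0.0", // placeholder: whichever upgrade this migration rides along with
		func(ctx sdk.Context, _ upgradetypes.Plan, fromVM module.VersionMap) (module.VersionMap, error) {
			for _, valAddr := range affected {
				// Clear the x/distribution slash events recorded for this validator.
				// A production migration would remove only the events written at the
				// v2 emergency-upgrade height, not the validator's full history.
				app.DistrKeeper.DeleteValidatorSlashEvents(ctx, valAddr)
			}
			// Then run the normal module migrations for the rest of the upgrade.
			return app.ModuleManager.RunMigrations(ctx, app.configurator, fromVM)
		},
	)
}
```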
Next Steps
We propose to submit a governance proposal on-chain in the first week of January 2024 so that enough validators are back from holidays to participate effectively in governance.
This is a non-binding text proposal expecting someone (who is not named, but I imagine you expect dYdX Trading to do this) to prepare a software upgrade and new binary that alters the state of the chain, seemingly only to your own benefit.
This proposal does not specify exactly which validators and which slashes are to be removed from the chain state. The lack of specifics makes this proposal too ambiguous to support.
The proposal goes against the “spirit” of the existing slashing parameters on the dYdX chain, namely: punish operators, and thereby stakers, for lasting downtime through missed rewards and quicker jailing, but avoid excessive punishment of stakers for operator mistakes by keeping slashing percentages at 0. A smaller signing window and quicker jailing action is a feature, not a bug.
It’s common for emergency security upgrades to focus on the top voting power. It is, after all, a security upgrade, and the chain comes first. Not having monitoring set up to catch a node going down is an operator fault, NOT a chain fault. One network that does this is Osmosis, and the validators with lower voting power then have to react accordingly.
edit:
Re-writing chain state should be a last resort for networks, as it raises all sorts of regulatory questions. Just because Secret Network, Chihuahua, and Persistence have rewritten their chain state doesn’t justify doing it here.
Potentially bringing on regulatory scrutiny to solve an operator mistake seems, at best, unwise.
1 + 2 - We have already been in communication with both dYdX Trading and the Foundation and highlighted specifically how and which jailings are affected; apologies for not making this clear in the forum.
It would just be an add-on to the next upgrade; we are not expecting a special upgrade just for this.
3 - You can tell from the announcement that it was very much expected these jailings would occur,
which can and will be avoided in the future once a proper process has been established;
we have tried communicating this many times and only recently got through to the relevant department, the Operations subDAO, which is going to implement our suggestion for an emergency alerting system.
Also, we completely agree that a small allowable downtime is fine with proper processes in place.
4 - We do have monitoring, but unfortunately, because it was a binary swap with no notice at the beginning of the night, it took down our backup monitoring node as well as the external third-party nodes we use for additional monitoring, and everything was jailed before we woke up.
1 + 2 - We have already been in communication with both dYdX Trading and the Foundation and highlighted specifically how and which jailings are affected; apologies for not making this clear in the forum.
Which teams are you proposing?
3 - You can tell from the announcement that it was very much expected these jailings would occur…
Sure, but that doesn’t mean chain state should be re-written.
4 - We do have monitoring …
I’m certain you do have monitoring, but the reality is that being jailed means the monitoring was insufficient; ultimately, that’s where almost all jailings come from.
This is insufficient reasoning for bringing on regulatory scrutiny.
I agree with @schultzie on this. I understand that validation is a business and that things like this may impact those business models to an extent, but hard forking / changing state of the chain in order to help a subset of validators market themselves better is a pretty disproportionate measure.
Changes to state should be taken seriously, for the regulatory reasons mentioned but also because forking increases operational overhead and increases the chances of more things breaking. Imagine if a validator was jailed because of the upgrade required to fix these jailings. Should we upgrade to undo those jailings as well?
I understand that this was a bit of an extreme situation, but it’s foreseeable that future emergency upgrades will be needed that will result in further jailings if validators don’t take steps to prepare for them. Imo, it’s important to not set this precedent now.
A better path forward in my mind is to increase the downtime jailing window to a number of blocks that gives validators 8-12 hours of downtime allowance before a jailing occurs. It won’t help those that were impacted by this upgrade, but it should better align incentives between the protocol and validators moving forward.
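As a rough sanity check on how such a window would be sized (purely illustrative; the block time and MinSignedPerWindow below are assumed values, not the chain’s actual parameters):

```go
package main

import "fmt"

func main() {
	// Assumed values for illustration; substitute the chain's real parameters.
	const blockTimeSec = 1.5         // assumed average block time in seconds
	const minSignedPerWindow = 0.05  // assumed x/slashing MinSignedPerWindow
	const targetDowntimeHours = 10.0 // middle of the suggested 8-12h allowance

	// Downtime jailing triggers after missing (1 - MinSignedPerWindow) of the window,
	// so the window must be large enough that the allowed misses cover the target downtime.
	allowedMissedBlocks := targetDowntimeHours * 3600 / blockTimeSec
	signedBlocksWindow := allowedMissedBlocks / (1 - minSignedPerWindow)

	fmt.Printf("allowed missed blocks ≈ %.0f, SignedBlocksWindow ≈ %.0f\n",
		allowedMissedBlocks, signedBlocksWindow)
	// With these assumptions: ~24,000 missed blocks, window ≈ 25,263 blocks.
}
```

If the community went this route, it would be an ordinary slashing parameter change rather than a rewrite of existing state.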
Let us start with the positive: we’re aligned on the need for a robust framework to navigate upgrades smoothly. It’s a critical aspect that demands attention, and this event clearly proved that an upgrade framework needs to be defined.
On a different note, we’re leaning towards a NO vote on this proposal. While the intention is commendable, there are aspects giving us pause.
When it comes to altering the chain state, our stance is cautious: we will consider changes only when absolutely necessary. In this instance, the reasons outlined in the proposal don’t quite meet our threshold. The lack of monitoring that led certain validators to be jailed appears more of an operator oversight than a systemic flaw. It’s a lesson to be learnt rather than a glitch. Nevertheless, we do agree there’s room for refining the upgrade framework.
We’re fully on board with initiating a discussion in the forums to sculpt a more effective update framework, covering both routine updates and those unexpected emergency cases. A collective brainstorming session could yield valuable insights.
Thanks for putting this forward, and we look forward to more collaborative discussions and enhancements.
Hey, thanks for your comments. We do also agree with the points; both sides have merit, IMHO.
And yes, thankfully, since the incident they now understand why we were pushing so hard for an alerting system beforehand.
We have shown them the exact process Axelar uses; they liked it and are now circulating it internally, so hopefully we will have a formal process sooner rather than later.