Hello @BritAus,
We’ve delved into the block timeout issue and conducted some quick experiments. To gain insights, we developed an application that generates latency graphs for each block and validator.
CONTEXT:
Our tests highlighted four relevant graphs:
- ‘Country proposer’: the country of the validator issuing a proposal block.
- ‘Miss by country’: the number of validators missing a block, sorted by geographical location, using data from the observatory site.
- ‘Signers’: the number of validators signing the block (all 60 have signed).
- ‘Validator latency’: the time between a validator proposing a block and sending a pre-commit message.
A second tool measures the latencies between the network's nodes and our RPC nodes located in France and Singapore. All of the latencies below have already been doubled to simulate real TCP communication.
Singapore
America
CA min=230ms avg=233ms max=252ms peers=8
US min=172ms avg=234ms max=285ms peers=17
CL min=329ms avg=329ms max=329ms peers=1
Europe
FI min=182ms avg=185ms max=190ms peers=26
DE min=154ms avg=174ms max=250ms peers=45
PL min=164ms avg=196ms max=253ms peers=7
FR min=150ms avg=175ms max=288ms peers=12
CZ min=194ms avg=219ms max=244ms peers=2
NL min=160ms avg=176ms max=209ms peers=4
CH min=163ms avg=178ms max=190ms peers=3
GB min=155ms avg=202ms max=249ms peers=2
AT min=152ms avg=152ms max=152ms peers=2
IE min=162ms avg=177ms max=188ms peers=6
Asia
JP min=68ms avg=74ms max=85ms peers=38
SG min=0ms avg=0ms max=3ms peers=23
KR min=79ms avg=92ms max=100ms peers=3
HK min=37ms avg=37ms max=37ms peers=2
IN min=39ms avg=60ms max=67ms peers=4
TW min=49ms avg=49ms max=49ms peers=1
Australia
AU min=118ms avg=118ms max=118ms peers=1
France
America
CA min=78ms avg=81ms max=101ms peers=8
US min=8ms avg=91ms max=145ms peers=17
CL min=240ms avg=240ms max=240ms peers=1
Europe
FI min=28ms avg=28ms max=40ms peers=26
DE min=8ms avg=11ms max=25ms peers=45
PL min=27ms avg=27ms max=28ms peers=7
FR min=0ms avg=2ms max=10ms peers=12
CZ min=17ms avg=17ms max=17ms peers=2
NL min=6ms avg=8ms max=12ms peers=4
CH min=13ms avg=15ms max=21ms peers=3
GB min=3ms avg=3ms max=3ms peers=2
AT min=22ms avg=22ms max=22ms peers=2
IE min=14ms avg=17ms max=22ms peers=6
Asia
JP min=217ms avg=231ms max=258ms peers=38
SG min=150ms avg=174ms max=266ms peers=23
KR min=250ms avg=268ms max=285ms peers=3
HK min=238ms avg=251ms max=264ms peers=2
IN min=145ms avg=199ms max=218ms peers=4
TW min=241ms avg=241ms max=241ms peers=1
Australia
AU min=292ms avg=292ms max=292ms peers=1
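As a rough illustration of how the rows above can be condensed into one figure per region, here is a minimal Python sketch. It is not the tool we used for the measurements; the helper names (`parse_rows`, `weighted_avg`) are hypothetical, and it only assumes the `CC min=…ms avg=…ms max=…ms peers=…` row format shown in the table. The sample data is the Asia group as seen from France, copied from above.

```python
import re

# One row of the latency table, e.g. "JP min=68ms avg=74ms max=85ms peers=38".
ROW = re.compile(
    r"^(?P<cc>[A-Z]{2}) min=(?P<min>\d+)ms avg=(?P<avg>\d+)ms "
    r"max=(?P<max>\d+)ms peers=(?P<peers>\d+)$"
)

def parse_rows(text):
    """Return one dict per country row; region headers and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append({
                "cc": m["cc"],
                "min": int(m["min"]),
                "avg": int(m["avg"]),
                "max": int(m["max"]),
                "peers": int(m["peers"]),
            })
    return rows

def weighted_avg(rows):
    """Average latency in ms, weighting each country by its number of peers."""
    total_peers = sum(r["peers"] for r in rows)
    return sum(r["avg"] * r["peers"] for r in rows) / total_peers

# Asia rows as measured from France (copied from the table above).
ASIA_FROM_FRANCE = """
Asia
JP min=217ms avg=231ms max=258ms peers=38
SG min=150ms avg=174ms max=266ms peers=23
KR min=250ms avg=268ms max=285ms peers=3
HK min=238ms avg=251ms max=264ms peers=2
IN min=145ms avg=199ms max=218ms peers=4
TW min=241ms avg=241ms max=241ms peers=1
"""

rows = parse_rows(ASIA_FROM_FRANCE)
print("countries:", len(rows))
print("peer-weighted avg (ms):", round(weighted_avg(rows), 1))
print("worst max (ms):", max(r["max"] for r in rows))
```

Weighting by peer count keeps a single-peer country such as TW from counting as much as JP with its 38 peers.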
__
ANALYSIS:
After several days of monitoring, we can see that everything runs smoothly in the majority of cases. However, at regular intervals (1 to 2% of the time), a large number of validators fail to sign a block. The affected blocks contain no unusual transactions, and the number of transactions is constant (between 4 and 5). On the other hand, we can see that the majority of validators who fail to sign are located on a different continent from the proposer, particularly when the proposer is located in Japan or the UK.
European validators proposing a block see many Asian validators not signing, and vice versa. The default timeout of 1s for pre-vote or pre-commit may be too short given the latency table from France or Singapore.
__
CONCLUSIONS:
To minimize block misses, we propose the following measures:
- Extend Consensus Round 0 windows: Increase both the pre-vote and pre-commit timeouts to 1,250ms (adding 250ms to the existing value). Consequently, raise timeout_prevote_delta and timeout_precommit_delta to 625ms (adding 125ms to match the extension of the pre-vote and pre-commit timeouts). These adjustments aim to facilitate better communication among validators during periods of high latency without significantly altering the final block time. This change should be applied across the entire validator network, possibly during an upcoming update.
- Set Hardware Recommendations: For optimal reaction time, we recommend high-performance hardware, such as 8 cores / 16 threads, 32GB RAM, and a fast disk like NVMe or RAID NVMe. Regarding our dYdX validator, we made some modifications to the config.toml file (as we previously explained in this post).
- Distribute the Validator Set: Spread block proposals across a wider range of locations. Currently, 90% of block proposals originate from Asia or the UK. Various solutions can be explored, and we encourage the community to initiate discussions. Possible options include targeted Foundation delegations and geographically based voting power caps.
- Establish a Transit Network: Create a dedicated “IP transit” network with institutional partners (he.net, cogentco.com, akamai, …) connecting the diverse geographic areas of consensus. Prioritize access to submarine fiber-optic cables to boost connectivity between key consensus locations. This strategic move aims to reduce average latency.
To mitigate block misses, we recommend implementing these measures gradually over time. Initially, the most immediate and cost-effective solution involves extending consensus’ round 0 windows and setting minimum hardware recommendations, while concurrently addressing the centralization of block production geography. Simultaneously, efforts should be directed toward developing a longer-term solution, such as the proposed “transit network.”
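To make the effect of the first measure concrete, here is a small Python sketch comparing the current and proposed vote windows. It assumes the usual CometBFT/Tendermint rule that the wait in round r is the base timeout plus r times the delta. The `VoteTimeouts` class is hypothetical and only illustrates the arithmetic; the numbers are the ones quoted above (1,000ms / 500ms today, 1,250ms / 625ms proposed).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoteTimeouts:
    """Pre-vote / pre-commit timeout settings, in milliseconds (illustrative only)."""
    base_ms: int   # timeout_prevote / timeout_precommit
    delta_ms: int  # timeout_prevote_delta / timeout_precommit_delta

    def for_round(self, round_number: int) -> int:
        # Assumed rule: window = base + delta * round.
        return self.base_ms + self.delta_ms * round_number

CURRENT = VoteTimeouts(base_ms=1000, delta_ms=500)
PROPOSED = VoteTimeouts(base_ms=1250, delta_ms=625)

for r in range(3):
    cur = CURRENT.for_round(r)
    new = PROPOSED.for_round(r)
    print(f"round {r}: current={cur}ms proposed={new}ms (+{new - cur}ms)")
```

Under this assumption the extra waiting time only comes into play when votes are actually late, which is why we do not expect a significant change in the typical block time.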
OPEN-SOURCING TOOLS:
We’re willing to open-source our Python exporter/dashboard for Grafana and our Cosmos-scanner if needed. Additionally, we can provide a dump of our investigation database to the Foundation and the community.
__
On behalf of the entire team, thank you all for your attention.