[FIX] Reducing Our Validator's Missed Blocks

After two weeks of performance testing on our validator, we found that its initial performance fell short of our criteria for production. In response, we ran production tests at two of our sites, one in Europe and one in Asia, using the same type of machine at both: a bare-metal server with an Intel Xeon E-2386G CPU, 64 GB of DDR4 ECC RAM, and NVMe drives in RAID 10.

During the one-week test at each site, the out-of-the-box validator configuration consistently performed unacceptably. After numerous configuration tests, we reached an acceptable state with the following adjustments in the config.toml file. Most of them sit under the p2p section; the last two belong to the mempool and consensus sections:

config.toml

[p2p]
max_num_inbound_peers = 60
max_num_outbound_peers = 60
flush_throttle_timeout = "10ms"
send_rate = 20480000  # bytes/second
recv_rate = 20480000  # bytes/second

[mempool]
version = "v0"

[consensus]
timeout_propose = "2s"

The p2p changes tune gossip traffic in both directions to cope with the increased number of inbound and outbound peers. We also reverted the mempool to the deprecated version "v0". Finally, we set the consensus timeout_propose to "2s" to accommodate the maximum block time.
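For anyone who wants to confirm that a node is actually running with these values, here is a minimal Python sketch that compares a parsed config.toml against the settings listed above. The section layout ([p2p], [mempool], [consensus]) is an assumption based on the usual Tendermint/CometBFT file layout, and the names TARGETS, diff_config, and check_file are made up for this example. It needs Python 3.11 or later to read the file.

```python
# Values come from the post; the section names are an assumption based on
# the usual Tendermint/CometBFT config.toml layout.
TARGETS = {
    ("p2p", "max_num_inbound_peers"): 60,
    ("p2p", "max_num_outbound_peers"): 60,
    ("p2p", "flush_throttle_timeout"): "10ms",
    ("p2p", "send_rate"): 20480000,
    ("p2p", "recv_rate"): 20480000,
    ("mempool", "version"): "v0",
    ("consensus", "timeout_propose"): "2s",
}


def diff_config(config: dict) -> list[str]:
    """Return one readable line per setting that differs from TARGETS."""
    mismatches = []
    for (section, key), wanted in TARGETS.items():
        # A missing section or key shows up as None and counts as a mismatch.
        actual = config.get(section, {}).get(key)
        if actual != wanted:
            mismatches.append(
                f"[{section}] {key}: found {actual!r}, expected {wanted!r}"
            )
    return mismatches


def check_file(path: str) -> list[str]:
    """Parse a config.toml from disk and diff it against TARGETS."""
    # Imported here so diff_config stays usable on Python versions
    # older than 3.11, where tomllib is not in the standard library.
    import tomllib

    with open(path, "rb") as fh:
        return diff_config(tomllib.load(fh))
```

Calling check_file with the path to your node's config.toml returns an empty list when every setting matches, and otherwise one line per difference.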

For security, we also bound the p2p port on our firewall. With this revised configuration, our missed-block rate dropped by 75%. These adjustments proved effective on our hardware, but they may need fine-tuning on different hardware.
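To make the 75% figure concrete: it is a relative reduction in the missed-block rate, not a drop of 75 percentage points. The short Python sketch below shows the arithmetic. The block counts in it are purely hypothetical, chosen only to illustrate the calculation, and are not our measurements.

```python
def missed_rate(missed: int, total: int) -> float:
    """Fraction of blocks in a window that the validator failed to sign."""
    if total <= 0:
        raise ValueError("total must be positive")
    return missed / total


def relative_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, as a fraction of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Hypothetical counts, for illustration only (not measured data).
rate_before = missed_rate(400, 10_000)  # 4% of blocks missed
rate_after = missed_rate(100, 10_000)   # 1% of blocks missed
reduction = relative_reduction(rate_before, rate_after)
print(f"relative reduction: {reduction:.0%}")
```

With these made-up numbers the rate goes from 4% to 1%, which is a relative reduction of about 75%.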

We share these conclusions not as an ultimate solution but as a foundational reference for the community to iterate upon. Our goal is to help improve the efficiency of the broader validator set. If you have alternative suggestions or insights, please feel free to contact us.
