Casper node failover #13

@przemyslaw

Context
Casper node operations currently have no automatic failover, so a failure of the primary node can disrupt service. The proposed design pairs a primary (main) node with a secondary (slave) node: each node is monitored, and failover is triggered when specific conditions are met.

Goal
The goal of this task is to implement a failover module for the Casper node, or to adopt an existing one. The module should keep network services highly available by automatically switching to a backup node when the primary node fails. The switch should cause only a minimal performance drop, and it must never lead to double-signing, which would incur penalties.

Requirements

  • Dual-node architecture: one main and at least one slave.
  • Each node should have two public keys: one for main operations and the second for failover scenarios.
  • Nodes must regularly ping each other, with intervals and monitoring duration configurable through settings.
  • The failover process should activate the slave node as the primary if the main node becomes unresponsive for a specified period.
  • Include internal replication between main and slave nodes to prevent performance degradation during failover.
  • The system must avoid double-signing, drawing on experience from existing solutions such as Horcrux in the Cosmos SDK ecosystem.
  • The solution should seamlessly revert to the original configuration once the main node is operational again.
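
The ping, failover, and revert requirements above can be sketched as a small state machine running on the slave. This is a minimal illustration, not a proposed implementation: `FailoverSettings`, `SlaveMonitor`, the field names, and the default values are all hypothetical, and the actual ping transport (for example an HTTP status call to the main node) is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    STANDBY = "standby"  # slave is idle, main is signing
    ACTIVE = "active"    # slave has taken over from main


@dataclass
class FailoverSettings:
    # All values are hypothetical defaults; the issue only asks that they be configurable.
    ping_interval_s: float = 5.0     # how often the caller should ping the main node
    failure_timeout_s: float = 30.0  # silence required before the slave takes over
    recovery_window_s: float = 60.0  # unbroken healthy time required before reverting


class SlaveMonitor:
    """Tracks main-node health as seen from the slave and decides the slave's role."""

    def __init__(self, settings: FailoverSettings, now: float):
        self.settings = settings
        self.role = Role.STANDBY
        self.last_ok = now                             # time of the last successful ping
        self.healthy_since: Optional[float] = now      # start of current healthy streak

    def record_ping(self, ok: bool, now: float) -> Role:
        """Feed one ping result (taken at time `now`) and return the resulting role."""
        if ok:
            self.last_ok = now
            if self.healthy_since is None:
                self.healthy_since = now
        else:
            self.healthy_since = None  # any failed ping resets the recovery streak

        silent_for = now - self.last_ok
        if self.role is Role.STANDBY and silent_for >= self.settings.failure_timeout_s:
            self.role = Role.ACTIVE    # main unresponsive for the configured period
        elif (self.role is Role.ACTIVE
              and self.healthy_since is not None
              and now - self.healthy_since >= self.settings.recovery_window_s):
            self.role = Role.STANDBY   # main has been stable long enough to revert
        return self.role
```

A caller would invoke `record_ping` once every `ping_interval_s` seconds with the outcome of the ping. Requiring an unbroken healthy window before reverting keeps a flapping main node from causing repeated role switches. A role change on its own is not enough to prevent double-signing, so any real switch would also need to be gated by a signing guard shared between the two nodes.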

References:

  • Review existing failover mechanisms in blockchain systems, such as the custom Tendermint fault-tolerance applications by Farbole, Figment, and CertusOne.
  • Analyze the Horcrux threshold Tendermint signer as a model for the Casper node failover system.
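
One idea worth carrying over from the Tendermint ecosystem is a persisted "last signed" record that the signer checks before every signature. The sketch below shows only that high-water-mark check for a single process. It is not Horcrux's threshold-signing scheme, and identifying a signing slot by `(era, height)` is an assumption about how this would map onto Casper. `SignGuard`, `try_claim`, and the state file layout are hypothetical names.

```python
import json
import os
import tempfile


class SignGuard:
    """Refuses to sign any slot at or below the last slot recorded on disk."""

    def __init__(self, state_path: str):
        self.state_path = state_path

    def _load(self) -> tuple:
        # No state file yet means nothing has been signed.
        if not os.path.exists(self.state_path):
            return (-1, -1)
        with open(self.state_path) as f:
            data = json.load(f)
        return (data["era"], data["height"])

    def _store(self, slot: tuple) -> None:
        # Write to a temp file, fsync, then rename, so a crash never leaves
        # a partially written state file behind.
        directory = os.path.dirname(os.path.abspath(self.state_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"era": slot[0], "height": slot[1]}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)

    def try_claim(self, era: int, height: int) -> bool:
        """Record the slot and return True only if it is strictly newer than the last one."""
        slot = (era, height)
        if slot <= self._load():
            return False  # already signed this slot or a later one: do not sign
        self._store(slot)  # persist BEFORE signing, so a restart cannot re-sign
        return True
```

The signer would call `try_claim` and produce a signature only when it returns `True`. In a main/slave setup the hard part is that both nodes must agree on this record, which a local file does not provide; that shared-state problem is exactly what the referenced solutions should be reviewed for.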
