Contributed by Bobby Bissett
EDB Postgres Failover Manager (EFM) supports multiple ways to connect applications to the new master database after a failover or promotion. In many cases, a virtual IP address (VIP) can be used, and Failover Manager will move the VIP as needed. On cloud providers where a VIP is not an option, script hooks can be used to move a resource (e.g. an Elastic IP Address on AWS) to the new master node. With pgPool and other load balancers, additional script hooks can be used to update the balancer configuration. However, the existing options could not handle every failure case.
In EFM v3.2, we have added two new hooks that allow more general load balancer operations. The two properties use the pgPool terms “attach” and “detach” for adding and removing a node from the load balancer configuration:
# Absolute path to load balancer scripts
#
# The attach script is called when a node should be attached to the load
# balancer, for example after a promotion. The detach script is called
# when a node should be removed, for example when a database has failed
# or is about to be stopped. Use %h to represent the IP/hostname
# of the node that is being attached/detached.
#
# Example:
# script.load.balancer.attach=/somepath/attachscript %h
script.load.balancer.attach=
script.load.balancer.detach=
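The substitution described in the comments can be pictured with a small shell sketch: given a property value and a node address, it prints the command line that results from replacing %h. This only illustrates the documented behavior and is not how Failover Manager implements it; the function name expand_hook is hypothetical.

```shell
#!/bin/sh
# Illustration only: mimics the documented %h substitution used by
# script.load.balancer.attach/detach. This is not EFM's own code, and
# expand_hook is a hypothetical helper name.

# expand_hook TEMPLATE HOST
# Prints TEMPLATE with every occurrence of %h replaced by HOST.
# Assumes HOST (an IP address or hostname) contains no '|', '&' or '\',
# which sed would otherwise treat specially.
expand_hook() {
    template=$1
    host=$2
    printf '%s\n' "$template" | sed "s|%h|$host|g"
}
```

For example, expand_hook '/somepath/attachscript %h' 1.2.3.4 prints /somepath/attachscript 1.2.3.4.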
With these two hooks, it is easier to support cases such as switchover, standby failure, and a master failure that does not end in a promotion (e.g. when promoting a standby would leave the cluster below its configured minimum number of standbys).
The “detach” hook will be called in the following cases:
- After a database failure. In this case the script will be called by the agent local to the database that has failed.
- After a database node failure. In this case another agent calls its own script, passing in the IP address or hostname of the failed node. Since another agent is calling the script, it is important to specify scripts on every node, including witness nodes, if using this feature.
- When a master agent detects that it has been isolated from the majority of the cluster and is fencing itself off. In this case, the agent may not be able to reach the load balancer, but it makes the attempt anyway. To the rest of the cluster, this is the same as if the node failed (see entry above).
- When a master agent stops monitoring the database and shuts it down before a switchover.
In summary, the ‘detach’ script hook is called whenever a master or standby agent enters ‘IDLE’ mode, or when the rest of the cluster sees a node disappear.
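Since the detach hook can be run by an agent other than the one that owns the failed database, a detach script that acts only on its argument, and succeeds even when the address is not currently listed, is a defensive choice. The sketch below assumes, purely for illustration, that the balancer reads its backends from a text file with one address per line; LB_BACKENDS_FILE and lb_detach are hypothetical names, and a real script would call your balancer's own administration interface instead.

```shell
#!/bin/sh
# Sketch of an idempotent detach. Assumption (hypothetical): the balancer
# reads its backend list from a text file, one host per line.
LB_BACKENDS_FILE=${LB_BACKENDS_FILE:-/tmp/lb_backends.txt}

# lb_detach HOST: remove HOST from the backend list.
# Returns 0 even if HOST was never listed, so repeated calls for the
# same failed node (possibly from different agents) are harmless.
lb_detach() {
    host=$1
    [ -n "$host" ] || { echo "lb_detach: missing host" >&2; return 2; }
    [ -f "$LB_BACKENDS_FILE" ] || return 0   # no list, nothing to remove
    tmp="$LB_BACKENDS_FILE.tmp.$$"
    # -x -F: match the whole line as a fixed string, so 1.2.3.4 does not
    # also remove 1.2.3.40. grep exits 1 when no lines remain (not an
    # error here) and 2 on a real failure.
    grep -v -x -F -e "$host" "$LB_BACKENDS_FILE" > "$tmp"
    rc=$?
    if [ "$rc" -gt 1 ]; then
        rm -f "$tmp"
        return 1
    fi
    mv "$tmp" "$LB_BACKENDS_FILE"
}
```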
The “attach” hook will be called in these cases:
- A ‘promoting’ node becomes a master. This is the case for any promotion where a standby becomes a master database: failover, manual promotion, or switchover.
- An ‘IDLE’ agent resumes monitoring a master or standby database after a failure. This is when a failed database has been fixed/restarted, and the agent resumes monitoring through the efm resume command or the auto.resume.period property.
- An ‘IDLE’ agent resumes monitoring after startup. For example, a node boots, EFM starts, the database starts, and the EFM agent is instructed to resume monitoring. The database should not already be in the load balancer configuration because it was not previously running.
- An ‘IDLE’ agent resumes monitoring the original master database that has been restarted as a standby after a switchover.
In summary, the ‘attach’ script hook is called whenever an ‘IDLE’ or promoting agent becomes a master or standby agent.
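The attach hook can run after a promotion, after a resume following a failure, or after a resume at startup, so a script that tolerates being asked to add a node it already lists is easier to reason about. The sketch below assumes, purely for illustration, that the balancer reads its backends from a text file with one address per line; LB_BACKENDS_FILE and lb_attach are hypothetical names, and a real script would use your balancer's own administration interface.

```shell
#!/bin/sh
# Sketch of an idempotent attach. Assumption (hypothetical): the balancer
# reads its backend list from a text file, one host per line.
LB_BACKENDS_FILE=${LB_BACKENDS_FILE:-/tmp/lb_backends.txt}

# lb_attach HOST: add HOST to the backend list unless it is already there,
# so a repeated attach never creates a duplicate entry.
lb_attach() {
    host=$1
    [ -n "$host" ] || { echo "lb_attach: missing host" >&2; return 2; }
    touch "$LB_BACKENDS_FILE" || return 1
    # Whole-line, fixed-string match: 1.2.3.4 does not match 1.2.3.40.
    if grep -q -x -F -e "$host" "$LB_BACKENDS_FILE"; then
        return 0    # already attached
    fi
    printf '%s\n' "$host" >> "$LB_BACKENDS_FILE"
}
```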
For example, consider a node running on address 1.2.3.4 with the following set:
script.load.balancer.attach=/somepath/attach_script.sh %h
script.load.balancer.detach=/somepath/detach_script.sh %h
If the database on that node dies, the following would be run as the user running Failover Manager (typically the ‘efm’ user), passing in the IP address of the database node that should be removed from the load balancer configuration:
/somepath/detach_script.sh 1.2.3.4
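A detach script configured this way receives a single argument. The skeleton below is a hypothetical starting point for /somepath/detach_script.sh: it checks that exactly one non-empty address was passed, logs the request to standard error, and leaves the actual balancer command as a placeholder. The function name detach_main is an assumption, and nothing in it is an EFM or pgPool API.

```shell
#!/bin/sh
# Hypothetical skeleton for /somepath/detach_script.sh, configured as
#   script.load.balancer.detach=/somepath/detach_script.sh %h
# The address of the node to remove arrives as the only argument.

detach_main() {
    if [ "$#" -ne 1 ] || [ -z "$1" ]; then
        echo "usage: detach_script.sh HOST" >&2
        return 2
    fi
    host=$1
    echo "$(date '+%Y-%m-%d %H:%M:%S') detaching $host" >&2
    # Placeholder: run the command that removes $host from your
    # load balancer here.
    return 0
}

# Run only when executed under this script's name, so the file can also
# be sourced (e.g. for testing) without side effects.
case "$0" in
    *detach_script.sh) detach_main "$@"; exit $? ;;
esac
```

Returning a nonzero status on bad input is a common shell convention; the skeleton does not assume anything about how Failover Manager reacts to it.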
A single script could also be used:
script.load.balancer.attach=/somepath/lb_script.sh attach %h
script.load.balancer.detach=/somepath/lb_script.sh detach %h
In the same case as above, the following would be run as the Failover Manager user, where ‘detach’ and ‘1.2.3.4’ are the two parameters passed to the script:
/somepath/lb_script.sh detach 1.2.3.4
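When one script handles both hooks, its first argument selects the operation. A minimal dispatch skeleton for the hypothetical /somepath/lb_script.sh might look like this; the echo lines stand in for whatever commands your load balancer needs, and lb_script_main is an illustrative name.

```shell
#!/bin/sh
# Hypothetical skeleton for a single /somepath/lb_script.sh, configured as
#   script.load.balancer.attach=/somepath/lb_script.sh attach %h
#   script.load.balancer.detach=/somepath/lb_script.sh detach %h
# so it is invoked as e.g. "lb_script.sh detach 1.2.3.4".

lb_script_main() {
    if [ "$#" -ne 2 ]; then
        echo "usage: lb_script.sh attach|detach HOST" >&2
        return 2
    fi
    action=$1
    host=$2
    case "$action" in
        attach)
            # Placeholder: add $host to the load balancer here.
            echo "attach $host"
            ;;
        detach)
            # Placeholder: remove $host from the load balancer here.
            echo "detach $host"
            ;;
        *)
            echo "lb_script.sh: unknown action '$action'" >&2
            return 2
            ;;
    esac
}

# Dispatch only when executed under this script's name.
case "$0" in
    *lb_script.sh) lb_script_main "$@"; exit $? ;;
esac
```

Rejecting unknown actions keeps a typo in either property from silently doing nothing.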
If a node fails, then another node would call its local detach script as above, passing in the address of the failed node. This is why it is important to specify scripts on every node, including witness nodes, if using the new load balancer hooks.
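Because the detach script may run on any node, including a witness or a master that has just fenced itself off, the balancer may be unreachable from where it runs. One way to ride out a transient error without looping forever is a bounded retry around the balancer command. This is a sketch under stated assumptions: with_retries is a hypothetical helper, and the attempt count and delay are arbitrary choices, not Failover Manager settings.

```shell
#!/bin/sh
# Sketch: run a command up to ATTEMPTS times, sleeping DELAY seconds
# between tries, then give up. with_retries is a hypothetical helper;
# variables are prefixed to avoid clashing with the wrapped command.

# with_retries ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
with_retries() {
    _wr_attempts=$1
    _wr_delay=$2
    shift 2
    _wr_n=1
    while :; do
        "$@" && return 0
        if [ "$_wr_n" -ge "$_wr_attempts" ]; then
            echo "giving up after $_wr_n attempt(s): $*" >&2
            return 1
        fi
        _wr_n=$((_wr_n + 1))
        sleep "$_wr_delay"
    done
}
```

Inside a hook script this could wrap the real balancer call, e.g. with_retries 3 5 remove_backend "$host", where remove_backend is a placeholder for your balancer's command. Keeping the total delay short is probably wise, since these hooks run while the cluster is handling a failure.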