DHCP failover protocol for load balancing during PXE boot

DHCP can act as the boot protocol (BOOTP) for PXE clients: it assigns each client an IP address, after which the client downloads an OS image from the DHCP server and boots from that image.

The following initial message exchange between the DHCP client and the DHCP server assigns an IP address from the address pool.

DHCP Client                              DHCP Server

     ---------- DHCP Discover ---------->

     <--------- DHCP Offer --------------

     ---------- DHCP Request ----------->

     <--------- DHCP Ack ----------------

The DHCP message types are:

DHCPDISCOVER - Client broadcast to locate available servers
DHCPOFFER - Server to client response offering configuration parameters
DHCPREQUEST - Client broadcast requesting offered parameters
DHCPDECLINE - Client to server notification that IP address is in use
DHCPACK - Server to client response confirming a request
DHCPNAK - Server to client response denying a request
DHCPRELEASE - Client to server request to relinquish IP address
DHCPINFORM - Client to server request for configuration parameters
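
As a small illustration of the list and the exchange diagram above, the sketch below models the message types as an enumeration and records which side sends each one. The numeric values are the standard DHCP message type (option 53) codes; the names DhcpMessageType, SENT_BY_CLIENT, INITIAL_EXCHANGE, sender and is_initial_exchange are made up for this example and do not come from any DHCP library.

```python
from enum import IntEnum


class DhcpMessageType(IntEnum):
    """DHCP message types; the values are the option 53 codes."""
    DISCOVER = 1
    OFFER = 2
    REQUEST = 3
    DECLINE = 4
    ACK = 5
    NAK = 6
    RELEASE = 7
    INFORM = 8


# Message types that travel from client to server; all others are server replies.
SENT_BY_CLIENT = {
    DhcpMessageType.DISCOVER,
    DhcpMessageType.REQUEST,
    DhcpMessageType.DECLINE,
    DhcpMessageType.RELEASE,
    DhcpMessageType.INFORM,
}

# The successful four-message exchange from the diagram.
INITIAL_EXCHANGE = [
    DhcpMessageType.DISCOVER,
    DhcpMessageType.OFFER,
    DhcpMessageType.REQUEST,
    DhcpMessageType.ACK,
]


def sender(message_type: DhcpMessageType) -> str:
    """Return "client" or "server" for the given message type."""
    return "client" if message_type in SENT_BY_CLIENT else "server"


def is_initial_exchange(messages) -> bool:
    """True if the messages are exactly Discover, Offer, Request, Ack."""
    return list(messages) == INITIAL_EXCHANGE
```

Calling sender on each entry of INITIAL_EXCHANGE alternates between client and server, which matches the arrow directions in the diagram.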

Need for the DHCP failover protocol:

When a large number of DHCP clients obtain their IP addresses from one DHCP server and then download their OS images from that same server, the single server becomes overloaded and the total time needed to assign addresses to all clients grows. The DHCP failover protocol addresses this by balancing the load across two DHCP servers, one primary and one secondary. Depending on the DHCP configuration, some clients receive their addresses from the primary server and the rest from the secondary server. If either server goes down, the server that is still active assigns addresses to all clients.

Prerequisites for DHCP failover to function properly as a boot protocol

Both DHCP server machines should synchronize against the same NTP servers, i.e. time synchronization is required for DHCP failover to work properly.
The ISC DHCP version on the primary and the secondary DHCP server should be the same.

Configuration for enabling the DHCP failover protocol:

The key configuration parameters are described below.

 failover peer "name" { }
Here name is the name you define for the pairing (include the quotes). The curly brackets enclose the statements described below, which together give the full failover configuration for the server.

 mclt seconds: The maximum client lead time (mclt) declaration gives the number of seconds a recovering primary must wait, after it has received its peer's lease database, before it can resume the primary role and begin processing DHCP packets. The reason is that mclt is also the maximum time by which a server may extend a lease while its partner is down. By waiting mclt seconds, the recovering server ensures that the leases it granted before the failure, and that its peer extended in the meantime, have expired.
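
The waiting rule can be written as simple date arithmetic. This is only a sketch of the reasoning in the paragraph above, not code taken from dhcpd; the function names earliest_resume_time and may_process_packets are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_resume_time(peer_db_received_at: datetime, mclt_seconds: int) -> datetime:
    """Earliest moment a recovering server may start answering clients again.

    Any lease the peer extended while this server was down runs out at most
    mclt seconds after the peer's lease database was received.
    """
    return peer_db_received_at + timedelta(seconds=mclt_seconds)


def may_process_packets(now: datetime, peer_db_received_at: datetime, mclt_seconds: int) -> bool:
    """True once the mclt waiting period has fully elapsed."""
    return now >= earliest_resume_time(peer_db_received_at, mclt_seconds)
```

With mclt set to 1800, a server that received its peer's lease database at 12:00:00 would stay silent until 12:30:00.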

 split index: This declaration defines the load-balancing split between the two peers. If a hash of the client's MAC address in a DHCP packet is less than index, this server processes the packet; otherwise it drops the packet on the assumption that its partner will handle it. The hash value lies between 0 and 255 and index may range from 0 to 256, so a value of 256 means no load balancing (this server handles every client), while 128 gives a 50-50 split.
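
The decision described above can be sketched in a few lines. One loud assumption: stand_in_hash below is NOT the hash function that DHCP servers use for load balancing; it simply XORs the MAC octets so that the example yields a value from 0 to 255. Both function names are hypothetical.

```python
def stand_in_hash(mac: str) -> int:
    """Map a MAC address such as "00:11:22:33:44:55" to a value in 0..255.

    Stand-in only: a plain XOR of the six octets, not the real server hash.
    """
    value = 0
    for part in mac.split(":"):
        value ^= int(part, 16)
    return value


def this_server_handles(mac: str, split: int) -> bool:
    """Apply the split rule: handle the client if its hash is below split."""
    if not 0 <= split <= 256:
        raise ValueError("split must be between 0 and 256")
    return stand_in_hash(mac) < split
```

With split 128 roughly half of all hash values fall below the threshold, with split 256 every value does, and with split 0 none does.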

 hba bitmap: This declaration can be used in place of the split statement to give finer control over the split, moving from a simple proportion such as 50/50 or 100/0 to a bit-by-bit assignment. The bitmap is written as a 32-byte hexadecimal string with the bytes separated by colons. For example, the following statement is equivalent to split 128:

hba ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
    00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00;
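
To make the equivalence with split concrete, the sketch below builds the 256-bit bitmap that corresponds to a given split value by setting the first split bits. The helper names hba_from_split and bit_is_set are hypothetical. The order of bits inside each byte (lowest bit first) is an assumption of this example; it makes no difference for split values that are multiples of 8, such as 128.

```python
def hba_from_split(split: int) -> str:
    """Return the colon-separated 32-byte bitmap with the first `split` bits set."""
    if not 0 <= split <= 256:
        raise ValueError("split must be between 0 and 256")
    bitmap = bytearray(32)  # 32 bytes = 256 bits, one bit per hash value
    for index in range(split):
        # Assumption: bit 0 of a byte is the first hash value in that byte.
        bitmap[index // 8] |= 1 << (index % 8)
    return ":".join(f"{byte:02x}" for byte in bitmap)


def bit_is_set(hba: str, hash_value: int) -> bool:
    """True if the bitmap assigns the given hash value (0..255) to this server."""
    octets = bytes.fromhex(hba.replace(":", ""))
    return bool(octets[hash_value // 8] & (1 << (hash_value % 8)))
```

hba_from_split(128) produces sixteen ff bytes followed by sixteen 00 bytes, the same bitmap as in the example statement above.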

Configuration at primary DHCP server:

#
# /etc/dhcpd.conf for primary DHCP server
#

authoritative;
ddns-update-style none;

failover peer "dhcp-failover" {
    primary; # declare this to be the primary server
    address 192.168.200.2;
    port 647;
    peer address 192.168.200.3;
    peer port 647;
    max-response-delay 30;
    max-unacked-updates 10;
    load balance max seconds 3;
    mclt 1800;
    split 128;
}

subnet 192.168.200.0 netmask 255.255.255.0 {
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.200.255;
    option routers 192.168.200.1;
    option domain-name-servers 192.168.200.1;
    pool {
        failover peer "dhcp-failover";
        max-lease-time 1800; # 30 minutes
        range 192.168.200.100 192.168.200.254;
    }
}

Configuration at secondary DHCP server:
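
The secondary server's file is not shown here. As a hedged sketch, and assuming the usual ISC dhcpd convention that the secondary mirrors the primary with the two addresses swapped, secondary; in place of primary;, and no mclt or split statements (those are given on the primary), the failover block for either role could be generated as follows. The helper render_failover_peer is hypothetical, the timing values are copied from the primary configuration above, and the subnet and pool declarations are assumed to be identical on both servers.

```python
def render_failover_peer(name, role, address, peer_address, port=647, mclt=None, split=None):
    """Render a dhcpd.conf failover peer block for the primary or the secondary."""
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    lines = [
        f'failover peer "{name}" {{',
        f"    {role};",
        f"    address {address};",
        f"    port {port};",
        f"    peer address {peer_address};",
        f"    peer port {port};",
        "    max-response-delay 30;",
        "    max-unacked-updates 10;",
        "    load balance max seconds 3;",
    ]
    if role == "primary":
        # Assumption: mclt and split are configured on the primary only.
        if mclt is None or split is None:
            raise ValueError("the primary needs both mclt and split")
        lines += [f"    mclt {mclt};", f"    split {split};"]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The secondary uses its own address first and the primary's as the peer.
    print(render_failover_peer("dhcp-failover", "secondary",
                               "192.168.200.3", "192.168.200.2"))
```

Check the generated block against the dhcpd.conf man page of the installed ISC DHCP version before using it.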

DHCP FAILOVER CONFIGURED IN PARTNER-DOWN MODE (MAINTENANCE):

A server can be placed manually in the partner-down state in order to perform maintenance on its partner. This state suspends lease binding updates and heartbeats. A server can be put in the partner-down state either through the DHCP API or by adding the failover peer "name" state { } expression to the DHCP configuration file and then restarting the server. The syntax of this statement is as follows:

failover peer "name" state {
    my state partner-down;
    peer state state at date;
}
OR
#!/bin/sh
omshell << EOF
connect
new failover-state
set name = "foo"
open
set local-state = 1
update
EOF

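Here this server (my state) is placed in partner-down with the date omitted, and the peer is placed in another state, which must be one of the valid states listed below. In the omshell variant, "foo" stands for the failover peer name. The numeric local-state value should be checked against the dhcpd man page of the installed version: in the failover.h enumeration below, 1 is startup and partner_down is 4.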

failover.h:

enum failover_state {
    unknown_state = 0, /* XXX: Not a standard state. */
    startup = 1,
    normal = 2,
    communications_interrupted = 3,
    partner_down = 4,
    potential_conflict = 5,
    recover = 6,
    paused = 7,
    shut_down = 8,
    recover_done = 9,
    resolution_interrupted = 10,
    conflict_done = 11,
};

CHALLENGES WITH THE PARTNER-DOWN CONFIGURATION: Both servers initially enter the partner-down state and then move to the potential-conflict state, so unusual behavior is possible in the potential-conflict state, as described below.

POTENTIAL-CONFLICT state: This state indicates that the two servers are attempting to re-integrate with each other, but at least one of them was running in a state that did not guarantee that automatic reintegration would be possible. In POTENTIAL-CONFLICT state the servers may determine that the same IP address has been offered to and accepted by two different DHCP clients. It is a goal of this protocol to minimize the possibility that POTENTIAL-CONFLICT state is ever entered.

Upon entry to POTENTIAL-CONFLICT state: When a primary server enters POTENTIAL-CONFLICT state it should request that the secondary send it all updates of which it is currently unaware, by sending an UPDREQ message to the secondary server. A secondary server entering POTENTIAL-CONFLICT state waits for the primary to send it an UPDREQ message.

Operation in POTENTIAL-CONFLICT state: Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming DHCP requests.

Transitions out of POTENTIAL-CONFLICT state: If communications with the partner fail while a server is in POTENTIAL-CONFLICT state, the server transitions to RESOLUTION-INTERRUPTED state. Whenever either server receives an UPDDONE message from its partner while in POTENTIAL-CONFLICT state, it MUST transition to a new state: the primary MUST transition to CONFLICT-DONE state, and the secondary must transition to NORMAL state. This causes the primary server to leave POTENTIAL-CONFLICT state before the secondary, since the primary sends an UPDREQ message and receives an UPDDONE before the secondary sends an UPDREQ message and receives its own UPDDONE message. When a secondary server receives an indication that the primary server has made the transition from POTENTIAL-CONFLICT to CONFLICT-DONE state, it SHOULD send an UPDREQ message to the primary server.

[Figure: Transition out of POTENTIAL-CONFLICT, showing the exchange between the primary server and the secondary server]

RESOLUTION-INTERRUPTED state:

This state indicates that the two servers were attempting to re-integrate with each other in POTENTIAL-CONFLICT state, but communications failed before re-integration completed. If the servers remained in POTENTIAL-CONFLICT while communications were interrupted, neither server would respond to DHCP client requests, and if one server had crashed there might be no server able to process DHCP requests.

Upon entry to RESOLUTION-INTERRUPTED state:

When a server enters RESOLUTION-INTERRUPTED state it should raise an alarm condition to alert administrative staff of a problem in the DHCP subsystem.

Transitions out of RESOLUTION-INTERRUPTED state:

If a server in RESOLUTION-INTERRUPTED state receives an external command informing it that its partner is down, it transitions immediately into PARTNER-DOWN state. If communications are restored with the other server, the server in RESOLUTION-INTERRUPTED state transitions into POTENTIAL-CONFLICT state.
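
The transitions described for POTENTIAL-CONFLICT and RESOLUTION-INTERRUPTED can be collected into one small transition function. This is a sketch of the rules as stated in the text, not the dhcpd implementation: the state numbers are copied from the failover.h listing above, while the event names and the function next_state are hypothetical labels chosen for the example.

```python
from enum import IntEnum


class FailoverState(IntEnum):
    """Subset of the failover states, numbered as in failover.h."""
    NORMAL = 2
    PARTNER_DOWN = 4
    POTENTIAL_CONFLICT = 5
    RESOLUTION_INTERRUPTED = 10
    CONFLICT_DONE = 11


# Hypothetical event labels for the triggers named in the text.
COMMS_FAILED = "comms-failed"
COMMS_RESTORED = "comms-restored"
UPDDONE_RECEIVED = "upddone-received"
PARTNER_DOWN_COMMAND = "partner-down-command"


def next_state(state: FailoverState, event: str, is_primary: bool) -> FailoverState:
    """Return the state that follows `event`; unknown combinations keep the state."""
    if state == FailoverState.POTENTIAL_CONFLICT:
        if event == COMMS_FAILED:
            return FailoverState.RESOLUTION_INTERRUPTED
        if event == UPDDONE_RECEIVED:
            # The primary moves to CONFLICT-DONE, the secondary to NORMAL.
            return FailoverState.CONFLICT_DONE if is_primary else FailoverState.NORMAL
    elif state == FailoverState.RESOLUTION_INTERRUPTED:
        if event == PARTNER_DOWN_COMMAND:
            return FailoverState.PARTNER_DOWN
        if event == COMMS_RESTORED:
            return FailoverState.POTENTIAL_CONFLICT
    return state
```

For example, a primary in POTENTIAL_CONFLICT that loses contact with its partner moves to RESOLUTION_INTERRUPTED, and returns to POTENTIAL_CONFLICT once communications are restored.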

SOLUTION: Initially both DHCP servers are intentionally put into the partner-down state. This is required to handle scenarios where only a single DHCP server is present or healthy at initialization, including the case where the other DHCP server is under maintenance at that time. A single DHCP server can then remain in partner-down and serve all DHCP clients.

Even if both DHCP servers are healthy and running, both start in partner-down and, because both are in partner-down, then move to the potential-conflict state. After communicating with each other they quickly transition into the normal state once the conflicts are resolved. As soon as they reach the normal state, they start serving DHCP clients.

If one DHCP server goes down while the other is in the potential-conflict state, the servers cannot communicate to resolve the conflict, and the surviving server transitions into the resolution-interrupted state. This scenario is rare, but once a server enters resolution-interrupted it does not recover on its own and does not serve any DHCP clients. If the other server has crashed (is down permanently), the administrator should be made aware of it and should intentionally move the DHCP server that is in resolution-interrupted into the partner-down state.

Note: If the other DHCP server comes back up while this server is in the resolution-interrupted state, both transition into the potential-conflict state and then recover from the conflict.

Possible problems with the current DHCP failover configuration: There may be issues with dynamic BOOTP leases; this needs investigation. It is also undesirable to push the servers into the potential-conflict state, which we force indirectly by putting both servers in partner-down initially.

If negative pool request messages are seen on the console, the way to resolve them is to intentionally move the server into partner-down whenever it transitions into the communications-interrupted state.

Happy Reading!
