
DHCP Failover Protocol for Load Balancing During PXE Boot


DHCP can serve as the BOOTP mechanism for PXE clients: it assigns each client an IP address, after which the client downloads an OS image from the DHCP server and boots from the downloaded image.

The initial message exchange between a DHCP client and a DHCP server, which assigns an IP address from the address pool, is as follows.

DHCP Client          DHCP Server 

---------------------> DHCP Discover

<--------------------- DHCP Offer

---------------------> DHCP Request

<--------------------- DHCP Ack

The DHCP message types are:

  • DHCPDISCOVER - Client broadcast to locate available servers
  • DHCPOFFER - Server to client response offering configuration parameters
  • DHCPREQUEST - Client broadcast requesting offered parameters
  • DHCPDECLINE - Client to server notification that IP address is in use
  • DHCPACK - Server to client response confirming a request
  • DHCPNAK - Server to client response denying a request
  • DHCPRELEASE - Client to server request to relinquish IP address
  • DHCPINFORM - Client to server request for configuration parameters
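
The message types above travel in DHCP option 53 as small integers. A minimal sketch of that mapping, using the numbering from RFC 2132; the names DhcpMessageType, INITIAL_EXCHANGE and sent_by_client are illustrative and not taken from any DHCP library:

```python
from enum import IntEnum


class DhcpMessageType(IntEnum):
    """Values carried in DHCP option 53 (DHCP Message Type), per RFC 2132."""
    DHCPDISCOVER = 1
    DHCPOFFER = 2
    DHCPREQUEST = 3
    DHCPDECLINE = 4
    DHCPACK = 5
    DHCPNAK = 6
    DHCPRELEASE = 7
    DHCPINFORM = 8


# The four-message exchange that assigns an address from the pool.
INITIAL_EXCHANGE = [
    DhcpMessageType.DHCPDISCOVER,
    DhcpMessageType.DHCPOFFER,
    DhcpMessageType.DHCPREQUEST,
    DhcpMessageType.DHCPACK,
]

# Message types that originate at the client, following the list above.
CLIENT_MESSAGES = {
    DhcpMessageType.DHCPDISCOVER,
    DhcpMessageType.DHCPREQUEST,
    DhcpMessageType.DHCPDECLINE,
    DhcpMessageType.DHCPRELEASE,
    DhcpMessageType.DHCPINFORM,
}


def sent_by_client(msg: DhcpMessageType) -> bool:
    """Return True if the message type is sent by the client."""
    return msg in CLIENT_MESSAGES


if __name__ == "__main__":
    for msg in INITIAL_EXCHANGE:
        direction = "client -> server" if sent_by_client(msg) else "server -> client"
        print(f"{msg.name:<13} ({int(msg)})  {direction}")
```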

Need for the DHCP failover protocol:

When a large number of DHCP clients obtain their IP addresses from a single DHCP server and then download their OS image from that same server, the server becomes overloaded and the total time needed to assign addresses to all clients grows. The DHCP failover protocol balances this load across two DHCP servers, a primary and a secondary. Depending on the configuration, some clients are assigned addresses by the primary DHCP server and the rest by the secondary. If either server goes down, the remaining active server assigns addresses to all DHCP clients.

Prerequisites for DHCP failover to function properly for BOOTP

  • Both DHCP server machines should synchronize to the same NTP server, i.e. time sync is required for DHCP failover to work properly.
  • The ISC DHCP version should be the same on the primary and secondary DHCP servers.

Configuration for enabling the DHCP failover protocol:

The key configuration parameters are described below.

 failover peer "name" { }
Where name is the name you define for the pairing (include the quotes). The curly brackets enclose the statements below, which together provide the full definition of the failover configuration for the server.

 mclt seconds: The maximum client lead time (mclt) declaration indicates the number of seconds a recovering primary must wait after it has received its peer’s lease database before it can assume the primary role and begin processing DHCP packets. This is because the mclt value is the maximum time a lease may be extended by a server when its partner is down. Thus by waiting mclt seconds, the leases provided by this server prior to failure and extended by its peer should be expired.

 split index: This declaration defines a load balancing split between the two peers. If a hash of the client's MAC address within a DHCP packet is less than index, this server processes the DHCP packet; otherwise, it drops the packet on the assumption that its partner will handle it. The hash value is between 0 and 255 and index may be set from 0 to 256, so a value of 256 means no load balancing (this server handles every client), while 128 means a 50-50 load balance split.
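
The split rule can be sketched in a few lines of Python. This is only an illustration of the comparison described above: stand_in_hash is a hypothetical placeholder and is not the hash function ISC dhcpd uses, and handled_by_this_server is an illustrative name.

```python
def stand_in_hash(mac: str) -> int:
    """Fold a MAC address into 0..255.

    Hypothetical stand-in for illustration only; ISC dhcpd uses its own
    hash function, which is not reproduced here.
    """
    value = 0
    for octet in bytes.fromhex(mac.replace(":", "")):
        value = (value * 31 + octet) % 256
    return value


def handled_by_this_server(hash_value: int, split: int) -> bool:
    """Apply the split rule: serve the client if its hash is below split."""
    if not 0 <= hash_value <= 255:
        raise ValueError("hash value must be in 0..255")
    if not 0 <= split <= 256:
        raise ValueError("split must be in 0..256")
    return hash_value < split


if __name__ == "__main__":
    mac = "00:11:22:33:44:55"
    h = stand_in_hash(mac)
    print(f"{mac} hashes to {h}; served locally with split 128: "
          f"{handled_by_this_server(h, 128)}")
```

With split 128 exactly half of the 256 possible hash values fall below the threshold, and with split 256 all of them do, which is why 256 disables load balancing.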

 hba bitmap: This declaration can be used in lieu of the split statement to provide a more granular delineation of the split, moving from a single threshold (such as 50/50 or 100/0) to a bit-by-bit assignment of hash values. The bitmap is formatted as a 32-byte hexadecimal string, with each byte separated by a colon. For example, the following statement is equivalent to split 128:

hba ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:ff:
00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00;
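
To see how a split value corresponds to an hba bitmap, the following sketch builds the 32-byte string by setting the first split bits out of 256. The function name split_to_hba is illustrative. The bit order inside each byte (lowest bit first) is an assumption and only matters when split is not a multiple of 8.

```python
def split_to_hba(split: int) -> str:
    """Build a 32-byte, colon-separated hba string with the first `split` bits set."""
    if not 0 <= split <= 256:
        raise ValueError("split must be in 0..256")
    bitmap = bytearray(32)  # 32 bytes = 256 bits, one bit per hash value
    for index in range(split):
        # Assumption: bit 0 of each byte corresponds to the lowest hash value.
        bitmap[index // 8] |= 1 << (index % 8)
    return ":".join(f"{byte:02x}" for byte in bitmap)


if __name__ == "__main__":
    print("hba " + split_to_hba(128) + ";")
```

For split 128 this yields sixteen ff bytes followed by sixteen 00 bytes, matching the statement above.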

  • Configuration at primary DHCP server:

#
# /etc/dhcpd.conf for primary DHCP server
#

authoritative;
ddns-update-style none;

failover peer "dhcp-failover" {
  primary; # declare this to be the primary server
  address 192.168.200.2;
  port 647;
  peer address 192.168.200.3;
  peer port 647;
  max-response-delay 30;
  max-unacked-updates 10;
  load balance max seconds 3;
  mclt 1800;
  split 128;
}

subnet 192.168.200.0 netmask 255.255.255.0 {
  option subnet-mask 255.255.255.0;
  option broadcast-address 192.168.200.255;
  option routers 192.168.200.1;
  option domain-name-servers 192.168.200.1;
  pool {
    failover peer "dhcp-failover";
    max-lease-time 1800; # 30 minutes
    range 192.168.200.100 192.168.200.254;
  }

}

  • Configuration at secondary DHCP server:
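
A minimal sketch of what the secondary's file could look like, assuming it mirrors the primary configuration with the two addresses swapped. In ISC DHCP the mclt and split statements belong to the primary only, so they are left out here. The file path and all values are carried over from the primary example as assumptions, not taken from a tested setup.

```
#
# /etc/dhcpd.conf for secondary DHCP server (sketch, mirrors the primary)
#

authoritative;
ddns-update-style none;

failover peer "dhcp-failover" {
  secondary; # declare this to be the secondary server
  address 192.168.200.3;
  port 647;
  peer address 192.168.200.2;
  peer port 647;
  max-response-delay 30;
  max-unacked-updates 10;
  load balance max seconds 3;
  # mclt and split are configured on the primary only
}

subnet 192.168.200.0 netmask 255.255.255.0 {
  option subnet-mask 255.255.255.0;
  option broadcast-address 192.168.200.255;
  option routers 192.168.200.1;
  option domain-name-servers 192.168.200.1;
  pool {
    failover peer "dhcp-failover";
    max-lease-time 1800; # 30 minutes
    range 192.168.200.100 192.168.200.254;
  }
}
```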

DHCP FAILOVER CONFIGURED IN PARTNER-DOWN (MAINTENANCE) MODE:

Manually place a server in the partner-down state in order to perform maintenance on its partner. This state suspends lease binding updates and heartbeats. A server can be set to the partner-down state either through the DHCP API or by updating the DHCP configuration file with the failover peer "name" state { } expression and then restarting the server. The syntax of this statement is as follows:

failover peer "name" state {
  my state partner-down;
  peer state state at date;
}
OR
#!/bin/sh
omshell << EOF
connect
new failover-state
set name = "foo"
open
set local-state = 4
update
EOF

Here this server (my state) is placed into partner-down, with the date omitted, and the peer is placed into another state, which must be one of the valid states listed below.

failover.h:

enum failover_state {
    unknown_state = 0, /* XXX: Not a standard state. */
    startup = 1,
    normal = 2,
    communications_interrupted = 3,
    partner_down = 4,
    potential_conflict = 5,
    recover = 6,
    paused = 7,
    shut_down = 8,
    recover_done = 9,
    resolution_interrupted = 10,
    conflict_done = 11,
};
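
The numeric values in this enum are what the omshell local-state attribute expects. A small sketch that mirrors the enum in Python and renders the omshell commands for a chosen state; FailoverState and omshell_commands are illustrative names, and the peer name "dhcp-failover" is assumed from the configuration example:

```python
from enum import IntEnum


class FailoverState(IntEnum):
    """Mirror of the failover_state values listed in failover.h above."""
    UNKNOWN_STATE = 0
    STARTUP = 1
    NORMAL = 2
    COMMUNICATIONS_INTERRUPTED = 3
    PARTNER_DOWN = 4
    POTENTIAL_CONFLICT = 5
    RECOVER = 6
    PAUSED = 7
    SHUT_DOWN = 8
    RECOVER_DONE = 9
    RESOLUTION_INTERRUPTED = 10
    CONFLICT_DONE = 11


def omshell_commands(peer_name: str, state: FailoverState) -> str:
    """Render the omshell input that sets local-state for a failover peer."""
    lines = [
        "connect",
        "new failover-state",
        f'set name = "{peer_name}"',
        "open",
        f"set local-state = {int(state)}",
        "update",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the commands; they could be piped into omshell by hand.
    print(omshell_commands("dhcp-failover", FailoverState.PARTNER_DOWN), end="")
```

The sketch only prints the commands and does not contact a running server.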

CHALLENGES WITH PARTNER-DOWN CONFIGURATION: Both servers initially enter the partner-down state and then move to the potential-conflict state, so unusual behavior is possible while they are in potential-conflict. That state works as follows:

 POTENTIAL-CONFLICT state: This state indicates that the two servers are attempting to re-integrate with each other, but at least one of them was running in a state that did not guarantee automatic reintegration would be possible. In POTENTIAL-CONFLICT state the servers may determine that the same IP address has been offered and accepted by two different DHCP clients. It is a goal of this protocol to minimize the possibility that POTENTIAL-CONFLICT state is ever entered.

Upon entry to POTENTIAL-CONFLICT state: When a primary server enters POTENTIAL-CONFLICT state it should request that the secondary send it all updates of which it is currently unaware by sending an UPDREQ message to the secondary server. A secondary server entering POTENTIAL-CONFLICT state will wait for the primary to send it an UPDREQ message.

 Operation in POTENTIAL-CONFLICT state: Any server in POTENTIAL-CONFLICT state MUST NOT process any incoming DHCP requests.

 Transitions out of POTENTIAL-CONFLICT state: If communications fail with the partner while in POTENTIAL-CONFLICT state, the server will transition to RESOLUTION-INTERRUPTED state. Whenever either server receives an UPDDONE message from its partner while in POTENTIAL-CONFLICT state, it MUST transition to a new state. The primary MUST transition to CONFLICT-DONE state, and the secondary MUST transition to NORMAL state. This will cause the primary server to leave POTENTIAL-CONFLICT state prior to the secondary, since the primary sends an UPDREQ message and receives an UPDDONE before the secondary sends an UPDREQ message and receives its UPDDONE message. When a secondary server receives an indication that the primary server has made a transition from POTENTIAL-CONFLICT to CONFLICT-DONE state, it SHOULD send an UPDREQ message to the primary server.

       Primary                                   Secondary
       Server                                     Server

          |                                          |
 POTENTIAL-CONFLICT                         POTENTIAL-CONFLICT
          |                                          |
          | >--UPDREQ-------------------->           |
          |                                          |
          |          <---------------------BNDUPD--< |
          | >--BNDACK-------------------->           |
         ...                                        ...
          |                                          |
          |          <---------------------BNDUPD--< |
          | >--BNDACK-------------------->           |
          |                                          |
          |          <--------------------UPDDONE--< |
    CONFLICT-DONE                                    |
          | >--STATE--(CONFLICT-DONE)---->           |
          |          <---------------------UPDREQ--< |
          |                                          |
          | >--BNDUPD-------------------->           |
          |          <---------------------BNDACK--< |
         ...                                        ...
          | >--BNDUPD-------------------->           |
          |          <---------------------BNDACK--< |
          |                                          |
          | >--UPDDONE------------------->           |
          |                                       NORMAL
          |          <------------STATE--(NORMAL)--< |
       NORMAL                                        |
          | >--STATE--(NORMAL)----------->           |
          |                                          |
          |          <--------------------POOLREQ--< |
          | >------POOLRESP-(n)---------->           |
          |        addresses                         |

Figure : Transition out of POTENTIAL-CONFLICT
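
The transitions out of POTENTIAL-CONFLICT can be written down as a small transition function. This is a sketch of the rules described in the text and the figure, not code from any DHCP server; the event names (comm_failed, upddone_received, peer_state_normal) are made up for illustration.

```python
POTENTIAL_CONFLICT = "POTENTIAL-CONFLICT"
CONFLICT_DONE = "CONFLICT-DONE"
RESOLUTION_INTERRUPTED = "RESOLUTION-INTERRUPTED"
NORMAL = "NORMAL"


def next_state(role: str, state: str, event: str) -> str:
    """Return the state after `event`; combinations not covered leave it unchanged.

    role  -- "primary" or "secondary"
    event -- illustrative event name, see the lead-in above
    """
    if state == POTENTIAL_CONFLICT:
        if event == "comm_failed":
            return RESOLUTION_INTERRUPTED
        if event == "upddone_received":
            # The primary goes to CONFLICT-DONE, the secondary straight to NORMAL.
            return CONFLICT_DONE if role == "primary" else NORMAL
    if state == CONFLICT_DONE and role == "primary" and event == "peer_state_normal":
        # In the figure the primary reaches NORMAL after the secondary reports NORMAL.
        return NORMAL
    return state


if __name__ == "__main__":
    primary = next_state("primary", POTENTIAL_CONFLICT, "upddone_received")
    secondary = next_state("secondary", POTENTIAL_CONFLICT, "upddone_received")
    primary = next_state("primary", primary, "peer_state_normal")
    print(primary, secondary)
```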

RESOLUTION-INTERRUPTED state:

This state indicates that the two servers were attempting to re-integrate with each other in POTENTIAL-CONFLICT state, but communications failed prior to completion of re-integration. If the servers remained in POTENTIAL-CONFLICT while communications were interrupted, neither server would be responsive to DHCP client requests, and if one server had crashed, there might be no server able to process DHCP requests.

Upon entry to RESOLUTION-INTERRUPTED state:

When a server enters RESOLUTION-INTERRUPTED state it should raise an alarm condition to alert administrative staff of a problem in the DHCP subsystem.

Transitions out of RESOLUTION-INTERRUPTED state:

If a server in RESOLUTION-INTERRUPTED state receives an external command informing it that its partner is down, it will transition immediately into PARTNER-DOWN state. If communications are restored with the other server, the server in RESOLUTION-INTERRUPTED state will transition into POTENTIAL-CONFLICT state.
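
The entry action and the two exits of RESOLUTION-INTERRUPTED can be sketched as follows. The function and event names are illustrative, and the alarm is represented by a log warning as a stand-in for whatever alerting the site actually uses.

```python
import logging

log = logging.getLogger("dhcp-failover-sketch")

RESOLUTION_INTERRUPTED = "RESOLUTION-INTERRUPTED"
PARTNER_DOWN = "PARTNER-DOWN"
POTENTIAL_CONFLICT = "POTENTIAL-CONFLICT"


def enter_resolution_interrupted() -> str:
    """Entry action: alert administrative staff, then report the new state."""
    log.warning("failover entered %s; operator attention needed",
                RESOLUTION_INTERRUPTED)
    return RESOLUTION_INTERRUPTED


def leave_resolution_interrupted(event: str) -> str:
    """Return the state that follows `event` while in RESOLUTION-INTERRUPTED."""
    if event == "partner_down_command":  # operator says the partner is down
        return PARTNER_DOWN
    if event == "comm_restored":         # the partner is reachable again
        return POTENTIAL_CONFLICT
    return RESOLUTION_INTERRUPTED        # anything else: stay put


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    state = enter_resolution_interrupted()
    print(state, "->", leave_resolution_interrupted("partner_down_command"))
```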

SOLUTION: Initially, both DHCP servers are intentionally put into the partner-down state. This handles scenarios where only a single DHCP server is present or healthy during initialization, for example when the other DHCP server is under maintenance. A single DHCP server can then remain in partner-down and serve all DHCP clients.

Even when both DHCP servers are healthy and running, both start in partner-down and therefore move to the potential-conflict state. After communicating with each other and resolving the conflicts, they quickly transition into the normal state, and as soon as they reach the normal state they start serving DHCP clients.

If one DHCP server goes down while the other is in the potential-conflict state, the servers cannot communicate to resolve the conflict, and the surviving server transitions into the resolution-interrupted state. This scenario is rare, but once a server enters resolution-interrupted it does not recover on its own and does not serve any DHCP clients. If the other server has crashed (is down permanently), the administrator should be made aware of it and should intentionally move the DHCP server that is currently in resolution-interrupted into the partner-down state.

Note: If the other DHCP server comes back up while this server is in the resolution-interrupted state, both transition into the potential-conflict state and then recover from the conflict.

Possible problems with the current DHCP failover configuration: There may be issues with dynamic BOOTP leases; this needs some investigation. It is also not desirable to put a server into the potential-conflict state, which we indirectly force by initially putting both servers into partner-down.

If a negative pool request is seen on the console, the fix is to intentionally move the server to partner-down when it transitions into the communications-interrupted state.

 

Happy Reading!

posted Aug 17, 2016 by Yogesh Kumar

Good one, keep it up. May be you like to beautify to make it more readable.
sure :)

