Thursday, December 04, 2014

Cisco EEM applet to monitor and repair broken DHCP leases

Dirty hack to renew the DHCP lease on a Cisco 881 when Internet access is lost. This can happen when the Cisco holds a valid DHCP lease from the ISP and the cable modem or DSL router is then power cycled, but a switch between the Cisco and the upstream device keeps the link state up. The options are either to fix it manually (pull the cable, change the config, or reboot the Cisco) or to hack together something like the below. I'm also pinging over the VPN tunnel to the intranet, as it would be a shame to break this because Level 3 and Google started blocking ICMP, which might happen one day.


The setup below releases and renews DHCP only if our own VPN server, Level 3 and Google are all down. There's no point activating the repair if Internet access is fine but our own VPN is unreachable, as the problem is then elsewhere and a flapping connection on the client side will only make troubleshooting harder.

My first attempt was based on track statements alone, like all the examples I've found. The problem with that is that if the link is already broken when the Cisco is rebooted, the repair part never activates. Using ipsla directly in the applet solves this, but the value of _ipsla_condition is not updated while the applet is running. To work around this there are track statements which are only used to break out of the infinite loop... If someone has a better solution to this I'd love to hear it, since I'm a bit worried about e.g. race conditions that might occur with the current approach.
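
One possible simplification, as an untested sketch: the three separate track checks inside the loop could be collapsed into a single boolean OR track list, which stays up as long as any of its members is up. Track number 10 is an arbitrary choice and not part of the setup below. This only shortens the applet; it does not remove the timing concerns mentioned above.

```
! Assumption: track 10 is unused; members are the track 1-3 objects from the config below
track 10 list boolean or
 object 1
 object 2
 object 3
 exit
!
! Inside the applet loop, a single check would then stand in for the three track reads
 action 130   track read 10
 action 140   if $_track_state eq "up"
 action 150    syslog msg "### HAXOR ### Internet connection resumed ###"
 action 160    exit
 action 170   end
```

With this variant, actions 180 through 290 of the applet would no longer be needed.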

Depending on your setup, rather than renewing DHCP you might want to take some other action, such as temporarily shutting down Fa4.
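
As a rough sketch of that alternative, the release/renew steps in the applet below (actions 300 through 360) could be swapped for an interface bounce. This is untested and assumes the WAN port is FastEthernet4, as on the 881; the extra action labels (322, 324, 362) are picked only so they sort between the existing ones.

```
 action 300   comment // Bounce Fa4 instead of releasing/renewing DHCP
 action 310   syslog msg "### HAXOR ### Shutting down Fa4, attempt $retries ###"
 action 320   cli command "configure terminal"
 action 322   cli command "interface FastEthernet4"
 action 324   cli command "shutdown"
 action 330   comment // Wait 10 seconds before bringing the port back up
 action 340   wait 10
 action 350   syslog msg "### HAXOR ### Re-enabling Fa4, attempt $retries ###"
 action 360   cli command "no shutdown"
 action 362   cli command "end"
```

The "enable" at action 090 still applies, and the final "end" returns the applet's CLI session to privileged EXEC mode before the next pass through the loop.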

! Ping our Level 3 DNS every 13 seconds
ip sla 1
 icmp-echo 4.2.2.2 source-interface fa4
  threshold 3000
  timeout 3000
  frequency 13
  exit
!
! Ping Google backup DNS every 13 seconds
ip sla 2
 icmp-echo 8.8.4.4 source-interface fa4
  threshold 3000
  timeout 3000
  frequency 13
  exit
!
! Ping our VPN internal IP every 3 seconds
ip sla 3
 icmp-echo 192.168.200.1 source-interface Tunnel200
  vrf INTRA
  threshold 3000
  timeout 3000
  frequency 3
  exit
!
! Run checks 24/7
ip sla schedule 1 start-time now life forever
ip sla schedule 2 start-time now life forever
ip sla schedule 3 start-time now life forever
!
! After around 30s of packet loss we consider the Internet connection broken
ip sla reaction-config 1 react timeout threshold-type consecutive 2
ip sla reaction-config 2 react timeout threshold-type consecutive 2
ip sla reaction-config 3 react timeout threshold-type consecutive 10
!
! Don't miss this, otherwise your EEM applet won't trigger
ip sla enable reaction-alerts
!
! We need old style track statements to abort infinite repair loop later
track 1 ip sla 1 reachability
 exit
!
track 2 ip sla 2 reachability
 exit
!
track 3 ip sla 3 reachability
 exit
!
! EEM applet to take corrective actions based on IP SLA probes
event manager applet WAN-monitor
 ! Check all three and let the applet run forever (default maxrun is 20 seconds)
 event tag ping-1 ipsla operation-id 1 reaction-type timeout
 event tag ping-2 ipsla operation-id 2 reaction-type timeout
 event tag ping-3 ipsla operation-id 3 reaction-type timeout maxrun 0
 trigger
  ! Only activate if all three are down indicating issue is on our end
  correlate event ping-1 and event ping-2 and event ping-3
  exit
 ! We need "track" to break from loop as $_ipsla_condition is set only once
 action 010 comment // Uplink ipsla triggered, if it went down trigger repair
 action 020 comment // Do nothing if we're here because link just came back
 action 030 if $_ipsla_condition eq "Occurred"
 action 040  syslog msg "### HAXOR ### Internet connection lost ###"
 action 050  comment // Wait 30 seconds to synchronize track and ip sla states
 action 060  syslog msg "### HAXOR ### Delaying to avoid flapping ###"
 action 070  wait 30
 action 080  set retries 1
 action 090  cli command "enable"
 action 100  comment // Infinite loop is Infinite
 action 110  while 1 eq 1
 action 120   comment // Exit loop if ICMP echo track 1 succeeds
 action 130   track read 1
 action 140   if $_track_state eq "up"
 action 150    syslog msg "### HAXOR ### Internet connection resumed ###"
 action 160    exit
 action 170   end
 action 180   comment // Exit loop if ICMP echo track 2 succeeds
 action 190   track read 2
 action 200   if $_track_state eq "up"
 action 210    syslog msg "### HAXOR ### Internet connection resumed ###"
 action 220    exit
 action 230   end
 action 240   comment // Exit loop if ICMP echo track 3 succeeds
 action 250   track read 3
 action 260   if $_track_state eq "up"
 action 270    syslog msg "### HAXOR ### Internet connection resumed ###"
 action 280    exit
 action 290   end
 action 300   comment // Release DHCP IP
 action 310   syslog msg "### HAXOR ### Releasing DHCP IP on Fa4, attempt $retries ###"
 action 320   cli command "release dhcp fa4"
 action 330   comment // Wait 10 seconds before asking for new lease
 action 340   wait 10
 action 350   syslog msg "### HAXOR ### Renewing DHCP IP on Fa4, attempt $retries ###"
 action 360   cli command "renew dhcp fa4"
 action 370   comment // Keep counter on how many times we've retried this
 action 380   increment retries 1
 action 390   comment // Wait 90 seconds before re-checking connection status
 action 400   syslog msg "### HAXOR ### Waiting 90 seconds before retrying ###"
 action 410   wait 90
 action 420   comment // Almost Infinite loop is Almost Infinite
 action 430  end
 action 440 end
 exit
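
To check that the pieces are actually doing something, a few show commands can help. This is just a suggested checklist; the output will depend on your IOS version and setup.

```
! Are the three probes running, and are they timing out?
show ip sla statistics
! Do the track objects agree with the probes?
show track brief
! Is the applet registered, and has it fired?
show event manager policy registered
show event manager history events
! What does the router think its current lease is?
show dhcp lease
! Optional: watch the CLI commands the applet issues while it runs
debug event manager action cli
```

Remember to turn the debug off again with "undebug all" once you're done.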
