Nagios and (multiple service) downtime scheduling

Let’s say you’ve got a fairly extensive Nagios configuration, with multiple Nagios services that all depend on one specific service, such as a network link.

Occasionally, external providers schedule outages, or internally you arrange for outage periods for services or infrastructure to go offline.

In Nagios, the best way to handle this is to use the “schedule downtime” feature on the affected service.

However, sometimes there are multiple services, and it can be tedious to schedule them all for downtime – and doing it that way doesn’t accurately show that there is a single outage.

Matt’s solution? Let’s create a ‘downtime’ service that is ‘OK’ normally, but goes into ‘WARNING’ when downtime is scheduled. We can do that with the following pieces of Nagios configuration.


define host {
    use                     generic-host
    host_name               downtime
    alias                   downtime
    check_command           return-ok
}


define service {
    host_name               downtime
    service_description     downtime 1
    check_command           return-numeric!$SERVICEDOWNTIME$
    use                     generic-service
    max_check_attempts      1
    normal_check_interval   5
}

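That ‘check_command’ basically means “If we haven’t scheduled downtime for this service, everything is good”. The $SERVICEDOWNTIME$ macro holds the service’s scheduled downtime depth: 0 when no downtime is active (‘OK’), and 1 once a downtime has started (‘WARNING’).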
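
As a rough illustration of what that check does, here is a minimal Python sketch. It assumes that return-numeric simply exits with the number it is given (on Debian-style installs it is typically a thin wrapper around check_dummy, but check your own command definitions). The function name downtime_check and the choice to cap the result at CRITICAL are made up for this example.

```python
# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def downtime_check(depth_arg):
    """Map a $SERVICEDOWNTIME$ value to (exit_code, message).

    depth_arg is the macro value as passed on the command line:
    "0" when no downtime is active, "1" or more during downtime.
    """
    try:
        depth = int(depth_arg)
    except (TypeError, ValueError):
        return 3, "UNKNOWN: downtime depth %r is not a number" % (depth_arg,)
    if depth < 0:
        return 3, "UNKNOWN: downtime depth %d is negative" % depth
    if depth == 0:
        return 0, "OK: no downtime scheduled"
    # Assumption for this sketch: overlapping downtimes (depth >= 2)
    # are reported as CRITICAL rather than passed through unchanged.
    code = min(depth, 2)
    return code, "%s: in scheduled downtime (depth %d)" % (STATES[code], depth)


for value in ("0", "1", "2"):
    print(downtime_check(value))
```

Wired up as a real plugin, the script would print the message and exit with the returned code.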

Now we can add a servicedependency from each of your normal services to the “downtime” service… and “bingo” – scheduling an outage on the ‘downtime’ service will have a cascading effect.
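
To make the dependency concrete, here is a small Python sketch that prints one servicedependency stanza per dependent service. The directive names are standard Nagios ones, but the failure criteria chosen here (suppress notifications while the master is WARNING or CRITICAL, keep executing checks) are my assumption about what you would want, and the host "router1" and service "WAN link" are hypothetical.

```python
def render_servicedependency(dependent_host, dependent_service,
                             master_host="downtime",
                             master_service="downtime 1"):
    """Return a Nagios servicedependency stanza as text."""
    directives = [
        ("host_name", master_host),
        ("service_description", master_service),
        ("dependent_host_name", dependent_host),
        ("dependent_service_description", dependent_service),
        # Suppress notifications while the master is WARNING or CRITICAL.
        ("notification_failure_criteria", "w,c"),
        # "n" (none): keep running the dependent checks regardless.
        ("execution_failure_criteria", "n"),
    ]
    lines = ["define servicedependency {"]
    lines += ["    %-32s%s" % (key, value) for key, value in directives]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical dependent service, for illustration only.
print(render_servicedependency("router1", "WAN link"))
```

Generating the stanzas from a list keeps the dependencies consistent when many services hang off the same ‘downtime’ service.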

I’m currently using this for some external provider network links (when I get a Planned Maintenance Event Notice I can schedule that in Nagios, then forget about it – Nagios will remember it for me, and if I look during the outage, it’ll show me that it is in downtime) and for some power circuits.

One of the main reasons I currently like this approach is that it agrees with my “can we see what is going wrong and why” view, and it can show more clearly in Nagios dashboard applications what the cause of a problem is.

It would be sensible to extend this further – I have replaced the 'return-numeric' check_command with a check script that checks $SERVICEDOWNTIME$ as well as checking for upcoming scheduled downtime in the nagios database.
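
A script along those lines might look something like the sketch below. It is not the script described above: it assumes the scheduled downtimes are read from Nagios’s status file (the file named by status_file in nagios.cfg), that each one appears there as a "servicedowntime { ... }" block of key=value lines, and that upcoming downtime should be reported as OK with a note. The function names and the one-hour lookahead are made up.

```python
def parse_service_downtimes(status_text):
    """Collect the key=value pairs of every servicedowntime block."""
    downtimes, current = [], None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "servicedowntime {":
            current = {}
        elif line == "}" and current is not None:
            downtimes.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return downtimes


def downtime_state(downtimes, host, service, now, lookahead=3600):
    """Return (exit_code, message) for one host/service pair."""
    upcoming = None
    for entry in downtimes:
        if (entry.get("host_name") != host
                or entry.get("service_description") != service):
            continue
        start, end = int(entry["start_time"]), int(entry["end_time"])
        if start <= now < end:
            return 1, "WARNING: in scheduled downtime until %d" % end
        if now < start <= now + lookahead:
            # Remember the earliest downtime starting within the window.
            upcoming = start if upcoming is None else min(upcoming, start)
    if upcoming is not None:
        return 0, "OK: downtime scheduled to start at %d" % upcoming
    return 0, "OK: no downtime scheduled"
```

In use, the text would come from reading the status file, and `now` from `int(time.time())`; the timestamps in the file are Unix epoch seconds.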

Reverse DNS for IPv6 PTR Ranges

One of the (many) challenges facing those who are looking at deploying IPv6 is Reverse DNS (PTR records).

Under IPv4, a single DNS Zone for PTR records might be at most 300 lines long.

With IPv6, a single DNS Zone for PTR records for a single /64 subnet would have 2^64 (18,446,744,073,709,551,616) entries.

Therefore, under IPv4, it was common to pre-populate the PTR records. Under IPv6, that gets prohibitive.
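
The difference in scale is easy to check with Python’s ipaddress module. The two networks below are documentation prefixes picked for this example, assuming a /24 for the IPv4 zone and a /64 for the IPv6 subnet.

```python
import ipaddress

ipv4_zone = ipaddress.ip_network("192.0.2.0/24")
ipv6_subnet = ipaddress.ip_network("2001:db8::/64")

print(ipv4_zone.num_addresses)               # 256
print(ipv6_subnet.num_addresses)             # 18446744073709551616
print(ipv6_subnet.num_addresses == 2 ** 64)  # True
```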

So, what is the solution?

The solution I first came across is by Kazunori Fujiwara, who has written a DNS server in Perl specifically for this task. The server basically does some simple pattern matching to convert between PTR names and AAAA records.

Query 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa
Response 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.b.d.0.1.0.0.2.ip6.arpa IN PTR 20010db8000000000000000000000001.user.example.jp.
Query 20010db8000000000000000000000001.user.example.jp
Response 20010db8000000000000000000000001.user.example.jp. IN AAAA 2001:db8::1

This means pre-population of zones is not required – the entries are generated on the fly. The downside to this is an extra name-server daemon needs to be run.
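
The pattern matching involved can be sketched in a few lines of Python. This is not Fujiwara’s code, just an illustration of the same mapping; the function names are made up, and the suffix is taken from the example above.

```python
import ipaddress

SUFFIX = "user.example.jp"  # suffix from the example above
HEX_DIGITS = set("0123456789abcdef")


def ptr_to_hostname(qname, suffix=SUFFIX):
    """Turn a full ip6.arpa name into the synthesized hostname."""
    labels = qname.rstrip(".").lower().split(".")
    if len(labels) != 34 or labels[-2:] != ["ip6", "arpa"]:
        raise ValueError("not a full ip6.arpa name: %r" % qname)
    nibbles = labels[:-2]
    if any(len(n) != 1 or n not in HEX_DIGITS for n in nibbles):
        raise ValueError("bad nibble in %r" % qname)
    # ip6.arpa lists the nibbles least-significant first, so reverse them.
    return "".join(reversed(nibbles)) + "." + suffix + "."


def hostname_to_address(qname, suffix=SUFFIX):
    """Turn a synthesized hostname back into its IPv6 address."""
    name = qname.rstrip(".").lower()
    tail = "." + suffix
    if not name.endswith(tail):
        raise ValueError("name is outside %s: %r" % (suffix, qname))
    hex_part = name[:-len(tail)]
    if len(hex_part) != 32 or not set(hex_part) <= HEX_DIGITS:
        raise ValueError("expected 32 hex digits in %r" % qname)
    return ipaddress.IPv6Address(int(hex_part, 16))


query = ipaddress.IPv6Address("2001:db8::1").reverse_pointer
print(ptr_to_hostname(query))
# 20010db8000000000000000000000001.user.example.jp.
print(hostname_to_address("20010db8000000000000000000000001.user.example.jp"))
# 2001:db8::1
```

Because both directions are pure string manipulation, the forward and reverse answers always agree without any zone data being stored.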

Kazunori Fujiwara’s server can be found here: http://member.wide.ad.jp/~fujiwara/v6rev.html. Some small customisation may be required to match the requirements at your site.

Another solution I have found works in a similar way, but uses PowerDNS’s “pipe” backend to generate the data on the fly. I haven’t tried this myself, as I do not use PowerDNS, but more information can be found here: http://hyse.org/v6rev/.

Using nagios to verify firewall

Here’s a suggestion – use Nagios to validate/verify your firewall rules.

You can use “check_tcp” or “check_udp” to check that a port is open, and pair either of those commands with “negate” to make sure the firewall is blocking traffic.

This could be combined with firewall ruleset tools, to ensure what you *think* is blocked, is.
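
The logic of such a check, including the inverted “should be blocked” case, looks roughly like this in Python. It is a sketch of the idea rather than a replacement for check_tcp and negate; the function names are made up, and only TCP is covered.

```python
import socket

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    Refused connections, timeouts and name resolution failures all
    count as "not open" in this sketch.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def firewall_status(is_open, expect_open):
    """Compare the observed port state with what the ruleset intends."""
    seen = "open" if is_open else "blocked"
    if is_open == expect_open:
        return OK, "OK: port is %s as expected" % seen
    wanted = "open" if expect_open else "blocked"
    return CRITICAL, "CRITICAL: port is %s but should be %s" % (seen, wanted)
```

A check for a port that must stay closed would then be `firewall_status(port_is_open(host, port), expect_open=False)`, with the host and port taken from your ruleset.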

Another tip: if you’re going to do this, set the Nagios check_interval for these checks fairly high, so that you are not constantly probing your own firewall.

Views on Email Etiquette

Subject Lines

Keep them relevant, do not be afraid to change them if the discussion changes course – especially if you want to involve other people.

Top Post vs Inline

If the discussion is using inline posting, keep using inline, but if everyone is top-posting, then keep top-posting. Do not switch an email discussion between inline and top-posting, even if you have your own views on which style to use. Personally, I dislike top-posting, but it is more harmful for everyone to suddenly start replying inline in a long top-posted discussion.

HTML: Colours, Fonts, Italics, Bold

Bad. Don’t do it. It makes the emails hard to read (especially inline colours), and does not always render properly in every email client. Do not assume everyone is using the same email client as you.

When to Forward, what to forward and to where

Does everyone in the department, and everyone you’ve ever had coffee with need to receive that forward? If it affects a whole team, maybe use the team’s email list address. If it is a request or FYI, maybe there is a more appropriate place – such as an issue tracking system, a news page, or a heading in a regular bulletin.

Keep it Simple (Summarise!)

No one likes to have to spend an hour reading a long history before they even understand what the topic is about. Please post a summary – especially when involving a new person in the discussion. They can refer to the long history if they need to, but let them get involved quickly by providing a relevant summary, hopefully including why they are now involved.

 

Stolen Identity?

Somebody keeps trying to use my email address to sign up for information and email announcements. I’m pretty sure it’s not me doing it whilst I’m asleep, because I’m not really interested in football teams or unit rental prices on the other side of the world.

And now I’m seeing a few password reset attempts for various different websites, such as Twitter.

I wonder who out there thinks they are me?

Using IPv6 on the desktop

It seems that there are still a lot of barriers to adopting IPv6 at the desktop-level. I believe all the network-level hurdles are solved or identified (routing, firewall, subnetting and basic device support), however there appear to still be a number of hurdles left.

For example, it appears that DHCPv6 (DHCP for IPv6) support is inconsistent and/or lacking on a number of platforms. For administrators who’ve come from IPv4, DHCPv6 is probably the first thing they’ll look at – and it doesn’t work “right” (as defined by an administrator who still thinks the IPv4 way).

There are only two ways to get IPv6 working at the moment – either static configuration, or using SLAAC (stateless address autoconfiguration). SLAAC initially sounds like a replacement for DHCP, right? “Address autoconfiguration” sounds great – but ultimately, SLAAC is not a replacement for DHCP. SLAAC works by receiving ‘Router Advertisements’ (RA) which announce the IPv6 prefix. The host then converts the interface MAC address (48 bits) into a 64-bit interface identifier, and appends that to the (64-bit) prefix that was announced, to give a 128-bit address.
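
The MAC-to-address conversion can be worked through in Python. This sketch follows the modified EUI-64 scheme (flip the universal/local bit, insert ff:fe in the middle); the function name, prefix and MAC address are made up for the example. Many current operating systems generate randomised or otherwise non-MAC-based identifiers instead, so treat this as an illustration of the classic behaviour only.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix and a MAC address the modified EUI-64 way."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC expects a /64 prefix, got /%d" % network.prefixlen)
    octets = bytearray(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address: %r" % mac)
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves to stretch 48 bits to 64.
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    interface_id = int.from_bytes(eui64, "big")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
# 2001:db8::211:22ff:fe33:4455
```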

Where IPv4-thinking admins start having problems is that with SLAAC there is no DHCP server, and therefore no DHCP server log, so we’ll need to keep logs of neighbour tables (the IPv6 counterpart of “ARP entries”) instead. Because the RA doesn’t hand out DNS recursive server addresses, or NTP servers, or any of the other options that DHCP (for IPv4) hands out, they need to be configured statically.
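
Logging the neighbour table could start from something like this Python sketch, which parses output in the style of Linux’s "ip -6 neigh show". That command and its output layout are an assumption about your platform (routers and other operating systems expose the table differently), and the function name and sample addresses are made up.

```python
def parse_neighbours(output):
    """Parse `ip -6 neigh show` style output into a list of dicts."""
    entries = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        # First token is the IPv6 address, last token is the entry state.
        entry = {"address": tokens[0], "state": tokens[-1]}
        for key in ("dev", "lladdr"):
            if key in tokens[1:-1]:
                entry[key] = tokens[tokens.index(key, 1) + 1]
        entries.append(entry)
    return entries


sample = (
    "fe80::1 dev eth0 lladdr 00:11:22:33:44:55 router REACHABLE\n"
    "2001:db8::5 dev eth0  FAILED\n"
)
for entry in parse_neighbours(sample):
    print(entry)
```

Run periodically and written out with a timestamp, this gives a record of which MAC address held which IPv6 address, which is the information a DHCP server log would otherwise have provided.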

There are still some conversations ongoing about DNS recursive servers – the typical deployments seem to be either to do DNS requests over IPv4, or to run DHCPv6 to only hand out DNS details, and use RA for the rest.

The fact that this discussion is still going on, and that there still appear to be significant impediments to DHCPv6, is scary. The NANOG and IPV6-OPS email lists both regularly have discussions about this – subscribe to the email lists or check out the archives if you’re interested.

We are expected to adopt this IPv6 stuff now – but we still don’t have some of the basics down pat.

 

Gnome zoom, driving me crazy

Just now I’ve found the solution to something that’s been bugging me a lot with recent versions of Gnome.

Occasionally, I’d accidentally hit a key combination and invoke some “zoom” function – but it didn’t appear to have anything to do with gnome-orca or gnome-mag, and I couldn’t find any keybindings that had anything to do with it.

I’ve now found out that this is due to Compiz. To turn this off, install the compizconfig-settings-manager package and run it,

sudo apt-get install compizconfig-settings-manager
/usr/bin/ccsm

and turn off “Enhanced Zoom Desktop”. The key combination I was accidentally hitting was “Super + Button2”.

Using apt-cacher-ng to handle deb files instead of squid

An interesting idea – instead of configuring a custom Debian Package Repository to point to a local mirror, proxy or cache, we can team up with Squid (or similar) to redirect package-ish stuff to the local package cache. This therefore works for transparent proxies, as well as for machines with a local HTTP_PROXY (or equivalent) set.