ZeroMQ libraries & stability / radarwatch update

I’ve been hacking with ZeroMQ for some time now…  reading through the excellent guide and working on the preliminary steps to incorporate it into the new version (migrated to GitHub from Google Code) of my Doppler radar analysis application.  Multi-threading support was on the list of original criterion and considering many of the other analysis features I’d like to eventually add, full distributed processing would be the best design.  I’m not an expert hand at SMP application design and although the implementation needs here would be fairly trivial, choosing a good library to handle the nasty bits would make things vastly easier.

So I tinkered with several of their samples designs, implemented a few of the patterns in my code and tested things out.  Initial results with just a simple multithreaded service seemed work OK at first glance but eventually dropped data caused issues so I decided to try something with reliability such as the paranoid pirate pattern (robust reliable queuing).  Thinking back, I should have realised something was up when it exhibited that issue on both inproc IPC and tcp/localhost.  No matter what I was trying to do porting examples into my code nothing remained stable; either it crashed immediately or processed a few messages then died.  C and C++ code showed similar results.  Running the examples on their own worked fine.  The front-end/client-side Python application worked fine.  I gave it a fresh shot with another example – the Majordomo pattern (service-oriented reliable queuing) again showed similar symptoms of instability.  Commenting out huge sections of my code thinking it may of had something to do with a broken pointer or some other oversight on my part had little to no results.  Eventually I started comparing the GCC invocations and noticed there were some minor differences.  After 5 minutes of testing the weeks of tinkering finally yielded some fruit… Google confirmed my suspicions: linking ZeroMQ with -pg for gprof support crashes the library.

It’s amazing how quick it can be to find what you’re looking for when you know exactly what you want.  At least now I can get back to implementing 0MQ in my code.

Provisioning Ubuntu VM’s with Cobbler

I’ve been playing with Cobbler at home lately now that my server was upgraded to an quad core with the magic vmx flag and ran into an issue deploying Ubuntu VM’s with it.  The install itself and import of the distro is pretty straight-forward, Canonical has some documentation on the process.  Koan can be used for provisoning VM’s (as is mentioned in the docs), however I have some prior Cobbler experience with CentOS and would like to develop this further with Ubuntu.

The problem I encountered was the following: during the install process Ub would not detect the virtual disks and an error is thrown, “no root filesystem is defined”.  I’m using the default KVM virtio disk bus type here and apparently the debian-installer will not detect these with the default configuration.  If you launch a shell and check, /dev/vda exists and running fdisk on it seems to suggest all is good.  Also, running the install via a CD/ISO works just fine as well.  The problem lies with Cobbler.

Eventually I narrowed it down to the preseed file.  When you import the distro it’s mentioned in the Ubuntu Cobbler preseed docs that a default preseed file is generated.  What it doesn’t mention is that something in this configuration is not compatible with virtio disk types.  I haven’t narrowed it down, instead I just copied the Ubuntu profile to a new one and changed the kickstart/preseed to /var/www/cobbler/ks_mirror/[ubuntu-distro-name]/preseed/ubuntu-server-minimalvm.seed.  I’ve got some more work to do on the preseed, I’m not all too familiar with them yet but planning to change that.

Migrating mail server VM to a new host

I’ve been working on migrating a virtual host over to Rackspace which mainly runs a mail server among a few other small items.  I wasn’t 100% sure how smooth the process would be, expecting to hit at least a few road bumps along the way.  The first one I encountered was issues surrounding MX entries and the simplistic nature of the DNS record editor at Rackspace – most of my emails sent from my home PC were bouncing back 550 failed recipient verification.  This was just a dry run however as when the domain was with my previous hoster I just used my registrar’s DNS, when I switched back the problem seemed to be resolved.

However the second issue I hit had me stumped for a few days.  One of the reasons I migrated (besides price) was greater flexibility; Rackspace gave me more options for distros to choose from and I thought their overall interface was cleaner and designed better.  So when I provisioned the new VM I gave Ubuntu a shot since I run it on my home network I’m a bit more familiar with how I want to configure the box for the software I run at least.  After the DNS/mail issue was resolved everything seemed solid except for a random, albeit fairly minor problem.  For some odd reason hostname resolution replied with “hostname: temporary failure in name resolution” randomly.  I was getting emails from cronjobs running with this error which I found a bit strange.  While I was tinkering with the mail problem I also built a CentOS VM real quick and didn’t notice the error occurring with that host.  I double-checked and made sure the resolv.conf was identical, then /etc/hosts, then nsswitch.conf and so on, all the files seemed the same or at least close enough that I didn’t think it would be a problem.  I made sure DNS resolution worked on the machine and ensured any iptables rules were not in place.  What caught me as the strangest part was the fact it randomly worked and randomly didn’t, there did not seem to be any sort of reproducibility in the issue.  I even ran an strace and compared logs from instances it worked and when it didn’t.  ‘hostname -f’ also took a second or two to reply rather then an immediate response.

Eventually I figured I’d just add an alias to /etc/hosts with the local non-FQDN hostname.  I also noticed then that the /etc/hosts didn’t seem to have an extra carriage return at the end, I put one in and bingo!  Problem fixed.  Looking back through the strace logs I saw upon closer inspection that it didn’t actually read in the second line which had the FQDN hostname, the first for localhost was OK but then it stopped further parsing.  For some reason CentOS behaves differently as I saw – the hosts file was identical (except for the IP’s of course) – it too was missing a carriage return but strace revealed that it parsed the file just fine.  Just in case any one is wondering I was testing this on Ubuntu Lucid 10.04.2 LTS and CentOS 5.5.

::sigh::  Ah well at least I can cancel the plan with my original hoster now. 🙂

Linux containers and ufw

I’ve been playing with Linux containers (LXC) recently thanks to the SysAdvent calendar and ran into a small issue where network traffic was blocked to the VM  (using a bridged interface) when ufw was running on the host. Granting an ufw allow rule to the guest IP did not seem to help either, it seems there is a bug filed for this.

Two solutions are presented on the Launchpad page:

  • Disable netfilter on the bridge on sysctl by adding the following lines to /etc/sysctl.conf, /etc/ufw/sysctl.conf or /etc/sysctl.d/ufw:
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
  • Configure iptables to permit forwarded traffic across the bridge by adding the following line (before the last COMMIT command) to /etc/ufw/before.rules:
-I FORWARD -m physdev --physdev-is-bridged -j ACCEPT

Whichever you choose the relevant changes must be made active (reloading sysctl or restarting ufw).  I used the ufw/iptables solution and seemed to work fine.

I’m hoping to have another post up here soon about LXC itself, I’ve made a custom VM creation script for Debian and trying to get one working for Ubuntu as well.  From what I’ve seen so far it’s a very nice package although still much under development; I’m not so sure if I’d recommend it to be used in production environments but I see it maturing significantly in the near future.

Intrepid upgrade done!

Overall I am fairly impressed with Intrepid.  They seemed to have fixed a number of bugs I encountered (especially that annoying menu pulldown delay with dual screen configurations), and I’ve actually switched back to full compiz mode.  Most things work fairly smoothly and performance seems a bit better as well.  There is one strange bug I’ve found where there is a vertical black bar on the right-hand side of my 2nd monitor, which cuts off the gnome-panel at the top of my screen.  What doesn’t make much sense to me is that maximized windows do not extend over into the dark limbo, unlike the panel:

Odd desktop/resolution bug

Odd desktop/resolution bug

Strange.  Maybe it didn’t auto-detect my old 2005FPW right, pretty sure I had a manually configured xorg.conf just prior to the upgrade  I’ll have to look into it further.

Regardless, now it comes time to design/implement more items that have been held back too long.  Firstly will entail setting up a basic SSH key repository for my various machines and accounts.  There are a number of good articles about this topic.  What I find particularly useful are the ‘from’ and ‘command’ options within OpenSSH (referenced in the last two URL’s).  I have been wondering about a solution to this for some time; I wanted to create passphrase-less SSH keys used for automation & scripting but did not want full access to anyone who may have compromised the keys.  This solution seems like it will fit perfectly!  Already have a few sample ideas to use this with.

After setting up SSH keys, next will come full NIS implementation.  I already have it installed on my server, but not configured or running at the current time.  NFS-mounted home directories always seem like a good option, and I would normally agree but I am not sure if I want the usability of my desktop system tied dependent upon the server being up and running.  Right now if the server goes down, I can still function fine on my desktop – just without access to my RAID storage array.  Also due to come will be a post on performance logging & monitoring, I love my GkrellM but something with metrics storage would be nice.