ZeroMQ libraries & stability / radarwatch update

I’ve been hacking with ZeroMQ for some time now…  reading through the excellent guide and working on the preliminary steps to incorporate it into the new version (migrated to GitHub from Google Code) of my Doppler radar analysis application.  Multi-threading support was on the list of original criterion and considering many of the other analysis features I’d like to eventually add, full distributed processing would be the best design.  I’m not an expert hand at SMP application design and although the implementation needs here would be fairly trivial, choosing a good library to handle the nasty bits would make things vastly easier.

So I tinkered with several of their samples designs, implemented a few of the patterns in my code and tested things out.  Initial results with just a simple multithreaded service seemed work OK at first glance but eventually dropped data caused issues so I decided to try something with reliability such as the paranoid pirate pattern (robust reliable queuing).  Thinking back, I should have realised something was up when it exhibited that issue on both inproc IPC and tcp/localhost.  No matter what I was trying to do porting examples into my code nothing remained stable; either it crashed immediately or processed a few messages then died.  C and C++ code showed similar results.  Running the examples on their own worked fine.  The front-end/client-side Python application worked fine.  I gave it a fresh shot with another example – the Majordomo pattern (service-oriented reliable queuing) again showed similar symptoms of instability.  Commenting out huge sections of my code thinking it may of had something to do with a broken pointer or some other oversight on my part had little to no results.  Eventually I started comparing the GCC invocations and noticed there were some minor differences.  After 5 minutes of testing the weeks of tinkering finally yielded some fruit… Google confirmed my suspicions: linking ZeroMQ with -pg for gprof support crashes the library.

It’s amazing how quick it can be to find what you’re looking for when you know exactly what you want.  At least now I can get back to implementing 0MQ in my code.

Provisioning Ubuntu VM’s with Cobbler

I’ve been playing with Cobbler at home lately now that my server was upgraded to an quad core with the magic vmx flag and ran into an issue deploying Ubuntu VM’s with it.  The install itself and import of the distro is pretty straight-forward, Canonical has some documentation on the process.  Koan can be used for provisoning VM’s (as is mentioned in the docs), however I have some prior Cobbler experience with CentOS and would like to develop this further with Ubuntu.

The problem I encountered was the following: during the install process Ub would not detect the virtual disks and an error is thrown, “no root filesystem is defined”.  I’m using the default KVM virtio disk bus type here and apparently the debian-installer will not detect these with the default configuration.  If you launch a shell and check, /dev/vda exists and running fdisk on it seems to suggest all is good.  Also, running the install via a CD/ISO works just fine as well.  The problem lies with Cobbler.

Eventually I narrowed it down to the preseed file.  When you import the distro it’s mentioned in the Ubuntu Cobbler preseed docs that a default preseed file is generated.  What it doesn’t mention is that something in this configuration is not compatible with virtio disk types.  I haven’t narrowed it down, instead I just copied the Ubuntu profile to a new one and changed the kickstart/preseed to /var/www/cobbler/ks_mirror/[ubuntu-distro-name]/preseed/ubuntu-server-minimalvm.seed.  I’ve got some more work to do on the preseed, I’m not all too familiar with them yet but planning to change that.

mdadm: Device or resource busy error

I recently needed to rebuild a RAID1 array after a reboot for some odd reason and afterwards I was unable to assemble the array.  mdadm came back and reported “Device or resource busy” on one of the drives.  I couldn’t figure out what the issue was originally as it wasn’t mounted and no other processes were using the drive via lsof.  Eventually I tracked it down to a changed UUID – my fstab was trying to mount the old mdadm array and it locked the resource.  I checked it by doing ‘ls’ on /dev/disk/by-uuid/.  Updating fstab with a new UUID, rebooting to clean up things and reassembling the array solved it.  Just a useful item to keep in mind.

Debugging logcheck rules and security events

I’ve been a big fan of logcheck for monitoring my servers, when properly configured it works very well and is pretty flexible.   Unless you are using a centralized logging system such as Splunk most of us are guilty of not thoroughly checking our logs.  I like to use logcheck to perform a simple audit of what my systems are up to, it’s not perfect but certainly better then nothing.

My configuration has been tweaked a bit, adding some custom regex’s for ignoring a few common items.  I found a nice debugging tip on a old posting from the logcheck-devel mailing list which mentions using egrep to test new rules:

cat <logfile> | egrep -v -f /etc/logcheck/ignore.d.workstation/regex

This has saved me a lot of time and frustration when making the final tweaks to a regex.  However, recently I had some difficulties ignoring what seemed to be particularly stubborn security events.  From best I could tell, grep suggested that my expression was filtering properly, yet logcheck was still reporting on these events.  Finally reading through the README I discovered patterns cancelling security alarms must be places in violations.ignore.d, not ignore.d.workstation/server.  Something to be mindful of.

Verizon/Samsung 4G MiFi and Ubuntu

I recently got a MiFi card for my on-call rotation with work and had some issues getting it to play nice with the Ubuntu install on my laptop.  It’s a Samsung SCH-LC11 and various iMac’s in the office connected to it just fine.  My laptop would connect, then almost immediately disconnect.  Pretty much unusable.  I connected to the office wifi just fine so I know that wasn’t an issue.

A quick search found a solution on the ubuntu forums.  Basically, you need to connect to the device (obviously using a Windows or Mac) and log into the web admin page (default of 192.168.1.1).  Check the wifi configuration security; the encryption protocol is probably set to WPA with TKIP.  You need to set it to WPA2 with AES.

 

Samba shares multple connections to a shared resource

I don’t normally post much about Windows here as I have a limited presence with it at home – only a VM for Lightroom – but I came across this counter-intuitive problem earlier in the week.  When building a new VM I discovered for some odd reason Windows would not allow me to map multiple shares to my Samba server under different paths with the same credentials.  I kept getting the error:

Multiple connections to a server or shared resource by the same user,
using more then one user name, are not allowed.

Now I was using the same user name on all of these and it worked perfectly fine on my old 2k3 VM and I changed nothing on the Samba side.  No attempts at removing the shares and recreating them or reboots (even 3 of them!) seemed to help.  Sigh…  A bit of Googling suggested deleting all the shares explicitly on the command line, then try recreating them.  This seemed to fix it!

net use * /delete

Migrating mail server VM to a new host

I’ve been working on migrating a virtual host over to Rackspace which mainly runs a mail server among a few other small items.  I wasn’t 100% sure how smooth the process would be, expecting to hit at least a few road bumps along the way.  The first one I encountered was issues surrounding MX entries and the simplistic nature of the DNS record editor at Rackspace – most of my emails sent from my home PC were bouncing back 550 failed recipient verification.  This was just a dry run however as when the domain was with my previous hoster I just used my registrar’s DNS, when I switched back the problem seemed to be resolved.

However the second issue I hit had me stumped for a few days.  One of the reasons I migrated (besides price) was greater flexibility; Rackspace gave me more options for distros to choose from and I thought their overall interface was cleaner and designed better.  So when I provisioned the new VM I gave Ubuntu a shot since I run it on my home network I’m a bit more familiar with how I want to configure the box for the software I run at least.  After the DNS/mail issue was resolved everything seemed solid except for a random, albeit fairly minor problem.  For some odd reason hostname resolution replied with “hostname: temporary failure in name resolution” randomly.  I was getting emails from cronjobs running with this error which I found a bit strange.  While I was tinkering with the mail problem I also built a CentOS VM real quick and didn’t notice the error occurring with that host.  I double-checked and made sure the resolv.conf was identical, then /etc/hosts, then nsswitch.conf and so on, all the files seemed the same or at least close enough that I didn’t think it would be a problem.  I made sure DNS resolution worked on the machine and ensured any iptables rules were not in place.  What caught me as the strangest part was the fact it randomly worked and randomly didn’t, there did not seem to be any sort of reproducibility in the issue.  I even ran an strace and compared logs from instances it worked and when it didn’t.  ‘hostname -f’ also took a second or two to reply rather then an immediate response.

Eventually I figured I’d just add an alias to /etc/hosts with the local non-FQDN hostname.  I also noticed then that the /etc/hosts didn’t seem to have an extra carriage return at the end, I put one in and bingo!  Problem fixed.  Looking back through the strace logs I saw upon closer inspection that it didn’t actually read in the second line which had the FQDN hostname, the first for localhost was OK but then it stopped further parsing.  For some reason CentOS behaves differently as I saw – the hosts file was identical (except for the IP’s of course) – it too was missing a carriage return but strace revealed that it parsed the file just fine.  Just in case any one is wondering I was testing this on Ubuntu Lucid 10.04.2 LTS and CentOS 5.5.

::sigh::  Ah well at least I can cancel the plan with my original hoster now. 🙂