Provisioning Ubuntu VMs with Cobbler

I’ve been playing with Cobbler at home lately, now that my server has been upgraded to a quad core with the magic vmx flag, and ran into an issue deploying Ubuntu VMs with it.  The install itself and the import of the distro are pretty straightforward, and Canonical has some documentation on the process.  Koan can be used for provisioning VMs (as is mentioned in the docs); however, I have some prior Cobbler experience with CentOS and would like to develop this further with Ubuntu.

The problem I encountered was the following: during the install process Ubuntu would not detect the virtual disks and threw an error, “no root filesystem is defined”.  I’m using the default KVM virtio disk bus type here, and apparently the debian-installer will not detect these disks with the default configuration.  If you launch a shell and check, /dev/vda exists and running fdisk on it suggests all is good.  Running the install from a CD/ISO also works just fine, so the problem lies on the Cobbler side.

Eventually I narrowed it down to the preseed file.  The Ubuntu Cobbler preseed docs mention that a default preseed file is generated when you import the distro.  What they don’t mention is that something in this configuration is not compatible with virtio disk types.  I haven’t pinned down exactly what; instead I just copied the Ubuntu profile to a new one and changed the kickstart/preseed to /var/www/cobbler/ks_mirror/[ubuntu-distro-name]/preseed/ubuntu-server-minimalvm.seed.  I’ve got some more work to do on the preseed; I’m not all too familiar with preseed files yet but am planning to change that.
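
For reference, the profile copy described above might look roughly like the sketch below.  The function names and the example profile/distro names are hypothetical placeholders, and the --kickstart option is the Cobbler 2.x spelling as far as I know, so check cobbler profile edit --help on your version before relying on it.

```shell
# Build the path to the minimal-VM preseed that the distro import leaves under ks_mirror.
# $1 = distro name as listed by `cobbler distro list` (placeholder in the example below).
preseed_for() {
    printf '/var/www/cobbler/ks_mirror/%s/preseed/ubuntu-server-minimalvm.seed\n' "$1"
}

# Copy an existing profile and point the copy at the minimal-VM preseed.
# $1 = source profile, $2 = new profile name, $3 = distro name (all hypothetical).
clone_profile_with_minimal_preseed() {
    cobbler profile copy --name="$1" --newname="$2" &&
    cobbler profile edit --name="$2" --kickstart="$(preseed_for "$3")" &&
    cobbler sync
}

# Example (run as root on the Cobbler server):
#   clone_profile_with_minimal_preseed ubuntu-server-x86_64 ubuntu-server-x86_64-vm ubuntu-server-x86_64
```

Keeping the original profile untouched means the stock preseed is still available for comparison while tracking down which setting upsets virtio disks.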

Migrating mail server VM to a new host

I’ve been working on migrating a virtual host over to Rackspace, which mainly runs a mail server along with a few other small items.  I wasn’t 100% sure how smooth the process would be and expected to hit at least a few road bumps along the way.  The first one I encountered involved MX entries and the simplistic nature of the DNS record editor at Rackspace – most of the emails sent from my home PC were bouncing back with a 550 failed recipient verification error.  This was just a dry run, however.  While the domain was with my previous hoster I had simply used my registrar’s DNS, and when I switched back to that the problem seemed to be resolved.

However, the second issue I hit had me stumped for a few days.  One of the reasons I migrated (besides price) was greater flexibility; Rackspace gave me more distros to choose from, and I thought their overall interface was cleaner and better designed.  So when I provisioned the new VM I gave Ubuntu a shot: since I run it on my home network, I’m a bit more familiar with how I want to configure the box, at least for the software I run.  After the DNS/mail issue was resolved everything seemed solid except for an intermittent, albeit fairly minor, problem.  For some odd reason hostname resolution randomly failed with “hostname: temporary failure in name resolution”.  I was getting emails from cron jobs reporting this error, which I found a bit strange.  While I was tinkering with the mail problem I had also quickly built a CentOS VM, and I didn’t notice the error occurring on that host.

I double-checked that resolv.conf was identical, then /etc/hosts, then nsswitch.conf and so on; all the files seemed the same, or at least close enough that I didn’t think they would be a problem.  I made sure DNS resolution worked on the machine and that no iptables rules were in place.  The strangest part was that it randomly worked and randomly didn’t; I could not find any way to reproduce the issue reliably.  I even ran strace and compared the logs from runs where it worked against runs where it didn’t.  ‘hostname -f’ also took a second or two to reply rather than responding immediately.

Eventually I figured I’d just add an alias to /etc/hosts with the local non-FQDN hostname.  I then also noticed that /etc/hosts didn’t end with a newline; I put one in and bingo!  Problem fixed.  Looking back through the strace logs more closely, I saw that the second line, which had the FQDN hostname, was never actually read in; the first line for localhost was OK but parsing stopped there.  For some reason CentOS behaves differently: the hosts file was identical (except for the IPs of course) and it too was missing the trailing newline, but strace revealed that it parsed the file just fine.  In case anyone is wondering, I was testing this on Ubuntu Lucid 10.04.2 LTS and CentOS 5.5.
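
A quick way to check for this kind of thing is sketched below.  The function names are my own, and it only tests whether the last byte of a file is a newline; it says nothing about why the resolver on one distro cares and the other doesn’t.

```shell
# Succeeds if the file's last byte is a newline (or the file is empty).
# Command substitution strips trailing newlines, so a lone "\n" becomes "".
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

# Append a newline only when one is missing.
ensure_trailing_newline() {
    ends_with_newline "$1" || printf '\n' >> "$1"
}

# Example (as root):
#   ends_with_newline /etc/hosts || echo "/etc/hosts has no trailing newline"
```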

::sigh::  Ah well at least I can cancel the plan with my original hoster now. 🙂

Upgrading EOL Ubuntu installations

I have a number of Ubuntu boxes lying around and have gotten a bit lazy about keeping some of the lesser-used ones up to date.  I realized this when an apt-get update resulted in 404 errors, oops.  Since I couldn’t do a dist-upgrade directly I checked the Ubuntu wiki page on upgrading EOL installations; the process is pretty simple.

All you basically need to do is edit your /etc/apt/sources.list and replace us.archive.ubuntu.com (or whatever servers you are using) with old-releases.ubuntu.com, keeping the release name set to your current distro of course.  If it’s a desktop system you may also need to install or upgrade the update-manager package and/or ubuntu-desktop.  Then a simple aptitude update && aptitude safe-upgrade followed by do-release-upgrade should take care of your needs.  If you are multiple releases behind you will need to upgrade from one release to the next, one at a time; you can’t skip directly to the latest, so it may take some time.  Otherwise it’s pretty straightforward and, from my experience thus far, very pain-free, which is always a plus.
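
The sources.list edit can be scripted with sed.  This is a minimal sketch, assuming your entries point at the standard archive.ubuntu.com country mirrors or security.ubuntu.com; the function name is made up, and other mirrors would need their own pattern.

```shell
# Rewrite Ubuntu archive/security mirror hosts to old-releases.ubuntu.com.
# Reads sources.list content on stdin and writes the result to stdout.
to_old_releases() {
    sed -E \
        -e 's#https?://([a-z]+\.)?archive\.ubuntu\.com#http://old-releases.ubuntu.com#g' \
        -e 's#https?://security\.ubuntu\.com#http://old-releases.ubuntu.com#g'
}

# Example (as root, keeping a backup of the original file):
#   cp /etc/apt/sources.list /etc/apt/sources.list.bak
#   to_old_releases < /etc/apt/sources.list.bak > /etc/apt/sources.list
```

The release names in each line are left alone, so the file still describes the release that is actually installed.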

SysAdvent time!

Well December is here now and the SysAdvent calendar is back again!…  thanks to Matt for reminding me.  For those who aren’t familiar, it’s a sysadmin advent calendar similar to the Perl Advent calendar.  Every day is a new posting about something system administration related.

Since it’s the 2nd today, we’ve had two postings.  Yesterday’s was about Linux Containers (LXC), a type of OS-level virtualization similar to OpenVZ and vserver.  These provide a very low level of virtualization, based upon chroots and namespace partitioning.  The advantage is high performance, as only one kernel is running, but the trade-off is that it does not provide much flexibility: same environment, same distribution, same kernel, and so on.  It’s also mentioned that LXC is now supported by libvirt, which is very nice to see.  I’ve wanted to play with OpenVZ for some time due to its low overhead but haven’t gotten a chance; now I think I’ll look at LXC.

Today’s post is called Going Parallel and it’s focused on methods for parallelizing shell scripts for increased performance.  Tools mentioned include xargs, cluster shell, func and capistrano, among many others.  It’s a good article and outlines the general idea very well.  We use cluster shell regularly at work and are looking to use func and interact directly with our applications in the near future.  Reminds me of the blog posting I saw at last.fm where they implemented MapReduce in the shell!
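
As a small taste of the xargs approach, here is a sketch of my own (not taken from the article).  It relies on the -P option, which GNU and BSD xargs both provide but POSIX does not require; the function name is made up.

```shell
# Run a command once per input line, with up to $1 jobs running in parallel.
# The remaining arguments are the command; each input line is appended as its last argument.
run_parallel() {
    njobs=$1
    shift
    xargs -n 1 -P "$njobs" "$@"
}

# Example: compress every log in the current directory, four at a time.
#   printf '%s\n' *.log | run_parallel 4 gzip
```

Output from the jobs arrives in whatever order they finish, so anything that cares about ordering needs to sort or tag the results afterwards.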

Now that I think about it the Perl advent calendar has a nice entry today on Set::Array…  it wraps up the traditional array functionality into a class which also provides tools from Set Theory, union/join/intersect/unique/etc.  Very powerful!

Repository Management

For those of you who are unaware, the latest Ubuntu release – Jaunty – came out several days ago.  Normally, the fastest way to get the latest version is to torrent an ISO…  the repositories are so overloaded that attempting an upgrade is not even remotely possible.  However, there is an alternative I stumbled upon.  Instead of using the default Ubuntu repositories, select the fastest mirror, apt-get update, then upgrade away!  I was getting sustained rates of 300 KB/s without any issue during my upgrade.

On a related note, I’ve considered tinkering with creating my own local repository mirror.  Not that I have nearly enough machines to make it necessary, but it would be an entertaining exercise.  I even found a basic HOW-TO or two.  However, I have heard of potential issues: it can take weeks to fully mirror several distributions (several GB each), and an incomplete repository would be somewhat pointless to use.  Luckily there seems to be an easy solution with mod_rewrite.

Remote monitoring with apticron and logcheck

I wanted to write a brief posting on some basic ways to help remotely administer Ubuntu/Debian boxes.  Over the past few months I’ve been tinkering with various methods of handling this and what I’ve come up with seems to work fairly well.  It basically consists of two applications: apticron, which monitors repositories for package updates, and logcheck, which monitors logs for any security or other noteworthy entries.

Apticron is very easy to set up; it’s in the repositories and requires basically no configuration.  It will drop a script in /etc/cron.daily and that is about it, emailing any reports to root.  Of course this can be redirected through a .forward file or an entry in /etc/aliases.

Logcheck is fairly simple to set up as well – it is also in the repositories.  Once installed, edit the /etc/logcheck/logcheck.conf file to configure it.  The first thing you will want to set is the REPORTLEVEL setting; the options are “workstation”, “server” (the default value), or “paranoid”.  I use server on mine, which gives a good amount of detail.  I would advise against using paranoid unless the server is extremely locked down and users do not typically log in.  Workstation is good for a desktop environment.  The only other variable I edited was SENDMAILTO.  Logcheck basically works by comparing each log entry against a set of regular expressions and generating a report for the entries that do not match.  I had to modify one or two regexes slightly to fix false positives; if you want my changes just ask and I’ll send them over.
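
The matching idea can be sketched with plain grep.  This is only an illustration of the principle, not how logcheck itself is implemented; the function name and the rules-file layout (one extended regex per line) are assumptions.

```shell
# Print only the log lines (stdin) that match none of the ignore patterns.
# $1 = file containing one extended regular expression per line.
report_unmatched() {
    grep -E -v -f "$1"
}

# Example:
#   report_unmatched my-ignore-rules < /var/log/syslog
```

Fixing a false positive then amounts to adding or loosening a pattern in the rules file so the harmless line is matched and dropped from the report.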

One other small gem I want to mention: gkrellm.  I use this on both my desktop and server, and it is invaluable for providing real-time system performance metrics.  Sure, it does not have any logging capabilities and is thus unsuitable in a large-scale environment, but for keeping an eye on one or two boxes it fits the bill quite nicely.

Ubuntu Server – First Intrepid boot FAIL

Well, the upgrade to Intrepid went smoothly during the install process itself.  However, after a reboot the system hung partway through boot and dropped to an initramfs shell claiming “cannot find root device /dev/disk/by-uuid/50128bb8…” and “Gave up waiting for root device.”  Wonderful.  Tinkered around a bit, tried mounting the drives manually, only they were not listed in /dev.  Attempted booting the old 2.6.24-18 kernel, which worked fine.  Aha, so it’s something related to the new kernel.  Did a quick search which revealed the following bug:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/290153

Apparently on certain hardware the kernel has a bug which causes a long timeout on the SCSI/SATA bus.  It took a good 2-3 minutes on my system, but while I left it idle and read the bug report on my desktop system, a bunch more lines flew by at the initramfs prompt about ata bus resets and detecting new drives.  After that a simple ‘exit’ from the prompt continued a normal boot.

It’s a fairly important bug but at least a workaround exists.  I’ve tinkered with adding the ‘rootdelay’ option to my menu.lst but have not found the best match yet.  Maybe I’ll just leave it as is, my server almost never gets rebooted.  You’re instilling me with a lot of confidence doc, I mean Intrepid.  Definitely going to make a full backup of my desktop machine before attempting upgrade on that one.
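
For anyone wanting to try the same thing, here is a sketch of how the option could be added.  It assumes the stock Ubuntu GRUB legacy layout, where update-grub rebuilds the kernel lines from the commented “# kopt=” line in menu.lst; the function name is made up, and the 120 seconds in the example is only a guess based on the 2-3 minute wait above, not a tested value.

```shell
# Append rootdelay=N to the "# kopt=" line of a GRUB legacy menu.lst,
# unless a rootdelay option is already present.  Reads stdin, writes stdout.
# $1 = delay in seconds (hardware-specific; pick your own value).
add_rootdelay() {
    sed -E "/^# kopt=/{/rootdelay=/!s/\$/ rootdelay=$1/;}"
}

# Example (as root; update-grub then regenerates the kernel lines):
#   cp /boot/grub/menu.lst /boot/grub/menu.lst.bak
#   add_rootdelay 120 < /boot/grub/menu.lst.bak > /boot/grub/menu.lst
#   update-grub
```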

Upgrading Ubuntu boxes to Intrepid

So I am going through the process of upgrading my server to 8.10.  A quite useful HOWTO on howtoforge.com guides you through the process (they also document upgrading from the Desktop version).

I was not sure exactly which command to run, given that my headless server obviously doesn’t have update-manager running.  The HOWTO covers usage of the ‘do-release-upgrade’ command.  The only thing I ran beforehand was my rootfs rsync script, to make a backup copy of my OS drive in case the worst happens.

If this runs smoothly I will make a backup copy of my desktop rootfs drive and do a similar upgrade to Intrepid.  I am already aware of one or two things I’m not keen on with Intrepid, notably that btnx is not compatible!  For those not aware, btnx was the premier application for configuring and making use of every single one of those buttons on the higher-end mice.  I have a Logitech MX Laser something and have it set up perfectly: tilt wheel left/right for forward/back in Firefox, extra buttons to minimize or close windows (Ctrl-W), etc.  I spent weeks trying to get it working the way I wanted with xmodmap and that ended in nothing but frustration.  I’m sure there will be some other things that don’t work quite the way I would like, so a mirrored backup drive pre-upgrade is nice to have.

Random md-device recovery?

I was taking a mid-afternoon nap (yes at 3 am, I work nights) and I came back to my PC to see CPU usage on my server hovering around 15% – not at idle like usual.  Doing a quick check revealed md0_raid5 and md0_resync running which is normally not a good sign.

mdadm --detail /dev/md0 showed the following:

    Update Time : Sun Oct  5 03:22:19 2008
          State : clean, recovering
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
 Rebuild Status : 85% complete

Uh oh.  Why was the array rebuilding itself?  All drives were listed as active and working …  but did we experience a drive momentarily dropping from the array or a SATA device reset?  Was this a sign of impending hardware failure?  Tailing /var/log/messages displayed this useful piece of information:

Oct  5 01:06:01 rigel md: data-check of RAID array md0

Ok, so “data-check” doesn’t sound so worrisome.  A quick Google search revealed this nice gem:

root@rigel:~# tail /etc/cron.d/mdadm
# By default, run at 01:06 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet

Ah, so this is the first Sunday of the month and the check kicked off at 1:06 AM.  You tricksy Ubuntu.  Apparently a bug has been filed about this check causing performance issues on some boxes.  Verifying data integrity is a good idea, although a slightly more obvious notice would be nice.
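
The trick in that cron line is worth spelling out.  The day-of-week field (0) limits the job to Sundays, and the date test then limits it to days 1 through 7, which together can only be the first Sunday of the month.  The backslash in \%d is there because cron treats a bare % as a newline.  Below is a small sketch of the same test as a standalone function; the function name is my own.

```shell
# Succeeds only on the first Sunday of a month.
# $1 = weekday as printed by `date +%w` (0 = Sunday)
# $2 = day of the month as printed by `date +%d` (01..31)
is_first_sunday() {
    [ "$1" -eq 0 ] && [ "$2" -le 7 ]
}

# Example:
#   is_first_sunday "$(date +%w)" "$(date +%d)" && echo "checkarray would run today"
```

For the log entry above, Sunday Oct 5 gives weekday 0 and day 05, so the check passes and checkarray runs.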