Booting LiveCD’s with Cobbler

As the size of my network grows it becomes increasingly more convenient to be able to boot a live CD rescue environment easily.  Even though most of the hosts are virtual and can be booted directly with an ISO I think it’s even better to PXE boot them directly from Cobbler since it’s already set up.  The documentation for booting live CD’s on Cobbler’s wiki did not seem to work for the Ubuntu Rescue Remix, it’s possible that some modifications need to be made for Ubuntu-based images (as the functionality provided in the livecd-iso-to-pxeboot script seems to be based on RedHat-style distros) but instead I used nfsroot which worked without issue.

First, you’ll want to mount your Live CD via loopback and run a Cobbler import:

# mount ubuntu-rescue-remix-12.04.iso /mnt -o loop
# cobbler import --name=ubuntu-rescue-remix-12.04 --path=/mnt --breed=ubuntu

Next I configured NFS, my configuration is a bit unique as I serve NFS from my file/VM server and Cobbler is a VM which has it’s own application files mounted via NFS as well:

# grep cobbler /etc/exports
/srv/nfs/cobbler <cobblervmip>(rw,async,subtree_check,no_root_squash)
/srv/nfs/cobbler/webroot/cobbler <dhcp-class-c>/24(ro,sync,subtree_check,no_wdelay,insecure_locks,no_root_squash,insecure)

Create the distro as usual and set the kernel options to boot from NFS:

# cobbler distro add --name=ubuntu-rescue-remix-12.04 --kernel=/var/www/cobbler/ks_mirror/ubuntu-rescue-remix-12.04/casper/vmlinuz --initrd=/var/www/cobbler/ks_mirror/ubuntu-rescue-remix-12.04/casper/initrd.gz
# cobbler distro edit --name=ubuntu-rescue-remix-12.04 --kopts='nfsroot=<cobbler IP>:/var/www/cobbler/ks_mirror/ubuntu-rescue-remix-12.04 ip=dhcp netboot=nfs boot=casper

There was one minor issue which I encountered while attempting to mount the nfsroot, “short read: 24 < 28“.  The only thing a Google search turned up was a posting which goes into a bit more detail on the source of the problem.  Apparently if you are have  configured a hosts.allow/deny file it will be an issue because the kernel will use NFSv3 for an nfsroot.  I mistakenly assumed NFS was working fine when I was booted up since my system was configured to use version 4.  Also, the tcp_wrapper hosts.allow/deny files do not recognize CIDR notation (unlike /etc/exports), you’ll have to use “192.168.0.” to specify a /24.

You will also need to add a profile to actually boot the live CD but that is straightforward. Other then that it should work without issue!

mdadm: Device or resource busy error

I recently needed to rebuild a RAID1 array after a reboot for some odd reason and afterwards I was unable to assemble the array.  mdadm came back and reported “Device or resource busy” on one of the drives.  I couldn’t figure out what the issue was originally as it wasn’t mounted and no other processes were using the drive via lsof.  Eventually I tracked it down to a changed UUID – my fstab was trying to mount the old mdadm array and it locked the resource.  I checked it by doing ‘ls’ on /dev/disk/by-uuid/.  Updating fstab with a new UUID, rebooting to clean up things and reassembling the array solved it.  Just a useful item to keep in mind.

Debugging logcheck rules and security events

I’ve been a big fan of logcheck for monitoring my servers, when properly configured it works very well and is pretty flexible.   Unless you are using a centralized logging system such as Splunk most of us are guilty of not thoroughly checking our logs.  I like to use logcheck to perform a simple audit of what my systems are up to, it’s not perfect but certainly better then nothing.

My configuration has been tweaked a bit, adding some custom regex’s for ignoring a few common items.  I found a nice debugging tip on a old posting from the logcheck-devel mailing list which mentions using egrep to test new rules:

cat <logfile> | egrep -v -f /etc/logcheck/ignore.d.workstation/regex

This has saved me a lot of time and frustration when making the final tweaks to a regex.  However, recently I had some difficulties ignoring what seemed to be particularly stubborn security events.  From best I could tell, grep suggested that my expression was filtering properly, yet logcheck was still reporting on these events.  Finally reading through the README I discovered patterns cancelling security alarms must be places in violations.ignore.d, not ignore.d.workstation/server.  Something to be mindful of.

Migrating mail server VM to a new host

I’ve been working on migrating a virtual host over to Rackspace which mainly runs a mail server among a few other small items.  I wasn’t 100% sure how smooth the process would be, expecting to hit at least a few road bumps along the way.  The first one I encountered was issues surrounding MX entries and the simplistic nature of the DNS record editor at Rackspace – most of my emails sent from my home PC were bouncing back 550 failed recipient verification.  This was just a dry run however as when the domain was with my previous hoster I just used my registrar’s DNS, when I switched back the problem seemed to be resolved.

However the second issue I hit had me stumped for a few days.  One of the reasons I migrated (besides price) was greater flexibility; Rackspace gave me more options for distros to choose from and I thought their overall interface was cleaner and designed better.  So when I provisioned the new VM I gave Ubuntu a shot since I run it on my home network I’m a bit more familiar with how I want to configure the box for the software I run at least.  After the DNS/mail issue was resolved everything seemed solid except for a random, albeit fairly minor problem.  For some odd reason hostname resolution replied with “hostname: temporary failure in name resolution” randomly.  I was getting emails from cronjobs running with this error which I found a bit strange.  While I was tinkering with the mail problem I also built a CentOS VM real quick and didn’t notice the error occurring with that host.  I double-checked and made sure the resolv.conf was identical, then /etc/hosts, then nsswitch.conf and so on, all the files seemed the same or at least close enough that I didn’t think it would be a problem.  I made sure DNS resolution worked on the machine and ensured any iptables rules were not in place.  What caught me as the strangest part was the fact it randomly worked and randomly didn’t, there did not seem to be any sort of reproducibility in the issue.  I even ran an strace and compared logs from instances it worked and when it didn’t.  ‘hostname -f’ also took a second or two to reply rather then an immediate response.

Eventually I figured I’d just add an alias to /etc/hosts with the local non-FQDN hostname.  I also noticed then that the /etc/hosts didn’t seem to have an extra carriage return at the end, I put one in and bingo!  Problem fixed.  Looking back through the strace logs I saw upon closer inspection that it didn’t actually read in the second line which had the FQDN hostname, the first for localhost was OK but then it stopped further parsing.  For some reason CentOS behaves differently as I saw – the hosts file was identical (except for the IP’s of course) – it too was missing a carriage return but strace revealed that it parsed the file just fine.  Just in case any one is wondering I was testing this on Ubuntu Lucid 10.04.2 LTS and CentOS 5.5.

::sigh::  Ah well at least I can cancel the plan with my original hoster now. 🙂

Permission issues on slave BIND nameservers

I’m working on several projects at work to enhance our infrastructure and bring automation to our environment through tools such as Cobbler/Puppet/Kerberos and most of these rely on a working DNS system to operate correctly.  After a quick refresh with BIND a primary nameserver was up and running pretty quickly.  The zonefiles were populated easily enough after hacking together some Python to auto-generate configuration files from our server MySQL database.  However I encountered issues when getting the slave nameserver up and running; I was getting errors such as “permission denied” when the slave was attempting to transfer the master zonefile.  I was pretty sure it was configured correctly, it only seemed to fail when creating the temporary file.  I checked the user permissions in the chroot and it all looked good, then I remembered this box was running SELinux.  Checking the BIND FAQ I quickly found the answer: by default named is only allowed to write to the following directories:

  • $ROOTDIR/var/named/slaves
  • $ROOTDIR/var/named/data
  • $ROOTDIR/var/tmp

with $ROOTDIR being the chroot specified in /etc/sysconfig/named.  The configuration files are in $ROOTDIR/var/named and of course I was naming my file “sec.ourdomain.net” as opposed to “slaves/sec.ourdomain.net”.  Oops.  Have to keep this one in mind!

LVM Loopback HOW-TO

This is a simple tutorial on setting up LVM on loopback devices, I’ve used it a few times for creating dynamic virtual disks; it came in particularly handy when archiving NEXRAD radar data for my radarwatchd project – using up all your inodes on several hundreds of thousands of 15Kb files doesn’t sound like my idea of fun.  Creating a virtual volume with reiserfs was a particularly handy solution in this case.

First Steps

  • Create several empty files with dd.  These will hold your physical volumes:
# dd if=/dev/zero of=lvmtest0.img bs=100 count=1M
1048576+0 records in
1048576+0 records out
104857600 bytes (105 MB) copied, 8.69891 s, 12.1 MB/s
# dd if=/dev/zero of=lvmtest1.img bs=100 count=1M
1048576+0 records in
1048576+0 records out
104857600 bytes (105 MB) copied, 8.69891 s, 12.1 MB/s
  • Link them to loopback devices with losetup: (see below if you run out of loopback devices)
# losetup /dev/loop0 lvmtest0.img
# losetup /dev/loop1 lvmtest1.img
  • Partition them with fdisk.  Create a single primary partition, full size of the device and set the type to Linux LVM (0x8e).  Shorthand commands with fdisk: n,p,1,Enter,Enter,t,8e,w.  I’ve also automated this somewhat with sfdisk:
  • # sfdisk /dev/loop0 << EOF
    ,,8e,,
    EOF
  • Install and configure LVM if needed.  In this test I used the filter settings in lvm.conf to ensure only loopback devices will be used with LVM:
filter = [ "a|/dev/loop.*|", "r/.*/" ]
  • Initialize LVM:
# vgscan
Reading all physical volumes.  This may take a while...
# vgchange -a
  • Create physical volumes with pvcreate:
# pvcreate /dev/loop0 /dev/loop1
 Physical volume "/dev/loop0" successfully created
 Physical volume "/dev/loop1" successfully created
  • Create a volume group then extend to include the second and any subsequent physical volumes:
# vgcreate testvg /dev/loop0
 Volume group "testvg" successfully created
# vgextend testvg /dev/loop1
 Volume group "testvg" successfully extended
  • Verify the operation was successful:
# vgdisplay -v
 Finding all volume groups
 Finding volume group "testvg"
 --- Volume group ---
 VG Name               testvg
 System ID
 Format                lvm2
 Metadata Areas        2
 Metadata Sequence No  2
 VG Access             read/write
 VG Status             resizable
 MAX LV                0
 Cur LV                0
 Open LV               0
 Max PV                0
 Cur PV                2
 Act PV                2
 VG Size               192.00 MB
 PE Size               4.00 MB
 Total PE              48
 Alloc PE / Size       0 / 0
 Free  PE / Size       48 / 192.00 MB
 VG UUID               1Gmt3W-ivMH-mXQH-HswP-tjHR-9mAZ-917z0g

 --- Physical volumes ---
 PV Name               /dev/loop0
 PV UUID               X11MYK-u8hk-4R26-CHuy-QUSw-2hLq-Notlnc
 PV Status             allocatable
 Total PE / Free PE    24 / 24

 PV Name               /dev/loop1
 PV UUID               dLKXlz-c536-9Elj-C2zZ-B4aw-69kj-zZ7PuN
 PV Status             allocatable
 Total PE / Free PE    24 / 24
  • Create a logical volume.  Use the largest available size reported to use by vgdisplay (Free PE/Size value):
# lvcreate -L192M -ntest testvg
 Logical volume "test" created
  • Finally create a filesystem on the new logical volume.  We’re using reiserfs for this example here:
# mkfs.reiserfs /dev/mapper/testvg-test
mkfs.reiserfs 3.6.21 (2009 www.namesys.com)

A pair of credits:
Alexander  Lyamin  keeps our hardware  running,  and was very  generous  to our
project in many little ways.

The  Defense  Advanced  Research  Projects Agency (DARPA, www.darpa.mil) is the
primary sponsor of Reiser4.  DARPA  does  not  endorse  this project; it merely
sponsors it.

Guessing about desired format.. Kernel 2.6.31-17-generic-pae is running.
Format 3.6 with standard journal
Count of blocks on the device: 49152
Number of blocks consumed by mkreiserfs formatting process: 8213
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: d50559af-5078-4de5-812a-264590e60177
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
 ALL DATA WILL BE LOST ON '/dev/mapper/testvg-test'!
Continue (y/n):y
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok
ReiserFS is successfully created on /dev/mapper/testvg-test.
  • Mount the volume and you should be done:
# mount /dev/mapper/testvg-test /mnt/virtlvm
# df
Filesystem           1K-blocks      Used Available Use% Mounted on

/dev/mapper/testvg-test
 196596     32840    163756  17% /mnt/virtlvm

Growing the LVM volume

To expand the LVM volume you will follow similar steps to the ones stated above:

  • Create another virtual disk and partition with dd and fdisk:
# dd if=/dev/zero of=lvmtest2.img bs=100 count=1M
# fdisk /dev/loop2
  • Tie new disk image to a free loopback device with losetup:
# losetup /dev/loop2 lvmtest2.img
  • Create physical volumes on the new device with pvcreate:
# pvcreate /dev/loop2
Physical volume "/dev/loop2" successfully created
  • Extend the volume group:
# vgextend testvg /dev/loop2
 Volume group "testvg" successfully extended
  • Extend the logical volume:
# lvextend /dev/mapper/testvg-test /dev/loop2
 Extending logical volume test to 288.00 MB
 Logical volume test successfully resized
  • Resize the filesystem with resize_reiserfs:
# resize_reiserfs /dev/mapper/testvg-test
resize_reiserfs 3.6.21 (2009 www.namesys.com)

resize_reiserfs: On-line resizing finished successfully.

# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/testvg-test
                        294896     32840    262056  12% /mnt/virtlvm

Modifying the amount of loopback devices

  • To see all the loopback devices:
# ls -l /dev/loop*
  • Adding new loopback devices
  • If loopback support is compiled as a module:
# modprobe loop max_loop=64
  • To make permanent edit /etc/modules or /etc/modprobe.conf and use add one of the following:
loop max_loop=64
options loop max_loop=64
  • If loopback module is compiled into the kernel you will need to reboot, edit the kernel command-line parameters appending ‘max_loop=64’ to the end.
  • Loopback devices can also be dynamically created with mknod.  Loopback devices have a major number of 7 and minor number of the loopback device.  Keep in mind devices created this way will not be persistent and will disappear after a reboot:
# mknod -m660 /dev/loop8 b 7 8
# mknod -m660 /dev/loop9 b 7 9
# mknod -m660 /dev/loop10 b 7 10

That should cover mostly everything.  I’ve been informed that some of the steps may not be strictly necessary, such as the vgscan/vgchange.  I also believe the partitioning may not be needed as we’re using the full size of the virtual devices but it’s good practice nonetheless and definitely makes things clearer being able to see the LVM partitions.   Hope this helps!

Upgrading EOL Ubuntu installations

I have a number of Ubuntu boxes laying around and gotten a bit lazy keeping some of the lesser-used ones up to date.  I realized this after trying an apt-get update resulted in 404 errors, oops.  Since I couldn’t directly do a dist-upgrade I checked the Ubuntu wiki for upgrading EOL installations, the process is pretty simple.

All you basically need to do is update your /etc/apt/sources.list and replace us.archives.ubuntu.com (or whatever servers you are using) with old-releases.ubuntu.com, setting the release for your current distro correctly of course.  If it’s a desktop system you may need to install or ugprade update-manager package and/or ubuntu-desktop as well.  Then a simple aptitude update && aptitude safe-upgrade and do-release-upgrade should take care of your needs.  If you are multiple releases behind you will need to upgrade from one release to the next individually one at a time, you can’t skip directly to the latest so it may take some time.  Otherwise it’s pretty straightforward and from my experience thus far very pain-free which is always a plus.

SysAdvent time!

Well December is here now and the SysAdvent calendar is back again!…  thanks to Matt for reminding me.  For those who aren’t familiar, it’s a sysadmin advent calendar similar to the Perl Advent calendar.  Every day is a new posting about something system administration related.

Since it’s the 2nd today, we’ve had two postings.  Yesterday’s was about Linux Containers (LXC), a type of OS-level virtualization similar to OpenVZ and vserver.  These provide a very low-level of virtualization; it’s based upon chroots and namespace partitioning.  The advantages are that it’s high-performance as only one kernel is running but the trade-off is that it does not provide a large amount of flexibility: same environment, same distribution, same kernel, and so on.  It’s also mentioned that LXC is supported with libvirt now, very nice to see.  I wanted to play with OpenVZ for some time now due to it’s low overhead but haven’t gotten a chance, now I think I’ll look at LXC.

Today’s post is called Going Parallel and it’s focused on methods used to parallelize shell scripts for increased performance.  Tools mentioned include xargs, cluster shell, func, capistrano among many others.  It’s a good article and outlines the general idea very well.  We use cluster shell regularly at work and looking  to use func and interact directly with our applications in the near future.  Reminds me of the blog posting I saw at last.fm where they implemented MapReduce in the shell!

Now that I think about it the Perl advent calendar has a nice entry today on Set::Array…  it wraps up the traditional array functionality into a class which also provides tools from Set Theory, union/join/intersect/unique/etc.  Very powerful!

Quick removal of comments from C/C++ code

Working on another post – writing an optimization HOW-TO with valgrind (using my radarwatchd project as an example) and unfortunately realized the code I have in my Google code repository is fairly out-of-sync with the latest as I was working on some significant patches.  The changes incorporated numerous performance enhancements but there was a lot of old code I simply commented out temporarily while developing the new source.  Needing a quick way to diff the two without seeing any comments I stumbled across a solution with Perl.

Pretty simple: I have the current code in a subdirectory ./radarwatchd and the old version from the Google repo in ./googlecode/radarwatchd.  I made a copy of the sources as to not modify the original files, then ran a Perl construct to perform the regex.  A simple diff wrapped things up.

find {,googlecode/}radarwatchd/cached/ \( -name '*.cpp' -o -name '*.h' \) -exec cp -v '{}' '{}.nocomments' \;
find {,googlecode/}radarwatchd/cached/ -name '*.nocomments' -exec perl -0777 -pi -e 's{//.*?$|/\*.*?\*/}{ }smg' '{}' \;
diff -NBwar {,googlecode/}radarwatchd/cached/*.cpp.nocomments

The Perl regex is not crazy complex and there are probably some instances where it doesn’t work quite right but for a simple search this works well.  The only downside I had with this is that the comments are replaced by a single space, so you’ll need to run diff with -w to ignore whitespace.  With this particular invocation you can’t replace it with nothing as the -exec will see that first {} as the place to put the filename (rather then the last) and cause all sorts of problems.


					

Search/replace with confirmation within selection in Vim

As part of the engineering responsibilities at my job I routinely have to edit configuration files on a particular tier of our infrastructure when building new systems, putting in IP addresses for particular feed entries. Unfortunately no one has enough time to automate this with a proper  config generation tool so it’s something of a manual process at the current moment (hoping to change this soon with some Python when I get a moment).

A common task is to replace ‘a.b.c.d’ placeholders with the correct multicast addresses in a selection of lines but not on every single line. Obviously the ‘:s’ substitution command comes to mind with the ‘c’ flag to confirm, but when the file is hundreds of lines the fact substitution starts at the beginning of the file isn’t much help.

However, I sat down and looked into it a bit more and discovered a way to do do substitution with confirmation only on lines within a visual selection.

Make a visual selection with ‘v/V/^V’, then enter in a substitution command with the \%V prefix in your search regex:

:%s/\%Va\.b\.c\.d/1.2.3.4/gc

This will replace a.b.c.d with 1.2.3.4 within the selection and prompt you to confirm each match.  I also noticed you’ll need to remove the automatically entered ‘<,’> characters when going into command mode with a selection highlighted.

Yay for Vim!  Now only to get the Puppet infrastructure I started rolled out fully so I can get the vim-enhanced RPM installed on all these hundreds of boxes… 🙂