I recently had to rebuild a RAID1 array after a reboot, and afterwards I was unable to assemble it: mdadm reported “Device or resource busy” on one of the drives. At first I couldn’t figure out what the issue was, as the drive wasn’t mounted and lsof showed no other processes using it. Eventually I tracked it down to a changed UUID: my fstab was still trying to mount the old mdadm array, which locked the resource. I confirmed this by running ‘ls’ on /dev/disk/by-uuid/. Updating fstab with the new UUID, rebooting to clean things up, and reassembling the array solved it. Just a useful item to keep in mind.
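The same check can be scripted. Below is a minimal Python sketch, not a definitive tool: it compares the UUID= entries in fstab against what actually exists under /dev/disk/by-uuid/. The function name stale_fstab_uuids and its parameters are made up for illustration, and it only looks at UUID= entries (not LABEL= or device paths).

```python
import os


def stale_fstab_uuids(fstab_path="/etc/fstab", by_uuid_dir="/dev/disk/by-uuid"):
    """Return UUIDs referenced in fstab that have no entry under by_uuid_dir.

    Hypothetical helper for illustration; only UUID= specs are inspected.
    """
    present = set(os.listdir(by_uuid_dir)) if os.path.isdir(by_uuid_dir) else set()
    stale = []
    with open(fstab_path) as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            spec = line.split()[0]  # first field is the device spec
            if spec.startswith("UUID="):
                uuid = spec[len("UUID="):].strip('"')
                if uuid not in present:
                    stale.append(uuid)
    return stale
```

Calling stale_fstab_uuids() with its defaults on the affected box should list any UUID that fstab still references but the system no longer knows about.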
I’ve been a big fan of logcheck for monitoring my servers; when properly configured it works very well and is pretty flexible. Unless we are using a centralized logging system such as Splunk, most of us are guilty of not thoroughly checking our logs. I like to use logcheck to perform a simple audit of what my systems are up to. It’s not perfect, but it’s certainly better than nothing.
My configuration has been tweaked a bit, adding some custom regexes to ignore a few common items. I found a nice debugging tip in an old posting on the logcheck-devel mailing list, which mentions using egrep to test new rules:
cat <logfile> | egrep -v -f /etc/logcheck/ignore.d.workstation/regex
This has saved me a lot of time and frustration when making the final tweaks to a regex. Recently, however, I had difficulty ignoring what seemed to be particularly stubborn security events. As best I could tell, grep showed that my expression was filtering properly, yet logcheck was still reporting these events. After finally reading through the README I discovered that patterns cancelling security alarms must be placed in violations.ignore.d, not ignore.d.workstation or ignore.d.server. Something to be mindful of.
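For quick experiments away from the server, the egrep test can be approximated in Python. This is only a rough sketch under stated assumptions: the helper names load_patterns and unmatched_lines are invented here, skipping '#' comment lines is a convenience of this sketch, and Python's re module is not POSIX ERE, so rules that use classes such as [[:alnum:]] will not behave the same as they do under egrep or logcheck.

```python
import re


def load_patterns(path):
    """Read one regex per line, skipping blank lines and '#' comments.

    Skipping comments is an assumption of this sketch, not egrep behaviour.
    """
    with open(path) as fh:
        return [ln.rstrip("\n") for ln in fh
                if ln.strip() and not ln.lstrip().startswith("#")]


def unmatched_lines(log_lines, patterns):
    """Return the log lines that none of the patterns match (like grep -v -f)."""
    compiled = [re.compile(p) for p in patterns if p.strip()]
    return [line for line in log_lines
            if not any(rx.search(line) for rx in compiled)]
```

Anything unmatched_lines returns is what would still be reported, assuming the rule file sits in the directory logcheck consults for that class of event.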
I don’t normally post much about Windows here as I have a limited presence with it at home – only a VM for Lightroom – but I came across this counter-intuitive problem earlier in the week. When building a new VM I discovered for some odd reason Windows would not allow me to map multiple shares to my Samba server under different paths with the same credentials. I kept getting the error:
Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed.
Now, I was using the same user name on all of these, it worked perfectly fine on my old 2k3 VM, and I had changed nothing on the Samba side. Neither removing and recreating the shares nor rebooting (even 3 times!) seemed to help. Sigh… A bit of Googling suggested explicitly deleting all the shares on the command line and then recreating them. This seemed to fix it!
net use * /delete
Some time ago I wrote about learning and briefly discussed how important it is to keep learning continually. On the same topic, I recently stumbled across a posting by Ben Rockwood regarding knowledge, wisdom, and information. He wrote a very nice summary of the ideas of Russell Ackoff and W. Edwards Deming, Ackoff’s “Wisdom Hierarchy” being my favorite of the two. I won’t go into detail here (please read his excellent post for the juicy tidbits), but it basically outlines the progression of things: from low-level raw data, climbing up through information to knowledge, then understanding, and finally (hopefully) arriving at wisdom.
It’s definitely something you can gloss over at first and think “yeah, that’s obvious”, but I highly recommend you read it and check out his second post, which embeds videos of Dr. Ackoff himself on the subject. It’s very thought-provoking, and I think it gets at the core reason behind a large number of problems in the world around us today. One of the things I love most is his point that there is a fundamental issue with our system of education: it’s not effective. Who in the classroom learns the most? I’d have to agree with his answer: the teacher. We learn by teaching, not by being taught. This rings true, and it reminded me of a wonderful TED video I watched recently by Salman Khan of the Khan Academy, in which he comes to the same conclusion. The Khan Academy is an online compendium of educational videos that is helping to revolutionize the classroom, and Salman is seeing the changes video teaching brings to students: peers are able to teach each other and therefore get a better grasp of the material themselves.
All in all, some very interesting ideas. I just wish this was more widely known among the general public.
I’ve been working on migrating a virtual host over to Rackspace; it mainly runs a mail server, among a few other small items. I wasn’t 100% sure how smooth the process would be and expected to hit at least a few bumps along the way. The first one involved MX entries and the simplistic nature of the DNS record editor at Rackspace: most of the emails sent from my home PC were bouncing back with “550 failed recipient verification”. This was just a dry run, however. With my previous hoster I had simply used my registrar’s DNS, and when I switched back to it the problem seemed to be resolved.
The second issue, however, had me stumped for a few days. One of the reasons I migrated (besides price) was greater flexibility: Rackspace gave me more distros to choose from, and I thought their overall interface was cleaner and better designed. So when I provisioned the new VM I gave Ubuntu a shot; since I run it on my home network, I’m a bit more familiar with how I want to configure the box for the software I run. After the DNS/mail issue was resolved everything seemed solid except for an intermittent, albeit fairly minor, problem: every so often hostname resolution failed with “hostname: temporary failure in name resolution”. I was getting emails from cron jobs reporting this error, which I found a bit strange. While tinkering with the mail problem I had also quickly built a CentOS VM, and I didn’t notice the error occurring on that host. I double-checked that resolv.conf was identical, then /etc/hosts, then nsswitch.conf and so on; all the files seemed the same, or at least close enough that I didn’t think it would be a problem. I made sure DNS resolution worked on the machine and that no iptables rules were in place. The strangest part was that it sometimes worked and sometimes didn’t, with no apparent way to reproduce the failure. I even ran strace and compared the output from runs that worked against runs that didn’t. ‘hostname -f’ also took a second or two to reply rather than responding immediately.
Eventually I figured I’d just add an alias to /etc/hosts with the local non-FQDN hostname. While doing so I noticed that /etc/hosts had no trailing newline at the end of its last line; I added one and bingo! Problem fixed. Looking back through the strace logs more closely, I saw that the second line, which held the FQDN hostname, was never actually read: the first line for localhost was OK, but parsing stopped there. For some reason CentOS behaves differently. Its hosts file was identical (except for the IPs, of course) and was also missing the trailing newline, but strace revealed that it parsed the file just fine. In case anyone is wondering, I was testing this on Ubuntu Lucid 10.04.2 LTS and CentOS 5.5.
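A missing line ending at the end of a file is invisible in most editors, so it can be worth checking for explicitly. Here is a small Python sketch of such a check, assuming the terminator in question is a plain newline (\n), as is usual on Linux; the function names are hypothetical.

```python
def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as fh:
        data = fh.read()
    return data == b"" or data.endswith(b"\n")


def ensure_trailing_newline(path):
    """Append a newline if the last line is unterminated.

    Returns True if the file was modified, False if it was already fine.
    """
    if ends_with_newline(path):
        return False
    with open(path, "ab") as fh:
        fh.write(b"\n")
    return True
```

Running ends_with_newline("/etc/hosts") is read-only; ensure_trailing_newline modifies the file and would need root for anything under /etc.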
::sigh:: Ah well at least I can cancel the plan with my original hoster now. 🙂
I’m working on several projects at work to enhance our infrastructure and bring automation to our environment through tools such as Cobbler, Puppet, and Kerberos, and most of these rely on a working DNS system to operate correctly. After a quick refresher on BIND, a primary nameserver was up and running pretty quickly. The zone files were populated easily enough after hacking together some Python to auto-generate configuration files from our server MySQL database. However, I ran into issues getting the slave nameserver up and running: I was getting errors such as “permission denied” when the slave attempted to transfer the zone file from the master. I was pretty sure it was configured correctly, as it only seemed to fail when creating the temporary file. I checked the user permissions in the chroot and it all looked good; then I remembered this box was running SELinux. Checking the BIND FAQ I quickly found the answer: by default named is only allowed to write to the following directories:
with $ROOTDIR being the chroot specified in /etc/sysconfig/named. The configuration files are in $ROOTDIR/var/named and of course I was naming my file “sec.ourdomain.net” as opposed to “slaves/sec.ourdomain.net”. Oops. Have to keep this one in mind!
I’ve been playing with Linux containers (LXC) recently thanks to the SysAdvent calendar and ran into a small issue where network traffic to the VM (using a bridged interface) was blocked when ufw was running on the host. Adding a ufw allow rule for the guest IP did not seem to help either; it turns out there is a bug filed for this.
Two solutions are presented on the Launchpad page:
- Disable netfilter on the bridge via sysctl by adding the following lines to /etc/sysctl.conf, /etc/ufw/sysctl.conf or /etc/sysctl.d/ufw:
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
- Configure iptables to permit forwarded traffic across the bridge by adding the following line (before the last COMMIT command) to /etc/ufw/before.rules:
-I FORWARD -m physdev --physdev-is-bridged -j ACCEPT
Whichever you choose, the relevant changes must be made active (by reloading sysctl or restarting ufw). I used the ufw/iptables solution and it seemed to work fine.
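For anyone applying the second option across several hosts, the "before the last COMMIT" placement is easy to get wrong by hand. The following Python sketch shows one way to do the insertion on the text of before.rules. The function name add_bridge_rule is invented for illustration, and the sketch only transforms a string; reading and writing /etc/ufw/before.rules and restarting ufw are left out.

```python
# The rule quoted from the Launchpad bug report above.
RULE = "-I FORWARD -m physdev --physdev-is-bridged -j ACCEPT"


def add_bridge_rule(rules_text, rule=RULE):
    """Insert `rule` on its own line before the last COMMIT.

    Returns the text unchanged if the rule is already present, and raises
    ValueError if there is no COMMIT line at all.
    """
    lines = rules_text.splitlines()
    if rule in (ln.strip() for ln in lines):
        return rules_text
    # Walk backwards so the rule lands before the *last* COMMIT.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == "COMMIT":
            lines.insert(i, rule)
            break
    else:
        raise ValueError("no COMMIT line found")
    return "\n".join(lines) + "\n"
```

Because the function returns its input untouched when the rule already exists, it is safe to run more than once.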
I’m hoping to have another post up here soon about LXC itself; I’ve made a custom VM creation script for Debian and am trying to get one working for Ubuntu as well. From what I’ve seen so far it’s a very nice package, although still very much under development. I’m not sure I’d recommend it for production environments yet, but I see it maturing significantly in the near future.
I have a number of Ubuntu boxes lying around and have gotten a bit lazy about keeping some of the lesser-used ones up to date. I realized this when an apt-get update resulted in 404 errors. Oops. Since I couldn’t do a dist-upgrade directly, I checked the Ubuntu wiki for upgrading EOL installations; the process is pretty simple.
All you basically need to do is update your /etc/apt/sources.list, replacing us.archive.ubuntu.com (or whatever servers you are using) with old-releases.ubuntu.com and making sure the release name matches your current distro. If it’s a desktop system you may also need to install or upgrade the update-manager package and/or ubuntu-desktop. Then a simple aptitude update && aptitude safe-upgrade followed by do-release-upgrade should take care of your needs. If you are multiple releases behind you will need to upgrade from one release to the next, one at a time; you can’t skip directly to the latest, so it may take some time. Otherwise it’s pretty straightforward and, in my experience thus far, very pain-free, which is always a plus.
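The sources.list edit is a simple search and replace, sketched here in Python for anyone with several stale boxes to fix. Treat it as an illustration under assumptions: the function name point_to_old_releases is made up, the host pattern covers archive.ubuntu.com, two-letter country mirrors such as us.archive.ubuntu.com, and security.ubuntu.com (the last one is my assumption about which servers are in use), and commented-out lines are left alone.

```python
import re

# Hosts to redirect: archive.ubuntu.com, country mirrors like
# us.archive.ubuntu.com, and (an assumption here) security.ubuntu.com.
_HOST_RE = re.compile(
    r"\b(?:[a-z]{2}\.)?archive\.ubuntu\.com\b|\bsecurity\.ubuntu\.com\b"
)


def point_to_old_releases(sources_text):
    """Rewrite active sources.list lines to use old-releases.ubuntu.com."""
    return "\n".join(
        line if line.lstrip().startswith("#")
        else _HOST_RE.sub("old-releases.ubuntu.com", line)
        for line in sources_text.split("\n")
    )
```

The release name on each line is untouched, so it still has to match the release that is actually installed. Keeping a backup of the original sources.list before writing the result back is a sensible precaution.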
Well December is here now and the SysAdvent calendar is back again!… thanks to Matt for reminding me. For those who aren’t familiar, it’s a sysadmin advent calendar similar to the Perl Advent calendar. Every day is a new posting about something system administration related.
Since it’s the 2nd today, we’ve had two postings. Yesterday’s was about Linux Containers (LXC), a type of OS-level virtualization similar to OpenVZ and vserver. These provide a very low level of virtualization, based upon chroots and namespace partitioning. The advantage is high performance, since only one kernel is running; the trade-off is limited flexibility: same environment, same distribution, same kernel, and so on. It’s also mentioned that LXC is now supported by libvirt, which is very nice to see. I’ve wanted to play with OpenVZ for some time because of its low overhead but haven’t had a chance; now I think I’ll look at LXC.
Today’s post is called Going Parallel and focuses on methods for parallelizing shell scripts for increased performance. Tools mentioned include xargs, cluster shell, func, and capistrano, among many others. It’s a good article and outlines the general idea very well. We use cluster shell regularly at work and are looking to use func and interact directly with our applications in the near future. It reminds me of the blog posting I saw at last.fm where they implemented MapReduce in the shell!
Now that I think about it the Perl advent calendar has a nice entry today on Set::Array… it wraps up the traditional array functionality into a class which also provides tools from Set Theory, union/join/intersect/unique/etc. Very powerful!
As I work further on some of my projects, I’m looking into making my own PCBs, hopefully sometime in the near future. Previously I’ve only known about Eagle, which seems to be something of a standard. The free version has limitations, namely only 2 layers, a 100×80 mm PCB size, and non-profit use only, but I didn’t consider them an issue for a beginner such as myself. However, if I ever want to work on anything bigger or sell my boards, the license cost quickly rises… Chris and Dave from The Amp Hour made some good points on their latest podcast about the need for an updated pricing structure (great podcast, by the way, if you are into hobbyist electronics). This got me thinking and brought to mind recent postings on Dangerous Prototypes and Adafruit about a free (as in cost) PCB design tool called DesignSpark. They briefly outline that it is free of Eagle’s restrictions, allows multiple sheets, and supports importing Eagle libraries and designs. Unfortunately there is no native Linux version, which is something of a drawback; I’m trying to restrict my use of Windows VMs in my electronics work to the cheap JTAG programmer that I bought with my Spartan 3 FPGA. I found it a bit absurd when I bought it that a real JTAG USB programmer costs more than the FPGA prototyping board itself! Anyway, staying on topic: KiCad is also mentioned as an up-and-coming design tool. I’ve played briefly with gEDA too, but wasn’t all that impressed with it aside from the fact that I could install it easily with apt.