Articles by adrian

  1. bcache on Fedora 19

    After having upgraded our mirror server from Fedora 17 to Fedora 19 two weeks ago, I was curious to try out bcache. Knowing how important filesystem caching is for a file server like ours, we have always tried to have as much memory as possible. The current system has 128GB of memory and at least 90% of it is used as filesystem cache. So bcache sounds like a very good idea to provide another layer of caching for all the IOs we are doing. By chance I had an external RAID available with 12 x 1TB hard disc drives, which I configured as a RAID6, and 4 x 128GB SSDs, which I configured as a RAID10.

    After modprobing the bcache kernel module and installing the necessary bcache-tools, I created the bcache backing device and caching device as described here. I then created the filesystem as I did with our previous RAIDs. For a RAID6 with 12 hard disc drives and a RAID chunk size of 512KB I used mkfs.ext4 -b 4096 -E stride=128,stripe-width=1280 /dev/bcache0, although I am unsure how useful these options are when using bcache.
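    The setup steps looked roughly like the following sketch. The device names are examples, not the real ones from our server, and the cache-set UUID has to be looked up under /sys/fs/bcache/ after registering the caching device:

```shell
# Sketch of the bcache setup, assuming the hard disc RAID6 shows up as
# /dev/sdb and the SSD RAID10 as /dev/sdc (device names are examples).
modprobe bcache

make-bcache -B /dev/sdb    # register the backing device
make-bcache -C /dev/sdc    # register the caching (SSD) device

# Attach the cache set to the backing device using the cache-set UUID
# from /sys/fs/bcache/; afterwards /dev/bcache0 can hold the filesystem.
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach

mkfs.ext4 -b 4096 -E stride=128,stripe-width=1280 /dev/bcache0
```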

    So far it has worked pretty flawlessly. To know what to expect from /dev/bcache0 I benchmarked it using bonnie++ and got 670MB/s for writing and 550MB/s for reading. Again, I am unsure how to interpret these values, as bcache tries to detect sequential IO and bypasses the cache device for sequential IO larger than 4MB.

    Anyway, I started copying my fedora and fedora-archive mirrors to the bcache device, and we are now serving those two mirrors (only about 4.1TB) from it.

    I have created a munin plugin to monitor the usage of the bcache device, and there are many cache hits (right now more than 25K) and some cache misses (about 1K). So it seems that it does what it is supposed to do, and the number of IOs directly hitting the hard disc drives is much lower than it would otherwise be:

    I also increased the cutoff for sequential IO that bypasses the cache from 4MB to 64MB.
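    The cutoff is tunable at runtime through sysfs; something like:

```shell
# Raise bcache's sequential IO cutoff from the default 4M to 64M so that
# larger sequential requests still go through the SSD cache.
echo 64M > /sys/block/bcache0/bcache/sequential_cutoff

# Read the value back to verify the change.
cat /sys/block/bcache0/bcache/sequential_cutoff
```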

    The user-space tools (bcache-tools) are not yet available in Fedora (as far as I can tell) but I found http://terjeros.fedorapeople.org/bcache-tools/ which I updated to the latest git: http://lisas.de/~adrian/bcache-tools/

    Update: as requested the munin plugin: bcache

    Tagged as : fedora
  2. Remove Old Kernels

    Mainly using Fedora, I am accustomed to old kernel images being automatically removed after a certain number of kernel images have been installed using yum. The default is to keep three kernel images installed, and so far this has always worked.

    I am also maintaining a large number of Ubuntu VMs, and every now and then we have the problem that the filesystem fills up because too many kernel images are installed. I have searched for some time, but there seems to be no automatic kernel image removal in apt-get. One command that is often recommended looks something like this:

    dpkg -l 'linux-*' | sed '/^ii/!d;/'"$(uname -r | sed "s/\(.*\)-\([^0-9]\+\)/\1/")"'/d; s/^[^ ]*[^ ]* \([^ ]*\).*/\1/;/[0-9]/!d' | xargs sudo apt-get -y purge

    This works, but only if you are already running the latest kernel, so I have adapted it a little for our needs. Instead of removing all kernel images except the running one, I remove all kernel images except the running and the newest one. Not a big difference, but important for our setup, where we do not reboot all VMs with every kernel image update.
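    The adapted logic can be sketched roughly like this. This is only an illustration of the idea, not the actual downloadable script; the helper name list_removable and the dpkg pipeline in the comment are my examples:

```shell
# Sketch: list the installed linux-image packages that can be purged,
# keeping the running kernel and the newest installed kernel. Package
# names are read from stdin so the logic is independent of dpkg.
list_removable() {
    running="$1"        # version of the running kernel, e.g. 3.2.0-36
    sort -V |           # order package names by version, newest last
    sed '$d' |          # keep the newest kernel (drop it from the list)
    grep -v "$running"  # keep the running kernel as well
}

# On a real system the list would come from dpkg, roughly:
#   dpkg -l 'linux-image-[0-9]*' | awk '/^ii/ {print $2}' \
#     | list_removable "$(uname -r | sed 's/-[a-z]*$//')" \
#     | xargs sudo apt-get -y purge
```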

    Running the script gives me the following output:
    # remove-old-kernels

    linux-image-3.2.0-23-generic linux-image-3.2.0-36-generic linux-image-3.2.0-37-generic linux-image-3.2.0-38-generic linux-image-3.2.0-39-generic linux-image-3.2.0-40-generic linux-image-3.2.0-43-generic linux-image-3.2.0-45-generic linux-image-3.2.0-48-generic linux-image-3.2.0-49-generic

    The output of the script can then be easily used to remove the unnecessary kernel images with apt-get purge.

    The script can be downloaded here: remove-old-kernels

    And before anybody complains: I know it is not really the most elegant solution, and I should not have written it in bash.

  3. A New Home

    After having received my Raspberry Pi in November, I am finally using it. I have connected it to my television using raspbmc. Using XBMC Remote I can control it without the need for a mouse, keyboard or lirc-based remote control, and so far it works pretty well. Following are a few pictures with the new case I bought a few days ago:

    Pi

    Pi

    Pi

  4. Process Migration coming to Fedora 19 (probably)

    With the recently approved review of the crtools package in Fedora, I have made a feature proposal for checkpoint/restore.

    To test checkpoint/restore on Fedora you need to run the current development version of Fedora and install crtools using yum (yum install crtools). Until it is decided whether it will actually be a Fedora 19 feature and the necessary changes in the Fedora kernel packages have been implemented, it is necessary to install a kernel which is not in the repository. I have built a kernel in Fedora's build system which enables the following config options: CHECKPOINT_RESTORE, NAMESPACES, EXPERT.

    A kernel with these changes enabled is available from koji as a scratch build: http://koji.fedoraproject.org/koji/taskinfo?taskID=4899525

    After installing this kernel I am able to migrate a process from one Fedora system to another. For my test case I am migrating a UDP ping pong (udpp.c) program from one system to another while communicating with a third system.

    udpp

    udpp is running in server mode on 129.143.116.10, and on 134.108.34.90 udpp is started in client mode. After a short time I migrate the udpp client, with the help of crtools, to 85.214.67.247. The following is part of the output on the udpp server:

    -->

    Received ping packet from 134.108.34.90:38374
    Data: This is ping packet 6

    Sending pong packet 6
    <--
    -->

    Received ping packet from 134.108.34.90:38374
    Data: This is ping packet 7

    Sending pong packet 7
    <--
    -->

    Received ping packet from 85.214.67.247:38374
    Data: This is ping packet 8

    Sending pong packet 8
    <--
    -->

    Received ping packet from 85.214.67.247:38374
    Data: This is ping packet 9

    Sending pong packet 9
    <--

    So with only small changes to the kernel configuration it is possible to migrate a process by checkpointing and restoring it with the help of crtools.
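    The checkpoint/restore steps themselves can be sketched roughly as follows. The exact crtools options vary between versions, and the dump directory and host name here are examples:

```shell
# On the source host: checkpoint the running udpp client into an image
# directory (assumed crtools invocation; options may differ by version).
mkdir /tmp/udpp-dump
crtools dump -t $(pidof udpp) -D /tmp/udpp-dump -o dump.log

# Copy the checkpoint images to the destination host.
rsync -a /tmp/udpp-dump/ destination:/tmp/udpp-dump/

# On the destination host: restore the process from the images.
crtools restore -D /tmp/udpp-dump -o restore.log
```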

    Tagged as : criu
  5. If you have too much memory

    We have integrated new nodes into our cluster. All of the new nodes have a local SSD for fast temporary scratch data. To find out which filesystem options and IO scheduler are best, I have written a script which tries a lot of combinations (80, to be precise) of filesystem options and IO schedulers. As the nodes have 64 GB of RAM, the first run of the script took 40 hours, because I always tried to write twice the size of the RAM in my benchmarks to avoid any caching effects. To reduce the amount of available memory I wrote a program called memhog which malloc()s the memory and then also mlock()s it. The usage is really simple:

    $ ./memhog
    Usage: memhog <size in GB>
    

    I am now locking 56GB with memhog and have reduced the benchmark file size to 30GB.

    So, if you have too much memory and want to waste it... Just use memhog.c.

    Tagged as : cluster
  6. Kover 6

    After having successfully updated libcdio in rawhide to 0.90 and also introduced the split-off libcdio-paranoia in Fedora's development branch, I rebuilt most of the packages depending on libcdio. Two packages were no longer building, but their maintainers quickly fixed them. The only remaining broken dependent package was kover. As I am still the upstream of kover, I had to change the code to use the new CD-Text API of libcdio 0.90.

    With these changes I have released kover version 6 which is available at http://lisas.de/kover/kover-6.tar.bz2.

    Tagged as : fedora kover
  7. More mirror traffic analysis

    I have updated the scripts which use the mirrored projects' status information in our database to display even more information about what is going on on our mirror server. In addition to the overall traffic of the last 14 days, the last 12 months and all the years since we started to collect this data, the overall traffic is now broken down into transferred HTTP, FTP, RSYNC and other data (blue=other, red=http, green=rsync, yellow=ftp). Most traffic is generated by HTTP, followed by RSYNC and last (but not surprisingly) FTP.

    In addition to the breakdown by traffic type, I added an overview of the mirror size (in bytes and number of files) at the bottom of the status page of each mirrored project. Looking at the status page of our apache mirror it is now possible to see the growth of the mirror since 2005. It started with 7GB in 2005 and reached almost 50GB at the end of 2012.

    While adding the new functionality to the PHP scripts I had to change code I wrote many years ago, and unfortunately I must confess that it is embarrassingly bad code; it already hurts to look at it. Adding new functionality to it was even worse, but despite my urge to rewrite it I just added the new functionality, which makes the code even more unreadable.

    Tagged as : traffic
  8. PowerStation updated to Fedora 18

    A few days ago I started to upgrade my PowerStation from Fedora 15 (running my own rebuild) to Fedora 18 Beta.

    PowerStation

    The update from the running Fedora 15 to Fedora 16 was the really hard part. It seems that the userspace moved from 32bit to 64bit, and that was something that yum, understandably, could not handle. So after the first run, with all packages updated to Fedora 16 (which required a lot of rpm -e --justdb --nodeps --noscripts) and a reboot, the system was broken. systemd tried to start udev but that failed with:

    [ 38.164191] systemd[1]: udev.service holdoff time over, scheduling restart.
    [ 38.208255] systemd[1]: Job pending for unit, delaying automatic restart.

    and systemd kept printing those lines forever. Luckily I still had the original Yellow Dog Linux installation on a second drive and could boot that. Unfortunately I could not chroot into the Fedora 16 installation because the Yellow Dog Linux kernel was too old, but I was able to mount it and disable every occurrence of udev in systemd. Rebooting with systemd.unit=emergency.target on the kernel command line, I was able to get the network running and used yum to reinstall the udev and systemd ppc64 packages. After that (and some more fiddling around) it rebooted into Fedora 16.

    I then just followed the recommendations on the Fedora wiki to upgrade using yum from F16 to F17 and from F17 to F18. The only difference was that I installed the gpg key which is used to sign the packages from https://fedoraproject.org/keys, using the keys for the secondary architectures.

    Now I have a PowerStation with the latest 64bit Fedora 18 Beta packages up and running.

  9. New RAID

    For our mirror server we now have a third RAID, which is also used for the mirror data. The previous external RAIDs (12x1TB as RAID5 + hot spare) were reaching their limits, so an additional 11x1TB as RAID6 in the remaining internal slots is a great help to reduce the load and usage of the existing disks. There are now roughly 30TB used for mirror data.

    To create the filesystem on the new internal RAID I have used http://busybox.net/~aldot/mkfs_stride.html. With 11 disks, a RAID level of 6, a RAID chunk size of 512 KiB and a filesystem block size of 4 KiB, I get the following command to create my ext4 filesystem:

    mkfs.ext4 -b 4096 -E stride=128,stripe-width=1152
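    The numbers the calculator produces follow from simple arithmetic, sketched here for this array (RAID6 uses two parity discs, so only 9 of the 11 discs hold data):

```shell
# Arithmetic behind the stride/stripe-width values for this array:
# 11 discs in RAID6, 512 KiB RAID chunk, 4 KiB filesystem blocks.
chunk_kib=512
block_kib=4
disks=11
parity=2                                     # RAID6 has two parity discs

stride=$((chunk_kib / block_kib))            # blocks per chunk: 512/4 = 128
stripe_width=$((stride * (disks - parity)))  # data discs only: 128*9 = 1152

echo "mkfs.ext4 -b $((block_kib * 1024)) -E stride=$stride,stripe-width=$stripe_width"
```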
    

    I am now moving all the data from one of the external RAIDs to the new internal RAID because the older external RAID still uses ext3 and I would like to recreate the filesystem using the same parameter calculation as above. Once the filesystem has been re-created I will distribute our data evenly across the three RAIDs (and maybe also mirror a new project).

    Update: After moving the data from one of the external RAIDs to the internal RAID the filesystem has been re-created with:

    mkfs.ext4 -b 4096 -E stride=128,stripe-width=1280
    
