It is a bit late but I still wanted to share my presentations from this year’s Linux Plumbers Conference:

On my way back home I had to stay one night in Albuquerque and it looks like the hotel needs to upgrade its TV system. It is still running Fedora 10, which has been EOL since 2009-12-18:

Still Fedora 10

To restore a checkpointed process with CRIU, the process ID (PID) has to be the same as it was during checkpointing. Just before fork()-ing the new process, CRIU writes to /proc/sys/kernel/ns_last_pid to set the last PID to one lower than the PID of the process to be restored.

The same interface (/proc/sys/kernel/ns_last_pid) can also be used from the command-line to influence which PID the kernel will use for the next process.

# cat /proc/sys/kernel/ns_last_pid
1626
# echo -n 9999 > /proc/sys/kernel/ns_last_pid
# cat /proc/sys/kernel/ns_last_pid
10000

Writing ‘9999’ (without a trailing newline) to /proc/sys/kernel/ns_last_pid tells the kernel that the next PID should be ‘10000’. In the example above the second cat is that next process, which is why it reports 10000. This only works if no other process is created between writing to /proc/sys/kernel/ns_last_pid and forking the new process. So it is not possible to guarantee which PID the new process will get, but it can be influenced.

There is also a posting which describes how to do the same with C: How to set PID using ns_last_pid
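
For comparison, the same trick can be sketched in Python. This is only a minimal sketch and not CRIU's code: the function names request_next_pid and fork_with_pid are made up for this example, the flock() call is an advisory lock that only helps against other cooperating writers, and on the real /proc file it needs sufficient privileges (typically root). As described above, it can influence the PID but not guarantee it.

```python
import fcntl
import os

NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def request_next_pid(fd, wanted_pid):
    """Write wanted_pid - 1, without a trailing newline, to an open fd."""
    if wanted_pid < 2:
        raise ValueError("wanted_pid must be at least 2")
    os.write(fd, str(wanted_pid - 1).encode("ascii"))


def fork_with_pid(wanted_pid, path=NS_LAST_PID):
    """Try to fork a child with PID wanted_pid; return the PID it really got."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Advisory lock: only keeps cooperating writers from interleaving.
        fcntl.flock(fd, fcntl.LOCK_EX)
        request_next_pid(fd, wanted_pid)
        child = os.fork()
        if child == 0:
            os._exit(0)  # the child has nothing to do in this sketch
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
    os.waitpid(child, 0)
    return child
```

Calling fork_with_pid(10000) as root and comparing the return value with 10000 shows whether another process slipped in between the write and the fork.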

In my last post about CRIU in May 2016 I mentioned lazy memory transfer to decrease process downtime during migration. Since then, Mike Rapoport’s patches for remote lazy process migration have been merged into CRIU’s criu-dev branch, as have my patches to combine pre-copy and post-copy migration.

Using pre-copy (criu pre-dump) it has “always” been possible to dump the memory of a process using soft-dirty-tracking. criu pre-dump can be run multiple times and each time only the changed memory pages will be written to the checkpoint directory.

Depending on the processes to be migrated and how fast they change their memory, the final dump can still be rather large, which can mean a longer downtime during migration than desired. This is why we started to work on post-copy migration (also known as lazy migration). There are, however, situations where post-copy migration can increase the process downtime during migration instead of decreasing it.

The latest changes regarding post-copy migration in the criu-dev branch offer the possibility to combine pre-copy and post-copy migration. The memory pages of the process are pre-dumped using soft-dirty-tracking and transferred to the destination while the process on the source machine keeps on running. Once the process is actually migrated, everything besides the memory pages is transferred to the destination system. Because the remaining memory pages are migrated lazily, usually only a few hundred kilobytes have to be transferred at this point, which reduces the process downtime during migration significantly.

Using criu with pre-copy and post-copy could look like this:

Source system:

# criu pre-dump -D /tmp/cp/1 -t PID
# rsync -a /tmp/cp destination:/tmp
# criu dump -D /tmp/cp/2 -t PID --port 27 --lazy-pages \
  --prev-images-dir ../1/ --track-mem

The first criu command dumps the memory of the process PID and resets the soft-dirty memory tracking. The initial dump is then transferred to the destination system using rsync. During that time the process PID keeps on running. The last criu command starts lazy-pages mode: it dumps everything except the memory pages that can be transferred lazily and then waits for connections over the network on port 27. Only pages which have changed since the last pre-dump are considered for the lazy restore. At this point the process is no longer running and the process downtime starts.

Destination system:

# rsync -a source:/tmp/cp /tmp/
# criu lazy-pages --page-server --address source --port 27 \
  -D /tmp/cp/2 &
# criu restore --lazy-pages -D /tmp/cp/2

Once criu is waiting on port 27 on the source system the remaining checkpoint images can be transferred from the source system to the destination system (using rsync in this case). Now criu can be started in lazy-pages mode connecting to the page server on port 27 on the source system. This is the part we usually call the UFFD daemon. The last step is the actual restore (criu restore).
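
To make the order of the steps on both systems explicit, here is a small Python sketch that only builds the command lines shown above. It does not run anything. The function names are made up for this example, and the host names "source" and "destination", the paths and the port are simply taken from the listings above.

```python
import shlex


def source_commands(pid, port, destination="destination"):
    """Commands for the source system, in the order they are run."""
    return [
        # 1. pre-copy: dump memory and reset soft-dirty tracking
        ["criu", "pre-dump", "-D", "/tmp/cp/1", "-t", str(pid)],
        # 2. transfer the pre-dump while the process keeps running
        ["rsync", "-a", "/tmp/cp", f"{destination}:/tmp"],
        # 3. final dump without the lazy pages; downtime starts here
        ["criu", "dump", "-D", "/tmp/cp/2", "-t", str(pid),
         "--port", str(port), "--lazy-pages",
         "--prev-images-dir", "../1/", "--track-mem"],
    ]


def destination_commands(port, source="source"):
    """Commands for the destination system, in the order they are run."""
    return [
        # 4. fetch the remaining checkpoint images
        ["rsync", "-a", f"{source}:/tmp/cp", "/tmp/"],
        # 5. UFFD daemon; started in the background in the listing above
        ["criu", "lazy-pages", "--page-server", "--address", source,
         "--port", str(port), "-D", "/tmp/cp/2"],
        # 6. the actual restore
        ["criu", "restore", "--lazy-pages", "-D", "/tmp/cp/2"],
    ]


for argv in source_commands(1234, 27) + destination_commands(27):
    print(shlex.join(argv))
```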

The following diagrams try to visualize what happens during the last step: criu restore.

step1

It all starts with criu restore (on the right). criu does its magic to restore the process: it copies the memory pages from criu pre-dump into the process and marks the lazy pages as being handled by userfaultfd. Once everything is restored, criu jumps into the restored process, which continues to run from where it was checkpointed. Once the process accesses a memory address marked for userfaultfd, it is paused until a memory page (hopefully the correct one) is copied to that address.

step2

The part that we call the UFFD daemon (criu lazy-pages) listens on the userfault file descriptor for messages. As soon as a valid UFFD request arrives, it requests that page via TCP from the source system, where criu is still running in page-server mode. If the page-server finds that memory page, it transfers the actual page back to the UFFD daemon on the destination system, which injects the page into the kernel using the same userfault file descriptor it previously got the page request from. Now that the page which initially triggered the page fault (or, in our case, the userfault) is in place, the restored process continues to run until another missing page is accessed and the whole procedure starts again.

To be able to remove the UFFD daemon and the page-server at some point we currently push all unused pages into the restored process if there are no further userfaultfd requests for 5 seconds.
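
The request and push logic described above can be modelled in a few lines of Python. This is a toy model only: there is no real userfaultfd, no network and no timer involved, and the class names PageServer and LazyMemory as well as the 4096 byte page size are assumptions made for this sketch.

```python
PAGE_SIZE = 4096  # assumed page size for this model


class PageServer:
    """Stands in for criu on the source system, which holds the lazy pages."""

    def __init__(self, pages):
        self.pages = dict(pages)  # page address -> page contents

    def fetch(self, address):
        # Hand out a page once; afterwards it lives on the destination.
        return self.pages.pop(address)


class LazyMemory:
    """Stands in for the restored process plus the UFFD daemon."""

    def __init__(self, predumped, server):
        self.present = dict(predumped)  # pages copied in during restore
        self.server = server
        self.faults = 0

    def read(self, address):
        page = address - address % PAGE_SIZE
        if page not in self.present:
            # "userfault": the process waits until the page is injected.
            self.faults += 1
            self.present[page] = self.server.fetch(page)
        return self.present[page][address % PAGE_SIZE]

    def push_remaining(self):
        # Models the step after the idle timeout: copy all missing pages.
        for page in list(self.server.pages):
            self.present[page] = self.server.fetch(page)
```

Once push_remaining() has run, the page server holds no more pages and both helpers could go away, which is the point of the 5 second rule.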

The whole procedure still has a lot of possibilities for optimization but now that we finally can combine pre-copy and post-copy memory migration we are a lot closer to decreasing process downtime during migration.

The next steps are to get support for pre-copy and post-copy into p.haul (Process Hauler) and into different container runtimes which already support migration via criu.

My other recently posted criu related articles:

For my recently installed PXACB I was looking for a way to remotely power it on and off. I found the Wi-Fi Smart Plug “HS100” and a blog post describing how it can be controlled from the command-line.

The referenced script uses results captured with Wireshark and just re-transmits these messages from a shell script. In one of the comments someone points out that this is XOR’d JSON and how it can be decoded. Instead of a shell script I re-implemented it in Python, and I now always use XOR to encode and decode the JSON messages, so the encoded commands do not need to be included in my script. This makes the script easier to read and to extend.

The protocol used is JSON which is XOR’d and then transmitted to the device. Same goes for the answers. The JSON string is XOR’d with the previous character of the JSON string and the value of the first XOR operation is 0xAB. Additionally each message is prefixed with ‘\x00\x00\x00\x23’.

The message to turn on the power looks like this:

{
 "system": {
  "set_relay_state": {
   "state": 1
  }
 }
}
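
The XOR scheme can be sketched in Python as follows. This is not the hs100.py script itself: the function names encode and decode are made up for this example, and it reads “the previous character” as the previous encoded byte, which is an assumption. The four byte prefix mentioned above is left out here.

```python
import json

INITIAL_KEY = 0xAB  # value used for the first XOR operation


def encode(message):
    """XOR each byte with the previous encoded byte (assumed scheme)."""
    key = INITIAL_KEY
    out = bytearray()
    for byte in message.encode("utf-8"):
        key ^= byte  # key now holds the byte that was just encoded
        out.append(key)
    return bytes(out)


def decode(data):
    """Reverse encode(): XOR each byte with the encoded byte before it."""
    key = INITIAL_KEY
    out = bytearray()
    for byte in data:
        out.append(key ^ byte)
        key = byte
    return out.decode("utf-8")


command = json.dumps({"system": {"set_relay_state": {"state": 1}}})
payload = encode(command)
print(payload.hex())
print(decode(payload))
```

Because the key changes with every byte, the same JSON fragment encodes differently depending on what precedes it, which is why replaying captured messages works but editing them by hand does not.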

To find out more about which commands the device understands I used the information I got from: Why not root your Christmas gift?

I downloaded the firmware for the US model of my smart plug and used binwalk to analyze its content. The firmware contains a busybox-based ramdisk which includes the smart plug relevant programs /usr/bin/shd and /usr/bin/shdTester, and it seems that at least the following commands exist:

  • system
  • reset
  • get_sysinfo
  • set_test_mode
  • set_dev_alias
  • set_relay_state
  • check_new_config
  • download_firmware
  • get_download_state
  • flash_firmware
  • set_mac_addr
  • set_device_id
  • set_hw_id
  • test_check_uboot
  • get_dev_icon
  • set_dev_icon
  • set_led_off
  • set_dev_location

With the knowledge from the original shell script implementation and the results from binwalk I wrote the following script: https://lisas.de/~adrian/hs100.py

Using this script I can power the device behind the smart plug easily on and off:

$ ./hs100.py -H p-pxcab.example.com off
$ ./hs100.py -H p-pxcab.example.com state
Power OFF
$ ./hs100.py -H p-pxcab.example.com on
$ ./hs100.py -H p-pxcab.example.com state
Power ON

The only annoying thing about the smart plug is that it tries to communicate with some cloud systems so that it can be controlled from anywhere. After starting, the smart plug makes a name lookup for devs.tplinkcloud.com and connects to port 50443. I can connect to that system with openssl s_client -connect devs.tplinkcloud.com:50443, but I do not know what the smart plug actually sends to it. If I do not block the smart plug in the firewall, I see an NTP request after that and then the communication seems to stop. Right now the smart plug is blocked and makes no NTP requests, but it still tries to reach devs.tplinkcloud.com:50443 once a minute.

Quick bugfix release to address some issues with the audio backends: The user interface allowed selecting the PulseAudio backend even when terminatorX was built without PulseAudio support. In addition, the error message was not really helpful and PulseAudio was not set as the default as intended.

These issues have been fixed with release 4.0.1. As usual you can find the tarball on the download page. The PPA builds are currently in progress and the resulting .deb packages should be available shortly.