After having worked on optimizing live container migration based on runc (pre-copy migration and post-copy migration), I tried to optimize container migration in LXD.
After a few initial discussions with Christian, I started with pre-copy migration. Container migration in LXD is based on CRIU, just as in runc, and CRIU's pre-copy migration support is based on the dirty page tracking support of Linux: SOFT-DIRTY PTEs.
As LXD uses LXC for the actual container checkpointing and restoring, I was curious whether LXC already supported pre-copy migration. After figuring out the right command-line parameters, it almost worked, thanks to the great checkpoint and restore support implemented by Tycho some time ago.
Now that I knew that it worked in LXC, I focused on getting pre-copy migration support into LXD. LXD supports container live migration using the move command: lxc move <container> <remote>:<container>
This move command, however, did not use any optimization yet. It basically did:
1. Initial sync of the filesystem
2. Checkpoint container using CRIU
3. Transfer container checkpoint
4. Final sync of the filesystem
5. Restart container on the remote system
In this scenario the container is down from step 2 until step 5, and the length of that downtime depends on the amount of memory used by the processes inside the container.
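As a rough illustration of why this downtime grows with memory usage, the following Python sketch estimates it from the used memory and assumed throughputs for dumping, transferring, and restoring. The constant-throughput model, the function name `stop_and_copy_downtime`, and all rates are hypothetical simplifications, not measurements of LXD or CRIU.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def stop_and_copy_downtime(used_memory: int, dump_rate: float, transfer_rate: float,
                           restore_rate: float, fs_sync_seconds: float = 0.0) -> float:
    """Rough downtime estimate in seconds for the unoptimized move.

    Hypothetical model: every step runs sequentially at a constant
    throughput given in bytes per second.
    """
    checkpoint = used_memory / dump_rate      # step 2: CRIU dumps all memory
    transfer = used_memory / transfer_rate    # step 3: send the whole checkpoint
    restore = used_memory / restore_rate      # step 5: CRIU restores all memory
    # Step 4 (final filesystem sync) is modelled as a fixed cost.
    return checkpoint + transfer + fs_sync_seconds + restore
```

With 1 GiB of used memory and 512 MiB/s for each step, `stop_and_copy_downtime(GiB, 512 * MiB, 512 * MiB, 512 * MiB)` gives 6 seconds; doubling the memory doubles the estimate, because every step has to handle all of it while the container is frozen.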
The goal of pre-copy migration is to dump the memory of the container and transfer it to the remote destination while the container keeps running, and then to do a final dump containing only the memory pages that changed since the last pre-dump (more about process migration optimization theories).
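The idea can be illustrated with a toy simulation in Python. It assumes a workload that keeps writing to the same `hot_pages` pages while the rest of its memory stays untouched; the function name and this workload model are hypothetical and only meant to show why the final dump can be much smaller than a full checkpoint.

```python
def simulate_precopy(total_pages: int, hot_pages: int, pre_dumps: int):
    """Toy pre-copy model; returns (pages sent per pre-dump, pages in final dump).

    Hypothetical workload: between two dumps the container writes to pages
    0..hot_pages-1 (assumed <= total_pages) and never touches the rest.
    """
    # Nothing has been sent yet, so initially every page has to be transferred.
    dirty = set(range(total_pages))
    sent_per_pre_dump = []
    for _ in range(pre_dumps):
        # Pre-dump while the container runs: send every dirty page, then reset
        # the tracking (in Linux, clearing the soft-dirty bits plays this role).
        sent_per_pre_dump.append(len(dirty))
        dirty = set()
        # The workload keeps running and dirties its hot working set again.
        dirty.update(range(hot_pages))
    # Final dump with the container frozen: only pages dirtied since the last
    # pre-dump remain, and their number drives the downtime.
    return sent_per_pre_dump, len(dirty)
```

With 10,000 pages, a hot set of 500 pages and three pre-dumps, `simulate_precopy(10000, 500, 3)` returns `([10000, 500, 500], 500)`: only 500 pages have to be dumped while the container is frozen, instead of all 10,000 with zero pre-dumps. If the hot set covers all of memory, the final dump is as large as a full checkpoint and pre-copy gains nothing.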
Back to LXD: at the end of the day I had a very rough (and very hardcoded) first pre-copy migration implementation ready, and I kept working on it until it was ready to be submitted upstream. The pull request has already been merged upstream, so LXD now supports pre-copy migration.
As not all architecture/kernel/CRIU combinations support pre-copy migration, it has to be turned on manually right now, but we have already discussed adding pre-copy support detection to LXC. To tell LXD to use pre-copy migration, the parameter 'migration.incremental.memory' needs to be set to 'true'. Once that is done, migrating a container with LXD goes through the following steps:
1. Initial sync of the filesystem
2. Start pre-copy checkpointing loop using CRIU
   - Check if the maximum number of pre-copy iterations has been reached
   - Check if the threshold of unchanged memory pages has been reached
   - Transfer container checkpoint
   - Continue pre-copy checkpointing loop if neither of those conditions is true
3. Final container delta checkpoint using CRIU
4. Transfer final delta checkpoint
5. Final sync of the filesystem
6. Restart container on the remote system
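The loop above can be sketched in Python. `pre_dump_and_transfer` and `final_dump_and_transfer` are hypothetical callbacks standing in for a CRIU pre-dump plus its transfer and for the final delta dump of the frozen container. In this sketch each pre-dump is transferred before the two stop conditions are checked; that ordering is an assumption, not a description of LXD's code.

```python
from typing import Callable


def precopy_migrate(
    pre_dump_and_transfer: Callable[[], float],
    final_dump_and_transfer: Callable[[], None],
    max_iterations: int = 10,
    goal_percent: float = 70.0,
) -> int:
    """Sketch of the pre-copy control loop; returns the number of pre-dumps.

    pre_dump_and_transfer() (hypothetical) does one pre-dump while the
    container keeps running, transfers it, and returns the percentage of
    memory pages that did not change since the previous pre-dump.
    """
    iterations = 0
    while True:
        unchanged = pre_dump_and_transfer()
        iterations += 1
        # Stop pre-copying once the iteration limit is reached ...
        if iterations >= max_iterations:
            break
        # ... or once enough memory is already in sync on the destination.
        if unchanged >= goal_percent:
            break
    # The container is only frozen for this last delta dump and its transfer.
    final_dump_and_transfer()
    return iterations
```

If the pre-dumps report 0, 40, 65 and then 75 percent unchanged pages, the loop stops after the fourth pre-dump because 75 reaches the default goal of 70; a workload that never reaches the goal always ends after `max_iterations` pre-dumps.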
So instead of doing a single checkpoint and transferring it, LXD now does multiple pre-copy checkpoints, and the container keeps running during those transfers. The container is only suspended during the last delta checkpoint and its transfer. In many cases this reduces the container downtime during migration, but pre-copy migration can also increase it. As always, this depends on the workload.
To control how many pre-copy iterations LXD does, there are two additional variables:
- migration.incremental.memory.iterations (defaults to 10)
- migration.incremental.memory.goal (defaults to 70%)
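LXD stores configuration values as strings. The sketch below reads the three keys with the defaults named above; the `PreCopySettings` tuple, the function name, the accepted boolean spellings, and the range checks are assumptions for illustration, not LXD's own parsing code.

```python
from typing import Mapping, NamedTuple


class PreCopySettings(NamedTuple):
    enabled: bool
    max_iterations: int
    goal_percent: int


def read_precopy_settings(config: Mapping[str, str]) -> PreCopySettings:
    """Read the pre-copy keys from a string-valued config map, applying defaults."""
    # Assumed set of spellings accepted as "true"; check LXD's own parser.
    raw_enabled = config.get("migration.incremental.memory", "false")
    enabled = raw_enabled.strip().lower() in ("true", "1", "yes", "on")
    max_iterations = int(config.get("migration.incremental.memory.iterations", "10"))
    # The goal is stored as a plain number here (70 meaning 70%).
    goal_percent = int(config.get("migration.incremental.memory.goal", "70"))
    # Hypothetical sanity checks, not taken from LXD.
    if max_iterations < 1:
        raise ValueError("migration.incremental.memory.iterations must be >= 1")
    if not 0 <= goal_percent <= 100:
        raise ValueError("migration.incremental.memory.goal must be between 0 and 100")
    return PreCopySettings(enabled, max_iterations, goal_percent)
```

An empty configuration yields pre-copy disabled with 10 iterations and a goal of 70, matching the defaults listed above.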
The first variable (iterations) tells LXD the maximum number of pre-copy iterations to do before the final dump, and the second variable (goal) tells LXD the percentage of pre-copied memory pages that must remain unchanged between pre-copy iterations before it does the final dump.
So, in the default configuration, LXD does at most 10 pre-copy iterations before the final migration, and it triggers the final migration earlier as soon as at least 70% of the memory pages have not changed since the previous pre-copy iteration (and were therefore already transferred).
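The goal check itself is simple percentage arithmetic. The Python sketch below assumes the unchanged share is computed from two per-pre-dump counts: pages sent again because they changed (`written`) and pages skipped because an earlier pre-dump already sent them (`skipped_parent`). These names and the integer rounding are assumptions, not LXD's implementation.

```python
def unchanged_percent(written: int, skipped_parent: int) -> int:
    """Share of memory pages (0-100) that a pre-dump did not have to send again."""
    total = written + skipped_parent
    if total == 0:
        return 100  # nothing left to send: memory is treated as fully in sync
    return skipped_parent * 100 // total


def meets_goal(written: int, skipped_parent: int, goal_percent: int = 70) -> bool:
    """True if enough pages stayed unchanged to trigger the final dump."""
    return unchanged_percent(written, skipped_parent) >= goal_percent
```

With the default goal, a pre-dump that sends 2,500 pages and skips 7,500 is 75% unchanged and triggers the final dump, while 4,000 sent and 6,000 skipped (60%) keeps the loop going unless the iteration limit has been reached.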
Now that this pull request is merged, and with pre-copy migration enabled, lxc move <container> <remote>:<container> should live migrate the container with reduced downtime.
I want to thank Christian for the collaboration on getting CRIU's pre-copy support into LXD, Tycho for his work preparing LXC and LXD to support migration so nicely, and the developers of p.haul for the ideas on how to implement pre-copy container migration. Next step: lazy migration.