During disk-intensive operations, such as backup creation, backup browsing, and backup restoration, the load average on the Hardware Node is very high.
- In most cases the behavior is expected.
- In some cases this can indicate CPU overcommitment on the Hardware Node or that the Hardware Node itself has a very low priority for scheduling its processes for execution.
For backup creation of VZFS containers, the issue looks like this: when a container is backed up with vzabackuptool or through the GUI panel, the system uses the Acronis module to read the data in the container's private area. This operation includes a step that prepares a snapshot of the file system and initializes the tracker that records file changes made during the actual backup.
This step requires the container's processes to be frozen to avoid data corruption in the backup. In most cases it takes a relatively small amount of time, a few minutes at most. Rarely, depending on the size of the container's private area, the size of the partition holding "/vz", and the amount of free memory in the system, it can take noticeably longer, sometimes up to an hour for really huge containers.
While frozen, the container's processes are not running and do not handle any network requests, so its services appear to be down and unresponsive. However, once the stage is completed, all processes are resumed and continue working without any data loss.
Technically, the processes are sleeping, waiting for the kernel to return execution to them. This state is uninterruptible sleep (D-state), which is counted in the load average (LA) together with the running state (R-state). Thus, if the container has 300 processes, the load average for this server will climb to 300 or more.
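To see how many processes currently contribute to the load average, the D-state and R-state processes can be counted directly. The following is a minimal sketch, not part of the product tooling: the helper name count_la_states is hypothetical, and it assumes a ps implementation that supports `-eo state=` (as procps does).

```shell
# Count processes in uninterruptible sleep (D) and running (R) states.
# Reads one state code per line on stdin; only the first character is
# inspected, because ps may append flags (for example "Ss" or "D+").
# count_la_states is a hypothetical helper name used for illustration.
count_la_states() {
    awk '
        { s = substr($1, 1, 1) }
        s == "D" { d++ }
        s == "R" { r++ }
        END { printf "D=%d R=%d total=%d\n", d, r, d + r }
    '
}

# Typical use on the node:
#   ps -eo state= | count_la_states
```

A "total" value close to the reported load average while a backup is running suggests that the load comes from frozen (D-state) processes rather than from actual CPU contention.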
During the backup creation, run the following command on the node:
~# vzps axfww -o veid,pid,ppid,rsz,vsz,state,wchan=WIDE-WCHAN-COLUMN,cmd -E CTID
(where CTID is the id of the container being backed up).
The wchan column (WIDE-WCHAN-COLUMN) will show refrigerator, which means that the container is frozen to ensure the consistency of the backup being created.
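For a container with many processes, it can be easier to summarise the vzps listing than to read it row by row. The sketch below assumes the column order used in the vzps command in this article (wchan as the 7th field, preceded by one header line) and assumes that the frozen wchan value contains the string "refrigerator"; the helper name count_frozen is hypothetical.

```shell
# Summarise how many listed processes are waiting in the refrigerator.
# Expects on stdin: one header line, then rows whose 7th field is wchan
# (the order produced by -o veid,pid,ppid,rsz,vsz,state,wchan=...,cmd).
# ASSUMPTION: a frozen process shows a wchan containing "refrigerator".
# count_frozen is a hypothetical helper name used for illustration.
count_frozen() {
    awk '
        NR == 1 { next }                  # skip the header row
        { total++ }
        $7 ~ /refrigerator/ { frozen++ }
        END { printf "frozen=%d total=%d\n", frozen, total }
    '
}

# Typical use on the node (replace CTID with the container ID):
#   vzps axfww -o veid,pid,ppid,rsz,vsz,state,wchan=WIDE-WCHAN-COLUMN,cmd -E CTID | count_frozen
```

If "frozen" equals "total", the whole container is in the frozen state described above.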
To confirm that the issue is not caused by CPU overcommitment, check the CPU priorities for the Hardware Node and containers:
~# vzcpucheck -v
If the Node is overcommitted, you will see a message like the following:
Warning: hardware node is overcommited
Set a higher CPU priority for the Hardware Node's processes:
~# vzctl set 0 --save --cpuunits VALUE
NOTE: VALUE is a relative number. Set it in accordance with your needs and other container priority values.
For more details, see this article:
112588 CPU limits design in Parallels Server Bare Metal 5 and Parallels Virtuozzo Containers for Linux
To apply the change permanently and preserve the priority after a reboot, adjust the VE0CPUUNITS parameter in /etc/vz/vz.conf:
~# grep CPU /etc/vz/vz.conf
VE0CPUUNITS=25000
NOTE: CPUUNITS are translated to a scheduler priority in CGroups using this equation: schedprio = 500000 / CPUUNITS. The equation defines the limits for the minimum and maximum values and gives a recommended range for containers' CPUUNITS of 1000 to 10000.
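As a worked example of the equation in the note, the following sketch evaluates schedprio for the two ends of the recommended container range and for the VE0CPUUNITS value shown above. The function name schedprio is only illustrative, and integer division is an assumption; the article does not state how the result is rounded.

```shell
# schedprio = 500000 / CPUUNITS
# ASSUMPTION: integer division; the source does not specify rounding.
schedprio() {
    echo $(( 500000 / $1 ))
}

schedprio 1000     # lower end of the recommended range  -> 500
schedprio 10000    # upper end of the recommended range  -> 50
schedprio 25000    # VE0CPUUNITS value from vz.conf      -> 20
```

A larger CPUUNITS value yields a smaller schedprio number, so the Hardware Node at 25000 gets a smaller value than any container in the recommended range.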