Chunk Services running on the same host become inactive simultaneously:
18-03-15 08:20:05 MDS WRN CS#1092, CS#1091, CS#1096, CS#1098, CS#1190, CS#1095 are inactive
18-03-15 08:20:05 MDS WRN The cluster is degraded with 200 active, 6 inactive, 0 offline CS
18-03-15 08:20:06 MDS WRN CS#1094, CS#1093 are inactive
18-03-15 08:20:22 MDS INF CS#1092, CS#1091, CS#1095, CS#1096, CS#1094, CS#1190, CS#1093, CS#1098 are active
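To see how long each Chunk Service stayed inactive, the MDS state messages above can be paired up per CS ID. The following is a minimal sketch, not a product tool: the function name `downtime_per_cs` is hypothetical, the timestamp is assumed to be in YY-MM-DD form, and only lines shaped like the excerpt above are recognized.

```python
import re
from datetime import datetime

# Matches lines such as "18-03-15 08:20:05 MDS WRN CS#1092, CS#1091 are inactive".
# The line layout is taken from the excerpt above; other layouts are ignored.
STATE_LINE = re.compile(
    r"^(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) MDS \w+ "
    r"(CS#\d+(?:, CS#\d+)*) are (inactive|active)$"
)


def downtime_per_cs(lines):
    """Return {cs_id: seconds between 'inactive' and the next 'active'}."""
    went_inactive = {}
    downtime = {}
    for line in lines:
        match = STATE_LINE.match(line.strip())
        if match is None:
            continue  # e.g. the "cluster is degraded" summary line
        # Assumption: the date part is YY-MM-DD.
        stamp = datetime.strptime(match.group(1), "%y-%m-%d %H:%M:%S")
        for cs_id in map(int, re.findall(r"CS#(\d+)", match.group(2))):
            if match.group(3) == "inactive":
                went_inactive.setdefault(cs_id, stamp)
            elif cs_id in went_inactive:
                delta = stamp - went_inactive.pop(cs_id)
                downtime[cs_id] = delta.total_seconds()
    return downtime


SAMPLE = [
    "18-03-15 08:20:05 MDS WRN CS#1092, CS#1091, CS#1096, CS#1098, CS#1190, CS#1095 are inactive",
    "18-03-15 08:20:05 MDS WRN The cluster is degraded with 200 active, 6 inactive, 0 offline CS",
    "18-03-15 08:20:06 MDS WRN CS#1094, CS#1093 are inactive",
    "18-03-15 08:20:22 MDS INF CS#1092, CS#1091, CS#1095, CS#1096, CS#1094, CS#1190, CS#1093, CS#1098 are active",
]

if __name__ == "__main__":
    for cs_id, seconds in sorted(downtime_per_cs(SAMPLE).items()):
        print(f"CS#{cs_id}: inactive for {seconds:.0f} s")
```

If all Chunk Services on one host show nearly identical downtime, that points to a host-wide event rather than a problem with an individual service.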
The following messages can be observed in any Chunk Service log:
18-03-15 08:20:22.621 monitor process executed 20307 ms
18-03-15 08:20:22.622 watchdog: pcs loop was not working for 20 sec
Note: the duration (20 seconds in this example) will differ from case to case.
Chunk Service processes were frozen by an external request. KernelCare freezes all user-space processes when it applies new patches to the kernel:
Mar 18 08:20:02 pcs kcare: Updates already downloaded
Mar 18 08:20:22 pcs kernel: [954231.967668] Freezing user space processes ...
Chunk Services are user-space processes, so they are frozen as well. Once KernelCare completes the update, the services come back online.
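One way to confirm this cause on a given host is to check whether a watchdog stall reported by a Chunk Service overlaps a kernel "Freezing user space processes" message. The sketch below is illustrative only: `stall_matches_freeze` and its `slack` parameter are hypothetical names, it compares time of day only (the two logs use different date formats), and it assumes both logs share the same host clock and that the stall does not cross midnight.

```python
import re

TIME_OF_DAY = re.compile(r"(\d{2}):(\d{2}):(\d{2})")
STALL = re.compile(r"watchdog: pcs loop was not working for (\d+) sec")
FREEZE = "Freezing user space processes"


def seconds_since_midnight(line):
    """Return the first HH:MM:SS in the line as seconds, or None."""
    match = TIME_OF_DAY.search(line)
    if match is None:
        return None
    hours, minutes, seconds = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds


def stall_matches_freeze(cs_line, syslog_lines, slack=2):
    """True if a freeze message falls inside the reported stall window.

    The window ends at the watchdog line's timestamp and starts
    `stall duration` seconds earlier; `slack` widens it on both sides.
    """
    stall = STALL.search(cs_line)
    end = seconds_since_midnight(cs_line)
    if stall is None or end is None:
        return False
    start = end - int(stall.group(1))
    for line in syslog_lines:
        if FREEZE not in line:
            continue
        stamp = seconds_since_midnight(line)
        if stamp is not None and start - slack <= stamp <= end + slack:
            return True
    return False


if __name__ == "__main__":
    cs_line = "18-03-15 08:20:22.622 watchdog: pcs loop was not working for 20 sec"
    syslog = [
        "Mar 18 08:20:02 pcs kcare: Updates already downloaded",
        "Mar 18 08:20:22 pcs kernel: [954231.967668] Freezing user space processes ...",
    ]
    print(stall_matches_freeze(cs_line, syslog))
```

A match suggests the stall was caused by the process freeze rather than by disk or network problems. A miss does not rule the freeze out, since the time-of-day comparison is deliberately crude.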
Applying patches to the kernel without interrupting services is impossible. This behavior is expected and should be taken into account when scheduling
KernelCare updates. Consider applying updates during non-business hours to minimize the impact.
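If updates are triggered from a custom script, a simple guard can keep them inside a maintenance window. The following is a sketch under stated assumptions: `in_maintenance_window` is a hypothetical helper, the 01:00 to 05:00 default window is only an example, and the actual update command is intentionally left out.

```python
from datetime import datetime, time


def in_maintenance_window(now, start=time(1, 0), end=time(5, 0)):
    """Return True if `now` (a datetime.time) is inside [start, end).

    Windows that wrap past midnight (for example 22:00 to 04:00)
    are handled by the second branch.
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


if __name__ == "__main__":
    if in_maintenance_window(datetime.now().time()):
        print("Inside the maintenance window: safe to trigger the update.")
    else:
        print("Outside the maintenance window: postpone the update.")
```

Choose the window based on when a short freeze of all Chunk Services on the host is least disruptive for the cluster.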