The pstorage top output is flooded with messages that reappear every 30 seconds:
MON ERR MDS# died unexpectedly (122): Can't load MDS id
MON ERR CS# died unexpectedly (122): csd: could not lock repository
MON ERR MDS# died unexpectedly (1)
However, all MDS and CS services in pstorage top are marked as online.
A pstorage monitor process that was monitoring a metadata server (MDS) or chunk server (CS) has been orphaned: its server was removed, but the monitor was not shut down, for an unknown reason. The monitor keeps trying to restart the service, which it considers failed. Since the service has been removed, the monitor cannot open the MDS (or CS) directory, hence the errors.
To get rid of these errors, find the orphaned monitor and kill it manually.
On each host participating in the pstorage cluster, look for a monitor process that does not have a CS or MDS service as a child, and kill that monitor process. Make sure you are killing the right monitor process. Examples of all three cases are shown below.
Example of a correct monitor running for a CS service:
# ps fax | grep /usr/libexec/pstorage/monitor -A1
...
  9625 ?        S      0:00 /bin/sh /usr/libexec/pstorage/monitor
  9628 ?        Sl    15:55  \_ /usr/bin/csd -r /pstorage/pcs-bsh-cs/data -l /pstorage/pcs-bsh-cs/data/logs/cs.log.gz -u pstorage
...
Example of a correct monitor running for an MDS service:
# ps fax | grep /usr/libexec/pstorage/monitor -A1
...
128717 pts/0    S      0:00 /bin/sh /usr/libexec/pstorage/monitor
128720 pts/0    Sl     0:02  \_ /usr/bin/mdsd -r /pstorage/ssd1/mds/data -l /pstorage/ssd1/mds/data/logs/mds.log.gz -u pstorage
...
Example of an orphaned monitor, which is most likely causing the flood:
# ps fax | grep /usr/libexec/pstorage/monitor -A1
...
  3979 ?        S      1:09 /bin/sh /usr/libexec/pstorage/monitor
132860 ?        S      0:00  \_ sleep 5
...
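The manual check shown in the examples above could be approximated with a small script. The following is only a sketch, not a tool shipped with pstorage: the function name find_orphaned_monitors is hypothetical, and it assumes the monitor always appears in the process list as "/bin/sh /usr/libexec/pstorage/monitor" and the services as /usr/bin/csd or /usr/bin/mdsd, as in the ps output above. It reads "PID PPID COMMAND" lines and prints the PIDs of monitors that have no csd or mdsd child.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of pstorage): list monitor processes
# that have no csd or mdsd child.
# Input on stdin: one process per line, "PID PPID COMMAND [ARGS...]",
# for example as printed by: ps -e -o pid= -o ppid= -o args=
find_orphaned_monitors() {
    awk '
        # A monitor looks like: /bin/sh /usr/libexec/pstorage/monitor
        # (assumption taken from the ps examples in this article).
        $3 == "/bin/sh" && $4 == "/usr/libexec/pstorage/monitor" {
            monitor[$1] = 1
            next
        }
        # Record the parent PID of every running CS or MDS service.
        $3 == "/usr/bin/csd" || $3 == "/usr/bin/mdsd" {
            has_service[$2] = 1
        }
        END {
            # Print monitors that are not the parent of any service.
            for (pid in monitor)
                if (!(pid in has_service))
                    print pid
        }
    ' | sort -n
}
```

On a live host it could be fed with: ps -e -o pid= -o ppid= -o args= | find_orphaned_monitors. Treat the printed PIDs as candidates only, since a healthy monitor may briefly have no service child while it is restarting one. Confirm each PID with ps fax as shown above before killing anything.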
Once the orphaned monitor is found, kill it:
# kill -9 3979
NOTE: Replace 3979 with the PID of the orphaned monitor you found.