Article ID: 119207, created on Dec 18, 2013, last review on Jul 28, 2015

  • Applies to:
  • Virtuozzo

The Parallels Cloud Storage High Availability cluster is controlled by the shamand service. Its behavior is defined by a configuration that can be viewed with the following command:

# shaman -c $CLUSTERNAME get-config

A real-life example:

# shaman -c pcs1 get-config
WATCHDOG_ACTION=netfilter, reboot

To change the settings, run the set-config command, e.g.:

# shaman -c pcs1 set-config WATCHDOG_ACTION=crash
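
You can then display the configuration again to verify that the new value has been applied:

# shaman -c pcs1 get-config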

Description of the available settings


    Sets the timeout for shaman-monitor operations (e.g., electing a new master or deciding that a slave node is down). This helps you avoid situations where shaman-monitor performs a cluster-related operation just because someone pulled out the network cable for a couple of seconds (up to LOCK_TIMEOUT/2). Keep this parameter in mind when deciding on the other shaman-monitor timeouts (see their descriptions below), because the value of LOCK_TIMEOUT is always added to the value of the other timeouts. The default value is 60 seconds.
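
    For example, assuming the same cluster name as in the examples above, this timeout could be raised to 90 seconds (the value is illustrative only):

    # shaman -c pcs1 set-config LOCK_TIMEOUT=90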

    Sets the timeout for electing a new master node when the original master node or the shaman-monitor daemon fails, or high availability support gets disabled. The default value is 10 seconds.

    Sets the timeout after which the master node will consider a slave node dead if the node or the shaman-monitor daemon on it goes down, or high availability support gets disabled. The default value is 10 seconds.
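
    Note that, with the default values above, a failed slave node is therefore declared dead roughly LOCK_TIMEOUT + 10 = 60 + 10 = 70 seconds after it goes down, since LOCK_TIMEOUT is always added to the other timeouts.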

    Defines the action to perform if shaman-monitor loses its connection to the cluster. This may happen when a node goes online after having been disconnected from the network for more than LOCK_TIMEOUT seconds. In this case, the watchdog timer has not expired yet, but the cluster is already unavailable to the node, because the master node has prohibited access to the cluster until the node is rebooted. Available values are "crash", "halt", "reboot", and "none" (do nothing). The default action is "reboot".

    Defines the action to perform when shaman-monitor detects that the cluster mount point is no longer functioning properly for some reason. The supported actions are: crash, halt, reboot, none.

    Sets the threshold for the number of simultaneously crashed nodes. If the number of simultaneously crashed nodes becomes greater than or equal to the threshold, the master stops relocating resources from the crashed nodes. When the number of simultaneously crashed nodes drops below the threshold, the master automatically resumes relocating resources from the crashed nodes. The threshold can be useful when multiple nodes are being rebooted at the same time. Without it, the master would start relocating resources from all the rebooting nodes. The threshold is set to 3 by default and must be 2 or greater. For clusters with only 3 nodes, the threshold is automatically set to 2.
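
    For example, with the default threshold of 3, if three or more nodes go down at the same time (say, during a simultaneous reboot of several nodes), the master skips resource relocation entirely until no more than two nodes remain down, at which point relocation resumes automatically.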


    Sets the interval at which shaman-monitor checks the Pool for resources scheduled for relocation. The default value is 30 seconds.

    Defines a sequence of algorithms (modes) used for resource relocation on hardware node failure. At least one mode must be specified. Multiple modes must be separated with commas. On hardware node failure, relocation using the first specified mode is attempted. If unsuccessful, the next specified mode is attempted, and so on. If relocation using the last specified mode is unsuccessful, the resources are left on the failed hardware node.

    The following resource relocation modes are supported:

    * round-robin - Each resource from the failed hardware node is relocated to another node, which is chosen using the round-robin algorithm. In general, resources are relocated to different hardware nodes.
    * spare - All resources from the failed hardware node are relocated to a spare node. A spare node is a hardware node that is registered in the cluster and has no resources stored on it.
    * drs - All resources from the failed hardware node are relocated using an external DRS daemon.

    The default mode is "drs".
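
    For example, if the relocation mode is configured as "drs, round-robin", shaman-monitor first asks the external DRS daemon to place the resources of the failed node; if that fails, it falls back to round-robin relocation; and if that also fails, the resources stay on the failed node.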


    Sets the interval for the watchdog timer. The watchdog timer is responsible for performing the action defined in WATCHDOG_ACTION if shaman-monitor crashes or hangs.
    shaman-monitor activates the watchdog timer on start-up and periodically resets it to the specified value. If something goes wrong with shaman-monitor and it fails to reset the timer, the watchdog timer counts down to zero and performs the defined action. Setting the interval to zero disables the watchdog timer.
    The minimum watchdog timer interval that can be set is 10 seconds. The default value is 120 seconds.
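
    With the defaults, this means that if shaman-monitor hangs and stops resetting the timer, the actions listed in WATCHDOG_ACTION are triggered roughly 120 seconds later.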

    Defines a sequence of actions to perform after the watchdog timer expires (which happens when shaman-monitor crashes or hangs). When the watchdog timer expires, the first specified action is attempted. If unsuccessful, the next specified action is attempted, and so on. If the last specified action is unsuccessful, the action specified in the /sys/kernel/watchdog_action file is performed. At least one action must be specified. Multiple actions must be separated with commas.

    Available actions are listed in the /sys/kernel/watchdog_available_actions file:

    # cat /sys/kernel/watchdog_available_actions
    crash reboot halt netfilter

    The default sequence is "netfilter, reboot".
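
    To change the sequence, pass it to the set-config command, quoting the value if it contains spaces; for example, using the example cluster name from above:

    # shaman -c pcs1 set-config "WATCHDOG_ACTION=crash, reboot"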

Search Words

shaman not relocating

nodes had crashed, the limit is - skipping resource relocation for now

best practice update

shaman multiple ips

Cloud Storage High Availability

POA VM migration