Article ID: 124846, created on Mar 12, 2015, last review on Jun 17, 2016

Applies to:

  • Virtuozzo 6.0

Symptoms

  1. The pstorage cluster is not operational:

        ~# pstorage -c $CLUSTER_NAME top
        12-03-15 02:40:19.647 Unable connect to cluster, timeout (30 sec) expired.
    
  2. Several nodes with the MDS role have been lost completely (for example, because of a hardware failure).

Cause

For a Parallels Cloud Storage cluster to function, a majority of its MDS servers must be up and running. This majority is the MDS quorum.

When the quorum is lost, all operations on the cluster are blocked.
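
As a rough illustration of the majority rule, the following shell sketch computes how many MDS servers must be running for a given cluster size. The function names mds_quorum and has_quorum are hypothetical helpers written for this illustration and are not part of the pstorage tooling. The formula assumes that "majority" means more than half of the configured MDS servers.

```shell
#!/bin/sh
# Hypothetical helpers (not part of pstorage) illustrating the majority rule.

# Print the smallest number of running MDS servers that is a majority
# of $1 configured MDS servers (integer division, then add one).
mds_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Exit status 0 if $2 running MDS servers out of $1 configured ones
# form a majority, non-zero otherwise.
has_quorum() {
    [ "$2" -ge "$(mds_quorum "$1")" ]
}

# Example values: 5 configured MDS servers, 2 still running.
total=5
alive=2
if has_quorum "$total" "$alive"; then
    echo "quorum OK: $alive of $total MDS servers running"
else
    echo "quorum lost: $alive of $total MDS servers running, $(mds_quorum "$total") required"
fi
```

With 5 configured MDS servers and only 2 still running, the sketch reports that the quorum is lost, because 3 running servers are required.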

Resolution

To restore the MDS quorum, remove the lost MDS servers from the cluster.

For example, a pstorage cluster had 5 nodes with MDS servers #1, #2, #3, #4 and #5, and MDS servers #3, #4 and #5 were completely lost.

To restore the quorum based on the working MDS servers:

  1. Stop the live MDS server on each healthy host:

    # service pstorage-mdsd stop
    

    IMPORTANT: The lost MDS nodes MUST be dropped on ALL healthy MDS servers in the cluster before any of them is started again!

  2. Remove the non-functioning MDS servers from the local MDS repository (run the commands on each healthy PCS host with the MDS role):

    # pstorage -c $CLUSTER_NAME configure-mds -r /$PATH_TO/$LOCAL_MDS/ -d $MDS_ID_3
    # pstorage -c $CLUSTER_NAME configure-mds -r /$PATH_TO/$LOCAL_MDS/ -d $MDS_ID_4
    # pstorage -c $CLUSTER_NAME configure-mds -r /$PATH_TO/$LOCAL_MDS/ -d $MDS_ID_5
    

    where:

    "-c" specifies the cluster name

    "-r" specifies the path to the local MDS repository to change

    "-d" drops the MDS node with the given MDS ID from the local repository

    The MDS IDs of the nodes that are no longer available can be found in the MDS log (/var/log/pstorage/$CLUSTER_NAME/mds-XXXXXXX/) on a surviving server:

    12-03-15 09:27:21.738 neigh: connect to  #3 [0x37acc51][192.168.12.3:2510] failed 113(No route to host)
    12-03-15 09:27:21.750 neigh: connect to  #4 [0x16dbc30][192.168.12.4:2510] failed  113(No route to host)
    12-03-15 09:27:21.753 neigh: connect to  #5 [0x98dgh54][192.168.12.5:2510] failed   113(No route to host)
    12-03-15 09:27:21.755 wd_cs_status_timer: not master

    NOTE: If you are not sure which MDS IDs belong to the lost servers, contact Parallels Technical Support for assistance.

  3. Start all MDS servers:

    # service pstorage-mdsd start
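
The drop commands from step 2 can be generated rather than typed by hand. The sketch below is a dry run: it only prints commands and never calls pstorage or the service script itself. It assumes that the number after "#" in the "neigh: connect to" log lines is the MDS ID expected by "-d", as in the example above. The function names lost_mds_ids and print_drop_commands are hypothetical and are not part of pstorage. A failed connection in the log does not by itself prove that an MDS is permanently lost, so review the printed list before running anything.

```shell
#!/bin/sh
# Dry-run sketch: prints the step 2 commands, does not execute them.
# lost_mds_ids and print_drop_commands are hypothetical helper names.

# Read MDS log lines on stdin and print the unique numeric IDs of peers
# whose connection attempts failed (assumes "#N" is the MDS ID).
lost_mds_ids() {
    sed -n 's/.*neigh: connect to *#\([0-9][0-9]*\) .*failed.*/\1/p' | sort -n -u
}

# Print one configure-mds drop command per MDS ID.
# Arguments: cluster name, local MDS repository path, then the IDs.
print_drop_commands() {
    cluster=$1
    repo=$2
    shift 2
    for id in "$@"; do
        printf '%s\n' "pstorage -c $cluster configure-mds -r $repo -d $id"
    done
}

# Demo with the log lines quoted in this article. The cluster name and
# repository path are left as the article's placeholders.
ids=$(lost_mds_ids <<'EOF'
12-03-15 09:27:21.738 neigh: connect to  #3 [0x37acc51][192.168.12.3:2510] failed 113(No route to host)
12-03-15 09:27:21.750 neigh: connect to  #4 [0x16dbc30][192.168.12.4:2510] failed  113(No route to host)
12-03-15 09:27:21.753 neigh: connect to  #5 [0x98dgh54][192.168.12.5:2510] failed   113(No route to host)
12-03-15 09:27:21.755 wd_cs_status_timer: not master
EOF
)
# $ids is intentionally unquoted so that each ID becomes its own argument.
print_drop_commands '$CLUSTER_NAME' '/$PATH_TO/$LOCAL_MDS/' $ids
```

The printed commands are meant to be run on every healthy MDS host while its MDS service is stopped. Starting the MDS servers (step 3) is deliberately left out of the sketch, because it must happen only after the drops are complete on all healthy hosts.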
    

Related topics

117662 Parallels Cloud Storage cluster does not work: Unable connect to cluster

117794 Cluster hangs : Unable connect to cluster, timeout (30 sec) expired

117122 Unable connect to cluster, timeout (30 sec) expired.
