Article ID: 123676, created on Nov 26, 2014, last review on Jan 8, 2016

  • Applies to: Virtuozzo 6.0


How to perform maintenance on a hardware node that is part of a Virtuozzo storage (Parallels Cloud Storage) cluster?

Below are the most frequent questions related to maintenance.


  1. How to install product updates to nodes of the cluster?

    The best practice for installing updates is to update the nodes one by one, checking and ensuring that all services are running after each node is updated. The process for each node is:

    • It is recommended to update the server holding the MDS master role after all other servers with the MDS role. For Virtuozzo 6.0.7 or earlier, the MDS master must be updated at the very end (the stat example at the end of this item shows one way to identify the master).

      Reason: MDS protocol compatibility is one-way (upward compatible). An older non-master MDS may fail to keep journal consistency with a newer master.

    • It is strongly recommended to install updates on only one server with the MDS/CS role at a time.

      Reason: for MDS servers, this maintains quorum; for CS servers, this preserves the minimum replication level and avoids performance degradation.

    • Once updates are installed and services are restarted, verify that all services are stable and that no service is crashing.
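
    For example, the overall cluster state can be checked after each node is updated with the stat command (CLUSTER_NAME is a placeholder). In its output, the MDS list typically marks the current master, which also helps to determine the update order described above:

      ~# pstorage -c CLUSTER_NAME stat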
  2. Is it necessary to remove roles from a server during maintenance?

    Before answering this question, it is necessary to clarify certain configuration assumptions:

    • Here we assume that the replication level is set to 3:2 (3 replicas normally, at least 2 required).
    • The replica storing policy (failure domains) is to store 1 replica per host with the CS role.
    • There are 5 or more MDS servers in the cluster.
    • No other node is disabled, powered off, or planned to be restarted in the near future.

    Note: The points above describe the minimum recommended cluster configuration.

    For a short maintenance period (up to 2 hours), it is safe to leave the system as is. For longer periods, it depends on the role and the amount of data to replicate:

    • The MDS service contains a relatively small amount of data (typically less than 10 GB, including the journal), so the time needed to recreate an MDS server is short.

      The recommendation is to drop the MDS role and create it on another server to conform to the recommended configuration guidelines (example commands are shown after this list).

    • The CS service manages large amounts of data (a few TB is not an extraordinary situation), and replicating that amount of data can take a long time.

      Removing a CS instance initiates replication of the stored data. If no other CS server is to be restarted or powered off, one server with the CS role can be taken offline without removing the CS instances it manages. Any additional node with the CS role should be turned off only after replication has fully completed.
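
    A minimal sketch of dropping and recreating an MDS role (MDS_ID, NEW_MDS_IP, and the repository path are placeholders; check the exact make-mds options against the documentation for your version):

      ~# pstorage -c CLUSTER_NAME rm-mds MDS_ID

    Then, on the server that will take over the role:

      ~# pstorage -c CLUSTER_NAME make-mds -a NEW_MDS_IP -r /pstorage/CLUSTER_NAME-mds
      ~# service pstorage-mdsd start

    Before taking another CS node offline, replication progress can be monitored with pstorage -c CLUSTER_NAME stat (or the interactive top view); wait until the output no longer reports chunks that need replication.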

  3. How to replace failed disk with CS role?

    The steps are provided in the documentation, Replacing Disks Used as Chunk Servers.

    The exception is the case when the disk is no longer recognized by the system and no data can be read from it. In this case, proceed as follows:

    • Stop the CS service, passing the mount point as an argument:

      ~# service pstorage-csd stop /pstorage/CLUSTER_NAME-CSN
    • Drop CS:

      ~# pstorage -c CLUSTER_NAME -f rm-cs CS_ID
    • Check that no process is holding the mount point, and terminate any processes found:

      ~# fuser -auv /pstorage/CLUSTER_NAME-CSN
    • Unmount the file system, then replace the disk, and so on:

      ~# umount /pstorage/CLUSTER_NAME-CSN
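
    Put together, a worked example with hypothetical values (a cluster named stor1, a CS with ID 1029 as reported by pstorage -c stor1 stat, mounted at /pstorage/stor1-cs2):

      ~# service pstorage-csd stop /pstorage/stor1-cs2
      ~# pstorage -c stor1 -f rm-cs 1029
      ~# fuser -auv /pstorage/stor1-cs2
      ~# umount /pstorage/stor1-cs2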
  4. SSD disk failed, how to replace the drive?

    It is recommended to periodically check whether the SSD is healthy.

    An SSD can be used with different cluster roles (for example, CS write journaling or client-side caching), so the instructions for SSD replacement vary depending on its usage.
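
    One common way to check SSD health is to read its S.M.A.R.T. data, for example with smartctl from the smartmontools package (the device name below is an example; the relevant wear attributes differ between vendors):

      ~# smartctl -A /dev/sdb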

  5. Is it possible to completely redeploy the cluster?

    This is possible both without downtime for virtual environments and, more simply, when virtual environments can be stopped.

  6. If network equipment needs maintenance, what should be done?

    If the maintenance is not related to the network segment dedicated to storage operations, then no additional action is needed.

    For the storage network, if there is no redundancy (e.g., only one interface is dedicated to the storage network, or all network communication goes through a single network switch), then all services on the Virtuozzo hosts should be stopped in the order defined in the documentation.

    Note: All virtual environments and iSCSI targets will be stopped.

    1. Stop the services that depend on the Virtuozzo storage mount, on all nodes:

      ~# for svc in pvapp pvaagent shaman parallels-server vz pstorage-iscsi; do service $svc stop; done
    2. Stop the Virtuozzo storage mount on all nodes:

      ~# service pstorage-fs stop

      Check and ensure that there is no "pstorage://" mount left at this point; terminate any processes holding the mount point and unmount the storage (a sample check is shown after this procedure).

    3. Stop services related to metadata functionality on nodes with this role installed:

      ~# service pstorage-mdsd stop
    4. Stop services related to chunk server functionality on nodes with this role installed:

      ~# service pstorage-csd stop
    5. Perform the necessary operations on the network equipment: switch replacement, firmware upgrade, etc.

    6. Start the metadata and chunk services again on all nodes with these roles installed:

      ~# for svc in pstorage-mdsd pstorage-csd; do service $svc start; done

      Check that the cluster works:

      ~# pstorage -c CLUSTER_NAME stat
    7. Start other services on all nodes:

      ~# for svc in pstorage-fs pstorage-iscsi vz parallels-server shaman pvaagent pvapp; do service $svc start; done

    At this point the cluster should be back online.
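
    For the check in step 2, one simple approach (a sketch; /pstorage/CLUSTER_NAME is the default client mount point) is:

      ~# mount | grep pstorage
      ~# fuser -auv /pstorage/CLUSTER_NAME

    If the mount is still listed, terminate the processes found and unmount it with umount /pstorage/CLUSTER_NAME.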

Search Words

stop pstorage node


CS 'blinking' and client issue

vm shutting down




