Article ID: 117631, created on Oct 1, 2013, last review on Aug 13, 2014

  • Applies to:
  • Virtuozzo

Synopsis

This video explains how to prevent and handle Parallels Cloud Storage issues through troubleshooting and best practices.

This video is part of the Parallels Cloud Storage course.

Video

Article

Tools

~# pstorage -c clustername top

Reading Event Logs

See the descriptions of the most common errors in the Parallels Cloud Storage user guide.

Out of Disk Space

Symptoms

Clients do not return any errors, but rather freeze the application performing the I/O.

Applications running in Containers may get stuck in the D state, while applications in virtual machines switch to the "frozen" state to avoid guest OS I/O stack timeouts and unrecoverable issues.

Cause

The cluster is out of disk space. This behavior was implemented to avoid file system corruption and to let I/O resume seamlessly once additional disk space is added to the cluster.

Resolution

Add more chunk servers to the cluster, or decrease the current replication parameters to remove some replicas from the existing chunk servers.
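To estimate how much raw space a lower replication level would release, the arithmetic can be sketched as follows. This is only a rough illustration: it assumes every chunk is stored in exactly the configured number of replicas and ignores metadata overhead, and the function names and sample figures are hypothetical.

```shell
#!/bin/sh
# Raw space consumed on chunk servers = logical data size x number of replicas.
# Assumption: every chunk has exactly the configured number of replicas.
raw_usage_gb() {
    logical_gb=$1
    replicas=$2
    echo $((logical_gb * replicas))
}

# Raw space released when the replica count drops from $2 to $3
# for $1 GB of logical data.
freed_gb() {
    echo $(( $(raw_usage_gb "$1" "$2") - $(raw_usage_gb "$1" "$3") ))
}

# Example: 10000 GB of logical data, going from 3 replicas to 2.
freed_gb 10000 3 2    # prints 10000
```

The replication level itself is changed with the pstorage utility; the user guide documents the exact command.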

Low Write Performance

Symptoms

Cluster write performance is low.

Cause

Some network adapters, like RTL8111/8168B, are known to fail to deliver full-bandwidth, full-duplex network traffic. This can result in poor write performance.

Resolution

Before deploying a Parallels Cloud Storage cluster, it is highly recommended to test the network for full-duplex support. You can use the netperf utility to generate incoming and outgoing traffic simultaneously. For example, a 1 GbE network should consistently deliver about 2 Gbit/s of total traffic (1 Gbit/s incoming and 1 Gbit/s outgoing).
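One possible way to run such a test is sketched below. It assumes that netserver is already running on a second node, that the peer address is passed in by the caller, and that a result counts as acceptable when the combined throughput reaches 90% of twice the link speed. The function names and the 90% margin are choices made for this sketch, not values from the product documentation.

```shell
#!/bin/sh
# Run one outgoing (TCP_STREAM) and one incoming (TCP_MAERTS) netperf stream
# at the same time and print both throughputs in Mbit/s.
# Assumptions: netserver is already running on the peer ($1);
# -P 0 suppresses the banner, so the throughput is the last field printed.
duplex_test() {
    peer=$1
    secs=${2:-60}
    out_file=$(mktemp)
    in_file=$(mktemp)
    netperf -H "$peer" -t TCP_STREAM -l "$secs" -P 0 -f m > "$out_file" &
    netperf -H "$peer" -t TCP_MAERTS -l "$secs" -P 0 -f m > "$in_file" &
    wait
    out_mbit=$(awk 'NF { v = $NF } END { print v }' "$out_file")
    in_mbit=$(awk 'NF { v = $NF } END { print v }' "$in_file")
    rm -f "$out_file" "$in_file"
    echo "$out_mbit $in_mbit"
}

# Succeeds if outgoing + incoming throughput reaches at least 90% of twice
# the link speed (all values in Mbit/s). The 90% margin is arbitrary.
duplex_ok() {
    awk -v o="$1" -v i="$2" -v link="$3" \
        'BEGIN { exit !((o + i) >= 0.9 * 2 * link) }'
}

# Usage (10.0.0.2 is a placeholder address):
#   set -- $(duplex_test 10.0.0.2 60)
#   duplex_ok "$1" "$2" 1000 && echo "full duplex OK" || echo "link underperforms"
```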

Low Disk I/O Performance

Symptoms

Cluster write performance is low because chunk servers have very high I/O latency.

Cause

In most BIOS setups, AHCI mode is configured to work by default with the Legacy option enabled. With this option, your servers work with SATA devices via the legacy IDE mode, which may affect the cluster performance, making it up to 2 times slower than expected.

Resolution

Check disk PIO modes:

~# hdparm -i /dev/sda | grep PIO
PIO modes: pio0 pio1 pio2 pio3 *pio4

The asterisk before pio4 in the PIO modes field indicates that your hard drive /dev/sda is currently operating in the legacy mode. To solve this problem and maximize the cluster performance, always enable the AHCI option in your BIOS settings.
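The same check can be repeated across all disks of a chunk server. The sketch below looks for an asterisk in front of a PIO mode in the hdparm output, which is the indicator this article describes; the function names and the /dev/sd? device pattern are assumptions made for illustration.

```shell
#!/bin/sh
# Reads `hdparm -i` output on stdin and succeeds if a PIO mode is flagged
# with an asterisk (for example *pio4).
pio_flagged() {
    grep 'PIO modes:' | grep -q '\*pio[0-9]'
}

# Check every /dev/sd? block device; needs root and hdparm installed.
scan_disks() {
    for dev in /dev/sd?; do
        [ -b "$dev" ] || continue
        if hdparm -i "$dev" 2>/dev/null | pio_flagged; then
            echo "$dev: legacy (PIO) mode, enable AHCI in the BIOS"
        fi
    done
}

# Usage: run scan_disks as root on each chunk server.
```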

SSD Drives Ignore Data Flushing

Symptoms

Cluster data becomes corrupted after power outage.

Cause

Many desktop-grade SSD drives ignore disk flushes and mislead the operating system by reporting that data was written when it actually was not. Examples of drives known to be unsafe on data commits are the OCZ Vertex 3 and the Intel X25-E and X25-M G2.

Resolution

The third-generation Intel SSD drives (S3700 and 710 series) do not have this problem: they include a capacitor that provides backup power for flushing the drive cache when the power goes out. Use SSD drives with care, and only when you are sure that they are server-grade drives that obey "flush" rules. For more information on this issue, read the following article about PostgreSQL.

Hardware RAID and Disk Write Caches

Symptoms

Cluster data becomes corrupted after power outage.

Cause

Some 3ware RAID controllers do not disable disk write caches and do not send "flush" commands to disk drives. As a result, the file system may sometimes become corrupted even though the RAID controller itself has a battery.

Resolution

To solve this problem, disable write caches on all disk drives in the RAID. Also, make sure that your configuration is thoroughly tested for consistency before deploying a Parallels Cloud Storage cluster.

Cluster Discovery is Broken

Symptoms

Unable to connect to the cluster using the pstorage tool.

Cause

Cluster discovery services are misconfigured.

Resolution

Depending on the discovery mode you are using:

  • For DNS discovery
    • Make sure the DNS service on the authoritative server is running
    • Make sure the TXT, SRV and A records are correct and point to the existing metadata servers
    • Make sure the authoritative DNS server is specified in /etc/resolv.conf
  • For Zeroconf discovery
    • Restart the avahi-daemon service
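The /etc/resolv.conf check from the DNS discovery list can be scripted roughly as follows. The function names are hypothetical, and 192.0.2.53 is a placeholder for the address of your authoritative DNS server.

```shell
#!/bin/sh
# Print the name servers listed in a resolv.conf-style file
# (defaults to /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeeds if the server address in $1 is among them.
# $2 optionally names the file to inspect.
has_nameserver() {
    list_nameservers "${2:-/etc/resolv.conf}" | grep -qxF "$1"
}

# Usage (192.0.2.53 is a placeholder address):
#   has_nameserver 192.0.2.53 || echo "DNS server missing from /etc/resolv.conf"
```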

Cluster Cannot Create Enough Replicas

Symptoms

  • The cluster does not create enough replicas
  • There is enough disk space on the chunk servers
  • The chunk servers were cloned from a single image

Cause

The host IDs are not unique because the chunk servers were cloned from the same machine.

Resolution

Regenerate the host ID on each cloned chunk server:

~# /usr/bin/uuidgen -r | tr '-' ' ' | awk '{print $1$2$3}' > /etc/pstorage/host_id
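To confirm that the IDs are unique afterwards, the values from all chunk servers can be compared. The sketch below is one way to do it; the function name and the server names cs1 to cs3 are hypothetical, and it assumes SSH access to each chunk server.

```shell
#!/bin/sh
# Input on stdin: one "<server> <host_id>" pair per line.
# Prints each host ID that is used by more than one server.
duplicate_host_ids() {
    awk '{ print $2 }' | sort | uniq -d
}

# Collecting the pairs (cs1..cs3 are hypothetical host names):
#   for s in cs1 cs2 cs3; do
#       echo "$s $(ssh "$s" cat /etc/pstorage/host_id)"
#   done | duplicate_host_ids
```

No output means that every server reported a different host ID.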

Additional Information

Documentation Portal

Official product page
