Virtual Machines and Containers on Parallels Cloud Storage experience significant I/O performance degradation.
How can the cause of the slowness be identified?
To monitor I/O, run the pstorage top utility and switch to the CS view by pressing 'c'. This opens a detailed view with many columns, each showing a different parameter. The view can also be toggled by pressing 'i'. In case of I/O issues, check the following columns first:
- REPLICAS - Number of replicas stored on the chunk server.
- IOWAIT - Percentage of time spent waiting for I/O operations to be served.
- IOLAT - Average/maximum time, in milliseconds, the client needed to complete a single I/O operation during the last 20 seconds.
- QDEPTH - Average chunk server I/O queue depth.
The higher these values are, the worse the overall performance of the cluster. A chunk server whose values stand out from the others is the first candidate to investigate.
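The comparison between chunk servers can be sketched in a few lines of Python. This is only an illustration, not part of pstorage: the function name, the 2x-median threshold, and the sample readings are all assumptions, and the values are expected to be copied by hand from the CS view (for IOLAT, the average part of the average/maximum pair).

```python
from statistics import median

def find_slow_chunk_servers(metrics, factor=2.0):
    """Return {cs_id: [columns]} for servers whose value exceeds
    factor x the median of that column across all servers.
    The factor of 2.0 is an arbitrary assumption, not a PCS recommendation."""
    flagged = {}
    for column in ("IOWAIT", "IOLAT", "QDEPTH"):
        baseline = median(row[column] for row in metrics.values())
        for cs_id, row in metrics.items():
            if baseline > 0 and row[column] > factor * baseline:
                flagged.setdefault(cs_id, []).append(column)
    return flagged

# Hypothetical readings typed in by hand from the CS view (not real output).
sample = {
    1025: {"IOWAIT": 4.0, "IOLAT": 12.0, "QDEPTH": 1.1},
    1026: {"IOWAIT": 5.0, "IOLAT": 15.0, "QDEPTH": 1.3},
    1027: {"IOWAIT": 38.0, "IOLAT": 140.0, "QDEPTH": 9.7},
}

print(find_slow_chunk_servers(sample))
```

With these made-up numbers only chunk server 1027 is reported, for all three columns. Comparing against the median rather than a fixed limit keeps the check independent of the absolute load on the cluster.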
Check the Monitoring Chunk Servers documentation section for more details.
Check the Parallels Cloud Storage I/O Benchmarking Guide to get familiar with PCS at_io_iops, an I/O benchmarking tool.
As a possible I/O optimization, it is advised to use SSD drives for storing the Chunk Server journal on each CS host. For more details, check the Using SSD drives section of the documentation.
Real-life example
1) The Cloud Storage had chunk replicas evenly balanced across 4 chunk servers.
2) One of the servers crashed, which triggered relocation of the replicas from the failed server to the remaining 3 servers.
3) One of the 3 servers did not have SSD caching configured.
4) After the relocation of replicas, the slower CS received a larger number of replicas and couldn't handle the load. Its IOWAIT, IOLAT, and QDEPTH values rose much higher than those observed on the other servers.
5) The performance degradation affected all the Client Servers: VMs and CTs started to underperform drastically.
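The extra load in this scenario can be estimated with simple arithmetic. The sketch below assumes the replicas of the failed server are spread evenly over the survivors, which a real cluster does not guarantee. The server names and the replica count of 3000 per server are hypothetical.

```python
def redistribute(replicas, failed):
    """Spread the failed server's replicas evenly across the survivors.
    Even spreading is an assumption; real placement may differ."""
    survivors = {cs: n for cs, n in replicas.items() if cs != failed}
    share, extra = divmod(replicas[failed], len(survivors))
    result = {}
    for i, (cs, n) in enumerate(sorted(survivors.items())):
        # The first `extra` servers take one more replica to cover the remainder.
        result[cs] = n + share + (1 if i < extra else 0)
    return result

# Hypothetical balanced cluster: 4 chunk servers, 3000 replicas each.
before = {"cs1": 3000, "cs2": 3000, "cs3": 3000, "cs4": 3000}
after = redistribute(before, failed="cs4")

# Relative growth per surviving server (4000 / 3000 in this example).
growth = {cs: after[cs] / before[cs] for cs in after}
print(after, growth)
```

Under these assumptions each surviving server ends up with about a third more replicas than before. A server without SSD caching has to absorb the same increase as the others, which is consistent with it becoming the bottleneck in step 4.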