Business Continuity

 

Veritas CFS Media Server Workloads

Sequential read I/O throughput test

 

Date: 18th September 2015

 

Colin Eldridge

Shrinivas Chandukar

 

What is the purpose of this document?

  • This initial document is designed to help set up a CFS environment for use as a media server solution.

  • The idea is to repeat the testing we have performed in this document using your own h/w environment.

  • This report is specific to sequential read I/O; it includes best practices and configuration recommendations.

  • This testing will identify the I/O bottlenecks in your h/w environment.

  • The testing will identify the maximum read I/O throughput that can be achieved from one node and the maximum read I/O throughput from all nodes combined, using your h/w environment.

  • This testing will identify the best stripe-width and number of columns for your VxVM volume.

  • This testing will identify the best file system read_ahead tuning for a sequential read I/O workload.

 

In summary:

  • This document attempts to explain how to set up a media server solution, including:

    • how to perform the tests

    • how to measure the I/O throughput

    • how to choose the correct VxVM volume configuration and achieve balanced I/O

    • how to identify the bottlenecks in the I/O path using your h/w environment

    • how to tune the file system read_ahead to balance the read I/O throughput across processes

  • You should then understand the capabilities of your h/w environment, including:

    • the maximum read I/O throughput that will be possible in the environment

    • the mechanism of balancing the I/O across the LUNs

    • the mechanism of balancing the read I/O throughput across active processes/threads

 

 

 

< 1. Hardware, DMP paths and volume configuration >

 

HOST side

  • Each node has a dual-port HBA card (so 2 active DMP paths to each LUN); each HBA port is connected to a different FC switch.

  • The theoretical maximum throughput per FC port on the HBA is     8Gbits/sec.

  • The theoretical maximum throughput per node (two FC ports) is   16Gbits/sec.

  • The theoretical maximum throughput for two nodes is therefore   32Gbits/sec.

  • In reality, during our testing the maximum throughput we could reach from one node was 12Gbits/sec.

  • In our 1-node testing the dual-port HBA therefore bottlenecked at approximately 12Gbits/sec (1.5 Gbytes/sec), so this is our approximate maximum throughput from one node.

FC Switch

  • Each switch is capable of 32Gbits/sec; there are two switches, so the total theoretical maximum throughput for both switches is 64Gbits/sec.

  • Each individual switch port is capable of 8Gbits/sec.

  • We are using 4 switch ports connected to HBA FC ports on the host nodes – this limits the maximum throughput at the switches to 32Gbits/sec (through both switches).

  • We are using 12 switch ports connected to the modular storage arrays.

Storage Array

  • We have 6 modular storage arrays.

  • We are using 2 ports from each storage array – each port has a theoretical maximum throughput of 4Gbits/sec.

  • We therefore have a total of 12 storage array connections to the two FC switches (6 connections to each switch).

  • The theoretical maximum throughput is therefore 48Gbits/sec for the storage arrays.

  • In our 2-node testing the combination of 6 storage arrays bottlenecked at approximately 20Gbits/sec (2.5 Gbytes/sec), so this is our approximate maximum throughput from both nodes.

 

# vxdmpadm listenclosure
ENCLR_NAME        ENCLR_TYPE     ENCLR_SNO            STATUS       ARRAY_TYPE     LUN_COUNT    FIRMWARE
=======================================================================================================
storagearray-0    STORAGEARRAY-  21000022a1035118     CONNECTED    A/A-A-STORAGE   4            1
storagearray-1    STORAGEARRAY-  21000022a1035119     CONNECTED    A/A-A-STORAGE   4            1
storagearray-2    STORAGEARRAY-  21000022a1035116     CONNECTED    A/A-A-STORAGE   4            1
storagearray-3    STORAGEARRAY-  21000022a1035117     CONNECTED    A/A-A-STORAGE   4            1
storagearray-4    STORAGEARRAY-  21000022a106c70a     CONNECTED    A/A-A-STORAGE   4            1
storagearray-5    STORAGEARRAY-  21000022a106c705     CONNECTED    A/A-A-STORAGE   4            1

 

[Figure: storage.jpg]

 

LUNs

  • Each modular array has 4 enclosures with 12 disks each; only 11 disks in each enclosure are used for the RAID-0 LUN.

  • Each LUN is comprised of 11 disks (an 11-way stripe) with a 64KB stripe width (the remaining disk in each enclosure is reserved as a spare in case of failure).

  • There are 4 LUNs per modular array; therefore we have a total of 24 LUNs.

  • Each LUN is approximately 3TB.

 

All 24 LUNs can be displayed using the “vxdisk list” command:

# vxdisk list
DEVICE            TYPE            DISK               GROUP        STATUS
storagearray-0_16 auto:cdsdisk    storagearray-0_16  testdg       online shared
storagearray-0_17 auto:cdsdisk    storagearray-0_17  testdg       online shared
storagearray-0_18 auto:cdsdisk    storagearray-0_18  testdg       online shared
storagearray-0_20 auto:cdsdisk    storagearray-0_20  testdg       online shared
storagearray-1_6  auto:cdsdisk    storagearray-1_6   testdg       online shared
storagearray-1_7  auto:cdsdisk    storagearray-1_7   testdg       online shared
storagearray-1_8  auto:cdsdisk    storagearray-1_8   testdg       online shared
storagearray-1_9  auto:cdsdisk    storagearray-1_9   testdg       online shared
storagearray-2_5  auto:cdsdisk    storagearray-2_5   testdg       online shared
storagearray-2_6  auto:cdsdisk    storagearray-2_6   testdg       online shared
storagearray-2_7  auto:cdsdisk    storagearray-2_7   testdg       online shared
storagearray-2_8  auto:cdsdisk    storagearray-2_8   testdg       online shared
storagearray-3_4  auto:cdsdisk    storagearray-3_4   testdg       online shared
storagearray-3_6  auto:cdsdisk    storagearray-3_6   testdg       online shared
storagearray-3_7  auto:cdsdisk    storagearray-3_7   testdg       online shared
storagearray-3_8  auto:cdsdisk    storagearray-3_8   testdg       online shared
storagearray-4_8  auto:cdsdisk    storagearray-4_8   testdg       online shared
storagearray-4_9  auto:cdsdisk    storagearray-4_9   testdg       online shared
storagearray-4_10 auto:cdsdisk    storagearray-4_10  testdg       online shared
storagearray-4_11 auto:cdsdisk    storagearray-4_11  testdg       online shared
storagearray-5_8  auto:cdsdisk    storagearray-5_8   testdg       online shared
storagearray-5_9  auto:cdsdisk    storagearray-5_9   testdg       online shared
storagearray-5_10 auto:cdsdisk    storagearray-5_10  testdg       online shared
storagearray-5_11 auto:cdsdisk    storagearray-5_11  testdg       online shared

 

DMP paths

  • There are 2 paths per LUN (on each node).

  • Both paths are active; therefore there are 48 active paths in total (on each node).

 

All 48 paths can be displayed using the “vxdisk path” command:

# vxdisk path
SUBPATH           DANAME               DMNAME            GROUP        STATE
sdad              storagearray-0_16    storagearray-0_16 testdg       ENABLED
sdo               storagearray-0_16    storagearray-0_16 testdg       ENABLED
sdab              storagearray-0_17    storagearray-0_17 testdg       ENABLED
sdm               storagearray-0_17    storagearray-0_17 testdg       ENABLED
sdae              storagearray-0_18    storagearray-0_18 testdg       ENABLED
sdp               storagearray-0_18    storagearray-0_18 testdg       ENABLED
sdac              storagearray-0_20    storagearray-0_20 testdg       ENABLED
sdn               storagearray-0_20    storagearray-0_20 testdg       ENABLED
sdx               storagearray-1_6     storagearray-1_6  testdg       ENABLED
sdan              storagearray-1_6     storagearray-1_6  testdg       ENABLED
sdaa              storagearray-1_7     storagearray-1_7  testdg       ENABLED
sdaq              storagearray-1_7     storagearray-1_7  testdg       ENABLED
sdz               storagearray-1_8     storagearray-1_8  testdg       ENABLED
sdap              storagearray-1_8     storagearray-1_8  testdg       ENABLED
sdy               storagearray-1_9     storagearray-1_9  testdg       ENABLED
sdao              storagearray-1_9     storagearray-1_9  testdg       ENABLED
sdat              storagearray-2_5     storagearray-2_5  testdg       ENABLED
sdw               storagearray-2_5     storagearray-2_5  testdg       ENABLED
sdar              storagearray-2_6     storagearray-2_6  testdg       ENABLED
sdu               storagearray-2_6     storagearray-2_6  testdg       ENABLED
sdas              storagearray-2_7     storagearray-2_7  testdg       ENABLED
sdv               storagearray-2_7     storagearray-2_7  testdg       ENABLED
sdaz              storagearray-2_8     storagearray-2_8  testdg       ENABLED
sday              storagearray-2_8     storagearray-2_8  testdg       ENABLED
sdq               storagearray-3_4     storagearray-3_4  testdg       ENABLED
sdau              storagearray-3_4     storagearray-3_4  testdg       ENABLED
sds               storagearray-3_6     storagearray-3_6  testdg       ENABLED
sdaw              storagearray-3_6     storagearray-3_6  testdg       ENABLED
sdav              storagearray-3_7     storagearray-3_7  testdg       ENABLED
sdr               storagearray-3_7     storagearray-3_7  testdg       ENABLED
sdax              storagearray-3_8     storagearray-3_8  testdg       ENABLED
sdt               storagearray-3_8     storagearray-3_8  testdg       ENABLED
sdaf              storagearray-4_8     storagearray-4_8  testdg       ENABLED
sdi               storagearray-4_8     storagearray-4_8  testdg       ENABLED
sdag              storagearray-4_9     storagearray-4_9  testdg       ENABLED
sdj               storagearray-4_9     storagearray-4_9  testdg       ENABLED
sdl               storagearray-4_10    storagearray-4_10 testdg       ENABLED
sdai              storagearray-4_10    storagearray-4_10 testdg       ENABLED
sdk               storagearray-4_11    storagearray-4_11 testdg       ENABLED
sdah              storagearray-4_11    storagearray-4_11 testdg       ENABLED
sdh               storagearray-5_8     storagearray-5_8  testdg       ENABLED
sdam              storagearray-5_8     storagearray-5_8  testdg       ENABLED
sdg               storagearray-5_9     storagearray-5_9  testdg       ENABLED
sdal              storagearray-5_9     storagearray-5_9  testdg       ENABLED
sde               storagearray-5_10    storagearray-5_10 testdg       ENABLED
sdaj              storagearray-5_10    storagearray-5_10 testdg       ENABLED
sdf               storagearray-5_11    storagearray-5_11 testdg       ENABLED
sdak              storagearray-5_11    storagearray-5_11 testdg       ENABLED

 

VxVM volume

  • The idea is to achieve balanced I/O across all the LUNs and to maximise the h/w I/O bandwidth.

  • As we have 24 LUNs available we created our VxVM volume with 24 columns to obtain the maximum possible throughput.

  • We then tested using three different VxVM stripe unit widths: 64KB, 512KB and 1024KB.

  • The “stripewidth” argument to the vxassist command is in units of 512-byte sectors (so stripewidth=128 gives a 64KB stripe width).
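That conversion can be sketched with plain shell arithmetic (kb_to_sectors is a hypothetical helper, not part of the VxVM tooling):

```shell
# Convert a stripe width in KB to the 512-byte sector count expected
# by vxassist's stripewidth= argument.
kb_to_sectors() {
  echo $(( $1 * 1024 / 512 ))
}
kb_to_sectors 64     # -> 128  (64KB stripe width)
kb_to_sectors 512    # -> 1024 (512KB stripe width)
kb_to_sectors 1024   # -> 2048 (1024KB stripe width)
```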

 

Volume configuration using a 64KB stripe width, 24 columns:

# vxassist -g testdg make vol1 50T layout=striped stripewidth=128 `vxdisk list|grep storage|awk '{print $1}'`

v  vol1         -            ENABLED  ACTIVE   107374182400 SELECT vol1-01 fsgen
pl vol1-01      vol1         ENABLED  ACTIVE   107374184448 STRIPE 24/128  RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473924352 0/0  storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473924352 1/0  storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473924352 2/0  storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473924352 3/0  storagearray-0_20 ENA
sd storagearray-1_6-01 vol1-01  storagearray-1_6 0  4473924352 4/0  storagearray-1_6  ENA
sd storagearray-1_7-01 vol1-01  storagearray-1_7 0  4473924352 5/0  storagearray-1_7  ENA
sd storagearray-1_8-01 vol1-01  storagearray-1_8 0  4473924352 6/0  storagearray-1_8  ENA
sd storagearray-1_9-01 vol1-01  storagearray-1_9 0  4473924352 7/0  storagearray-1_9  ENA
sd storagearray-2_5-01 vol1-01  storagearray-2_5 0  4473924352 8/0  storagearray-2_5  ENA
sd storagearray-2_6-01 vol1-01  storagearray-2_6 0  4473924352 9/0  storagearray-2_6  ENA
sd storagearray-2_7-01 vol1-01  storagearray-2_7 0  4473924352 10/0 storagearray-2_7  ENA
sd storagearray-2_8-01 vol1-01  storagearray-2_8 0  4473924352 11/0 storagearray-2_8  ENA
sd storagearray-3_4-01 vol1-01  storagearray-3_4 0  4473924352 12/0 storagearray-3_4  ENA
sd storagearray-3_6-01 vol1-01  storagearray-3_6 0  4473924352 13/0 storagearray-3_6  ENA
sd storagearray-3_7-01 vol1-01  storagearray-3_7 0  4473924352 14/0 storagearray-3_7  ENA
sd storagearray-3_8-01 vol1-01  storagearray-3_8 0  4473924352 15/0 storagearray-3_8  ENA
sd storagearray-4_8-01 vol1-01  storagearray-4_8 0  4473924352 16/0 storagearray-4_8  ENA
sd storagearray-4_9-01 vol1-01  storagearray-4_9 0  4473924352 17/0 storagearray-4_9  ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473924352 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473924352 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01 vol1-01  storagearray-5_8 0  4473924352 20/0 storagearray-5_8  ENA
sd storagearray-5_9-01 vol1-01  storagearray-5_9 0  4473924352 21/0 storagearray-5_9  ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473924352 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473924352 23/0 storagearray-5_11 ENA

 

Volume configuration using a 512KB stripe width, 24 columns:

# vxassist -g testdg make vol1 50T layout=striped stripewidth=1024 `vxdisk list|grep storage|awk '{print $1}'`

v  vol1         -            ENABLED  ACTIVE   107374182400 SELECT vol1-01 fsgen
pl vol1-01      vol1         ENABLED  ACTIVE   107374190592 STRIPE 24/1024 RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473924608 0/0  storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473924608 1/0  storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473924608 2/0  storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473924608 3/0  storagearray-0_20 ENA
sd storagearray-1_6-01 vol1-01  storagearray-1_6 0  4473924608 4/0  storagearray-1_6  ENA
sd storagearray-1_7-01 vol1-01  storagearray-1_7 0  4473924608 5/0  storagearray-1_7  ENA
sd storagearray-1_8-01 vol1-01  storagearray-1_8 0  4473924608 6/0  storagearray-1_8  ENA
sd storagearray-1_9-01 vol1-01  storagearray-1_9 0  4473924608 7/0  storagearray-1_9  ENA
sd storagearray-2_5-01 vol1-01  storagearray-2_5 0  4473924608 8/0  storagearray-2_5  ENA
sd storagearray-2_6-01 vol1-01  storagearray-2_6 0  4473924608 9/0  storagearray-2_6  ENA
sd storagearray-2_7-01 vol1-01  storagearray-2_7 0  4473924608 10/0 storagearray-2_7  ENA
sd storagearray-2_8-01 vol1-01  storagearray-2_8 0  4473924608 11/0 storagearray-2_8  ENA
sd storagearray-3_4-01 vol1-01  storagearray-3_4 0  4473924608 12/0 storagearray-3_4  ENA
sd storagearray-3_6-01 vol1-01  storagearray-3_6 0  4473924608 13/0 storagearray-3_6  ENA
sd storagearray-3_7-01 vol1-01  storagearray-3_7 0  4473924608 14/0 storagearray-3_7  ENA
sd storagearray-3_8-01 vol1-01  storagearray-3_8 0  4473924608 15/0 storagearray-3_8  ENA
sd storagearray-4_8-01 vol1-01  storagearray-4_8 0  4473924608 16/0 storagearray-4_8  ENA
sd storagearray-4_9-01 vol1-01  storagearray-4_9 0  4473924608 17/0 storagearray-4_9  ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473924608 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473924608 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01 vol1-01  storagearray-5_8 0  4473924608 20/0 storagearray-5_8  ENA
sd storagearray-5_9-01 vol1-01  storagearray-5_9 0  4473924608 21/0 storagearray-5_9  ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473924608 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473924608 23/0 storagearray-5_11 ENA

 

Volume configuration using a 1024KB stripe width, 24 columns:

# vxassist -g testdg make vol1 50T layout=striped stripewidth=2048 `vxdisk list|grep storage|awk '{print $1}'`

v  vol1         -            ENABLED  ACTIVE   107374182400 SELECT vol1-01 fsgen
pl vol1-01      vol1         ENABLED  ACTIVE   107374215168 STRIPE 24/2048 RW
sd storagearray-0_16-01 vol1-01 storagearray-0_16 0 4473925632 0/0  storagearray-0_16 ENA
sd storagearray-0_17-01 vol1-01 storagearray-0_17 0 4473925632 1/0  storagearray-0_17 ENA
sd storagearray-0_18-01 vol1-01 storagearray-0_18 0 4473925632 2/0  storagearray-0_18 ENA
sd storagearray-0_20-01 vol1-01 storagearray-0_20 0 4473925632 3/0  storagearray-0_20 ENA
sd storagearray-1_6-01 vol1-01  storagearray-1_6 0  4473925632 4/0  storagearray-1_6  ENA
sd storagearray-1_7-01 vol1-01  storagearray-1_7 0  4473925632 5/0  storagearray-1_7  ENA
sd storagearray-1_8-01 vol1-01  storagearray-1_8 0  4473925632 6/0  storagearray-1_8  ENA
sd storagearray-1_9-01 vol1-01  storagearray-1_9 0  4473925632 7/0  storagearray-1_9  ENA
sd storagearray-2_5-01 vol1-01  storagearray-2_5 0  4473925632 8/0  storagearray-2_5  ENA
sd storagearray-2_6-01 vol1-01  storagearray-2_6 0  4473925632 9/0  storagearray-2_6  ENA
sd storagearray-2_7-01 vol1-01  storagearray-2_7 0  4473925632 10/0 storagearray-2_7  ENA
sd storagearray-2_8-01 vol1-01  storagearray-2_8 0  4473925632 11/0 storagearray-2_8  ENA
sd storagearray-3_4-01 vol1-01  storagearray-3_4 0  4473925632 12/0 storagearray-3_4  ENA
sd storagearray-3_6-01 vol1-01  storagearray-3_6 0  4473925632 13/0 storagearray-3_6  ENA
sd storagearray-3_7-01 vol1-01  storagearray-3_7 0  4473925632 14/0 storagearray-3_7  ENA
sd storagearray-3_8-01 vol1-01  storagearray-3_8 0  4473925632 15/0 storagearray-3_8  ENA
sd storagearray-4_8-01 vol1-01  storagearray-4_8 0  4473925632 16/0 storagearray-4_8  ENA
sd storagearray-4_9-01 vol1-01  storagearray-4_9 0  4473925632 17/0 storagearray-4_9  ENA
sd storagearray-4_10-01 vol1-01 storagearray-4_10 0 4473925632 18/0 storagearray-4_10 ENA
sd storagearray-4_11-01 vol1-01 storagearray-4_11 0 4473925632 19/0 storagearray-4_11 ENA
sd storagearray-5_8-01 vol1-01  storagearray-5_8 0  4473925632 20/0 storagearray-5_8  ENA
sd storagearray-5_9-01 vol1-01  storagearray-5_9 0  4473925632 21/0 storagearray-5_9  ENA
sd storagearray-5_10-01 vol1-01 storagearray-5_10 0 4473925632 22/0 storagearray-5_10 ENA
sd storagearray-5_11-01 vol1-01 storagearray-5_11 0 4473925632 23/0 storagearray-5_11 ENA

 

 

< 2. VxVM maximum disk I/O: Read throughput test execution >

 

Raw disk device read I/O test execution and collection of throughput results:

 

vxbench sequential read test execution method and result collection

An example of the vxbench command that we run on each node is below.

This test executes 64 parallel processes; each process reads from the same raw volume device using a block size of 1MB.

The output of the vxbench command provides the combined total throughput of all 64 parallel processes; we capture this information in our result table.
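Rather than typing the raw volume device path 64 times, the vxbench argument list can be generated; a sketch, assuming a POSIX shell with seq available:

```shell
# Build the 64-way argument list by repeating the raw volume device
# path once per reader process.
DEV=/dev/vx/rdsk/testdg/vol1
DEVS=$(for i in $(seq 1 64); do printf '%s ' "$DEV"; done)
echo "$DEVS" | wc -w
# ./vxbench -w read -i iosize=1024k,iotime=300,maxfilesize=40T $DEVS
```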

 

The result in this example test was       1577033.29 KBytes/second

Therefore the result of this test was     1.504           GBytes/second

 

Test: vxbench

IO : sequential read of raw volume

IOsize=1024K

VxVM volume stripe width 512KB, 24 columns

Processes: 64

 

$ ./vxbench -w read -i iosize=1024k,iotime=300,maxfilesize=40T  /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1 /dev/vx/rdsk/testdg/vol1

user   1:  300.015 sec  24625.94 KB/s  cpu:  0.75 sys   0.00 user
user   2:  300.024 sec  24498.95 KB/s  cpu:  0.75 sys   0.00 user
user   3:  300.004 sec  24667.86 KB/s  cpu:  0.75 sys   0.00 user
user   4:  300.020 sec  24417.35 KB/s  cpu:  0.75 sys   0.00 user
user   5:  300.016 sec  24574.65 KB/s  cpu:  0.74 sys   0.01 user
user   6:  300.012 sec  24615.97 KB/s  cpu:  0.74 sys   0.01 user
user   7:  300.029 sec  24689.68 KB/s  cpu:  0.76 sys   0.00 user
user   8:  300.023 sec  24587.75 KB/s  cpu:  0.75 sys   0.00 user
user   9:  300.032 sec  24668.98 KB/s  cpu:  0.76 sys   0.00 user
user  10:  300.024 sec  24795.84 KB/s  cpu:  0.76 sys   0.00 user
user  11:  300.033 sec  24546.01 KB/s  cpu:  0.75 sys   0.00 user
user  12:  300.024 sec  24761.75 KB/s  cpu:  0.76 sys   0.00 user
user  13:  300.028 sec  24543.02 KB/s  cpu:  0.76 sys   0.00 user
user  14:  300.014 sec  24591.96 KB/s  cpu:  0.75 sys   0.01 user
user  15:  300.013 sec  24568.13 KB/s  cpu:  0.75 sys   0.00 user
user  16:  300.037 sec  24624.20 KB/s  cpu:  0.75 sys   0.00 user
user  17:  300.018 sec  24734.97 KB/s  cpu:  0.76 sys   0.00 user
user  18:  300.003 sec  24596.26 KB/s  cpu:  0.76 sys   0.00 user
user  19:  300.004 sec  24886.31 KB/s  cpu:  0.77 sys   0.00 user
user  20:  300.007 sec  24879.24 KB/s  cpu:  0.76 sys   0.00 user
user  21:  300.017 sec  24434.71 KB/s  cpu:  0.75 sys   0.00 user
user  22:  300.027 sec  24437.31 KB/s  cpu:  0.76 sys   0.00 user
user  23:  300.019 sec  24635.87 KB/s  cpu:  0.75 sys   0.00 user
user  24:  300.028 sec  24665.88 KB/s  cpu:  0.76 sys   0.00 user
user  25:  300.021 sec  24519.64 KB/s  cpu:  0.75 sys   0.00 user
user  26:  300.022 sec  24587.85 KB/s  cpu:  0.76 sys   0.00 user
user  27:  300.006 sec  24647.22 KB/s  cpu:  0.77 sys   0.00 user
user  28:  300.019 sec  24666.62 KB/s  cpu:  0.76 sys   0.00 user
user  29:  300.006 sec  24544.82 KB/s  cpu:  0.76 sys   0.00 user
user  30:  300.022 sec  24625.35 KB/s  cpu:  0.75 sys   0.00 user
user  31:  300.021 sec  24649.38 KB/s  cpu:  0.75 sys   0.00 user
user  32:  300.016 sec  24701.01 KB/s  cpu:  0.76 sys   0.00 user
user  33:  300.018 sec  24683.74 KB/s  cpu:  0.75 sys   0.00 user
user  34:  300.018 sec  24738.38 KB/s  cpu:  0.77 sys   0.00 user
user  35:  300.001 sec  24599.78 KB/s  cpu:  0.75 sys   0.00 user
user  36:  300.008 sec  24674.30 KB/s  cpu:  0.76 sys   0.00 user
user  37:  300.024 sec  24580.86 KB/s  cpu:  0.75 sys   0.00 user
user  38:  300.023 sec  24628.71 KB/s  cpu:  0.75 sys   0.00 user
user  39:  300.007 sec  24701.75 KB/s  cpu:  0.77 sys   0.00 user
user  40:  300.026 sec  24765.01 KB/s  cpu:  0.76 sys   0.00 user
user  41:  300.007 sec  24824.63 KB/s  cpu:  0.76 sys   0.00 user
user  42:  300.015 sec  24707.90 KB/s  cpu:  0.78 sys   0.00 user
user  43:  300.032 sec  24587.01 KB/s  cpu:  0.76 sys   0.00 user
user  44:  300.027 sec  24700.06 KB/s  cpu:  0.78 sys   0.00 user
user  45:  300.019 sec  24584.70 KB/s  cpu:  0.77 sys   0.00 user
user  46:  300.013 sec  24745.56 KB/s  cpu:  0.78 sys   0.00 user
user  47:  300.033 sec  24556.21 KB/s  cpu:  0.77 sys   0.00 user
user  48:  300.012 sec  24728.58 KB/s  cpu:  0.77 sys   0.01 user
user  49:  300.010 sec  24489.82 KB/s  cpu:  0.76 sys   0.00 user
user  50:  300.020 sec  24751.83 KB/s  cpu:  0.76 sys   0.01 user
user  51:  300.035 sec  24846.13 KB/s  cpu:  0.77 sys   0.00 user
user  52:  300.012 sec  24639.83 KB/s  cpu:  0.75 sys   0.00 user
user  53:  300.010 sec  24691.24 KB/s  cpu:  0.77 sys   0.00 user
user  54:  300.029 sec  24686.29 KB/s  cpu:  0.77 sys   0.00 user
user  55:  300.021 sec  24608.41 KB/s  cpu:  0.77 sys   0.00 user
user  56:  300.027 sec  24440.67 KB/s  cpu:  0.77 sys   0.00 user
user  57:  300.017 sec  24700.92 KB/s  cpu:  0.77 sys   0.00 user
user  58:  300.026 sec  24645.57 KB/s  cpu:  0.77 sys   0.00 user
user  59:  300.004 sec  24442.54 KB/s  cpu:  0.76 sys   0.00 user
user  60:  300.011 sec  24749.21 KB/s  cpu:  0.77 sys   0.00 user
user  61:  300.006 sec  24865.61 KB/s  cpu:  0.77 sys   0.00 user
user  62:  300.023 sec  24468.29 KB/s  cpu:  0.75 sys   0.00 user
user  63:  300.023 sec  24662.87 KB/s  cpu:  0.77 sys   0.00 user
user  64:  300.017 sec  24646.26 KB/s  cpu:  0.76 sys   0.00 user

total:     300.037 sec  1577033.29 KB/s  cpu: 48.63 sys   0.05 user

 

iostat throughput data collection method and result collection

 

An example of the iostat command that we run on each node is below.

The sector size is 512 bytes.

Note that the average request size (avgrq-sz) is 1024; this is 1024 sectors × 512 bytes = a 512KB read I/O size.

 

The result in this example test is       3155251.20 sectors/second 

Therefore the result of the test is      1.504           GBytes/second

 

$ iostat -x 20

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     4.30    0.00    1.25     0.00    44.40    35.52     0.00    0.12   0.04   0.01
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdp               0.00     0.00   61.95    0.00 63436.80     0.00  1024.00     1.87   30.22  12.73  78.84
sdo               0.00     0.00   64.95    0.00 66508.80     0.00  1024.00     1.82   27.99  12.35  80.21
sdn               0.00     0.00   64.90    0.00 66457.60     0.00  1024.00     2.03   31.32  12.92  83.84
sds               0.00     0.00   64.90    0.00 66457.60     0.00  1024.00     2.05   31.60  12.76  82.79
sdt               0.00     0.00   64.75    0.00 66304.00     0.00  1024.00     2.16   33.38  12.47  80.75
sdq               0.00     0.00   63.20    0.00 64716.80     0.00  1024.00     1.88   29.71  12.37  78.20
sdr               0.00     0.00   64.75    0.00 66304.00     0.00  1024.00     2.05   31.67  12.28  79.48
sdx               0.00     0.00   62.40    0.00 63897.60     0.00  1024.00     2.05   32.81  13.16  82.10
sdz               0.00     0.00   66.60    0.00 68198.40     0.00  1024.00     2.37   35.60  12.27  81.70
sdy               0.00     0.00   64.65    0.00 66201.60     0.00  1024.00     2.37   36.69  12.89  83.35
sdaa              0.00     0.00   62.15    0.00 63641.60     0.00  1024.00     2.17   34.92  13.56  84.25
sdm               0.00     0.00   63.90    0.00 65433.60     0.00  1024.00     1.96   30.63  13.05  83.36
sdv               0.00     0.00   62.75    0.00 64256.00     0.00  1024.00     2.26   36.05  12.86  80.72
sdu               0.00     0.00   64.15    0.00 65689.60     0.00  1024.00     2.32   36.25  13.08  83.93
sdw               0.00     0.00   66.45    0.00 68044.80     0.00  1024.00     2.25   33.89  12.47  82.86
sdg               0.00     0.00   62.80    0.00 64307.20     0.00  1024.00     2.39   38.07  13.43  84.37
sdj               0.00     0.00   64.45    0.00 65996.80     0.00  1024.00     2.14   33.04  13.31  85.81
sdi               0.00     0.00   65.45    0.00 67020.80     0.00  1024.00     2.02   30.76  12.47  81.65
sdl               0.00     0.00   64.05    0.00 65587.20     0.00  1024.00     2.10   32.67  12.64  80.97
sdk               0.00     0.00   64.95    0.00 66508.80     0.00  1024.00     2.28   34.90  13.01  84.53
sdf               0.00     0.00   65.35    0.00 66918.40     0.00  1024.00     2.59   39.63  12.90  84.31
sde               0.00     0.00   62.45    0.00 63948.80     0.00  1024.00     2.38   38.16  12.90  80.53
sdh               0.00     0.00   65.25    0.00 66816.00     0.00  1024.00     2.43   37.21  12.73  83.05
sdab              0.00     0.00   64.30    0.00 65843.20     0.00  1024.00     2.15   33.46  13.61  87.53
sdac              0.00     0.00   63.55    0.00 65075.20     0.00  1024.00     2.23   35.19  13.48  85.69
sdad              0.00     0.00   63.20    0.00 64716.80     0.00  1024.00     2.01   31.87  13.19  83.39
sdae              0.00     0.00   66.40    0.00 67993.60     0.00  1024.00     2.25   33.89  12.58  83.50
sdaf              0.00     0.00   62.90    0.00 64409.60     0.00  1024.00     1.89   30.10  12.73  80.09
sdag              0.00     0.00   63.95    0.00 65484.80     0.00  1024.00     2.08   32.68  13.12  83.93
sdah              0.00     0.00   63.25    0.00 64768.00     0.00  1024.00     2.18   34.56  13.00  82.22
sdai              0.00     0.00   64.20    0.00 65740.80     0.00  1024.00     2.07   32.21  12.40  79.60
sdaj              0.00     0.00   65.70    0.00 67276.80     0.00  1024.00     2.44   37.07  12.33  81.03
sdak              0.00     0.00   62.75    0.00 64256.00     0.00  1024.00     2.44   38.80  13.33  83.64
sdal              0.00     0.00   65.40    0.00 66969.60     0.00  1024.00     2.41   36.79  13.08  85.56
sdam              0.00     0.00   62.90    0.00 64409.60     0.00  1024.00     2.26   35.92  12.97  81.59
sdan              0.00     0.00   66.00    0.00 67584.00     0.00  1024.00     2.13   32.23  12.36  81.59
sdao              0.00     0.00   64.05    0.00 65587.20     0.00  1024.00     2.28   35.60  13.04  83.54
sdap              0.00     0.00   62.05    0.00 63539.20     0.00  1024.00     2.15   34.65  12.77  79.26
sdaq              0.00     0.00   66.30    0.00 67891.20     0.00  1024.00     2.27   34.21  12.95  85.87
sdar              0.00     0.00   64.40    0.00 65945.60     0.00  1024.00     2.17   33.60  13.43  86.51
sdas              0.00     0.00   65.80    0.00 67379.20     0.00  1024.00     2.24   33.90  12.42  81.74
sdat              0.00     0.00   62.30    0.00 63795.20     0.00  1024.00     1.98   31.74  13.24  82.46
sdau              0.00     0.00   65.25    0.00 66816.00     0.00  1024.00     2.00   30.52  12.21  79.66
sdav              0.00     0.00   63.75    0.00 65280.00     0.00  1024.00     2.05   32.12  12.43  79.27
sdaw              0.00     0.00   63.55    0.00 65075.20     0.00  1024.00     2.11   33.08  13.10  83.22
sdax              0.00     0.00   63.80    0.00 65331.20     0.00  1024.00     2.21   34.69  12.86  82.07
sday              0.00     0.00   63.55    0.00 65075.20     0.00  1024.00     2.39   37.70  13.13  83.41
sdaz              0.00     0.00   64.95    0.00 66508.80     0.00  1024.00     2.35   36.08  13.09  85.01
VxVM59000         0.00     0.00 3081.30    0.00 3155251.20   0.00  1024.00   104.68   33.97   0.32 100.00

 

 

vxstat throughput data collection method and result collection

 

An example of the vxstat command that we run on each node is below.

The blocks in the vxstat output are in units of sectors, so the block size is 512 bytes.

Note that ‘blocks read / operations read’ gives the average I/O size:   

 

2624512 BLOCKS READ / 2563 OPERATIONS READ / 2 = 512KB avg. read I/O size             

 

The result in this example test is          63109120 blocks (512 byte sectors) read every 20 seconds 

Therefore the result of the test is         1.504        GBytes/second
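The arithmetic above can be scripted. This is a minimal sketch in plain shell, using the sample figures from the vxstat output below (the per-disk figures are for storagearray-0_16, the volume figure is for vol1):

```shell
# Derive the average I/O size and the byte throughput from vxstat figures.
# vxstat reports BLOCKS in units of 512-byte sectors.
ops_read=2563          # OPERATIONS READ for one disk
blocks_read=2624512    # BLOCKS READ for the same disk
interval=20            # vxstat sampling interval in seconds
vol_blocks=63109120    # BLOCKS READ for the whole volume per interval

avg_io_sectors=$((blocks_read / ops_read))        # 1024 sectors per read
avg_io_kb=$((avg_io_sectors * 512 / 1024))        # 512 KB average read size
bytes_per_sec=$((vol_blocks * 512 / interval))    # ~1.504 GBytes/second

echo "avg I/O size: ${avg_io_kb}KB  throughput: ${bytes_per_sec} bytes/sec"
```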

 

$ vxstat -g testdg -vd -i 20
                                         OPERATIONS       BLOCKS              AVG TIME(ms)
TYP NAME                                 READ     WRITE   READ        WRITE   READ  WRITE
Fri 27 Feb 2015 12:49:49 PM IST
dm  storagearray-0_16                    2563         0   2624512         0  29.86   0.00
dm  storagearray-0_17                    2564         0   2625536         0  32.01   0.00
dm  storagearray-0_18                    2568         0   2629632         0  32.08   0.00
dm  storagearray-0_20                    2568         0   2629632         0  33.20   0.00
dm  storagearray-1_6                     2570         0   2631680         0  32.47   0.00
dm  storagearray-1_7                     2569         0   2630656         0  34.50   0.00
dm  storagearray-1_8                     2572         0   2633728         0  35.12   0.00
dm  storagearray-1_9                     2573         0   2634752         0  36.11   0.00
dm  storagearray-2_5                     2576         0   2637824         0  32.81   0.00
dm  storagearray-2_6                     2572         0   2633728         0  34.88   0.00
dm  storagearray-2_7                     2570         0   2631680         0  34.93   0.00
dm  storagearray-2_8                     2569         0   2630656         0  36.84   0.00
dm  storagearray-3_4                     2570         0   2631680         0  30.09   0.00
dm  storagearray-3_6                     2568         0   2629632         0  32.30   0.00
dm  storagearray-3_7                     2570         0   2631680         0  31.84   0.00
dm  storagearray-3_8                     2572         0   2633728         0  33.96   0.00
dm  storagearray-4_8                     2567         0   2628608         0  30.40   0.00
dm  storagearray-4_9                     2567         0   2628608         0  32.82   0.00
dm  storagearray-4_10                    2566         0   2627584         0  32.41   0.00
dm  storagearray-4_11                    2564         0   2625536         0  34.69   0.00
dm  storagearray-5_8                     2563         0   2624512         0  36.54   0.00
dm  storagearray-5_9                     2563         0   2624512         0  37.37   0.00
dm  storagearray-5_10                    2563         0   2624512         0  37.57   0.00
dm  storagearray-5_11                    2563         0   2624512         0  39.20   0.00
vol vol1                                61630         0  63109120         0  33.92   0.00

 

 

portperfshow – FC switch port throughput data collection method and result collection

 

An example of the command used to collect the throughput at the switch port is below.

The portperfshow command reports the throughput for one switch, so two ‘portperfshow’ commands are executed, one on each FC switch.

The ‘portperfshow’ total is of no use here, as we only want to collect the data for the specific ports that are connected to the host HBA FC ports.

In our test case these are port3 and port7. The other six ports are connected to the six modular storage arrays.

 

 

    FC_switch1:admin> portperfshow
    0      1      2      3      4      5      6      7      8      9     10     11     12     13     ...    Total
    ==============================================================================================================
    234.4m 237.3m 238.5m 704.4m 231.4m 242.6m 239.0m 717.7m 0      0      0      0      0      0     ...    2.8g



    FC_switch2:admin> portperfshow
    0      1      2      3      4      5      6      7      8      9     10     11     12     13     ...    Total
    ===============================================================================================================
    236.7m 236.0m 237.8m 708.0m 231.5m 238.2m 232.8m 715.5m 0      0      0      0      0      0     ...    2.8g

  

 

Therefore we add the host-port readings from each switch:

Switch1   port3 704.4 + port7 717.7 = 1422.1 MB/sec = 1.388769 Gbytes/sec

Switch2   port3 708.0 + port7 715.5 = 1423.5 MB/sec = 1.390137 Gbytes/sec

Total:                                              = 2.778906 Gbytes/sec
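As a sketch, the per-switch addition can be done with awk on a captured portperfshow line. The field positions assume the port layout shown above: awk fields are 1-based, so port3 is field 4 and port7 is field 8:

```shell
# Sum the two host-facing ports (port3 and port7) from one portperfshow line.
line="234.4m 237.3m 238.5m 704.4m 231.4m 242.6m 239.0m 717.7m"
total_mb=$(echo "$line" | awk '{ gsub(/m/, ""); print $4 + $8 }')
echo "host ports total: ${total_mb} MB/sec"
```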

 

 

NOTE:

Measuring the throughput at the switch port always shows a higher reading than the throughput measured by vxbench/vxstat/iostat.

The measurement at the switch port is higher due to the 8b/10b encoding overhead.

The I/O throughput is therefore best measured by vxbench/vxstat/iostat, not at the FC switch port.

 

Referring to the “Fibre Channel roadmap v1.8” table at http://fibrechannel.org/fibre-channel-roadmaps.html:

The 8GFC throughput is 1600MB/sec full duplex, therefore the net throughput in each direction is 800MB/sec.

As the HBA is a dual-port card, the maximum theoretical throughput in each direction is 1600MB/sec.

However, http://en.wikipedia.org/wiki/Fibre_Channel shows 8GFC actually delivers 797MB/sec in each direction.

Therefore, using our dual-port card, the maximum theoretical throughput in each direction is 1594MB/sec (1.5566 GB/sec).

 

Therefore, per the specification, the maximum theoretical throughput in our environment will be 1.5566 GB/sec per node.
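The per-node ceiling works out as follows (a small shell sketch of the arithmetic above):

```shell
# Per-direction payload rate of one 8GFC port, per the figures cited above.
per_port_mb=797
ports=2                              # dual-port HBA
max_mb=$((per_port_mb * ports))      # 1594 MB/sec per direction
awk -v mb="$max_mb" 'BEGIN { printf "%.4f GB/sec\n", mb / 1024 }'
```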

 

 

< 3. VxVM maximum disk I/O: Test results and conclusions >

 

Raw volume device disk I/O throughput test results summary in Gbits per second:

 

Test program: vxbench

IO : sequential read of raw volume

IOsize=1024K

VxVM volume stripe widths 64KB, 512KB and 1024KB

VxVM volume 24 columns

Processes: 64

Summary of raw volume throughput (Gbits/sec)

 

Stripe width   Nodes   vxbench   iostat   Summary Gbits/sec   Recommended
64k              1      11.429   11.485        11.5
64k              2      19.457   19.543        19.5
512k             1      12.032   12.040        12.0              YES
512k             2      20.552   20.557        20.5              YES
1024k            1      12.029   12.037        12.0
1024k            2      20.341   20.331        20.3

 

 

Raw volume device disk I/O throughput detailed test results in GBytes per second:

 

                  vxbench GB/s              iostat GB/s               vxstat GB/s               FC Switch GB/s
Stripe   Nodes    1st     2nd     Total     1st     2nd     Total     1st     2nd     Total     1st      2nd      Total
width             Node    Node              Node    Node              Node    Node              Switch   Switch
64k        1      1.429    -      1.429     1.436    -      1.436     1.428    -      1.428     0.782    0.795    1.577
64k        2      1.215   1.217   2.432     1.218   1.225   2.443     1.214   1.220   2.434     1.306    1.304    2.610
512k       1      1.504    -      1.504     1.505    -      1.505     1.504    -      1.504     0.803    0.829    1.632
512k       2      1.285   1.284   2.569     1.284   1.286   2.570     1.286   1.287   2.573     1.372    1.370    2.741
1024k      1      1.504    -      1.504     1.505    -      1.505     1.505    -      1.505     0.817    0.813    1.629
1024k      2      1.272   1.271   2.543     1.269   1.273   2.541     1.272   1.273   2.545     1.357    1.359    2.716

 

Conclusions and recommendations so far:

 

  1. Maximum I/O size setting (RHEL6.5)

    The default operating system maximum I/O size is 512KB; there is no need to change the operating system’s default maximum I/O size tunable values.

  2. VxVM stripe width setting

    The optimal VxVM stripe width for media server solutions is also 512KB; Veritas therefore recommend using a VxVM stripe width of 512KB.

  3. VxVM stripe columns setting

    The hardware was configured to achieve maximum throughput when accessing all the available LUNs.

    The number of LUNs available using our storage configuration was 24.

    We therefore used all 24 LUNs in our VxVM volume to maximize the storage I/O bandwidth.

  4. Balanced I/O

    Using a VxVM stripe width of 512KB and 24 columns and utilizing all paths, we were able to achieve balanced I/O across all the LUNs (see the iostat output).

    This then allowed us to easily identify the HBA bottleneck (using a single node) and storage bottlenecks (using both nodes).

  5. Maximum achievable read I/O throughput using our hardware configuration

  • 12Gbits/sec (1.5Gbytes/sec) Performing I/O from one node:

    • using our hardware configuration, we identified that the dual-port FC HBA has a throughput bottleneck of 12Gbits/sec (1.5Gbytes/sec) – this is the maximum throughput we can achieve from each node.

  • 20Gbits/sec (2.5Gbytes/sec) Performing I/O from two nodes:

    • using our hardware configuration, we identified a storage bottleneck of 20Gbits/sec (2.5Gbytes/sec)

Conclusion: From this point onwards we know the maximum throughput achievable using our hardware configuration.
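On the point about the operating system maximum I/O size (item 1 above), the value can be inspected per block device via sysfs. This is a sketch only; device names will vary in your environment:

```shell
# Inspect the block layer's maximum I/O request size (in KB) per SCSI device.
# A default RHEL 6.5 install typically reports 512 for FC LUN paths.
for f in /sys/block/sd*/queue/max_sectors_kb; do
    if [ -r "$f" ]; then echo "$f: $(cat "$f")"; fi
done
```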

 

 

< 4. VxFS direct I/O maximum disk I/O>

 

Read throughput test execution

 

This VxFS direct I/O test mimics the VxVM raw disk test by performing direct I/O to one file that contains a single contiguous extent.

All the vxbench processes therefore begin reading from the same device offset.

This VxFS direct I/O test is thus equivalent to the VxVM raw device test; only the starting offset into the device is different.

 

Here are the details of the file we created for this test:

# ls -li file1
4 -rw-r--r-- 1 root root 34359738368 Mar  3 14:25 file1
# ls -lhi file1
4 -rw-r--r-- 1 root root 32G Mar  3 14:25 file1
# du -h file1
32G     file1

 

One file with a single contiguous extent of size 32GB:

# fsmap -HA ./file1
Volume  Extent Type     File Offset      Dev Offset     Extent Size  Inode#
vol1    Data            0 Bytes         34359738368     32.00 GB          4

 

Here is how we created this file and performed this test; note that we strongly recommend a file system block size of 8192 bytes:

 

// mkfs

$ mkfs -t vxfs /dev/vx/rdsk/testdg/vol1
    version 10 layout
    107374182400 sectors, 6710886400 blocks of size 8192, log size 32768 blocks
    rcq size 8192 blocks
    largefiles supported
    maxlink supported

 

Note that for optimal read performance we recommend using the mount option of “noatime”.

The ‘noatime’ mount option prevents the inode access time being updated for every read operation.

 

// mount

$ mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

 

// create a file with a single 32GB extent and write to it

$ touch /data1/file1
$ /opt/VRTS/bin/setext -r 4194304 -f contig /data1/file1
$ dd if=/dev/zero of=/data1/file1 bs=128k count=262144
262144+0 records in
262144+0 records out
34359738368 bytes (34 GB) copied, 24.0118 s, 1.4 GB/s
$ /opt/VRTS/bin/fsmap -A /data1/file1
Volume  Extent Type     File Offset      Dev Offset     Extent Size  Inode#
   vol1         Data               0     34359738368     34359738368  4
$ ls -lh /data1/file1
-rw-r--r-- 1 root root 32G Mar  3 14:12 /data1/file1

 

// umount the file system to clear the file data from memory

$ umount /data1

 

// mount the file system from both nodes

$ mount -t vxfs -o noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

 

// vxbench command execution, 64 processes reading from the same file using direct I/O

$./vxbench -w read -c direct -i iosize=1024k,iotime=300,maxfilesize=32G /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1 /data1/file1

 

 

 

< 5. VxFS direct I/O maximum disk I/O: Test results>

 

As expected, the results are the same as the VxVM raw disk read throughput test (all results in GBytes/second):

VxFS direct IO
                  vxbench                   iostat                    vxstat                    FC Switch
Stripe   Nodes    1st     2nd     Total     1st     2nd     Total     1st     2nd     Total     1st      2nd      Total
width             Node    Node              Node    Node              Node    Node              Switch   Switch
64k        1      1.423    -      1.423     1.428    -      1.428     1.428    -      1.428     0.769    0.768    1.537
64k        2      1.213   1.209   2.423     1.217   1.208   2.425     1.217   1.208   2.425     1.294    1.302    2.596
512k       1      1.502    -      1.502     1.504    -      1.504     1.504    -      1.504     0.801    0.802    1.603
512k       2      1.282   1.281   2.563     1.283   1.283   2.566     1.283   1.283   2.566     1.370    1.364    2.734
1024k      1      1.502    -      1.502     1.502    -      1.502     1.502    -      1.502     0.802    0.802    1.604
1024k      2      1.271   1.268   2.539     1.271   1.271   2.541     1.271   1.271   2.541     1.352    1.361    2.713

 

 

Using a stripe-width of 512KB is recommended by VERITAS for media server workloads.

 

The I/O is evenly balanced across all 24 LUNs. Below is the iostat output showing all 48 paths (1-node test):

IOstat
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sde               0.00     0.00   65.25    0.00 66816.00     0.00  1024.00     2.26   34.55  12.69  82.80
sdf               0.00     0.00   63.85    0.00 65382.40     0.00  1024.00     2.32   36.19  13.34  85.18
sdg               0.00     0.00   64.95    0.00 66508.80     0.00  1024.00     2.23   34.36  13.30  86.39
sdh               0.00     0.00   63.45    0.00 64972.80     0.00  1024.00     2.09   32.92  12.82  81.34
sdi               0.00     0.00   66.50    0.00 68096.00     0.00  1024.00     2.13   32.10  12.76  84.83
sdj               0.00     0.00   62.90    0.00 64409.60     0.00  1024.00     2.19   34.79  13.83  86.98
sdk               0.00     0.00   64.15    0.00 65689.60     0.00  1024.00     2.27   35.41  13.41  86.04
sdl               0.00     0.00   65.00    0.00 66560.00     0.00  1024.00     2.23   34.35  12.72  82.65
sdm               0.00     0.00   64.95    0.00 66508.80     0.00  1024.00     2.12   32.69  13.03  84.61
sdn               0.00     0.00   66.60    0.00 68198.40     0.00  1024.00     2.21   33.34  12.66  84.32
sdo               0.00     0.00   62.95    0.00 64460.80     0.00  1024.00     1.94   30.84  12.92  81.36
sdp               0.00     0.00   61.20    0.00 62668.80     0.00  1024.00     1.93   31.63  13.12  80.28
sdq               0.00     0.00   62.95    0.00 64460.80     0.00  1024.00     1.98   31.52  13.15  82.80
sdr               0.00     0.00   65.70    0.00 67276.80     0.00  1024.00     2.23   33.94  12.65  83.10
sds               0.00     0.00   66.40    0.00 67993.60     0.00  1024.00     2.25   33.97  13.16  87.41
sdt               0.00     0.00   63.85    0.00 65382.40     0.00  1024.00     2.20   34.40  13.47  85.97
sdu               0.00     0.00   62.60    0.00 64102.40     0.00  1024.00     2.32   37.00  13.60  85.17
sdv               0.00     0.00   65.00    0.00 66560.00     0.00  1024.00     2.36   36.23  12.79  83.13
sdw               0.00     0.00   62.65    0.00 64153.60     0.00  1024.00     2.20   35.17  13.05  81.75
sdx               0.00     0.00   64.85    0.00 66406.40     0.00  1024.00     2.50   38.48  13.12  85.11
sdy               0.00     0.00   62.80    0.00 64307.20     0.00  1024.00     1.84   29.32  12.63  79.30
sdz               0.00     0.00   64.75    0.00 66304.00     0.00  1024.00     2.13   32.81  12.73  82.45
sdaa              0.00     0.00   62.20    0.00 63692.80     0.00  1024.00     1.95   31.29  12.58  78.24
sdab              0.00     0.00   63.85    0.00 65382.40     0.00  1024.00     2.00   31.35  13.22  84.43
sdac              0.00     0.00   61.75    0.00 63232.00     0.00  1024.00     1.98   31.93  13.00  80.28
sdad              0.00     0.00   65.25    0.00 66816.00     0.00  1024.00     2.18   33.30  13.15  85.79
sdae              0.00     0.00   64.10    0.00 65638.40     0.00  1024.00     2.22   34.64  13.11  84.05
sdaf              0.00     0.00   63.25    0.00 64768.00     0.00  1024.00     2.08   32.85  12.67  80.12
sdag              0.00     0.00   62.95    0.00 64460.80     0.00  1024.00     2.19   34.82  12.93  81.40
sdah              0.00     0.00   64.30    0.00 65843.20     0.00  1024.00     2.31   36.00  13.21  84.95
sdai              0.00     0.00   63.30    0.00 64819.20     0.00  1024.00     2.22   35.10  13.36  84.54
sdak              0.00     0.00   65.35    0.00 66918.40     0.00  1024.00     2.13   32.52  12.42  81.17
sdal              0.00     0.00   62.55    0.00 64051.20     0.00  1024.00     2.17   34.68  12.67  79.22
sdaj              0.00     0.00   64.80    0.00 66355.20     0.00  1024.00     2.15   33.23  12.53  81.22
sdam              0.00     0.00   61.90    0.00 63385.60     0.00  1024.00     2.19   35.28  13.44  83.18
sdan              0.00     0.00   64.40    0.00 65945.60     0.00  1024.00     2.31   35.80  12.98  83.58
sdaq              0.00     0.00   66.10    0.00 67686.40     0.00  1024.00     2.46   37.17  12.69  83.87
sdao              0.00     0.00   65.55    0.00 67123.20     0.00  1024.00     2.28   34.79  12.84  84.16
sdap              0.00     0.00   63.50    0.00 65024.00     0.00  1024.00     2.44   38.47  13.48  85.58
sdar              0.00     0.00   64.55    0.00 66099.20     0.00  1024.00     2.40   37.14  13.59  87.73
sdas              0.00     0.00   65.80    0.00 67379.20     0.00  1024.00     2.10   32.03  12.92  85.01
sdat              0.00     0.00   63.25    0.00 64768.00     0.00  1024.00     2.01   31.74  12.76  80.72
sdau              0.00     0.00   65.75    0.00 67328.00     0.00  1024.00     1.98   30.15  12.60  82.86
sdav              0.00     0.00   63.25    0.00 64768.00     0.00  1024.00     2.10   33.10  13.19  83.44
sdaw              0.00     0.00   63.30    0.00 64819.20     0.00  1024.00     1.97   31.20  13.32  84.32
sdax              0.00     0.00   61.85    0.00 63334.40     0.00  1024.00     2.01   32.52  13.27  82.08
sday              0.00     0.00   65.35    0.00 66918.40     0.00  1024.00     1.92   29.36  12.36  80.74
sdaz              0.00     0.00   67.15    0.00 68761.60     0.00  1024.00     2.07   30.88  12.08  81.15
VxVM40000         0.00     0.00 3078.65    0.00 3152537.60   0.00  1024.00   103.77   33.71   0.32 100.00

 

 

< 6. VxVM raw disk and VxFS direct I/O>

 

Results comparison and conclusions

 

  • Raw volume device disk I/O throughput test results in Gbytes/sec :

VxVM RAW IO
                  vxbench                   iostat                    vxstat                    FC Switch
Stripe   Nodes    1st     2nd     Total     1st     2nd     Total     1st     2nd     Total     1st      2nd      Total
width             Node    Node              Node    Node              Node    Node              Switch   Switch
64k        1      1.429    -      1.429     1.436    -      1.436     1.428    -      1.428     0.782    0.795    1.577
64k        2      1.215   1.217   2.432     1.218   1.225   2.443     1.214   1.220   2.434     1.306    1.304    2.610
512k       1      1.504    -      1.504     1.505    -      1.505     1.504    -      1.504     0.803    0.829    1.632
512k       2      1.285   1.284   2.569     1.284   1.286   2.570     1.286   1.287   2.573     1.372    1.370    2.741
1024k      1      1.504    -      1.504     1.505    -      1.505     1.505    -      1.505     0.817    0.813    1.629
1024k      2      1.272   1.271   2.543     1.269   1.273   2.541     1.272   1.273   2.545     1.357    1.359    2.716

 

  • VxFS direct I/O disk I/O throughput test results in Gbytes/sec :

VxFS direct IO
                  vxbench                   iostat                    vxstat                    FC Switch
Stripe   Nodes    1st     2nd     Total     1st     2nd     Total     1st     2nd     Total     1st      2nd      Total
width             Node    Node              Node    Node              Node    Node              Switch   Switch
64k        1      1.423    -      1.423     1.428    -      1.428     1.428    -      1.428     0.769    0.768    1.537
64k        2      1.213   1.209   2.423     1.217   1.208   2.425     1.217   1.208   2.425     1.294    1.302    2.596
512k       1      1.502    -      1.502     1.504    -      1.504     1.504    -      1.504     0.801    0.802    1.603
512k       2      1.282   1.281   2.563     1.283   1.283   2.566     1.283   1.283   2.566     1.370    1.364    2.734
1024k      1      1.502    -      1.502     1.502    -      1.502     1.502    -      1.502     0.802    0.802    1.604
1024k      2      1.271   1.268   2.539     1.271   1.271   2.541     1.271   1.271   2.541     1.352    1.361    2.713

 

Conclusion: The test results show that VxFS direct I/O does not degrade sequential read I/O throughput performance compared to raw disk.

  1. By creating a file system and creating a file with a single contiguous extent we could emulate the raw disk read throughput using VxFS direct I/O

  2. Each direct I/O read will fetch data from disk, so no buffering is being performed using either direct I/O or raw disk I/O.

  3. Using VxFS direct I/O and running an identical vxbench test, we hit the same maximum achievable read I/O throughput.

Therefore the sequential read throughput was not impacted using VxFS direct I/O compared to reading from VxVM raw disk.

 

 

< 7. VxFS buffered I/O maximum disk I/O throughput test>

 

Test execution

 

This VxFS buffered I/O test is different: for the buffered read I/O throughput test, each process needs to read from a different file.

To prepare the files for this test we pre-allocate 16GB of file system space to each file, then write to the files to grow them to 16GB.

To pre-create the 64 files for this test the following script can be used. The script assumes an 8192-byte file system block size is being used.

 

mkdir /data1/primary
mkdir /data1/secondary


for n in `seq 1 64`
do 
     touch /data1/primary/file${n};
     /opt/VRTS/bin/setext -r 2097152 -f contig /data1/primary/file${n};
     dd if=/dev/zero of=/data1/primary/file${n} bs=128k count=131072 &

     touch /data1/secondary/file${n};
     /opt/VRTS/bin/setext -r 2097152 -f contig /data1/secondary/file${n};
     dd if=/dev/zero of=/data1/secondary/file${n} bs=128k count=131072 &
done
wait    # the dd commands run in the background; wait for all of them to complete

 

When this script has finished, some of the file data will remain in memory; before we run our buffered I/O test we need to remove this file data from memory.

 

Note that for improved read performance you can also use the “noatime” mount option.

The ‘noatime’ mount option prevents the inode access time being updated for every read operation.

We did not use the “noatime” mount option in our test.

 

To remove the file data from memory the file system can be umounted and mounted again.

Alternatively, a simple trick can be used to remove the file data from memory before each test run by using the “remount” mount option, as follows:

 

// mount

$ mount -t vxfs -o remount,noatime,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

 

Again we are using vxbench to perform our test. This time however we need to explicitly stipulate the path to each separate file on the vxbench command line, as shown below.

Note also that the iosize argument has been changed: we are no longer reading using a 1024KB block size. In our VxFS buffered I/O test we read using a 32KB block size, because a smaller read(2) I/O size will be used in the media server solution implementation.

 

# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G  /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64

 

 

< 8. VxFS buffered I/O max disk I/O throughput test >

Tests, test results and individual test conclusions

 

All the tests in this entire report read from disk using sequential read I/O.

 

VxFS readahead is required

The greatest influence on the performance of sequential reads from disk when using VxFS/CFS buffered I/O is readahead.

File system readahead uses the file system page cache to asynchronously pre-fetch file data into memory, which naturally benefits sequential read I/O performance.

Our buffered I/O sequential read performance tests demonstrate the impact of readahead and highlight how tuning readahead can avoid a potential imbalance in throughput between processes.

Readahead is tuned using the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables.

 

The VxVM volume configuration will impact readahead

We have already determined, in our earlier testing above, that the optimal VxVM stripe-width to maximize the I/O throughput in our test is 512KB.

In our storage configuration we created 24 LUNs across 6 modular arrays; by striping across all 24 LUNs we balance the I/O across the LUNs and maximize the overall storage bandwidth.

Using this optimal volume configuration we could easily identify two bottlenecks, one due to the FC HBA ports (a per-node bottleneck) and the other bottleneck in the storage itself.

However the volume stripe width and the number of columns (LUNs) in the volume are also used to auto-tune the values for the ‘read_pref_io’ and ‘read_nstream’ VxFS tunables.

 

VxFS readahead tunables – default values

When a VxFS file system is mounted, VxFS auto-tunes the values of the ‘read_pref_io’ and ‘read_nstream’ tunables from the VxVM volume configuration. These two tunables control VxFS readahead.

The value of read_pref_io is set to the VxVM volume stripe width – therefore the default auto-tuned value in our test is read_pref_io=524288.

The value of read_nstream is set to the number of columns (LUNs) in the volume – therefore the default auto-tuned value in our test is read_nstream=24.

 

VxFS readahead tunables – maximum amount of file data that will be pre-fetched

The maximum amount of file data that is pre-fetched from disk using read_ahead is determined by read_pref_io*read_nstream.

Therefore, by default, the maximum amount of readahead will be “512KB * 24 = 12MB” using our volume configuration.

As we will see during the buffered I/O testing, pre-fetching 12MB of file data is too much readahead; we found this caused an imbalance in read I/O throughput between processes.

 

VxFS readahead tunable – read_pref_io

The VxFS read_pref_io tunable is set to the VxVM volume stripe-width by default. The tunable is the “preferred read I/O size”.

VxFS readahead is triggered by two sequential read I/Os. The amount of file data pre-fetched from disk increases as more sequential I/Os are performed.

As mentioned above, the maximum amount of readahead (the maximum amount of file data pre-fetched from disk) is read_pref_io*read_nstream.

However, the maximum I/O request size submitted by VxFS to VxVM is ‘read_pref_io’.

What it means when read_pref_io is set to 512KB:

If (for example) we read a file using the ‘dd’ command with a dd block size of 8KB, then VxFS readahead will pre-fetch the file data using 512KB I/O requests to VxVM.

Readahead can therefore result in fewer, larger I/O requests, thus improving read I/O performance.

 

Veritas do not recommend tuning ‘read_pref_io’ away from its default auto-tuned value.

If a different value for ‘read_pref_io’ is desired, Veritas recommend changing the volume stripe width instead.

 

VxFS readahead tunable – read_nstream

The read_nstream value defaults to the number of columns in the VxVM volume.

As mentioned above, the maximum amount of readahead (the maximum amount of file data to pre-fetch from disk) is read_pref_io*read_nstream

To reduce the maximum amount of readahead, simply reduce the value of read_nstream; see the results of our tests using different values for read_nstream below.

 

 

 The best practice for tuning readahead is as follows:

  • Do not change the auto-tuned value for read_pref_io, if you want to change read_pref_io change the VxVM volume stripe-width instead.

  • Reduce read_nstream to reduce the amount of readahead

  • You could disable readahead if necessary, but this will usually be a disadvantage (see test4).

  • Use /etc/tunefstab to set read_nstream, this means the value will persist across a reboot.
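As a sketch of the tuning steps above, reducing read_nstream from 24 to 4 would cap the maximum readahead at 512KB * 4 = 2MB. The mount point, volume and disk group names are the ones from our test; verify the exact syntax against the vxtunefs(1M) and tunefstab(4) manual pages for your release:

```shell
# Set read_nstream at runtime on the mounted file system.
vxtunefs -o read_nstream=4 /data1

# Persist the setting across reboots via /etc/tunefstab.
echo "/dev/vx/dsk/testdg/vol1 read_nstream=4" >> /etc/tunefstab

# Verify the current tunable values.
vxtunefs /data1
```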

     

     

Summary:

By performing sequential reads using VxFS buffered I/O with readahead, the application I/O size is effectively converted to read_pref_io-sized requests to VxVM.

So there are two performance benefits of readahead: one is to pre-fetch file data from disk, the other is to increase the I/O size of the read requests from disk (so reducing the number of I/Os).

These buffered I/O throughput tests will therefore help you decide what stripe-width, number of columns and readahead tuning is best for your solution implementation.

They will also help you determine how many processes you want reading from disk at the same time.

     

 

 

Buffered I/O tests:

We have chosen a volume configuration that was best for disk I/O performance; however, this volume configuration also results in very aggressive readahead (12MB at maximum).

With a stripe_width of 512KB and 24 LUNs (columns), the default maximum readahead is therefore too aggressive.

 

TEST1:  Use the default auto-tuned settings, using one node: <this is the baseline test>

Baseline vxbench test – 64 files / 64 processes / 32KB block size

Default auto-tuning    – read_ahead enabled / read_nstream=24 / read_pref_io=524288

 

# vxtunefs /data1

Filesystem I/O parameters for /data1

read_pref_io = 524288

read_nstream = 24

read_ahead = 1

 

# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G  /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64

user   1:  300.062 sec  48868.77 KB/s  cpu:  9.78 sys   0.08 user
user   2:  300.102 sec  48370.93 KB/s  cpu:  9.78 sys   0.06 user
user   3:  300.042 sec  48094.01 KB/s  cpu:  9.86 sys   0.08 user
user   4:  300.176 sec   4461.92 KB/s  cpu:  1.01 sys   0.00 user
user   5:  300.105 sec   4584.10 KB/s  cpu:  1.12 sys   0.00 user
user   6:  300.102 sec  48125.32 KB/s  cpu:  9.85 sys   0.08 user
user   7:  300.031 sec  48341.50 KB/s  cpu:  9.79 sys   0.07 user
user   8:  300.201 sec   4583.81 KB/s  cpu:  1.12 sys   0.01 user
user   9:  300.194 sec   4582.32 KB/s  cpu:  1.14 sys   0.00 user
user  10:  300.203 sec   4755.40 KB/s  cpu:  1.19 sys   0.00 user
user  11:  300.126 sec  48121.38 KB/s  cpu:  9.74 sys   0.08 user
user  12:  300.220 sec   4500.70 KB/s  cpu:  1.01 sys   0.00 user
user  13:  300.201 sec   4665.25 KB/s  cpu:  1.11 sys   0.00 user
user  14:  300.086 sec  48291.58 KB/s  cpu:  9.74 sys   0.07 user
user  15:  300.165 sec   4501.41 KB/s  cpu:  1.01 sys   0.01 user
user  16:  300.203 sec   4633.57 KB/s  cpu:  1.16 sys   0.00 user
user  17:  300.147 sec  48159.06 KB/s  cpu:  9.64 sys   0.08 user
user  18:  300.035 sec  48504.56 KB/s  cpu:  9.41 sys   0.08 user
user  19:  300.078 sec  48497.65 KB/s  cpu:  9.73 sys   0.07 user
user  20:  300.161 sec  48238.58 KB/s  cpu:  9.66 sys   0.08 user
user  21:  300.136 sec  48201.71 KB/s  cpu:  9.74 sys   0.08 user
user  22:  300.193 sec   4705.78 KB/s  cpu:  1.21 sys   0.00 user
user  23:  300.086 sec  48045.94 KB/s  cpu:  9.86 sys   0.07 user
user  24:  300.062 sec  47926.93 KB/s  cpu:  9.69 sys   0.08 user
user  25:  300.198 sec   4460.09 KB/s  cpu:  1.11 sys   0.01 user
user  26:  300.207 sec   4623.79 KB/s  cpu:  1.09 sys   0.00 user
user  27:  300.215 sec   4582.00 KB/s  cpu:  1.01 sys   0.00 user
user  28:  300.125 sec  48203.53 KB/s  cpu:  9.70 sys   0.08 user
user  29:  300.141 sec  48323.77 KB/s  cpu:  9.65 sys   0.07 user
user  30:  300.212 sec   4705.48 KB/s  cpu:  1.20 sys   0.00 user
user  31:  300.153 sec  48485.59 KB/s  cpu:  9.72 sys   0.07 user
user  32:  300.163 sec  48033.68 KB/s  cpu:  9.66 sys   0.07 user
user  33:  300.160 sec  48525.35 KB/s  cpu:  9.82 sys   0.07 user
user  34:  300.144 sec   4624.56 KB/s  cpu:  1.09 sys   0.01 user
user  35:  300.102 sec  48002.47 KB/s  cpu:  9.60 sys   0.07 user
user  36:  300.203 sec   4821.38 KB/s  cpu:  1.18 sys   0.01 user
user  37:  300.006 sec  48072.18 KB/s  cpu:  9.64 sys   0.07 user
user  38:  300.219 sec   4746.29 KB/s  cpu:  1.15 sys   0.00 user
user  39:  300.213 sec   4701.73 KB/s  cpu:  1.18 sys   0.00 user
user  40:  300.176 sec   4460.00 KB/s  cpu:  1.13 sys   0.00 user
user  41:  300.207 sec   4583.50 KB/s  cpu:  1.05 sys   0.00 user
user  42:  300.213 sec   4624.56 KB/s  cpu:  1.03 sys   0.00 user
user  43:  300.049 sec  48789.10 KB/s  cpu:  9.87 sys   0.08 user
user  44:  300.207 sec   4708.85 KB/s  cpu:  1.18 sys   0.00 user
user  45:  300.077 sec  48129.27 KB/s  cpu:  9.59 sys   0.07 user
user  46:  300.079 sec  48374.66 KB/s  cpu:  9.74 sys   0.07 user
user  47:  300.099 sec  48494.28 KB/s  cpu:  9.64 sys   0.09 user
user  48:  300.064 sec  48581.86 KB/s  cpu:  9.47 sys   0.08 user
user  49:  300.199 sec   4705.78 KB/s  cpu:  1.10 sys   0.00 user
user  50:  300.204 sec   4788.64 KB/s  cpu:  1.20 sys   0.01 user
user  51:  300.032 sec   9044.38 KB/s  cpu:  1.94 sys   0.02 user
user  52:  300.120 sec  47917.67 KB/s  cpu:  9.69 sys   0.06 user
user  53:  300.128 sec  48407.76 KB/s  cpu:  9.56 sys   0.07 user
user  54:  300.203 sec   4746.24 KB/s  cpu:  1.07 sys   0.00 user
user  55:  300.201 sec   4460.37 KB/s  cpu:  1.02 sys   0.01 user
user  56:  300.206 sec   4623.49 KB/s  cpu:  1.11 sys   0.00 user
user  57:  300.212 sec   4664.43 KB/s  cpu:  1.09 sys   0.00 user
user  58:  300.212 sec   4664.76 KB/s  cpu:  1.06 sys   0.00 user
user  59:  300.211 sec   4623.52 KB/s  cpu:  1.04 sys   0.01 user
user  60:  300.206 sec   4623.80 KB/s  cpu:  1.08 sys   0.01 user
user  61:  300.133 sec  12111.95 KB/s  cpu:  2.64 sys   0.02 user
user  62:  300.035 sec   9945.29 KB/s  cpu:  2.15 sys   0.01 user
user  63:  300.195 sec   4583.47 KB/s  cpu:  1.13 sys   0.00 user
user  64:  300.047 sec  48093.15 KB/s  cpu:  9.80 sys   0.09 user

total:     300.220 sec  1578817.49 KB/s  cpu: 323.53 sys   2.31 user

 

Conclusion to TEST1: <this is our baseline test, using the default auto-tuned values; read_nstream is therefore set to its default value of 24>

  • This test ran for 300.220 seconds and read from disk at an average rate of 1578817.49 KB/sec, so vxbench read approximately 452 GB of data from disk.

  • The throughput per process is very imbalanced: some processes achieved ~49000 KB/sec whereas other processes achieved only ~4800 KB/sec.

  • However, the maximum possible read I/O throughput from one node is still being achieved: 1578817.49 KB/sec = 1.506 GB/sec.

  • The problem is not the total throughput; the problem is that the maximum readahead per process is 12MB at a time.

  • 12MB of readahead (read_pref_io*read_nstream) is too aggressive and is causing an imbalance of throughput between processes.

  • This readahead configuration is therefore a failure: too much readahead is causing an imbalance of throughput between the processes.

 

  • We do not want to change the value of read_pref_io because we want to request large I/O sizes for better performance.

  • By default the VxFS read_pref_io tunable is set to the VxVM volume stripe-width, in our test this value is 512KB.

  • By default the VxFS read_nstream tunable is set to the number of columns in the VxVM volume, in our test this value is 24 (we have 24 LUNs).

     

  • Next, we therefore want to experiment with smaller values of read_nstream, and also to test with read_ahead disabled.

  • Our goal is to maintain the maximum total throughput (approx. 1.5 GB/sec) whilst spreading this throughput evenly between all the active processes reading from disk.
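The headline TEST1 figures above can be sanity-checked with a few lines of arithmetic (values taken directly from the vxbench total line):

```python
# Sanity-check the TEST1 totals reported by vxbench (pure arithmetic).
elapsed_sec = 300.220
total_kb_per_sec = 1578817.49

total_gb = elapsed_sec * total_kb_per_sec / (1024 * 1024)
print(round(total_gb), "GB read")                            # -> 452 GB read
print(round(total_kb_per_sec / (1024 * 1024), 3), "GB/sec")  # -> 1.506 GB/sec
```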

     

 

TEST2: change read_nstream to 1, keep everything else the same as the baseline test.

vxbench – 64 files/64 processes/32KB block size

Tuning    – read_ahead enabled/read_nstream=1/read_pref_io=524288

 

# vxtunefs /data1 -o read_nstream=1

UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1

 

# vxtunefs /data1

Filesystem I/O parameters for /data1

read_pref_io = 524288

read_nstream = 1

read_ahead = 1

 

# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

#./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G  /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64

user   1:  300.013 sec  24639.76 KB/s  cpu:  5.35 sys   0.05 user
user   2:  300.044 sec  24748.27 KB/s  cpu:  5.41 sys   0.06 user
user   3:  300.010 sec  24706.66 KB/s  cpu:  5.52 sys   0.06 user
user   4:  300.021 sec  24872.94 KB/s  cpu:  5.46 sys   0.05 user
user   5:  300.023 sec  24724.40 KB/s  cpu:  5.58 sys   0.05 user
user   6:  300.060 sec  24683.79 KB/s  cpu:  5.58 sys   0.06 user
user   7:  300.021 sec  24744.96 KB/s  cpu:  5.66 sys   0.06 user
user   8:  300.016 sec  24680.46 KB/s  cpu:  5.49 sys   0.06 user
user   9:  300.017 sec  24784.51 KB/s  cpu:  5.55 sys   0.06 user
user  10:  300.021 sec  24744.97 KB/s  cpu:  5.54 sys   0.05 user
user  11:  300.015 sec  24747.12 KB/s  cpu:  5.54 sys   0.06 user
user  12:  300.017 sec  24830.60 KB/s  cpu:  5.46 sys   0.05 user
user  13:  300.013 sec  24824.11 KB/s  cpu:  5.61 sys   0.05 user
user  14:  300.028 sec  24729.11 KB/s  cpu:  5.57 sys   0.05 user
user  15:  300.017 sec  24752.09 KB/s  cpu:  5.42 sys   0.06 user
user  16:  300.028 sec  24655.71 KB/s  cpu:  5.53 sys   0.06 user
user  17:  300.013 sec  24834.38 KB/s  cpu:  5.68 sys   0.05 user
user  18:  300.048 sec  24773.52 KB/s  cpu:  5.52 sys   0.07 user
user  19:  300.024 sec  24697.01 KB/s  cpu:  5.50 sys   0.07 user
user  20:  300.012 sec  24938.48 KB/s  cpu:  5.61 sys   0.06 user
user  21:  300.016 sec  24646.33 KB/s  cpu:  5.54 sys   0.06 user
user  22:  300.016 sec  24689.11 KB/s  cpu:  5.57 sys   0.05 user
user  23:  300.019 sec  24695.60 KB/s  cpu:  5.50 sys   0.06 user
user  24:  300.023 sec  24719.31 KB/s  cpu:  5.59 sys   0.05 user
user  25:  300.015 sec  24755.66 KB/s  cpu:  5.58 sys   0.05 user
user  26:  300.018 sec  24596.75 KB/s  cpu:  5.59 sys   0.07 user
user  27:  300.049 sec  24717.11 KB/s  cpu:  5.54 sys   0.08 user
user  28:  300.019 sec  24753.74 KB/s  cpu:  5.59 sys   0.06 user
user  29:  300.021 sec  24214.23 KB/s  cpu:  5.44 sys   0.06 user
user  30:  300.021 sec  24772.27 KB/s  cpu:  5.61 sys   0.05 user
user  31:  300.019 sec  24908.96 KB/s  cpu:  5.68 sys   0.05 user
user  32:  300.045 sec  24637.23 KB/s  cpu:  5.53 sys   0.06 user
user  33:  300.053 sec  24677.55 KB/s  cpu:  5.59 sys   0.05 user
user  34:  300.017 sec  24692.39 KB/s  cpu:  5.60 sys   0.07 user
user  35:  300.018 sec  24787.86 KB/s  cpu:  5.55 sys   0.06 user
user  36:  300.019 sec  24741.70 KB/s  cpu:  5.57 sys   0.07 user
user  37:  300.015 sec  24813.68 KB/s  cpu:  5.52 sys   0.06 user
user  38:  300.014 sec  24808.66 KB/s  cpu:  5.40 sys   0.06 user
user  39:  300.013 sec  24716.57 KB/s  cpu:  5.53 sys   0.06 user
user  40:  300.024 sec  24705.55 KB/s  cpu:  5.54 sys   0.06 user
user  41:  300.039 sec  24796.47 KB/s  cpu:  5.50 sys   0.05 user
user  42:  300.044 sec  24852.33 KB/s  cpu:  5.60 sys   0.05 user
user  43:  300.044 sec  24836.97 KB/s  cpu:  5.59 sys   0.06 user
user  44:  300.028 sec  24735.94 KB/s  cpu:  5.54 sys   0.05 user
user  45:  300.060 sec  24803.28 KB/s  cpu:  5.71 sys   0.05 user
user  46:  300.019 sec  24830.57 KB/s  cpu:  5.54 sys   0.07 user
user  47:  300.052 sec  24587.20 KB/s  cpu:  5.57 sys   0.05 user
user  48:  300.020 sec  24750.25 KB/s  cpu:  5.54 sys   0.06 user
user  49:  300.016 sec  24675.38 KB/s  cpu:  5.53 sys   0.04 user
user  50:  300.020 sec  24704.09 KB/s  cpu:  5.52 sys   0.06 user
user  51:  300.035 sec  24716.59 KB/s  cpu:  5.37 sys   0.06 user
user  52:  300.049 sec  24700.04 KB/s  cpu:  5.54 sys   0.05 user
user  53:  300.022 sec  24818.32 KB/s  cpu:  5.40 sys   0.06 user
user  54:  300.014 sec  24725.01 KB/s  cpu:  5.50 sys   0.06 user
user  55:  300.026 sec  24683.17 KB/s  cpu:  5.57 sys   0.05 user
user  56:  300.058 sec  24786.37 KB/s  cpu:  5.65 sys   0.04 user
user  57:  300.022 sec  24850.79 KB/s  cpu:  5.61 sys   0.06 user
user  58:  300.021 sec  24702.35 KB/s  cpu:  5.38 sys   0.06 user
user  59:  300.015 sec  24735.30 KB/s  cpu:  5.59 sys   0.06 user
user  60:  300.027 sec  24840.10 KB/s  cpu:  5.58 sys   0.05 user
user  61:  300.021 sec  24687.03 KB/s  cpu:  5.58 sys   0.06 user
user  62:  300.021 sec  24799.55 KB/s  cpu:  5.61 sys   0.06 user
user  63:  300.011 sec  24744.08 KB/s  cpu:  5.42 sys   0.05 user
user  64:  300.015 sec  24655.08 KB/s  cpu:  5.54 sys   0.06 user

total:     300.061 sec  1582992.52 KB/s  cpu: 354.62 sys   3.65 user

 

Conclusion to TEST2: <read_nstream set to 1>

  • Using read_nstream=1 produces a perfect balance in throughput per process (~24700 KB/sec); all processes now have the same consistent throughput during the test:

    • The maximum total throughput from one node is still being achieved (1582992.52 KB/s), approx. 1.5 GB/sec

    • The total throughput is now divided evenly across all 64 processes and remains consistent throughout the test.

    • The average read I/O size is still 512KB (avgrq-sz = 1024 sectors of 512 bytes); this is because read_pref_io is set to 512KB.

    • The I/O is evenly balanced across all 24 LUNs (see the r/s and rsec/s columns in the iostat output below).

    • Most importantly the I/O throughput is now evenly balanced across all 64 processes, yet the total throughput remains the same.

  • The maximum readahead per process is now 512KB

  • The throughput per process is therefore now balanced; all 64 processes are consistently achieving approx. 24700 KB/s – perfect!!

 

  • Please note:

    • If the throughput per process had not been evenly distributed using read_nstream=1, then we would recommend reducing the stripe-width to 256KB or 128KB.

    • Reducing the stripe-width will reduce the default value of “read_pref_io”.

    • We do not advise tuning read_pref_io to override its default value, we recommend tuning the VxVM volume stripe-width instead.
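The even per-process split in TEST2 is simply the node total divided by the number of readers, and the 512KB I/O size follows from iostat reporting request sizes in 512-byte sectors (a quick check, using the vxbench and iostat figures above and below):

```python
# TEST2: the total node throughput divides evenly across the 64 readers,
# and iostat's avgrq-sz (in 512-byte sectors) confirms the 512KB I/O size.
total_kb_per_sec = 1582992.52
nprocs = 64

print(round(total_kb_per_sec / nprocs), "KB/sec per process")  # -> 24734
print(1024 * 512, "bytes per I/O (avgrq-sz 1024 sectors)")     # -> 524288 (512KB)
```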

 

 

# iostat -x 20
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sde               0.00     0.00   63.90    0.00 65433.60     0.00  1024.00     2.88   45.11  15.37  98.20
sdf               0.00     0.00   64.35    0.00 65894.40     0.00  1024.00     2.88   44.76  15.08  97.06
sdg               0.00     0.00   64.75    0.00 66304.00     0.00  1024.00     2.91   44.88  15.17  98.19
sdh               0.00     0.00   64.95    0.00 66508.80     0.00  1024.00     2.90   44.61  15.01  97.48
sdi               0.00     0.00   64.80    0.00 66355.20     0.00  1024.00     2.64   40.74  14.97  97.00
sdj               0.00     0.00   65.40    0.00 66969.60     0.00  1024.00     2.70   41.32  14.85  97.11
sdk               0.00     0.00   65.05    0.00 66611.20     0.00  1024.00     2.73   42.02  14.87  96.71
sdl               0.00     0.00   63.30    0.00 64819.20     0.00  1024.00     2.61   41.25  15.39  97.39
sdm               0.00     0.00   64.50    0.00 66048.00     0.00  1024.00     2.77   42.91  15.14  97.67
sdn               0.00     0.00   64.85    0.00 66406.40     0.00  1024.00     2.79   43.06  14.89  96.58
sdo               0.00     0.00   63.10    0.00 64614.40     0.00  1024.00     2.66   42.17  15.23  96.12
sdp               0.00     0.00   65.45    0.00 67020.80     0.00  1024.00     2.80   42.74  14.97  97.97
sdq               0.00     0.00   64.00    0.00 65536.00     0.00  1024.00     2.66   41.63  15.10  96.61
sdr               0.00     0.00   64.55    0.00 66099.20     0.00  1024.00     2.72   42.14  15.14  97.70
sds               0.00     0.00   64.20    0.00 65740.80     0.00  1024.00     2.65   41.27  15.09  96.91
sdt               0.00     0.00   64.85    0.00 66406.40     0.00  1024.00     2.75   42.45  14.94  96.85
sdu               0.00     0.00   64.65    0.00 66201.60     0.00  1024.00     2.66   41.25  15.05  97.30
sdv               0.00     0.00   63.85    0.00 65382.40     0.00  1024.00     2.64   41.33  15.18  96.90
sdw               0.00     0.00   64.25    0.00 65792.00     0.00  1024.00     2.63   41.00  15.12  97.17
sdx               0.00     0.00   64.95    0.00 66508.80     0.00  1024.00     2.68   41.32  14.87  96.55
sdy               0.00     0.00   64.00    0.00 65536.00     0.00  1024.00     2.71   42.18  15.04  96.26
sdz               0.00     0.00   63.85    0.00 65382.40     0.00  1024.00     2.70   42.16  15.21  97.14
sdaa              0.00     0.00   64.80    0.00 66355.20     0.00  1024.00     2.68   41.35  15.08  97.71
sdab              0.00     0.00   65.05    0.00 66611.20     0.00  1024.00     2.70   41.53  15.03  97.80
sdac              0.00     0.00   64.15    0.00 65689.60     0.00  1024.00     2.57   40.17  15.02  96.34
sdad              0.00     0.00   63.50    0.00 65024.00     0.00  1024.00     2.56   40.34  15.23  96.69
sdae              0.00     0.00   64.00    0.00 65536.00     0.00  1024.00     2.57   40.21  15.08  96.51
sdaf              0.00     0.00   65.65    0.00 67225.60     0.00  1024.00     2.65   40.35  14.77  96.97
sdag              0.00     0.00   65.25    0.00 66816.00     0.00  1024.00     2.95   45.29  15.03  98.04
sdah              0.00     0.00   64.50    0.00 66048.00     0.00  1024.00     2.91   45.07  15.16  97.75
sdai              0.00     0.00   64.25    0.00 65792.00     0.00  1024.00     2.88   44.77  15.22  97.79
sdak              0.00     0.00   64.85    0.00 66406.40     0.00  1024.00     2.64   40.69  14.89  96.56
sdal              0.00     0.00   64.50    0.00 66048.00     0.00  1024.00     2.66   41.21  15.16  97.80
sdaj              0.00     0.00   64.05    0.00 65587.20     0.00  1024.00     2.90   45.20  15.21  97.45
sdam              0.00     0.00   64.75    0.00 66304.00     0.00  1024.00     2.64   40.75  15.04  97.39
sdan              0.00     0.00   64.05    0.00 65587.20     0.00  1024.00     2.63   41.15  15.09  96.68
sdaq              0.00     0.00   64.15    0.00 65689.60     0.00  1024.00     2.74   42.72  15.23  97.68
sdao              0.00     0.00   64.75    0.00 66304.00     0.00  1024.00     2.74   42.36  15.02  97.26
sdap              0.00     0.00   64.95    0.00 66508.80     0.00  1024.00     2.81   43.19  14.91  96.87
sdar              0.00     0.00   63.85    0.00 65382.40     0.00  1024.00     2.74   42.83  15.30  97.67
sdas              0.00     0.00   64.20    0.00 65740.80     0.00  1024.00     2.67   41.61  15.19  97.53
sdat              0.00     0.00   65.05    0.00 66611.20     0.00  1024.00     2.69   41.29  14.99  97.53
sdau              0.00     0.00   64.85    0.00 66406.40     0.00  1024.00     2.66   41.04  14.87  96.41
sdav              0.00     0.00   64.00    0.00 65536.00     0.00  1024.00     2.65   41.37  15.03  96.17
sdaw              0.00     0.00   64.55    0.00 66099.20     0.00  1024.00     2.82   43.67  15.25  98.45
sdax              0.00     0.00   64.05    0.00 65587.20     0.00  1024.00     2.82   44.11  15.14  96.96
sday              0.00     0.00   65.85    0.00 67430.40     0.00  1024.00     2.88   43.75  14.85  97.81
sdaz              0.00     0.00   63.65    0.00 65177.60     0.00  1024.00     2.80   43.95  15.34  97.66
VxVM56000         0.00     0.00 3094.90    0.00 3169177.60   0.00  1024.00   131.05   42.35   0.32 100.00

 

 

TEST3: change read_nstream to 1, read from 16 files using 16 processes, keep everything else the same as the baseline test.

vxbench – 16 files/16 processes/32KB block size

Tuning    – read_ahead enabled/read_nstream=1/read_pref_io=524288

 

# vxtunefs /data1 -o read_nstream=1

UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1

 

# vxtunefs /data1

Filesystem I/O parameters for /data1

read_pref_io = 524288

read_nstream = 1

read_ahead = 1

 

# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

# ./vxbench -w read -i iosize=32k,iotime=120,maxfilesize=16G /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16

user   1:  120.030 sec  97417.75 KB/s  cpu:  7.43 sys   0.07 user
user   2:  120.037 sec  98452.82 KB/s  cpu:  7.68 sys   0.09 user
user   3:  120.033 sec  98302.87 KB/s  cpu:  7.55 sys   0.08 user
user   4:  120.031 sec  98227.29 KB/s  cpu:  7.37 sys   0.08 user
user   5:  120.030 sec  98381.88 KB/s  cpu:  7.89 sys   0.05 user
user   6:  120.033 sec  98272.61 KB/s  cpu:  7.42 sys   0.07 user
user   7:  120.032 sec  97744.74 KB/s  cpu:  7.70 sys   0.08 user
user   8:  120.037 sec  98069.12 KB/s  cpu:  7.74 sys   0.10 user
user   9:  120.030 sec  98603.74 KB/s  cpu:  7.79 sys   0.06 user
user  10:  120.036 sec  98756.87 KB/s  cpu:  7.82 sys   0.07 user
user  11:  120.037 sec  98513.11 KB/s  cpu:  7.78 sys   0.10 user
user  12:  120.040 sec  98360.81 KB/s  cpu:  7.80 sys   0.08 user
user  13:  120.030 sec  98488.47 KB/s  cpu:  7.48 sys   0.09 user
user  14:  120.030 sec  98241.64 KB/s  cpu:  7.50 sys   0.09 user
user  15:  120.039 sec  97824.57 KB/s  cpu:  7.76 sys   0.09 user
user  16:  120.032 sec  98700.71 KB/s  cpu:  7.42 sys   0.09 user

total:     120.041 sec  1572267.32 KB/s  cpu: 122.13 sys   1.29 user

 

Conclusion to TEST3: <read_nstream set to 1, reading from 16 files using 16 processes>

  • Using read_nstream=1 produces a perfect balance in throughput per process (~98000 KB/sec); all processes still receive an equal share of the throughput:

    • The maximum total throughput from one node is still being achieved (1572267.32 KB/s) with 16 processes, this is approx. 1.5 GB/sec

    • The total throughput is now divided evenly across all 16 processes, so the throughput per process is higher when using fewer processes.

    • Most importantly the I/O throughput is now evenly balanced across all 16 processes, yet the total throughput remains the same.

  • The maximum readahead per process is still 512KB, this amount of readahead provides perfectly balanced throughput per process in our test.

  • The throughput per process is therefore balanced; all 16 processes are now achieving approx. 98000 KB/s – perfect!!

     

  • Please note:

    • The throughput per process is much higher using 16 processes rather than 64 processes.

    • The number of processes was reduced by a factor of 4 in TEST3, so the throughput per process increased by a factor of 4, while the total throughput is unchanged.

    • It is therefore very important to consider the number of processes that will be reading from disk at the same time, as the available throughput will be evenly distributed between them.
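The factor-of-4 scaling above is plain arithmetic on the measured node total (TEST2 measured ~24734 KB/sec per process with 64 readers):

```python
# TEST3 vs TEST2: reducing the reader count by 4x quadruples the per-process
# share while the total node throughput stays at ~1.5 GB/sec.
total_kb_per_sec = 1572267.32

share_16 = total_kb_per_sec / 16
share_64 = total_kb_per_sec / 64
print(round(share_16), "KB/sec with 16 readers")  # -> 98267
print(round(share_16 / share_64))                 # -> 4 (exact scaling factor)
```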

 

 

 

TEST4: disable readahead, keep everything else the same as the baseline test.

vxbench – 64 files/64 processes/32KB block size

Tuning    – read_ahead disabled/read_nstream=24/read_pref_io=524288

 

# vxtunefs /data1 -o read_nstream=24,read_ahead=0

UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1

 

# vxtunefs /data1

Filesystem I/O parameters for /data1

read_pref_io = 524288

read_nstream = 24

read_ahead = 0

 

# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G  /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64

user   1:  300.011 sec  12246.06 KB/s  cpu:  7.68 sys   0.07 user
user   2:  300.009 sec  11192.53 KB/s  cpu:  6.96 sys   0.07 user
user   3:  300.010 sec  11619.25 KB/s  cpu:  7.35 sys   0.06 user
user   4:  300.014 sec  11551.35 KB/s  cpu:  7.30 sys   0.07 user
user   5:  300.015 sec  11563.46 KB/s  cpu:  7.19 sys   0.08 user
user   6:  300.007 sec  12257.53 KB/s  cpu:  7.65 sys   0.10 user
user   7:  300.008 sec  11638.53 KB/s  cpu:  7.34 sys   0.09 user
user   8:  300.007 sec  11449.44 KB/s  cpu:  7.26 sys   0.09 user
user   9:  300.014 sec  12062.17 KB/s  cpu:  7.50 sys   0.08 user
user  10:  300.008 sec  11544.21 KB/s  cpu:  7.18 sys   0.08 user
user  11:  300.012 sec  11442.10 KB/s  cpu:  7.22 sys   0.10 user
user  12:  300.012 sec  11666.33 KB/s  cpu:  7.34 sys   0.07 user
user  13:  300.007 sec  11740.63 KB/s  cpu:  7.38 sys   0.07 user
user  14:  300.015 sec  11528.29 KB/s  cpu:  7.32 sys   0.07 user
user  15:  300.009 sec  11616.83 KB/s  cpu:  7.31 sys   0.08 user
user  16:  300.008 sec  12253.34 KB/s  cpu:  7.54 sys   0.07 user
user  17:  300.013 sec  11727.19 KB/s  cpu:  7.36 sys   0.07 user
user  18:  300.009 sec  11700.54 KB/s  cpu:  7.36 sys   0.07 user
user  19:  300.008 sec  12245.63 KB/s  cpu:  7.70 sys   0.09 user
user  20:  300.007 sec  11757.38 KB/s  cpu:  7.42 sys   0.08 user
user  21:  300.007 sec  11242.93 KB/s  cpu:  7.10 sys   0.06 user
user  22:  300.012 sec  11589.92 KB/s  cpu:  7.23 sys   0.08 user
user  23:  300.008 sec  12262.93 KB/s  cpu:  7.56 sys   0.09 user
user  24:  300.007 sec  11756.85 KB/s  cpu:  7.41 sys   0.08 user
user  25:  300.014 sec  12086.92 KB/s  cpu:  7.48 sys   0.08 user
user  26:  300.011 sec  12001.58 KB/s  cpu:  7.54 sys   0.07 user
user  27:  300.012 sec  12096.78 KB/s  cpu:  7.60 sys   0.10 user
user  28:  300.017 sec  11550.08 KB/s  cpu:  7.27 sys   0.08 user
user  29:  300.011 sec  11734.28 KB/s  cpu:  7.24 sys   0.09 user
user  30:  300.011 sec  11962.11 KB/s  cpu:  7.51 sys   0.08 user
user  31:  300.014 sec  12128.16 KB/s  cpu:  7.50 sys   0.08 user
user  32:  300.011 sec  11725.32 KB/s  cpu:  7.38 sys   0.10 user
user  33:  300.009 sec  11371.62 KB/s  cpu:  7.18 sys   0.06 user
user  34:  300.009 sec  12041.25 KB/s  cpu:  7.62 sys   0.07 user
user  35:  300.008 sec  11980.36 KB/s  cpu:  7.48 sys   0.08 user
user  36:  300.015 sec  11908.75 KB/s  cpu:  7.51 sys   0.07 user
user  37:  300.010 sec  11432.46 KB/s  cpu:  7.12 sys   0.08 user
user  38:  300.014 sec  11796.37 KB/s  cpu:  7.48 sys   0.06 user
user  39:  300.008 sec  11824.77 KB/s  cpu:  7.43 sys   0.08 user
user  40:  300.014 sec  12077.29 KB/s  cpu:  7.57 sys   0.07 user
user  41:  300.012 sec  11564.45 KB/s  cpu:  7.29 sys   0.08 user
user  42:  300.015 sec  11583.94 KB/s  cpu:  7.28 sys   0.05 user
user  43:  300.015 sec  11874.83 KB/s  cpu:  7.45 sys   0.08 user
user  44:  300.010 sec  12142.53 KB/s  cpu:  7.54 sys   0.08 user
user  45:  300.015 sec  11335.74 KB/s  cpu:  7.05 sys   0.09 user
user  46:  300.011 sec  11915.63 KB/s  cpu:  7.43 sys   0.08 user
user  47:  300.014 sec  12259.67 KB/s  cpu:  7.56 sys   0.10 user
user  48:  300.010 sec  11405.71 KB/s  cpu:  7.12 sys   0.08 user
user  49:  300.010 sec  11862.76 KB/s  cpu:  7.34 sys   0.07 user
user  50:  300.014 sec  11556.89 KB/s  cpu:  7.28 sys   0.07 user
user  51:  300.010 sec  12149.05 KB/s  cpu:  7.49 sys   0.08 user
user  52:  300.010 sec  11384.38 KB/s  cpu:  7.11 sys   0.10 user
user  53:  300.008 sec  11414.31 KB/s  cpu:  7.09 sys   0.07 user
user  54:  300.016 sec  11336.45 KB/s  cpu:  7.09 sys   0.07 user
user  55:  300.017 sec  12173.06 KB/s  cpu:  7.57 sys   0.08 user
user  56:  300.011 sec  11808.63 KB/s  cpu:  7.33 sys   0.06 user
user  57:  300.011 sec  12277.61 KB/s  cpu:  7.55 sys   0.08 user
user  58:  300.008 sec  11529.39 KB/s  cpu:  7.14 sys   0.07 user
user  59:  300.010 sec  12021.34 KB/s  cpu:  7.44 sys   0.07 user
user  60:  300.011 sec  11499.74 KB/s  cpu:  7.18 sys   0.07 user
user  61:  300.010 sec  12001.73 KB/s  cpu:  7.55 sys   0.06 user
user  62:  300.011 sec  11978.65 KB/s  cpu:  7.52 sys   0.08 user
user  63:  300.008 sec  11540.61 KB/s  cpu:  7.20 sys   0.09 user
user  64:  300.009 sec  11221.21 KB/s  cpu:  7.06 sys   0.07 user

total:     300.017 sec  753196.79 KB/s  cpu: 471.23 sys   4.95 user

 

iostat

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sde               0.00     0.00  504.35    0.00 32278.40     0.00    64.00     0.51    1.02   0.79  39.97
sdf               0.00     0.00  501.55    0.00 32099.20     0.00    64.00     0.44    0.88   0.70  35.15
sdg               0.00     0.00  507.70    0.00 32492.80     0.00    64.00     0.52    1.02   0.79  40.20
sdh               0.00     0.00  496.70    0.00 31788.80     0.00    64.00     0.55    1.10   0.81  40.10
sdi               0.00     0.00  502.40    0.00 32153.60     0.00    64.00     0.47    0.95   0.76  38.42
sdj               0.00     0.00  499.70    0.00 31980.80     0.00    64.00     0.62    1.24   0.91  45.48
sdk               0.00     0.00  502.80    0.00 32179.20     0.00    64.00     0.46    0.91   0.72  36.22
sdl               0.00     0.00  503.90    0.00 32249.60     0.00    64.00     0.47    0.93   0.76  38.05
sdm               0.00     0.00  501.25    0.00 32080.00     0.00    64.00     0.49    0.99   0.78  39.16
sdn               0.00     0.00  504.10    0.00 32262.40     0.00    64.00     0.91    1.80   1.11  55.75
sdo               0.00     0.00  497.20    0.00 31820.80     0.00    64.00     3.51    7.07   1.96  97.30
sdp               0.00     0.00  496.50    0.00 31776.00     0.00    64.00     0.44    0.90   0.73  36.06
sdq               0.00     0.00  505.40    0.00 32345.60     0.00    64.00     0.67    1.32   0.93  47.04
sdr               0.00     0.00  503.40    0.00 32217.60     0.00    64.00     0.60    1.19   0.85  42.86
sds               0.00     0.00  502.65    0.00 32169.60     0.00    64.00     3.46    6.88   1.92  96.49
sdt               0.00     0.00  501.35    0.00 32086.40     0.00    64.00     0.46    0.92   0.75  37.40
sdu               0.00     0.00  511.30    0.00 32723.20     0.00    64.00     0.60    1.18   0.85  43.55
sdv               0.00     0.00  502.70    0.00 32172.80     0.00    64.00     0.76    1.52   1.07  53.56
sdw               0.00     0.00  502.45    0.00 32156.80     0.00    64.00     3.73    7.42   1.96  98.52
sdx               0.00     0.00  503.10    0.00 32198.40     0.00    64.00     3.70    7.36   1.92  96.67
sdy               0.00     0.00  501.15    0.00 32073.60     0.00    64.00     0.47    0.93   0.73  36.65
sdz               0.00     0.00  506.15    0.00 32393.60     0.00    64.00     3.47    6.85   1.92  97.28
sdaa              0.00     0.00  505.45    0.00 32348.80     0.00    64.00     3.53    6.99   1.95  98.45
sdab              0.00     0.00  507.10    0.00 32454.40     0.00    64.00     0.51    1.00   0.78  39.43
sdac              0.00     0.00  504.25    0.00 32272.00     0.00    64.00     0.46    0.92   0.74  37.47
sdad              0.00     0.00  506.30    0.00 32403.20     0.00    64.00     0.61    1.21   0.89  45.24
sdae              0.00     0.00  500.80    0.00 32051.20     0.00    64.00     0.48    0.97   0.75  37.80
sdaf              0.00     0.00  501.70    0.00 32108.80     0.00    64.00     0.51    1.01   0.81  40.70
sdag              0.00     0.00  497.90    0.00 31865.60     0.00    64.00     0.49    0.98   0.76  37.72
sdah              0.00     0.00  499.20    0.00 31948.80     0.00    64.00     0.47    0.95   0.75  37.24
sdai              0.00     0.00  493.50    0.00 31584.00     0.00    64.00     0.54    1.09   0.83  40.93
sdak              0.00     0.00  505.10    0.00 32326.40     0.00    64.00     0.66    1.30   0.93  47.05
sdal              0.00     0.00  504.60    0.00 32294.40     0.00    64.00     0.61    1.21   0.86  43.54
sdaj              0.00     0.00  504.60    0.00 32294.40     0.00    64.00     0.53    1.06   0.79  40.05
sdam              0.00     0.00  505.85    0.00 32374.40     0.00    64.00     3.46    6.85   1.91  96.67
sdan              0.00     0.00  506.60    0.00 32422.40     0.00    64.00     0.47    0.92   0.73  36.80
sdaq              0.00     0.00  497.65    0.00 31849.60     0.00    64.00     3.54    7.11   1.97  97.96
sdao              0.00     0.00  500.25    0.00 32016.00     0.00    64.00     0.43    0.87   0.70  34.90
sdap              0.00     0.00  497.00    0.00 31808.00     0.00    64.00     3.43    6.91   1.96  97.41
sdar              0.00     0.00  494.80    0.00 31667.20     0.00    64.00     0.52    1.05   0.81  40.03
sdas              0.00     0.00  497.45    0.00 31836.80     0.00    64.00     0.61    1.23   0.90  44.75
sdat              0.00     0.00  507.25    0.00 32464.00     0.00    64.00     0.75    1.49   1.03  52.22
sdau              0.00     0.00  503.20    0.00 32204.80     0.00    64.00     3.74    7.44   1.96  98.61
sdav              0.00     0.00  506.55    0.00 32419.20     0.00    64.00     3.69    7.28   1.92  97.06
sdaw              0.00     0.00  498.65    0.00 31913.60     0.00    64.00     0.46    0.93   0.75  37.21
sdax              0.00     0.00  497.60    0.00 31846.40     0.00    64.00     0.90    1.80   1.14  56.59
sday              0.00     0.00  503.15    0.00 32201.60     0.00    64.00     3.53    7.02   1.94  97.52
sdaz              0.00     0.00  504.45    0.00 32284.80     0.00    64.00     0.44    0.88   0.72  36.44
VxVM56000         0.00     0.00 24109.05   0.00 1542979.20   0.00    64.00    62.84    2.61   0.04 100.00

 

Conclusion to TEST4: <read_ahead disabled>

  • The maximum read I/O throughput from one node is NOT being achieved; we see only approx. 0.72 GBytes/sec.

  • The throughput for all 64 processes is balanced, but is now much lower per process; each process now achieves only approx. 12000 KB/s.

  • Disabling readahead has halved the total throughput.

  • All the read I/O is synchronous read I/O using a 32KB I/O request size.

    • The iostat above shows 64 sectors (32KB) as the average I/O size for all LUN paths –

      avgrq-sz 64.00
    • Because readahead is disabled we are no longer submitting read_pref_io sized requests.

    • Instead we are submitting a 32KB read request size, because this is the I/O size that vxbench is using.
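The avgrq-sz figure above can be sanity-checked with a one-liner, since iostat reports the average request size in 512-byte sectors (a minimal sketch; the field position assumes the extended iostat -x layout shown above):

```shell
# avgrq-sz (column 8 of the iostat -x output above) is in 512-byte
# sectors: 64 sectors * 512 / 1024 = 32 KB, matching the 32KB vxbench iosize.
echo "sdp 0.00 0.00 496.50 0.00 31776.00 0.00 64.00 0.44 0.90 0.73 36.06" |
  awk '{ printf "%s: avg request = %d KB\n", $1, $8 * 512 / 1024 }'
# prints: sdp: avg request = 32 KB
```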

 

 

TEST5: change read_nstream to 6, keep everything else the same as the baseline test.

vxbench – 64files/64procs/32KB block size

Tuning    – read_ahead enabled/read_nstream=6/read_pref_io=524288

 

# vxtunefs /data1 -o read_nstream=6

UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1

 

# vxtunefs /data1

Filesystem I/O parameters for /data1

read_pref_io = 524288

read_nstream = 6

read_ahead = 1

 

# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1


# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G  /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64

user   1:  300.008 sec  26677.91 KB/s  cpu:  5.16 sys   0.05 user
user   2:  300.107 sec  26689.61 KB/s  cpu:  5.25 sys   0.04 user
user   3:  300.116 sec  26596.61 KB/s  cpu:  4.97 sys   0.04 user
user   4:  300.031 sec  26716.80 KB/s  cpu:  4.98 sys   0.05 user
user   5:  300.089 sec  26680.92 KB/s  cpu:  5.19 sys   0.05 user
user   6:  300.072 sec  26631.30 KB/s  cpu:  5.01 sys   0.04 user
user   7:  300.099 sec  26843.86 KB/s  cpu:  5.21 sys   0.04 user
user   8:  300.091 sec  26762.65 KB/s  cpu:  5.20 sys   0.04 user
user   9:  300.074 sec  26784.68 KB/s  cpu:  5.17 sys   0.04 user
user  10:  300.076 sec  26774.26 KB/s  cpu:  5.07 sys   0.05 user
user  11:  300.062 sec  26785.71 KB/s  cpu:  4.97 sys   0.04 user
user  12:  300.027 sec  14609.45 KB/s  cpu:  2.90 sys   0.02 user
user  13:  300.035 sec  26675.62 KB/s  cpu:  5.21 sys   0.05 user
user  14:  300.101 sec   9641.12 KB/s  cpu:  1.95 sys   0.01 user
user  15:  300.066 sec  26897.99 KB/s  cpu:  4.93 sys   0.04 user
user  16:  300.027 sec  26645.46 KB/s  cpu:  5.09 sys   0.04 user
user  17:  300.016 sec  26677.21 KB/s  cpu:  5.19 sys   0.04 user
user  18:  300.020 sec  26636.02 KB/s  cpu:  5.25 sys   0.05 user
user  19:  300.012 sec  26728.77 KB/s  cpu:  4.98 sys   0.05 user
user  20:  300.081 sec  18732.43 KB/s  cpu:  3.46 sys   0.04 user
user  21:  300.008 sec  26729.13 KB/s  cpu:  5.22 sys   0.04 user
user  22:  300.087 sec  26701.62 KB/s  cpu:  5.16 sys   0.04 user
user  23:  300.083 sec  14616.98 KB/s  cpu:  2.86 sys   0.01 user
user  24:  300.085 sec  26926.99 KB/s  cpu:  5.02 sys   0.03 user
user  25:  300.031 sec  26542.74 KB/s  cpu:  5.16 sys   0.05 user
user  26:  300.101 sec  26608.19 KB/s  cpu:  5.02 sys   0.06 user
user  27:  300.112 sec  26760.74 KB/s  cpu:  5.28 sys   0.03 user
user  28:  300.050 sec  26674.13 KB/s  cpu:  5.20 sys   0.04 user
user  29:  300.058 sec  19430.05 KB/s  cpu:  3.79 sys   0.03 user
user  30:  300.062 sec  26703.79 KB/s  cpu:  5.24 sys   0.04 user
user  31:  300.079 sec  26692.03 KB/s  cpu:  5.18 sys   0.05 user
user  32:  300.078 sec  19572.11 KB/s  cpu:  3.75 sys   0.03 user
user  33:  300.014 sec  26872.00 KB/s  cpu:  5.25 sys   0.05 user
user  34:  300.035 sec  26593.60 KB/s  cpu:  4.86 sys   0.04 user
user  35:  300.011 sec  26554.73 KB/s  cpu:  5.17 sys   0.04 user
user  36:  300.065 sec  26713.74 KB/s  cpu:  5.23 sys   0.05 user
user  37:  300.011 sec  26687.96 KB/s  cpu:  5.18 sys   0.04 user
user  38:  300.034 sec  26696.03 KB/s  cpu:  5.30 sys   0.04 user
user  39:  300.046 sec  18888.19 KB/s  cpu:  3.62 sys   0.03 user
user  40:  300.019 sec  26656.42 KB/s  cpu:  5.18 sys   0.04 user
user  41:  300.039 sec  26685.39 KB/s  cpu:  5.08 sys   0.05 user
user  42:  300.041 sec  14332.34 KB/s  cpu:  2.85 sys   0.02 user
user  43:  300.112 sec  26863.12 KB/s  cpu:  5.27 sys   0.04 user
user  44:  300.008 sec  26667.66 KB/s  cpu:  5.07 sys   0.05 user
user  45:  300.060 sec  26949.71 KB/s  cpu:  5.05 sys   0.04 user
user  46:  300.021 sec  26635.77 KB/s  cpu:  5.07 sys   0.05 user
user  47:  300.052 sec  26817.32 KB/s  cpu:  5.26 sys   0.03 user
user  48:  300.110 sec  26760.94 KB/s  cpu:  5.19 sys   0.04 user
user  49:  300.096 sec   7747.49 KB/s  cpu:  1.56 sys   0.00 user
user  50:  300.116 sec  14676.80 KB/s  cpu:  2.93 sys   0.03 user
user  51:  300.026 sec  26737.82 KB/s  cpu:  5.13 sys   0.06 user
user  52:  300.027 sec  26737.80 KB/s  cpu:  5.05 sys   0.05 user
user  53:  300.044 sec  26777.10 KB/s  cpu:  4.96 sys   0.04 user
user  54:  300.017 sec  26769.30 KB/s  cpu:  5.13 sys   0.04 user
user  55:  300.024 sec  26799.31 KB/s  cpu:  5.33 sys   0.05 user
user  56:  300.102 sec  26720.72 KB/s  cpu:  5.17 sys   0.05 user
user  57:  300.043 sec  26807.85 KB/s  cpu:  5.04 sys   0.03 user
user  58:  300.055 sec  26868.24 KB/s  cpu:  4.91 sys   0.04 user
user  59:  300.047 sec  26879.17 KB/s  cpu:  5.24 sys   0.05 user
user  60:  300.070 sec  26907.83 KB/s  cpu:  5.32 sys   0.04 user
user  61:  300.055 sec  26786.37 KB/s  cpu:  5.02 sys   0.05 user
user  62:  300.097 sec  16684.09 KB/s  cpu:  3.18 sys   0.03 user
user  63:  300.093 sec  14063.68 KB/s  cpu:  2.79 sys   0.01 user
user  64:  300.024 sec  26635.57 KB/s  cpu:  4.90 sys   0.04 user

total:     300.117 sec  1572798.88 KB/s  cpu: 302.31 sys   2.53 user

 

Conclusion to TEST5:  <read_nstream set to 6>

  • The maximum read I/O throughput from one node is being achieved, approx. 1.5 GBytes/s

  • The throughput per process is still imbalanced.

  • The maximum amount of readahead per process is 3MB; this is too aggressive (i.e. too much readahead is causing a throughput imbalance between the reading processes).
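The 3MB figure follows from the maximum readahead window per process being read_nstream multiplied by read_pref_io, as described in this document. A sketch of the arithmetic:

```shell
# Maximum readahead per process = read_nstream * read_pref_io
read_pref_io=524288   # bytes (512KB, auto-tuned to the stripe width)
read_nstream=6
echo "$(( read_nstream * read_pref_io / 1024 / 1024 ))MB"
# prints: 3MB
```

With read_nstream=1 the window shrinks to a single 512KB request, which is why the lower setting balances throughput across processes.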

     

TEST6: change read_nstream to 12, keep everything else the same as the baseline test.

vxbench – 64files/64procs/32KB block size

Tuning    – read_ahead enabled/read_nstream=12/read_pref_io=524288

 

# vxtunefs /data1 -o read_nstream=12

UX:vxfs vxtunefs: INFO: V-3-22525: Parameters successfully set for /data1

 

# vxtunefs /data1

Filesystem I/O parameters for /data1

read_pref_io = 524288

read_nstream = 12

read_ahead = 1

 

# mount -t vxfs -o remount,largefiles,cluster /dev/vx/dsk/testdg/vol1 /data1

# ./vxbench -w read -i iosize=32k,iotime=300,maxfilesize=16G  /data1/primary/file1 /data1/primary/file2 /data1/primary/file3 /data1/primary/file4 /data1/primary/file5 /data1/primary/file6 /data1/primary/file7 /data1/primary/file8 /data1/primary/file9 /data1/primary/file10 /data1/primary/file11 /data1/primary/file12 /data1/primary/file13 /data1/primary/file14 /data1/primary/file15 /data1/primary/file16 /data1/primary/file17 /data1/primary/file18 /data1/primary/file19 /data1/primary/file20 /data1/primary/file21 /data1/primary/file22 /data1/primary/file23 /data1/primary/file24 /data1/primary/file25 /data1/primary/file26 /data1/primary/file27 /data1/primary/file28 /data1/primary/file29 /data1/primary/file30 /data1/primary/file31 /data1/primary/file32 /data1/primary/file33 /data1/primary/file34 /data1/primary/file35 /data1/primary/file36 /data1/primary/file37 /data1/primary/file38 /data1/primary/file39 /data1/primary/file40 /data1/primary/file41 /data1/primary/file42 /data1/primary/file43 /data1/primary/file44 /data1/primary/file45 /data1/primary/file46 /data1/primary/file47 /data1/primary/file48 /data1/primary/file49 /data1/primary/file50 /data1/primary/file51 /data1/primary/file52 /data1/primary/file53 /data1/primary/file54 /data1/primary/file55 /data1/primary/file56 /data1/primary/file57 /data1/primary/file58 /data1/primary/file59 /data1/primary/file60 /data1/primary/file61 /data1/primary/file62 /data1/primary/file63 /data1/primary/file64

user   1:  300.152 sec   4957.28 KB/s  cpu:  0.94 sys   0.00 user
user   2:  300.133 sec   4896.28 KB/s  cpu:  0.92 sys   0.00 user
user   3:  300.068 sec  32767.44 KB/s  cpu:  6.28 sys   0.05 user
user   4:  300.090 sec  32949.22 KB/s  cpu:  6.12 sys   0.05 user
user   5:  300.067 sec   5041.22 KB/s  cpu:  0.95 sys   0.01 user
user   6:  300.139 sec   4855.57 KB/s  cpu:  0.89 sys   0.00 user
user   7:  300.048 sec  32872.02 KB/s  cpu:  6.31 sys   0.05 user
user   8:  300.129 sec   4610.50 KB/s  cpu:  0.89 sys   0.00 user
user   9:  300.146 sec   4855.02 KB/s  cpu:  0.92 sys   0.00 user
user  10:  300.015 sec  32855.14 KB/s  cpu:  6.17 sys   0.05 user
user  11:  300.040 sec  32811.44 KB/s  cpu:  6.20 sys   0.06 user
user  12:  300.069 sec   4897.65 KB/s  cpu:  0.92 sys   0.00 user
user  13:  300.013 sec  32793.85 KB/s  cpu:  6.30 sys   0.06 user
user  14:  300.082 sec  32806.79 KB/s  cpu:  6.31 sys   0.04 user
user  15:  300.033 sec  32914.55 KB/s  cpu:  6.36 sys   0.04 user
user  16:  300.067 sec  32726.51 KB/s  cpu:  6.33 sys   0.05 user
user  17:  300.057 sec  32604.74 KB/s  cpu:  6.30 sys   0.05 user
user  18:  300.140 sec   4753.19 KB/s  cpu:  0.93 sys   0.00 user
user  19:  300.090 sec  32703.57 KB/s  cpu:  6.29 sys   0.06 user
user  20:  300.030 sec  32914.89 KB/s  cpu:  6.30 sys   0.06 user
user  21:  300.005 sec  32835.71 KB/s  cpu:  6.37 sys   0.05 user
user  22:  300.103 sec  32845.42 KB/s  cpu:  6.27 sys   0.05 user
user  23:  300.061 sec  32993.42 KB/s  cpu:  6.30 sys   0.06 user
user  24:  300.152 sec   4732.43 KB/s  cpu:  0.89 sys   0.01 user
user  25:  300.067 sec  32501.34 KB/s  cpu:  6.34 sys   0.05 user
user  26:  300.162 sec   4794.20 KB/s  cpu:  0.91 sys   0.00 user
user  27:  300.006 sec  32651.33 KB/s  cpu:  6.36 sys   0.05 user
user  28:  300.067 sec  32767.47 KB/s  cpu:  6.38 sys   0.05 user
user  29:  300.147 sec   4791.47 KB/s  cpu:  0.93 sys   0.01 user
user  30:  300.020 sec  32711.16 KB/s  cpu:  6.31 sys   0.04 user
user  31:  300.151 sec   5113.69 KB/s  cpu:  0.92 sys   0.00 user
user  32:  300.017 sec  14987.11 KB/s  cpu:  2.89 sys   0.02 user
user  33:  300.028 sec  32689.88 KB/s  cpu:  6.38 sys   0.06 user
user  34:  300.136 sec   4856.04 KB/s  cpu:  0.91 sys   0.00 user
user  35:  300.146 sec   4794.78 KB/s  cpu:  0.91 sys   0.00 user
user  36:  300.005 sec  32712.86 KB/s  cpu:  6.19 sys   0.05 user
user  37:  300.100 sec  32927.68 KB/s  cpu:  6.37 sys   0.04 user
user  38:  300.048 sec  32994.80 KB/s  cpu:  6.34 sys   0.04 user
user  39:  300.010 sec  32630.41 KB/s  cpu:  6.27 sys   0.04 user
user  40:  300.054 sec  32768.91 KB/s  cpu:  6.32 sys   0.05 user
user  41:  300.019 sec  33100.39 KB/s  cpu:  6.17 sys   0.04 user
user  42:  300.066 sec  32726.68 KB/s  cpu:  6.38 sys   0.05 user
user  43:  300.035 sec  33221.54 KB/s  cpu:  6.34 sys   0.06 user
user  44:  300.008 sec  32692.06 KB/s  cpu:  6.32 sys   0.06 user
user  45:  300.025 sec  33181.65 KB/s  cpu:  6.38 sys   0.05 user
user  46:  300.146 sec   4773.57 KB/s  cpu:  0.90 sys   0.00 user
user  47:  300.108 sec  33172.52 KB/s  cpu:  6.27 sys   0.04 user
user  48:  300.073 sec  32766.86 KB/s  cpu:  6.30 sys   0.06 user
user  49:  300.007 sec  32814.98 KB/s  cpu:  6.36 sys   0.06 user
user  50:  300.050 sec  32933.22 KB/s  cpu:  6.35 sys   0.05 user
user  51:  300.026 sec  33038.23 KB/s  cpu:  6.35 sys   0.05 user
user  52:  300.087 sec  32970.03 KB/s  cpu:  6.38 sys   0.07 user
user  53:  300.022 sec  32833.90 KB/s  cpu:  6.09 sys   0.05 user
user  54:  300.091 sec  32990.10 KB/s  cpu:  6.32 sys   0.04 user
user  55:  300.075 sec  32991.84 KB/s  cpu:  6.34 sys   0.06 user
user  56:  300.075 sec  32909.93 KB/s  cpu:  6.37 sys   0.04 user
user  57:  300.059 sec   4778.89 KB/s  cpu:  0.90 sys   0.00 user
user  58:  300.062 sec  32993.34 KB/s  cpu:  6.12 sys   0.06 user
user  59:  300.064 sec  33156.83 KB/s  cpu:  6.38 sys   0.05 user
user  60:  300.079 sec  33011.92 KB/s  cpu:  6.38 sys   0.05 user
user  61:  300.044 sec  33097.62 KB/s  cpu:  6.36 sys   0.05 user
user  62:  300.049 sec   4774.69 KB/s  cpu:  0.90 sys   0.00 user
user  63:  300.142 sec   4897.10 KB/s  cpu:  0.92 sys   0.00 user
user  64:  300.039 sec  33221.04 KB/s  cpu:  6.33 sys   0.04 user

total:     300.163 sec  1581364.62 KB/s  cpu: 302.20 sys   2.33 user

 

Conclusion to TEST6:  <read_nstream set to 12>

  • The maximum read I/O throughput from one node is being achieved, approx. 1.5 GBytes/s

  • The throughput per process is imbalanced.

  • The maximum amount of readahead per process is 6MB (12 × 512KB); this is too aggressive (i.e. too much readahead is causing a throughput imbalance between the reading processes).

 

 

Graphics for buffered I/O tests

The graphs below show the results of the tests running 64 processes (only Test3, which runs 16 processes, is excluded from the graphs). The second graph simply joins the dots for each process; each test uses a different colour. The graphs clearly show that only read_nstream=1 (Test2) and read_ahead off (Test4) provide an evenly balanced throughput across all 64 processes. However, when read_ahead is disabled, the throughput is much lower.

Therefore, in our test, read_nstream=1 (dark blue in the graphics) is clearly the correct value because the throughput is evenly balanced across all 64 processes and the maximum throughput is still achieved.

 

graphic1.png

graphic2.png

 

 

< 9. Final conclusions and best practices for optimizing sequential read I/O workloads>

 

To maximize the sequential read I/O throughput, maintain evenly balanced I/O across all the LUNs, and balance the throughput across the active reading processes, we identified the following configuration for our test environment:

  • 512KB VxVM stripe width (for the optimum I/O size reading from disk)

  • 24 LUNs and 24 columns in our VxVM volume (to use maximum storage bandwidth)

  • Leave read_pref_io set to the default value of 524288 (max I/O size using readahead)

  • Reduce read_nstream from a default value of 24 to a value of 1 (to reduce the maximum amount of data to pre-fetch in one go using readahead)
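In our environment those choices map onto commands like the following (illustrative only: the disk group, volume name, and volume size are placeholders echoing this document's examples, and the exact syntax should be verified against your InfoScale release):

```shell
# Create a 24-column striped volume with a 512KB stripe unit
# (volume size "2t" is a placeholder).
vxassist -g testdg make vol1 2t layout=stripe ncol=24 stripeunit=512k

# read_pref_io auto-tunes to the stripe width (524288 bytes);
# only read_nstream needs lowering from its default of 24 to 1.
vxtunefs /data1 -o read_nstream=1
```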

 

The best practices for sequential read media server solution configurations are as follows:

  • Set up your hardware so that the maximum I/O bandwidth can be achieved.

  • We did not change the operating system maximum I/O size; we kept the default of 512KB.

  • Ensure that your I/O is balanced evenly across all your LUNs by using VxVM striped volumes.

    • We found a VxVM stripe-width of 512KB to be optimal; different stripe-widths can be tested, but a stripe-width greater than 1024KB is not required.

    • We created 24 LUNs to maximize access to the storage arrays, and therefore created our VxVM volume with 24 columns to maximize the bandwidth to the storage arrays.

    • During this process, identify any bottlenecks in your HBA cards and storage, beginning with a single node; the bottlenecks determine the maximum throughput you can achieve in your environment.

  • If VxVM mirroring had been required in our configuration, then 12 LUNs would have been used in each mirror.

    • As reads can come from either mirror, the read I/O throughput should not be impacted by mirroring (we are still reading from all 24 LUNs); writes, however, will be impacted.

  • The value of read_pref_io is the read I/O request size that VxFS readahead will submit to VxVM; we want a larger I/O size for performance (read_pref_io is set to the stripe-width).

    • Do not change the auto-tuned value for read_pref_io; if you want to change read_pref_io, change the VxVM volume stripe-width instead.

  • Using higher read_nstream values produced an imbalance in throughput between the different processes performing disk read I/O; this is due to overly aggressive readahead.

    • No matter what value of read_nstream we used, we always hit the FC HBA card throughput bottleneck of approximately 1.5 GBytes/sec.

    • The larger the value of read_nstream, the more aggressive readahead becomes, and the greater the imbalance in read throughput between the different processes.

    • Reduce read_nstream to reduce the amount of readahead. We found read_nstream=1 provided a perfect balance in throughput between processes.

  • Do not disable readahead unless absolutely necessary as sequential read performance will be impacted.

  • Use /etc/tunefstab to set read_nstream so that the value persists across reboots.
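For example, an /etc/tunefstab entry of the following form applies the tuning automatically at mount time (a sketch; verify the exact tunefstab format for your platform and release):

```shell
# /etc/tunefstab -- VxFS tunables applied when the file system is mounted
/dev/vx/dsk/testdg/vol1 read_nstream=1
```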

 

If this information is considered useful, we will provide a second report for media server workload testing that explains sequential write I/O and some more best practices for balancing the throughput across processes performing a combination of read and write I/O.

 

Best regards

Veritas IA Engineering team

 

 

Server h/w configuration information: <2 nodes>

 

System

# dmidecode -q -t 1|head -5
System Information
        Manufacturer: HP
        Product Name: ProLiant DL380p Gen8

 

CPU

# dmidecode -q -t 4|grep -e Processor -e Socket -e Manufacturer -e Version -e "Current Speed" -e Core -e Thread|grep -v Upgrade

Processor Information
        Socket Designation: Proc 1
        Type: Central Processor
        Manufacturer: Intel
        Version:  Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
        Current Speed: 2200 MHz
        Core Count: 8
        Core Enabled: 8
        Thread Count: 16

Processor Information
        Socket Designation: Proc 2
        Type: Central Processor
        Manufacturer: Intel
        Version:  Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
        Current Speed: 2200 MHz
        Core Count: 8
        Core Enabled: 8
        Thread Count: 16

Memory

# dmidecode -q -t 17|grep Size|grep -v "No Module Installed"|awk 'BEGIN{memsize=0}{memsize=memsize+$2}END{print memsize, $3}'

98304 MB

# dmidecode -q -t 17|grep -e Speed -e Type|grep -v Detail|sort|uniq|grep -v Unknown

        Configured Clock Speed: 1600 MHz
        Speed: 1600 MHz
        Type: DDR3

 

 

 

 

 

 

 

Comments

Hi Colin,

 

Thanks for sharing this. Question is, are you guys doing this for fun, or are the NetBackup and Infoscale teams actually working together to put this into an appliance so we can finally have an HA configuration? And of course, it being CVM won't hurt either :)

Hi Riaan,

Many thanks for reading the article and for commenting as well. We have recently been discussing I/O balancing with a few partners who implement media server solutions using our CFS/CVM shared data cluster environment; our in-house testing in this article was a follow-up to help us explain and share some useful best practices. The NetBackup Appliance also uses Infoscale in the backend, so I have asked a friend in the NetBackup team to see if these best practices could help add value for them also.

Thanks again!

Best regards

Colin

Hi Colin,

 

Yes, I believe a load-balancing active/active (RAC-like) type configuration in the appliance would really ensure it's bullet proof. NetBackup already has the concept of load-balancing media servers for storage unit selection, and also load balancers where dedupe is concerned. With this config the storage server should also be active/active, but I'm not sure if the same storage server could access another's data. Interesting concepts.

 

As the capacity of the appliance increases, it becomes a little alarming (IMHO) that we're putting all that data in a pool accessible via only one server. HA in media servers is fairly complex, but implementing a load-balancing type configuration on CVM should ensure we can get to the data if something fails.

 

Sign me up for the testing team :)

I fully agree with you on the future of the NBU Appliance.
CommVault have created a deduplicated Storage Grid solution, and I think that's a good idea.
With CFS/CVM, the CR partitioning capabilities (MSDP 96TB or MSDP Appliance), the new refDB, auto repair, ... I suppose that Veritas has a lot of arguments to create a new vision of the Storage Pool. Not a PureDisk with multiple CRs, but a real Grid solution (independent but aggregated nodes, lost-node capabilities, auto-repair of nodes, ... the best coffee in the world, and more if possible :))

Thank you for this very good work..

I'm also available for the testing team :)

Yeah, the PureDisk multinode concept wasn't very nice. Need something RAC-like.