RAID5 Full Stripe Write

damjan
Level 3

I created a fresh VxVM 6.0 RAID5 volume on four columns. I performed
some basic I/O operations and noticed that writes always end in a
read-modify-write cycle. I haven't been able to trigger a full stripe
write yet, regardless of the command I used to write to the
filesystem. I tried stripe widths of 64k and 128k, but in both
cases I cannot get the writes aligned. The filesystem is VxFS.
Below are the outputs of my simple tests and the configuration of the
RAID5 volume. Any hints as to where I got it wrong?

1. dd example
root@mlincek:~# dd if=/dev/zero of=/mlincek/zero.dd bs=384k

root@mlincek:~# iostat -zxn 2
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
69.0 115.4 8827.8 14777.0 0.0 0.4 0.0 2.3 0 41 c0t2d0
69.0 115.4 8827.8 14777.0 0.0 0.3 0.0 1.4 0 25 c0t3d0
70.0 115.4 8955.8 14777.0 0.0 0.4 0.0 2.0 0 36 c0t5d0
69.5 115.9 8891.8 14841.0 0.0 0.4 0.0 2.2 0 41 c0t4d0

2. Copy file from another physical volume using cp command:

root@mlincek:~# cp testfile.tmp /mlincek/

extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
19.0 159.9 2250.5 20290.3 0.0 0.5 0.0 2.7 0 40 c0t2d0
17.5 158.4 2239.0 20278.9 0.0 0.3 0.0 1.5 0 16 c0t3d0
36.5 142.4 4489.5 18051.4 0.0 0.5 0.0 2.6 0 33 c0t5d0
37.0 143.4 4494.0 18119.8 0.0 0.5 0.0 3.0 0 36 c0t4d0


3. Volume config

-bash-4.1# vxprint -g mlincekdg -t
...
dg mlincekdg default default 28000 1333350408.27.mlincek
dm mlincekdg01 disk_1 auto 65536 1953459328 -
dm mlincekdg02 disk_2 auto 65536 1953459328 -
dm mlincekdg03 disk_3 auto 65536 1953459328 -
dm mlincekdg04 disk_4 auto 65536 1953459328 -
sd mlincekdg01-01 mlincek-01 mlincekdg01 0 1953458944 0/0 disk_1 ENA
sd mlincekdg02-01 mlincek-01 mlincekdg02 0 1953458944 1/0 disk_2 ENA
sd mlincekdg03-01 mlincek-01 mlincekdg03 0 1953458944 2/0 disk_3 ENA
sd mlincekdg04-01 mlincek-01 mlincekdg04 0 1953458944 3/0 disk_4 ENA
pl mlincek-01 mlincek ENABLED ACTIVE 5860376832 RAID 4/256 RW
v mlincek - ENABLED ACTIVE 5860376832 RAID - raid5

-bash-4.1# /opt/VRTSvxfs/sbin/vxtunefs -p /mlincek
Filesystem i/o parameters for /mlincek
read_pref_io = 131072
read_nstream = 4
read_unit_io = 131072
write_pref_io = 393216
write_nstream = 1
write_unit_io = 131072
...


Mount: /mlincek on /dev/vx/dsk/mlincekdg/mlincek read/write/setuid/devices/rstchown/delaylog/largefiles/ioerror=mwdisable/xattr/dev=3606d60

Volume was created with the following command:
root@mlincek:~# vxassist -g mlincekdg -o ordered make mlincek
5860376576 layout=raid5 nlog=0 ncol=4 stripewidth=128k mlincekdg01
mlincekdg02 mlincekdg03 mlincekdg04
7 REPLIES

damjan
Level 3

I still haven't found out why full stripe writes are not occurring at all. Would it make a difference if I created a RAID5 volume with four plexes, each with one subdisk, instead of one plex containing four subdisks?

 

-bash-4.1# vxprint -hl
Disk group: mlincekdg

Group:    mlincekdg
info:     dgid=1333350408.27.mlincek
version:  170
alignment: 8192 (bytes)
detach-policy: global
dg-fail-policy: dgdisable
ioship: off
copies:   nconfig=default nlog=default
devices:  max=0 cur=1
minors:   >= 28000

Disk:     mlincekdg01
info:     diskid=1333350348.19.mlincek
assoc:    device=disk_1 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_1s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd
Disk:     mlincekdg02
info:     diskid=1333350355.21.mlincek
assoc:    device=disk_2 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_2s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd
Disk:     mlincekdg03
info:     diskid=1333350364.23.mlincek
assoc:    device=disk_3 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_3s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd
Disk:     mlincekdg04
info:     diskid=1333350373.25.mlincek
assoc:    device=disk_4 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_4s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd
Disk:     mlincekdg05
info:     diskid=1335061022.86.mlincek
assoc:    device=disk_5 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_5s2
devinfo:  publen=3906897616 privlen=65536
mediatype: hdd

Volume:   mlincek
info:     len=4383982848
type:     usetype=raid5
state:    state=ACTIVE kernel=ENABLED cdsrecovery=0/0 (clean)
assoc:    plexes=mlincek-01
          exports=(none)
policies: read=RAID exceptions=NO_OP
flags:    open writeback
logging:  type=RAID5 loglen=30720 serial=0/0 (disabled)
apprecov: seqno=0/0
recovery: mode=default
recov_id=0
device:   minor=28000 bdev=216/28000 cdev=216/28000 path=/dev/vx/dsk/mlincekdg/mlincek
perms:    user=root group=root mode=0600
guid: {12e8d2e8-7d75-11e1-9b7b-00304897eb06}
mediatype: hdd
Plex:     mlincek-01
info:     len=4383982848
type:     layout=RAID columns=4 width=256
state:    state=ACTIVE kernel=ENABLED io=read-write
assoc:    vol=mlincek sd=mlincekdg01-01,mlincekdg02-01,mlincekdg03-01,...
flags:    complete
mediatype: hdd
Subdisk:  mlincekdg01-01
info:     disk=mlincekdg01 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=0 offset=0)
flags:    enabled busy
device:   device=disk_1 path=/dev/vx/dmp/disk_1s2 diskdev=214/194
mediatype: hdd
Subdisk:  mlincekdg02-01
info:     disk=mlincekdg02 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=1 offset=0)
flags:    enabled busy
device:   device=disk_2 path=/dev/vx/dmp/disk_2s2 diskdev=214/130
mediatype: hdd
Subdisk:  mlincekdg03-01
info:     disk=mlincekdg03 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=2 offset=0)
flags:    enabled busy
device:   device=disk_3 path=/dev/vx/dmp/disk_3s2 diskdev=214/322
mediatype: hdd
Subdisk:  mlincekdg04-01
info:     disk=mlincekdg04 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=3 offset=0)
flags:    enabled busy
device:   device=disk_4 path=/dev/vx/dmp/disk_4s2 diskdev=214/258
mediatype: hdd
 
 
--
Damian
 
 

Gaurav_S
Moderator
   VIP    Certified

Hi,

Your query is a little unclear: what do you mean by read-modify-write? If I understand correctly, you want to trigger full stripe writes where reads (r/s or kr/s) are minimal (ideally 0)? Please let me know if I have understood correctly.

 

For your second query: if you create 4 plexes with one subdisk each, it won't be a stripe structure (I have never tried striping across plexes). Ideally, a 4-column layout needs to be under 1 plex. I don't think it would make a difference anyway, as a plex is a virtual object; the physical device you are writing to is still the subdisk.

I haven't seen a RAID volume configured with 4 plexes of one subdisk each.

I will see if I can replicate this and check the results.

 

Gaurav

damjan
Level 3

Hello Gaurav,

Indeed. When large writes hit an idle RAID5 volume, I expect to see minimal r/s; most of the I/O should be w/s. In some cases I see that up to a third of the total I/O is reads, even though the volume is handling only a single write stream with a large I/O block size (the dd command).
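
To put a number on the read share, here is a minimal Python sketch that sums kr/s and kw/s over the four data disks. The rows are copied from the first iostat sample of the 384k dd test; the helper name `read_share` is only illustrative.

```python
# (device, kr/s, kw/s) copied from the first iostat sample of the 384k dd test
rows = [
    ("c0t2d0", 8827.8, 14777.0),
    ("c0t3d0", 8827.8, 14777.0),
    ("c0t5d0", 8955.8, 14777.0),
    ("c0t4d0", 8891.8, 14841.0),
]

def read_share(rows):
    """Fraction of all bytes moved on the disks that were reads."""
    kr = sum(r[1] for r in rows)
    kw = sum(r[2] for r in rows)
    return kr / (kr + kw)

share = read_share(rows)
print(f"{share:.1%} of the bytes moved were reads")
```

For that sample the result is a bit over a third, whereas a stream made up purely of full stripe writes would be expected to show a read share close to zero.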

Damjan

 

B__Havey
Level 3
Partner Accredited

The stripe size of the RAID5 volume is 256 sectors. The default is 32 sectors.

bs=384K = 768 sectors, and 768/256 = 3 stripe units of data plus 1 parity stripe unit, which matches the four columns. The math is right.
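
The same arithmetic as a small Python sketch, in case someone wants to check other block sizes. It assumes 512-byte sectors, and the function names are made up for illustration; nothing here is a VxVM API.

```python
SECTOR_BYTES = 512  # assumption: VxVM reports widths in 512-byte sectors

def full_stripe_bytes(columns: int, stripe_unit_sectors: int) -> int:
    """Data carried by one full RAID5 stripe: one column per stripe holds parity."""
    data_columns = columns - 1
    return data_columns * stripe_unit_sectors * SECTOR_BYTES

def is_full_stripe_multiple(write_bytes: int, columns: int, stripe_unit_sectors: int) -> bool:
    """True if the write size is a whole number of full stripes."""
    return write_bytes % full_stripe_bytes(columns, stripe_unit_sectors) == 0

# Geometry from vxprint: columns=4, width=256
stripe = full_stripe_bytes(columns=4, stripe_unit_sectors=256)
print(stripe // 1024, "KiB of data per full stripe")   # 384
print(is_full_stripe_multiple(384 * 1024, 4, 256))     # True
print(is_full_stripe_multiple(1024 * 1024, 4, 256))    # False
```

A size that is a multiple of the full stripe is necessary but not sufficient: the write also has to start on a stripe boundary.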

Could be an alignment issue. What release of VM?

What is the value of vol_maxio? Perhaps 384k is too large for the disks.

What does vxdisk list mlincekdg01|grep iosize report? (this will contain a min and max value)

damjan
Level 3

VM is 6.0PR1 (VRTSvxvm 6.0.10.0) on Solaris 11 (x64). What alignment issue could it be? In the vxprint output above, all subdisks have the same parameters: offset=0 len=1461327616.

vol_maxio was set to the default when I started this thread. I set it to 2048 blocks in /etc/system and rebooted the system, but there was no change.

* vxvm_START (do not remove)
forceload: drv/vxdmp
forceload: drv/vxio
forceload: drv/vxspec
* vxvm_END (do not remove)

set maxphys=1048576
set vxio:vol_maxio=2048

 

-bash-4.1# vxdisk list mlincekdg01|grep iosize       
iosize:    min=512 (bytes) max=2048 (blocks)
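
As a quick sanity check on whether 384k fits under these limits, here is a rough Python sketch that converts everything to bytes. It assumes that the "blocks" in vol_maxio and in the iosize line are 512-byte sectors; the dictionary keys are just labels.

```python
SECTOR_BYTES = 512  # assumption: vol_maxio and iosize "blocks" are 512-byte sectors

limits_bytes = {
    "maxphys": 1048576,                  # already in bytes in /etc/system
    "vol_maxio": 2048 * SECTOR_BYTES,    # set vxio:vol_maxio=2048
    "iosize max": 2048 * SECTOR_BYTES,   # from vxdisk list
}

request = 384 * 1024  # one full stripe of data on this volume

for name, limit in limits_bytes.items():
    verdict = "OK" if request <= limit else "too small"
    print(f"{name:10s} {limit:8d} bytes  {verdict}")
```

Under that assumption all three limits come out at 1 MiB, so a 384k request should not need to be split because of them.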

 

TonyGriffiths
Level 6
Employee Accredited Certified

Hi

Also bear in mind that if a filesystem is involved, it is responsible for choosing the volume offsets at which to store the data for your files, i.e. it will not necessarily put file extents at volume offset 0.
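
To illustrate why the starting offset matters, here is a minimal Python sketch that splits a write into a partial head, whole stripes, and a partial tail. The 393216-byte stripe comes from 3 data columns of 128k each; the offsets in the examples are hypothetical, not taken from this filesystem.

```python
def split_write(offset: int, length: int, stripe_bytes: int = 393216):
    """Split a write at volume byte `offset` into (head, full_stripes, tail).

    head and tail are byte counts that land in stripes the write only
    partly covers; those parts need existing data or parity to be read first.
    """
    end = offset + length
    first_boundary = -(-offset // stripe_bytes) * stripe_bytes  # round up
    last_boundary = (end // stripe_bytes) * stripe_bytes        # round down
    if first_boundary > last_boundary:
        # The write sits inside a single stripe without covering it.
        return length, 0, 0
    head = first_boundary - offset
    full = (last_boundary - first_boundary) // stripe_bytes
    tail = end - last_boundary
    return head, full, tail

print(split_write(0, 393216))      # (0, 1, 0): aligned, one full stripe
print(split_write(8192, 393216))   # (385024, 0, 8192): same size, shifted by 8k
print(split_write(0, 1048576))     # (0, 2, 262144): 1 MiB leaves a partial tail
```

So even a write of exactly one stripe's worth of data produces no full stripe write if the filesystem places the extent a few kilobytes off a stripe boundary.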

In your dd-to-a-file test, have you tried using a payload > 256k (bs=512k) ?

BTW you could use vxstat to monitor the i/o with the volume objects

cheers

tony

damjan
Level 3

Tony,

Yes, I did try a payload > 256k (see the 384k test above). I ran another dd-to-a-file test using a 1MB block size:

-bash-4.1# dd if=/dev/zero of=/mlincek/dd.tmp bs=1024k

iostat output during dd command:

                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   27.5  110.0 3521.3 14085.1  0.0  0.8    0.0    5.7   0  44 c0t2d0
   27.0  109.5 3457.3 14021.1  0.0  0.4    0.0    2.7   0  30 c0t3d0
   14.5  123.0 1856.7 15749.7  0.0  0.3    0.0    2.1   0  16 c0t5d0
   13.5  122.5 1728.6 15685.7  0.0  0.8    0.0    5.6   0  71 c0t4d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   12.0    0.0   24.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
   26.5  104.4 3330.5 13309.3  0.0  0.8    0.0    5.9   0  43 c0t2d0
   25.5  103.9 3262.3 13305.1  0.0  0.4    0.0    2.8   0  29 c0t3d0
   13.0  116.9 1663.1 14968.2  0.0  0.3    0.0    1.9   0  14 c0t5d0
   13.5  117.4 1667.4 14972.4  0.0  0.8    0.0    6.0   0  72 c0t4d0

and vxstat output:

-bash-4.1# vxstat -i 2
                      OPERATIONS          BLOCKS           AVG TIME(ms)
TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE

Wed May 30 23:33:33 2012
vol mlincek              0        71         0    145408   0.00  29.12

Wed May 30 23:33:35 2012
vol mlincek              0        79         0    155663   0.00  26.45

Wed May 30 23:33:37 2012
vol mlincek              0        83         0    165888   0.00  23.73

Wed May 30 23:33:39 2012
vol mlincek              0        83         0    169984   0.00  24.07
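
For reference, the average write size per volume-level operation can be worked out from the vxstat figures with a short Python sketch. It assumes that vxstat BLOCKS are 512-byte sectors; the function name is only illustrative.

```python
FULL_STRIPE_SECTORS = 3 * 256  # 3 data columns x 256-sector stripe unit

# (write ops, blocks written) per 2-second interval, from the vxstat output above
samples = [(71, 145408), (79, 155663), (83, 165888), (83, 169984)]

def avg_write_sectors(ops: int, blocks: int) -> float:
    """Average size of one volume-level write, in sectors."""
    return blocks / ops

for ops, blocks in samples:
    avg = avg_write_sectors(ops, blocks)
    stripes = avg / FULL_STRIPE_SECTORS
    print(f"{avg:7.1f} sectors/write = {stripes:.2f} full stripes")
```

The averages come out at or just under 2048 sectors (1 MiB), which is not a whole number of 768-sector stripes, so with bs=1024k some partial-stripe work is expected regardless of alignment. That by itself does not explain the reads seen with bs=384k.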

 

Damjan