
RAID5 Full Stripe Write

I created a fresh VxVM 6.0 RAID5 volume with four columns. I performed
some basic I/O operations and noticed that writes always end up in a
read-modify-write cycle. I haven't been able to trigger a full-stripe
write yet, regardless of the command I use to write to the
filesystem. I tried both a 64k and a 128k stripe width, but in both
cases I cannot get the writes aligned. The filesystem is VxFS.
Below are the outputs of my simple tests and the configuration of the RAID5
volume. Any hints on where I went wrong?

1. dd example
root@mlincek:~# dd if=/dev/zero of=/mlincek/zero.dd bs=384k

root@mlincek:~# iostat -zxn 2
extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   69.0  115.4 8827.8 14777.0  0.0  0.4    0.0    2.3   0  41 c0t2d0
   69.0  115.4 8827.8 14777.0  0.0  0.3    0.0    1.4   0  25 c0t3d0
   70.0  115.4 8955.8 14777.0  0.0  0.4    0.0    2.0   0  36 c0t5d0
   69.5  115.9 8891.8 14841.0  0.0  0.4    0.0    2.2   0  41 c0t4d0

2. Copying a file from another physical volume with the cp command:

root@mlincek:~# cp testfile.tmp /mlincek/

extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   19.0  159.9 2250.5 20290.3  0.0  0.5    0.0    2.7   0  40 c0t2d0
   17.5  158.4 2239.0 20278.9  0.0  0.3    0.0    1.5   0  16 c0t3d0
   36.5  142.4 4489.5 18051.4  0.0  0.5    0.0    2.6   0  33 c0t5d0
   37.0  143.4 4494.0 18119.8  0.0  0.5    0.0    3.0   0  36 c0t4d0


3. Volume config

-bash-4.1# vxprint -g mlincekdg -t
...
dg mlincekdg default default 28000 1333350408.27.mlincek
dm mlincekdg01 disk_1 auto 65536 1953459328 -
dm mlincekdg02 disk_2 auto 65536 1953459328 -
dm mlincekdg03 disk_3 auto 65536 1953459328 -
dm mlincekdg04 disk_4 auto 65536 1953459328 -
sd mlincekdg01-01 mlincek-01 mlincekdg01 0 1953458944 0/0 disk_1 ENA
sd mlincekdg02-01 mlincek-01 mlincekdg02 0 1953458944 1/0 disk_2 ENA
sd mlincekdg03-01 mlincek-01 mlincekdg03 0 1953458944 2/0 disk_3 ENA
sd mlincekdg04-01 mlincek-01 mlincekdg04 0 1953458944 3/0 disk_4 ENA
pl mlincek-01 mlincek ENABLED ACTIVE 5860376832 RAID 4/256 RW
v mlincek - ENABLED ACTIVE 5860376832 RAID - raid5
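As a quick sanity check on this layout: in a RAID5 plex one stripe unit in every stripe holds parity, so the usable length should be (ncol - 1) times the column (subdisk) length. A minimal Python sketch, assuming equal-length columns; the function name is hypothetical, and the lengths are the sector counts printed by vxprint -t here and by vxprint -hl further down the thread:

```python
def raid5_usable_sectors(ncol: int, column_len: int) -> int:
    """Usable RAID5 length: one column's worth of space goes to parity."""
    return (ncol - 1) * column_len

# Subdisk (column) lengths from vxprint, in sectors.
print(raid5_usable_sectors(4, 1953458944))  # 5860376832, the pl/v length above
print(raid5_usable_sectors(4, 1461327616))  # 4383982848, the len= in vxprint -hl
```

Both results match the plex/volume lengths vxprint reports, which is consistent with a 3-data + 1-parity layout.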

-bash-4.1# /opt/VRTSvxfs/sbin/vxtunefs -p /mlincek
Filesystem i/o parameters for /mlincek
read_pref_io = 131072
read_nstream = 4
read_unit_io = 131072
write_pref_io = 393216
write_nstream = 1
write_unit_io = 131072
...


Mount: /mlincek on /dev/vx/dsk/mlincekdg/mlincek read/write/setuid/devices/rstchown/delaylog/largefiles/ioerror=mwdisable/xattr/dev=3606d60

The volume was created with the following command:
root@mlincek:~# vxassist -g mlincekdg -o ordered make mlincek 5860376576 layout=raid5 nlog=0 ncol=4 stripewidth=128k mlincekdg01 mlincekdg02 mlincekdg03 mlincekdg04

RAID5 Full stripe writes

I still haven't found out why full-stripe writes are not occurring at all. Would it make a difference if I created a RAID5 volume with four plexes, each with one subdisk, instead of one plex containing four subdisks?

 

-bash-4.1# vxprint -hl
Disk group: mlincekdg

Group:    mlincekdg
info:     dgid=1333350408.27.mlincek
version:  170
alignment: 8192 (bytes)
detach-policy: global
dg-fail-policy: dgdisable
ioship: off
copies:   nconfig=default nlog=default
devices:  max=0 cur=1
minors:   >= 28000

Disk:     mlincekdg01
info:     diskid=1333350348.19.mlincek
assoc:    device=disk_1 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_1s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd
Disk:     mlincekdg02
info:     diskid=1333350355.21.mlincek
assoc:    device=disk_2 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_2s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd
Disk:     mlincekdg03
info:     diskid=1333350364.23.mlincek
assoc:    device=disk_3 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_3s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd
Disk:     mlincekdg04
info:     diskid=1333350373.25.mlincek
assoc:    device=disk_4 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_4s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd
Disk:     mlincekdg05
info:     diskid=1335061022.86.mlincek
assoc:    device=disk_5 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_5s2
devinfo:  publen=3906897616 privlen=65536
mediatype: hdd

Volume:   mlincek
info:     len=4383982848
type:     usetype=raid5
state:    state=ACTIVE kernel=ENABLED cdsrecovery=0/0 (clean)
assoc:    plexes=mlincek-01
          exports=(none)
policies: read=RAID exceptions=NO_OP
flags:    open writeback
logging:  type=RAID5 loglen=30720 serial=0/0 (disabled)
apprecov: seqno=0/0
recovery: mode=default
recov_id=0
device:   minor=28000 bdev=216/28000 cdev=216/28000 path=/dev/vx/dsk/mlincekdg/mlincek
perms:    user=root group=root mode=0600
guid: {12e8d2e8-7d75-11e1-9b7b-00304897eb06}
mediatype: hdd
Plex:     mlincek-01
info:     len=4383982848
type:     layout=RAID columns=4 width=256
state:    state=ACTIVE kernel=ENABLED io=read-write
assoc:    vol=mlincek sd=mlincekdg01-01,mlincekdg02-01,mlincekdg03-01,...
flags:    complete
mediatype: hdd
Subdisk:  mlincekdg01-01
info:     disk=mlincekdg01 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=0 offset=0)
flags:    enabled busy
device:   device=disk_1 path=/dev/vx/dmp/disk_1s2 diskdev=214/194
mediatype: hdd
Subdisk:  mlincekdg02-01
info:     disk=mlincekdg02 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=1 offset=0)
flags:    enabled busy
device:   device=disk_2 path=/dev/vx/dmp/disk_2s2 diskdev=214/130
mediatype: hdd
Subdisk:  mlincekdg03-01
info:     disk=mlincekdg03 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=2 offset=0)
flags:    enabled busy
device:   device=disk_3 path=/dev/vx/dmp/disk_3s2 diskdev=214/322
mediatype: hdd
Subdisk:  mlincekdg04-01
info:     disk=mlincekdg04 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=3 offset=0)
flags:    enabled busy
device:   device=disk_4 path=/dev/vx/dmp/disk_4s2 diskdev=214/258
mediatype: hdd
 
 
--
Damian
 
 


Hi,

I'm a little unclear on your query: what do you mean by read-modify-write? If I understand correctly, you want to trigger full-stripe writes where reads (r/s or kr/s) are minimal (ideally 0)? Please let me know if I understand correctly.
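For reference, the parity arithmetic behind the two write paths can be sketched in Python. This is a toy model with 8-byte stripe units and hypothetical helper names, not VxVM's implementation: a full-stripe write computes parity from the new data alone, while a read-modify-write of one unit has to read the old data unit and the old parity first.

```python
from functools import reduce

def xor_bytes(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

UNIT = 8  # toy stripe unit in bytes (the real volume uses 128 KiB)

# One 4-column RAID5 stripe: 3 data units plus 1 parity unit.
old_data = [bytes([v]) * UNIT for v in (1, 2, 3)]
old_parity = xor_bytes(*old_data)

# Full-stripe write: every data unit is replaced, so parity comes from the
# new data alone and nothing has to be read from disk.
new_data = [bytes([v]) * UNIT for v in (7, 8, 9)]
full_stripe_parity = xor_bytes(*new_data)

# Read-modify-write of a single unit (column 1): old data and old parity are
# read first (2 reads), then new parity = old parity ^ old data ^ new data.
new_unit = bytes([5]) * UNIT
rmw_parity = xor_bytes(old_parity, old_data[1], new_unit)

# Sanity check: RMW parity equals parity recomputed over the updated stripe.
assert rmw_parity == xor_bytes(old_data[0], new_unit, old_data[2])
print("full-stripe parity:", full_stripe_parity.hex())
print("RMW parity:", rmw_parity.hex())
```

The extra reads in the RMW path are what show up as r/s in iostat during a write-only workload.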

 

As for your second query: if you create 4 plexes with one subdisk each, it won't be a striped structure (I have never tried striping across plexes). A 4-column layout would need to be under 1 plex. I don't think it would make a difference anyway, as a plex is a virtual object; the physical device you are writing to is still the subdisk.

I haven't seen a RAID configured as 4 plexes with one subdisk each.

I will see if I can replicate this and look at the results.

 

Gaurav


Hello Gaurav,

Indeed. If large writes occur on an idle RAID5 volume, I expect to see very few reads (r/s); most of the I/O should be writes (w/s). In some cases I see that up to a third of the total I/O is reads, even though the only activity on the volume is a single write stream with a large I/O block size (the dd command).
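The read share can be computed directly from the per-disk r/s and w/s columns of the two iostat samples in the first post. A small Python sketch; the values are copied from those samples and the helper name is hypothetical:

```python
# (r/s, w/s) for each disk, copied from the iostat output in the first post.
dd_384k = [(69.0, 115.4), (69.0, 115.4), (70.0, 115.4), (69.5, 115.9)]
cp_copy = [(19.0, 159.9), (17.5, 158.4), (36.5, 142.4), (37.0, 143.4)]

def read_fraction(samples):
    """Share of all disk operations per second that are reads."""
    reads = sum(r for r, _ in samples)
    writes = sum(w for _, w in samples)
    return reads / (reads + writes)

for name, samples in (("dd bs=384k", dd_384k), ("cp", cp_copy)):
    print(f"{name}: {read_fraction(samples):.0%} reads")
```

This works out to roughly 38% reads for the dd bs=384k run and about 15% for the cp run.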

Damjan

 

RAID5 stripe size

The stripe unit size of the RAID5 volume is 256 sectors (128k per column). The default is 32 sectors.

bs=384k = 768 sectors, and 768 / 256 = 3 data stripe units (one full stripe across the 3 data columns) + 1 parity stripe unit. The math is right.
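The same arithmetic in Python, assuming 512-byte sectors and the 4/256 layout shown by vxprint. The resulting 384 KiB of data per full stripe also matches write_pref_io = 393216 in the vxtunefs output:

```python
SECTOR = 512       # bytes per sector
STRIPE_UNIT = 256  # sectors per column ("RAID 4/256" in vxprint -t)
NCOL = 4           # columns; each stripe has one parity unit

data_units = NCOL - 1
full_stripe_bytes = data_units * STRIPE_UNIT * SECTOR
bs_sectors = 384 * 1024 // SECTOR  # dd bs=384k in sectors

print(full_stripe_bytes)          # 393216, same as write_pref_io
print(bs_sectors)                 # 768
print(bs_sectors // STRIPE_UNIT)  # 3 data stripe units
```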

It could be an alignment issue. Which release of VM are you running?

What is the value of vol_maxio? Perhaps 384k is too large for the disks.

What does vxdisk list mlincekdg01 | grep iosize report? (It shows a min and a max value.)

RAID5 full stripe write

VM is 6.0PR1 (VRTSvxvm 6.0.10.0) on Solaris 11 (x64). What alignment issue could it be? In the vxprint output above, all subdisks have the same parameters: offset=0 len=1461327616.

vol_maxio was at its default value when I started this thread. I set it to 2048 blocks in /etc/system and rebooted the system, but nothing changed.

* vxvm_START (do not remove)
forceload: drv/vxdmp
forceload: drv/vxio
forceload: drv/vxspec
* vxvm_END (do not remove)

set maxphys=1048576
set vxio:vol_maxio=2048

 

-bash-4.1# vxdisk list mlincekdg01|grep iosize       
iosize:    min=512 (bytes) max=2048 (blocks)
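Assuming both vol_maxio and the iosize max are counted in 512-byte blocks (the iosize line labels the max as "blocks" and the min as 512 bytes), a quick Python check shows whether a 384 KiB full stripe fits within those limits and within maxphys:

```python
SECTOR = 512                     # assumed block size for vol_maxio and iosize
vol_maxio_bytes = 2048 * SECTOR  # set vxio:vol_maxio=2048 / iosize max=2048
maxphys = 1048576                # set maxphys=1048576 in /etc/system
full_stripe = 3 * 256 * SECTOR   # 3 data units of 256 sectors each

print(vol_maxio_bytes)                               # 1048576
print(full_stripe <= min(vol_maxio_bytes, maxphys))  # True
```

Under that assumption all three limits are 1 MiB, so request size limits alone should not be what splits a 384 KiB full-stripe write.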

 


Hi

Also bear in mind that if a file system is involved, it is responsible for choosing the volume offsets where your files' data is stored, i.e. it will not necessarily put file extents at volume offset 0 or on a stripe boundary.
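To illustrate why the volume offset matters, here is a Python sketch that splits a write at full-stripe boundaries, using the 384 KiB of data per stripe computed for this volume. The helper name is hypothetical, and the 8 KiB starting offset is an arbitrary example, not a value observed on this file system:

```python
KIB = 1024
FULL_STRIPE = 384 * KIB  # data bytes per stripe: 3 columns x 128 KiB

def classify(offset: int, length: int, stripe: int = FULL_STRIPE):
    """Split [offset, offset + length) at stripe boundaries and label each piece."""
    pieces = []
    end = offset + length
    while offset < end:
        boundary = (offset // stripe + 1) * stripe  # start of the next stripe
        chunk_end = min(end, boundary)
        full = offset % stripe == 0 and chunk_end - offset == stripe
        pieces.append((offset, chunk_end - offset, "full" if full else "partial"))
        offset = chunk_end
    return pieces

# The same 384 KiB write, stripe-aligned versus starting 8 KiB into a stripe.
print(classify(0, 384 * KIB))        # [(0, 393216, 'full')]
print(classify(8 * KIB, 384 * KIB))  # [(8192, 385024, 'partial'), (393216, 8192, 'partial')]
```

The aligned write covers exactly one stripe; the same write shifted by 8 KiB touches two stripes, both partially, and a partial-stripe parity update normally requires reads.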

In your dd-to-a-file test, have you tried using a payload larger than 256k (e.g. bs=512k)?

BTW, you could use vxstat to monitor the I/O on the volume objects.

cheers

tony

Large payload

Tony,

Yes, I did try a payload > 256k (see the 384k test above). I ran another dd-to-a-file test using a 1MB block size:

-bash-4.1# dd if=/dev/zero of=/mlincek/dd.tmp bs=1024k

iostat output during the dd command:

                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   27.5  110.0 3521.3 14085.1  0.0  0.8    0.0    5.7   0  44 c0t2d0
   27.0  109.5 3457.3 14021.1  0.0  0.4    0.0    2.7   0  30 c0t3d0
   14.5  123.0 1856.7 15749.7  0.0  0.3    0.0    2.1   0  16 c0t5d0
   13.5  122.5 1728.6 15685.7  0.0  0.8    0.0    5.6   0  71 c0t4d0
                    extended device statistics              
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   12.0    0.0   24.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
   26.5  104.4 3330.5 13309.3  0.0  0.8    0.0    5.9   0  43 c0t2d0
   25.5  103.9 3262.3 13305.1  0.0  0.4    0.0    2.8   0  29 c0t3d0
   13.0  116.9 1663.1 14968.2  0.0  0.3    0.0    1.9   0  14 c0t5d0
   13.5  117.4 1667.4 14972.4  0.0  0.8    0.0    6.0   0  72 c0t4d0

And the vxstat output:

-bash-4.1# vxstat -i 2
                      OPERATIONS          BLOCKS           AVG TIME(ms)
TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE

Wed May 30 23:33:33 2012
vol mlincek              0        71         0    145408   0.00  29.12

Wed May 30 23:33:35 2012
vol mlincek              0        79         0    155663   0.00  26.45

Wed May 30 23:33:37 2012
vol mlincek              0        83         0    165888   0.00  23.73

Wed May 30 23:33:39 2012
vol mlincek              0        83         0    169984   0.00  24.07
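Dividing blocks by operations in these vxstat samples gives the average size of each volume-level write. A Python sketch, assuming 512-byte blocks and this volume's 384 KiB full stripe; the helper name is hypothetical:

```python
SECTOR = 512
FULL_STRIPE = 3 * 256 * SECTOR  # 393216 bytes of data per stripe

# (write operations, blocks written) per 2-second interval, from vxstat above.
samples = [(71, 145408), (79, 155663), (83, 165888), (83, 169984)]

def avg_write_bytes(ops, blocks):
    """Average size of one volume-level write, in bytes."""
    return blocks * SECTOR / ops

for ops, blocks in samples:
    size = avg_write_bytes(ops, blocks)
    print(f"{size / 1024:.0f} KiB per write = {size / FULL_STRIPE:.2f} full stripes")
```

The first and last intervals work out to exactly 2048 blocks (1 MiB) per write, about 2.67 full stripes, so taken on its own each 1 MiB request cannot consist only of full stripes, even if it starts on a stripe boundary.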

 

Damjan