04-07-2012 11:01 PM
I created a fresh VxVM 6.0 RAID5 volume on four columns. I performed
some basic I/O operations and noticed that writes always end up in a
read-modify-write cycle. I haven't been able to trigger full-stripe
writes yet, regardless of the command I used to write to the
filesystem. I tried both 64k and 128k stripe widths, but in both
cases I cannot get the writes aligned. The filesystem is VxFS.
Below are the outputs of my simple tests and the configuration of the
RAID5 volume. Any hints as to where I got it wrong?
1. dd example

root@mlincek:~# dd if=/dev/zero of=/mlincek/zero.dd bs=384k
root@mlincek:~# iostat -zxn 2
                    extended device statistics
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
   69.0  115.4  8827.8  14777.0  0.0  0.4    0.0    2.3   0  41 c0t2d0
   69.0  115.4  8827.8  14777.0  0.0  0.3    0.0    1.4   0  25 c0t3d0
   70.0  115.4  8955.8  14777.0  0.0  0.4    0.0    2.0   0  36 c0t5d0
   69.5  115.9  8891.8  14841.0  0.0  0.4    0.0    2.2   0  41 c0t4d0
2. Copy file from another physical volume using cp command:
root@mlincek:~# cp testfile.tmp /mlincek/
                    extended device statistics
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
   19.0  159.9  2250.5  20290.3  0.0  0.5    0.0    2.7   0  40 c0t2d0
   17.5  158.4  2239.0  20278.9  0.0  0.3    0.0    1.5   0  16 c0t3d0
   36.5  142.4  4489.5  18051.4  0.0  0.5    0.0    2.6   0  33 c0t5d0
   37.0  143.4  4494.0  18119.8  0.0  0.5    0.0    3.0   0  36 c0t4d0
3. Volume config
-bash-4.1# vxprint -g mlincekdg -t
...
dg mlincekdg      default      default  28000    1333350408.27.mlincek
dm mlincekdg01    disk_1       auto     65536    1953459328 -
dm mlincekdg02    disk_2       auto     65536    1953459328 -
dm mlincekdg03    disk_3       auto     65536    1953459328 -
dm mlincekdg04    disk_4       auto     65536    1953459328 -
sd mlincekdg01-01 mlincek-01   mlincekdg01 0     1953458944 0/0   disk_1   ENA
sd mlincekdg02-01 mlincek-01   mlincekdg02 0     1953458944 1/0   disk_2   ENA
sd mlincekdg03-01 mlincek-01   mlincekdg03 0     1953458944 2/0   disk_3   ENA
sd mlincekdg04-01 mlincek-01   mlincekdg04 0     1953458944 3/0   disk_4   ENA
pl mlincek-01     mlincek      ENABLED  ACTIVE   5860376832 RAID  4/256    RW
v  mlincek        -            ENABLED  ACTIVE   5860376832 RAID  -        raid5

-bash-4.1# /opt/VRTSvxfs/sbin/vxtunefs -p /mlincek
Filesystem i/o parameters for /mlincek
read_pref_io = 131072
read_nstream = 4
read_unit_io = 131072
write_pref_io = 393216
write_nstream = 1
write_unit_io = 131072
...
Mount:
/mlincek on /dev/vx/dsk/mlincekdg/mlincek read/write/setuid/devices/rstchown/delaylog/largefiles/ioerror=mwdisable/xattr/dev=3606d60
The volume was created with the following command:
root@mlincek:~# vxassist -g mlincekdg -o ordered make mlincek 5860376576 layout=raid5 nlog=0 ncol=4 stripewidth=128k mlincekdg01 mlincekdg02 mlincekdg03 mlincekdg04
04-21-2012 08:33 PM
I still haven't found out why full-stripe writes are not occurring at all. Would it make a difference if I created a RAID5 volume with four plexes, each with one subdisk, instead of one plex containing four subdisks?
-bash-4.1# vxprint -hl
Disk group: mlincekdg

Group:    mlincekdg
info:     dgid=1333350408.27.mlincek
version:  170
alignment: 8192 (bytes)
detach-policy: global
dg-fail-policy: dgdisable
ioship:   off
copies:   nconfig=default nlog=default
devices:  max=0 cur=1
minors:   >= 28000

Disk:     mlincekdg01
info:     diskid=1333350348.19.mlincek
assoc:    device=disk_1 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_1s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd

Disk:     mlincekdg02
info:     diskid=1333350355.21.mlincek
assoc:    device=disk_2 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_2s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd

Disk:     mlincekdg03
info:     diskid=1333350364.23.mlincek
assoc:    device=disk_3 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_3s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd

Disk:     mlincekdg04
info:     diskid=1333350373.25.mlincek
assoc:    device=disk_4 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_4s2
devinfo:  publen=1953459328 privlen=65536
mediatype: hdd

Disk:     mlincekdg05
info:     diskid=1335061022.86.mlincek
assoc:    device=disk_5 type=auto
flags:    autoconfig
device:   path=/dev/vx/dmp/disk_5s2
devinfo:  publen=3906897616 privlen=65536
mediatype: hdd

Volume:   mlincek
info:     len=4383982848
type:     usetype=raid5
state:    state=ACTIVE kernel=ENABLED cdsrecovery=0/0 (clean)
assoc:    plexes=mlincek-01 exports=(none)
policies: read=RAID exceptions=NO_OP
flags:    open writeback
logging:  type=RAID5 loglen=30720 serial=0/0 (disabled)
apprecov: seqno=0/0
recovery: mode=default recov_id=0
device:   minor=28000 bdev=216/28000 cdev=216/28000 path=/dev/vx/dsk/mlincekdg/mlincek
perms:    user=root group=root mode=0600
guid:     {12e8d2e8-7d75-11e1-9b7b-00304897eb06}
mediatype: hdd

Plex:     mlincek-01
info:     len=4383982848
type:     layout=RAID columns=4 width=256
state:    state=ACTIVE kernel=ENABLED io=read-write
assoc:    vol=mlincek sd=mlincekdg01-01,mlincekdg02-01,mlincekdg03-01,...
flags:    complete
mediatype: hdd

Subdisk:  mlincekdg01-01
info:     disk=mlincekdg01 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=0 offset=0)
flags:    enabled busy
device:   device=disk_1 path=/dev/vx/dmp/disk_1s2 diskdev=214/194
mediatype: hdd

Subdisk:  mlincekdg02-01
info:     disk=mlincekdg02 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=1 offset=0)
flags:    enabled busy
device:   device=disk_2 path=/dev/vx/dmp/disk_2s2 diskdev=214/130
mediatype: hdd

Subdisk:  mlincekdg03-01
info:     disk=mlincekdg03 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=2 offset=0)
flags:    enabled busy
device:   device=disk_3 path=/dev/vx/dmp/disk_3s2 diskdev=214/322
mediatype: hdd

Subdisk:  mlincekdg04-01
info:     disk=mlincekdg04 offset=0 len=1461327616
assoc:    vol=mlincek plex=mlincek-01 (column=3 offset=0)
flags:    enabled busy
device:   device=disk_4 path=/dev/vx/dmp/disk_4s2 diskdev=214/258
mediatype: hdd
04-25-2012 07:55 AM
Hi,
I'm a little unclear on your query. What do you mean by read-modify-write? If I understand correctly, you want to trigger full writes where reads (r/s or kr/s) are minimal (ideally 0)? Please let me know if I have understood correctly.
For your second query: if you create 4 plexes with 4 subdisks, it won't be a striped structure (I have never tried striping across plexes). Ideally, a 4-column layout needs to be under 1 plex. I don't think it would make a difference anyway, as a plex is a virtual object; the physical device you are writing to is still the subdisk.
I haven't seen a RAID volume configured with 4 plexes, each with one subdisk.
I will see if I can replicate this and check the results.
Gaurav
04-25-2012 08:37 AM
Hello Gaurav,
Indeed. If large writes occur on an idle RAID5 volume, I expect to see minimal r/s; most of the I/O should be w/s. In some cases I see that up to one third of the total I/O is reads, even though the volume is servicing only a single write stream with a large I/O block size (the dd command).
Damjan
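To put a number on the read share described above, here is a minimal Python sketch that takes the four device lines from the bs=384k dd run earlier in the thread and computes the fraction of throughput that is reads, plus the average request size per device. The function names are made up for illustration, and it only looks at one 2-second iostat sample, so treat the result as a rough indication rather than a measurement.

```python
# Device lines copied from the iostat sample of the bs=384k dd run.
# Each tuple: (device, r/s, w/s, kr/s, kw/s)
SAMPLES = [
    ("c0t2d0", 69.0, 115.4, 8827.8, 14777.0),
    ("c0t3d0", 69.0, 115.4, 8827.8, 14777.0),
    ("c0t5d0", 70.0, 115.4, 8955.8, 14777.0),
    ("c0t4d0", 69.5, 115.9, 8891.8, 14841.0),
]


def read_share(samples):
    """Fraction of all transferred kilobytes that were reads."""
    kr = sum(s[3] for s in samples)
    kw = sum(s[4] for s in samples)
    return kr / (kr + kw)


def avg_request_kb(samples):
    """Average (read, write) request size in KB for each device."""
    return {dev: (kr / r, kw / w) for dev, r, w, kr, kw in samples}


if __name__ == "__main__":
    print(f"read share of throughput: {read_share(SAMPLES):.1%}")
    for dev, (rd, wr) in avg_request_kb(SAMPLES).items():
        print(f"{dev}: avg read {rd:.1f} KB, avg write {wr:.1f} KB")
```

For this sample the average request size on every disk comes out at roughly 128 KB in both directions, which matches the 128k stripe unit of the volume.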
05-23-2012 09:56 AM
The stripe unit size of the RAID5 volume is 256 sectors (128k). The default is 32 sectors (16k).
bs=384K = 768 sectors, and 768/256 = 3 stripe units of data + 1 parity stripe unit, which is exactly one full stripe on four columns. The math is right.
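The stripe arithmetic can be checked with a short Python sketch. It assumes 512-byte sectors and that one of the four columns' worth of each stripe holds parity; the helper names are hypothetical and not part of any VxVM tooling.

```python
SECTOR_BYTES = 512  # assumption: stripe width is reported in 512-byte sectors


def full_stripe_bytes(ncol, stripe_unit_sectors):
    """Data bytes in one full RAID5 stripe (one column's unit holds parity)."""
    data_columns = ncol - 1
    return data_columns * stripe_unit_sectors * SECTOR_BYTES


def stripes_in_write(write_bytes, ncol, stripe_unit_sectors):
    """Return (whole stripes, leftover bytes) for one write of write_bytes."""
    return divmod(write_bytes, full_stripe_bytes(ncol, stripe_unit_sectors))


if __name__ == "__main__":
    stripe = full_stripe_bytes(ncol=4, stripe_unit_sectors=256)
    print("full stripe:", stripe, "bytes")
    print("bs=384k  ->", stripes_in_write(384 * 1024, 4, 256))
    print("bs=1024k ->", stripes_in_write(1024 * 1024, 4, 256))
```

With 4 columns and a 256-sector unit the full stripe is 393216 bytes, the same value vxtunefs reports for write_pref_io. Note that this only covers the size of a write, not where it starts on the volume.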
Could be an alignment issue. What release of VM?
What is the value of vol_maxio? Perhaps 384k is too large for the disks.
What does vxdisk list mlincekdg01|grep iosize report? (this will contain a min and max value)
05-25-2012 04:31 AM
VM is 6.0PR1 (VRTSvxvm 6.0.10.0) on Solaris 11 (x64). What alignment issue could it be? In the vxprint output above, all subdisks have the same parameters: offset=0 len=1461327616.
vol_maxio was set to the default when I started this thread. I set it to 2048 blocks in /etc/system and rebooted the system, but there was no change.
* vxvm_START (do not remove)
forceload: drv/vxdmp
forceload: drv/vxio
forceload: drv/vxspec
* vxvm_END (do not remove)
set maxphys=1048576
set vxio:vol_maxio=2048
-bash-4.1# vxdisk list mlincekdg01|grep iosize
iosize: min=512 (bytes) max=2048 (blocks)
05-30-2012 08:21 AM
Hi
Also bear in mind that if a filesystem is involved, then it will be responsible for choosing the volume offsets at which to store the data for your files, i.e. it will not necessarily put file extents at volume offset 0 or at any other stripe-aligned offset.
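To illustrate why the starting offset matters, here is a small Python sketch that splits a single write into a partial head, a number of full stripes, and a partial tail. The volume offsets used in the example are made up; the real offsets depend on where VxFS places the file extents, which this thread does not show. The 393216-byte stripe size assumes 3 data columns of 128k each.

```python
STRIPE_BYTES = 3 * 128 * 1024  # assumption: 3 data columns x 128k stripe unit


def split_write(offset, length, stripe_bytes=STRIPE_BYTES):
    """Split a write into (head partial bytes, full stripes, tail partial bytes)."""
    end = offset + length
    first_boundary = -(-offset // stripe_bytes) * stripe_bytes  # round up
    last_boundary = end // stripe_bytes * stripe_bytes          # round down
    if first_boundary > last_boundary:
        # The write starts and ends inside the same stripe.
        return length, 0, 0
    head = first_boundary - offset
    full = (last_boundary - first_boundary) // stripe_bytes
    tail = end - last_boundary
    return head, full, tail


if __name__ == "__main__":
    # Hypothetical offsets, chosen only to show the aligned and unaligned cases.
    for off in (0, 64 * 1024):
        print(f"offset {off}: {split_write(off, 384 * 1024)}")
```

In this sketch a 384k write at offset 0 is one full stripe, while the same write shifted by 64k covers no full stripe at all and touches two stripes partially, which is the case that would need parity to be recomputed from existing data.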
In your dd-to-a-file test, have you tried using a payload > 256k (bs=512k) ?
BTW you could use vxstat to monitor the i/o with the volume objects
cheers
tony
05-30-2012 08:38 AM
Tony,
yes, I did try a payload > 256k (see the 384k example above). I ran another dd-to-a-file test using a 1MB block size:
-bash-4.1# dd if=/dev/zero of=/mlincek/dd.tmp bs=1024k
iostat output during dd command:
                    extended device statistics
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
   27.5  110.0  3521.3  14085.1  0.0  0.8    0.0    5.7   0  44 c0t2d0
   27.0  109.5  3457.3  14021.1  0.0  0.4    0.0    2.7   0  30 c0t3d0
   14.5  123.0  1856.7  15749.7  0.0  0.3    0.0    2.1   0  16 c0t5d0
   13.5  122.5  1728.6  15685.7  0.0  0.8    0.0    5.6   0  71 c0t4d0
                    extended device statistics
    r/s    w/s    kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   12.0     0.0     24.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
   26.5  104.4  3330.5  13309.3  0.0  0.8    0.0    5.9   0  43 c0t2d0
   25.5  103.9  3262.3  13305.1  0.0  0.4    0.0    2.8   0  29 c0t3d0
   13.0  116.9  1663.1  14968.2  0.0  0.3    0.0    1.9   0  14 c0t5d0
   13.5  117.4  1667.4  14972.4  0.0  0.8    0.0    6.0   0  72 c0t4d0
and vxstat output:
-bash-4.1# vxstat -i 2
                      OPERATIONS          BLOCKS           AVG TIME(ms)
TYP NAME              READ     WRITE      READ     WRITE   READ  WRITE

Wed May 30 23:33:33 2012
vol mlincek              0        71         0    145408   0.00  29.12

Wed May 30 23:33:35 2012
vol mlincek              0        79         0    155663   0.00  26.45

Wed May 30 23:33:37 2012
vol mlincek              0        83         0    165888   0.00  23.73

Wed May 30 23:33:39 2012
vol mlincek              0        83         0    169984   0.00  24.07
Damjan
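The vxstat figures above can be turned into an average write size per volume-level operation. The Python sketch below does that, assuming vxstat counts BLOCKS in 512-byte sectors and that a full stripe is 3 x 256 = 768 sectors; both constants and the function name are assumptions made for illustration.

```python
SECTOR_BYTES = 512             # assumption: vxstat BLOCKS are 512-byte sectors
FULL_STRIPE_SECTORS = 3 * 256  # 3 data columns x 256-sector stripe unit

# (write operations, blocks written) per 2-second interval, from the vxstat output
INTERVALS = [(71, 145408), (79, 155663), (83, 165888), (83, 169984)]


def avg_write_sectors(intervals):
    """Average number of sectors per write operation in each interval."""
    return [blocks / ops for ops, blocks in intervals]


if __name__ == "__main__":
    for avg in avg_write_sectors(INTERVALS):
        kb = avg * SECTOR_BYTES / 1024
        leftover = avg % FULL_STRIPE_SECTORS
        print(f"avg write {avg:.1f} sectors ({kb:.1f} KB), "
              f"{leftover:.1f} sectors beyond whole stripes")
```

In the first and last intervals the average is exactly 2048 sectors, i.e. the 1MB dd block size reaches the volume as a single write. Since 2048 is not a multiple of 768, even a stripe-aligned 1MB write would leave 512 sectors (256k) that do not fill a whole stripe.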