Thursday, November 29, 2007

Disk I/O

I've been looking at I/O on my disks with iostat. iostat can be thought of as a wrapper to /proc/diskstats. You can see how the diskstats change by putting a test load on them. The following does lots of reads from a particular device that I'm trying to trouble shoot:
dd if=/dev/emcpowerb of=/dev/null bs=512 count=100000000000 &
In a multipath set up iostat might look like:
avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.01    0.00    0.02    0.01   99.96

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
...
sdb               0.00         0.00         0.00       2496          0
sdc               0.01         2.52         0.00    2583090        496
sdd               0.00         0.00         0.00       2496          0
sde               0.01         2.52         0.00    2581664        952
...
emcpowerb         0.02         5.02         0.00    5150338       1448
Note that emcpowerb has the total between sdc and sde. You can use several dd comands like the above to drive the I/O load up. Look at all the time spent in wa (iowait - amount of time the CPU has been waiting for I/O to complete):
top - 22:27:44 up 12 days,  3:26,  3 users,  load average: 7.05, 2.78, 1.04
Tasks: 124 total,   1 running, 123 sleeping,   0 stopped,   0 zombie
Cpu0  :  1.4% us,  4.8% sy,  0.0% ni,  0.0% id, 93.9% wa,  0.0% hi,  0.0% si
Cpu1  :  1.7% us,  7.1% sy,  0.0% ni,  0.0% id, 91.2% wa,  0.0% hi,  0.0% si
Cpu2  :  0.0% us,  0.3% sy,  0.0% ni, 81.8% id, 17.9% wa,  0.0% hi,  0.0% si
Cpu3  :  0.3% us,  3.4% sy,  0.0% ni,  6.1% id, 90.1% wa,  0.0% hi,  0.0% si
Mem:   8310532k total,  1028576k used,  7281956k free,   767020k buffers
Swap:  2031608k total,        0k used,  2031608k free,   128680k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
 2400 root      18   0  4836  420  356 D    2  0.0   0:00.83 dd                 
 2352 root      18   0  5364  420  356 D    2  0.0   0:06.15 dd                 
 2358 root      18   0  5036  420  356 D    2  0.0   0:04.05 dd                 
 2359 root      18   0  5204  420  356 D    2  0.0   0:04.06 dd                 
 2402 root      18   0  4492  420  356 D    2  0.0   0:00.86 dd                 
 2345 root      18   0  4112  420  356 D    2  0.0   0:09.21 dd                 
 2348 root      18   0  3884  416  356 D    2  0.0   0:06.40 dd                 
 2401 root      18   0  5276  420  356 D    2  0.0   0:00.81 dd                 
 2353 root      18   0  4348  420  356 D    1  0.0   0:06.09 dd   
and see how your multipath device handles it (note that 5 and 3 queued I/Os):
# powermt display dev=emcpowerb | egrep "sdc|sde"
   1 qla2xxx                   sdc       SP A1     active  alive      5      0
   2 qla2xxx                   sde       SP A0     active  alive      3      0

# iostat
...
avg-cpu:  %user   %nice    %sys %iowait   %idle
           0.01    0.00    0.02    0.04   99.93

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
...
sdb               0.00         0.00         0.00       2496          0
sdc               0.08        17.64         0.00   18117818        496
sdd               0.00         0.00         0.00       2496          0
sde               0.08        17.26         0.00   17725688        952
...
emcpowerb         0.15        34.89         0.00   35829090       1448
Note that there are obviously more reads. Note also how you can get similar stats directly from /proc/diskstats:
# cat /proc/diskstats | egrep "emc|sdc|sde"
   8   32 sdc 121192 15 26827106 2477633 51 0 496 467 7 641017 2478212
   8   64 sde 120903 15 26285720 2529294 45 0 952 472 2 653448 2529891
 120   16 emcpowerb 240323 6396919 53098682 5097396 96 85 1448 1013 10 .. ..
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    1      2       3        4       5  6  7    8   *9* 
Note the 9th column (# of I/Os currently in progress) which is the same queued I/O which "powermt display dev=emcpowerb" displayed. If I stop the load test (killall dd) you can see the queued I/O drop:
# cat /proc/diskstats | egrep "emc|sdc|sde"
   8   32 sdc 160459 15 35087122 3570191 51 0 496 467 0 879727 3570647
   8   64 sde 160766 15 34445608 3638404 45 0 952 472 0 892158 3638873
 120   16 emcpowerb 319453 8370224 69518314 7328191 96 85 1448 1013 0 .. ..
Here's an easy way to focus on the 9th column:
cat /proc/diskstats |grep sdc |awk {'print $12'}

No comments: