Tuesday, January 22, 2008

Fibre Channel I/O Calls

A colleague of mine came across the following facts regarding I/O and fabric switches:
  • 2 Gbps FC can queue 254 I/O commands
  • 4 Gbps FC can queue 2048 I/O commands
We're wondering whether moving a certain server from a 2G switch (McData DS-24M2) to a 4G switch (Cisco MDS-9124) would improve performance. To determine this, I'd need to see how many I/O commands we have at different points in time.

I've blogged about seeing I/O with /proc/diskstats before. Let's look at it more closely with awk. Note that I don't have anything conclusive below, just some observations. I do think I can use this over time to recognize trends on my system, however.

Here's what proc says about a particular LUN:

# cat /proc/diskstats |  grep " sdc "
   8   32 sdc 391348542 3811329 642958694 1765819166 212637694 
1424571277 438970288 1314722135 1 366113251 3445284834
#
As per comp.os.linux.development these fields (starting after the device name) are:
Field 1 -- # of reads issued
Field 2 -- # of reads merged
Field 3 -- # of sectors read
Field 4 -- # of milliseconds spent reading
Field 5 -- # of writes completed
Field 6 -- # of writes merged
Field 7 -- # of sectors written
Field 8 -- # of milliseconds spent writing
Field 9 -- # of I/Os currently in progress
Field 10 -- # of milliseconds spent doing I/Os
Field 11 -- weighted # of milliseconds spent doing I/Os 
Or to put it another way, with the awk field offset in parentheses:
 391348542 reads issued (4)
   3811329 reads merged (5)
 642958694 sectors read (6)
1765819166 milliseconds spent reading (7)

 212637694 writes completed (8)
1424571277 writes merged (9)
 438970288 sectors written (10)
1314722135 milliseconds spent writing (11)

         1 I/Os currently in progress (12)
 366113251 milliseconds spent doing I/Os (13)
3445284834 weighted milliseconds spent doing I/Os (14)
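If we don't feel like counting columns by hand, a quick awk sketch along the same lines (same sdc device and field layout assumed) will label everything in one shot:

#  grep " sdc " /proc/diskstats | awk '{
     print $4,  "reads issued";
     print $5,  "reads merged";
     print $6,  "sectors read";
     print $7,  "milliseconds spent reading";
     print $8,  "writes completed";
     print $9,  "writes merged";
     print $10, "sectors written";
     print $11, "milliseconds spent writing";
     print $12, "I/Os currently in progress";
     print $13, "milliseconds spent doing I/Os";
     print $14, "weighted milliseconds spent doing I/Os"
   }'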
We can then take repeated readings and focus on the essential columns. For example, this LUN spends more time reading than writing:
#  while [ 1 ]; do grep " sdc " /proc/diskstats |
     awk '{print $7 " " $11}'; sleep 1; done
1767053699 1323835167
1767053722 1323835217
1767054231 1323858000
1767054400 1323858477
1767054401 1323859097
1767054420 1323859106
1767055201 1323863662
1767055543 1323863671
1767055666 1323864799
1767056048 1323865700
#
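Since these counters are cumulative since boot, the raw numbers just keep climbing. It can be easier to watch the per-interval change instead; here's a rough sketch (same sdc device and column layout assumed) that prints how many milliseconds were spent reading and writing during each one-second sample:

#  set -- $(grep " sdc " /proc/diskstats); pr=$7; pw=${11}
#  while [ 1 ]; do
     sleep 1
     set -- $(grep " sdc " /proc/diskstats)
     # columns 7 and 11 are the cumulative ms spent reading and writing
     echo "read ms: $(( $7 - pr ))   write ms: $(( ${11} - pw ))"
     pr=$7; pw=${11}
   done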
If we look at them every quarter second, we can see spikes in the number of I/Os in progress, along with the number of reads and writes issued during that time (looking at a larger interval hides the spikes):
#  while [ 1 ]; do grep " sdc " /proc/diskstats | 
     awk '{print $12 "\t" $4 "\t" $8}'; sleep 0.25; done
1       391689249       213184077
1       391689253       213184467
4       391689253       213184912
4       391689253       213185311
4       391689253       213185780
1       391689257       213186170
1       391689257       213186558
2       391689258       213187017
68      391689271       213187319
1       391689271       213187801
2       391689313       213188219
1       391689338       213188481
2       391689379       213188863
44      391689379       213189282
32      391689384       213190180
3       391689400       213190569
3       391689400       213190971
1       391689405       213191429
3       391689407       213192172
# 
We can check the math on the last few lines. Because our sampling interval misses events that occur in between, our numbers won't add up exactly, but we can see a general trend in some of these numbers:
1       391689338       213188481
2       391689379       213188863
44      391689379       213189282
32      391689384       213190180
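Subtracting the first of those four samples from the last gives the read and write counts for the span:

#  echo "$(( 391689384 - 391689338 )) reads, $(( 213190180 - 213188481 )) writes"
46 reads, 1699 writes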
There were a lot more writes than reads in the samples taken above.
3       391689400       213190569
3       391689400       213190971
Nothing was read during that quarter second, around 400 writes completed, and 3 I/Os were in progress at each sample.
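
To tie this back to the original queue-depth question, a rough sampler like the one below (same sdc device and column layout assumed) could watch the "I/Os currently in progress" column and report the largest value seen; if that figure never gets anywhere near the 254 commands a 2 Gbps port can queue, the faster switch probably won't help on that front:

#  max=0
#  while [ 1 ]; do
     set -- $(grep " sdc " /proc/diskstats)
     # column 12 is the number of I/Os currently in progress
     [ ${12} -gt $max ] && max=${12}
     echo "in flight: ${12}   max so far: $max"
     sleep 0.25
   done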

I don't have anything conclusive from the above, but I do think I can use this over time to recognize trends on my system.
