Thursday, May 27, 2010
yum local install
Given a set of RPMs you can install them with 'yum localinstall' and yum will resolve the dependencies:
yum localinstall --nogpgcheck file1.rpm file2.rpm ...
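For example, with two hypothetical package names (foo requires bar, and both RPMs were downloaded into the current directory):

# yum installs both local files in one transaction, resolving foo's
# dependency on bar locally and pulling anything else it needs from
# the configured repositories
yum localinstall --nogpgcheck foo-1.0-1.el5.x86_64.rpm bar-2.0-1.el5.x86_64.rpm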
Thursday, May 13, 2010
CREDO and Hero
I have an AT&T plan. I'm considering switching to CREDO now that they have an Android-based phone. I'm a little concerned, however, that the HTC Hero is somewhat dated; e.g. it has a 528 MHz processor versus the 1 GHz Snapdragon in newer phones. I've asked CREDO what's to stop someone from getting an EVO and using it under their plan. I'll post an update if they write back.
Wednesday, May 5, 2010
XIV resiliency under some load
Today we beat up the XIV by pulling a disk, pulling a whole module of 12 disks and re-zoning its fabric switch so that one of the HBAs lost connectivity, all while under some IO load. It did well.
We used a development server called physics0, which is a Dell M610 with a QLogic HBA running RHEL 5.5. It was connected to a six-node XIV 2814 with 1 TB disks.
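A quick sanity check that the host really has multiple paths to the LUN (a sketch using device-mapper-multipath; the device name is taken from the dd processes shown below):

# list the multipath topology for the XIV LUN; each path should appear
# as active through a different HBA/fabric port
multipath -ll mpath1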
We started running four dd's:
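Something like this, reconstructed from the ps output below (a sketch; the original invocation isn't shown):

# four sequential readers against the multipath partition, backgrounded
for i in 1 2 3 4; do
    dd if=/dev/mapper/mpath1p1 of=/dev/null &
done

ps confirmed all four were running: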
$ ps axu | grep dd
root     12994 23.9  0.0  63156   580 pts/0    R    11:13   4:22 dd if=/dev/mapper/mpath1p1 of=/dev/null
root     12995 23.8  0.0  63156   572 pts/0    D    11:13   4:20 dd if=/dev/mapper/mpath1p1 of=/dev/null
root     12996 24.0  0.0  63156   572 pts/0    R    11:13   4:23 dd if=/dev/mapper/mpath1p1 of=/dev/null
root     12997 23.7  0.0  63156   576 pts/0    R    11:13   4:19 dd if=/dev/mapper/mpath1p1 of=/dev/null

top looked like this:
top - 11:43:35 up 4 days, 18:45, 3 users, load average: 4.00, 4.00, 3.40
Tasks: 214 total, 3 running, 211 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.4%us, 4.0%sy, 0.0%ni, 80.8%id, 14.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 10.6%sy, 0.0%ni, 53.1%id, 36.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 12.0%sy, 0.0%ni, 49.1%id, 38.9%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 1.0%sy, 0.0%ni, 95.6%id, 3.2%wa, 0.2%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 1.0%sy, 0.0%ni, 95.6%id, 3.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.2%us, 26.4%sy, 0.0%ni, 7.8%id, 49.6%wa, 1.4%hi, 14.6%si, 0.0%st
Cpu6 : 0.0%us, 26.9%sy, 0.0%ni, 7.6%id, 50.3%wa, 1.0%hi, 14.2%si, 0.0%st
Cpu7 : 0.0%us, 3.2%sy, 0.0%ni, 95.6%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8177436k total, 8130020k used, 47416k free, 7084504k buffers
Swap: 2096472k total, 220k used, 2096252k free, 36764k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
12996 root      18   0 63156  572  476 D 26.0  0.0  7:00.78 dd
12995 root      18   0 63156  572  476 D 22.8  0.0  6:57.09 dd
12997 root      18   0 63156  576  476 D 22.6  0.0  6:55.44 dd
12994 root      18   0 63156  580  476 R 22.2  0.0  6:58.79 dd
  363 root      10  -5     0    0    0 R  2.8  0.0  1:17.65 kswapd0
 4134 root      18   0 10228  740  628 S  0.2  0.0  0:18.81 hald-addon-stor

First we removed a disk. You can see the dip in IO and the quick recovery:
The graph above was made with the xivtop program (control-click IOPS, Latency, and BW to see all three at the same time).
We saw a graphical representation of the disk turning red on the XIV GUI and were then able to test it, add it back, and watch it rebuild. As it rebuilt we could get an update on the status with the xcli:
>> monitor_redist
Type            Initial Capacity to Copy (GB)  Capacity Remaining to Copy (GB)  %done  Time Started         Estimated Time to Finish  Time Elapsed
Redistribution  26                             17                               33     2010-05-05 11:25:31  0:04:08                   0:02:02

We then had the Cisco MDS9124 disable access to Module 4 port 2 on the XIV. IO dropped completely until the retry interval ran out and the host switched to an alternate path; IOs were queued until the driver passed them down the other path. The queue held them, and I expect it would keep holding them, but we want to lower the retry interval so that IOs aren't queued as long. You can see the dip in IO on the graph; IOs were queued for 45 seconds. We also observed in the event log that the available paths lessened:
>> event_list
...
2010-05-05 12:09:27  Informational  HOST_MULTIPATH_OK                Host 'nfile0' has redundant connections to the system. #paths=3
2010-05-05 12:12:15  Informational  HOST_NO_MULTIPATH_ONLY_ONE_PORT  Host 'physics0' is connected to the system through only one of its ports. #paths=2
2010-05-05 12:13:13  Informational  HOST_MULTIPATH_OK                Host 'physics0' has redundant connections to the system. #paths=4

Update: see Qlogic retry interval for how to implement instantaneous failover as opposed to queuing for 45 seconds.
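That follow-up post has the details, but on RHEL 5 with a QLogic HBA the change looks roughly like this (a sketch, assuming the qla2xxx driver's qlport_down_retry module parameter; the value here is illustrative):

# /etc/modprobe.conf: fail a downed port quickly instead of
# queuing IO for the default retry interval
options qla2xxx qlport_down_retry=1

# rebuild the initrd so the option takes effect at boot (RHEL 5)
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)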
Unplugging the module was the most satisfying. I walked right up behind the unit and pulled both power cords out. The dip in IO was longer as a result of losing the module, and IOPs averaged out not at 14,500 but at 10,000:
The event log showed the unplugged module:
>> event_list
...
2010-05-05 12:46:14  Critical       MODULE_FAILED                      1:Module:4 failed.
2010-05-05 12:46:17  Informational  DATA_REBUILD_STARTED               Rebuild process started because system data is not protected. 1726767077764 of the data must be rebuilt.
2010-05-05 12:46:20  Informational  HOST_NO_MULTIPATH_ONLY_ONE_MODULE  Host 'physics0' is connected to the system through only one interface module. #paths=2
2010-05-05 12:49:49  Informational  DATA_REBUILD_COMPLETED             Rebuild process completed. System data is now protected.
2010-05-05 13:16:37  Informational  DISK_FINISHED_PHASEIN              System finished phasing in 1:Disk:4:1.
2010-05-05 13:16:37  Informational  DISK_FINISHED_PHASEIN              System finished phasing in 1:Disk:4:2.
...
2010-05-05 13:16:38  Informational  DISK_FINISHED_PHASEIN              System finished phasing in 1:Disk:4:12.

We can also observe what's "not ok" while the module is unplugged:
>> component_list filter=notok
Component ID           Status         Currently Functioning
1:Disk:4:1             Failed         no
1:Disk:4:2             Failed         no
1:Disk:4:3             Failed         no
1:Disk:4:4             Failed         no
1:Disk:4:5             Failed         no
1:Disk:4:6             Failed         no
1:Disk:4:8             Failed         no
1:Disk:4:9             Failed         no
1:Disk:4:10            Failed         no
1:Disk:4:11            Failed         no
1:Disk:4:12            Failed         no
1:Module:4             Initializing   yes
1:Data:4               Failed         no
1:Interface:4          Failed         no
1:Remote:4             Failed         no
1:Interface:6          Ready          yes
1:Remote:6             Ready          yes
1:Ethernet_Cable:4:1   Failed         yes
1:Ethernet_Cable:4:2   Failed         yes
1:Ethernet_Cable:4:7   Failed         yes
1:Ethernet_Cable:4:8   Failed         yes
1:Ethernet_Cable:4:9   Failed         yes
1:Ethernet_Cable:4:10  Failed         yes
1:Disk:4:7             Failed         no

and being added back in:
>> component_list filter=notok
Component ID    Status       Currently Functioning
1:Interface:6   Ready        yes
1:Remote:6      Ready        yes
1:Disk:4:1      Phasing In   yes
1:Disk:4:2      Phasing In   yes
1:Disk:4:3      Phasing In   yes
1:Disk:4:4      Phasing In   yes
1:Disk:4:5      Phasing In   yes
1:Disk:4:6      Phasing In   yes
1:Disk:4:7      Phasing In   yes
1:Disk:4:8      Phasing In   yes
1:Disk:4:9      Phasing In   yes
1:Disk:4:10     Phasing In   yes
1:Disk:4:11     Phasing In   yes
1:Disk:4:12     Phasing In   yes

It's good to know it can take a beating before being sent into production.
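The same component_list query is easy to wrap in a periodic health check; a hypothetical sketch, assuming an xcli wrapper script on the PATH that is already configured to reach the array:

#!/bin/bash
# mail an alert whenever the XIV reports components that are not OK;
# assumes the first line of the output is a header row
notok=$(xcli component_list filter=notok | tail -n +2)
if [ -n "$notok" ]; then
    echo "$notok" | mail -s "XIV components not OK" admin@example.com
fi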
Sunday, May 2, 2010
The XIV is here...
My organization's XIV arrived recently:
Each rack has several commodity 2U servers, each with twelve 1 TB drives:
Inside each server are 8 GB of RAM and two 2.3 GHz Intel quad-core processors:
which are connected together with Ethernet in a pre-wired APC cabinet:
Also, the two 60A, 200-240 VAC, single-phase, two-pole, line-line ground female receptacles are huge: