Thursday, May 27, 2010
yum local install
Given a set of RPMs you can install them with 'yum localinstall' and yum will resolve the dependencies:
yum localinstall --nogpgcheck file1.rpm file2.rpm ...
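For example, with two hypothetical package names (foo requires bar, and both RPMs were downloaded into the current directory):

# yum installs both local files in one transaction, resolving foo's
# dependency on bar locally and pulling anything else it needs from
# the configured repositories
yum localinstall --nogpgcheck foo-1.0-1.el5.x86_64.rpm bar-2.0-1.el5.x86_64.rpm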
Thursday, May 13, 2010
CREDO and Hero
I have an AT&T plan. I'm considering switching to CREDO now that they have an Android-based phone. I'm a little concerned, however, that the HTC Hero is somewhat dated; e.g. it has a 528 MHz processor versus the 1 GHz Snapdragon in newer phones. I've asked CREDO what's to stop someone from getting an EVO and using it under their plan. I'll post an update if they write back.
Wednesday, May 5, 2010
XIV resiliency under some load
Today we beat up the XIV by pulling a disk, pulling a whole module of 12 disks and re-zoning its fabric switch so that one of the HBAs lost connectivity, all while under some IO load. It did well.
We used a development server called physics0, which is a Dell M610 with a QLogic HBA running RHEL 5.5. It was connected to a six-node XIV 2814 with 1 TB disks.
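A quick sanity check that the host really has multiple paths to the LUN (a sketch using device-mapper-multipath; the device name is taken from the dd processes shown below):

# list the multipath topology for the XIV LUN; each path should appear
# as active through a different HBA/fabric port
multipath -ll mpath1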
We started running four dd's:
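Something like this, reconstructed from the ps output below (a sketch; the original invocation isn't shown):

# four sequential readers against the multipath partition, backgrounded
for i in 1 2 3 4; do
    dd if=/dev/mapper/mpath1p1 of=/dev/null &
done

ps confirmed all four were running: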
$ ps axu | grep dd
root     12994 23.9  0.0  63156   580 pts/0    R    11:13   4:22 dd if=/dev/mapper/mpath1p1 of=/dev/null
root     12995 23.8  0.0  63156   572 pts/0    D    11:13   4:20 dd if=/dev/mapper/mpath1p1 of=/dev/null
root     12996 24.0  0.0  63156   572 pts/0    R    11:13   4:23 dd if=/dev/mapper/mpath1p1 of=/dev/null
root     12997 23.7  0.0  63156   576 pts/0    R    11:13   4:19 dd if=/dev/mapper/mpath1p1 of=/dev/null

top looked like this:
top - 11:43:35 up 4 days, 18:45, 3 users, load average: 4.00, 4.00, 3.40
Tasks: 214 total, 3 running, 211 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.4%us, 4.0%sy, 0.0%ni, 80.8%id, 14.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 10.6%sy, 0.0%ni, 53.1%id, 36.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 12.0%sy, 0.0%ni, 49.1%id, 38.9%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 1.0%sy, 0.0%ni, 95.6%id, 3.2%wa, 0.2%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 1.0%sy, 0.0%ni, 95.6%id, 3.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.2%us, 26.4%sy, 0.0%ni, 7.8%id, 49.6%wa, 1.4%hi, 14.6%si, 0.0%st
Cpu6 : 0.0%us, 26.9%sy, 0.0%ni, 7.6%id, 50.3%wa, 1.0%hi, 14.2%si, 0.0%st
Cpu7 : 0.0%us, 3.2%sy, 0.0%ni, 95.6%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8177436k total, 8130020k used, 47416k free, 7084504k buffers
Swap: 2096472k total, 220k used, 2096252k free, 36764k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
12996 root      18   0 63156  572  476 D 26.0  0.0  7:00.78 dd
12995 root      18   0 63156  572  476 D 22.8  0.0  6:57.09 dd
12997 root      18   0 63156  576  476 D 22.6  0.0  6:55.44 dd
12994 root      18   0 63156  580  476 R 22.2  0.0  6:58.79 dd
  363 root      10  -5     0    0    0 R  2.8  0.0  1:17.65 kswapd0
 4134 root      18   0 10228  740  628 S  0.2  0.0  0:18.81 hald-addon-stor

First we removed a disk. You can see the dip in IO and the quick recovery:
The graph above was made with the xivtop program (control-click IOPS, Latency, and BW to see all three at the same time).
We saw a graphical representation of the disk turning red on the XIV GUI and were then able to test it, add it back, and watch it rebuild. As it rebuilt we could get an update on the status with the xcli:
>> monitor_redist
Type            Initial Capacity to Copy (GB)  Capacity Remaining to Copy (GB)  %done  Time Started         Estimated Time to Finish  Time Elapsed
Redistribution  26                             17                               33     2010-05-05 11:25:31  0:04:08                   0:02:02

We then had the Cisco MDS9124 disable access to Module 4 port 2 on the XIV. IO dropped completely until the retry interval ran out and the host switched to an alternate path; IOs were queued until the driver passed them down the other path. The queue held them, and I expect it would keep holding them, but we want to lower the retry interval so that IOs aren't queued as long. You can see the dip in IO on the graph; IOs were queued for 45 seconds. We also observed in the event log that the available paths lessened:
>> event_list
...
2010-05-05 12:09:27  Informational  HOST_MULTIPATH_OK                Host 'nfile0' has redundant connections to the system. #paths=3
2010-05-05 12:12:15  Informational  HOST_NO_MULTIPATH_ONLY_ONE_PORT  Host 'physics0' is connected to the system through only one of its ports. #paths=2
2010-05-05 12:13:13  Informational  HOST_MULTIPATH_OK                Host 'physics0' has redundant connections to the system. #paths=4

Update: see Qlogic retry interval for how to implement instantaneous failover as opposed to queuing for 45 seconds.
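That follow-up post has the details, but on RHEL 5 with a QLogic HBA the change looks roughly like this (a sketch, assuming the qla2xxx driver's qlport_down_retry module parameter; the value here is illustrative):

# /etc/modprobe.conf: fail a downed port quickly instead of
# queuing IO for the default retry interval
options qla2xxx qlport_down_retry=1

# rebuild the initrd so the option takes effect at boot (RHEL 5)
mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)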
Unplugging the module was the most satisfying. I walked right up behind the unit and pulled both power cords out. The dip in IO was longer as a result of losing the module, and IOPs averaged out not at 14,500 but at 10,000:
The event log showed the unplugged module:
>> event_list
...
2010-05-05 12:46:14  Critical       MODULE_FAILED                      1:Module:4 failed.
2010-05-05 12:46:17  Informational  DATA_REBUILD_STARTED               Rebuild process started because system data is not protected. 1726767077764 of the data must be rebuilt.
2010-05-05 12:46:20  Informational  HOST_NO_MULTIPATH_ONLY_ONE_MODULE  Host 'physics0' is connected to the system through only one interface module. #paths=2
2010-05-05 12:49:49  Informational  DATA_REBUILD_COMPLETED             Rebuild process completed. System data is now protected.
2010-05-05 13:16:37  Informational  DISK_FINISHED_PHASEIN              System finished phasing in 1:Disk:4:1.
2010-05-05 13:16:37  Informational  DISK_FINISHED_PHASEIN              System finished phasing in 1:Disk:4:2.
...
2010-05-05 13:16:38  Informational  DISK_FINISHED_PHASEIN              System finished phasing in 1:Disk:4:12.

We can also observe what's "not ok" while the module is unplugged:
>> component_list filter=notok
Component ID           Status         Currently Functioning
1:Disk:4:1             Failed         no
1:Disk:4:2             Failed         no
1:Disk:4:3             Failed         no
1:Disk:4:4             Failed         no
1:Disk:4:5             Failed         no
1:Disk:4:6             Failed         no
1:Disk:4:8             Failed         no
1:Disk:4:9             Failed         no
1:Disk:4:10            Failed         no
1:Disk:4:11            Failed         no
1:Disk:4:12            Failed         no
1:Module:4             Initializing   yes
1:Data:4               Failed         no
1:Interface:4          Failed         no
1:Remote:4             Failed         no
1:Interface:6          Ready          yes
1:Remote:6             Ready          yes
1:Ethernet_Cable:4:1   Failed         yes
1:Ethernet_Cable:4:2   Failed         yes
1:Ethernet_Cable:4:7   Failed         yes
1:Ethernet_Cable:4:8   Failed         yes
1:Ethernet_Cable:4:9   Failed         yes
1:Ethernet_Cable:4:10  Failed         yes
1:Disk:4:7             Failed         no

and being added back in:
>> component_list filter=notok
Component ID    Status       Currently Functioning
1:Interface:6   Ready        yes
1:Remote:6      Ready        yes
1:Disk:4:1      Phasing In   yes
1:Disk:4:2      Phasing In   yes
1:Disk:4:3      Phasing In   yes
1:Disk:4:4      Phasing In   yes
1:Disk:4:5      Phasing In   yes
1:Disk:4:6      Phasing In   yes
1:Disk:4:7      Phasing In   yes
1:Disk:4:8      Phasing In   yes
1:Disk:4:9      Phasing In   yes
1:Disk:4:10     Phasing In   yes
1:Disk:4:11     Phasing In   yes
1:Disk:4:12     Phasing In   yes

It's good to know it can take a beating before being sent into production.
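The same component_list query is easy to wrap in a periodic health check; a hypothetical sketch, assuming an xcli wrapper script on the PATH that is already configured to reach the array:

#!/bin/bash
# mail an alert whenever the XIV reports components that are not OK;
# assumes the first line of the output is a header row
notok=$(xcli component_list filter=notok | tail -n +2)
if [ -n "$notok" ]; then
    echo "$notok" | mail -s "XIV components not OK" admin@example.com
fi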
Sunday, May 2, 2010
The XIV is here...
My organization's XIV arrived recently:
Each rack has several commodity 2U servers, each with twelve 1 TB drives:
Inside each server are 8 GB of RAM and two 2.3 GHz Intel quad-core processors:
which are connected together with Ethernet in a pre-wired APC cabinet:
Also, the two 60A, 200-240 VAC, single-phase, two-pole, line-line ground female receptacles are huge: