Friday, December 3, 2010
me, make, meet
Tuesday, November 30, 2010
iptables state and cisco firewalls
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 443 -j ACCEPT
The above, however, matches NEW connections only. RedHat's tool also adds a rule like the following, which allows anything related to an existing state entry to pass:
-A RH-Firewall-1-INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
This all seems reasonable, but I'm having a problem where packets are in an INVALID state and are getting blocked by iptables (confirmed with an iptables log). The workaround that my colleague came up with is to drop state out of the equation by simply having:
-A RH-Firewall-1-INPUT -m tcp -p tcp --dport 443 -j ACCEPT
The same colleague also found a blog post on Cisco PIX mangled packets and iptables state tracking which offers a much more satisfying explanation. We're looking into this further.
Friday, October 29, 2010
RedHat Cloud Offerings
My organization has server clusters that host about 100 VMs and more are coming. These systems were built using RedHat's cluster manager and Xen or KVM virtualization. Do we have what some would call a private cloud? Not yet, but we're considering getting closer. Since we're already using RedHat we're looking into what they recommend first.
According to a RedHat sponsored IDC paper: "Cloud services are shared, standardized service and are available from a self-service catalog; able to scale in an 'elastic' fashion as needed; [can be] priced on actual usage; accessible via the Internet; and supportive of published APIs".
The above definition makes me believe we're on our way to having a private cloud. RedHat's cloud offerings include everything we've done above, plus a few additions which would allow our clusters to satisfy the cloud definition. These additions are what RedHat calls cloud management services, and RedHat implements them with the following three packages.
- Satellite, which provides configuration management and network services via DHCP, DNS, and PXE. Since we use RHN and have a kickstart server this doesn't seem like too much of a reach for us. I can also imagine Satellite offering the functionality I experienced when I tried out EC2. We're going to look more into satellite, spacewalk, and possibly puppet.
- RHEV-M, which offers a GUI for managing VMs similar to VMware's virtual infrastructure client or EC2's VM manager. It implements the catalog aspect of the cloud definition and allows for easy deployment. Our current method of deploying VMs is to virt-clone a golden image and our catalog consists of Ubuntu Server, Fedora 13, Win200{3,8} (yuck), and RHEL5.5. We then console into the cloned system and update its network configuration (our clusters have several networks trunked into them) and other settings. Does RHEV-M make this easier? RHEV-M also offers live migration, high availability, load balancing, and power saving, though we have all of this from vanilla clusvcadm and just haven't yet implemented power saving or ballooned dynamic memory. We're comfortable doing this via the CLI and are considering scripting it, though we're still considering trying RHEV-M to make sure we're not missing out.
- MRG Grid, which implements the API aspect defined above for portability between clouds. This feature allows one to automatically spin a VM out to EC2 as per a certification between RedHat and Amazon. We do not yet have this feature and move VMs using tricks like dd|nc, mounting snapshots and doing virt-clones, etc. I wonder how quickly, and with what service interruption, a VM can be moved to another cloud and back.
Monday, October 25, 2010
Wednesday, October 20, 2010
Disable Evoluent vertical mouse thumb and bottom buttons
xmodmap -e "pointer = 1 2 3 4 5 15 14 13 12 10 11 9 8 7 6"
Sunday, October 10, 2010
bash set theory
$ cat one
a
b
c
$ cat two
b
c
d
e
$ for x in `cat one`; do grep $x two; done
b
c
$
Complement of two files:
$ cat one
a
b
c
$ cat two
b
c
d
e
$ sort one one two | uniq -u
d
e
$ sort two two one | uniq -u
a
$
After playing around for a little while I found Peteris Krumins' blog, which provided the complement method I'm pasting above.
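One caveat with the loop above: grep $x treats each line of one as a pattern that may match anywhere in a line of two, so b would also match abc. A minimal sketch of the same two operations with exact whole-line matching follows. The function names are mine and purely illustrative; grep's -F (fixed strings), -x (whole line), -v (invert) and -f (patterns from file) are standard options.

```shell
# Lines of the first file that also appear, exactly, in the second file.
set_intersection() {
    grep -Fxf "$2" "$1"
}

# Lines of the first file that do not appear in the second file.
set_difference() {
    grep -Fxvf "$2" "$1"
}
```

With the files from the post, set_intersection one two prints b and c, while set_difference two one prints d and e. Unlike the sort | uniq -u trick, this keeps the original line order of the first file.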
Thursday, September 30, 2010
udev: renamed network interface eth0 to eth1
Device eth0 does not seem to be present, delaying initialization
dmesg revealed that it had renamed eth0 to eth1:
udev: renamed network interface eth0 to eth1
I want eth0, not eth1. Virt-cloning changes the MAC address, so that probably confused udev. If this happens, you can remove the bad rule that udev made for itself and ask it to try again. In other words:
rm /etc/udev/rules.d/70-persistent-net.rules
udevadm trigger
tahoe-lafs
Monday, September 27, 2010
OpenLDAP memberOf Overlay
Monday, September 20, 2010
Simple idea for multifactor authentication
Thursday, September 16, 2010
SET: Social Engineering Toolkit
gpg2, gpg-agent, enigmail
Wednesday, September 15, 2010
Tuesday, September 14, 2010
Sunday, September 5, 2010
The Elements of Style: 0
"The approach to style is by way of plainness, simplicity, orderliness, and sincerity."
"Rich, ornate prose, is hard to digest, generally unwholesome, and sometimes nauseating."
The Elements of Style, Strunk and White.
gns3
"GNS3 is a graphical network simulator that allows simulation of complex networks. It can also be used to experiment features of Cisco IOS, Juniper JunOS or to check configurations that need to be deployed later on real routers."
Installing it looks very easy as per the installation video. It also has a tutorial.
Friday, September 3, 2010
Tuesday, August 31, 2010
Thursday, August 19, 2010
tcp timeout in linux kernel
/proc/sys/net/ipv4/tcp_keepalive_time
that is lower than the default of 7200 seconds and which correlates better with the firewall timeout will fix our problem. I'll be curious if it works.
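As a rough sketch of the check described above, the function below compares the kernel's keepalive idle time with a firewall idle timeout. The function name and the example timeout of 3600 seconds are my own assumptions, since the post does not state the firewall's actual value.

```shell
# Compare the kernel's TCP keepalive idle time with a firewall idle timeout.
# Both values are in seconds. The optional second argument names the file to
# read, which defaults to the real /proc entry.
keepalive_beats_firewall() {
    fw_timeout=$1
    file=${2:-/proc/sys/net/ipv4/tcp_keepalive_time}
    keepalive=$(cat "$file") || return 2
    if [ "$keepalive" -lt "$fw_timeout" ]; then
        echo "ok: keepalive ${keepalive}s < firewall ${fw_timeout}s"
    else
        echo "too slow: keepalive ${keepalive}s >= firewall ${fw_timeout}s"
        return 1
    fi
}
```

For example, keepalive_beats_firewall 3600 reports "too slow" on a system still at the 7200 second default. The value can be lowered as root with sysctl -w net.ipv4.tcp_keepalive_time=1800. Keep in mind that keepalive probes are only sent on sockets where the application has enabled SO_KEEPALIVE.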
Thursday, August 12, 2010
Friday, August 6, 2010
Tunnel programs
Note: I'm on a bus reading a magazine with cool tricks with the above but I'm about to throw the magazine away.
Thursday, August 5, 2010
6to4 outbound relay servers, web apps, and programmer archaeologists to dig through the layers
At work yesterday we discussed how backwards compatibility over time leads to so many layers that there might actually be careers for programmer archaeologists.
Our conversation started with our organization's move towards IPv6 (start early, learn early, act as a consultant for those who waited, and profit!) and got into the details of a life of a packet for a person using IPv4 to access a resource we might host over IPv6 only. Someone pointed out how a 6to4 outbound relay server would help us as recommended by Geoff Huston.
We then tried to imagine the same transaction occurring for a web app where javascript requests XML (you do this every time you click a message in gmail) and just how much has to happen. The XML was generated on the server side by a script, which ran on top of a JVM (running on top of Xen or KVM (which was live migrating via a CLVM-based cluster)) which queried a database, which was using the ext3 file system, which retrieved blocks through a fibre switch to get it from a SAN, which distributed the data over 120 SATA disks using RAID-X.... all the way back and up down through OSI but cranked back through a 6to4 Relay Service and finally applied to the DOM in a browser with more lines of code than a Unix kernel (a relic left over from the browser wars for backward compatibility with invalid HTML)... and we had to laugh... Glad I learned to deal with abstraction early.
Someone mentioned the book A Deepness in the Sky to bring up the concept of programmer archaeologists, which I found rather funny. As per Wikipedia:
"An interesting feature of the Qeng Ho's computer and timekeeping systems is the advent of "programmer archaeologists": the Qeng Ho are packrats of computer programs and systems, retaining them over millennia, even as far back to the era of Unix programs (as implied by one passage mentioning that the fundamental time-keeping system is the Unix epoch, retained for backwards compatibility). This massive accumulation of data implies that almost any useful program one could want already exists in the Qeng Ho fleet library, hence the need for computer archaeologists to dig up needed programs, work around their peculiarities and bugs, and assemble them into useful constructs."
Friday, July 30, 2010
New Mouse
Thursday, July 29, 2010
New Sorting World Record
Tuesday, July 27, 2010
DMCA Exceptions
More often than not, news in the area of copyright is bad news that involves people losing freedom. Yesterday, as a friend of mine pointed out, there was good news.
The Library of Congress introduced new DMCA exceptions documented by the EFF and gizmodo. Also, GE played the role of the good guys and took a case to the 5th Circuit Appeals Court against MGE UPS Systems and won a ruling from a Federal judge that: It's OK to Break DRM for Fair Use.
Sunday, July 25, 2010
Thursday, July 22, 2010
yum confused?
Error: failed to retrieve repodata/filelists.xml.gz from rhel-x86_64-server-5
error was [Errno -1] Metadata file does not match checksum
You could try using --skip-broken to work around the problem
You could try running: package-cleanup --problems
package-cleanup --dupes
rpm -Va --nofiles --nodigest
it's likely that yum got confused. This happened to me when I changed a subscription channel, re-registered a system and tried to yum upgrade. I have been able to consistently fix it by running:
rm -rf /var/cache/yum/*
yum clean all
yum update
Tuesday, July 20, 2010
Qlogic retry interval
I posted earlier about a test with the XIV where the retry interval was too high and IO was queued longer than it had to be on the system buffer as opposed to being written to disk (see graph). We solved this a while ago, but I forgot to post an update.
Qlogic cards have an on-board BIOS setting, which sets the path failure delay to 45 seconds. In order for multipath to perform properly on these systems, this limit should be lowered. We're currently setting the limit to 0. The options to find in the BIOS are under the advanced settings for each host controller.
You should also set the following in /etc/modprobe.conf (Fedora/RedHat/CentOS) or /etc/modprobe.d/aliases (Debian/Ubuntu):
options qla2xxx ql2xfailover=0
options qla2xxx qlport_down_retry=1
options qla2xxx ql2xmaxqdepth=64
Note that the maxqdepth setting on the third line isn't about failover but about getting more IOPS with respect to the queue depth. This is a standard XIV tuning practice. If you're using Emulex cards you can set the queue depth to 64 with the following line in your modprobe configuration file:
options lpfc lpfc_hba_queue_depth=64
but it is not necessary to change a retry or failover interval.
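After reloading the driver it can be worth confirming that the options took effect. The sketch below assumes the driver exposes its parameters under /sys/module/qla2xxx/parameters, which depends on the driver version and is not something the post confirms. The function name is hypothetical.

```shell
# Print the current value of selected module parameters as exposed by sysfs.
# $1 is the parameter directory; the remaining arguments are parameter names.
show_module_params() {
    dir=$1
    shift
    for p in "$@"; do
        if [ -r "$dir/$p" ]; then
            echo "$p=$(cat "$dir/$p")"
        else
            # The driver may not export this parameter, or may not be loaded.
            echo "$p=(not exposed)"
        fi
    done
}
```

A possible invocation is show_module_params /sys/module/qla2xxx/parameters ql2xmaxqdepth qlport_down_retry. A parameter reported as "not exposed" does not necessarily mean the option was ignored, only that it cannot be read back this way.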
Monday, July 19, 2010
Debian, Dell m600, XIV SAN
Installing Debian on a Dell m600 for use with an IBM XIV SAN (or any multipath'd SAN) requires a few tricks.
Modified Network Install ISO
The Broadcom NetXtreme II NICs that come in the Dell blades require non-free firmware not included in Lenny. I don't fault Debian for this. The same problem exists for the Qlogic HBAs. You can get around this by making your own custom Lenny install boot ISO. J Snell's howto describes how to add firmware for the NIC, but not the HBA. However, adding the HBA firmware is just as easy. Note that the install will go more smoothly (no ambiguity as to which disk is /dev/sda) if no LUNs are mapped by the SAN. During the install you'll see it ask where it can load that firmware (bnx2-06-4.0.5.fw and ql2400_fw.bin).
Some details on how I created my ISO are:
cd /mnt/
mkdir cdrom
mount -o loop debian-testing-amd64-CD-1.iso cdrom
cd /home/$USER/debian
mkdir isocopy
cp -av cdrom isocopy
cd isocopy/
dpkg-deb -x firmware-bnx2_0.25_all.deb nic-firmware
cp nic-firmware/lib/firmware/* isocopy/cdrom
dpkg-deb -x firmware-qlogic_0.14+lenny2_all.deb hba-firmware
cp hba-firmware/lib/firmware/* isocopy/cdrom
cd isocopy/cdrom
mkisofs -o ../modified-debian.iso -b isolinux/isolinux.bin -c isolinux/boot.cat \
-no-emul-boot -boot-load-size 4 -boot-info-table -J -R -V disks .
Burn the modified-debian.iso to a CD-ROM and boot the server from it. Once booted to the new ISO, you need to immediately switch to another console and copy the files over before the installer looks for them:
mkdir lib/firmware
cp /cdrom/*.fw lib/firmware
After that's done, the Debian installer should be able to autoload the bnx2 driver without any problems. Note that you can't do the above until the installer has mounted the CD-ROM, which it does automatically. Do it after that mount but before you scan for network devices.
Mount by UUID
After assigning multiple paths from the SAN the Debian system became confused as to which block device was the local disk where / should be mounted. E.g.
server:~# ls /dev/sd<tab><tab>
sda   sdaa1  sdaf   sdaj   sdan1  sdd1  sdi   sdm1  sdr   sdv1
sda1  sdab   sdaf1  sdak   sdao   sde   sdi1  sdn   sdr1  sdw
sda2  sdab1  sdag   sdak1  sdap   sde1  sdj   sdn1  sds   sdw1
sda5  sdac   sdag1  sdal   sdb    sdf   sdj1  sdo   sds1  sdx
sda6  sdac1  sdah   sdal1  sdb1   sdg   sdk   sdo1  sdt   sdx1
sda7  sdad   sdah1  sdam   sdc    sdg1  sdl   sdp   sdt1  sdy
sda8  sdad1  sdai   sdam1  sdc1   sdh   sdl1  sdq   sdu   sdy1
sdaa  sdae   sdai1  sdan   sdd    sdh1  sdm   sdq1  sdv   sdz
server:~#
Not only that, but it changed after each reboot. So, /dev/sda{1,2,5,6,7,8} was the local disk in the above scenario, but after a reboot it was /dev/sdao{1,2,5,6,7,8} etc. If this happens you'll see a message like the following after waiting for the system to boot:
Target filesystem doesn't have /sbin/init
No init found. Try passing init= bootarg
You'll then be dropped into BusyBox. You should then determine which disk is your local disk and mount it. In my case I was able to identify it by the number of partitions that it had as per above (e.g. only sdX had six partitions). Then mount that block device:
cd /
mkdir /root
mount -t ext3 /dev/sdap1 /root
Then bind /dev into your new root directory:
mount --bind /dev/ /root/dev/
Then chroot into your new root directory:
chroot /root/
Then modify your /etc/fstab so you can mount the new block device. BusyBox doesn't have vi, so you can use sed:
sed s/sda/sdap/g -i /etc/fstab
You should then be able to mount everything:
mount -a
I would normally mount by label at this point, but e2label does not work on Debian the way it works on RedHat/Fedora. However, you can mount by UUID with Debian and you can find the UUID with the vol_id command:
vol_id /dev/sdao7
If you grep UUID from the above you should be able to set $UUID to the output and then modify another sed script to bring the UUID into the /etc/fstab. Once you're able to mount a couple of partitions by UUID, reboot to see if you can bring the system up with / mounting correctly.
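The UUID step can be sketched roughly as follows. This assumes vol_id prints a line of the form ID_FS_UUID=<uuid>, which I have not verified for every udev version. Both function names are hypothetical, and awk is used in place of the sed script mentioned above so that only the device field is touched.

```shell
# Extract the UUID from vol_id-style output read on stdin.
# The trailing "=" in the pattern skips the similar ID_FS_UUID_ENC line.
uuid_from_vol_id() {
    sed -n 's/^ID_FS_UUID=//p'
}

# Print an fstab with one device name replaced by UUID=<uuid>.
# $1 = device (e.g. /dev/sdao7), $2 = uuid, $3 = path to the fstab.
# The result goes to stdout so it can be reviewed before replacing the file.
fstab_use_uuid() {
    awk -v dev="$1" -v uuid="$2" \
        '$1 == dev { sub(/^[^ \t]+/, "UUID=" uuid) } { print }' "$3"
}
```

Something like UUID=$(vol_id /dev/sdao7 | uuid_from_vol_id) followed by fstab_use_uuid /dev/sdao7 "$UUID" /etc/fstab > /etc/fstab.new would produce a candidate file. Writing to a new file first is a deliberate choice, since a broken fstab is what caused the trouble in the first place.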
Multipath
Multipath is able to resolve the confusion from the many block devices and allow you to mount by nice names like /dev/mpathX.
apt-get install multipath-tools dmsetup
modprobe -a dm_mod dm-multipath dm-round-robin
You can optimize the queue depth for the XIV on Debian with Qlogic HBAs by adding the following to /etc/modprobe.d/aliases:
options qla2xxx ql2xmaxqdepth=64
options qla2xxx ql2xfailover=0
options qla2xxx qlport_down_retry=1
I was then able to test multipath successfully.
Configuration Management and Security: Bellovin & Bush
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 27, NO. 3, APRIL 2009
Configuration Management and Security
Steven M. Bellovin and Randy Bush
Abstract:
Proper configuration management is vital for host and network security. We outline the problems, especially for large-scale environments, and discuss the security aspects of a number of different configuration scenarios, including security appliances (e.g., firewalls), desktop and server computers, and PDAs. We conclude by discussing research challenges.
Thursday, July 15, 2010
Cisco 2501
After starting minicom back up I see:
Welcome to minicom 2.2
OPTIONS: I18n
Compiled on Apr 27 2007, 15:50:20.
Port /dev/ttyS0
Press CTRL-A Z for help on special keys
First, would you like to see the current interface summary? [yes]: AT S7=45 S0=0
% Please answer 'yes' or 'no'.
First, would you like to see the current interface summary? [yes]: yes
Any interface listed with OK? value "NO" does not have a valid configuration
Interface   IP-Address   OK?  Method   Status  Protocol
Ethernet0   unassigned   NO   not set  up      down
Serial0     unassigned   NO   not set  down    down
Serial1     unassigned   NO   not set  down    down
Configuring global parameters:
Enter host name [Router]:
...
Use this configuration? [yes/no]: yes
Building configuration...
Press RETURN to get started!
%LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0, changed state to down
%LINEPROTO-5-UPDOWN: Line protocol on Interface Serial0, changed state to down
%LINEPROTO-5-UPDOWN: Line protocol on Interface Serial1, changed state to down
%LINK-3-UPDOWN: Interface Ethernet0, changed state to up
%LINK-3-UPDOWN: Interface Serial0, changed state to down
%LINK-3-UPDOWN: Interface Serial1, changed state to down
%LINK-5-CHANGED: Interface Ethernet0, changed state to administratively down
%LINK-5-CHANGED: Interface Serial0, changed state to administratively down
%LINK-5-CHANGED: Interface Serial1, changed state to administratively down
%SYS-5-RESTART: System restarted --
Cisco Internetwork Operating System Software
IOS (tm) 3000 Software (CPA25-Y-L), Version 11.0(4), RELEASE SOFTWARE (fc1)
Copyright (c) 1986-1995 by cisco Systems, Inc.
Compiled Mon 18-Dec-95 19:21 by alanyu
%SNMP-3-SOCKET: can't open UDP socket
Router>
Router>
Router>sh version
Cisco Internetwork Operating System Software
IOS (tm) 3000 Software (CPA25-Y-L), Version 11.0(4), RELEASE SOFTWARE (fc1)
Copyright (c) 1986-1995 by cisco Systems, Inc.
Compiled Mon 18-Dec-95 19:21 by alanyu
...
cisco 2500 (68030) processor (revision L) with 2044K/2048K bytes of memory.
Processor board ID 02425768, with hardware revision 00000000
...
Router>
Router>
Router>
All set. Looks like I'll be playing with IOS 11.0.
Mail Clients: mutt + zimbra et al
A friend of mine caught me saying I was tempted to get mutt working with zimbra via imap/smtp. He beat me to it. Here's a .muttrc:
# Automatically log in to this mailbox at startup
set spoolfile="imaps://netid@incoming.domain.tld/"
# Define the = shortcut, and the entry point for the folder browser
set folder="imaps://netid@incoming.domain.tld/"
# Use a well-defined SMTP server
set smtp_url="smtps://netid@outgoing.domain.tld:465/"
# Define Zimbra-default folders
set record="=Sent"
set postponed="=Drafts"
# activate TLS if available on the server
set ssl_starttls=yes
# always use SSL when connecting to a server
set ssl_force_tls=yes
# Don't wait to enter mailbox manually
unset imap_passive
# Automatically poll subscribed mailboxes for new mail
set imap_check_subscribed
# Reduce polling frequency to a sane level
set mail_check=60
# And poll the current mailbox more often (not needed with IDLE in post 1.5.11)
set timeout=10
# keep a cache of headers for faster loading (1.5.9+?)
set header_cache=~/.hcache
# Display download progress every 5K
set net_inc=5
# Use emacs (remember to server-start)
set editor="emacsclient -t"
# Set From: address
my_hdr From: First Last
Note that you must have emacs running in the background to use "emacsclient -t". If you do this add "(server-start)" to your .emacs or use "M-x server-start".
Our discussion on mail clients led to others worth checking out:
Droid X bricks if you try to mod it
Saturday, July 10, 2010
Fully Free Android system-image?
iPhone to Nexus One
Friday, July 9, 2010
Wednesday, June 30, 2010
openssl the command is amazing
Verify if numbers are prime:
$ openssl prime 119054759245460753
1A6F7AC39A53511 is not prime
$
Encrypt a file with your favorite cipher:
openssl list-cipher-commands
base64 encode a file
openssl enc -base64 -in file.txt
Generate a shadow-style password hash:
$ openssl passwd -1 MySecret
$1$sXiKzkus$haDZ9JpVrRHBznY5OxB82.
$
Many others. I never knew it could do so much. Thanks madboa.com.
Also, I played around with openssl while updating certificates for about 30 web servers. I was able to easily check that the new cert was installed correctly on all of the hosts by looking at each host's SSL fingerprint as served from Apache:
echo EOF | openssl s_client -connect $host:443 -showcerts | openssl x509 -fingerprint -noout -md5
The above fits well into a bash loop which can be run before and after you replace the certs:
for x in `cat vhosts.txt`; do
  echo "vhost: $x"
  echo EOF | openssl s_client -connect $x:443 -showcerts \
    | openssl x509 -fingerprint -noout -md5
done | egrep "vhost|Fingerprint" > finger_prints.txt
You can then diff the fingerprint files to verify that they're what you're expecting them to be.
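A plain diff of the two files works, but a per-host comparison is easier to read with 30 hosts. The sketch below assumes the file layout produced by the loop above, that is, a "vhost: NAME" line followed by an "MD5 Fingerprint=..." line. The function names are mine.

```shell
# Turn alternating "vhost: NAME" / "MD5 Fingerprint=XX:YY" lines into
# one "NAME FINGERPRINT" line per host.
pair_fingerprints() {
    awk '/^vhost: / { host = $2 }
         /Fingerprint=/ { sub(/^.*Fingerprint=/, ""); print host, $0 }' "$1"
}

# List the hosts whose fingerprint differs between a "before" and "after" run.
changed_hosts() {
    before=$(mktemp) && after=$(mktemp) || return 2
    pair_fingerprints "$1" | sort > "$before"
    pair_fingerprints "$2" | sort > "$after"
    # comm -13 keeps the lines that appear only in the second (after) file.
    comm -13 "$before" "$after" | cut -d' ' -f1
    rm -f "$before" "$after"
}
```

After a certificate rollout, every vhost that was supposed to receive the new cert should show up in the output of changed_hosts, and none of the others should. Hosts that failed to return a certificate produce no fingerprint line and so are silently skipped, which is worth checking separately.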
Sunday, June 27, 2010
Android Development with Clojure
Saturday, June 26, 2010
FLOSS Smart Phones
HTC Dream currently gives the most software freedom among Android/Linux deployments. It is unlikely that Google wants anything besides their applications to be proprietary. While Google has been unresponsive when asked why these hardware interface libraries are proprietary, it is likely that HTC, the hardware maker with whom Google contracted, insisted that these components remain proprietary, and perhaps fear patent suits... no detailed analysis of the Nexus One is yet available, it's likely similar to the HTC Dream.
With regard to permission to run one's own software on the device -- something central to phone development -- Brad points out:
Google is our best ally in this regard. The HTC Dream developer models, called the Android Dev Phone 1 (aka ADP1), permit the user to install any operating system on the phone, and the purchase agreement extracts no promises from the purchaser regarding what software runs on the device. Google has no interest in locking you to a single carrier, but only to a single Google experience application vendor. Offering a user “carrier freedom of choice”, while tying those users tighter to Google applications, is probably a central part of their marketing plans.
Other interesting quotes include:
- Community-oriented forks... [must] begin in the Android/Linux space.
- A traditional “get some volunteers together and write some code” approach can achieve great advancement toward community-oriented FLOSS systems on mobile devices. Developers could initially focus on applications for the existing “mostly FLOSS” platforms of MeeGo and Android/Linux.
- We need to identify the proprietary software that is important, and write free software replacements. It's catch-up work, but our community is usually successful at such tasks. So, let's get coding on mobile! (see FSF Bulletin, Issue 16, May 2010)
Thursday, June 24, 2010
Wednesday, June 23, 2010
Goodbye EMC
"Linux Magazine" article on SSD file systems
MBps test                ext2  ext4  ext4 no jrnl  btrfs  btrfs(ssd_spread)
-----------------------------------------------------------------------------
dbench -D /test 10       520   407   428           347    347
bonnie++ /test -s 2048   38    58    72            64     67
The article also had some useful links.
Monday, June 21, 2010
Foot Pedals
Thursday, June 10, 2010
Multicast and Red Hat Cluster Suite
After you start your cluster see which multicast address was assigned:
[root@vserver0 ~]# grep "default multicast" /var/log/messages
Dec 27 16:51:51 vserver0 openais[6953]: [MAIN ] Using default multicast address of 239.192.104.1
Dec 27 18:26:33 vserver0 openais[3664]: [MAIN ] Using default multicast address of 239.192.104.1
Dec 28 04:25:28 vserver0 openais[13028]: [MAIN ] Using default multicast address of 239.192.104.1
Dec 30 14:02:35 vserver0 openais[9533]: [MAIN ] Using default multicast address of 239.192.104.1
[root@vserver0 ~]#
I also see that each node is listening on the same address:
[root@vserver0 ~]# netstat -an | grep 239
udp 0 0 239.192.104.1:5405 0.0.0.0:*
[root@vserver0 ~]#
[root@vserver1 ~]# netstat -an | grep 239
udp 0 0 239.192.104.1:5405 0.0.0.0:*
[root@vserver1 ~]#
[root@vserver2 ~]# netstat -an | grep 239
udp 0 0 239.192.104.1:5405 0.0.0.0:*
[root@vserver2 ~]#
The cluster documentation talks about configuring multicast and implies that you might have to enable it on your router. You can verify whether your router is passing multicast packets between nodes with tcpdump and iperf as described in taosecurity.blogspot.com. I can reproduce the results described in the taosecurity blog in my cluster like so:
wget ftp://ftp.pbone.net/mirror/ftp.freshrpms.net/pub/freshrpms/pub/dag/redhat/el5/en/x86_64/dries/RPMS/iperf-2.0.4-1.el5.rf.x86_64.rpm
rpm -ivh iperf-2.0.4-1.el5.rf.x86_64.rpm
I set up vserver0 as an iperf server listening on multicast address 239.192.104.1:
[root@vserver0 ~]# iperf -s -u -B 239.192.104.1 -i 1
------------------------------------------------------------
Server listening on UDP port 5001
Binding to local address 239.192.104.1
Joining multicast group 239.192.104.1
Receiving 1470 byte datagrams
UDP buffer size: 126 KByte (default)
------------------------------------------------------------
Now I generate multicast traffic from vserver1.
[root@vserver1 ~]# iperf -c 239.192.104.1 -u -T 32 -t 3 -i 1
------------------------------------------------------------
Client connecting to 239.192.104.1, UDP port 5001
Sending 1470 byte datagrams
Setting multicast TTL to 32
UDP buffer size: 126 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.129 port 48770 connected with 239.192.104.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 129 KBytes 1.06 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 1.0- 2.0 sec 128 KBytes 1.05 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 2.0- 3.0 sec 128 KBytes 1.05 Mbits/sec
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 3.0 sec 386 KBytes 1.05 Mbits/sec
[ 3] Sent 269 datagrams
[root@vserver1 ~]#
Here is what vserver0 sees:
------------------------------------------------------------
[ 3] local 239.192.104.1 port 5001 connected with 192.168.1.129 port 48770
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0- 1.0 sec 134 KBytes 1.09 Mbits/sec 1.109 ms 0/ 93 (0%)
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 1.0- 2.0 sec 128 KBytes 1.05 Mbits/sec 0.136 ms 0/ 89 (0%)
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 3] 0.0- 3.0 sec 386 KBytes 1.07 Mbits/sec 0.957 ms 0/ 269 (0%)
The traffic looks like this:
[root@vserver0 ~]# tcpdump -n -i eth0 -s 1515 udp | grep 239.192.104.1 > muticast.txt
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 1515 bytes
343 packets captured
343 packets received by filter
0 packets dropped by kernel
[root@vserver0 ~]#
[root@vserver0 ~]# wc -l muticast.txt
283 muticast.txt
[root@vserver0 ~]# head -5 muticast.txt
15:09:23.364851 IP 192.168.1.129.5149 > 239.192.104.1.netsupport: UDP, length 118
15:09:23.760086 IP 192.168.1.129.5149 > 239.192.104.1.netsupport: UDP, length 118
15:09:24.156427 IP 192.168.1.129.5149 > 239.192.104.1.netsupport: UDP, length 118
15:09:24.555871 IP 192.168.1.129.5149 > 239.192.104.1.netsupport: UDP, length 118
15:09:24.956558 IP 192.168.1.129.5149 > 239.192.104.1.netsupport: UDP, length 118
[root@vserver0 ~]#
So, I seem to have generated multicast traffic and ensured that a member of the multicast group actually received it.
According to RedHat (http://sources.redhat.com/cluster/doc/usage.txt): "CMAN can be configured to use multicast instead of broadcast (broadcast is used by default if no multicast parameters are given)." They also go on to say that you must enable it in /etc/cluster/cluster.conf:
<cman>
  <multicast addr="224.0.0.1"/>
</cman>
<clusternode name="nd1">
  <multicast addr="224.0.0.1" interface="eth0"/>
</clusternode>
SELinux: the basic idea and how to debug
Update: 11/24/10
Correction: A lot of the time restorecon won't fix it. Instead you'll need to change it with chcon. A typical example is when SELinux and Apache have a disagreement because you want to serve from a directory besides /var/www/, say /mnt/webapps/subversion for example. Even if you configure your httpd.conf to tell it that /mnt/webapps/subversion is a valid web directory it doesn't seem to work for permissions related reasons. The way to check this is to run:
ls -Z /var/www/html/
ls -Z /mnt/webapps/subversion
and see if you notice a difference. One difference that might be seen is the following:
unconfined_u:object_r:httpd_sys_content_t:s0
unconfined_u:object_r:mnt_t:s0
An easy way to fix this with chcon is:
chcon -R -t httpd_sys_content_t /mnt/webapps/subversion/
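The field that differs in the two contexts above is the third one, the type. As a small illustrative sketch (the function names are mine), the comparison can be scripted so it is easy to repeat on other directories:

```shell
# Print the type field (third colon-separated field) of an SELinux context
# such as unconfined_u:object_r:httpd_sys_content_t:s0.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Succeed when two contexts share the same type.
same_selinux_type() {
    [ "$(selinux_type "$1")" = "$(selinux_type "$2")" ]
}
```

Feeding it the two contexts shown above, selinux_type returns httpd_sys_content_t for the first and mnt_t for the second, so same_selinux_type fails until the chcon has been applied.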
Thursday, May 27, 2010
yum local install
yum localinstall --nogpgcheck file1.rpm file2.rpm ...
Monday, May 24, 2010
Thursday, May 13, 2010
CREDO and Hero
Friday, May 7, 2010
Wednesday, May 5, 2010
XIV resiliency under some load
We used a development server called physics0, which is a Dell M610 with a QLogic HBA running RHEL5.5. It was connected to a six-node XIV 2814 with 1T disks.
We started running four dd's:
$ ps axu | grep dd
root 12994 23.9 0.0 63156 580 pts/0 R 11:13 4:22 dd if=/dev/mapper/mpath1p1 of=/dev/null
root 12995 23.8 0.0 63156 572 pts/0 D 11:13 4:20 dd if=/dev/mapper/mpath1p1 of=/dev/null
root 12996 24.0 0.0 63156 572 pts/0 R 11:13 4:23 dd if=/dev/mapper/mpath1p1 of=/dev/null
root 12997 23.7 0.0 63156 576 pts/0 R 11:13 4:19 dd if=/dev/mapper/mpath1p1 of=/dev/null
$
top looked like this:
top - 11:43:35 up 4 days, 18:45, 3 users, load average: 4.00, 4.00, 3.40
Tasks: 214 total, 3 running, 211 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.4%us, 4.0%sy, 0.0%ni, 80.8%id, 14.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 10.6%sy, 0.0%ni, 53.1%id, 36.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 12.0%sy, 0.0%ni, 49.1%id, 38.9%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 1.0%sy, 0.0%ni, 95.6%id, 3.2%wa, 0.2%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 1.0%sy, 0.0%ni, 95.6%id, 3.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.2%us, 26.4%sy, 0.0%ni, 7.8%id, 49.6%wa, 1.4%hi, 14.6%si, 0.0%st
Cpu6 : 0.0%us, 26.9%sy, 0.0%ni, 7.6%id, 50.3%wa, 1.0%hi, 14.2%si, 0.0%st
Cpu7 : 0.0%us, 3.2%sy, 0.0%ni, 95.6%id, 1.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8177436k total, 8130020k used, 47416k free, 7084504k buffers
Swap: 2096472k total, 220k used, 2096252k free, 36764k cached
  PID USER PR NI  VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
12996 root 18  0 63156 572 476 D 26.0  0.0 7:00.78 dd
12995 root 18  0 63156 572 476 D 22.8  0.0 6:57.09 dd
12997 root 18  0 63156 576 476 D 22.6  0.0 6:55.44 dd
12994 root 18  0 63156 580 476 R 22.2  0.0 6:58.79 dd
  363 root 10 -5     0   0   0 R  2.8  0.0 1:17.65 kswapd0
 4134 root 18  0 10228 740 628 S  0.2  0.0 0:18.81 hald-addon-stor
First we removed a disk. You can see the dip in IO and quick recovery. The graph was made with the xivtop program (control-click IOPS, Latency, and BW to see all three at the same time).
We saw a graphical representation of the disk turning red on the XIV GUI and were then able to test it, add it, and watch it rebuild. As it rebuilt we could get an update on the status with the xcli:
> > monitor_redist
Type            Initial Capacity to Copy (GB)  Capacity Remaining to Copy (GB)  % Done  Time Started         Estimated Time to Finish  Time Elapsed
Redistribution  26                             17                               33      2010-05-05 11:25:31  0:04:08                   0:02:02
> >
We then had the Cisco MDS9124 disable access to Module4 port 2 on the XIV. There was a 100% drop in IO until the retry interval ran out and it switched to an alternate path. IOs were queued until they could be passed down a different path. The queue held them, and I expect that it will continue to, but we want to lower the retry interval so that IO isn't queued as long. You can see the dip in IO on the graph. IO was queued for 45 seconds. We also observed in the event log that the number of available paths lessened:
> > event_list
...
2010-05-05 12:09:27 Informational HOST_MULTIPATH_OK Host 'nfile0' has redundant connections to the system. #paths=3
2010-05-05 12:12:15 Informational HOST_NO_MULTIPATH_ONLY_ONE_PORT Host 'physics0' is connected to the system through only one of its ports. #paths=2
2010-05-05 12:13:13 Informational HOST_MULTIPATH_OK Host 'physics0' has redundant connections to the system. #paths=4
> >
Update: see Qlogic retry interval for how to implement instantaneous failover as opposed to queuing for 45 seconds.
Unplugging the module was the most satisfying. I walked right up behind the unit and pulled both power cords out. The dip in IO was longer as a result of losing the module. Also, IOPS afterwards averaged not 14,500 but 10,000: The event log showed the unplugged module:
    > > event_list
    ...
    2010-05-05 12:46:14 Critical MODULE_FAILED 1:Module:4 failed.
    2010-05-05 12:46:17 Informational DATA_REBUILD_STARTED Rebuild process started because system data is not protected. [...]% of the data must be rebuilt.
    2010-05-05 12:46:20 Informational HOST_NO_MULTIPATH_ONLY_ONE_MODULE Host 'physics0' is connected to the system through only one interface module. #paths=2
    2010-05-05 12:49:49 Informational DATA_REBUILD_COMPLETED Rebuild process completed. System data is now protected.
    2010-05-05 13:16:37 Informational DISK_FINISHED_PHASEIN System finished phasing in 1:Disk:4:1.
    2010-05-05 13:16:37 Informational DISK_FINISHED_PHASEIN System finished phasing in 1:Disk:4:2.
    ...
    2010-05-05 13:16:38 Informational DISK_FINISHED_PHASEIN System finished phasing in 1:Disk:4:12.
    > >

We can also observe what's "not ok" while the module is unplugged:
    > > component_list filter=notok
    Component ID           Status        Currently Functioning
    1:Disk:4:1             Failed        no
    1:Disk:4:2             Failed        no
    1:Disk:4:3             Failed        no
    1:Disk:4:4             Failed        no
    1:Disk:4:5             Failed        no
    1:Disk:4:6             Failed        no
    1:Disk:4:8             Failed        no
    1:Disk:4:9             Failed        no
    1:Disk:4:10            Failed        no
    1:Disk:4:11            Failed        no
    1:Disk:4:12            Failed        no
    1:Module:4             Initializing  yes
    1:Data:4               Failed        no
    1:Interface:4          Failed        no
    1:Remote:4             Failed        no
    1:Interface:6          Ready         yes
    1:Remote:6             Ready         yes
    1:Ethernet_Cable:4:1   Failed        yes
    1:Ethernet_Cable:4:2   Failed        yes
    1:Ethernet_Cable:4:7   Failed        yes
    1:Ethernet_Cable:4:8   Failed        yes
    1:Ethernet_Cable:4:9   Failed        yes
    1:Ethernet_Cable:4:10  Failed        yes
    1:Disk:4:7             Failed        no
    > >

and being added back in:
    > > component_list filter=notok
    Component ID   Status      Currently Functioning
    1:Interface:6  Ready       yes
    1:Remote:6     Ready       yes
    1:Disk:4:1     Phasing In  yes
    1:Disk:4:2     Phasing In  yes
    1:Disk:4:3     Phasing In  yes
    1:Disk:4:4     Phasing In  yes
    1:Disk:4:5     Phasing In  yes
    1:Disk:4:6     Phasing In  yes
    1:Disk:4:7     Phasing In  yes
    1:Disk:4:8     Phasing In  yes
    1:Disk:4:9     Phasing In  yes
    1:Disk:4:10    Phasing In  yes
    1:Disk:4:11    Phasing In  yes
    1:Disk:4:12    Phasing In  yes
    > >

It's good to know it can take a beating before being sent into production.
Sunday, May 2, 2010
The XIV is here...
Friday, April 30, 2010
Wednesday, April 28, 2010
SSD Experiment
- 160G SSD:
* 96G system drive with the following partitions:
    /      10G
    /home  10G
    /tmp   5G
    /usr   10G
    /var   61G
* 64G of cache/swap: I would like to use the extra speed for cache, not storage. Ideas:
- Swap: The simplest thing is to swapon to it, then run a lot of VMs under KVM and see if performance for the VMs is tolerable. If it is, then it's much cheaper than buying more RAM.
- Disk Cache: If I were running ZFS I'd add the SSDs to the L2ARC on-disk cache.
- Other types of Cache: Lots of ideas on this topic recently posted on slashdot.
- 1T SATA:
I would then use the remaining 1T for directories under /home and /var to store VMs (as raw LVs) as well as media. I'd also keep an extra 20G of the SATA space free to play with new file systems.
postmortem of the apache.org crack
Sunday, April 11, 2010
tiered ram
Disclaimer: I have not tried this.
Thursday, April 8, 2010
Gratuitous ARPs whenever you need them
    /sbin/arping -s 192.168.6.212 -I eth0 192.168.6.1

From there it was just a matter of using a bash loop:

    for x in `ifconfig | grep 192 | awk '{print $2}' | sed s/addr://g`; do
        /sbin/arping -c 1 -s $x -I eth0 192.168.6.1
    done

I'm lucky my co-workers were able to help me understand this problem.
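The same loop can be sketched in Python, which makes the address filtering easier to test before sending anything on the wire. This is only a sketch under assumptions: the sample text mimics the older net-tools ifconfig format ("inet addr:..."), the addresses other than 192.168.6.212 and 192.168.6.1 are made up, and the function names are hypothetical. It also filters on an address prefix rather than grepping for "192" anywhere on the line.

```python
import re

# Hypothetical sample in the older net-tools ifconfig format.
SAMPLE_IFCONFIG = """\
eth0      Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:192.168.6.212  Bcast:192.168.6.255  Mask:255.255.255.0
eth0:1    Link encap:Ethernet  HWaddr 00:11:22:33:44:55
          inet addr:192.168.6.213  Bcast:192.168.6.255  Mask:255.255.255.0
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
"""


def local_addresses(ifconfig_text, prefix="192."):
    """Return IPv4 addresses from ifconfig output that start with prefix."""
    addrs = re.findall(r"inet addr:(\d+\.\d+\.\d+\.\d+)", ifconfig_text)
    return [addr for addr in addrs if addr.startswith(prefix)]


def arping_commands(addresses, gateway="192.168.6.1", interface="eth0"):
    """Build one arping argument list per source address (nothing is run)."""
    return [
        ["/sbin/arping", "-c", "1", "-s", addr, "-I", interface, gateway]
        for addr in addresses
    ]


if __name__ == "__main__":
    for cmd in arping_commands(local_addresses(SAMPLE_IFCONFIG)):
        # Print instead of executing; running arping normally needs root.
        print(" ".join(cmd))
```

To actually send the ARPs, each argument list could be handed to subprocess.call as root, with real ifconfig output in place of the sample.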
shaping like behavior while doing an scp
    $ scp drupal1* user@host:/var/foo/
    drupal1_data.img.gz 2% 3122MB 33.6MB/s
    ...
    drupal1_data.img.gz 20% 294MB 0.0KB/s - stalled -
    Write failed: Broken pipe
    lost connection
    $

The ASCII illustration above isn't a direct paste but should convey what happened. I watched 33.6MB/s slow down to a complete stop. What's odd is that I was then able to run the scp again, maintain 33.6MB/s, and get the copy done. However, I also saw the issue above intermittently. I was going from Fedora 12 to RHEL5.5. I'm curious and want to look into this more. Someone suggested I would find relevant information at High Performance SSH/SCP - HPN-SSH.
Update Dec 2011:
A solution to the above is posted at linuxsecure.blogspot.com. It comes down to adding:
    net.ipv4.tcp_sack = 0

to /etc/sysctl.conf and then enacting it with "sysctl -p".
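If the setting is pushed to several hosts, it helps to add it idempotently so repeated runs don't pile up duplicate lines. Here is a minimal Python sketch that works on the text of a sysctl.conf file. The helper name set_sysctl is hypothetical, and it assumes the usual "key = value" lines with "#" or ";" comments:

```python
def set_sysctl(conf_text, key, value):
    """Return conf_text with `key = value` present exactly once."""
    wanted = "%s = %s" % (key, value)
    out = []
    replaced = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#") or stripped.startswith(";")
        name = stripped.split("=", 1)[0].strip()
        if not is_comment and "=" in stripped and name == key:
            if not replaced:
                out.append(wanted)  # rewrite the first assignment in place
                replaced = True
            continue  # drop any later duplicate assignments
        out.append(line)
    if not replaced:
        out.append(wanted)  # key was absent, so append it
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# Controls IP packet forwarding\nnet.ipv4.ip_forward = 0\n"
    print(set_sysctl(sample, "net.ipv4.tcp_sack", 0))
```

The function only transforms a string; writing the result back to /etc/sysctl.conf and running "sysctl -p" would still be done as root.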
Monday, April 5, 2010
Python SSH with Paramiko quickly
    yum install python-paramiko

Assumes you are using RSA Public Key Authentication:

    #!/usr/bin/python
    import paramiko, os, getpass

    # Variables
    username = 'you'
    host = 'your.server.com'
    port = 22
    key = '~/.ssh/id_rsa'
    msg = "Enter passphrase for key '" + key + "': "
    private_key_pass = getpass.getpass(prompt=msg)
    private_key_file = os.path.expanduser(key)

    # Connections
    pkey = paramiko.RSAKey.from_private_key_file(private_key_file,
                                                 private_key_pass)
    transport = paramiko.Transport((host, port))
    transport.connect(username=username, pkey=pkey)

    # Commands: each command needs its own session channel
    cmds = ['ls', 'ls /foo']
    for cmd in cmds:
        channel = transport.open_session()
        channel.exec_command(cmd)
        output = channel.makefile('rb', -1).readlines()
        if output:
            print("Success:")
            print(output)
        else:
            # No stdout is treated as a failure; show stderr instead
            print("Error:")
            print(channel.makefile_stderr('rb', -1).readlines())
    transport.close()
Wednesday, March 31, 2010
Eben Moglen - Freedom in the Cloud
Saturday, March 27, 2010
Python: MySQL & SSH
    yum install MySQL-python
    yum install python-twisted python-twisted-conch

I then had what I needed to get Python SSH'ing into a server and talking to MySQL. Some examples:
Monday, February 22, 2010
Thursday, February 11, 2010
Thursday, January 7, 2010
automatic stand up desk
Wednesday, January 6, 2010
svn move
    svnlook: DB_VERSION_MISMATCH: Database environment version mismatch

The fix required some processing of the repository before it was moved, which SVN's FAQ explained. I also upgraded WebSVN from 2.0 to 2.3.