Sunday, November 30, 2008
Thursday, November 20, 2008
Friday, November 14, 2008
Take that spammers
Monday, November 3, 2008
OpenSSL CLI
Sunday, November 2, 2008
multicore in python as a scripting language
Use Python threads if you need to run IO operations in parallel. Do not use Python threads if you need to run computations in parallel. [Use mpi4py for parallel computations.]
So if you're using Python as a nicer-than-shell scripting language to trigger CPU intensive processes that the OS could easily pass to different cores via two terminals or just '&', what do you use? Threads.Inspired by a basic threading tutorial I whipped up a quick script to give a multicore *nix box a quick workout: search the hard drive for random strings via the find command. This is IO intensive as said on Python-list. It's also CPU intensive but in this case Python's interface with the CPU intensive process is to wait while os.popen() does the work, i.e. wait for IO. Thus, simple threads work well for this.
The basic tutorial's first two examples include simple threads and then locks. The locks are there to keep the threads synchronized so that they wait for each other. This is good if each is sharing a resource like writing to the same file but is the opposite of what I'm after in this case. I've combined both examples to make it easy to flip back and forth between them. Running top in another terminal let's you see how the other cores are being used. As a final step I remove the sleep calls from the simple threads.
#!/usr/bin/env python import time import thread import os def simple_function(string,sleeptime,junk,*args): while 1: print string # not sleeping, full attack mode # time.sleep(sleeptime) cmd = "find / -name " + junk output = os.popen(cmd).read() print output def lock_function(string,sleeptime,junk,lock,*args): while 1: # entering critical lock.acquire() print string," Now Sleeping after Lock acquired for ",sleeptime time.sleep(sleeptime) cmd = "find / -name " + junk output = os.popen(cmd).read() print output print string," Now releasing lock and then sleeping again" lock.release() #exiting critical time.sleep(sleeptime) if __name__=="__main__": no_multi_core = 0 # have 8 corese, so false if (no_multi_core): lock=thread.allocate_lock() thread.start_new_thread(lock_function,("Thread No:1",2,"foo",lock)) thread.start_new_thread(lock_function,("Thread No:2",2,"bar",lock)) thread.start_new_thread(lock_function,("Thread No:3",3,"bar",lock)) thread.start_new_thread(lock_function,("Thread No:4",4,"baz",lock)) thread.start_new_thread(lock_function,("Thread No:5",5,"qux",lock)) thread.start_new_thread(lock_function,("Thread No:6",5,"quux",lock)) thread.start_new_thread(lock_function,("Thread No:7",5,"corge",lock)) thread.start_new_thread(lock_function,("Thread No:8",5,"grault",lock)) else: # imporove w/ loop thread.start_new_thread(simple_function,("Thread No:1",2,"foo")) thread.start_new_thread(simple_function,("Thread No:2",2,"bar")) thread.start_new_thread(simple_function,("Thread No:3",3,"bar")) thread.start_new_thread(simple_function,("Thread No:4",4,"baz")) thread.start_new_thread(simple_function,("Thread No:5",5,"qux")) thread.start_new_thread(simple_function,("Thread No:6",5,"quux")) thread.start_new_thread(simple_function,("Thread No:7",5,"corge")) thread.start_new_thread(simple_function,("Thread No:8",5,"grault")) while 1:passHere's a screen cap of top with eight find commands spawned immediately with no_multi_core set to false. This variable name is probably a little chauvinistic but I've emphasized the example for my original goal: use Python to script lots of IO work and use many cores.
Update
I've updated this to optimally run for desired_thread_count users at a time from a file. I originally had a scheduling bug whose fix was to put the actual_thread_count increment in main as opposed to the function call; my original code. Tuns out that more than desired_thread_count threads were running. They were spawned before any thread could update the variable to keep future spawn calls from sleeping.
#!/usr/bin/env python import time import thread import os # global variables the_file = "users.txt" desired_thread_count = 8 actual_thread_count = 0 # ------------------------------------------------------- def find(lock,count,user): global desired_thread_count global actual_thread_count # do the work print "T:" + str(count) + " of " + str(actual_thread_count) + " " + user cmd = "find / -name " + user + " 2> /dev/null" os.popen(cmd) lock.acquire() # we're done, let someone else go actual_thread_count = actual_thread_count - 1 lock.release() # ------------------------------------------------------- if __name__=="__main__": i = 0 lock = thread.allocate_lock() try: f = open(the_file) try: for line in f: user = line.strip().split(',')[0] i += 1 while (actual_thread_count >= desired_thread_count): time.sleep(2) # wait to run, if necessary lock.acquire() # one update at a time actual_thread_count = actual_thread_count + 1 lock.release() thread.start_new_thread(find,(lock,i,user)) finally: while (actual_thread_count > 0): time.sleep(2) # wait for threads to finish f.close() except IOError: sys.exit("Unable to open the file " + the_file)
Update 2: Note the addition above of an extra loop after the finally. This keeps the program alive until the threads finish. Without this the program would end while the last threads are still running and they wouldn't finish their computations.