Sunday, November 4, 2007

down with regex

I wrote a quick Python script to parse CSV files and add the records to zone files. In general, I'm going to try to write more robust, reusable tools instead of quick little programs. A friend of mine looked at the code and offered some advice. One thing I realized is that I turn to regular expressions too often.

Regular expressions tend to be overkill, especially for simple things. User input should almost never be turned into a regex, because characters such as '.' and '*' in the input are interpreted as pattern syntax. Many string operations can be solved more simply: look at the string and try to derive a rule from index arithmetic and substrings.
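
A short sketch of the user-input problem: if a suffix such as .domain.tld (a hypothetical value standing in for input or a CSV column) is dropped straight into a pattern, each dot matches any character, so an unrelated host name also matches. Python's re.escape fixes that, but it is one more thing to remember.

```python
import re

# Hypothetical suffix, as it might arrive from user input or a CSV column.
SUFFIX = ".domain.tld"

# Used verbatim as a pattern: each '.' matches any single character.
naive_re = re.compile(SUFFIX + "$")
# re.escape backslash-escapes the dots so they only match a literal '.'.
escaped_re = re.compile(re.escape(SUFFIX) + "$")

for host in ["www.domain.tld", "wwwxdomainxtld"]:
    print(host, bool(naive_re.search(host)), bool(escaped_re.search(host)))
```

The naive pattern accepts both hosts; only the escaped one rejects "wwwxdomainxtld".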

I wanted to know whether a string ended with a particular substring. I tried this:

import re

host_re = re.compile(r'\.domain\.tld$')
if host_re.search(host):
    pass  # do something

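Instead, we started with:

def ends_with(x, y):
    # rfind returns the index of the last occurrence of y in x, or -1 if y is absent.
    # Caveat: the -1 case is not checked, so this wrongly returns True when y is
    # exactly one character longer than x.
    return len(x) == x.rfind(y) + len(y)
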
If y is found within x, rfind returns the index, or position, of its last occurrence. Adding the length of y to that index gives the position just past the match, and for y to be a suffix this must equal the length of x. This is better than a regex because it avoids the complications a pattern introduces, such as escaping the dots. Here's a variation that also covers the case where the other string is longer, in which rfind returns -1:

pos = host.rfind(zone_line)
# pos is -1 when zone_line does not occur in host, e.g. because it is longer.
if pos > -1 and len(host) == pos + len(zone_line):
    pass  # do something
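
To compare the two versions, here is a small sketch that wraps them as functions and checks them against Python's built-in str.endswith, which performs this suffix test directly. The function names and sample strings are hypothetical, chosen only for illustration.

```python
def ends_with_v1(x, y):
    # First version: rfind returns -1 when y is absent, and this is not checked.
    return len(x) == x.rfind(y) + len(y)


def ends_with_v2(x, y):
    # Variation: reject the not-found case (-1) before doing the index math.
    pos = x.rfind(y)
    return pos > -1 and len(x) == pos + len(y)


cases = [
    ("www.domain.tld", ".domain.tld"),  # genuine suffix
    ("www.domain.tld", ".other.tld"),   # not a suffix
    ("abc", "abcd"),                    # y is one character longer than x
]

for x, y in cases:
    print(repr(x), repr(y), ends_with_v1(x, y), ends_with_v2(x, y), x.endswith(y))
```

Only the last case separates them: the first version reports a match for ("abc", "abcd"), while the variation and str.endswith do not.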
