Friday, October 22, 2010

regex

I had an interesting regular expression problem this week.

The problem we were trying to solve is simple: find the root directory of a software repository.

The assumption is that every repository contains a directory named "foo" or "bbb". So, given a full path, we need to find the parent directory of "foo" or "bbb". For example, given the directory "/home/myhome/test/foo/dir1/deb", it should return "/home/myhome/test" as the root directory of the repository. As another example, "/home/myhome/ttt/bbb/test" should return "/home/myhome/ttt".

Our original regexes to match the root directory were pretty simple:

/(.*)\/foo/

/(.*)\/bbb/

We match the input against each pattern in turn: the first one, and if it does not match, the second one.

However, we got a wrong result in a case like "/home/myhome/football/bbb". It returned "/home/myhome", while "/home/myhome/football" should be returned. The reason is that "/foo" also matches the beginning of "/football".
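To make the failure concrete, here is a minimal sketch in Python (the post itself does not name a language, so the choice is an assumption). The function name `naive_root` is hypothetical; it simply tries the two original patterns in order.

```python
import re

# The two original patterns, tried in order (first "foo", then "bbb").
ORIGINAL_PATTERNS = [re.compile(r"(.*)/foo"), re.compile(r"(.*)/bbb")]

def naive_root(path):
    """Hypothetical helper: return the captured prefix of the first pattern that matches."""
    for pattern in ORIGINAL_PATTERNS:
        m = pattern.search(path)
        if m:
            return m.group(1)
    return None

print(naive_root("/home/myhome/test/foo/dir1/deb"))  # /home/myhome/test
print(naive_root("/home/myhome/ttt/bbb/test"))       # /home/myhome/ttt
print(naive_root("/home/myhome/football/bbb"))       # /home/myhome (wrong root)
```

The last call shows the bug: the "foo" pattern matches inside "football" before the "bbb" pattern is ever tried.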

Then I tried to use /(.*)\/foo\// to match the path. This new one works for the test cases mentioned above, but fails when the input is "/home/myhome/football/foo", because that path does not end with a "/" after "foo".
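A small Python sketch of this second attempt, assuming only the "foo" pattern is changed (the helper name `strict_foo_root` is hypothetical):

```python
import re

# Second attempt: require a "/" right after "foo".
STRICT_FOO = re.compile(r"(.*)/foo/")

def strict_foo_root(path):
    """Hypothetical helper: return the captured prefix, or None when there is no match."""
    m = STRICT_FOO.search(path)
    return m.group(1) if m else None

print(strict_foo_root("/home/myhome/test/foo/dir1/deb"))  # /home/myhome/test
print(strict_foo_root("/home/myhome/football/bbb"))       # None, "football" no longer matches
print(strict_foo_root("/home/myhome/football/foo"))       # None, nothing follows "foo"
```

The second call is the improvement (the "bbb" pattern can now be tried), and the third is the new failure: "foo" is the last path component, so there is no "/" after it.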

My final solution is simple, just append a "/" at the end of the input and use /(.*)\/foo\// to match the path. After that, just remove the trailing "/".
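The idea can be sketched in Python as follows. This is an illustration under assumptions, not the original code: the names `MARKERS` and `repo_root` are made up, and the same trailing-slash pattern is applied to both "foo" and "bbb".

```python
import re

# Marker directory names taken from the examples in this post.
MARKERS = ("foo", "bbb")

def repo_root(path):
    """Hypothetical helper: pad the input with "/" so a marker at the very end still matches."""
    padded = path + "/"
    for name in MARKERS:
        m = re.search(r"(.*)/" + re.escape(name) + "/", padded)
        if m:
            return m.group(1)
    return None

print(repo_root("/home/myhome/test/foo/dir1/deb"))  # /home/myhome/test
print(repo_root("/home/myhome/ttt/bbb/test"))       # /home/myhome/ttt
print(repo_root("/home/myhome/football/bbb"))       # /home/myhome/football
print(repo_root("/home/myhome/football/foo"))       # /home/myhome/football
```

In this sketch the capture group stops before the marker directory, so the appended "/" never ends up in the result. Stripping it only matters if the padded input string is reused afterwards.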

The solution itself is not hard. But it took me some time to come up with it, as I didn't think of changing the input a little to make the match work. I could have used more conditional statements to check the input. However, the final regex is simpler and more elegant.

Sometimes, you just have to step outside the usual solution. The beauty and simplicity of the code should be our goal.