Perl fun
August 3rd 2008 09:11 pm
So in the course of developing the next code feature I plan to add to kdesvn-build (nothing major, just adding a persistent data store) I came across what I consider an oddity:
Imagine that we have a hash table mapping module names to their count of consecutive build failures. Now let’s say we want to pare this hash table down to a list of modules with more than a given number of consecutive build failures (3, as an example). This can be done using Perl’s grep function to strip out list entries that don’t match a given criterion (in this case, hash keys whose associated failure count is not >3).
Now I didn’t feel like making this test inline (although it would not have changed the end result) so let’s assume that I put my comparison function separately in an anonymous subroutine:
my $matchRoutine = sub {
    my $mod = shift;
    return (exists $moduleFailures{$mod}) and
           ($moduleFailures{$mod} > 3);
};
my @moreFailures = grep { &{$matchRoutine}($_) } (@searchList);
Now if you use this code to search for hash table entries with an appropriate number of failures, you’ll find that it instead returns every entry in the list that is present in the %moduleFailures hash table at all, whatever its failure count!
So what happened? The key is in the and operator. Most Perl programmers know that Perl has two sets of logical operators: C-style (&& || !) and English (and or not). These are described in the perlop Perldoc page. Basically the reason for the two different sets is syntactical convenience: the English forms have much lower precedence, so you can (for example) write print FILE "text" or die without having to put parentheses around your expression.
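As a rough sketch of that convenience, here is the same open call written with || and with or. The path is a made-up one that is assumed not to exist, and the variable names are only for illustration:

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on this machine.
my $missing = '/nonexistent/perl-fun-demo-file';

# With ||, the die attaches to $missing (a true string), so it is never
# reached: this parses as open(my $fh1, '<', ($missing || die ...)).
my $quiet = open my $fh1, '<', $missing || die "never reached\n";

# With or, the die attaches to the whole open call, as intended.
my $died = !eval { open my $fh2, '<', $missing or die "open failed\n"; 1 };

print "with ||: open ", ($quiet ? "succeeded" : "failed silently"), "\n";
print "with or: ", ($died ? "died as intended" : "did not die"), "\n";
```

The low precedence that makes or die read so naturally is the same property that causes the trouble with return.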
The way this bites you here is that Perl evaluates return (foo) and (bar); as (return (foo)) and (bar); so the subroutine returns as soon as (foo) has been evaluated, and (bar) is never even looked at. I’m not sure how the parser allows for using the return value of the return statement without so much as a warning, but there we have it.
The solution, of course, is either to fully parenthesize your expression, or to use && instead of and in this case.
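To make the difference concrete, here is a small self-contained sketch comparing the original routine with both fixes. The module names and failure counts are invented sample data, and the three routine names are mine:

```perl
use strict;
use warnings;

# Invented sample data, for illustration only.
my %moduleFailures = (kdelibs => 5, kdebase => 1, kdepim => 4);
my @searchList     = qw(kdelibs kdebase kdepim kdegames);

# Original form: parses as (return (exists ...)) and (...), so only the
# exists test decides the result. (Newer perls may print a precedence
# warning for this line.)
my $matchBuggy = sub {
    my $mod = shift;
    return (exists $moduleFailures{$mod}) and
           ($moduleFailures{$mod} > 3);
};

# Fix 1: parenthesize the whole expression.
my $matchParens = sub {
    my $mod = shift;
    return ((exists $moduleFailures{$mod}) and
            ($moduleFailures{$mod} > 3));
};

# Fix 2: use &&, which binds tighter than return's argument list.
my $matchAmpersands = sub {
    my $mod = shift;
    return (exists $moduleFailures{$mod}) &&
           ($moduleFailures{$mod} > 3);
};

my @buggy  = grep { $matchBuggy->($_) }      @searchList;
my @fixed1 = grep { $matchParens->($_) }     @searchList;
my @fixed2 = grep { $matchAmpersands->($_) } @searchList;

print "buggy:  @buggy\n";    # prints "buggy:  kdelibs kdebase kdepim"
print "parens: @fixed1\n";   # prints "parens: kdelibs kdepim"
print "&&:     @fixed2\n";   # prints "&&:     kdelibs kdepim"
```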

Another option would be to remove the return:
grep { exists $moduleFailures{$_} and $moduleFailures{$_} > 3 } @searchList;
or even shorter (but this will produce warnings if run with perl -w):
grep { $moduleFailures{$_} > 3 } @searchList;
Gabriel: Your first form is what I’d actually prefer but I try to avoid using constructs that are too Perl-specific (in comparison with C++). (Remember in this case the return is happening from a subroutine, not inline in the grep call where it would be much more understandable).
In my case the second form isn’t an option (there are actually a couple of layers of hash tables, and I want to avoid auto-vivifying the layers in between).
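For anyone unfamiliar with auto-vivification, here is a minimal sketch of the effect. The two-level layout (branch, then module) is a hypothetical one chosen for illustration, not the actual kdesvn-build data structure:

```perl
use strict;
use warnings;

# Hypothetical two-level layout: $failures{branch}{module} = count.
my %unguarded;
my %guarded;

# Merely reading through a missing outer key creates that outer key.
my $count = $unguarded{trunk}{kdelibs};

# Testing each layer with exists first leaves the hash untouched.
my $found = exists($guarded{trunk}) && exists($guarded{trunk}{kdelibs});

print "unguarded keys: ", scalar(keys %unguarded), "\n";   # prints "unguarded keys: 1"
print "guarded keys:   ", scalar(keys %guarded), "\n";     # prints "guarded keys:   0"
```

With a persistent data store, those accidental empty layers would end up being written out along with the real data, which is why the layer-by-layer exists checks are worth the extra typing.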