When is a bash script too big?

Tuesday, March 9th, 2010

I quite like writing shell scripts. I’ve been doing it for years, and when I start on an automation problem I naturally think in terms of “automating the command-lines”.

However, some problems very rapidly exceed bash’s abilities (or, more often, its readability). Bash and the tools I usually pair with it (grep, cut, sed, awk, sort and so on) are very line-oriented; it’s difficult to get them to make decisions based on any part of the input except the current line and whatever summary history you’re building up.
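As a rough illustration of that “summary history” style, here is a minimal sketch. The summarise function name and the sample input are made up for the example. awk sees one line at a time, so all it can do is accumulate totals as it goes and report them at the end.

```bash
#!/usr/bin/env bash
# Hypothetical example: 'summarise' and the sample data are illustrative only.

# Sum the second column per distinct value of the first column.
# awk processes one line at a time, so its only "memory" is the running total.
summarise() {
    awk '{ total[$1] += $2 } END { for (name in total) print name, total[name] }' | sort
}

printf '%s\n' 'alice 3' 'bob 5' 'alice 4' | summarise
```

Anything that needs to look back at an earlier line in detail, or ahead to a later one, has to be stored by hand in the same way.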

So, one reason to switch to a different language is where you need to grok a whole set of structured files in one go. For this I often go to perl, and read files directly into a hash of hashes (of hashes of …). Then I can walk through that structure in any order, rather than having to touch the input files again. It may eat memory, but these are only scripts we’re talking about, not programs.

Another reason to switch is less obvious: as soon as you try to make your bash script look like it’s written in some “proper” functional language, you’re fighting a losing battle. For me, this generally happens around the 250+ lines mark; if the script is that long, it’s going to have a lot of structure. Bash can do functions, but variables are global unless you remember to declare them local, so it’s difficult to make them act like they’re scoped properly. As for passing parameters that might end up being executed … well, that becomes painful. You have to construct the original calling parameters with levels of quoting that explicitly anticipate how many rounds of expansion they’re going to encounter.
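A minimal sketch of the scoping problem, using made-up function names (leaky, scoped, show_count): a plain assignment inside a function changes the global, and even a local variable is visible to any function called from there, because the shell scopes dynamically.

```bash
#!/usr/bin/env bash
# Hypothetical helper names, for illustration only.

count=0

leaky() {
    # No 'local': this assignment overwrites the global 'count'.
    count=10
}

show_count() {
    printf 'count=%s\n' "$count"
}

scoped() {
    # 'local' limits the variable to this function and anything it calls.
    local count=20
    show_count   # the callee sees scoped's local value, not the global
}

leaky
show_count   # the global was changed by leaky
scoped       # show_count, called from scoped, sees the local
show_count   # the global is untouched by scoped
```

In a 250-line script with a dozen functions, one forgotten local is enough to produce a bug that is very hard to track down.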

An example: I have a script here that makes a lot of changes to important files, so I thought it would be a good idea to be able to log each command invocation. So in the main code, instead of saying “rm file”, I’d say “log-cmd rm file”. The log-cmd function could keep track of verbosity … and the associated debugmsg function would only produce output if a debug setting were true, too …

# Print a message to stderr, but only when DEBUG is set to "true".
debugmsg() {
    if $DEBUG
    then printf -- '%s\n' "$*" >&2
    fi
}

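# Run a command, honouring the VERBOSE and NOEXEC flags (each "true" or "false").
log-cmd() {
    if $VERBOSE
    then
        printf 'log-cmd executing %s\n' "$*" >&2
    fi
    if $NOEXEC
    then
        debugmsg "Execution of '$*' blocked by no-exec flag -n"
    else
        # Unquoted $* is word-split and run; its stdout is captured for debugmsg.
        debugmsg "$($*)"
    fi
}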

All that looks pretty reasonable, and indeed it works just fine with a simple invocation such as log-cmd ls -l … but when you start to want to redirect that output somewhere you’re fighting hard against bash …

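log-cmd ls -l > outputfile creates outputfile OK … but it’s empty, because bash applies the redirection to log-cmd’s own standard output, not to the ls inside it. And log-cmd never writes anything to standard output: the output of ls is captured by $(…) and handed to debugmsg, which sends it to stderr, and only when DEBUG is true …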

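log-cmd "ls -l > output" just breaks hard, with ls complaining that there’s no file called “>” … because the “>” now arrives inside an expanded variable, where bash splits it into words but no longer treats it as a redirection operator, so it is passed to ls as just another argument.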
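For comparison, here is a hedged sketch of two ways the function could be bent to cope with redirection. Both names (log_cmd_passthru and log_cmd_eval) are invented for this example and are not part of the script above. The first lets the command’s output pass straight through, so a redirection on the call works. The second hands a single string back to the shell parser with eval, which pushes the whole quoting burden onto the caller.

```bash
#!/usr/bin/env bash
# Hypothetical variants of log-cmd, for illustration only.

VERBOSE=true
NOEXEC=false

# Variant 1: run the arguments as given and let stdout pass through, so a
# redirection on the call applies to the command's own output.
log_cmd_passthru() {
    if "$VERBOSE"; then
        printf 'executing: %s\n' "$*" >&2
    fi
    if ! "$NOEXEC"; then
        "$@"
    fi
}

# Variant 2: take one string and re-parse it with eval, so redirections
# inside the string are honoured. The caller must quote everything inside
# that string correctly, knowing it will be parsed a second time.
log_cmd_eval() {
    if "$VERBOSE"; then
        printf 'executing: %s\n' "$1" >&2
    fi
    if ! "$NOEXEC"; then
        eval "$1"
    fi
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

log_cmd_passthru echo "hello world" > "$tmpdir/out1"
log_cmd_eval "echo 'hello world' > '$tmpdir/out2'"
```

Note the nested quoting in the last line: single quotes inside double quotes, chosen because the string will be expanded once by the caller and parsed again by eval. That is exactly the kind of bookkeeping that becomes unmanageable as the script grows.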

At this point, stop trying to fight against bash, go and grab perl, python or perhaps ruby, and do the job properly! Because if you want this sort of feature, you probably need others … and while bash is capable of getting the job done, there are far too many mistakes to be made on the way.