The while-read loop controversy…

For about as long as I’ve been able to spell “bash”, I’ve seen the debates on the ‘Net about the proper way to use the shell to loop through a text file line-by-line (rather than item-by-item).

Of the small handful of common methods, it often comes down to this:

Using Cat
With this method, you call cat to pipe the contents of the file into the while loop.

cat $inputFile | while read loopLine
do
  (some stuff)
done
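One caveat worth calling out with this form (a quick sketch with a made-up counter, not something from the debate threads): because the loop sits on the right side of a pipe, it runs in a subshell, so any variables you set inside it vanish when the loop ends.

```shell
count=0

# The while loop runs in a subshell created by the pipe...
printf 'a\nb\nc\n' | while read loopLine
do
  count=$((count + 1))   # ...so this increments only the subshell's copy
done

printf '%s\n' "$count"   # prints 0, not 3
```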

Using Redirection
With this method, you redirect the file into the loop, as indicated by the redirection arrow after the done statement on the last line.

while read loopLine
do
  (some stuff)
done < $inputFile

So Which Way?
Once you talk the purists down off the ledge about not setting the IFS variable, and they get over the fact that you aren't using awk in the first place, you can move on to the discussion of which while read approach you'll use; most people do it that way anyway, and it's easier to understand.  There.  I said it.

Between the two methods I describe above, it's often said that the cat method is "wasteful", but easier to understand for the person who comes after you. This is apparently because you see right away, as you read through the code in order, the thing that is getting piped into the loop, rather than having to wonder, or go looking for it.

Conversely, the redirected method is much more efficient (since you aren't executing cat), but someone might not easily see how it works, since they have to scroll down to the done line to find the input file, or may not understand what is being looped over.

Both points are kinda' true. But when it comes down to it, the redirected method is just not that hard to understand, and I almost always use it…

Sometimes, Joel…
…except in one situation, which is why I'm writing this post; and it's almost never mentioned in the argue-posts I read on this:  What if you need to manipulate the content *before* it gets parsed by the while read loop?  For instance, backslashes in the line, newlines in the wrong place, etc., read in from a group of files.

Take this example; I have a few files with a path in one of the fields that is to be parsed, like this:

servername volumename folder1\folder2

…and I want to read the contents of all of those files into the one loop, with an ls and a wildcard.

Using the standard redirection method in this case, the backslash is interpreted as an "escape" by read, and gets dropped during parsing.  Since it's a filesystem path, I obviously need that backslash; so the way I solved it was to use cat and, each time I encountered the backslash, pass the line through sed to add a second backslash as an escape before it gets parsed by the shell's while read loop.

Here’s how I did that:

for inputfile in `ls $fewFiles`
do
  cat $inputfile | sed -e 's/\\/\\\\/g' | while read loopLine
  do
    # Then I grab the foldername from the short parent path
    item=`echo $loopLine | awk -F '\\' '{ print $2 }'`
    (some stuff)
  done
done
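As an aside (and not what I did above): if your shell's read supports the -r flag, it leaves backslashes alone, so the whole sed dance can be skipped. A minimal sketch, using a hypothetical one-line record:

```shell
# Hypothetical sample record; single quotes keep the backslash literal
line='servername volumename folder1\folder2'

# read -r does not treat the backslash as an escape, so no pre-doubling needed
item=$(printf '%s\n' "$line" | while read -r loopLine
do
  # -F '\\' hands awk a literal backslash to split fields on
  printf '%s\n' "$loopLine" | awk -F '\\' '{ print $2 }'
done)

printf '%s\n' "$item"   # folder2
```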

Of course, you'll notice that the backslash even has to be escaped in the sed command, as does the double-backslash that I use to replace it…
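To see that escaping in isolation (a standalone sketch with a made-up one-field sample): sed doubles each backslash, and the plain (no -r) read then eats one of each pair back, landing you on the original path.

```shell
sample='folder1\folder2'            # one literal backslash

# In the sed expression, \\ matches one backslash and \\\\ writes two
escaped=$(printf '%s\n' "$sample" | sed -e 's/\\/\\\\/g')
printf '%s\n' "$escaped"            # folder1\\folder2

# A plain read (no -r) consumes one backslash of each pair again
restored=$(printf '%s\n' "$escaped" | while read loopLine
do
  printf '%s\n' "$loopLine"
done)
printf '%s\n' "$restored"           # folder1\folder2
```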

I know, I know what you’re thinking…  Just use Perl…
😉

2 Comments

  1. philip

    Hai Jeremy,

    I think you might want to correct the last line of your first example and remove the redirect.

    Regards,

    Philip.

  2. Jeremy Pavlov

    @ philip

    Of course you were right… Great catch!
    I’m a bit embarrassed; just a cut/paste error.
    Fixed…

    Thanks!
    -Jeremy
