For about as long as I’ve been able to spell “bash”, I’ve seen the debates on the ‘Net about the proper way to use the shell to loop through a text file line-by-line (rather than item-by-item).
Of the small handful of common methods, it often comes down to this:
Using Cat
With this method, you call cat to pipe the contents of the file into the while loop.
cat $inputFile | while read loopLine
do
    (some stuff)
done
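As a quick concrete sketch of that pattern (the file name servers.txt and the echo are placeholders of mine, not from the original), it might look like:

cat servers.txt | while read loopLine
do
    # do something with each line of the file
    echo "Processing: $loopLine"
done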
Using Redirection
With this method, you are redirecting the file into the loop, as indicated by the redirection arrow after the done statement on the last line.
while read loopLine
do
    (some stuff)
done < $inputFile
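The same placeholder example written with redirection instead (again, servers.txt and the echo are just mine):

while read loopLine
do
    # do something with each line of the file
    echo "Processing: $loopLine"
done < servers.txt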
So Which Way?
Once you talk the purists down off the ledge about you not using the IFS variable, and they get over the fact that you aren’t using awk in the first place, you can move on to the discussion regarding which while read approach you’ll use, since most people do it that way anyway, and it’s easier to understand. There. I said it.
Between the two methods I describe above, it’s often said that the cat method is “wasteful”, but easier to understand for the person who comes after you. This is apparently because you see right away, as you read through the code in order, what is getting passed into the loop, rather than having to wonder or look for it.
Conversely, the redirected method is more efficient (since you aren’t executing cat), but someone might not as easily see how it works, since they have to scroll down to the end of the loop to find the input file, or may not understand what is being looped over.
Both points are kinda true. But when it comes down to it, the redirected method is just not that hard to understand, and I almost always use it…
Sometimes, Joel…
…except in one situation, which is why I’m writing this post; and this is almost never mentioned in the argue-posts I read on the subject: what if you need to manipulate the content *before* it gets parsed by the while read loop? For instance, backslashes in the line, newlines in the wrong place, etc., read in from a group of files.
Take this example: I have a few files with a path in one of the fields to be parsed, like this:
servername volumename folder1\folder2
…and I want to read the contents of all of those files into one loop, using an ls with a wildcard.
Using the standard redirection method in this case, the backslash is interpreted as an “escape”, and is parsed and dropped. Since it’s a filesystem path, I obviously need that backslash; so the way I solved this was to use cat and pass the content through sed, adding a second backslash as an escape each time one appears, before it gets parsed by the shell while read loop.
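To see the problem in isolation, here’s a minimal sketch (using just the folder1\folder2 fragment from my example above, and assuming bash’s default echo behavior): without any preprocessing, read eats the backslash.

echo 'folder1\folder2' | while read loopLine
do
    echo "$loopLine"    # prints: folder1folder2 -- the backslash is gone
done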
Here’s how I did that:
for inputfile in `ls $fewFiles`
do
    cat $inputfile | sed -e s/'\\'/'\\\\'/g | while read loopLine
    do
        # Then I grab the foldername from the short parent path
        item=`echo $loopLine | awk -F '\' '{ print $2 }'`
        (some stuff)
    done
done
Of course, you’ll notice that the backslash even has to be escaped in the sed command, as does the double backslash I use to replace it…
I know, I know what you’re thinking… Just use Perl…
😉
Hai Jeremy,
I think you might want to correct the last line of your first example and remove the redirect.
Regards,
Philip.
@ philip
Of course you were right… Great catch!
I’m a bit embarrassed; just a cut/paste error.
Fixed…
Thanks!
-Jeremy