20+ years later the Unix shell is still the fastest way to get work done on a bunch of files. I'm still regularly combining grep, awk, sort, uniq, etc. to do analysis on data.

One common task is doing work for every line of a file.

for f in `cat list`; do
  ls -l "$f"
done

There are a lot of reasons this idiom is broken. The worst is that it fails if lines in the file list contain spaces: the backtick expansion splits the file's contents on all whitespace, not just newlines, and no amount of quoting inside the loop can fix it because the splitting happens before the loop body ever runs. The unquoted results also undergo glob expansion, so a line like *.txt gets expanded against the current directory. And if list is large (32k?) it can fail because the whole expansion has to fit in the shell's limited command line buffer.
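To make the failure concrete, here's a quick demo (the filenames are hypothetical, just for illustration):

printf 'plain.txt\nmy file.txt\n' > list
for f in `cat list`; do
  echo "[$f]"
done
# prints [plain.txt], then [my], then [file.txt]:
# "my file.txt" was split into two words before the loop ever saw it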

The idiom works often enough that I use it all the time. But when it does bite me, I'm always left scratching my head trying to remember the right way. Well, here it is (in bash):

cat list | while read f; do
  ls -l "$f"
done

The read command in bash is a magic builtin. It reads a line from stdin and assigns its contents to shell variables. It also returns a nonzero exit status at EOF, which is what lets the while loop terminate cleanly.
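You can see that exit status directly (a tiny demo):

read f < /dev/null
echo $?
# prints 1: read hit EOF, the same condition that ends the loop above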

read has a lot of options for how it handles the file input. At first the sample above confused me: the bash docs say each line is split into words via IFS, which sounds like only the first word should end up in the variable f. But the docs also say that leftover words are all assigned to the last variable named, so with a single variable the whole line lands in f (minus leading and trailing IFS whitespace). The splitting only becomes visible when you give read more than one variable. See the docs for options for line delimiters, assigning to an array, backslash handling, etc.
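Here's what that looks like in practice. And for extra safety with weird filenames, IFS= and -r (both standard read options) make read more literal:

echo "one two three" | { read a b; echo "[$a] [$b]"; }
# prints [one] [two three]: leftover words all go to the last variable

# defensive variant: IFS= preserves leading/trailing whitespace,
# -r stops backslash escape processing
while IFS= read -r f; do
  ls -l "$f"
done < list

Reading from a redirect instead of a pipe also keeps the loop out of a subshell, so any variables you set inside it survive after the loop ends.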

tech
  2008-03-16 15:39 Z