September 2006


Clyde Development Corp and Programming
18 Sep 2006 08:29 pm

I thought this might come in handy for anyone who works with large sets of data, as I do. It’s a simple shell script that takes a text file as an argument and prints each unique ‘word’ along with the total occurrences of that word (its ‘frequency’), one word per line, most frequent first.
Remember to chmod +x the script or it won’t run.


#!/bin/bash
# wf: Crude word frequency analysis on a text file.
# Check for input file on command line.
ARGS=1
E_BADARGS=65
E_NOFILE=66
if [ $# -ne "$ARGS" ] # Correct number of arguments passed to script?
then
  echo "Usage: $(basename "$0") filename" >&2
  exit $E_BADARGS
fi
if [ ! -f "$1" ] # Check if file exists.
then
  echo "File \"$1\" does not exist." >&2
  exit $E_NOFILE
fi
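cat "$1" | xargs -n1 | \
# List the file, one word per line.
# Note: xargs treats quote characters specially, so it will complain
#+ about text that contains an unmatched apostrophe or quote.
tr A-Z a-z | \
# Shift characters to lowercase.
sed -e 's/\.//g' -e 's/,//g' -e 's/ /\
/g' | \
# Filter out periods and commas, and
#+ change any remaining space between words to linefeed.
sort | uniq -c | sort -nr
# Finally, prefix each word with its occurrence count and sort numerically, highest first.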
exit 0
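
If the text has apostrophes or punctuation other than periods and commas, the same idea can be sketched with tr doing the word splitting instead of xargs and sed. This is only a rough variant, not a replacement for the script above: the function name wf_tr is made up for illustration, and it assumes a ‘word’ is any run of letters, digits, and apostrophes.

```shell
#!/bin/bash
# wf_tr: hypothetical variant of wf (the name is an assumption, not part of the original script).
# Assumes a "word" is any run of letters, digits, and apostrophes.
wf_tr() {
  tr '[:upper:]' '[:lower:]' < "$1" |  # Shift characters to lowercase.
  tr -cs "[:alnum:]'" '\n' |           # Turn every run of other characters into one linefeed.
  sed '/^$/d' |                        # Drop the empty line left if the file starts with punctuation.
  sort | uniq -c | sort -nr            # Count each word and sort numerically, highest first.
}
```

With the function pasted into a script or sourced into a shell, it could be called as wf_tr notes.txt, where notes.txt is just a placeholder filename.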

My Site
17 Sep 2006 01:41 pm

Watch this clip and decide if the runner (me) is out or safe. I was called out, but I’m positive I was safe. I just thought I’d get some outside feedback, so tell me what you think.
Thanks.