Friday, February 27, 2009

grep usage

Here is an example shell command that invokes GNU grep:

grep -i 'hello.*world' hello.h hello.c

This lists all lines in the files `hello.h' and `hello.c' that contain the string `hello' followed by the string `world'; this is because `.*' matches zero or more characters within a line (see the Regular Expressions section of the GNU grep manual). The `-i' option causes grep to ignore case, so it also matches the line `Hello, world!', which it would not otherwise match. See the Invoking grep section of the manual for more details about how to invoke grep.
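A minimal sketch that reproduces this example in a throwaway directory. The contents of `hello.h' and `hello.c' below are made up for illustration:

```shell
#!/bin/sh
# Sketch: rerun the example on made-up copies of hello.h and hello.c.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

printf '#include <stdio.h>\n' > "$dir/hello.h"
printf 'int main(void) {\n  puts("Hello, world!");\n  return 0;\n}\n' > "$dir/hello.c"

# Case-sensitive search: "Hello" does not match "hello", so nothing is printed.
grep 'hello.*world' "$dir/hello.h" "$dir/hello.c"

# With -i the case difference is ignored; the puts() line is printed,
# prefixed with its file name because more than one file was given.
grep -i 'hello.*world' "$dir/hello.h" "$dir/hello.c"
```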

Here are some common questions and answers about grep usage.

How can I list just the names of matching files?

grep -l 'main' *.c

lists the names of all C files in the current directory whose contents mention `main'.

How do I search directories recursively?

grep -r 'hello' /home/test

searches for `hello' in all files under the directory `/home/test'. For more control over which files are searched, use find, grep, and xargs. For example, the following command searches only C files (the trailing `/dev/null' makes grep print file names even when xargs passes it only one file):

find /home/test -name '*.c' -print | xargs grep 'hello' /dev/null

This differs from the command:

grep -r 'hello' *.c

which merely looks for `hello' in all files in the current directory whose names end in `.c'. Here the `-r' is probably unnecessary, as recursion occurs only in the unlikely event that one of the `.c' files is a directory.

What if a pattern has a leading `-'?


grep -e '--cut here--' *

searches for all lines matching `--cut here--'. Without `-e', grep would attempt to parse `--cut here--' as a list of options.

Suppose I want to search for a whole word, not a part of a word?

grep -w 'hello' *

searches only for instances of `hello' that are entire words; it does not match `Othello'. For more control, use `\<' and `\>' to match the start and end of words. For example:

grep 'hello\>' *

searches only for words ending in `hello', so it matches the word `Othello'.

How do I output context around the matching lines?


grep -C 2 'hello' *

prints two lines of context around each matching line.

How do I force grep to print the name of the file?

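Append `/dev/null', so that grep is given more than one file name and therefore prefixes each matching line with the name of its file:
grep 'bill' /etc/passwd /dev/null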

gets you:
/etc/passwd:bill:TWEFWEG.IMe.:98:11:Bill Gates:/home/do/bgates:/bin/bash

Why do people use strange regular expressions on ps output?

ps -ef | grep '[c]ron'

If the pattern had been written without the square brackets, it would have matched not only the ps output line for cron, but also the ps output line for grep itself. The bracketed pattern still matches the text `cron', but the grep process's own command line contains `[c]ron', which the pattern does not match. Note that on some platforms ps limits its output to the width of the screen; grep itself has no limit on the length of a line other than available memory.

Why does grep report "Binary file matches"?
If grep listed all matching "lines" from a binary file, it would probably generate output that is not useful, and it might even muck up your display. So GNU grep suppresses output from files that appear to be binary files. To force GNU grep to output lines even from files that appear to be binary, use the `-a' or `--binary-files=text' option. To eliminate the "Binary file matches" messages, use the `-I' or `--binary-files=without-match' option.

Why doesn't `grep -lv' print nonmatching file names?
`grep -lv' lists the names of all files containing one or more lines that do not match. To list the names of all files that contain no matching lines, use the `-L' or `--files-without-match' option.
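To see the difference, here is a small sketch with three hypothetical files (`all.txt', `some.txt' and `none.txt'), whose names and contents are invented for illustration:

```shell
#!/bin/sh
# Sketch: contrast "grep -lv" with "grep -L" on three made-up files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'hello\nhello\n' > "$dir/all.txt"    # every line matches
printf 'hello\nbye\n'   > "$dir/some.txt"   # one line matches, one does not
printf 'bye\n'          > "$dir/none.txt"   # no line matches

cd "$dir" || exit 1

# -lv: files with at least one NON-matching line -> some.txt, none.txt
grep -lv 'hello' all.txt some.txt none.txt

# -L: files with no matching line at all -> none.txt only
grep -L 'hello' all.txt some.txt none.txt
```

Note that `some.txt' shows up under `-lv' even though it does contain `hello', which is exactly the surprise the question describes.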
I can do OR with `|', but what about AND?

grep 'paul' /etc/motd | grep 'françois'

finds all lines that contain both `paul' and `françois'.

How can I search in both standard input and in files?
Use the special file name `-':

cat /etc/passwd | grep 'alain' - /etc/motd
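A sketch of the same idea with made-up data so it can run anywhere: a file `motd' in a temporary directory stands in for `/etc/motd', and the piped lines stand in for `/etc/passwd'. GNU grep labels lines read from `-' as `(standard input)':

```shell
#!/bin/sh
# Sketch: search standard input and a regular file with one grep call.
# $dir/motd stands in for /etc/motd; the piped lines stand in for /etc/passwd.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'welcome alain\n' > "$dir/motd"

# "-" tells grep to read standard input at that point in the file list.
# Expected shape of the output:
#   (standard input):alain:x:1000:1000::/home/alain:/bin/sh
#   <dir>/motd:welcome alain
printf 'alain:x:1000:1000::/home/alain:/bin/sh\nbob:x:1001:1001::/home/bob:/bin/sh\n' |
  grep 'alain' - "$dir/motd"
```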

How do I express palindromes in a regular expression?

It can be done with back-references; for example, a five-character palindrome can be written as a BRE:

grep -w -e '\(.\)\(.\).\2\1' file

It matches the word "radar" or "civic".
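The pattern can be tried on a made-up word list with one word per line; only the five-letter palindromes should be printed:

```shell
#!/bin/sh
# Sketch: the five-character palindrome BRE applied to a made-up word list.
# \(.\)\(.\) captures the first two characters, "." matches the middle one,
# and \2\1 requires the captured characters to reappear in reverse order.
# -w keeps the match from starting or ending inside a longer word.
printf 'radar\ncivic\nlevel\nhello\nnoon\n' |
  grep -w -e '\(.\)\(.\).\2\1'
# Prints radar, civic and level; noon has only four characters.
```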
Guglielmo Bondioni proposed a single RE that finds all the palindromes up to 19 characters long.


egrep -e '^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\9\8\7\6\5\4\3\2\1$' file

Note that this relies on GNU extensions (back-references are not part of POSIX EREs), so it might not be portable to other greps.

Why do my expressions with the vertical bar fail?


/bin/echo "ba" | egrep '(a)\1|(b)\1'

The first alternative, `(a)\1', fails on "ba", so the first group does not take part in the match. In the second alternative, `\1' still refers to that first group, `(a)', not to `(b)', so the back-reference has nothing to match and the second alternative fails as well. By contrast, "aaba" matches, because its leading "aa" satisfies the first alternative.

What do grep, fgrep, and egrep stand for?
The name grep comes from the way line editing was done on Unix: the ed editor uses the following command syntax to print all lines matching a regular expression on the screen.


global/regular expression/print
g/re/p

fgrep stands for Fixed grep, and egrep for Extended grep.
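The GNU grep manual describes fgrep and egrep as the same as `grep -F' and `grep -E'. A sketch contrasting the two modes on made-up input (recent GNU grep releases warn that fgrep and egrep are obsolescent, so the sketch uses the options instead):

```shell
#!/bin/sh
# Sketch: fixed-string (-F, as in fgrep) versus extended-regex (-E, as in egrep)
# matching on made-up input lines.
input='a.c
abc
abbbc'

# -F: the pattern is a literal string, so "." matches only a real dot.
printf '%s\n' "$input" | grep -F 'a.c'      # prints a.c

# -E: "+" is an ERE operator meaning "one or more", so ab+c matches abc and abbbc.
printf '%s\n' "$input" | grep -E 'ab+c'     # prints abc and abbbc
```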
