Grep(1): print lines matching pattern

Name

grep, egrep, fgrep - print lines matching a pattern

Synopsis

grep [OPTIONS] PATTERN [FILE...]

grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

Description

grep searches the named input files (or standard input if no files are named, or if a single hyphen-minus (-) is given as the file name) for lines containing a match to the given PATTERN. By default, grep prints the matching lines.

In addition, two variant programs, egrep and fgrep, are available. egrep is the same as grep -E. fgrep is the same as grep -F. Direct invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified.

Options

Generic program information

--help

Print a usage message that briefly summarizes these command-line options and the bug-reporting address, then exit.

-V, --version

Print the version number of grep to the standard output stream. This version number should be included in all bug reports (see below).

Matcher Selection

-E, --extended-regexp

Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)

-F, --fixed-strings

Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX.)

-G, --basic-regexp

Interpret PATTERN as a basic regular expression (BRE, see below). This is the default.

-P, --perl-regexp

Interpret PATTERN as a Perl regular expression. This is highly experimental, and grep -P may warn of unimplemented features.

Matching control

-e PATTERN, --regexp=PATTERN

Use PATTERN as the pattern. This can be used to specify multiple search patterns, or to protect a pattern beginning with a hyphen (-). (-e is specified by POSIX.)

-f FILE, --file=FILE

Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing. (-f is specified by POSIX.)

-i, --ignore-case

Ignore case distinctions in both the PATTERN and the input files. (-i is specified by POSIX.)

-v, --invert-match

Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX.)

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.

-x, --line-regexp

Select only those matches that exactly match the whole line. (-x is specified by POSIX.)

-y

Obsolete synonym for -i.
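The matching-control options above can be tried with a short sketch like the following. It assumes GNU grep and a POSIX shell; the sample words and the variable names are illustrative and do not come from the manual.

```shell
# Illustrative input; the words are arbitrary.
input=$(printf '%s\n' 'cat' 'Cat' 'concatenate' '-cat' 'dog')

# -w: "cat" must stand alone as a word ("-" is not a word constituent).
whole_word=$(printf '%s\n' "$input" | grep -w 'cat')

# -x: the whole line must match.
whole_line=$(printf '%s\n' "$input" | grep -x 'cat')

# -i combined with -x: whole-line match, ignoring case.
any_case=$(printf '%s\n' "$input" | grep -i -x 'cat')

# -v: select the lines that do NOT contain "cat".
inverted=$(printf '%s\n' "$input" | grep -v 'cat')

# -e protects a pattern that begins with a hyphen.
dash_pattern=$(printf '%s\n' "$input" | grep -e '-cat')

# Several -e options: a line is selected if any pattern matches (-c counts them).
either_count=$(printf '%s\n' "$input" | grep -c -e 'Cat' -e 'dog')

printf '%s\n' "$whole_word" "$whole_line" "$any_case" "$inverted" "$dash_pattern" "$either_count"
```

Here -w selects both cat and -cat but not concatenate, because the match must be bounded by non-word characters or by the line edges.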

General output control

-c, --count

Suppress normal output; instead print a count of matching lines for each input file. With the -v, --invert-match option (see above), count non-matching lines. (-c is specified by POSIX.)

--color[=WHEN], --colour[=WHEN]

Surround the matched (non-empty) strings, matching lines, context lines, file names, line numbers, byte offsets, and separators (for fields and groups of context lines) with escape sequences to display them in color on the terminal. The colors are defined by the environment variable GREP_COLORS. The deprecated environment variable GREP_COLOR is still supported, but its setting does not have priority. WHEN is never, always, or auto.

-L, --files-without-match

Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.

-l, --files-with-matches

Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match. (-l is specified by POSIX.)

-m NUM, --max-count=NUM

Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.

-o, --only-matching

Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

-q, --quiet, --silent

Quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. Also see the -s or --no-messages option. (-q is specified by POSIX.)

-s, --no-messages

Suppress error messages about nonexistent or unreadable files. Portability note: unlike GNU grep, 7th Edition Unix grep did not conform to POSIX, because it lacked -q and its -s option behaved like GNU grep's -q option. USG-style grep also lacked -q but its -s option behaved like GNU grep. Portable shell scripts should avoid both -q and -s and should redirect standard and error output to /dev/null instead. (-s is specified by POSIX.)

Output line prefix control

-b, --byte-offset

Print the 0-based byte offset within the input file before each line of output. If -o (--only-matching) is specified, print the offset of the matching part itself.

-H, --with-filename

Print the file name for each match. This is the default when there is more than one file to search.

-h, --no-filename

Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.

--label=LABEL

Display input actually coming from standard input as input coming from file LABEL. This is especially useful when implementing tools like zgrep, e.g., gzip -cd foo.gz | grep --label=foo -H something. See also the -H option.

-n, --line-number

Prefix each line of output with the 1-based line number within its input file. (-n is specified by POSIX.)

-T, --initial-tab

Make sure that the first character of actual line content lies on a tab stop, so that the alignment of tabs looks normal. This is useful with options that prefix their output to the actual content: -H, -n, and -b. To improve the probability that lines from a single file will all start at the same column, this also causes the line number and byte offset (if present) to be printed in a minimum size field width.

-u, --unix-byte-offsets

Report Unix-style byte offsets. This switch causes grep to report byte offsets as if the file were a Unix-style text file, i.e., with CR characters stripped off. This will produce results identical to running grep on a Unix machine. This option has no effect unless the -b option is also used; it has no effect on platforms other than MS-DOS and MS-Windows.

-Z, --null

Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file name. For example, grep -lZ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names containing unusual characters like newlines. This option can be used with commands like find -print0, perl -0, sort -z, and xargs -0 to process arbitrary file names, even those that contain newline characters.
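As a rough sketch of the output-control and prefix options described above, the commands below run against two scratch files. GNU grep, mktemp, and an xargs that accepts -0 are assumed; the file names, their contents, and the label "stream" are made up for illustration.

```shell
# Scratch files; names and contents are illustrative.
tmp=$(mktemp -d)
printf 'error: disk full\nok\nerror: timeout\n' > "$tmp/a.log"
printf 'ok\nok\n' > "$tmp/with space.log"

count=$(grep -c 'error' "$tmp/a.log")            # number of matching lines
first=$(grep -m 1 'error' "$tmp/a.log")          # stop after one match
parts=$(grep -o 'error: [a-z]*' "$tmp/a.log")    # only the matched parts
numbered=$(grep -n 'timeout' "$tmp/a.log")       # 1-based line number prefix
offset=$(grep -b 'timeout' "$tmp/a.log")         # 0-based byte offset prefix
labeled=$(grep -H --label=stream 'ok' < "$tmp/a.log")  # name standard input

# -L lists files with no match; -lZ terminates names with NUL for xargs -0.
without=$(cd "$tmp" && grep -L 'error' ./*.log)
listed=$(cd "$tmp" && grep -lZ 'ok' ./*.log | xargs -0 -n 1 basename)

rm -rf "$tmp"
printf '%s\n' "$count" "$first" "$parts" "$numbered" "$offset" "$labeled" "$without" "$listed"
```

The NUL-terminated list survives the space in the second file name, which a newline- or whitespace-separated list passed to plain xargs would not.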

Context Line Control

-A NUM, --after-context=NUM

Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.

-B NUM, --before-context=NUM

Print NUM lines of leading context before matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.

-C NUM, -NUM, --context=NUM

Print NUM lines of output context. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
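A small sketch of the context options, assuming GNU grep; the seven input lines are invented so that the two matches are far enough apart to produce a group separator.

```shell
# Two "hit" lines separated by enough other lines to form two groups.
lines=$(printf '%s\n' 'a1' 'hit' 'a2' 'a3' 'a4' 'hit' 'a5')

after=$(printf '%s\n' "$lines" | grep -A 1 'hit')    # one trailing line
before=$(printf '%s\n' "$lines" | grep -B 1 'hit')   # one leading line
around=$(printf '%s\n' "$lines" | grep -C 1 'hit')   # one line on each side

printf '%s\n' "$after" '====' "$before" '====' "$around"
```

In each result the two groups are separated by a line containing only --.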

Selecting files and directories

-a, --text

Process a binary file as if it were text; this is equivalent to the --binary-files=text option.

--binary-files=TYPE

If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option. Warning: grep --binary-files=text might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.

-D ACTION, --devices=ACTION

If an input file is a device, FIFO, or socket, use ACTION to process it. By default, ACTION is read, which means that devices are read just as if they were ordinary files. If ACTION is skip, devices are silently skipped.

-d ACTION, --directories=ACTION

If an input file is a directory, use ACTION to process it. By default, ACTION is read, which means that directories are read just as if they were ordinary files. If ACTION is skip, directories are silently skipped. If ACTION is recurse, grep reads all files under each directory, recursively; this is equivalent to the -R option.

--exclude=GLOB

Skip files whose base name matches GLOB (using wildcard matching). A file-name glob can use *, ?, and [...] as wildcards, and \ to quote a wildcard or backslash character literally.

--exclude-from=FILE

Skip files whose base name matches any of the file-name globs read from FILE (using wildcard matching as described under --exclude).

--exclude-dir=DIR

Exclude directories matching the pattern DIR from recursive searches.

-I

Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.

--include=GLOB

Search only files whose base name matches GLOB (using wildcard matching as described under --exclude).

-R, -r, --recursive

Read all files under each directory, recursively; this is equivalent to the -d recurse option.
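The file-selection options can be combined in a recursive search, as in this sketch. It assumes GNU grep and mktemp; the directory layout (src, build) and the search string TODO are hypothetical.

```shell
# Hypothetical source tree in a scratch directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/build"
printf '/* TODO: refactor */\n' > "$tmp/src/main.c"
printf 'TODO: write docs\n' > "$tmp/src/notes.txt"
printf '/* TODO: generated */\n' > "$tmp/build/main.c"

# -r descends into directories; -l prints only the names of matching files.
all=$(cd "$tmp" && grep -r -l 'TODO' . | sort)

# --include restricts the search to files whose base name matches the glob.
only_c=$(cd "$tmp" && grep -r -l --include='*.c' 'TODO' . | sort)

# --exclude-dir prunes whole directories from the recursion.
no_build=$(cd "$tmp" && grep -r -l --include='*.c' --exclude-dir=build 'TODO' . | sort)

rm -rf "$tmp"
printf '%s\n' "$all" '====' "$only_c" '====' "$no_build"
```

The globs are quoted so that the shell passes them to grep unexpanded.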

Other options

--line-buffered

Use line buffering on output. This can cause a performance penalty.

--mmap

If possible, use the mmap(2) system call to read input, instead of the default read(2) system call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is operating, or if an I/O error occurs.

-U, --binary

Treat the file(s) as binary. By default, under MS-DOS and MS-Windows, grep guesses the file type by looking at the contents of the first 32 KB read from the file. If grep decides the file is a text file, it strips the CR characters from the original file contents (to make regular expressions with ^ and $ work correctly). Specifying -U overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this will cause some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.

-z, --null-data

Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.
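One plausible use of -z is filtering a NUL-terminated list of file names, sketched below. GNU grep, find with -print0, and sort with -z are assumed; the file names are invented.

```shell
# Scratch files; one name contains a space on purpose.
tmp=$(mktemp -d)
touch "$tmp/report.txt" "$tmp/notes.md" "$tmp/old report.txt"

# find emits NUL-terminated names; grep -z treats each name as one "line",
# and its output is NUL-terminated as well. tr makes the result printable.
txt_files=$(cd "$tmp" && find . -type f -print0 | grep -z '\.txt$' | sort -z | tr '\0' '\n')

rm -rf "$tmp"
printf '%s\n' "$txt_files"
```

The conversion with tr is only for display; a real pipeline would usually keep the NUL terminators and hand the list to xargs -0.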

Regular expressions

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using several operators to combine smaller expressions.

grep understands three different versions of regular expression syntax: “basic,” “extended” and “perl.” In GNU grep, there is no difference in available functionality between the basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality and are documented in pcresyntax(3) and pcrepattern(3), but may not be available on every system.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any meta-character with special meaning may be quoted by preceding it with a backslash.

The period . matches any single character.

Character classes and bracket expressions

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list; if the first character of the list is the caret ^ then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit.

Within a bracket expression, a range expression consists of two characters separated by a hyphen. It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is typically not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self-explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means [0-9A-Za-z], except that the latter form depends upon the C locale and the ASCII character encoding, whereas the former is independent of locale and character set. (Note that the brackets in these class names are part of the symbolic names, and must be included in addition to the brackets delimiting the bracket expression.) Most meta-characters lose their special meaning inside bracket expressions. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.
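The bracket-expression rules above can be checked with a sketch such as this one. GNU grep is assumed, and the five input lines are arbitrary.

```shell
# Arbitrary single-purpose lines, including the three awkward literals.
chars=$(printf '%s\n' 'a1' 'B' '-' ']' '^')

# Range expression, evaluated in the C locale so that [a-d] means [abcd].
range=$(printf '%s\n' "$chars" | LC_ALL=C grep '[a-d]')

# Named classes keep their own brackets inside the bracket expression.
digits=$(printf '%s\n' "$chars" | grep '[[:digit:]]')
upper=$(printf '%s\n' "$chars" | grep '[[:upper:]]')

# Literal ] goes first, literal ^ anywhere but first, literal - goes last.
literals=$(printf '%s\n' "$chars" | grep '[]^-]')

printf '%s\n' "$range" "$digits" "$upper" "$literals"
```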

Anchoring

The caret ^ and the dollar sign $ are metacharacters that match respectively the empty string at the beginning and end of a line.

The backslash character and special expressions

The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it is not at the edge of a word. The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]].
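A brief sketch of the line anchors and the word-boundary symbols, assuming GNU grep (the backslash symbols are GNU extensions); the sample lines are illustrative.

```shell
words=$(printf '%s\n' 'cat' 'concat' 'catalog' 'a cat here')

exact=$(printf '%s\n' "$words" | grep '^cat$')       # whole line is "cat"
starts=$(printf '%s\n' "$words" | grep '\<cat')      # "cat" begins a word
ends=$(printf '%s\n' "$words" | grep 'cat\>')        # "cat" ends a word
bounded=$(printf '%s\n' "$words" | grep '\bcat\b')   # "cat" is a whole word
inside=$(printf '%s\n' "$words" | grep '\Bcat')      # "cat" does not begin a word

printf '%s\n' "$exact" '====' "$starts" '====' "$ends" '====' "$bounded" '====' "$inside"
```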

Repetition

A regular expression may be followed by one of several repetition operators:

?

The preceding item is optional and matched at most once.

*

The preceding item will be matched zero or more times.

+

The preceding item will be matched one or more times.

{n}

The preceding item is matched exactly n times.

{n,}

The preceding item is matched n or more times.

{,m}

The preceding item is matched at most m times.

{n,m}

The preceding item is matched at least n times, but not more than m times.
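The repetition operators can be compared side by side with a sketch like the one below. It assumes GNU grep and the paste utility; the helper name match and the four test strings are made up. -x is used so that each pattern must account for the whole line.

```shell
# Hypothetical helper: print, on one line, the inputs that PATTERN matches entirely.
match() {
    printf '%s\n' 'ac' 'abc' 'abbc' 'abbbc' | grep -E -x "$1" | paste -s -d ' ' -
}

optional=$(match 'ab?c')      # b at most once
star=$(match 'ab*c')          # b zero or more times
plus=$(match 'ab+c')          # b one or more times
exactly=$(match 'ab{2}c')     # b exactly twice
at_least=$(match 'ab{2,}c')   # b two or more times
at_most=$(match 'ab{,1}c')    # b at most once
between=$(match 'ab{1,2}c')   # b once or twice

printf '%s\n' "$optional" "$star" "$plus" "$exactly" "$at_least" "$at_most" "$between"
```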

Concatenation

Two regular expressions may be concatenated; the resulting regular expression matches any string formed by concatenating two substrings that respectively match the concatenated expressions.

Alternation

Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either alternate expression.

Precedence

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. An entire expression can be enclosed in parentheses to override these precedence rules and form a subexpression.
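The effect of these precedence rules shows up when the same characters are written with and without parentheses. The sketch below assumes GNU grep and paste; the helper name match and the inputs are illustrative.

```shell
# Hypothetical helper: print, on one line, the inputs that PATTERN matches entirely.
match() {
    printf '%s\n' 'ab' 'cd' 'abd' 'acd' 'abb' 'abab' | grep -E -x "$1" | paste -s -d ' ' -
}

alternation=$(match 'ab|cd')      # concatenation binds tighter: "ab" or "cd"
grouped=$(match 'a(b|c)d')        # parentheses limit the alternation to b or c
repeat_char=$(match 'ab+')        # repetition binds tighter: a, then one or more b
repeat_group=$(match '(ab)+')     # parentheses make "ab" the repeated unit

printf '%s\n' "$alternation" "$grouped" "$repeat_char" "$repeat_group"
```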

Back references and subexpressions

The back-reference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
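A minimal sketch of back-references, assuming GNU grep (which accepts them in both basic and extended syntax); the input lines are invented.

```shell
input=$(printf '%s\n' 'hello hello' 'hello world' 'abab')

# \1 must repeat exactly what the first parenthesized subexpression matched.
doubled_word=$(printf '%s\n' "$input" | grep -E '^(.+) \1$')

# The same idea in basic syntax, where the parentheses are backslashed.
doubled_pair=$(printf '%s\n' "$input" | grep -x '\(ab\)\1')

printf '%s\n' "$doubled_word" "$doubled_pair"
```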

Basic regular expressions vs extended regular expressions

In basic regular expressions, the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.

GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression. POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.
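The difference between the two syntaxes can be seen by running one pattern both ways, as in this sketch. It assumes GNU grep, where the backslashed \+ works in basic syntax; the input lines are illustrative.

```shell
input=$(printf '%s\n' 'a+' 'aaa' 'b' '{1')

# Basic syntax: + is an ordinary character, so only the literal "a+" matches.
bre_plain=$(printf '%s\n' "$input" | grep 'a+')

# Extended syntax: + is a repetition operator.
ere_plus=$(printf '%s\n' "$input" | grep -E 'a+')

# Basic syntax with the backslashed operator behaves like the extended form.
bre_escaped=$(printf '%s\n' "$input" | grep 'a\+')

# Portable way to match a literal { under -E.
brace=$(printf '%s\n' "$input" | grep -E '[{]1')

printf '%s\n' "$bre_plain" '====' "$ere_plus" '====' "$bre_escaped" '====' "$brace"
```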

Environment variables

The behavior of grep is affected by the following environment variables.

The locale for category LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, LANG, in that order. The first of these variables that is set specifies the locale. For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then the Brazilian Portuguese locale is used for the LC_MESSAGES category. The C locale is used if none of these environment variables are set, if the locale catalog is not installed, or if grep was not compiled with national language support (NLS).

GREP_OPTIONS

This variable specifies default options to be placed in front of any explicit options. For example, if GREP_OPTIONS is '--binary-files=without-match --directories=skip', grep behaves as if the two options --binary-files=without-match and --directories=skip had been specified before any explicit options. Option specifications are separated by whitespace. A backslash escapes the next character, so it can be used to specify an option containing whitespace or a backslash.

GREP_COLOR

This variable specifies the color used to highlight matched (non-empty) text. It is deprecated in favor of GREP_COLORS, but is still supported. The mt, ms, and mc capabilities of GREP_COLORS have priority over it. It can only specify the color used to highlight the matching non-empty text in any matching line (a selected line when the -v command-line option is omitted, or a context line when -v is specified). The default is 01;31, which means bold red foreground text on the terminal's default background.

GREP_COLORS

Specifies the colors and other attributes used to highlight various parts of the output. Its value is a colon-separated list of capabilities that defaults to ms=01;31:mc=01;31:sl=:cx=:fn=35:ln=32:bn=32:se=36 with the rv and ne boolean capabilities omitted (i.e., false). Supported capabilities are as follows.

sl=

SGR substring for whole selected lines (i.e., matching lines when the -v command-line option is omitted, or non-matching lines when -v is specified). If, however, the boolean rv capability and the -v command-line option are both specified, it applies to context matching lines instead. The default is empty (i.e., the terminal's default color pair).

cx=

SGR substring for whole context lines (i.e., non-matching lines when the -v command-line option is omitted, or matching lines when -v is specified). If, however, the boolean rv capability and the -v command-line option are both specified, it applies to selected non-matching lines instead. The default is empty (i.e., the terminal's default color pair).

rv

Boolean value that reverses (swaps) the meanings of the sl= and cx= capabilities when the -v command-line option is specified. The default is false (i.e., the capability is omitted).

mt=01;31

SGR substring for matching non-empty text in any matching line (i.e., a selected line when the -v command-line option is omitted, or a context line when -v is specified). Setting this is equivalent to setting both ms= and mc= at once to the same value. The default is a bold red text foreground over the current line background.

ms=01;31

SGR substring for matching non-empty text in a selected line. (This is only used when the -v command-line option is omitted.) The effect of the sl= (or cx= if rv) capability remains active when this takes effect. The default is a bold red text foreground over the current line background.

mc=01;31

SGR substring for matching non-empty text in a context line. (This is only used when the -v command-line option is specified.) The effect of the cx= (or sl= if rv) capability remains active when this takes effect. The default is a bold red text foreground over the current line background.

fn=35

SGR substring for file names that prefix any line of content. The default is a foreground of magenta text over the default terminal background.

ln=32

SGR substring for line numbers that prefix any line of content. The default is a foreground of green text over the default background of the terminal.

bn=32

SGR substring for byte offsets that prefix any line of content. The default is a foreground of green text over the default background of the terminal.

se=36

SGR substring for separators that are inserted between selected line fields (:), between context line fields (-), and between groups of adjacent lines when nonzero context is specified (--). The default is a cyan text foreground over the terminal's default background.

ne

Boolean value that prevents clearing to the end of the line using Erase in Line (EL) to Right (\33[K) each time a colorized item ends. This is needed on terminals on which EL is not supported. It is otherwise useful on terminals for which the back_color_erase (bce) boolean terminfo capability does not apply, when the chosen highlight colors do not affect the background, or when EL is too slow or causes too much flicker. The default is false (i.e., the capability is omitted).

Note that boolean capabilities have no =... part. They are omitted (i.e., false) by default and become true when specified.

See the Select Graphic Rendition (SGR) section in the documentation of the text terminal that is used for permitted values and their meaning as character attributes. These substring values are integers in decimal representation and can be concatenated with semicolons. grep takes care of assembling the result into a complete SGR sequence (\33[...m). Common values to concatenate include 1 for bold, 4 for underline, 5 for blink, 7 for inverse, 39 for default foreground color, 30 to 37 for foreground colors, 90 to 97 for 16-color mode foreground colors, 38;5;0 to 38;5;255 for 88-color and 256-color mode foreground colors, 49 for default background color, 40 to 47 for background colors, 100 to 107 for 16-color mode background colors, and 48;5;0 to 48;5;255 for 88-color and 256-color mode background colors.
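To see how a GREP_COLORS capability turns into escape sequences, a sketch like the following can help. It assumes GNU grep and sed; the value ms=01;32 (bold green) and the input abc are arbitrary choices, and --color=always is used only because the output is captured rather than written to a terminal.

```shell
esc=$(printf '\033')   # the ESC character that starts every SGR sequence

# Highlight the match in bold green instead of the default bold red.
colored=$(printf 'abc\n' | GREP_COLORS='ms=01;32' grep --color=always 'b')

# Same, with the boolean ne capability: no Erase in Line sequences are emitted.
colored_ne=$(printf 'abc\n' | GREP_COLORS='ms=01;32:ne' grep --color=always 'b')

# Strip the SGR and EL sequences again to recover the plain line.
plain=$(printf '%s\n' "$colored" | sed "s/${esc}\[[0-9;]*[mK]//g")

printf '%s\n' "$colored" "$colored_ne" "$plain"
```

Printed to a terminal, the first two lines show abc with a green b; the third is the same text with the escape sequences removed.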

LC_ALL, LC_COLLATE, LANG

These variables specify the locale for the LC_COLLATE category, which determines the collating sequence used to interpret range expressions like [a-z].

LC_ALL, LC_CTYPE, LANG

These variables specify the locale for the LC_CTYPE category, which determines the type of characters, e.g., which characters are whitespace.

LC_ALL, LC_MESSAGES, LANG

These variables specify the locale for the LC_MESSAGES category, which determines the language that grep uses for messages. The default C locale uses American English messages.

POSIXLY_CORRECT

If set, grep behaves as POSIX.2 requires; otherwise, grep behaves more like other GNU programs. POSIX.2 requires that options that follow file names must be treated as file names; by default, such options are permuted to the front of the operand list and are treated as options. Also, POSIX.2 requires that unrecognized options be diagnosed as “illegal,” but since they are not really against the law the default is to diagnose them as “invalid.” POSIXLY_CORRECT also disables _N_GNU_nonoption_argv_flags_, described below.

_N_GNU_nonoption_argv_flags_

(Here N is grep's numeric process ID.) If the ith character of this environment variable's value is 1, do not consider the ith operand of grep to be an option, even if it appears to be one. A shell can put this variable in the environment for each command it runs, specifying which operands are the results of file name wildcard expansion and therefore should not be treated as options. This behavior is available only with the GNU C library, and only when POSIXLY_CORRECT is not set.

Exit status

Normally, the exit status is 0 if selected lines are found and 1 otherwise. But the exit status is 2 if an error occurred, unless the -q or --quiet or --silent option is used and a selected line is found. Note, however, that POSIX only mandates, for programs such as grep, cmp, and diff, that the exit status in case of error be greater than 1; it is therefore advisable, for the sake of portability, to use logic that tests for this general condition instead of strict equality with 2.
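A script might distinguish the three outcomes along these lines. This is only a sketch: the helper name status_of is made up, and /nonexistent/grep-demo-file is a hypothetical path that is assumed not to exist.

```shell
# Hypothetical helper: run a command quietly and print its exit status.
status_of() {
    "$@" >/dev/null 2>&1 && echo 0 || echo $?
}

found=$(printf 'x\n' | status_of grep 'x')                 # a line was selected
missing=$(printf 'x\n' | status_of grep 'y')               # no line was selected
failed=$(status_of grep 'x' /nonexistent/grep-demo-file)   # unreadable input

# Portable error test: "greater than 1" rather than "equal to 2".
if [ "$failed" -gt 1 ]; then
    outcome='error'
elif [ "$failed" -eq 1 ]; then
    outcome='no match'
else
    outcome='match'
fi

printf '%s\n' "$found" "$missing" "$outcome"
```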

This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Reporting bugs

Email bug reports to <bug-grep@gnu.org>, a mailing list whose web page is <http://lists.gnu.org/mailman/listinfo/bug-grep>. grep's Savannah bug tracker is located at <http://savannah.gnu.org/bugs/?group=grep>.

Known bugs

Large repetition counts in the {n,m} construct may cause grep to use lots of memory. In addition, certain other obscure regular expressions require exponential time and space, and may cause grep to run out of memory.

Back-references are very slow, and may require exponential time.

Notes

GNU's not Unix, but Unix is a beast; its plural form is Unixen.

Referenced by

bzgrep(1), flowdumper(1), fortune(6), gnome-search-tool(1), grepmail(1), ip(8), ksh93(1), look(1), makeindex(1), mirrordir(1), mksh(1), nawk(1), nget(1), pdsh(1), perlfunc(1), perlglossary(1), procmail(1), procmailex(5), procmailrc(5), procmailsc(5), quilt(1), regex(3), sudo(8), sudoers(5), tcpstat(1), trace-cmd-record(1), uwildmat(3), wildmat(3), xzgrep(1)
