Linux Egrep Command Help and Examples

egrep command

In Unix-like operating systems, the egrep command looks for a text pattern, using extended regular expressions to match. Running egrep is equivalent to running grep with the -E option.

This page covers the GNU/Linux version of egrep.

egrep syntax

egrep [options] PATTERN [FILE…]

Options

-A NUM, --after-context=NUM
Print NUM lines of trailing context after matching lines. Places a line containing -- between contiguous groups of matches.

-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.

-B NUM, --before-context=NUM
Print NUM lines of leading context before matching lines. Places a line containing -- between contiguous groups of matches.

-C NUM, --context=NUM
Print NUM lines of output context. Places a line containing -- between contiguous groups of matches.

-b, --byte-offset
Print the byte offset within the input file before each line of output.

--binary-files=TYPE
If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option. Warning: grep --binary-files=text might output binary garbage, which can have unpleasant side effects if the output is a terminal and the terminal driver interprets some of it as commands.

--colour[=WHEN], --color[=WHEN]
Surround the matching string with the marker found in the GREP_COLOR environment variable. WHEN can be 'never', 'always', or 'auto'.

-c, --count
Suppress normal output; instead, print a count of matching lines for each input file. With the -v, --invert-match option (see below), count non-matching lines.

-D ACTION, --devices=ACTION
If an input file is a device, FIFO, or socket, use ACTION to process it. By default, ACTION is read, which means that devices are read as if they were ordinary files. If ACTION is skip, devices are silently skipped.

-d ACTION, --directories=ACTION
If an input file is a directory, use ACTION to process it. By default, ACTION is read, which means that directories are read as if they were ordinary files. If ACTION is skip, directories are silently skipped. If ACTION is recurse, grep reads all files under each directory, recursively; this is equivalent to the -R option.

-e PATTERN, --regexp=PATTERN
Use PATTERN as the pattern; useful for protecting patterns that begin with "-".

-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.

-P, --perl-regexp
Interpret PATTERN as a Perl regular expression.

-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns and therefore matches nothing.

-G, --basic-regexp
Interpret PATTERN as a basic regular expression (see below). This is the default for grep; egrep uses extended regular expressions instead.

-H, --with-filename
Print the file name for each match.

-h, --no-filename
Suppress the prefixing of file names on output when multiple files are searched.

--help
Output a brief help message.

-I
Process a binary file as if it did not contain matching data; this is equivalent to the --binary-files=without-match option.

-i, --ignore-case
Ignore case distinctions in both the PATTERN and the input files.

-L, --files-without-match
Suppress normal output; instead, print the name of each input file from which no output would normally have been printed. The scanning stops on the first match.

-l, --files-with-matches
Suppress normal output; instead, print the name of each input file from which output would normally have been printed. The scanning stops on the first match.

-m NUM, --max-count=NUM
Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.

--mmap
If possible, use the mmap system call to read input, instead of the default read system call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is running, or if an I/O error occurs.

-n, --line-number
Prefix each line of output with the line number within its input file.

-o, --only-matching
Show only the part of a matching line that matches PATTERN.

--label=LABEL
Display input actually coming from standard input as input coming from file LABEL. This is especially useful for tools like zgrep, for example:
gzip -cd foo.gz | grep -H --label=foo something

--line-buffered
Use line buffering. This may incur a performance penalty.

-q, --quiet, --silent
Be quiet; do not write anything to standard output. Exit immediately with zero status if any match is found, even if an error was detected. See also the -s or --no-messages option.

-R, -r, --recursive
Read all files under each directory, recursively; this is equivalent to the -d recurse option. It is modified by the following two options.

--include=PATTERN
Recurse in directories, searching only files that match PATTERN.

--exclude=PATTERN
Recurse in directories, skipping files that match PATTERN.

-s, --no-messages
Suppress error messages about nonexistent or unreadable files. Portability note: unlike GNU grep, traditional grep did not conform to POSIX.2, because traditional grep lacked a -q option and its -s option behaved like the -q option of GNU grep. Shell scripts intended to be portable to traditional grep should avoid both -q and -s and should redirect output to /dev/null instead.

-U, --binary
Treat the files as binary. By default, under MS-DOS and MS-Windows, grep guesses the file type by looking at the contents of the first 32 KB read from the file. If grep decides that the file is a text file, it strips the CR characters from the original file contents (so that regular expressions with ^ and $ work correctly). Specifying -U overrules this guesswork, causing all files to be read and passed to the matching mechanism verbatim; if the file is a text file with CR/LF pairs at the end of each line, this causes some regular expressions to fail. This option has no effect on platforms other than MS-DOS and MS-Windows.

-u, --unix-byte-offsets
Report Unix-style byte offsets. This switch causes grep to report byte offsets as if the file were a Unix-style text file, that is, with CR characters stripped off. This produces results identical to running grep on a Unix machine. This option has no effect unless the -b option is also used; it has no effect on platforms other than MS-DOS and MS-Windows.

-V, --version
Print the version number of grep to standard error. This version number should be included in all bug reports (see below).

-v, --invert-match
Invert the sense of matching, to select non-matching lines.

-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line or be preceded by a non-word constituent character. Similarly, it must either be at the end of the line or be followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.

-x, --line-regexp
Select only those matches that exactly match the whole line.

-y
Obsolete synonym for -i.

-Z, --null
Output a zero byte (the ASCII NUL character) instead of the character that normally follows a file name. For example, grep -lZ outputs a zero byte after each file name instead of the usual newline. This option makes the output unambiguous, even in the presence of file names that contain unusual characters such as newlines. This option can be used with commands such as find -print0, perl -0, sort -z, and xargs -0 to process arbitrary file names, even those that contain newline characters.
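A few of these options are easier to compare side by side. The following is a minimal sketch, assuming GNU grep; it calls grep -E, which this page describes as equivalent to egrep, and the file name log.txt and its contents are made up for illustration.

```shell
# Hypothetical sample data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' 'alpha' 'beta error' 'gamma errors' 'delta Error again' 'epsilon' > "$tmp/log.txt"

# -c counts matching lines; -i ignores case (lines 2, 3 and 4 match).
count=$(grep -E -c -i 'error' "$tmp/log.txt")

# Adding -v counts the lines that do NOT match instead.
inverse=$(grep -E -c -i -v 'error' "$tmp/log.txt")

# -n prefixes each output line with its line number.
numbered=$(grep -E -n 'error' "$tmp/log.txt")

# -o prints only the matched text, one match per output line.
only=$(grep -E -o -i 'error' "$tmp/log.txt")

# -A 1 adds one line of trailing context after each match.
context=$(grep -E -A 1 'beta' "$tmp/log.txt")

# -w accepts only whole-word matches, so "errors" does not count.
words=$(grep -E -c -w 'error' "$tmp/log.txt")

# -q prints nothing; the exit status alone reports whether a match exists.
if grep -E -q 'gamma' "$tmp/log.txt"; then found=yes; else found=no; fi

printf '%s\n' "count=$count inverse=$inverse words=$words found=$found"
printf '%s\n' "$numbered" "$only" "$context"
rm -rf "$tmp"
```

The results are captured in variables only so they can be printed together at the end; in interactive use each grep command would normally be run on its own.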

Regular expressions

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, using several operators to combine smaller expressions.

grep understands two different versions of regular expression syntax: "basic" and "extended." In GNU grep, there is no difference in the functionality available using either syntax. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions, which are used in egrep; the differences for basic regular expressions are summarized afterward.

The fundamental building blocks are regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning can be quoted by preceding it with a backslash.

A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list; if the first character in the list is the caret ^ then it matches any character that is not in the list. For example, the regular expression [0123456789] matches any single digit.
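As a small illustration of bracket lists and their negated form, here is a sketch that uses grep -E (described on this page as equivalent to egrep) on made-up input lines.

```shell
# Count the lines that contain at least one digit.
digits=$(printf '%s\n' 'room 7' 'no number' 'x9y' | grep -E -c '[0123456789]')

# A leading ^ negates the list: print lines that contain a non-digit character.
nondigit=$(printf '%s\n' '12345' '12a45' | grep -E '[^0123456789]')

printf '%s\n' "lines with a digit: $digits" "line with a non-digit: $nondigit"
```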

Within a bracket expression, a range expression consists of two characters separated by a hyphen ("-"). It matches any single character that sorts between the two characters, inclusive, using the locale's collating sequence and character set. For example, in the default C locale, [a-d] is equivalent to [abcd]. Many locales sort characters in dictionary order, and in these locales [a-d] is often not equivalent to [abcd]; it might be equivalent to [aBbCcDd], for example. To obtain the traditional interpretation of bracket expressions, you can use the C locale by setting the LC_ALL environment variable to the value C.
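One way to get the traditional range behavior for a single command is to set LC_ALL only for that invocation. This is a sketch with made-up input, using grep -E in place of egrep; what the same range matches in other locales depends on the system and is not shown.

```shell
# With LC_ALL=C, [a-d] means exactly a, b, c, d, so uppercase letters are excluded.
matches=$(printf '%s\n' 'a' 'B' 'c' 'D' 'e' | LC_ALL=C grep -E '^[a-d]$')

printf '%s\n' "$matches"
```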

Finally, certain named classes of characters are predefined within bracket expressions, as follows. Their names are self-explanatory, and they are [:alnum:], [:alpha:], [:cntrl:], [:digit:], [:graph:], [:lower:], [:print:], [:punct:], [:space:], [:upper:], and [:xdigit:]. For example, [[:alnum:]] means [0-9A-Za-z], except that the latter form depends on the C locale and the ASCII character encoding, whereas the former is independent of locale and character set. (Note that the brackets in these class names are part of the symbolic names and must be included in addition to the brackets that delimit the bracket list.) Most metacharacters lose their special meaning inside lists. To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.
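The placement rules for ], ^ and - are easy to get wrong, so a short sketch may help. It uses grep -E in place of egrep, and the input lines are invented for the example.

```shell
# A class name keeps its own brackets inside the list brackets.
alldigits=$(printf '%s\n' '2024' 'v2' | grep -E '^[[:digit:]]+$')

# Literal ] goes first, literal ^ anywhere but first, literal - last.
specials=$(printf '%s\n' 'a]b' 'a-b' 'a^b' 'ab' | grep -E -c '[]^-]')

printf '%s\n' "digits only: $alldigits" "lines with ], ^ or -: $specials"
```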

The period . matches any single character. The symbol \w matches any word-constituent character (a letter, a digit, or the underscore), so in GNU grep it is a synonym for [_[:alnum:]], and \W is a synonym for [^_[:alnum:]].
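The difference between the period as a metacharacter and a quoted period can be seen in a short sketch. It assumes GNU grep (the \w symbol is a GNU extension), uses grep -E in place of egrep, and runs on made-up input.

```shell
# An unquoted . matches any single character.
anychar=$(printf '%s\n' 'a.c' 'abc' 'ac' | grep -E -c '^a.c$')

# A backslash quotes the metacharacter, so only a literal period matches.
literal=$(printf '%s\n' 'a.c' 'abc' 'ac' | grep -E '^a\.c$')

# \w+ spans letters and digits but stops at the hyphen.
wordonly=$(printf '%s\n' 'abc1' 'a-b' | grep -E '^\w+$')

printf '%s\n' "any char: $anychar" "literal period: $literal" "word characters only: $wordonly"
```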

The caret ^ and the dollar sign $ are metacharacters that respectively match the empty string at the beginning and end of a line. The symbols \< and \> respectively match the empty string at the beginning and end of a word. The symbol \b matches the empty string at the edge of a word, and \B matches the empty string provided it is not at the edge of a word.
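The anchors can be compared on one small input. This is a sketch assuming GNU grep, since the word anchors are GNU extensions; it uses grep -E in place of egrep, and the word list is made up.

```shell
input=$(printf '%s\n' 'cat' 'concatenate' 'the cat sat' 'cats')

# ^ anchors to the start of the line: matches "cat" and "cats".
start=$(printf '%s\n' "$input" | grep -E -c '^cat')

# ^ and $ together require the whole line to be "cat".
whole=$(printf '%s\n' "$input" | grep -E -c '^cat$')

# \< and \> require word edges: matches "cat" and "the cat sat".
word=$(printf '%s\n' "$input" | grep -E -c '\<cat\>')

# \B requires a non-edge on both sides: only the "cat" inside "concatenate".
inner=$(printf '%s\n' "$input" | grep -E '\Bcat\B')

printf '%s\n' "start=$start whole=$whole word=$word inner=$inner"
```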

A regular expression can be followed by one of several repetition operators:

? The preceding item is optional and is matched at most once.

* The preceding item is matched zero or more times.

+ The preceding item is matched one or more times.

{n} The preceding item is matched exactly n times.

{n,} The preceding item is matched n or more times.

{n,m} The preceding item is matched at least n times, but not more than m times.
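Each repetition operator can be tried against the same small word list. The following sketch uses grep -E in place of egrep; the words are invented, and the patterns are anchored with ^ and $ so that the counts reflect whole lines.

```shell
words=$(printf '%s\n' 'color' 'colour' 'colouur' 'ac' 'abc' 'abbc' 'abbbbc')

opt=$(printf '%s\n' "$words" | grep -E -c '^colou?r$')      # u is optional
star=$(printf '%s\n' "$words" | grep -E -c '^ab*c$')        # zero or more b
plus=$(printf '%s\n' "$words" | grep -E -c '^ab+c$')        # one or more b
exact=$(printf '%s\n' "$words" | grep -E '^ab{2}c$')        # exactly two b
atleast=$(printf '%s\n' "$words" | grep -E '^ab{3,}c$')     # three or more b
between=$(printf '%s\n' "$words" | grep -E -c '^ab{1,2}c$') # one or two b

printf '%s\n' "?=$opt *=$star +=$plus {2}=$exact {3,}=$atleast {1,2}=$between"
```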

Two regular expressions can be concatenated; the resulting regular expression matches any string formed by the concatenation of two substrings that respectively match the concatenated subexpressions.

The infix | operator can join two regular expressions; the resulting regular expression matches any string that matches any of the subexpressions.

Repetition takes precedence over concatenation, which in turn takes precedence over alternation. A whole subexpression can be enclosed in parentheses to override these precedence rules.
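The effect of these precedence rules shows up when the same pattern is written with and without parentheses. This is a sketch on made-up input, using grep -E in place of egrep.

```shell
# Alternation binds loosest: gra|ey means "gra" OR "ey", so all four lines match.
loose=$(printf '%s\n' 'gray' 'grey' 'gra' 'key' | grep -E -c 'gra|ey')

# Parentheses limit the alternation to one character: only gray and grey match.
grouped=$(printf '%s\n' 'gray' 'grey' 'gra' 'key' | grep -E -c 'gr(a|e)y')

# Repetition binds tightest: ab+ repeats only the b ...
single=$(printf '%s\n' 'abb' 'abab' | grep -E -x 'ab+')

# ... while (ab)+ repeats the whole group.
group=$(printf '%s\n' 'abb' 'abab' | grep -E -x '(ab)+')

printf '%s\n' "loose=$loose grouped=$grouped single=$single group=$group"
```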

The backreference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
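A backreference repeats the text that the group actually matched, not the group's pattern. The sketch below assumes GNU grep, which accepts backreferences with -E, and uses invented input lines.

```shell
# \1 must repeat whatever (ab|cd) matched, so "abcd" is rejected.
repeated=$(printf '%s\n' 'abab' 'abcd' 'cdcd' | grep -E -x '(ab|cd)\1')

printf '%s\n' "$repeated"
```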

In basic regular expressions, the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead, use the backslashed versions \?, \+, \{, \|, \(, and \).
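The same alternation can be written in both syntaxes. This is a sketch assuming GNU grep, where plain grep uses basic regular expressions and grep -E uses extended ones; the input is made up.

```shell
# Basic syntax: alternation needs the backslashed form \|.
basic=$(printf '%s\n' 'apple' 'banana' 'cherry' | grep -c 'apple\|cherry')

# Extended syntax: a bare | is the alternation operator.
extended=$(printf '%s\n' 'apple' 'banana' 'cherry' | grep -E -c 'apple|cherry')

# In basic syntax a bare | is an ordinary character, so only the line "a|b" matches.
literal=$(printf '%s\n' 'a|b' 'a' | grep 'a|b')

printf '%s\n' "basic=$basic extended=$extended literal=$literal"
```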

Traditional egrep did not support the { metacharacter, and some egrep implementations support \{ instead, so portable scripts should avoid { in egrep patterns and should use [{] to match a literal {.

GNU egrep attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification. For example, the shell command egrep '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression. POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.
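The portable way to search for a literal brace is to put it in a bracket list. A minimal sketch, using grep -E in place of egrep on invented input:

```shell
# [{] matches a literal { without relying on how a bare { is interpreted.
brace=$(printf '%s\n' 'x{1}' 'x1' | grep -E '[{]1')

printf '%s\n' "$brace"
```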

Environment variables

The behavior of grep is affected by the following environment variables.

A locale LC_foo is specified by examining the three environment variables LC_ALL, LC_foo, and LANG, in that order. The first of these variables that is set specifies the locale. For example, if LC_ALL is not set, but LC_MESSAGES is set to pt_BR, then Brazilian Portuguese is used for the LC_MESSAGES locale. The C locale is used if none of these environment variables are set, if the locale catalog is not installed, or if grep was not compiled with national language support (NLS).

GREP_OPTIONS

This variable specifies default options to be placed in front of any explicit options. For example, if GREP_OPTIONS is '--binary-files=without-match --directories=skip', grep behaves as if the two options --binary-files=without-match and --directories=skip had been specified before any explicit options. Option specifications are separated by whitespace. A backslash escapes the next character, so it can be used to specify an option that contains whitespace or a backslash.

GREP_COLOR

Specifies the marker for highlighting.

LC_ALL, LC_COLLATE, LANG

These variables specify the LC_COLLATE locale, which determines the collating sequence used to interpret range expressions such as [a-z].

LC_ALL, LC_CTYPE, LANG

These variables specify the LC_CTYPE locale, which determines the type of characters, such as which characters are whitespace.

LC_ALL, LC_MESSAGES, LANG

These variables specify the LC_MESSAGES locale, which determines the language that grep uses for messages. The default C locale uses American English messages.

POSIXLY_CORRECT

If set, grep behaves as POSIX.2 requires; otherwise, grep behaves more like other GNU programs. POSIX.2 requires that options that follow file names be treated as file names; by default, such options are permuted to the front of the operand list and are treated as options. In addition, POSIX.2 requires that unrecognized options be diagnosed as "illegal," but since they are not actually against the law, the default is to diagnose them as "invalid." POSIXLY_CORRECT also disables _N_GNU_nonoption_argv_flags_, described below.
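The option-ordering difference can be observed by placing an option after the file name. This is only a sketch, assuming GNU grep with its usual option parsing; the file name f.txt is made up, and the exact error text printed in the second case may vary, so it is discarded here.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'one' 'two' 'three' > "$tmp/f.txt"

# Default: the trailing -c is moved to the front and treated as --count.
permuted=$(grep -E 'o' "$tmp/f.txt" -c)

# With POSIXLY_CORRECT set, -c after the file name is taken as another file name,
# so grep prints the matching lines instead of a count and complains about "-c".
strict=$(POSIXLY_CORRECT=1 grep -E 'o' "$tmp/f.txt" -c 2>/dev/null || true)

printf '%s\n' "default: $permuted" "strict:" "$strict"
rm -rf "$tmp"
```

Scripts that need the same behavior in both modes can simply place all options before the pattern.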

_N_GNU_nonoption_argv_flags_

(Here N is the numeric process ID of grep.) If the ith character of this environment variable's value is 1, do not consider the ith operand of grep to be an option, even if it appears to be one. A shell can put this variable in the environment for each command it runs, specifying which operands are the results of file name wildcard expansion and therefore should not be treated as options. This behavior is available only with the GNU C library, and only when POSIXLY_CORRECT is not set.

Examples

egrep "support|help|windows" myfile.txt

Search for the patterns support, help, and windows in the file myfile.txt.

egrep '^[a-zA-Z]+$' myfile.txt

Match lines in myfile.txt that consist of a single alphabetic word, which both begins and ends the line.

egrep -c '^begin|end$' myfile.txt

Count the number of lines in myfile.txt that begin with the word "begin" or end with the word "end".
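To try these three examples without an existing file, the following sketch creates a throwaway myfile.txt whose contents are invented for the purpose. It uses grep -E, which this page describes as equivalent to egrep.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'begin here' 'windows support' 'Hello' 'need help now' 'the end' 'plain text 42' > "$tmp/myfile.txt"

# Lines containing support, help, or windows.
any=$(grep -E -c 'support|help|windows' "$tmp/myfile.txt")

# Lines made of a single alphabetic word.
alpha=$(grep -E '^[a-zA-Z]+$' "$tmp/myfile.txt")

# Lines that start with "begin" or finish with "end".
edges=$(grep -E -c '^begin|end$' "$tmp/myfile.txt")

printf '%s\n' "any=$any alpha=$alpha edges=$edges"
rm -rf "$tmp"
```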

Related commands

fgrep: Filter text that matches a fixed character string.

grep: Filter text that matches a regular expression.

sed: A utility for filtering and transforming text.

sh: The Bourne shell command interpreter.
