Transcript Lecture 10

Lecture 10
Introduction to AWK
COP 3353 Introduction to UNIX
What is AWK
• Important early text manipulation language
– Created by Al Aho, Peter Weinberger & Brian
Kernighan
• This Unix utility manipulates text files that are
viewed as arranged in columns
• awk splits each line of input (from standard input
or a set of files) into fields and processes the lines
one at a time. Fields are separated by whitespace by
default, but any specified character can serve as
the field separator
• There are also other flavors of awk such as nawk
and gawk
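A minimal sketch of the field splitting described above. The sample input is made up for illustration; the first command relies on the default whitespace separator and the second names a colon as the separator.

```shell
# Hypothetical input: columns separated by runs of spaces or a tab.
# By default any run of whitespace counts as one separator.
default_split=$(printf 'alice   42\tstaff\nbob 7 guest\n' | awk '{ print $2 }')
echo "$default_split"

# The same columns separated by colons, split with -F:
colon_split=$(printf 'alice:42:staff\nbob:7:guest\n' | awk -F: '{ print $3 }')
echo "$colon_split"
```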
Awk Command Structure
awk [options] 'program' [file(s)]
awk [options] -f programfile [file(s)]
• A program can be one or more pairs of the
following:
pattern { procedure }
• BEGIN and END constructs can also be used
• An important option is -Fc where c is the field
separator to use. For example awk -F: . . .
indicates that the separator is ":"
• Example
awk -F: '/this/ { print $2 }' file1
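The two command forms can be compared side by side. This sketch assumes a temporary program file created with mktemp and a made-up colon-separated input; both forms should print the same fields.

```shell
# Write a one-rule awk program to a temporary file (the name is arbitrary).
prog=$(mktemp)
cat > "$prog" <<'EOF'
/this/ { print $2 }
EOF

# Hypothetical input: only lines containing "this" are selected.
from_file=$(printf 'this:one:x\nthat:two:y\nnot this:three:z\n' | awk -F: -f "$prog")
inline=$(printf 'this:one:x\nthat:two:y\nnot this:three:z\n' | awk -F: '/this/ { print $2 }')
rm -f "$prog"

echo "$from_file"
```

Keeping the program in a file tends to be more convenient once it grows beyond one or two rules.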
Awk Program Processing
• awk scans each input line for pattern, and when a match
occurs the associated actions defined by procedure are
executed. The general form of a program is:
BEGIN { initial statements }
pattern { procedure }
pattern { procedure }
END { final statements }
– If the pattern is missing, the procedure is applied to each line
– If procedure is missing, then the matched lines are written to
standard output
• Fields are referred to by the variables $1, $2, …, $n. $0
refers to the entire record (the line).
• Statements following BEGIN are done before any
pattern-procedures; statements after END are done after all
pattern-procedures.
• In most programs there is only one pattern {procedure}
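The general form above can be sketched on a small, made-up score file. The names and the cutoff of 60 are illustrative assumptions; the second command shows a pattern with no procedure.

```shell
# Hypothetical input: a name and a score per line.
report=$(printf 'ann 90\nbob 55\ncat 72\n' | awk '
BEGIN    { print "passing scores:" }      # runs once, before any input is read
$2 >= 60 { print $1; n++ }                # runs for each line with a score of at least 60
END      { print n, "of", NR, "passed" }  # runs once, after the last line
')
echo "$report"

# A pattern with no procedure writes the matching lines unchanged.
no_action=$(printf 'ann 90\nbob 55\ncat 72\n' | awk '$2 < 60')
echo "$no_action"
```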
awk patterns
• awk patterns can take any of the following forms
/regular expression/
relational expression
field-matching expression
• Example patterns
/this/
/^alpha*/
NF > 2
$1 == $2
$1 ~ /m$/
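Each example pattern can be tried against the same input to see which lines it selects. The four input lines below are made up for this sketch. Note that in /^alpha*/ the * applies only to the final a, so a line starting with alph also matches.

```shell
# Hypothetical four-line input shared by all the patterns.
input=$(printf 'alpha beta gamma\nalph alph\nprism this\ndelta\n')

regex_this=$(printf '%s\n' "$input" | awk '/this/')       # line contains "this"
regex_alpha=$(printf '%s\n' "$input" | awk '/^alpha*/')   # starts with "alph" plus zero or more "a"
many_fields=$(printf '%s\n' "$input" | awk 'NF > 2')      # more than two fields
same_fields=$(printf '%s\n' "$input" | awk '$1 == $2')    # first two fields are equal
ends_in_m=$(printf '%s\n' "$input" | awk '$1 ~ /m$/')     # first field ends in "m"

echo "$regex_alpha"
```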
Example pattern-procedures
• Print the second field of each line
{ print $2 }
• Print the first field of all lines that contain the
pattern alpha
/alpha/ { print $1 }
• Print all records containing more than two fields
NF > 2
• Add numbers in second column if first field
matches the word “add”
$1 ~ /^add$/ { total += $2 }
END { print "total is", total }
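The last example can be run as a complete command. The input here is invented for illustration; the anchors ^ and $ mean that a first field such as adder is not counted.

```shell
# Hypothetical input: only lines whose first field is exactly "add" are summed.
total_line=$(printf 'add 5\nskip 9\nadd 7\nadder 100\n' | awk '
$1 ~ /^add$/ { total += $2 }
END { print "total is", total }
')
echo "$total_line"
```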
awk Regular Expressions
• Regular expressions are formed in the same way
as they are for extended grep. All the operators
are available
• Note that regular expressions must be placed between
slashes: /<regular expression>/
• Examples
/D[Rr]\./ #matches any line containing DR. or Dr.
/^alpha/ #matches any line starting with alpha
/^[a-zA-Z]+/ #matches any line starting with a sequence of
#letters (one or more)
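A short sketch that runs the three expressions above against a made-up input. The escaped \. matches a literal period, so Drive does not count as a match for the first expression.

```shell
# Hypothetical five-line input.
lines=$(printf 'Dr. Who\nDR. No\nDrive 5\nalphabet soup\n9 lives\n')

doctors=$(printf '%s\n' "$lines" | awk '/D[Rr]\./')      # Dr. or DR. followed by a literal period
alphas=$(printf '%s\n' "$lines" | awk '/^alpha/')        # line starts with "alpha"
letters=$(printf '%s\n' "$lines" | awk '/^[a-zA-Z]+/')   # line starts with one or more letters

echo "$doctors"
```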
awk Relational Expressions
• Relational expressions can consist of strings, numbers,
arithmetic / string operators, relational operators, defined
variables, and predefined variables.
– $1, …, $n are the fields of the record
– $0 is the entire line
– NF is the number of fields in the current line
– NR is the number of the current line
– FS is the field separator
– FILENAME is the current filename
• Many relational operators are available (<, <=, ==, !=,
>=, >), and expressions can be combined with the logical
operators && and ||
NF > 5 && $1 == $2
/while/ || /do/
• Note: variables can be assigned with the "=" operator
FS = ","
total = 5
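A sketch of the predefined variables in use, with made-up comma-separated input. FS is assigned in a BEGIN block here because a separator set inside an ordinary procedure would only take effect from the next line onward.

```shell
# Hypothetical input: comma-separated fields, two lines.
vars=$(printf 'a,b,c\nd,e\n' | awk 'BEGIN { FS = "," } { print NR, NF, $1 }')
echo "$vars"

# A relational expression combining two predefined variables.
second_short=$(printf 'a,b,c\nd,e\n' | awk -F, 'NR > 1 && NF == 2')
echo "$second_short"
```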
awk field matching expressions
• Field matching expressions can check if a regular
expression matches “~” or does not match “!~” a
field.
• Examples
$1 ~ /D[Rr]\./ #first field matches DR. or Dr. ?
$1 !~ /From/ #first field does not match From ?
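The difference between matching a field and matching the whole line can be sketched as follows. The input is invented; every line contains From somewhere, but only one has it as the first field.

```shell
# Hypothetical three-line input.
mail=$(printf 'Dr. Smith From\nFrom Dr. Jones\nMr. From\n')

first_is_dr=$(printf '%s\n' "$mail" | awk '$1 ~ /D[Rr]\./')   # first field matches
first_not_from=$(printf '%s\n' "$mail" | awk '$1 !~ /From/')  # first field does not match
anywhere=$(printf '%s\n' "$mail" | awk '/From/')              # whole line is searched

echo "$first_not_from"
```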
awk procedures
• An awk procedure specifies the processing of a
line that matches a given pattern. An awk
procedure is contained within the “{“ and “}” and
consists of statements separated by semicolons or
newlines.
• awk is a full programming language, and contains
control statements (such as: do while, for, if,
break, continue, etc.)
• Note that BEGIN can be used to initialize
variables and END can be used to do post
processing after all records have been processed
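A minimal sketch of a procedure that uses control statements: a for loop over the fields of each line and an if/else on the result. The input and the threshold of 10 are assumptions made for illustration.

```shell
# Hypothetical input: a few numbers per line.
row_sums=$(printf '1 2 3\n10 20\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++)   # visit every field of the current line
        sum += $i
    if (sum > 10)
        print NR ": " sum " (large)"
    else
        print NR ": " sum
}')
echo "$row_sums"
```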
awk examples
• #print the first two fields, in reverse order, of each line
that contains the string this
awk '/this/ { print $2, $1 }' file1
• #sum the values of the fields in the second column and
print out the final sum, for lines that contain add
awk 'BEGIN { sum=0 } /add/ { sum += $2 } \
END{ print sum }' file2
• # illustrating if statements and the or operator
awk '/green/ || /yellow/ \
{if ($1=="green") print $1 ; \
else if ($1=="yellow") print "SLOW DOWN";}' \
file3