Revision Lecture
Mauro Jaskelioff
AWK Program Structure
• AWK programs consist of patterns and procedures
Pattern_1 { Procedure_1 }
Pattern_2 { Procedure_2 }
Pattern_3 { Procedure_3 }
…
Pattern_n { Procedure_n }
• Additionally, a program can contain function
definitions (but we don’t need to worry about them
now)
Example program
BEGIN {
    FS = ":"
    print "Example v0.1"
}
$7 ~ /bash/ {
    print $1 " uses bash"
}
$4 == 0 {
    print "user " $1 " belongs to the root group"
}
{
    print "--------------------------------"
}
• Don’t mind details! Try to recognize the general
structure described on the previous slide.
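To see this structure run, the sketch below feeds a four-rule program of the same shape three made-up passwd-style lines. The user names and field values are hypothetical, and the pairing of each pattern with its procedure is an assumption based on the slide.

```shell
# Hypothetical passwd-style input: name:password:UID:GID:comment:home:shell
input='root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'

out=$(printf '%s\n' "$input" | awk '
BEGIN       { FS = ":"; print "Example v0.1" }
$7 ~ /bash/ { print $1 " uses bash" }
$4 == 0     { print "user " $1 " belongs to the root group" }
            { print "--------------------------------" }
')
printf '%s\n' "$out"
```

The banner is printed once, the separator line once per record, and the two middle rules only for the records that match their patterns.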
AWK Input
• AWK input consists of records and fields
• Records are separated by a record
separator RS
• By default the RS is a newline, so each
record is a line of input
• Each record consists of zero or more
fields, separated by a field separator FS
• By default the FS is blank space.
• The current record is $0. Each of its fields
is $1, $2, …
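A small sketch of how RS decides what counts as a record. The input strings are made up, and setting RS to ";" is just one possible choice for illustration.

```shell
# Default RS (newline): each line is one record.
out1=$(printf 'a b c\nd e\n' | awk '{ print NF ": " $1 }')

# Assumed variation: records separated by ";" instead of newlines.
out2=$(printf 'a b c;d e' | awk 'BEGIN { RS = ";" } { print NF ": " $1 }')

printf '%s\n' "$out1" "$out2"
```

Both commands see the same two records, so both report the same field counts and first fields.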
Example of inputs
Consider the following input file:
Red,255 0 0
Green,0 255 0
Blue,0 0 255
• Default RS and default FS
if $0="Red,255 0 0" then $1="Red,255", $2="0" and $3="0"
• With FS=","
if $0="Red,255 0 0" then $1="Red" and $2="255 0 0"
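The two cases above can be checked directly. This is a minimal sketch; the "|" between fields is only there to make the field boundaries visible.

```shell
line='Red,255 0 0'

# Default FS: fields are split on blank space.
default_fs=$(printf '%s\n' "$line" | awk '{ print $1 "|" $2 "|" $3 }')

# FS set to "," with the -F option: fields are split on commas.
comma_fs=$(printf '%s\n' "$line" | awk -F, '{ print $1 "|" $2 }')

printf '%s\n%s\n' "$default_fs" "$comma_fs"
```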
AWK’s Main loop (simplified)
for each input record r do
    parse r
    for each pattern pat_i do
        if r matches pat_i then
            execute proc_i
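One consequence of this loop is that a record is tested against every pattern, so a record that matches two patterns triggers both procedures. A minimal sketch with made-up numeric input:

```shell
out=$(printf '5\n15\n25\n' | awk '
$1 > 10 { print $1 " is greater than 10" }
$1 > 20 { print $1 " is greater than 20" }
')
printf '%s\n' "$out"
```

The record 5 matches neither pattern, 15 matches only the first, and 25 matches both.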
Patterns
A pattern can be:
• Relational expression
– Use relational operators, e.g. $1 > $2
awk -F: '$1 > $2 {print $0}' /etc/passwd
– Can do numeric or string comparisons
awk -F: '$1=="gdm" {print $0}' /etc/passwd
• An empty pattern
awk -F: '{print $0}' /etc/passwd
– Always true
– Equivalent to a true expression. For example, the command above is the same as:
awk -F: '1 < 2 {print $0}' /etc/passwd
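The same kinds of pattern can be tried without touching the real /etc/passwd. The sketch below uses three invented passwd-style lines; the names and numbers are assumptions for illustration only.

```shell
input='root:x:0:0:root:/root:/bin/bash
gdm:x:120:125:Gnome Display Manager:/var/lib/gdm:/bin/false
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Numeric comparison: users whose UID (field 3) is at least 1000.
numeric=$(printf '%s\n' "$input" | awk -F: '$3 >= 1000 { print $1 }')

# String comparison: the record whose first field is exactly "gdm".
string=$(printf '%s\n' "$input" | awk -F: '$1 == "gdm" { print $7 }')

# Empty pattern: the procedure runs for every record.
all=$(printf '%s\n' "$input" | awk -F: '{ print $1 }')

printf '%s\n' "$numeric" "$string" "$all"
```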
Patterns (2)
• Pattern-matching expression
– E.g. quoted strings, numbers, operators,
defined variables…
– ~ means matches, !~ means does not match
awk -F: '$1 ~ /.dm.*/ {print $0}' /etc/passwd
awk -F: '$0 ~ /^...:/ {print $0}' /etc/passwd
awk -F: '$1 !~ /^g/ {print $0}' /etc/passwd
• /regular expression/
– Equivalent to $0 ~ /regular expression/
awk -F: '/^...:/ {print $1}' /etc/passwd
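A sketch of the three matching forms applied to invented passwd-style lines (the user names are hypothetical). It shows which first fields each pattern selects.

```shell
input='root:x:0:0:root:/root:/bin/bash
gdm:x:120:125:Gnome Display Manager:/var/lib/gdm:/bin/false
bin:x:2:2:bin:/bin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# First field contains "dm" preceded by at least one character.
dm=$(printf '%s\n' "$input" | awk -F: '$1 ~ /.dm.*/ { print $1 }')

# Bare /regex/ is matched against $0: lines whose first three
# characters are followed immediately by a colon.
three=$(printf '%s\n' "$input" | awk -F: '/^...:/ { print $1 }')

# Negated match: first field does not start with "g".
not_g=$(printf '%s\n' "$input" | awk -F: '$1 !~ /^g/ { print $1 }')

printf '%s\n' "$dm" "$three" "$not_g"
```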
Special patterns
• Two special patterns:
– BEGIN
• Specifies procedures that take place before the first
input line is processed
awk 'BEGIN {print "Version 1.0"}' dataFile
– END
• Specifies procedures that take place after the last
input record is read
awk 'END {print "end of data"}' dataFile
• This means we need to refine description
of the main loop (see next slide)
AWK’s refined Main loop
for each BEGIN pattern do
    execute corresponding procedure
for each input record r do
    parse r
    for each pattern pat_i do
        if r matches pat_i then
            execute proc_i
for each END pattern do
    execute corresponding procedure
(The middle loop, over the input records, is the previous version of the main loop.)
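The three phases of the refined loop can be seen in a short sketch. The variable name total and the input numbers are made up for illustration.

```shell
out=$(printf '3\n4\n5\n' | awk '
BEGIN { print "start" }            # runs once, before any input is read
      { total += $1 }              # runs once per input record
END   { print "total = " total }   # runs once, after the last record
')
printf '%s\n' "$out"
```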
Procedures
• Procedures consist of the usual
assignment, conditional, and looping
statements found in most
languages.
• These are separated by newlines or
semi-colons and are contained within
curly brackets { }
• A procedure can be omitted. A pattern with no procedure prints $0 for every matching record (an explicitly empty procedure { } does nothing).
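A minimal sketch of two of these points, using made-up input: a pattern written with no procedure prints the whole matching record, and several statements can share one procedure when separated by semicolons.

```shell
input='1 one
2 two
3 three'

# Pattern with no procedure: matching records are printed whole.
no_proc=$(printf '%s\n' "$input" | awk '$1 >= 2')

# Two statements in one procedure, separated by a semicolon.
two_stmts=$(printf '%s\n' "$input" | awk '$1 == 3 { n = $1 * 2; print $2 " doubled is " n }')

printf '%s\n' "$no_proc" "$two_stmts"
```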
awk Built-in Variables
• awk has a number of built-in variables:
– FILENAME - current filename
– FS - Field separator
– NF - Number of fields in current record
– NR - Number of current record
– RS - Record separator
– $0 - Entire input record
– $n - nth field in current record
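A sketch that prints several of these variables for each record. It writes two made-up lines to a temporary file (created with mktemp, an assumption about the environment) so that FILENAME has a value.

```shell
tmp=$(mktemp)
printf 'a b c\nd e\n' > "$tmp"

# FILENAME, NR and NF change as awk moves through the input;
# $NF uses NF as a field number, so it is the last field.
out=$(awk '{ print FILENAME ": record " NR " has " NF " fields, last is " $NF }' "$tmp")
rm -f "$tmp"

printf '%s\n' "$out"
```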
Control Structures
• if (condition) statement
• if (condition) statement else statement
• for (expr1; expr2; expr3) statement
• for (index in array) statement
– More about this when we review arrays.
• while (condition) statement
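A short sketch that uses if/else, for and while in one procedure. The input and the variable names (label, stars, halvings) are invented for illustration.

```shell
out=$(printf '3 apples\n1 pear\n' | awk '{
    # if / else: choose a label based on the first field
    if ($1 > 1) { label = "many" } else { label = "one" }

    # for: build a string with one star per item
    stars = ""
    for (i = 1; i <= $1; i++) stars = stars "*"

    # while: count how many times the first field can be halved
    n = $1; halvings = 0
    while (n > 1) { n = n / 2; halvings++ }

    print $2, label, stars, halvings
}')
printf '%s\n' "$out"
```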
For-While equivalence
for (expr1; expr2; expr3) statement
is equivalent to:
expr1;
while (expr2) {
    statement;
    expr3
}
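The equivalence can be checked on a small case. This sketch prints the squares of 1 to 3 with both loop forms; the loop body is an arbitrary choice.

```shell
with_for=$(awk 'BEGIN { for (i = 1; i <= 3; i++) print i * i }')
with_while=$(awk 'BEGIN { i = 1; while (i <= 3) { print i * i; i++ } }')

printf '%s\n' "$with_for"
```

Both variables hold the same three lines of output.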
awk Operators
Symbol              Meaning
$                   Field reference
++ --               Increment, decrement
+ - !               Addition, subtraction, logical negation
* / %               Multiplication, division, modulus
< <= > >= != ==     Relational operators
~ !~                Match regular expression and negation
in                  Array membership
&& ||               Logical and, logical or
?:                  If-then-else for expressions, e.g. x == y ? "Equal" : "Not equal"
= += -= *= /= %=    Assignment
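A sketch that exercises several of these operators on two made-up records. The variable names count and sum are illustrative.

```shell
out=$(printf '7 7\n3 8\n' | awk '
{
    count++                                   # increment
    sum += $1 * $2                            # multiplication and assignment
    print ($1 == $2 ? "Equal" : "Not equal")  # conditional expression
    if ($1 % 2 == 1 && $0 ~ /8/) print "odd first field and an 8 in the record"
}
END { print count " records, sum = " sum }')
printf '%s\n' "$out"
```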
Arrays in awk
• awk has arrays with elements subscripted
with strings (associative arrays)
• Create array elements in one of two ways:
– Name them in an assignment statement
• myArray[i]=n++
• myArray["Red"]="255 0 0"
– Use the split(str,arr,fs) function, which splits the string str into elements of array arr, using field separator fs. It returns the number of elements created.
• n=split(input, words, " ")
Example of split
m=split("Blue 0 0 255",colors," ")
results in:
m ← 4
colors[1] ← "Blue"
colors["2"] ← "0"
colors[3] ← "0"
colors["4"] ← "255"
• Since indexes are really strings it's legal to
write them enclosed in quotes
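The split example can be run as a BEGIN-only program, which needs no input. The sketch prints the return value and then two elements, one read with a numeric subscript and one with a quoted subscript.

```shell
out=$(awk 'BEGIN {
    m = split("Blue 0 0 255", colors, " ")
    print m
    print colors[1]      # numeric subscript
    print colors["4"]    # same kind of subscript, written as a string
}')
printf '%s\n' "$out"
```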
Reading elements in an array
• Using a for loop:
for (index in array)
print array[index]
– Since indexes are arbitrary strings, this is the general way to loop through all elements of an array; the order of traversal is unspecified
• Using the operator in:
if (index in array)
...
– we use this to test if an index exists.
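A sketch of both uses of in, with a small array built by assignment. The array name rgb and its contents are made up. The loop variable is called k here because index is also the name of a built-in awk function, so it cannot be used as a variable name in a real program.

```shell
out=$(awk 'BEGIN {
    rgb["Red"] = "255 0 0"
    rgb["Green"] = "0 255 0"

    # Count the elements with a for-in loop (traversal order is not specified).
    for (k in rgb) n++
    print n

    # Membership tests with the in operator.
    if ("Red" in rgb) print "Red is defined as " rgb["Red"]
    if (!("Blue" in rgb)) print "Blue is not defined"
}')
printf '%s\n' "$out"
```

The loop only counts elements, so the result does not depend on the order in which awk visits them.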