Friday, August 5, 2011

AWK - An Introduction


AWK is an extremely versatile programming language for working on files. It was one of the early tools to appear in Version 7 Unix and gained popularity as a way to add computational features to a Unix pipeline. There are three common variants of AWK:

AWK - The original version from AT&T
NAWK - A newer, improved version from AT&T
GAWK - The Free Software Foundation's version

The AWK utility is a data extraction and reporting tool. It uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language makes extensive use of the string data type, associative arrays (that is, arrays indexed by key strings), and regular expressions.

The essential organization of an AWK program follows the form:

pattern { action }

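The pattern specifies when the action is performed. Like most UNIX utilities, AWK is line oriented: the pattern is tested against each input line in turn, and the action runs for every line that matches.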
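As a minimal sketch of the pattern { action } form, the command below prints the first field of every line whose second field is greater than 50. The sample input and the names in it are made up for illustration.

```shell
# Hypothetical input: one "name score" pair per line.
input='alice 82
bob 47
carol 91'

# Pattern: second field greater than 50. Action: print the first field.
passed=$(printf '%s\n' "$input" | awk '$2 > 50 { print $1 }')
printf '%s\n' "$passed"
```

Only the lines for alice and carol satisfy the pattern, so only those two names are printed.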

The AWK program below:

BEGIN { print "START" }
      { print         }
END   { print "STOP"  }

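adds one line before and one line after the input file. BEGIN and END are special patterns: their actions run before the first line is read and after the last one, while a rule with no pattern runs for every line. This isn't very useful, but with a simple change, we can make this into a typical AWK program: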

BEGIN { print "File\tOwner" }
{ print $9, "\t", $3}
END { print " - DONE -" }

The code below can be saved as filename.awk:

#!/bin/awk -f
BEGIN { print "File\tOwner" }
{ print $9, "\t", $3}
END   { print " - DONE -" }

The characters "\t" indicate a tab character, so the output lines up on even boundaries. The "$9" and "$3" look like shell script arguments, but instead of the ninth and third argument they mean the ninth and third field of the input line. (With the output of "ls -l" as input, these are typically the file name and the owner.) Change the permission with the chmod command (for example, "chmod +x filename.awk"), and the script becomes a new command.
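One way to try this program without depending on the contents of a real directory is to feed it a single line shaped like "ls -l" output. This is only a sketch: the sample line, the owner "alice" and the file "notes.txt" are made up, and the post itself does not say which input the script is meant for.

```shell
# A made-up line in the usual "ls -l" layout: field 3 is the owner, field 9 the file name.
line='-rw-r--r-- 1 alice users 1024 Aug  5 2011 notes.txt'

report=$(printf '%s\n' "$line" | awk '
BEGIN { print "File\tOwner" }
      { print $9, "\t", $3 }
END   { print " - DONE -" }')
printf '%s\n' "$report"
```

Once the script is saved and made executable, something like "ls -l | ./filename.awk" should behave the same way, assuming awk really lives at /bin/awk on your system.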

There are several arithmetic operators, similar to C. These are the binary operators, which operate on two variables:


Binary Operators

Operator   Type         Meaning
+          Arithmetic   Addition
-          Arithmetic   Subtraction
*          Arithmetic   Multiplication
/          Arithmetic   Division
%          Arithmetic   Modulo
<space>    String       Concatenation

Using variables with the value of "7" and "3," AWK returns the following results for each operator when using the print command:



Expression   Result
7+3          10
7-3          4
7*3          21
7/3          2.33333
7%3          1
7 3          73
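The results above can be reproduced with a short BEGIN-only program. This is a sketch; the variable names a and b are arbitrary.

```shell
results=$(awk 'BEGIN {
    a = 7; b = 3
    print a + b    # addition
    print a - b    # subtraction
    print a * b    # multiplication
    print a / b    # division, printed using the default numeric output format
    print a % b    # modulo
    print a b      # a space between two values concatenates them as strings
}')
printf '%s\n' "$results"
```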

The "+" and "-" operators can also be used as unary operators in front of variables and numbers. If x equals 4, then the statement:

print -x

will print "-4".

AWK also supports the "++" and "--" operators of C. Variables can be assigned new values with the assignment operators. The second type of expression in AWK is the conditional expression.
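A small sketch of the unary minus, the increment and decrement operators, and one of the assignment operators ("+=" here, chosen only as an example):

```shell
out=$(awk 'BEGIN {
    x = 4
    print -x       # unary minus
    x++            # x is now 5
    x += 10        # x is now 15
    x--            # x is now 14
    print x
}')
printf '%s\n' "$out"
```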

Arithmetic values can also be converted into Boolean conditions by using relational operators:

Relational Operators

Operator   Meaning
==         Is equal to
!=         Is not equal to
>          Is greater than
>=         Is greater than or equal to
<          Is less than
<=         Is less than or equal to
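The sketch below uses relational operators inside an if statement, and also shows that a comparison evaluates to 1 when true and 0 when false. The input numbers are arbitrary.

```shell
verdicts=$(printf '5\n12\n7\n' | awk '{
    if ($1 == 7)      print $1, "equals 7"
    else if ($1 < 7)  print $1, "is less than 7"
    else              print $1, "is greater than 7"
}')
printf '%s\n' "$verdicts"

# A comparison is itself a value: 1 for true, 0 for false.
truth=$(awk 'BEGIN { t = (7 > 3); f = (7 == 3); print t, f }')
printf '%s\n' "$truth"
```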

Two operators are used to compare strings to regular expressions:


Regular Expression Operators

Operator   Meaning
~          Matches
!~         Doesn't match
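As an illustration, the two rules below sort input lines by whether the first field consists only of digits. The regular expression and the sample input are assumptions made for this example.

```shell
input='2011
awk
42x'

labeled=$(printf '%s\n' "$input" | awk '
$1 ~  /^[0-9]+$/ { print $1, "is all digits" }
$1 !~ /^[0-9]+$/ { print $1, "is not all digits" }')
printf '%s\n' "$labeled"
```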

There are only a few commands in AWK. The list and syntax follow:

if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
for ( variable in array ) statement
break
continue
{ [ statement ] ...}
variable=expression
print [ expression-list ] [ > expression ]
printf format [ , expression-list ] [ > expression ]
next 
exit
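Several of these commands can be combined in one short program. The sketch below counts lines with an associative array, walks the array with "for ( variable in array )", and uses a C-style for loop with if and continue. The input letters are made up.

```shell
summary=$(printf 'a\nb\na\nc\na\n' | awk '
{ count[$1]++ }                 # associative array keyed by the first field
END {
    total = 0
    for (key in count)          # iteration order is unspecified
        total += count[key]
    printf "%d lines, a seen %d times\n", total, count["a"]
    for (i = 1; i <= 3; i++) {
        if (i == 2) continue    # skip the second iteration
        print "i =", i
    }
}')
printf '%s\n' "$summary"
```

Because the order of a "for ( variable in array )" loop is not defined, the example only sums the counts instead of printing them one by one.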

Awk's built-in variables include the field variables: $1, $2, $3, and so on ($0 represents the entire record). They hold the text or values in the individual text-fields in a record.
Other variables include:

NR: Keeps a current count of the number of input records.
NF: Keeps a count of the number of fields in an input record. The last field in the input record can be designated by $NF.
FILENAME: Contains the name of the current input-file.
FS: Contains the "field separator" character used to divide fields on the input record. The default, "white space", includes any space and tab characters. FS can be reassigned to another character to change the field separator.
RS: Stores the current "record separator" character. Since, by default, an input line is the input record, the default record separator character is a "newline".
OFS: Stores the "output field separator", which separates the fields when Awk prints them. The default is a "space" character.
ORS: Stores the "output record separator", which separates the output records when Awk prints them. The default is a "newline" character.
OFMT: Stores the format for numeric output. The default format is "%.6g".
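A brief sketch of how some of these variables interact. FS is set to a colon to split made-up, passwd-style records, and OFS is changed so the effect on the output is easy to see.

```shell
fields=$(printf 'root:x:0\ndaemon:x:1\n' | awk '
BEGIN { FS = ":"; OFS = " -> " }
      { print NR, NF, $1, $NF }')   # record number, field count, first and last field
printf '%s\n' "$fields"
```

Each output line shows the record number, the number of fields, the first field and the last field, joined by the new output field separator.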

The print command can display the results of calculations and/or function calls:

print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)
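To make the statements above runnable, foobar has to be defined somewhere. In the sketch below it is a made-up user-defined function that doubles its argument, and "variable" is set to 10 purely for illustration.

```shell
calc=$(awk '
# foobar is a hypothetical user-defined function; here it simply doubles its argument.
function foobar(n) { return n * 2 }
BEGIN {
    variable = 10
    print 3+2
    print foobar(3)
    print foobar(variable)
    print sin(3-2)      # sine of 1 radian, printed using the default numeric output format
}')
printf '%s\n' "$calc"
```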

Output may be sent to a file:

print "expression" > "file name"

or through a pipe:

print "expression" | "command"
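Both forms can be tried with a temporary file and the standard sort command. This is a sketch: the file comes from mktemp, the variable name "out" is arbitrary, and the input words are made up.

```shell
tmpfile=$(mktemp)

# Redirect print output to a file whose name is passed in with -v.
printf 'pear\napple\nfig\n' | awk -v out="$tmpfile" '{ print $1 > out }'
saved=$(cat "$tmpfile")
rm -f "$tmpfile"

# Send print output through a pipe to the sort command.
sorted=$(printf 'pear\napple\nfig\n' | awk '{ print $1 | "sort" }')

printf '%s\n' "$saved"
printf '%s\n' "$sorted"
```

The file receives the lines in their original order, while the piped output comes back sorted.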

This post covers only some basics of AWK. If you want to know more, there are plenty of AWK references on the Internet.

Thanks

AJAY
