This AWK language course aims to explain AWK in 15 minutes so that you discover an awesome tool, despite its name. The correct pronunciation is [auk], like the small seabirds (e.g. the parakeet auklet).
AWK language (is):
- (mainly) a text-processing language
- available by default on most UNIX-like systems; on Windows there is either a native binary or a Cygwin one
- syntax is influenced by the C and shell programming languages
- programs range from a single line to multiple library files
- several implementations are available, notably `gawk` and `mawk`
- solves generally the same problems as similar text-processing tools: `sed`, `grep`, `wc`, `tr`, `cut`, `printf`, `tail`, `head`, `cat`, `tac`, `bc`, `column`, ...
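To make the overlap with those tools concrete, here is a small sketch of rough awk counterparts of `grep`, `wc -l`, `head` and `cut`. The sample input and the helper function `sample` are made up for illustration; the awk programs use only POSIX features.

```shell
#!/bin/sh
# Made-up sample input: four short lines (sample is a hypothetical helper).
sample() { printf 'alpha 1\nbeta 2\ngamma 3\nalpha 4\n'; }

# like "grep alpha": print records that match a regexp
sample | awk '/alpha/ { print $0 }'

# like "wc -l": NR still holds the number of records in END
sample | awk 'END { print NR }'

# like "head -n 2": stop reading after the second record
sample | awk 'NR > 2 { exit } { print $0 }'

# like "cut -d ' ' -f 2": print the second field
sample | awk '{ print $2 }'
```

The counterparts are not identical in every corner case (for example `wc -l` counts newline characters, while `NR` counts records), but for ordinary line-oriented text they behave the same.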
AWK language use cases are:
- computing integer / floating-point math formulas (based on input)
- general text processing
  - cutting pieces from the input text stream
  - reformatting the input text stream
- (shell) meta-programming generator
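As a sketch of the math use case, the snippet below sums and averages a column of numbers and also uses awk as a `bc`-like calculator. The input values and the helper `values` are made up; note that standard awk performs arithmetic in floating point, so integer and fractional input can be mixed freely.

```shell
#!/bin/sh
# Made-up input: one number per line (values is a hypothetical helper).
values() { printf '3\n4.5\n10\n2.5\n'; }

# Sum and average of the first field; the NR > 0 test avoids dividing by zero
# on empty input.
values | awk '{ sum += $1 } END { if (NR > 0) printf("sum=%g avg=%.2f\n", sum, sum / NR) }'

# A calculator without input: only the BEGIN block runs, no input is read.
awk 'BEGIN { printf("%.4f %d\n", sqrt(2), 2 ^ 10) }'
```

The first command prints `sum=20 avg=5.00`, the second `1.4142 1024`.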
 
AWK language capabilities:
- text-processing functions
- regular expression support
- math functions
- dynamic typing, with support for
  - integers / longs
  - floats
  - associative arrays (including multi-dimensional array support)
- external execution support
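A short sketch of three of these capabilities: associative arrays, the "multi-dimensional" array syntax (which awk implements by joining subscripts with the `SUBSEP` character into a single key), and running an external command through `getline`. The input text, the helper `words` and the variable names are made up for illustration.

```shell
#!/bin/sh
# Made-up input text (words is a hypothetical helper).
words() { printf 'the cat saw the dog\nthe dog ran\n'; }

# Associative array: count how often each word (used as the key) occurs.
# "for (w in count)" visits keys in unspecified order, hence the sort.
words | awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
             END { for (w in count) print w, count[w] }' | sort

# "Multi-dimensional" array: grid[1, 2] is one key made of 1, SUBSEP and 2.
awk 'BEGIN {
    grid[1, 2] = "x"
    if ((1, 2) in grid) print "found (1,2)"
    for (k in grid) { split(k, idx, SUBSEP); print idx[1], idx[2], grid[k] }
}'

# External execution: read one line of a shell command's output via getline.
awk 'BEGIN { cmd = "echo hello"; cmd | getline out; close(cmd); print toupper(out) }'
```

Calling `close(cmd)` after reading is a good habit: it releases the pipe and lets the same command be run again later.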
 
Every AWK execution consists of the following three phases:
- [1] `BEGIN{ ... }` are actions performed at the beginning, before the first text character is read
  - multiple blocks allowed (normally a single one)
- [2] `[condition]{ ... }` are actions performed on every AWK record (by default a text line); the condition is optional, and a block without one runs for every record
  - every AWK record is automatically split into AWK fields (by default words)
  - multiple blocks allowed
- [3] `END{ ... }` are actions performed at the end of the execution, after the last text character is read
  - multiple blocks allowed (normally a single one)

```shell
$ echo -e "AWK is still useful\ntext-processing  technology!" | \
>   awk 'BEGIN{wcnt=0;print "lineno/#words/3rd-word:individual words\n"}
>             {printf("% 6d/% 6d/% 8s:%s\n",NR,NF,$3,$0);wcnt+=NF}
>          END{print "\nSummary:", NR, "lines/records,", wcnt, "words/fields"}'
lineno/#words/3rd-word:individual words

     1/     4/   still:AWK is still useful
     2/     2/        :text-processing  technology!

Summary: 2 lines/records, 6 words/fields
```
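The example above has no conditions in phase [2]. The sketch below adds two conditional main blocks next to `BEGIN` and `END`: one guarded by a regular expression, one by an arithmetic expression. Both are tested against every record, in program order. The log lines and the helper names `events` and `report` are made up for illustration.

```shell
#!/bin/sh
# Made-up log lines (events is a hypothetical helper).
events() { printf 'ok 1\nERROR 2\nok 3\nERROR 4\n'; }

# One BEGIN block, two conditional main blocks and one END block.
report() {
    awk '
        BEGIN       { errors = 0 }                           # [1] before any input
        /^ERROR/    { errors++; print "line " NR ": " $0 }   # [2] regexp condition
        NR % 2 == 1 { print "odd line " NR }                 # [2] expression condition
        END         { print errors " error(s)" }             # [3] after all input
    '
}

events | report
# odd line 1
# line 2: ERROR 2
# odd line 3
# line 4: ERROR 4
# 2 error(s)
```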
- Passing text data to AWK:
  - from a pipe: `cat input-data.txt | awk <app>`
  - from file[s] read by awk itself: `awk <app> input-data.txt`
- AWK application execution styles (`-f`):
  - on the command line: `awk '{ ... }' input-data.txt`
  - in separate files: `awk -f myapp.awk input-data.txt`
- specifying an AWK variable on the command line: `-v var=val`
- specifying the AWK field separator: the `FS` variable or the `-F <FS>` switch
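The options above combine naturally. This sketch sets the field separator with `-F`, passes a threshold with `-v`, and then runs the same program from a separate file with `-f`. The colon-separated records, the helper `users`, the variable `min_uid` and the temporary program file are all made up for illustration; `mktemp` is assumed to be available (it is on common Linux, BSD and macOS systems, though it is not part of POSIX).

```shell
#!/bin/sh
# Made-up colon-separated records (name:uid:shell), similar to /etc/passwd.
users() { printf 'root:0:/bin/bash\nalice:1000:/bin/zsh\nbob:1001:/bin/bash\n'; }

# -F ':' sets FS; -v min_uid=1000 presets the awk variable min_uid.
users | awk -F ':' -v min_uid=1000 '$2 >= min_uid { print $1 }'

# The same program kept in its own file and run with -f
# (here a throw-away temporary file).
prog=$(mktemp)
cat > "$prog" <<'EOF'
# print names of users whose uid is at least min_uid
$2 >= min_uid { print $1 }
EOF
users | awk -F ':' -v min_uid=1000 -f "$prog"
rm -f "$prog"
```

Both invocations print `alice` and `bob`. Values given with `-v` that look like numbers compare numerically, so `$2 >= min_uid` is a numeric test here.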
Global (built-in) variables are documented in the AWK manual; the most common ones are:
- `$0` value of the current AWK record (the whole line without the line break)
- `$1`, `$2`, ... `$NF` values of the first, second, ... last AWK field (word)
- `FS` specifies the input AWK field separator, i.e. how AWK breaks an input record into fields (default: runs of spaces and tabs)
- `RS` specifies the input AWK record separator, i.e. how AWK breaks the input stream into records (default: a newline)
- `OFS` specifies the output field separator, i.e. what AWK puts between fields printed with `print` (default: a single space)
- `ORS` specifies the output record separator, i.e. what AWK appends after each record printed with `print` (default: a newline)
- `FILENAME` contains the name of the input file currently read by awk (treat it as read-only)
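A minimal sketch of the separator variables in action, using made-up input. The first command reads comma-separated fields and rejoins them with a different `OFS`; assigning `$1 = $1` is the usual idiom that makes awk rebuild `$0` with the new separator. The second uses the POSIX "paragraph mode" (`RS = ""`, records separated by blank lines) together with a custom `ORS`.

```shell
#!/bin/sh
# FS/OFS: read comma-separated fields, print them joined by " | ".
# Assigning $1 = $1 makes awk rebuild $0 with OFS before printing.
printf 'a,b,c\nd,e,f\n' | awk 'BEGIN { FS = ","; OFS = " | " } { $1 = $1; print }'

# RS/ORS: RS = "" is paragraph mode (records separated by blank lines);
# ORS is what print appends after each output record.
printf 'x 1\ny 2\n\nz 3\n' | awk 'BEGIN { RS = ""; ORS = ";\n" } { print NR ": " NF " fields" }'
```

The first command prints `a | b | c` and `d | e | f`; the second prints `1: 4 fields;` and `2: 2 fields;`, because the first paragraph spans two lines with two fields each.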
AWK built-in functions are documented in the manual as well; the most important ones are:
- `print`, `printf()` and `sprintf()` - printing and formatting (`sprintf()` returns the formatted string instead of printing it)
- `length()` - length of a string argument
- `substr()` - extract a substring from a string
- `split()` - split a string into an array of strings
- `index()` - find the position of a substring in a string
- `sub()` and `gsub()` - (regexp) search and replace (once, respectively globally)
- `~` operator and `match()` - regexp search
- `tolower()` and `toupper()` - convert text to lowercase, respectively uppercase
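The sketch below calls each of the listed functions once on a made-up string, with the resulting value noted in a comment. The wrapper name `demo_funcs` is hypothetical. Note that positions (`substr()`, `index()`, `RSTART`) start at 1, and that `gsub()` modifies its third argument in place, so it is applied to a copy.

```shell
#!/bin/sh
demo_funcs() {
    echo "Hello, AWK world" | awk '{
        s = $0
        print length(s)                  # 16
        print substr(s, 8, 3)            # 3 characters from position 8: AWK
        print index(s, "world")          # 12 (positions start at 1)
        n = split(s, parts, ", ")        # pieces: "Hello" and "AWK world"
        print n, parts[2]                # 2 AWK world
        t = s
        gsub(/o/, "0", t)                # replace every "o" in the copy t
        print t                          # Hell0, AWK w0rld
        if (match(s, /A[A-Z]+/))
            print RSTART, RLENGTH        # 8 3
        print toupper(s), tolower(s)     # HELLO, AWK WORLD hello, awk world
        printf("%-6s|%5.1f|\n", "ab", 3.14159)   # ab    |  3.1|
        print sprintf("%03d", 7)         # 007
    }'
}

demo_funcs
```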
 
- Hello world
- Word count using wc and awk
- Pattern search using grep and awk
- Uniq words in awk
- Computing the average
- Text stream FSM machine
- Manipulation with text columns
- Shell metaprogramming with awk
- Why cut is very limited compared to awk
- Memory hungry application
- CPU intensive application
- Debugging / profiling AWK application
- GNU AWK network programming
- 30 seconds of AWK code
 
Prefer general awk over a specific AWK implementation:
- use general `awk` for portable programs
- otherwise use the particular implementation, e.g. `gawk`
A general rule of thumb is to write an AWK program as a `*.awk` file if the equivalent one-liner is not easily readable.
If you have trouble understanding a one-line awk program, feel free to use GNU AWK's profiling functionality, i.e. the `-p` option, to get pretty-printed AWK code (written to `awkprof.out`).
- comment properly
- indent similarly as in the C/C++ programming languages
- use functions whenever possible
- stay explicit, avoiding awk default (implicit) actions, which make an AWK application hard to understand
  - example: `length > 80` should rather be written as `'length($0) > 80 { print }'` or `'length($0) > 80 { print $0 }'`
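As an illustration of these guidelines, here is a sketch of a small program kept in its own `*.awk` file: it is commented, indented C-style, uses a function, and spells out `length($0) > limit { ... }` instead of relying on implicit actions. The file name `long-lines.awk`, the function `shorten()` and the variable `limit` are hypothetical names chosen for this example; the file is written to `$TMPDIR` (or `/tmp`).

```shell
#!/bin/sh
# Hypothetical program file; the name long-lines.awk is arbitrary.
prog="${TMPDIR:-/tmp}/long-lines.awk"
cat > "$prog" <<'EOF'
# long-lines.awk: report lines longer than a limit, with their line numbers.
# Usage: awk -v limit=80 -f long-lines.awk FILE...

# shorten(text, n): return text cut to at most n characters, marking a cut with "...".
function shorten(text, n) {
    if (length(text) <= n)
        return text
    return substr(text, 1, n) "..."
}

BEGIN {
    if (limit == "")
        limit = 80                    # default when -v limit=... is not given
}

# explicit condition and action instead of the terse "length > 80"
length($0) > limit {
    printf("%d: %s\n", NR, shorten($0, 20))
}
EOF

printf 'short\n%s\n' "this line is definitely longer than twenty chars" |
    awk -v limit=20 -f "$prog"
# prints: 2: this line is definit...
```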
 
- don't forget to always use apostrophe (`'`) quotation when writing awk one-line applications to avoid shell expansion (for instance of `$1`): `awk "{print $1}"` should be `awk '{print $1}'`
- use one of the recommended implementations, as old implementations (old `awk` or `nawk`) are quite limited
- string / array indexing starts from `1` (`index()`, `split()`, `$i`, ...)
- the GNU AWK implementation understands localization and UTF-8/Unicode, so replacing with `[g]sub()` can lead to unwanted behavior unless you force gawk to drop such support by exporting the environment variable `LC_ALL=C`
- other awk implementations may not support UTF-8/Unicode:
 
 
```shell
# awk implementation versions
GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.1)
mawk 1.3.4 20161107
BusyBox v1.22.1 (2016-02-03 18:22:11 UTC) multi-call binary.
$ echo "Zřetelně" | gawk '{print toupper($0)}'
ZŘETELNĚ
$ echo "Zřetelně" | mawk '{print toupper($0)}'
ZřETELNě
$ echo "Zřetelně" | busybox awk '{print toupper($0)}'
ZřETELNě
```
- regular expression interval expressions such as `{1,1}` are supported by gawk (older gawk versions have to enable them explicitly with `--re-interval`), but not by the mawk version above:
 
```shell
$ ps auxwww | gawk '{if($2~/^[0-9]{1,1}$/){print}}'
root         1  0.0  0.0 197064  4196 ?        Ss   Oct31   2:21 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
root         4  0.0  0.0      0     0 ?        S<   Oct31   0:00 [kworker/0:0H]
$ ps auxwww | gawk --re-interval '{if($2~/^[0-9]{1,1}$/){print}}'
root         1  0.0  0.0 197064  4196 ?        Ss   Oct31   2:21 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
root         4  0.0  0.0      0     0 ?        S<   Oct31   0:00 [kworker/0:0H]
$ ps auxwww | mawk '{if($2~/^[0-9]{1,1}$/){print}}'
$
```
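The following sketch demonstrates three of the pitfalls above with made-up input: what the shell does to `$1` inside double quotes (`set --` is used here only to make the positional parameter `$1` empty and the result predictable), 1-based indexing, and an interval-free regular expression that matches exactly one digit in implementations without `{n,m}` support.

```shell
#!/bin/sh
# Single quotes: awk receives $1 untouched and prints the first field.
echo "first second" | awk '{ print $1 }'          # prints: first

# Double quotes: the shell expands $1 first (empty after "set --"),
# so awk only sees "{ print  }" and prints the whole line.
set --
echo "first second" | awk "{ print $1 }"          # prints: first second

# Indexing starts at 1 for characters, fields and split() elements.
echo "abc def" | awk '{ split($0, w, " "); print substr($1, 1, 1), w[1], index($0, "a") }'
# prints: a abc 1

# Interval-free equivalent of /^[0-9]{1,1}$/ for implementations without {n,m} support.
printf '1\n42\n7\n' | awk '$1 ~ /^[0-9]$/ { print }'   # prints: 1 and 7
```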


