One of the most common actions is to print, or output, some or all of
the input. You use the print statement for simple output. You use the
printf statement for fancier formatting. Both are described in this
chapter.
The print Statement
The print statement does output with simple, standardized formatting.
You specify only the strings or numbers to be printed, in a list
separated by commas. They are output, separated by single spaces,
followed by a newline. The statement looks like this:
    print item1, item2, ...
The entire list of items may optionally be enclosed in parentheses. The
parentheses are necessary if any of the item expressions uses the `>'
relational operator; otherwise it could be confused with a redirection
(see section Redirecting Output of print and printf).
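As a brief illustration of why the parentheses matter, here is a minimal sketch run from the shell (the numbers are made up). With the parentheses, `>' is read as the relational operator. Without them, awk would treat `print 2 > 1' as a redirection and write `2' to a file named `1'.

```shell
# Parenthesized: `>' is a comparison, so the result of the test is printed.
awk 'BEGIN { print (2 > 1) }'
```

The comparison is true, so this prints `1'.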
The items to be printed can be constant strings or numbers, fields of
the current record (such as $1), variables, or any awk expressions.
Numeric values are converted to strings, and then printed.
The print statement is completely general for computing what values to
print. However, with two exceptions, you cannot specify how to print
them: how many columns, whether to use exponential notation or not, and
so on. (For the exceptions, see section Output Separators, and section
Controlling Numeric Output with print.) For that, you need the printf
statement (see section Using printf Statements for Fancier Printing).
The simple statement `print' with no items is equivalent to `print $0':
it prints the entire current record. To print a blank line, use
`print ""', where "" is the empty string.
To print a fixed piece of text, use a string constant such as
"Don't Panic" as one item. If you forget to use the double-quote
characters, your text will be taken as an awk expression, and you will
probably get an error. Keep in mind that a space is printed between any
two items.
Each print statement makes at least one line of output. But it isn't
limited to one line. If an item value is a string that contains a
newline, the newline is output along with the rest of the string. A
single print can make any number of lines this way.
Examples of print Statements
Here is an example of printing a string that contains embedded newlines
(the `\n' is an escape sequence, used to represent the newline
character; see section Escape Sequences):
    $ awk 'BEGIN { print "line one\nline two\nline three" }'
    -| line one
    -| line two
    -| line three
Here is an example that prints the first two fields of each input
record, with a space between them:
    $ awk '{ print $1, $2 }' inventory-shipped
    -| Jan 13
    -| Feb 15
    -| Mar 15
    ...
A common mistake in using the print statement is to omit the comma
between two items. This often has the effect of making the items run
together in the output, with no space. The reason for this is that
juxtaposing two string expressions in awk means to concatenate them.
Here is the same program, without the comma:
    $ awk '{ print $1 $2 }' inventory-shipped
    -| Jan13
    -| Feb15
    -| Mar15
    ...
To someone unfamiliar with the file `inventory-shipped', neither
example's output makes much sense. A heading line at the beginning
would make it clearer. Let's add some headings to our table of months
($1) and green crates shipped ($2). We do this using the BEGIN pattern
(see section The BEGIN and END Special Patterns) to force the headings
to be printed only once:
    awk 'BEGIN { print "Month Crates"
                 print "----- ------" }
               { print $1, $2 }' inventory-shipped
Did you already guess what happens? When run, the program prints the
following:
    Month Crates
    ----- ------
    Jan 13
    Feb 15
    Mar 15
    ...
The headings and the table data don't line up! We can fix this by
printing some spaces between the two fields:
    awk 'BEGIN { print "Month Crates"
                 print "----- ------" }
               { print $1, " ", $2 }' inventory-shipped
You can imagine that this way of lining up columns can get pretty
complicated when you have many columns to fix. Counting spaces for two
or three columns can be simple, but more than this and you can get lost
quite easily. This is why the printf statement was created (see section
Using printf Statements for Fancier Printing); one of its specialties
is lining up columns of data.
As a side point, you can continue either a print or printf statement
simply by putting a newline after any comma (see section awk Statements
Versus Lines).
Output Separators
As mentioned previously, a print statement contains a list of items,
separated by commas. In the output, the items are normally separated by
single spaces. This need not be the case; a single space is only the
default. You can specify any string of characters to use as the output
field separator by setting the built-in variable OFS. The initial value
of this variable is the string " ", that is, a single space.
The output from an entire print statement is called an output record.
Each print statement outputs one output record and then outputs a
string called the output record separator. The built-in variable ORS
specifies this string. The initial value of ORS is the string "\n",
i.e., a newline character; thus, normally each print statement makes a
separate line.
You can change how output fields and records are separated by assigning
new values to the variables OFS and/or ORS. The usual place to do this
is in the BEGIN rule (see section The BEGIN and END Special Patterns),
so that it happens before any input is processed. You may also do this
with assignments on the command line, before the names of your input
files, or using the `-v' command line option (see section Command Line
Options).
The following example prints the first and second fields of each input
record separated by a semicolon, with a blank line added after each
line:
    $ awk 'BEGIN { OFS = ";"; ORS = "\n\n" }
    >            { print $1, $2 }' BBS-list
    -| aardvark;555-5553
    -|
    -| alpo-net;555-3412
    -|
    -| barfly;555-7685
    ...
If the value of ORS does not contain a newline, all your output will be
run together on a single line, unless you output newlines some other
way.
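The same separators can also be set with the `-v' option mentioned above instead of a BEGIN rule. The following is a minimal sketch; the two input records are made-up sample data supplied inline rather than read from `BBS-list'.

```shell
# Sample input is supplied inline; the field values are illustrative only.
# Escape sequences such as \n are interpreted in -v assignments.
printf 'aardvark 555-5553\nalpo-net 555-3412\n' |
awk -v OFS=';' -v ORS='\n\n' '{ print $1, $2 }'
```

Each output record has its two fields joined by a semicolon and is followed by a blank line, just as in the BEGIN-rule version.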
Controlling Numeric Output with print
When you use the print statement to print numeric values, awk
internally converts the number to a string of characters, and prints
that string. awk uses the sprintf function to do this conversion (see
section Built-in Functions for String Manipulation). For now, it
suffices to say that the sprintf function accepts a format
specification that tells it how to format numbers (or strings), and
that there are a number of different ways in which numbers can be
formatted. The different format specifications are discussed more fully
in section Format-Control Letters.
The built-in variable OFMT contains the default format specification
that print uses with sprintf when it wants to convert a number to a
string for printing. The default value of OFMT is "%.6g". By supplying
different format specifications as the value of OFMT, you can change
how print will print your numbers. As a brief example:
    $ awk 'BEGIN {
    >   OFMT = "%.0f"  # print numbers as integers (rounds)
    >   print 17.23 }'
    -| 17
According to the POSIX standard, awk's behavior will be undefined if
OFMT contains anything but a floating point conversion specification
(d.c.).
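Here is a further small sketch of OFMT at work, with made-up values. It assumes the usual behavior, specified by POSIX and followed by gawk, that a value which is already an integer is printed as an integer and is not passed through OFMT.

```shell
# OFMT formats the non-integral value; 17 is integral and is printed as is.
awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 17 }'
```

With gawk this prints `3.14 17'.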
Using printf Statements for Fancier Printing
If you want more precise control over the output format than print
gives you, use printf. With printf you can specify the width to use for
each item, and you can specify various formatting choices for numbers
(such as what radix to use, whether to print an exponent, whether to
print a sign, and how many digits to print after the decimal point).
You do this by supplying a string, called the format string, which
controls how and where to print the other arguments.
Introduction to the printf Statement
The printf statement looks like this:
    printf format, item1, item2, ...
The entire list of arguments may optionally be enclosed in parentheses.
The parentheses are necessary if any of the item expressions uses the
`>' relational operator; otherwise it could be confused with a
redirection (see section Redirecting Output of print and printf).
The difference between printf and print is the format argument. This is
an expression whose value is taken as a string; it specifies how to
output each of the other arguments. It is called the format string.
The format string is very similar to that in the ANSI C library
function printf. Most of format is text to be output verbatim.
Scattered among this text are format specifiers, one per item. Each
format specifier says to output the next item in the argument list at
that place in the format.
The printf statement does not automatically append a newline to its
output. It outputs only what the format string specifies. So if you
want a newline, you must include one in the format string. The output
separator variables OFS and ORS have no effect on printf statements.
For example:
    BEGIN {
        ORS = "\nOUCH!\n"; OFS = "!"
        msg = "Don't Panic!"; printf "%s\n", msg
    }
This program still prints the familiar `Don't Panic!' message.
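To make the contrast concrete, this small sketch sets both separators and then prints the same two made-up items once with printf and once with print. Only the print statement is affected.

```shell
awk 'BEGIN {
    OFS = "-"; ORS = "!\n"
    printf "%s %s\n", "a", "b"   # separators ignored: output is "a b"
    print "a", "b"               # separators used: output is "a-b!"
}'
```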
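Format-Control Letters
A format specifier starts with the character `%' and ends with a
format-control letter; it tells the printf statement how to output one
item. (If you actually want to output a `%', write `%%'.) The
format-control letter specifies what kind of value to print. The rest
of the format specifier is made up of optional modifiers which are
parameters to use, such as the field width.
Here is a list of the format-control letters:
c
This prints a number as an ASCII character. Thus, `printf "%c", 65'
outputs the letter `A'. The output for a string value is the first
character of the string.
d
i
These are equivalent. They both print a decimal integer.
e
E
This prints a number in scientific (exponential) notation. For example,
`printf "%4.3e\n", 1950' prints `1.950e+03', with a total of four
significant figures of which three follow the decimal point. The `4.3'
are modifiers, discussed below. `%E' uses `E' instead of `e' in the
output.
f
This prints a number in floating point notation. For example,
`printf "%4.3f", 1950' prints `1950.000', with three digits following
the decimal point. The width of 4 is only a minimum, so the longer
result is not truncated. The `4.3' are modifiers, discussed below.
g
G
This prints a number in either scientific notation or floating point
notation, whichever uses fewer characters. If the result is printed in
scientific notation, `%G' uses `E' instead of `e'.
o
This prints an unsigned octal integer.
s
This prints a string.
x
X
This prints an unsigned hexadecimal integer. `%X' uses the letters `A'
through `F' instead of `a' through `f'.
%
This isn't really a format-control letter, but it does have a meaning
when used after a `%': the sequence `%%' outputs one `%'. It does not
consume an argument.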
When using the integer format-control letters for values that are
outside the range of a C long integer, gawk will switch to the `%g'
format specifier. Other versions of awk may print invalid values, or do
something else entirely (d.c.).
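The following sketch exercises each of the format-control letters listed above with made-up values. The comments describe what each line is meant to show; the character printed by `%c' assumes an ASCII-compatible character set.

```shell
awk 'BEGIN {
    printf "%c\n", 65                   # the character whose code is 65
    printf "%d %i\n", 42.9, -7          # decimal integers; 42.9 is truncated
    printf "%e\n", 1950                 # exponential notation
    printf "%f\n", 1950                 # floating point notation
    printf "%g %g\n", 1950, 0.00001234  # whichever form is shorter
    printf "%o %x %X\n", 8, 255, 255    # octal and hexadecimal
    printf "%s is 100%%\n", "done"      # a string and a literal percent sign
}'
```

With the default precision of six digits, the `%e' line prints `1.950000e+03' and the `%f' line prints `1950.000000'.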
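Modifiers for printf Formats
A format specification can also include modifiers that can control how
much of the item's value is printed and how much space it gets. The
modifiers come between the `%' and the format-control letter. In the
examples below, we use the bullet symbol "*" to represent spaces in the
output. Here are the possible modifiers, in the order in which they may
appear:
-
The minus sign, used before the width modifier (see below), says to
left-justify the argument within its specified width. Normally the
argument is printed right-justified in the specified width. Thus,
`printf "%-4s", "foo"' prints `foo*'.
space
For numeric conversions, prefix positive values with a space, and
negative values with a minus sign.
+
The plus sign says to always supply a sign for numeric conversions,
even if the data to be formatted is positive. The `+' overrides the
space modifier.
#
Use an "alternate form" for certain control letters. For `%o', supply a
leading zero. For `%x' and `%X', supply a leading `0x' or `0X' for a
non-zero result.
0
A leading `0' (zero) indicates that output should be padded with zeros
instead of spaces. It only has an effect when the field width is wider
than the value to be printed.
width
This is a number specifying the desired minimum width of a field. The
default way to reach this width is to pad with spaces on the left. For
example, `printf "%4s", "foo"' prints `*foo'. The value of width is a
minimum width, not a maximum. If the item value requires more than
width characters, it can be as wide as necessary. Thus,
`printf "%4s", "foobar"' prints `foobar'. Preceding the width with a
minus sign causes the output to be padded with spaces on the right,
instead of on the left.
.prec
This is a number that specifies the precision to use when printing. For
the `e', `E', and `f' formats, it is the number of digits to print to
the right of the decimal point. For a string, it is the maximum number
of characters from the string that should be printed. Thus,
`printf "%.4s", "foobar"' prints `foob'.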
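Here is a short sketch that combines several of these modifiers. Square brackets are printed around each item so that the padding is visible; the values are made up.

```shell
awk 'BEGIN {
    printf "[%-6s] [%6s] [%.3s]\n", "foo", "foo", "foobar"  # justify, truncate
    printf "[%+d] [% d] [%05d]\n", 7, 7, 42                 # sign and zero padding
    printf "[%#o] [%#x] [%8.3f]\n", 8, 255, 3.14159         # alternate forms, width.prec
}'
```

The first line of output is `[foo   ] [   foo] [foo]': the same string left-justified in six columns, right-justified in six columns, and cut to three characters.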
The C library printf's dynamic width and prec capability (for example,
"%*.*s") is supported. Instead of supplying explicit width and/or prec
values in the format string, you pass them in the argument list. For
example:
    w = 5
    p = 3
    s = "abcdefg"
    printf "%*.*s\n", w, p, s
is exactly equivalent to
    s = "abcdefg"
    printf "%5.3s\n", s
Both programs output `**abc'.
Earlier versions of awk did not support this capability. If you must
use such a version, you may simulate this feature by using
concatenation to build up the format string, like so:
    w = 5
    p = 3
    s = "abcdefg"
    printf "%" w "." p "s\n", s
This is not particularly easy to read, but it does work.
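One place where a computed width is handy is a column whose width depends on the data. The sketch below uses the portable concatenation technique just described to size the first column to the longest name. The two input records are made-up sample data, and the array and variable names (name, phone, w, fmt) are purely illustrative.

```shell
printf 'aardvark 555-5553\nfoot 555-6699\n' |
awk '{ name[NR] = $1; phone[NR] = $2
       if (length($1) > w) w = length($1) }   # remember the widest name
     END { fmt = "%-" w "s  %s\n"             # here this becomes "%-8s  %s\n"
           for (i = 1; i <= NR; i++)
               printf fmt, name[i], phone[i] }'
```

With an awk that supports dynamic widths, the format could instead be written as "%-*s  %s\n" with w passed as an extra argument.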
C programmers may be used to supplying additional `l' and `h' flags in
printf format strings. These are not valid in awk. Most awk
implementations silently ignore these flags. If `--lint' is provided on
the command line (see section Command Line Options), gawk will warn
about their use. If `--posix' is supplied, their use is a fatal error.
Examples Using printf
Here is how to use printf to make an aligned table:
    awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
prints the names of bulletin boards ($1) of the file `BBS-list' as a
string of 10 characters, left justified. It also prints the phone
numbers ($2) afterward on the line. This produces an aligned two-column
table of names and phone numbers:
    $ awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list
    -| aardvark   555-5553
    -| alpo-net   555-3412
    -| barfly     555-7685
    -| bites      555-1675
    -| camelot    555-0542
    -| core       555-2912
    -| fooey      555-1234
    -| foot       555-6699
    -| macfoo     555-6480
    -| sdace      555-3430
    -| sabafoo    555-2127
Did you notice that we did not specify that the phone numbers be printed as numbers? They had to be printed as strings because the numbers are separated by a dash. If we had tried to print the phone numbers as numbers, all we would have gotten would have been the first three digits, `555'. This would have been pretty confusing.
We did not specify a width for the phone numbers because they are the last things on their lines. We don't need to put spaces after them.
We could make our table look even nicer by adding headings to the tops
of the columns. To do this, we use the BEGIN pattern (see section The
BEGIN and END Special Patterns) to force the header to be printed only
once, at the beginning of the awk program:
    awk 'BEGIN { print "Name       Number"
                 print "----       ------" }
         { printf "%-10s %s\n", $1, $2 }' BBS-list
Did you notice that we mixed print and printf statements in the above
example? We could have used just printf statements to get the same
results:
    awk 'BEGIN { printf "%-10s %s\n", "Name", "Number"
                 printf "%-10s %s\n", "----", "------" }
         { printf "%-10s %s\n", $1, $2 }' BBS-list
By printing each column heading with the same format specification used
for the elements of the column, we have made sure that the headings are
aligned just like the columns.
The fact that the same format specification is used three times can be
emphasized by storing it in a variable, like this:
    awk 'BEGIN { format = "%-10s %s\n"
                 printf format, "Name", "Number"
                 printf format, "----", "------" }
         { printf format, $1, $2 }' BBS-list
See if you can use the printf statement to line up the headings and
table data for our `inventory-shipped' example covered earlier in the
section on the print statement (see section The print Statement).
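One possible approach to this exercise is sketched below. It is only one of many reasonable layouts: the widths (5 and 6) are chosen to match the heading text, and the three input records are inline sample data that stand in for the first two fields of `inventory-shipped'.

```shell
printf 'Jan 13\nFeb 15\nMar 15\n' |
awk 'BEGIN { fmt = "%-5s %6s\n"              # left-justify month, right-justify count
             printf fmt, "Month", "Crates"
             printf fmt, "-----", "------" }
           { printf fmt, $1, $2 }'
```

Because the headings and the data share one format, the crate counts end up right-aligned under the `Crates' heading.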
Redirecting Output of print and printf
So far we have been dealing only with output that prints to the
standard output, usually your terminal. Both print and printf can also
send their output to other places. This is called redirection.
A redirection appears after the print or printf statement. Redirections
in awk are written just like redirections in shell commands, except
that they are written inside the awk program.
There are three forms of output redirection: output to a file, output
appended to a file, and output through a pipe to another command. They
are all shown for the print statement, but they work identically for
printf also.
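print items > output-file
This type of redirection prints the items into the output file
output-file. The file name output-file can be any expression. Its value
is changed to a string and then used as a file name. When this type of
redirection is used, the output-file is erased before the first output
is written to it. Subsequent writes to the same output-file do not
erase output-file, but append to it. If output-file does not exist,
then it is created. For example, here is how an awk program can write a
list of BBS names to a file `name-list' and a list of phone numbers to
a file `phone-list'. Each output file contains one name or number per
line.
    $ awk '{ print $2 > "phone-list"
    >        print $1 > "name-list" }' BBS-list
    $ cat phone-list
    -| 555-5553
    -| 555-3412
    ...
    $ cat name-list
    -| aardvark
    -| alpo-net
    ...
print items >> output-file
This type of redirection prints the items into the pre-existing output
file output-file. The difference between this and the single-`>'
redirection is that the old contents (if any) of output-file are not
erased. Instead, the awk output is appended to the file. If output-file
does not exist, then it is created.
print items | command
It is also possible to send output to another program through a pipe
instead of into a file. This type of redirection opens a pipe to
command and writes the values of items through this pipe, to another
process created to execute command. The redirection argument command is
actually an awk expression. Its value is converted to a string, whose
contents give the shell command to be run.
For example, this produces two files, one unsorted list of BBS names
and one list sorted in reverse alphabetical order:
    awk '{ print $1 > "names.unsorted"
           command = "sort -r > names.sorted"
           print $1 | command }' BBS-list
Here the unsorted list is written with an ordinary redirection while
the sorted list is written by piping through the sort utility.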
This example uses redirection to mail a message to a mailing list
`bug-system'. This might be useful when trouble is encountered in an
awk script run periodically for system maintenance.
    report = "mail bug-system"
    print "Awk script failed:", $0 | report
    m = ("at record number " FNR " of " FILENAME)
    print m | report
    close(report)
The message is built using string concatenation and saved in the
variable m. It is then sent down the pipeline to the mail program.
We call the close function here because it's a good idea to close the
pipe as soon as all the intended output has been sent to it. See
section Closing Input and Output Files and Pipes, for more information
on this. This example also illustrates the use of a variable to
represent a file or command: it is not necessary to always use a string
constant. Using a variable is generally a good idea, since awk requires
you to spell the string value identically every time.
Redirecting output using `>', `>>', or `|' asks the system to open a file or pipe only if the particular file or command you've specified has not already been written to by your program, or if it has been closed since it was last written to.
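The following sketch illustrates this rule with a scratch file. The directory comes from mktemp, and the shell variable name dir and the file name log are made up for the demonstration. Within a single run, `>' erases the file only when it is first opened, so both lines survive; a separate run using `>>' then adds to the file instead of replacing it.

```shell
dir=$(mktemp -d)   # scratch directory for the demonstration
# One run: the file is opened (and emptied) once, then both lines are written.
awk -v out="$dir/log" 'BEGIN { print "first" > out; print "second" > out }'
# A later run: `>>' keeps the old contents and appends.
awk -v out="$dir/log" 'BEGIN { print "third" >> out }'
cat "$dir/log"
rm -r "$dir"
```

The file ends up holding the three lines `first', `second', and `third', in that order.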
Many awk implementations limit the number of pipelines an awk program
may have open to just one! In gawk, there is no such limit. You can
open as many pipelines as the underlying operating system will permit.
Special File Names in gawk
Running programs conventionally have three input and output streams already available to them for reading and writing. These are known as the standard input, standard output, and standard error output. These streams are, by default, connected to your terminal, but they are often redirected with the shell, via the `<', `<<', `>', `>>', `>&' and `|' operators. Standard error is typically used for writing error messages; the reason we have two separate streams, standard output and standard error, is so that they can be redirected separately.
In other implementations of awk, the only way to write an error message
to standard error in an awk program is as follows:
    print "Serious error detected!" | "cat 1>&2"
This works by opening a pipeline to a shell command which can access
the standard error stream which it inherits from the awk process. This
is far from elegant, and is also inefficient, since it requires a
separate process. So people writing awk programs often neglect to do
this. Instead, they send the error messages to the terminal, like this:
    print "Serious error detected!" > "/dev/tty"
This usually has the same effect, but not always: although the standard
error stream is usually the terminal, it can be redirected, and when
that happens, writing to the terminal is not correct. In fact, if awk
is run from a background job, it may not have a terminal at all. Then
opening `/dev/tty' will fail.
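gawk provides special file names for accessing the three standard
streams. When you redirect input or output in gawk, if the file name
matches one of these special names, then gawk directly uses the stream
it stands for.
/dev/stdin
The standard input (file descriptor 0).
/dev/stdout
The standard output (file descriptor 1).
/dev/stderr
The standard error output (file descriptor 2).
/dev/fd/N
The file associated with file descriptor N. Such a file must have been
opened by the program initiating the awk execution (typically the
shell). Unless you take special pains in the shell from which you
invoke gawk, only descriptors 0, 1 and 2 are available.
The file names `/dev/stdin', `/dev/stdout', and `/dev/stderr' are
aliases for `/dev/fd/0', `/dev/fd/1', and `/dev/fd/2', respectively,
but they are more self-explanatory.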
The proper way to write an error message in a gawk program is to use
`/dev/stderr', like this:
    print "Serious error detected!" > "/dev/stderr"
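The benefit of keeping the two streams separate shows up as soon as one of them is redirected in the shell. In this minimal sketch the messages are made up, and the shell's `2>/dev/null' discards standard error, so only the normal output remains. It assumes gawk, or a system that provides a real `/dev/stderr' file.

```shell
# The result goes to standard output; the diagnostic goes to standard error,
# which the shell discards here.
awk 'BEGIN { print "result"
             print "warning: check input" > "/dev/stderr" }' 2>/dev/null
```

Only `result' is printed. Had the warning been written to `/dev/tty' instead, the shell redirection would not have affected it.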
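gawk also provides special file names that give access to information
about the running gawk process. Each of these "files" provides a single
record of information. To read them more than once, you must first
close them with the close function (see section Closing Input and
Output Files and Pipes). The filenames are:
/dev/pid
Reading this file returns the process ID of the current process, in
decimal, terminated with a newline.
/dev/ppid
Reading this file returns the parent process ID of the current process,
in decimal, terminated with a newline.
/dev/pgrpid
Reading this file returns the process group ID of the current process,
in decimal, terminated with a newline.
/dev/user
Reading this file returns a single record terminated with a newline.
The fields are separated with blanks. The fields represent the
following information:
$1
The return value of the getuid system call (the real user ID number).
$2
The return value of the geteuid system call (the effective user ID
number).
$3
The return value of the getgid system call (the real group ID number).
$4
The return value of the getegid system call (the effective group ID
number).
If there are any additional fields, they are the group IDs returned by
the getgroups system call. (Multiple groups may not be supported on all
systems.)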
These special file names may be used on the command line as data files,
as well as for I/O redirections within an awk program. They may not be
used as source files with the `-f' option.
Recognition of these special file names is disabled if gawk is in
compatibility mode (see section Command Line Options).
Caution: Unless your system actually has a `/dev/fd' directory (or any
of the other above listed special files), the interpretation of these
file names is done by gawk itself. For example, using `/dev/fd/4' for
output will actually write on file descriptor 4, and not on a new file
descriptor that was dup'ed from file descriptor 4. Most of the time
this does not matter; however, it is important to not close any of the
files related to file descriptors 0, 1, and 2. If you do close one of
these files, unpredictable behavior will result.
The special files that provide process-related information may
disappear in a future version of gawk. See section Probable Future
Extensions.
Closing Input and Output Files and Pipes
If the same file name or the same shell command is used with getline
(see section Explicit Input with getline) more than once during the
execution of an awk program, the file is opened (or the command is
executed) only the first time. At that time, the first record of input
is read from that file or command. The next time the same file or
command is used in getline, another record is read from it, and so on.
Similarly, when a file or pipe is opened for output, the file name or
command associated with it is remembered by awk and subsequent writes
to the same file or command are appended to the previous writes. The
file or pipe stays open until awk exits.
This implies that if you want to start reading the same file again from
the beginning, or if you want to rerun a shell command (rather than
reading more output from the command), you must take special steps.
What you must do is use the close function, as follows:
    close(filename)
or
    close(command)
The argument filename or command can be any expression. Its value must
exactly match the string that was used to open the file or start the
command (spaces and other "irrelevant" characters included). For
example, if you open a pipe with this:
    "sort -r names" | getline foo
then you must close it with this:
    close("sort -r names")
Once this function call is executed, the next getline from that file or
command, or the next print or printf to that file or command, will
reopen the file or rerun the command.
Because the expression that you use to close a file or pipeline must
exactly match the expression used to open the file or run the command,
it is good practice to use a variable to store the file name or
command. The previous example would become
    sortcom = "sort -r names"
    sortcom | getline foo
    ...
    close(sortcom)
This helps avoid hard-to-find typographical errors in your awk
programs.
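Here is a small, self-contained sketch of rerunning a command. The command `echo hello' and the variable names are made up for illustration. Because the command produces only one line, a second getline without the intervening close would find end of input and leave the variable second empty.

```shell
awk 'BEGIN {
    cmd = "echo hello"
    cmd | getline first    # starts the command and reads its only line
    close(cmd)             # forget the finished command
    cmd | getline second   # the command is started again
    print first, second
}'
```

This prints `hello hello'.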
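Here are some reasons why you might need to close an output file:
* To write a file and read it back later on in the same awk program.
  Close the file when you are finished writing it; then you can start
  reading it with getline.
* To write numerous files, successively, in the same awk program. If
  you don't close the files, eventually you may exceed a system limit
  on the number of open files in one process. So close each one when
  you are finished writing it.
* To make a command finish. When output is redirected through a pipe,
  the command reading the pipe normally continues to try to read input
  as long as the pipe is open. Often this means the command cannot
  really do its work until the pipe is closed. For example, if output
  is redirected to the mail program, the message is not actually sent
  until the pipe is closed.
* To run the same program a second time, with the same arguments. This
  is not the same thing as giving more input to the first run! For
  example, suppose you pipe output to the mail program. If you output
  several lines redirected to this pipe without closing it, they make
  a single message of several lines. By contrast, if you close the
  pipe after each line of output, then each line makes a separate
  message.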
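Making a command finish can be sketched with sort, which cannot produce anything until it has seen all of its input. In this example the three input records are made-up sample data and the variable name sorter is illustrative. Closing the pipe in the END rule lets sort finish and write its result before the final line is printed.

```shell
printf 'b\na\nc\n' |
awk '{ sorter = "sort"
       print $1 | sorter }
     END { close(sorter)   # sort sees end of input and writes its output
           print "done" }'
```

The sorted lines `a', `b', and `c' appear first, followed by `done'. Without the call to close, the relative order of the two programs' output would depend on when each one flushes its buffers.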
close returns a value of zero if the close succeeded. Otherwise, the
value will be non-zero. In this case, gawk sets the variable ERRNO to a
string describing the error that occurred.
If you use more files than the system allows you to have open, gawk
will attempt to multiplex the available open files among your data
files. gawk's ability to do this depends upon the facilities of your
operating system: it may not always work. It is therefore both good
practice and good portability advice to always use close on your files
when you are done with them.
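To tie these points together, here is a minimal sketch that writes a scratch file, closes it, and reads it back in the same program, printing the value returned by close along with the number of lines read. The file comes from mktemp, and the names tmp, f, status, and n are made up for the example.

```shell
tmp=$(mktemp)   # scratch file for the demonstration
awk -v f="$tmp" 'BEGIN {
    print "alpha" > f
    print "beta"  > f
    status = close(f)            # zero if the close succeeded
    while ((getline line < f) > 0)
        n++                      # count the lines read back
    close(f)
    print status, n
}'
rm -f "$tmp"
```

On success this prints `0 2': the close returned zero, and both lines were read back from the file.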