This appendix contains information mainly of interest to implementors and
maintainers of gawk. Everything in it applies specifically to gawk, and
not to other implementations.
See section Extensions in gawk Not in POSIX awk, for a summary of the GNU
extensions to the awk language and program. All of these features can be
turned off by invoking gawk with the `--traditional' option, or with the
`--posix' option.
If gawk is compiled for debugging with `-DDEBUG', then there
is one more option available on the command line:
-W parsedebug
--parsedebug
This option is intended only for serious gawk developers, and not for the
casual user. It probably has not even been compiled into your version of
gawk, since it slows down execution.
gawk
If you should find that you wish to enhance gawk in a significant
fashion, you are perfectly free to do so. That is the point of having
free software; the source code is available, and you are free to change
it as you wish (see section GNU GENERAL PUBLIC LICENSE).
This section discusses the ways you might wish to change gawk, and any
considerations you should bear in mind.
You are free to add any new features you like to gawk. However, if you
want your changes to be incorporated into the gawk distribution, there are
several steps that you need to take in order to make it possible for me to
include your changes.
gawk. If your version of gawk is very old, I may not be able to integrate
them at all. See section Getting the gawk Distribution, for information on
getting the latest version of gawk.
gawk. (The GNU Coding Standards are available as part of the Autoconf
distribution, from the FSF.)
gawk coding style.
The C code for gawk follows the instructions in the GNU Coding Standards,
with minor exceptions. The code is formatted using the traditional "K&R"
style, particularly as regards the placement of braces and the use of
tabs. In brief, the coding rules for gawk are:
int, on the line above the line with the name and arguments of the function.
if, while, for, do, switch and return).
for loop initialization and increment parts, and in macro bodies.
NULL and '\0' in the conditions of if, while and for statements, and in the
cases of switch statements, instead of just the plain pointer or character
value.
TRUE, FALSE, and NULL symbolic constants, and the character constant '\0'
where appropriate, instead of 1 and 0.
gawk, I may not bother.
gnu@prep.ai.mit.edu.
gawk source tree with your version.
(I find context diffs to be more readable, but unified diffs are
more compact.)
I recommend using the GNU version of diff.
Send the output produced by either run of diff to me when you
submit your changes.
See section Reporting Problems and Bugs, for the electronic mail
information.
Using this format makes it easy for me to apply your changes to the
master version of the gawk source code (using patch).
If I have to apply the changes manually, using a text editor, I may
not do so, particularly if there are lots of changes.
Although this sounds like a lot of work, please remember that while you may write the new code, I have to maintain it and support it, and if it isn't possible for me to do that with a minimum of extra work, then I probably will not.
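The coding rules listed earlier in this section can be illustrated with a short sketch. The function count_fields is hypothetical and is not part of gawk. TRUE and FALSE are defined locally only so that the example is self-contained, and the prototype-style function header is an assumption made so that the code compiles with current compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Defined here only to keep the sketch self-contained. */
#ifndef TRUE
#define TRUE	1
#define FALSE	0
#endif

/* count_fields --- hypothetical helper: count blank-separated words in str */

int
count_fields(const char *str)
{
	int count = 0;
	int in_field = FALSE;	/* symbolic constant instead of 0 */
	const char *cp;

	assert(str != NULL);	/* explicit comparison against NULL */

	/* explicit comparison against '\0', not just the plain character */
	for (cp = str; *cp != '\0'; cp++) {
		if (*cp == ' ' || *cp == '\t')
			in_field = FALSE;
		else if (! in_field) {
			in_field = TRUE;
			count++;
		}
	}
	return count;
}
```

The return type sits on its own line above the function name, the braces follow the "K&R" layout, indentation uses tabs, and there is a space between each control keyword and its parenthesis. A call such as count_fields("a b  c") counts three words.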
Porting gawk to a New Operating System
If you wish to port gawk to a new operating system, there are
several steps to follow.
gawk, and the other ports. Avoid gratuitous changes to the
system-independent parts of the code. If at all possible, avoid sprinkling
`#ifdef's just for your port throughout the code.
If the changes needed for a particular system affect too much of the
code, I probably will not accept them. In such a case, you will, of course,
be able to distribute your changes on your own, as long as you comply
with the GPL (see section GNU GENERAL PUBLIC LICENSE).
gawk are maintained by other people at the Free Software Foundation. Thus,
you should not change them unless it is for a very good reason. That is,
changes are not out of the question, but changes to these files will be
scrutinized extra carefully. The files are `alloca.c', `getopt.h',
`getopt.c', `getopt1.c', `regex.h', `regex.c', `dfa.h', `dfa.c',
`install-sh', and `mkinstalldirs'.
gawk on their systems. If no one volunteers to maintain a port, that port
becomes unsupported, and it may be necessary to remove it from the
distribution.
gawk for your system.
Following these steps will make it much easier to integrate your changes
into gawk, and have them co-exist happily with the code for other
operating systems that is already there.
In the code that you supply, and that you maintain, feel free to use a coding style and brace layout that suits your taste.
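The advice about not sprinkling `#ifdef's through the code can be sketched as follows. This is only an illustration under stated assumptions: the macro MY_NEW_OS, the macro PORT_DIR_SEP and the function port_basename are all hypothetical names, not part of gawk. The point is that the conditional appears once, in a system-dependent layer, and the rest of the code calls the function without knowing which system it runs on.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical system-dependent layer.  In a real port this part would
 * live in its own file, so the #ifdef appears once instead of at every
 * place that handles file names.
 */
#ifdef MY_NEW_OS		/* hypothetical macro naming the new system */
#define PORT_DIR_SEP	'\\'
#else
#define PORT_DIR_SEP	'/'
#endif

/* port_basename --- return the part of path after the last separator */

const char *
port_basename(const char *path)
{
	const char *cp;

	assert(path != NULL);

	cp = strrchr(path, PORT_DIR_SEP);
	if (cp != NULL)
		return cp + 1;
	return path;	/* no separator: the whole string is the name */
}
```

System-independent code would then call port_basename(filename) directly, with no conditional compilation at the call site.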
AWK is a language similar to PERL, only considerably more elegant.
Arnold Robbins
Hey!
Larry Wall
This section briefly lists extensions and possible improvements
that indicate the directions we are currently considering for gawk.
The file `FUTURES' in the gawk distributions lists these extensions
as well.
This is a list of probable future changes that will be usable by the
awk language programmer.
gawk print its warnings and error messages in languages other than
English. It may be possible for awk programs to also use the multiple
language facilities, separate from gawk itself.
awk array.
PROCINFO Array
gawk) may be superseded by a PROCINFO array that would provide the same
information, in an easier to access fashion.
lint warnings
gawk to the array ENVIRON may be propagated to subprocesses run by gawk.
This is a list of probable improvements that will make gawk perform better.
dfa
dfa pattern matcher from GNU grep has some problems. Either a new version
or a fixed one will deal with some important regexp matching issues.
mmap
mmap system call, its use would provide much faster file input, and
considerably simplified input buffer management.
malloc
malloc could potentially speed up gawk, since it relies heavily on the use
of dynamic memory allocation.
rx regexp library
rx regular expression library could potentially speed up all regexp
operations that require knowing the exact location of matches. This
includes record termination, field and array splitting, and the sub,
gsub, gensub and match functions.
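To show why mmap could simplify input buffer management, here is a minimal sketch, assuming a POSIX system. It is not gawk's input code: the function count_records_mmap is hypothetical, and it only counts newline-terminated records. Once the file is mapped, the whole input is visible as one array, so no read loop or buffer refilling is needed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* count_records_mmap --- count newline-terminated records, -1 on error */

long
count_records_mmap(const char *filename)
{
	int fd;
	struct stat sbuf;
	char *base;
	const char *cp, *end;
	long count = 0;

	assert(filename != NULL);

	fd = open(filename, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &sbuf) < 0) {
		close(fd);
		return -1;
	}
	if (sbuf.st_size == 0) {	/* zero-length mappings are rejected */
		close(fd);
		return 0;
	}

	base = mmap(NULL, (size_t) sbuf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping remains valid after close */
	if (base == MAP_FAILED)
		return -1;

	/* the entire file is now addressable as one contiguous buffer */
	end = base + sbuf.st_size;
	for (cp = base; cp < end; cp++)
		if (*cp == '\n')
			count++;

	munmap(base, (size_t) sbuf.st_size);
	return count;
}
```

A real implementation would also need a fallback to ordinary reads for pipes, terminals and other inputs that cannot be mapped.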
Here are some projects that would-be gawk hackers might like to take
on. They vary in size from a few days to a few weeks of programming,
depending on which one you choose and how fast a programmer you are. Please
send any improvements you write to the maintainers at the GNU project.
See section Adding New Features, for guidelines to follow when adding new
features to gawk.
See section Reporting Problems and Bugs, for information on
contacting the maintainers.
awk programs: gawk uses a Bison (YACC-like)
parser to convert the script given it into a syntax tree; the syntax
tree is then executed by a simple recursive evaluator. This method incurs
a lot of overhead, since the recursive evaluator performs many procedure
calls to do even the simplest things.
It should be possible for gawk to convert the script's parse tree
into a C program which the user would then compile, using the normal
C compiler and a special gawk library to provide all the needed
functions (regexps, fields, associative arrays, type coercion, and so
on).
An easier possibility might be for an intermediate phase of awk to
convert the parse tree into a linear byte code form like the one used
in GNU Emacs Lisp. The recursive evaluator would then be replaced by
a straight line byte code interpreter that would be intermediate in speed
between running a compiled program and doing what gawk does now.
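The byte code idea can be sketched with a toy stack machine. This is only an illustration under stated assumptions: the opcodes, the struct instr layout and the function run_bytecode are hypothetical and do not reflect any actual gawk or Emacs Lisp instruction set. It shows the key property, namely that the program is a flat array walked by a single loop, rather than a tree walked by recursive procedure calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; not an actual gawk instruction set. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
	enum opcode op;
	double value;		/* operand, used only by OP_PUSH */
};

#define STACK_MAX	64

/* run_bytecode --- execute a linear program, return the final stack value */

double
run_bytecode(const struct instr *code)
{
	double stack[STACK_MAX];
	int sp = 0;		/* number of values currently on the stack */
	const struct instr *pc;

	assert(code != NULL);

	/* one loop, one switch: no recursion per expression node */
	for (pc = code; pc->op != OP_HALT; pc++) {
		switch (pc->op) {
		case OP_PUSH:
			assert(sp < STACK_MAX);
			stack[sp++] = pc->value;
			break;
		case OP_ADD:
			assert(sp >= 2);
			sp--;
			stack[sp - 1] += stack[sp];
			break;
		case OP_MUL:
			assert(sp >= 2);
			sp--;
			stack[sp - 1] *= stack[sp];
			break;
		default:
			assert(0);	/* unknown opcode */
		}
	}
	assert(sp == 1);	/* a well-formed program leaves one result */
	return stack[0];
}
```

Under this scheme the expression 1 + 2 * 3 would be flattened to the sequence PUSH 1, PUSH 2, PUSH 3, MUL, ADD, HALT, which the loop evaluates to 7.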