We believe that our needs can be met most effectively by capitalizing on software that has already been developed wherever possible. To that end, we make regular use in our operation of several pieces of freely-available software, mostly from the GNU Project of the Free Software Foundation. Because this software is not purchased, we see no conflict with the AO recommendation against local purchases of UNIX software. The free software we use, while free, is not "public-domain." The authors or their organizations do maintain copyright and, in the case of the FSF, explicitly use licensing terms to insist that the software remain freely available and that source code for it can always be obtained. These restrictions are fine with us.
The availability of source for the freeware we use has allowed us, in a few cases, to find a freeware utility close to what we need and make it exactly what we need, in a fraction of the time it would have taken us to develop it from scratch. We maintain our modifications in the form of patch files (see below) to the original freeware source. Where we have added more than one feature, each feature we have added is in its own patch file. We also create separate patch files of any platform-specific changes we made to get the utility to work under ISC or Unisys UNIX. What we will make available to other courts is the original, unmodified source code, plus our various patch files.
A patch file is a listing of the changes made to one or more source files, and it's easy for a human to read as well as for the machine to work from. We maintain the freeware source as "original + patch files" because the patches keep our changes cleanly separated from the original code, document exactly what we changed, and allow individual features to be applied or omitted independently.
The freeware utilities we have collected, and our reasons for using each one, are described below.
GNU diff does everything the standard diff command does, and also has a -c (context) option.
With this option, the diff listing produced includes a
few lines of context above and below every change.
It is this
added information that allows the patch program above to
apply our patches in various orders, or mixed and matched, when
the actual line numbers of changes could vary depending on the
other patches previously installed.
GNU diff also includes
other useful features not offered by the standard diff; see the
man page.
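The diff -c / patch round trip can be sketched like this (the file names are invented for the demonstration; note that diff exits nonzero whenever the files differ):

```shell
# Make an "original" and a locally-modified copy (names invented).
mkdir -p demo && cd demo
printf 'one\ntwo\nthree\n' > util.c.orig
printf 'one\nTWO\nthree\n' > util.c

# Record the modification as a context diff; the context lines are what
# let patch locate the change even when line numbers have shifted.
diff -c util.c.orig util.c > feature1.patch || true   # diff exits 1 on differences

# Re-apply the feature to a fresh copy of the original.
cp util.c.orig rebuilt.c
patch rebuilt.c < feature1.patch
cmp rebuilt.c util.c && echo "patch applied cleanly"
```

Because patch matches on the surrounding context rather than on absolute line numbers, per-feature patch files made this way can be applied in varying orders.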
Our only changes to diff are to accommodate the platform.
ddd is a freeware rewrite of dd, used mainly to copy data between various files and devices using specified block sizes.
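The basic job is the familiar one the stock dd does (the output path here is invented for the demonstration):

```shell
# Copy with an explicit block size: 4 blocks of 512 bytes each.
dd if=/dev/zero of=blocks.out bs=512 count=4 2>/dev/null
wc -c < blocks.out    # 2048 bytes written
```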
The author wanted a faster dd, and got it by using a
double-buffer scheme.
We were looking for a dd that could
handle running into the end of a tape and allow the data to
continue on a new volume.
Certain programs, like cpio,
have this ability built in, but by teaching ddd this trick, we can
use it with any other program whether designed for multiple
volumes or not.
Also, programs like cpio have a fixed method
of prompting for the next volume (typically a question at the
terminal).
Our version of ddd will use any given shell
procedure to indicate the need for a new volume, and determine
when to proceed.
This allows us to build procedures that run
in background or not directly attended.
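Such a procedure need be nothing more than a small script. This sketch waits for an operator on its standard input; the script name and the zero-exit-means-proceed convention are our assumptions for illustration, since the actual calling convention is local to our ddd:

```shell
# nextvol.sh -- a sketch of an end-of-volume procedure a ddd-like copier
# might invoke (hook name and exit convention assumed for this example).
cat > nextvol.sh <<'EOF'
#!/bin/sh
# Announce the volume change, then wait for the operator's go-ahead.
echo "End of volume reached; mount the next tape and press Return."
read reply
exit 0          # zero exit: proceed with the new volume
EOF
chmod +x nextvol.sh

echo "" | ./nextvol.sh    # simulate the operator pressing Return
```

Because the procedure reads its go-ahead from wherever it chooses, the same mechanism supports an unattended job that, say, waits on a stacker instead of a terminal.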
ddd does not include all the advanced features of dd. The author wanted simple and fast. We liked the simplicity since it was easy to add the end-of-volume checks we wanted. At present, ddd can be fast or multivolume but not both; if you use the ieof= or ofull= multivolume options, ddd reverts to single-buffering to sidestep the complexities of synchronizing two processes when one of them hits the end of a volume. The task is not impossible in theory and we will probably eliminate the restriction at some point. But at that time we will probably discard ddd and transplant our modifications to the GNU dd, which is fully-featured, better written, and part of a set of GNU utilities we're using anyway.
We got ddd before we started using patch,
and our
modifications
to ddd are not as carefully classified as the changes we've
made to other freeware.
pax
understands both the
tar and
cpio -c archive formats.
We were interested in extracting some summary information from our old civil/criminal archive tapes, which consisted of files stored within tar archives nested within cpio archives. We wanted to scan these files and extract information from the tape "on the fly," without first extracting the files to disk. To do so required modifying a program that understands cpio format, and one that understands tar format. Finding pax, which understands both, we accomplished our goal with one program and one modification. Our general-purpose modification to pax allows files within any tar or cpio archive to be processed on the fly by any desired combination of UNIX commands. The commands have access to the names and other status information concerning the files being processed.
We also added the complementary ability to pax; it can filter each individual file through an arbitrary bunch of UNIX commands on the fly while writing an archive. In this mode, pax reads standard input for not just filenames, but complete specifications of file name and status in a special format, which GNU find and gawk can provide. This gives complete control over what the headers in the archive will say about each file.
As a separate modification to pax, we added some privilege checks to be made if pax is running as the superuser for a user who is not. This allows us to make a separate, setuid copy of pax for use in our backup procedure, so an otherwise-nonprivileged operator can back up the system without logging in as the superuser. pax detects this situation and drops its privilege for certain actions the backup operator could otherwise exploit. The setuid copy of pax, called bakpax, is not executable by any user except through our backup front-end, which enforces further restrictions.
We also corrected some minor mistakes in the original code, and corrected the blocking of data to conform to the ISO 9945 standard.
GNU find is a replacement for the standard find tool, and provides a lot of added features,
including full
regular expression
matching, timestamp testing to the minute,
determining whether files have been read since last inode-change,
finer control over the portions of the directory tree to
search, and the ability to print any combination of desired
information
about matching files in any desired format.
This
last capability is used by pax to create customized
command sequences for processing archive files on the fly.
We added the
-abstract option allowing GNU find to apply any of
its tests and actions to "hypothetical" files as well as real
ones, allowing other programs like pax to take
advantage of find's capabilities.
We also added privilege checks allowing us to use a setuid copy of GNU find in our backup procedure. The setuid find allows the nonprivileged operator to scan the entire filesystem, passing the file names to the setuid pax described above, which can archive the files. Like pax, find recognizes this situation and drops its privilege for certain operations an operator could exploit. Also, like bakpax, the setuid copy of find (we call it gnufind) is executable only through our backup front-end program, which enforces further restrictions.
find / -whatever -print0 | xargs -0 something
will do the same thing as your usual find | xargs, but will
work right even if some file names contain newlines.
(We find
we have a surprising number of those in users' civ/crim report
directories.)
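The difference is easy to demonstrate (the directory and file name here are created just for the example; -print0 and xargs -0 are the GNU extensions discussed above):

```shell
dir=$(mktemp -d)
touch "$dir/bad
name"                 # a single file whose name contains a newline

find "$dir" -type f -print | wc -l     # reports 2 "files" -- fooled
find "$dir" -type f -print0 |
    xargs -0 sh -c 'echo $#' sh        # reports 1 -- correct
```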
gawk has a few features not found in the
standard System V awk,
but its greatest benefit to us may be
simply that it's different.
The AT&T awk, while ordinarily
quite adequate in speed, uses an algorithm somewhere with a
pathological worst case which can be exercised (we know!) by
certain programs doing repeated associative array operations.
On large files, these programs start fast and rapidly
deteriorate, sometimes processing only a few records per
minute.
All AT&T-derived awks I've seen do this.
Since the
FSF had to write gawk without reference to the AT&T source,
they picked a different algorithm or implementation that simply
doesn't have that problem.
An awk program we wrote to convert
our final Courtran docket tapes, which ran for hours without
making a dent in the first tape, did the entire set in 27
minutes running under gawk.
Happily, gawk also stacks up pretty well on speed for programs
that don't exercise the bug in awk.
gawk also tries to support
unlimited output redirections, which is a handy feature, and we
made two minor modifications so that works right.
We have also
found and
patched
several other flaws in gawk that caused it to
behave differently from nawk in certain situations, or to consume
excessive amounts of memory.
As far as adding brand-new features to the language--it's tempting. On the other hand, part of awk's appeal has always been its simplicity and high power-to-manual-weight ratio. If an everything-and-the-kitchen-sink-too language is desired, there's always perl. So we've settled on four new additions to the gawk language, which we feel are simple, general, applicable to a wide variety of problems, and consistent with awk's mission. We'll probably take two of them back out later, replacing them with a single more-general feature to cover them both.
The two most significant are probably
coprocesses and the exec() builtin.
With these, just about any other special requirement can be met by setting
up appropriate coprocesses rather than by further extending the awk
language.
For example, the original awk had no easy way to make a string
upper or lower case, so SysV.4 awk
extended the language with the toupper()
and tolower() builtins.
The extensions would probably not have been
necessary had it been easy to set up a coprocess that execs
tr, feed it
strings as desired, and read what comes back.
That's exactly the
kind of thing made possible
by our coprocess and exec() additions.
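Our coprocess and exec() syntax is local to our modified gawk, but the flavor of the idea can be shown with the one-way command pipe that standard awk already has; a true coprocess additionally lets the program read tr's answer back into a variable:

```shell
echo 'motion to dismiss' | awk '{
    cmd = "tr a-z A-Z"   # the helper command doing the case mapping
    print $0 | cmd       # feed the line to tr
    close(cmd)           # flush the pipe; tr output appears on stdout
}'
# prints: MOTION TO DISMISS
```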
Another addition we've been using widely is a cal() builtin for doing
calendar calculations.
We found a great piece of freeware by W.M. McKeeman
that converts dates from the Gregorian calendar to Julian day numbers and
back, valid from 1582 until way in the future, added the ability to deal
with the UNIX date/time structures, and built the whole thing in for any
gawk program to use as a built-in function.
It can be used to test the
validity of an entered date, calculate the interval between two dates,
ensure that one date precedes or follows another, etc.
All of our
locally-developed ICMS reports, among other things, now use this feature
of gawk to check date parameters in the requests.
The scripts are now
simpler than they were before, despite much more thorough validity tests.
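The cal() interface itself is local to our gawk, but the Gregorian-to-Julian-day-number arithmetic underneath can be sketched in portable awk (this is the standard Fliegel-Van Flandern method; it reproduces the day number 2449047 that egreg reports for 28 February 1993):

```shell
awk 'function jdn(y, m, d,    a) {
    a = int((m - 14) / 12)          # -1 for Jan/Feb, 0 for Mar-Dec
    return int(1461 * (y + 4800 + a) / 4) \
         + int(367 * (m - 2 - 12 * a) / 12) \
         - int(3 * int((y + 4900 + a) / 100) / 4) \
         + d - 32075
}
BEGIN {
    print jdn(1993, 2, 28)                    # 2449047
    # intervals become simple subtraction, with no month/year wrapping:
    print jdn(1993, 3, 1) - jdn(1993, 2, 28)  # 1
}'
```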
The last function we added to gawk
is for dealing with file information.
When we modified
pax and
GNU find to work with each other as coprocesses,
we did it by defining a compact way of representing
all the information known about a file,
and teaching both pax and find to exchange file
information in that format.
The statxr() builtin in gawk allows gawk
programs also to understand that format, and examine individual file
status fields by name without the usual bother of trying to parse the
output of
ls
or what not and wondering what happens if a file turns
up with newlines or special characters in its name.
To avoid starting to turn gawk into perl, we did not
add any facility for
gawk itself to stat a file or
change a file's status.
statxr() simply
provides a way to reliably and simply exchange complete file information
with other programs such as find and pax.
Even so, statxr scores lowest
of our four modifications on the scale of generality and usefulness as a
part of gawk.
We may at some future time find a way to move it back out
of gawk and
provide the same ability with some combination of a more
general-purpose builtin and a separate program that can be used as a
coprocess.
When we do that, we can probably handle cal() the same way.
In developing a significant body of code in gawk 2.13, we have discovered a few more bugs that we have not yet tracked down or corrected.
Version 2.13 is by now a pretty old gawk, and they've probably fixed many of the bugs in current versions. The catch is that porting our coprocess, etc., modifications to a newer gawk would take some time, so it hasn't been undertaken to date.
I
have recently been learning Tcl
which seems to be able to do a lot of what we've been using our modified
gawk to do, and possibly better. I got into it because it was
included with SCO OpenServer 5, but it is also freely available, so I obtained
the source and built it for the Interactive systems as well.
The CGI scripts for our www-based documentation are largely written in Tcl,
as are the scripts that merge our live and archived index information for
use by the public access clients.
It seems to be more solid than gawk, and might be a better choice for future
scripts.
The AT&T SysV.3
mkdir and
mv commands (that we know of)
contain flaws in the testing of permissions.
Both commands
were setuid in earlier versions of UNIX and needed to explicitly
check the access of the user's real ids.
Both commands are no longer setuid in this version of UNIX, which provides new system calls that eliminate the need; but the update was incomplete, and the commands still fail to base their access checks on the user's effective ids.
As a
result, these commands will often refuse (with a "Permission
denied" message) to create a directory or rename a file where
the real user could not, even though the effective user can.
The situation crops up frequently in applications front-ended
by invoke.
The GNU versions of these utilities simply use
the appropriate system calls and let the kernel take care of
access checking rather than trying to second-guess it; they
work fine and completely eliminate the problem.
When making a named pipe (FIFO) with the AT&T
mknod, the
FIFO gets created with the real rather than effective ids, and
can be inaccessible to the process that created it.
The GNU
mkfifo command uses the effective ids, in accordance with the
POSIX practice, eliminating the problem.
egreg converts dates to Julian day numbers and back. To find yesterday's date, for example, use egreg to get the Julian day, subtract 1, and use egreg again to get the corresponding date.
Your script
logic does not need to worry about wrapping around the beginning
of the month, week, or year.
McKeeman's original program is called greg; egreg
is our
"extended"
greg, which has one added feature in that it can do
a (simplistic but effective) search for the latest date satisfying
other specified fields; this simplifies finding the last day or week
of a month or year, etc.
This is triggered simply by putting a
too-big value in one of the fields; to find the last date in February
1993, say 'egreg 1993 2 32' and it will reply '1993 2 28 1 5 59 2449047'.
(It's a Sunday(1), week 5 of the year, day 59 of the year.)
The above is a cop-out approach to one of a wide variety of possible
date calculation problems.
Since McKeeman designed greg to act as a
logic programming predicate, it would be ideal to incorporate it into
an LP language and be able to say "find me a date satisfying these
arbitrary constraints." Anyone know of a freeware Prolog?
In the meantime, incorporating egreg into any programming language
would be more flexible than using it as a clumsy external command
from shell scripts.
My current thinking is to eventually graft it
into gawk.
greg's capabilities have since been grafted into gawk as the cal() builtin function.
We no longer use egreg in
any new development--gawk is used instead--
and egreg is still sitting around only to satisfy any scripts we
developed earlier and haven't gotten to updating yet.
Our biggest reason for building bash (confession time) was to compile cat into it as a builtin command.
We had a big old
hairy script
intended to process a ton o' files on the fly from a tape archive,
and building cat into the shell reduced its
fork/exec
overhead by a factor of about 11.
Caution: there are some reasons you might not want to make bash the only shell on your system. For starters, compare the size of the executable to the size of /bin/sh. Also, the : command in bash always has a 'success' exit status, whereas in sh and ksh it reflects the status of any command substitution in that simple-command. Many of our scripts depend on that and, therefore, break when run under bash.
less is a replacement for more.
This one is very feature-rich and
has been tested long and hard on many different terminal types.
Our
purpose in building it was to replace the page
program that comes with
CHASER, since that program hasn't worked well on most of the terminals and
emulators we have.
By using less, we can also provide a CHASER 'view
recent' option that allows paging back, and case-insensitive searches so
a user needn't worry about whether to search for order or ORDER.
All of the key mappings and prompts in less
can be completely redefined,
so we were able to give it a new personality pretty similar to stock CHASER.
So widely-used and mature is less that it compiled the first time with no local changes and not even a warning. (OK, for SCO it needed one #include adjustment.) The install script offers to build a version of less without shell or editor escapes, etc. For PACER we certainly don't want those escapes, but they can be disabled just as effectively by using -k with a key-definition file that invalidates those keys, and that way we can have both the original less functionality and the PACER personality with only one executable.

To simplify making up new less personalities, we divided our PACER key definitions into two files. The first, nokeys.less, simply invalidates every single keystroke sequence built into less (verified by checking the source where the default command table is created). If you invoke less with only this definition file, you will be able to do absolutely nothing, not even quit. Then another key-definition file can be created containing just the key definitions desired for the new personality. Multiple -k options are allowed, and later ones take precedence, so by invoking less -k nokeys.less -k mykeys.less you get a brand-new personality.

gzip is a freely-available replacement for compress. There are reports of a legal opinion that anyone who uses compress is infringing with each use. gunzip/zcat can uncompress files in gzip's own format, compress format, or pack format, since the uncompression algorithms are not patented, and it automatically detects which format it is trying to uncompress. The algorithm used by gzip often yields better compression than compress, and gzip also includes a CRC, allowing it to determine whether a file has been corrupted. Files compressed with gzip seem to consistently begin with the bytes 1f 8b 08 00, and we have used this observation to update the magic file on some of our systems so the file tool can identify such files. This program autoconfigures so well that it needed absolutely no local changes to compile and run on any of our systems.
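The magic-file entry involved looks something like this (the exact magic syntax varies a bit between file implementations, so treat this as a sketch; \037\213 is octal for the 1f 8b signature, and \010 is the deflate method byte):

```
0	string	\037\213\010	gzip compressed data
```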
We modified sz to read a file already open on the
specified handle;
the command line argument specifies only the name to be
passed to the receiving system.
This allows a script to control the name
under which a file will be received without having to worry about what the
file is named on the system of origin.