Eastern Michigan ICMS Docket Fiche system

The ICMS Docket Fiche system in use at the Eastern District of Michigan was initially developed in May of 1992 to address four concerns:

  1. The court had by that time performed several archiving sessions on the ICMS database, generating a separate docket fiche file for each; no effort had been made to create a single, merged fiche file. Merging several fiche files--known to have some unquantified overlap of cases--into a single file containing the most up-to-date version of each case promised to be non-trivial.

  2. The court desired a nightly-updated complete collection of docket fiche, including archived ICMS and old Courtran dockets as well as live data; it seemed likely that the stock ICMS update process, working with a single large fiche file of the projected size, would be prohibitively expensive.

  3. After the initial merge of existing fiche files, the first automatic update from live data was expected to require several nights of processing time. A solution was desired that would be tolerant of interruption and able to resume, in a later run, where it left off. The stock ICMS software maintains a small file (fichelastdate) containing the date of the last run. Each run regenerates all dockets touched after that date, assuming that all earlier dockets have already been generated (i.e., that the last update always ran to completion). This would not meet our needs; it would also be prone to misbehavior should the fichelastdate file for any reason contain the wrong date.

  4. A design was wanted where any individual docket could be updated and replaced very inexpensively. This would allow the court to implement a public access/docket lookup system without an information currency problem: the vast majority of dockets could be retrieved immediately, while a docket for a case touched that day could be updated on the fly and then retrieved.

The design chosen was a directory structure of individual docket files, each one compressed (.Z). The UNIX timestamp on each file reflects the currency of that docket. The directory structure has a few levels of branching based on parts of the case number, limiting the number of entries in any single directory and thus reducing directory search time. The structure can be split over two or more filesystems if the number of docket files is expected to exceed the limit of roughly 65,000 i-nodes in one filesystem.
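
As a purely hypothetical illustration of such a layout, the following sketch maps a case number to a path; the root directory and the choice of branching fields here are invented for the example, not taken from the actual split.awk:

```shell
#!/bin/sh
# Hypothetical sketch only: the real tree's root and branching fields
# are defined by split.awk and may differ from what is shown here.
docket_path() {
    office=$1; year=$2; type=$3; num=$4
    # Branch on the two-digit year and the last two digits of the
    # case number, keeping any one directory to a modest size.
    last2=`expr "$num" : '.*\(..\)$'`
    echo "/fichetree/$year/$last2/$office-$year$type$num.Z"
}

docket_path 2 93 cv 71234    # -> /fichetree/93/34/2-93cv71234.Z
```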

The solution implemented includes the following components:

  1. A way to do the initial conversion of one or more existing fiche files into the fiche tree format. split.awk reads an existing fiche file, as created by the stock ICMS software, on standard input; its output is a cpio -c archive representing the final tree structure, in which each docket file is compressed and its timestamp reflects the as-of date. This output can be un-cpioed (preserving modification times) to produce the fiche tree itself.

    If this process is repeated for several existing fiche files, and the proper options are given to cpio (or pax) to update existing files only if their timestamps are older, the result will be a merged fiche tree containing the most recent version of each docket, as required in (1) above.

    The output of split.awk can be piped directly into cpio (or pax); there is no need for enough storage to hold the intermediate cpio archive. If the existing fiche file is on tape, the data flows directly from tape to the final compressed files in the fiche tree. The intermediate storage used is on the order of the size of the largest single docket.
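
    The conversion and merge might look something like the following sketch. split.awk is the real script named above, but the fiche file and tree paths are invented, and the exact cpio flags should be checked against the local scripts:

```shell
#!/bin/sh
# Hypothetical sketch of the initial conversion described above.
merge_fiche_file() {
    fichefile=$1; tree=$2
    # split.awk writes a cpio -c archive on stdout.  Extracting with
    # -d (create directories) and -m (preserve modification times),
    # and WITHOUT -u, means an archive member replaces an existing
    # file only when the member is newer -- which is exactly the
    # "keep the most recent version of each docket" merge rule.
    awk -f split.awk < "$fichefile" | (cd "$tree" && cpio -icdm)
}

# Repeated over several archive-session fiche files, e.g.:
#   merge_fiche_file fiche.1992a /fichetree
#   merge_fiche_file fiche.1992b /fichetree
```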

  2. A way to do the nightly automatic maintenance. The getnewer executable directly compares the last_update for each live case to the timestamp of the corresponding fiche file, producing the case numbers of the cases that need updating. The maintain script feeds these case numbers to the stock dkt.rpt, whose output, in the format of a fiche archive, is then split and un-cpioed just as in (1). These processes run concurrently using pipes; here again, negligible intermediate storage is needed.

    As regards the performance concern (2), we find this implementation to perform adequately; quantitative details are given under Performance below.

    Because each file's timestamp is considered individually, the process does not involve any separate file with a last-run date, and it is not prone to the misbehavior possible if such a file were incorrect. If the process is killed, its next run will pick up only those cases that still require updating. This satisfies the robustness consideration (3). If a court is turning on automatic updates for the first time and many dockets must be generated, it is perfectly legitimate to let the process run until it gets in the way, kill it, and repeat on subsequent nights until a run produces no further updates. Once the vast majority of dockets have been brought up to date, ongoing nightly maintenance runs are brief. Should any fiche file disappear for any reason, it is simply replaced on the next run.

    Because updating a single docket requires overwriting just one small file, single-case updates on the fly are practical (4 above).
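
    getnewer itself is a compiled program, but the per-case test it performs can be sketched in shell, modeling the live record's last_update as a file timestamp; the names here are illustrative only:

```shell
#!/bin/sh
# Illustration of the comparison getnewer makes for each live case:
# the case needs an update if its fiche file is missing, or if the
# live record is newer than the fiche file.
needs_update() {
    live=$1; fiche=$2
    [ ! -f "$fiche" ] && return 0      # no docket file yet
    [ "$live" -nt "$fiche" ]           # live data newer than docket
}
```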

  3. A way to generate a fiche file in the original ICMS format from an existing fiche tree. The vendorrun script allows the court to produce a tape for output to microfiche whenever needed, in the same format that would be produced by the stock software. Choices such as sort order and placement of breaks can be made at vendorrun time and changed from one run to the next; they are not tied to a fixed ordering within an existing fiche file.

  4. A daemon to implement a Simple Docket Lookup Protocol over a network connection. Given a suitable client, sdlpd allows any computer on the DCN to request dockets. These are retrieved in real time from the fiche tree. SDLP requests are done by case number; the client is not expected to know anything about the structure of the fiche tree. The SDLP server also knows how to look up party names and case titles in the inverted files used by PACER/CHASER if these are available.

    One sample UNIX client, sdlpinq, is included; its user interface closely resembles that of PACER/CHASER flatinq.

  5. A daemon to process requests for on-the-fly updates to individual docket files. The ondemand script has its own respawn entry in /etc/inittab so it is always present and reading a FIFO in the root of the docket tree. Another process, such as sdlpd, requests an update by simply writing the case office, year, docket type, and number on the FIFO, and ondemand takes care of firing up the maintain script (2 above) to update the file.

    If an error in the database has been corrected, a DBA can echo the affected case number onto this FIFO to cause the docket file to be replaced with a corrected version.
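
    A request of this kind amounts to one line written on the FIFO. The FIFO's path and the exact field order below are assumptions for illustration, so check the local ondemand script before relying on them:

```shell
#!/bin/sh
# Hypothetical: the FIFO location and the field order are illustrative,
# not taken from the actual ondemand script.
FIFO=${FIFO:-/fichetree/update.fifo}

request_update() {
    # office, year, docket type, case number -- one request per line
    echo "$1 $2 $3 $4" > "$FIFO"
}

# e.g., after a DBA corrects case 2:93-cv-71234 in the database:
#   request_update 2 93 cv 71234
```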

Performance

While performance was one of the design concerns for this system, we can state no conclusions on how it compares to the stock ICMS fiche software. We have used our system exclusively since 1992, first on the U5000/95, later on the 486. Our only experience with the stock software has been on the U5000/90, pre-1992, and with only partial fiche files. We have never been adventuresome enough to try it on our full, merged fiche collection.

We can provide data on the size and configuration of our system at the time of writing, and performance figures for our own implementation; perhaps another court with figures for the stock software can then make a rough comparison.

The Eastern District of Michigan maintains a complete fiche tree for purposes of public access: all dockets, live or archived, which have ever been on-line. In the summer of 1993, a local project converted all Courtran criminal dockets received on tape from DC into compatible formats and merged them into this tree.

The tree is split over two filesystems on a single Micropolis 1528 disk on a Dell 486/33SE configured with three such disks. These are 1 KB filesystems: every file occupies an integer multiple of two 512-byte blocks (1 KB). The loss of usable space when small files are stored in larger fixed allocation units is known as internal fragmentation.

Our tree contains about 94,000 individual dockets and consumes about 266 MB total, including space lost to internal fragmentation. The total uncompressed size, for comparison with a stock ICMS fiche file, is roughly 536 MB.

As of revision 3.2 of split.awk, the Free Software Foundation gzip is used rather than compress to compress the docket files. This has improved the compression ratio. Because gzip can transparently uncompress the older compressed files as well as gzipped files, we made no effort to recompress existing files at the time of the change. Rather, those dockets which have been created or updated since the change are gzipped, while many other files in the tree remain compressed. Therefore, our overall compression ratio is better than when only compress was in use, but not as good as would likely be observed in a court that uses the 3.2 or higher split.awk from the outset.
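
This transparency means lookup code never needs to know which compressor produced a given file; for example (the path shown is hypothetical):

```shell
#!/bin/sh
# gzip -dc uncompresses both compress-format (.Z) and gzip-format
# files, so a reader can treat the mixed tree uniformly.
show_docket() {
    gzip -dc "$1"
}

# e.g.: show_docket /fichetree/93/34/2-93cv71234.Z | pg
```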

Each filesystem is configured with 420,000 512-byte blocks and 65,488 (the maximum) i-nodes. The number of blocks was chosen according to the ratio (about 6.4 blocks per i-node) that roughly described our fiche tree in practice at the time the filesystems were set up. The figure is now closer to 5.7 blocks per i-node, attributable to two factors:

  1. the advent of 3.2 split.awk, and
  2. the recent restoration of several thousand older dockets, many of which are smaller than average.

If we converted all files to gzip format, the figure would likely drop further.
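
For the record, the ratios quoted above work out as follows (awk used simply as a calculator):

```shell
# Reproducing the quoted figures from the configuration and size data:
awk 'BEGIN {
    printf "%.1f blocks/i-node configured\n", 420000 / 65488
    printf "%.0f%% of uncompressed size on disk\n", 100 * 266 / 536
}'
# prints: 6.4 blocks/i-node configured
#         50% of uncompressed size on disk
```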

Of the 94,000 dockets in the tree, a typical nightly maintenance run updates 600 to 1000, and completes in an hour and a half to three hours, varying somewhat with the load of batched ICMS reports and other competition.

A vendorrun is an infrequent operation and has not been carefully timed; it seems to take two or three hours on an unloaded system, with output going directly to an Exabyte 8500 SCSI 8mm tape.

Interactive performance of the SDLP server and client when looking up individual dockets varies with network speed, and mostly reflects the time to transmit the compressed docket. Within our own building LAN, a typical docket is retrieved in under a second. The current server and client are both written in an interpreted language and have not been heavily optimized for speed; nor is that a priority, as we find current performance acceptable.