Literate Programming

Literate programming is the practice of combining a program's source code with its documentation so that a single source can both be published as a high-quality document and serve as the actual basis of the computer program. The field overlaps with digital typesetting and document markup, since it necessarily combines a programming language with some sort of document markup system. While many different programming languages have been integrated into literate programming systems, the dominant markup system has been Knuth's TeX typesetting system.

The original and canonical literate programming system is WEB, written by the originator of the concept, Dr. Donald Knuth. This system, which combines the Pascal programming language with the TeX typesetting system, is how TeX itself was written. A number of derivative literate programming systems share the idea that the system should be language-aware in its handling of the programming language at hand. Of these, my personal preference is John Krommes's FWEB system, which handles multiple languages, including a verbatim mode, in the same web. The author's documentation is on line (as of 5 DEC 99) at http://w3.pppl.gov/~krommes/fweb_toc.html . The package itself, including some pre-compiled binaries, is available by FTP from ftp.pppl.gov/pub/fweb.

One of the limitations of language-aware literate programming systems is that a new set of processors must be created or adapted for each programming language used. This is not a trivial process, although systems such as Norman Ramsey's Spider WEB generator are intended to make it easier. Another approach is to ignore the programming language in use. Such language-independent literate programming systems do not offer the language-specific typesetting of code in the documentation that their language-aware cousins provide. But they can be used with any programming language, and some programmers prefer that the typeset documentation retain the indentation and appearance of the input file. My personal preference from this class of literate programming systems is Preston Briggs's Nuweb system, although a certain amount of adaptation is required to get it fully functional on non-Unix systems. A source for Nuweb is the CTAN archive network, specifically the web/nuweb/nuweb0.87b/ directory. Nuweb is currently being maintained on sourceforge; the version there appears intended for Unix systems.
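The language-independent approach can be illustrated with a minimal sketch of a tangle step, written here in Python. The chunk notation it reads (`<<name>>=` to open a chunk, `@` alone on a line to close it, `<<name>>` to refer to another chunk) is a hypothetical one chosen for this example, not the actual syntax of WEB, FWEB, or Nuweb, and the function names are likewise illustrative. Because the code never inspects the chunk contents, it works for any programming language and carries the indentation of each reference through to the output.

```python
import re

# Hypothetical chunk notation (an assumption for this sketch, not the
# syntax of WEB, FWEB, or Nuweb):
#   <<name>>=   starts a code chunk called "name"
#   @           alone on a line ends the chunk
#   <<name>>    alone on a line inside a chunk refers to another chunk
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+)>>\s*$")


def parse_chunks(source):
    """Collect code chunks by name; text outside chunks is documentation."""
    chunks = {}
    current = None
    for line in source.splitlines():
        if current is None:
            start = CHUNK_START.match(line)
            if start:
                current = start.group(1)
                # A name defined more than once accumulates its lines.
                chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        else:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent="", seen=()):
    """Expand chunk `name` recursively, keeping each reference's indentation."""
    if name in seen:
        raise ValueError(f"circular chunk reference: {name}")
    if name not in chunks:
        raise KeyError(f"undefined chunk: {name}")
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            nested_indent = indent + ref.group(1)
            out.extend(tangle(chunks, ref.group(2), nested_indent, seen + (name,)))
        else:
            # Leave blank lines blank rather than padding them with spaces.
            out.append(indent + line if line else line)
    return out


# A tiny literate source: prose interleaved with named chunks.
WEB = """\
The program prints a greeting.

<<main>>=
def main():
    <<body>>

main()
@

The body is a single call.

<<body>>=
print("hello")
@
"""

if __name__ == "__main__":
    print("\n".join(tangle(parse_chunks(WEB), "main")))
```

A weave step would go the other way, passing the prose through to the markup system and setting each chunk verbatim. That is why systems of this kind keep the appearance of the input file rather than pretty-printing the code.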

SGML-based Literate Programming Systems

There is some ongoing work on creating literate programming systems that exploit SGML markup technologies. Eliot Kimber has put together some basic work using SGML architectural forms to allow DSSSL scripts to be documented, and Norman Walsh has exploited this same capability to permit DocBook-based documents to be used as DSSSL source. This approach has some limitations as a literate programming system, but it offers an interesting hint of the capabilities inherent in an SGML-based system. There is also a working group on XML-based literate programming, although I am not aware of its status. C.M. Sperberg-McQueen has done some basic work in SGML-based literate programming, although he considers the work unpublished and incomplete. While SGML/XML-based systems may be the next logical step in literate programming, I am not currently aware of any functional systems using SGML as their basis.

Partially because of this, and partially to improve my capabilities in the DSSSL programming language, I have done some work on an SGML-based system for literate programming. While it is not complete or completely functional, the "proof of principle" draft is available as a Nuweb-based literate program, in either PDF or the Nuweb source. 

DBLP

The proof of principle experiment mentioned above led me to build a DocBook-based SGML DTD for literate programming, together with DSSSL-based weave and tangle implementations. More information on this, and the unique files needed for the system, is available here.

References and Further Reading

Donald E. Knuth. Literate Programming (CSLI Lecture Notes Number 27). Center for the Study of Language and Information, Stanford University: Stanford CA, 1992. ISBN 0-9370-7380-6. This book compiles a variety of documents by Knuth on the subject of literate programming, including what I believe to have been the first significant published article on the subject (in The Computer Journal, published by the British Computer Society). Since Knuth invented literate programming, this book provides a fascinating look at his thinking on the subject.

Wayne Sewell. Weaving a Program: Literate Programming in WEB. Van Nostrand Reinhold, New York: 1989. ISBN 0-442-31946-0. As one might guess from the title, this is a book about programming with the original WEB system, but it also includes a useful overview of other language-aware systems and a discussion of the issues involved in porting WEB-based systems to other languages. Appendices include the complete source code to the TANGLE and WEAVE processors (which between them make up the original WEB system).

There is an Internet newsgroup on this subject, comp.programming.literate, which publishes a FAQ. An old version of this FAQ is on line at file://rtfm.mit.edu/pub/usenet-by-group/comp.programming.literate/comp.programming.literate_FAQ .

There is a nice introduction to literate programming and literate programming tools at http://vasc.ri.cmu.edu/old_help/Programming/Literate/literate.html.





Last modified: Fri Nov 23 09:32:37 Pacific Standard Time 2001