DBLP: DocBook-Based Literate Programming

Mark Wroth

No charge is made for the use of this system. A nominal charge for reproduction of the media on which the system is provided may be levied, however.
The complete literate programming source of the system is made available to the user.
Modifications of this system are clearly identified as derivative works. Provision must be made to identify the author of the changes, and to provide error reports to the modifier rather than the original author.

Works created using this system (as opposed to modifications of this system itself) are subject to whatever copyright or use provisions may be imposed by the authors of those works; the use of this system shall not be the basis for any change or modification of the rights of such authors.

The author of this system, in permitting its general use, makes no warranty of its suitability for any purpose nor any guarantee of its correctness. In other words, you are welcome to use this system, but you do so at your own risk.

Table of Contents

1. Functional Description

1.1. Purpose of the Functional Description

1.2. Background

1.2.1. Literate Programming
1.2.2. SGML and XML
1.2.3. DocBook

1.3. Objectives

1.4. Existing Methods and Procedures

1.5. Proposed Methods and Procedures

1.5.1. Summary of Improvements
1.5.2. Assumptions and Constraints

1.6. Design Considerations

1.6.1. System Functions
1.6.2. Flexibility

1.7. Environment

1.7.1. Equipment Environment
1.7.2. Support Software Environment
1.7.3. System Development Plan

2. DTD Implementation

2.1. Purpose

2.2. Top Level Organization

2.2.1. The programlisting Customization
2.2.2. The literalchar element

3. SGML Tangle

3.1. Purpose
3.2. Implementation
3.3. Implementation Notes

4. SGML Weave

4.1. Purpose

4.2. Minimum DocBook Customization Layers

4.3. The literalchar Processing Rules

4.4. The programlisting Customizations

4.4.1. Print Customizations
4.4.2. HTML Customizations
4.4.3. The programlisting Proper
4.4.4. HTML Auxiliary Definitions
4.4.5. Original programlisting Code
4.4.6. Original xref HTML Code

5. Sample Literate Program

5.1. The Sample Document

6. System Performance

6.1. Sample Code Output
6.2. Sample Woven Output
6.3. Evaluation

A. Acronyms

B. Bibliography


C. Installation and Usage Tips

Chapter 1. Functional Description

The DocBook-based Literate Programming system provides a mechanism to write literate programs using a minor extension of the SGML DocBook DTD.

The system consists of two main parts:

A DTD that extends DocBook to add the logic needed for literate programming. These are relatively minor extensions to the basic DTD. The details are discussed in Chapter 2.
DSSSL style sheets that, together with a DSSSL engine that implements some of James Clark's extensions, serve as tangle and weave processors. These style sheets are discussed in Chapter 3 and Chapter 4, respectively.

This document also discusses the design considerations behind the implementation, and provides a short sample document that serves as an example of how the DTD is used (and serves as a simple test case).

1.1. Purpose of the Functional Description

This functional description for DocBook-based Literate Programming is written to provide:

The system requirements to be satisfied which will serve as a basis for the system design.
Information on performance requirements, preliminary design considerations, and user impacts including fixed and continuing costs.
A basis for development of system tests.

1.2. Background

1.2.1. Literate Programming

Literate programming is a style of computer programming pioneered by Professor Donald Knuth in the early 1980's (the defining paper was published in The Computer Journal in May 1984, at which point the earliest literate programming system, WEB, was already functional).

The key tenet of literate programming is that computer programs are written to be read and understood by human beings as well as by computers, and that the organization of the program's source code should allow the program author to explain the purpose and implementation of the code to the human audience. In Knuth's own words, "Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do."[1]

To this end, a literate programming system typically supports several functions:

Mechanisms to extract or otherwise make available to the computer the code instructions making up the computer program in a form usable by the computer (e.g. in a form suitable for submission to the compiler), and to translate the source code and documentation into appropriately rendered documentation.
The ability to write code documentation using the full range of typographic features used in normal typesetting.
The ability to arbitrarily intermingle code and documentation.
The ability to arbitrarily reorder code fragments so that the exposition of the program design may proceed in an order appropriate for the human reader, while retaining the instruction order required for the computer to correctly execute the program.

The process of making the code instructions available to the computer is called the tangle phase of processing, while the process of rendering the documentation is called the weave phase. These names derive from the names of the two programs making up the original WEB system that performed these functions.

Some literate programming processors, including Knuth's original WEB system, additionally "pretty print" the programming language instructions (for example, by setting the reserved words of the language in boldface type). This pretty print capability is not universal, and in fact is not desired by some programmers. Another capability sometimes found is the ability to define macros in the literate programming system, usually to supplement the capabilities of the underlying programming language.

Since the introduction of Knuth's WEB system, which was used for the TeX and METAFONT programs, a variety of other literate programming systems have appeared, as have a number of other mechanisms for improved commenting of source code[2]. A complete review of the available systems is beyond the scope of this discussion. However, experience with the original WEB system, John Krommes' FWEB, and Preston Briggs' Nuweb literate programming systems, the doc LaTeX 2e documentation system, and Norman Walsh's DocBook-based DSSSL style sheet documentation systems (both of which may be characterized as improved commenting systems) has been a significant factor in defining the characteristics desired in this system.

1.2.2. SGML and XML

The Standard Generalized Markup Language (SGML) is defined in ISO Standard ISO 8879:1986, and defines a mechanism for defining markup languages and enforcing certain relationships among the data contained in appropriately marked up documents. Among other things, SGML provides a clear mechanism for making explicit what parts of a document have what role, and does so in a way that encourages the construction of tools able to parse such documents—and hopefully do useful things with the parsed information.

In the late 1990s SGML was supplemented by XML, a simplified subset of SGML designed to make it easier to construct parsers and other processing tools. In general, XML has the same basic functionality as SGML, with some of the lesser-used features of the language omitted. Tools that can process SGML documents can also usually process XML documents; it is not generally the case that an XML tool can process a general SGML document.

1.2.3. DocBook

One of the useful features of SGML is the ability to create document type definitions (DTD) that describe the structure of documents and the markup that makes that structure explicit. This permits the creation of general document types—and the tools to process them.

One such document type is DocBook, a document type created for computer-oriented technical books. DocBook is maintained by OASIS, and has evolved into a flexible and robust document definition, supported by a variety of tools. DocBook exists in both SGML and XML versions.

In particular, Norman Walsh has defined a set of DSSSL style sheets that process DocBook documents and produce printed and HTML output renderings. These style sheets are both extensible and customizable, and serve as a significant base for computer-oriented documentation—such as a literate program.

1.3. Objectives

This project creates a set of extensions to the DocBook SGML DTD to allow its use for literate programming markup.

The resulting system shall

Provide a mechanism to extract program files from the literate programming source in appropriate forms for their use as source code in the intended programming language or languages.
Permit the use of existing DocBook-based tools with only minor modifications (ideally none) to produce documentation of software projects.

1.4. Existing Methods and Procedures

There are a variety of literate programming systems in use at the current time. In general, they can be described in three main categories:

Language-aware systems. These systems are designed to support a single computer programming language, and are marked by the ability to do limited parsing of code sections, usually accompanied by "pretty printing" of the computer source code. Knuth's original WEB system falls into this category. Most language-aware systems use TeX as their typesetting system.
Language-independent systems. These systems attempt no parsing of the code sections. Most language-independent systems use TeX as their typesetting systems, although there is some move towards HTML as a documentation language.
Comment-based systems. These systems extend the comment structures of the supported language in an attempt to provide usable documentation. Examples of such systems are the doc system used to document LaTeX 2e packages, and—I believe—the Javadoc system.

Most or all of these existing systems target a specific output format, usually the printed page, rendered via the TeX typesetting system. In part, this is a historical accident; the first literate programming system used TeX. However, attempts to implement other documentation languages (notably an attempt to write a C-language literate programming system with troff as the documentation processor) indicate that the demands on the documentation branch of a literate programming system are relatively taxing. This has undoubtedly contributed to the limited number of output forms supported, although some attempts to support HTML have been made.

However, I am not aware of any literate programming systems based on SGML markup of the source code[3]. This omission seems unfortunate, given the obvious applicability of SGML markup to the process of defining a literate program.

Since the original creation of DBLP at least two XML-based literate programming systems have been developed. Rafael R. Sevilla (<sevillar@team.ph.inter.net>) has developed an implementation that is functionally an addition to DocBook or other systems, implemented using XML namespaces. This system is at http://xml-lit.sourceforge.net/. Separately, Norman Walsh has reimplemented a literate programming system as part of the DocBook CVS source on SourceForge.

1.5. Proposed Methods and Procedures

This project implements an SGML markup-based literate programming system. Literate computer programs are written using some form of text editor (preferably, but not necessarily, an SGML-aware editor). The documentation and programming language code are marked up using an extension of the DocBook DTD. Once the literate program is written (partially or completely), it is processed using a variant of the Jade DSSSL engine with either a tangle style sheet (a stand-alone style sheet provided by this project) or a weave style sheet (which extends the DocBook Modular Style Sheets in several small but important ways).

In principle, the DTD defined here could be used by other programs implementing the weave or tangle functionality. This would be entirely in keeping with the principles of SGML. In particular, an extension of this system to XML and re-implementation of the processors in XSL seems a natural extension to this work. However, only the DSSSL implementations mentioned above are implemented as part of this work.

1.5.1. Summary of Improvements

Creating an SGML-based literate programming system makes it possible to exploit the wide variety of SGML (or, with minor variations, XML) tools. In particular, this makes it straightforward to produce different output formats, such as a printed version or an HTML version.

The use of SGML also separates the definition of the document markup from the definition of the processing tools. In principle, this allows many different tools to be used with literate programs written with this system. This advantage is largely theoretical at this point, however.

1.5.2. Assumptions and Constraints

It is assumed that the user of this system is familiar with the use of SGML-based tools, and the DocBook DTD.

The processing DSSSL style sheets assume the presence of selected DSSSL extensions implemented in James Clark's Jade engine (specifically the entity and formatting-instruction flow objects). This capability is necessary to use the specific style sheets provided by this project, but the use of the DTD is not affected.

1.6. Design Considerations

Minimize changes to the DocBook DTD to allow processing using existing output tools.
Keep the existing programlisting as the basis for a scrap (again to minimize changes needed in the output processors).
Use added attributes for file output, definition and reference to continued and continuation scraps. Use of both continued and continuation markup is semantically redundant, but will make processing of the weave branch easier—I think.
Use xref for reference to definition scraps, and the xreflabel attribute for the definition scrap title. Again, this is driven by a desire to minimize changes that would impact the output processing tools.
Make maintaining, modifying, and adding this functionality to other document types as easy as possible (see Section 1.6.2).
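
A sketch of how these markup choices might look in a document instance, using the attribute names defined in Chapter 2. The file name, ID values, and shell fragment are hypothetical, chosen only for illustration. The first scrap opens an output file and refers to a definition scrap through xref; the second carries the definition and its title in xreflabel.

```sgml
<programlisting id="greet.file" file="greet.sh">
#!/bin/sh
<xref linkend="greet.body">
</programlisting>

<programlisting id="greet.body" xreflabel="Print the greeting">
echo "hello"
</programlisting>
```

Both scraps remain programlisting elements joined by a standard xref, so existing DocBook style sheets should need little adjustment to render them.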

1.6.1. System Functions

This system performs three basic functions:

A DTD that allows the markup of literate programs, including a flexible system for describing the purpose and implementation of the computer program (based on DocBook) and markup of the program code itself to allow the literate program to produce the computer instructions.
A tangle mechanism that actually produces the computer instructions from the literate programming source code. This implementation, SGMLTangle.dsl, is a DSSSL style sheet using extensions to the DSSSL standard as implemented in James Clark's Jade DSSSL engine.
A weave implementation that renders the literate programming source into useful documentation. This style specification, SGMLWeave.dsl, extends Norman Walsh's Modular DocBook Style Sheets. It provides both print and HTML output.

1.6.2. Flexibility

It is the intention of this system to:

Maintain the ability to update the extensions to new versions of DocBook as they are published.
Make the extensions as easy as possible to move between the SGML and XML versions.
Make it as simple as possible to add the literate programming functionality to other DocBook-based DTDs.
Provide a basis on which other implementations of the tangle and weave functions could be built to support other tool chains.

1.7. Environment

1.7.1. Equipment Environment

The hardware required to use this system is defined by the selected software tools (in particular the SGML processor). Development and testing were accomplished on 32-bit Windows-based computers.

1.7.2. Support Software Environment

Effective use of this system requires three major classes of supporting software. Except as noted with regard to the tangle application, it is not necessary that the actual supporting software used in the development of this system be available.

1.7.2.1. Programming Editor

Some form of text editor is needed to write the literate program. The minimum necessary functionality is the ability to write a plain text output file.[4]

The authoring process will be much easier, however, if the editor supports both SGML markup and the programming language or languages in which the code is being written. A customizable editor such as emacs is probably a useful choice. Most of the work on the DBLP system was done with emacs and the PSGML major mode.

1.7.2.2. SGML Processors

The literate programming DTD extends the DocBook DTD, and therefore requires the underlying DTD. This implementation specifically uses the SGML version of the DocBook DTD, Version 4.1, as the basis for its extension.

To use this system for literate programming without additional customization, a DSSSL engine that implements James Clark's entity and formatting-instruction extensions to the DSSSL standard is needed. The Jade or OpenJade engines meet this requirement.

The weave style sheet is based on Norman Walsh's Modular DocBook Style Sheets [[WALSH-DSSSL]], which must be available if the SGMLWeave.dsl style specification is used.

As an SGML application, of course, documents marked up with this DTD can, in principle, be used by any SGML-compliant tool.

While this DTD and the associated DSSSL style sheets were written for the SGML version of the DocBook DTD, it should require only minor changes to re-implement this system in XML. This has not, however, been tested.

1.7.3. System Development Plan

This system was originally implemented using the Nuweb literate programming processor, which is a TeX-based system. The choice to implement the system in this way was a "bootstrapping" choice; until this system achieved basic functionality, it would be difficult to implement the system in an SGML-based system.

Once basic functionality has been demonstrated, this system will be published on the World-Wide Web. Future modifications and extensions may or may not be made, depending on the author's use of the system and feedback from other users (if any).

Chapter 2. DTD Implementation

2.1. Purpose

This DTD extends DocBook 4.1 to allow literate programs to be written using the DTD.

2.2. Top Level Organization

<dblp.dtd (ID: SCRAP00)>=

<!--
DBLP.DTD; a literate programming DTD based on DocBook
PUBLIC
"-//Mark Wroth//DTD DocBook V4.1-Based Extension Literate Programming 1.1//EN"
-->
<The `literalchar' element>
<Add programlisting attributes>
<!ENTITY % programlisting.element "IGNORE">
<!ENTITY % docbook PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
%docbook;
<Redefine the programlisting element>

We put the literalchar element definitions first, so that entity definitions will override those declared by DocBook itself, if necessary.

2.2.1. The programlisting Customization

We will set up the programlisting element to ignore the DocBook defined definition and substitute our own. We override the original definition primarily to include the literalchar element; otherwise there are no changes to the basic element definition.

<Redefine the programlisting element (ID: SCRAP01)>=

<!ELEMENT programlisting - -
  ((CO | LineAnnotation | literalchar | %para.char.mix;)+)>

To the attribute list, we add the attributes that we will use for literate programming:

file: The file name the code is to be written to. This attribute is required for the scraps which begin output files.
continuedfrom: The ID of the scrap this scrap continues.
continuedin: The ID of the scrap that continues this scrap.

We also make use of several of the attributes already defined for programlisting:

ID: Unique identifier for this scrap; required for scraps which are continued, continue others, or are the head of a definition section.
xreflabel: The title of the scrap; used for the head of a definition scrap. This fact is used in a number of places in the processing, and those additional uses depart somewhat from the standard DocBook uses. In particular, this implies that no continuation scrap shall have an xreflabel attribute.

<Add programlisting attributes (ID: SCRAP02)>=

<!ENTITY % local.programlisting.attrib "
 file          CDATA  #IMPLIED -- file name for output file --
 continuedfrom IDREF  #IMPLIED
 continuedin   IDREF  #IMPLIED ">

The choice to use both a continuedfrom and a continuedin attribute allows the creation of a doubly-linked list for each code section. This permits the programmer to order the code scraps in any desired sequence, while permitting the processing system to easily traverse the scraps that make up the section.

While one or the other direction of the links between scraps is semantically redundant in an absolute sense (with adequate effort, the backward links could be constructed from the forward set, and vice-versa), including both sets of links makes construction of the processing tools much simpler.
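
A minimal sketch of the doubly-linked markup, assuming a hypothetical two-scrap shell script (the file name, IDs, and code are illustrative only). Each scrap names its neighbour: continuedin points forward and continuedfrom points back.

```sgml
<programlisting id="setup1" file="setup.sh" continuedin="setup2">
#!/bin/sh
name="world"
</programlisting>

<para>Commentary between the two scraps goes here.</para>

<programlisting id="setup2" continuedfrom="setup1">
echo "hello, $name"
</programlisting>
```

Starting from the scrap that carries the file attribute, a processor can follow continuedin to collect the code in order; starting from a later scrap, it can follow continuedfrom back to the first scrap to recover the name of the section.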

2.2.2. The literalchar element

The following definitions are used to provide a workaround to get an actual "less than" character into the SGML output. Since the character has syntactic meaning to the SGML parser, by default it is escaped when placed in the SGML output as character data.

By defining an element to contain the required information, we let the DSSSL processor have access to it. Defining entity references to it simplifies the actual data entry. If particular combinations seem appropriate for a specific programming language it would make sense to define entities which make syntactic sense. This would allow one to use, for example &logicaland; instead of &&[5].

<The `literalchar' element (ID: SCRAP03)>=

<!ELEMENT literalchar  - o EMPTY
       -- literal data, to be handled in the DSSSL -->
<!ATTLIST literalchar data CDATA #REQUIRED>
<!ENTITY  lessthan  "<literalchar data='&#60;'>"
       -- ``less than'' sign-->
<!ENTITY  greaterthan  "<literalchar data='>'>"
       -- ``greater than'' sign-->
<!ENTITY  ampersand "<literalchar data='&#38;'>"
       -- ``ampersand'' sign-->

Continued in SCRAP04

We also define entities that map to the relevant SGML markup, for use when we are defining SGML systems.

<The `literalchar' element (ID: SCRAP04)>+=

<!ENTITY  STAGO  "<literalchar data='<'>"
       -- SGML Tag Open (``less than'' sign) -->
<!ENTITY  TAGC  "<literalchar data='>'>"
       -- SGML Tag Close (``greater than'' sign) -->
<!ENTITY  ERO "<literalchar data='&'>"
       -- SGML Entity Reference Open (``ampersand'' sign) -->
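
As an illustration of how these entities might be used in a document instance, the sketch below marks up a C range check. The logicaland entity is hypothetical: it is not declared by the DTD and is added here in the document's internal subset, following the suggestion made earlier in this section. The article document element, file name, and ID are likewise assumptions, as is a catalog that resolves the public identifier.

```sgml
<!DOCTYPE article PUBLIC
  "-//Mark Wroth//DTD DocBook V4.1-Based Extension Literate Programming 1.1//EN" [
<!-- Hypothetical convenience entity; not part of the DTD -->
<!ENTITY logicaland "<literalchar data='&#38;&#38;'>">
]>
<article>
<title>Range check</title>
<programlisting id="range.file" file="range.c">
int in_range(int x, int lo, int hi)
{
  return (lo &lessthan; x) &logicaland; (x &lessthan; hi);
}
</programlisting>
</article>
```

Each entity reference expands to a literalchar element, so the style sheets see the character through the data attribute rather than as markup.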

Chapter 3. SGML Tangle

3.1. Purpose

The SGMLTangle style sheet performs the tangle phase of literate programming. In other words, it takes the literate programming code and rearranges it so that it is acceptable to the computer as a computer program (assuming, of course, that the programmer has correctly written the program!)

3.2. Implementation

In order to make it as simple as possible to bring this system up on a new machine, this style sheet is presented largely as a single scrap. Notes on the program follow the implementation scrap.

<SGMLTangle.dsl (ID: SCRAP05)>=

<!DOCTYPE style-sheet
  PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">
<style-sheet>
<style-specification
 id = "tangle">
<style-specification-body>

 (declare-flow-object-class entity
   "UNREGISTERED::James Clark//Flow Object Class::entity")
 (declare-flow-object-class formatting-instruction
   "UNREGISTERED::James Clark//Flow Object Class::formatting-instruction")

 (default (process-node-list (element-children)))

 (element programlisting
  (make sequence
   (if (attribute-string "file")
     (make entity
       system-id: (attribute-string "file")
       (make sequence
         (process-children)
         (if (attribute-string "continuedin")
           (with-mode continuation
             (process-element-with-id
               (attribute-string "continuedin")))
           (empty-sosofo))))
      (empty-sosofo))))

 (element (programlisting xref)
   (with-mode definition
     (process-element-with-id
       (attribute-string "linkend"))))

 (mode definition
   (element programlisting
     (make sequence
       (process-children)
       (if (attribute-string "continuedin")
           (with-mode continuation
             (process-element-with-id
               (attribute-string "continuedin")))
           (empty-sosofo)))))

 (mode continuation
   (element programlisting
     (make sequence
       (process-children)
       (if (attribute-string "continuedin")
           (with-mode continuation
             (process-element-with-id
               (attribute-string "continuedin")))
           (empty-sosofo)))))

 (element literalchar
  (make sequence
   (make formatting-instruction
     data: (attribute-string "data"))))

 <Define element-children function>

</style-specification-body>
</style-specification>
</style-sheet>

The (element-children snl) function finds all of the elements that are direct children of singleton nodelist snl. This is useful for filtering, processing, and counting, when you're not interested in any text, PI, or comment nodes. This function (and the default processing rule) were provided by Christopher R. Maden <crism@maden.org> in a message on the DSSSList dated Mon, 02 Apr 2001 22:36:33 -0700, and titled "Re: (dsssl) "Default" processing rule?".

<Define element-children function (ID: SCRAP06)>=

 (define (element-children #!optional (snl (current-node)))
   (select-by-class (children snl)
                    'element))

3.3. Implementation Notes

Chapter 4. SGML Weave

4.1. Purpose

The SGMLWeave program (SGMLWeave.dsl) is a relatively minor customization of Norman Walsh's DSSSL stylesheets for DocBook. In fact, for minimum functionality, the only necessary change to the stylesheets is the addition of a processing rule for the literalchar element, and that processing rule (shown in Section 4.3) is relatively simple.

This is in fact a design goal of the DocBook-based literate programming system, as it exploits the significant efforts of a number of people to develop tools for DocBook.

4.2. Minimum DocBook Customization Layers

We now create three related DSSSL style sheets, designed to make it as easy as possible to use the weave functions.

The first of these is a master style sheet that uses Jade's ability to include multiple style sheets in a single document, selecting among them from the command line. This version, in SGMLWeave.dsl, includes a print style sheet and an HTML style sheet.

<SGMLWeave.dsl (ID: SCRAP07)>=

<!--
PUBLIC "-//Mark Wroth//DOCUMENT DBLP Weave Print Rules 1.0//EN"

This document is intended to be a minimum DocBook style-sheet
customization layer.
-->
<!DOCTYPE style-sheet
  PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
 <!ENTITY dbprint.dsl PUBLIC
   "-//Norman Walsh//DOCUMENT DocBook Print Stylesheet//EN"
   CDATA DSSSL>
 <!ENTITY dbhtml.dsl PUBLIC
   "-//Norman Walsh//DOCUMENT DocBook HTML Stylesheet//EN"
   CDATA DSSSL>
]>
<style-sheet>
<style-specification id="print" use="dbprint">
<style-specification-body>

<literalchar print processing rule>
<programlisting print customizations>

</style-specification-body>
</style-specification>
<style-specification id="html" use="dbhtml">
<style-specification-body>

<literalchar print processing rule>
<programlisting HTML customizations>

</style-specification-body>
</style-specification>
<external-specification id="dbprint" document="dbprint.dsl">
<external-specification id="dbhtml" document="dbhtml.dsl">
</style-sheet>

The second style sheet is a redaction of the master sheet. It contains only the print style sheet, and is written to print.dsl. The primary motivation for having this separate file is that it is easier to write higher level customization layers for single purpose style sheets—at least at this author's level of expertise!

<print.dsl (ID: SCRAP07A)>=

<!--
PUBLIC "-//Mark Wroth//DOCUMENT DBLP Weave Print Rules 1.0//EN"

This document is intended to be a minimum DocBook style-sheet
customization layer.
-->
<!DOCTYPE style-sheet
  PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
 <!ENTITY dbprint.dsl PUBLIC
   "-//Norman Walsh//DOCUMENT DocBook Print Stylesheet//EN"
   CDATA DSSSL>
]>
<style-sheet>
<style-specification id="print" use="dbprint">
<style-specification-body>

<literalchar print processing rule>
<programlisting print customizations>

</style-specification-body>
</style-specification>
<external-specification id="dbprint" document="dbprint.dsl">
</style-sheet>

The third style sheet is also a redaction of the master sheet. It contains only the HTML style sheet, and is written to html.dsl. The primary motivation for having this separate file is, as with print.dsl, that it is easier to write higher level customization layers for single purpose style sheets.

<html.dsl (ID: SCRAP07B)>=

<!--
PUBLIC "-//Mark Wroth//DOCUMENT DBLP HTML Print Rules 1.0//EN"

This document is intended to be a minimum DocBook style-sheet
customization layer.
-->
<!DOCTYPE style-sheet
  PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
<!ENTITY dbhtml.dsl PUBLIC
   "-//Norman Walsh//DOCUMENT DocBook HTML Stylesheet//EN"
   CDATA DSSSL>
]>
<style-sheet>
<style-specification id="html" use="dbhtml">
<style-specification-body>

<literalchar print processing rule>
<programlisting HTML customizations>

</style-specification-body>
</style-specification>
<external-specification id="dbhtml" document="dbhtml.dsl">
</style-sheet>

4.3. The literalchar Processing Rules

<literalchar print processing rule (ID: SCRAP08)>=

(element literalchar
  (make sequence
    (literal (attribute-string "data"))))

4.4. The programlisting Customizations

The basis of our customization is the existing programlisting code from [[WALSH-DSSSL]dbverb.dsl]. We will modify this by adding information ahead of the basic element (by identifying the scrap by either the file name or the definition name, depending on the type of scrap), and adding information after the program listing (if the scrap is continued).

4.4.1. Print Customizations

Almost everything about the print stylesheets will remain untouched. However, there are a few changes that need to be made to the processing of the programlisting element. The programlisting element itself gets some additional information added to it based on its place in the literate programming web, and the special use of xref means that we should wrap the cross reference text in some punctuation to indicate that it is a code section reference rather than actual code.

<programlisting print customizations (ID: SCRAP09)>=

<Customize the programlisting proper>
<Customize the (programlisting xref)>
<Print auxiliary definitions>

4.4.1.1. The programlisting Proper

There are essentially two sets of changes we want to make to the processing of the programlisting element: adding header information identifying the code section and the particular scrap, and adding continuation information if the scrap has a continuation.

Mechanically, we do this by wrapping the original code (taken from [[WALSH-DSSSL]dbverb.dsl]) inside a make sequence and adding the code to produce the header and continuation text before and after the original code.

<Customize the programlisting proper (ID: SCRAP10)>=

    (element programlisting (make sequence
      <Provide scrap header information>
      <Original programlisting print code>
      <Provide optional continuation info>
    ))

If the code section is part of an output literate program, the scrap header information is straightforward except for the actual title of the code section. That title is either the file name of the disk file the section will be written to, or the xreflabel of the section. In either case, the title is found in the first scrap of the section, which may or may not be the scrap we are currently processing.

We handle this with a set of nested if expressions: if we are in the first scrap of the section, we take the title directly from its attributes; otherwise we process the continuedfrom scrap in a special mode that recursively hunts back up the linked list until it reaches the first scrap.[6]

The other refinement is that we will set the header and continuation information in a specific font and size intended to be distinct from the general font and size used in the document. For ease of reference, these are defined in the auxiliary information.

The other potential case is that this code section is not part of a literate program. In that case, no scrap header (or continuation information) is desired. In most cases, literate programming scraps are easy to identify from the fact that they have attributes not found in non-literate programming scraps (specifically the "file" and "continuedfrom" attributes). There is, however, one ambiguous case: a scrap which contains an entire definition section has no attributes unique to the literate programming use. In this case, we will rely on the presence of the "xreflabel" attribute, although there are circumstances where this will produce errors; for example, an ordinary programlisting that happens to carry an xreflabel will be given a scrap header.

<Provide scrap header information (ID: SCRAP11)>=

    (if (or (attribute-string "file")
            (attribute-string "continuedfrom")
            (attribute-string "xreflabel"))
        <Write the scrap header>
        (empty-sosofo))

<Write the scrap header (ID: SCRAP11A)>=

    (make paragraph
      (make sequence
        font-family-name: scrap-header-font
        font-size:        scrap-header-size
        (literal "\left-pointing-angle-bracket")
        (if (attribute-string "file")
          (literal (attribute-string "file"))
          (if (attribute-string "xreflabel")
            (literal (attribute-string "xreflabel"))
            (with-mode scrap-title-mode
              (make sequence
                (process-element-with-id
                  (attribute-string "continuedfrom"))
                (literal " ")))))
        (make sequence
          font-size: (* scrap-header-size 0.75)
          (literal " (ID: ")
          (literal (attribute-string "id")))
        (literal ")\right-pointing-angle-bracket")
        (if (attribute-string "continuedfrom")
            (literal "+")
            (empty-sosofo))
        (literal "\identical-to")))
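
As an illustration outside the style sheet itself, the decision logic of the two scraps above can be modelled in a few lines of Python. This is only a sketch: the function name scrap_header and the dictionary representation of the attributes are assumptions made for the example, the ASCII punctuation follows the HTML form of the header rather than the print characters, and the title of a continuation scrap is passed in rather than looked up.

```python
def scrap_header(attrs, resolved_title=None):
    """Return the header text for a programlisting, or None if the
    listing does not look like a literate programming scrap.

    attrs is a dict of the attributes actually specified on the element
    (hypothetical representation). resolved_title is the title found by
    walking back to the first scrap; it is only used for continuations.
    """
    # Same test as the style sheet: any of these attributes marks a scrap.
    if not any(name in attrs for name in ("file", "continuedfrom", "xreflabel")):
        return None

    # Title preference: file name, then xreflabel, then the inherited title.
    if "file" in attrs:
        title = attrs["file"]
    elif "xreflabel" in attrs:
        title = attrs["xreflabel"]
    else:
        title = resolved_title

    # A continuation scrap is marked with "+" before the "=" sign.
    marker = "+" if "continuedfrom" in attrs else ""
    return "<{} (ID: {})>{}=".format(title, attrs.get("id"), marker)
```

Called with the attributes of the first scrap of the sample document in Chapter 5, this produces the same header line that the woven listings in this document carry.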

The logic of the continuation listing is simpler; if there is continuation information given, we show the continuation.

<Provide optional continuation info (ID: SCRAP12)>=

    (if (attribute-string "continuedin")
        (make paragraph
          font-family-name: scrap-header-font
          font-size:        scrap-header-size
          (make sequence
            (literal "Continued in ")
            (literal (attribute-string "continuedin"))))
        (empty-sosofo))

4.4.1.2. The programlisting xref Element

The main customization we want for the xref element is to put the cross-reference text in a running text font, and enclose it in angle brackets. In both cases, this is to distinguish a code section reference from actual code.

For consistency, we will put the reference in the same font and size as the scrap header and continuation information.

<Customize the (programlisting xref) (ID: SCRAP13)>=

    (element (programlisting xref)
      (make sequence
        font-family-name: scrap-header-font
        font-size:        scrap-header-size
        (literal "\left-pointing-angle-bracket")
        <Original xref print code>
        (literal "\right-pointing-angle-bracket")
    ))

4.4.1.3. Print Auxiliary Definitions

Here we define the common information such as the size and font used for the header and continuation. Additionally, we define the special mode used to find the code section title here.

<Print auxiliary definitions (ID: SCRAP14)>=

    (define scrap-header-size 8pt)
    (define scrap-header-font "Georgia")
    <Define scrap-title-mode>

The scrap-title-mode is defined to process programlisting elements and either extract the title (found in the file or xreflabel attribute) or continue up the linked list if neither attribute is present.

<Define scrap-title-mode (ID: SCRAP15)>=

    (mode scrap-title-mode
      (element programlisting
        (make sequence
          (if (attribute-string "file")
              (literal (attribute-string "file"))
              (if (attribute-string "xreflabel")
                  (literal (attribute-string "xreflabel"))
                  (process-element-with-id
                    (attribute-string "continuedfrom")))))))
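
The walk back up the linked list may be easier to follow when written as an explicit loop. The Python sketch below is an illustration only, not part of the style sheet; resolve_title and the dictionary of scraps are hypothetical names, and the check for a circular continuedfrom chain is an addition that the DSSSL mode above does not make.

```python
def resolve_title(scraps, scrap_id):
    """Follow continuedfrom links until a scrap with a title is found.

    scraps maps each scrap id to a dict of its specified attributes
    (hypothetical representation of the programlisting elements).
    """
    seen = set()
    current = scrap_id
    while current is not None:
        if current in seen:
            # Added safeguard: a circular chain would otherwise never end.
            raise ValueError("circular continuedfrom chain at " + repr(current))
        seen.add(current)
        attrs = scraps[current]
        if "file" in attrs:
            return attrs["file"]
        if "xreflabel" in attrs:
            return attrs["xreflabel"]
        # Neither title attribute is present: move one scrap back.
        current = attrs.get("continuedfrom")
    raise ValueError("no title found for scrap " + repr(scrap_id))


# The four scraps of the sample document in Chapter 5.
SAMPLE_SCRAPS = {
    "scrap1": {"file": "sample.code", "continuedin": "scrap2"},
    "scrap2": {"continuedfrom": "scrap1"},
    "scrap3": {"xreflabel": "The Third Scrap", "continuedin": "scrap4"},
    "scrap4": {"continuedfrom": "scrap3"},
}
```

With this data, resolve_title(SAMPLE_SCRAPS, "scrap4") walks back one link and returns the xreflabel of scrap3.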

4.4.1.4. Original Modular Style Sheet Print Code

This is the complete DSSSL code for the programlisting element, found in dbverb.dsl.

<Original programlisting print code (ID: SCRAP16)>=

    ($verbatim-display$
     %indent-programlisting-lines%
     %number-programlisting-lines%)

This is the complete DSSSL code found in dblink.dsl for the xref element. It is probably much more complex than is actually needed for the rather specialized use we are making of it, but it is easier to just reproduce it than to try to simplify it.

<Original xref print code (ID: SCRAP17)>=

     (let* ((endterm (attribute-string (normalize "endterm")))
             (linkend (attribute-string (normalize "linkend")))
             (target  (element-with-id linkend))
             (xreflabel (if (node-list-empty? target)
                            #f
                            (attribute-string (normalize "xreflabel") target))))
        (if (node-list-empty? target)
            (error (string-append "XRef LinkEnd to missing ID '" linkend "'"))
            (if xreflabel
                (make link
                  destination: (node-list-address target)
                  (literal xreflabel))
                (if endterm
                    (if (node-list-empty? (element-with-id endterm))
                        (error (string-append "XRef EndTerm to missing ID '"
                                              endterm "'"))
                        (make link
                          destination: (node-list-address (element-with-id endterm))
                          (with-mode xref-endterm-mode
                            (process-element-with-id endterm))))
                    (cond
                     ((or (equal? (gi target) (normalize "biblioentry"))
                          (equal? (gi target) (normalize "bibliomixed")))
                      ;; xref to the bibliography is a special case
                      (xref-biblioentry target))
                     ((equal? (gi target) (normalize "co"))
                      ;; callouts are a special case
                      (xref-callout target))
                     ((equal? (gi target) (normalize "listitem"))
                      (xref-listitem target))
                     ((equal? (gi target) (normalize "question"))
                      (xref-question target))
                     ((equal? (gi target) (normalize "answer"))
                      (xref-answer target))
                     ((equal? (gi target) (normalize "refentry"))
                      (xref-refentry target))
                     ((equal? (gi target) (normalize "glossentry"))
                      ;; as are glossentrys
                      (xref-glossentry target))
                     ((equal? (gi target) (normalize "author"))
                      ;; and authors
                      (xref-author target))
                     ((equal? (gi target) (normalize "authorgroup"))
                      ;; and authorgroups
                      (xref-authorgroup target))
                     (else
                      (xref-general target)))))))

4.4.2. HTML Customizations

In the HTML style sheet, we need to make the same kinds of customization we did with the print style sheet. The details of the actual customizations differ slightly.

<programlisting HTML customizations (ID: SCRAP18)>=

    <Programlisting element HTML customization>
    <xref element HTML customization>
    <HTML auxiliary definitions>

4.4.3. The programlisting Proper

In the programlisting, we make the same kinds of additions (header and continuation information) as used in the print style sheet.

<Programlisting element HTML customization (ID: SCRAP19)>=

    (element programlisting (make sequence
      <Provide HTML scrap header information>
      <Original programlisting HTML code>
      <Provide optional HTML scrap continuation info>
    ))

The code is also largely common, with some minor changes for the HTML output and its limited character repertoire. We have the same logical limitation on the identification of definition sections noted above.

<Provide HTML scrap header information (ID: SCRAP20)>=

    (if (or (attribute-string "file")
            (attribute-string "continuedfrom")
            (attribute-string "xreflabel"))
        <Write the HTML scrap header>
        (empty-sosofo))

<Write the HTML scrap header (ID: SCRAP20A)>=

    (make element gi: "P"
      (make sequence
        (literal "<")
        (if (attribute-string "file")
          (literal (attribute-string "file"))
          (if (attribute-string "xreflabel")
            (literal (attribute-string "xreflabel"))
            (with-mode scrap-title-mode
              (make sequence
                (process-element-with-id
                  (attribute-string "continuedfrom"))
                (literal " ")))))
        (make sequence
          (literal " (ID: ")
          (literal (attribute-string "id")))
        (literal ")>")
        (if (attribute-string "continuedfrom")
            (literal "+")
            (empty-sosofo))
        (literal "=")))

As with the print form, we apply the simple test that if there is continuation information, we show it.

<Provide optional HTML scrap continuation info (ID: SCRAP21)>=

    (if (attribute-string "continuedin")
        (make element gi: "P"
          (make sequence
            (literal "Continued in ")
            (literal (attribute-string "continuedin"))))
        (empty-sosofo))

4.4.3.1. Customizing the xref Element

<xref element HTML customization (ID: SCRAP22)>=

    (element (programlisting xref) (make sequence
      (literal "<")
      <Original xref HTML code>
      (literal ">")
    ))

4.4.4. HTML Auxiliary Definitions

Here we employ the same basic structure as with the print style sheet, although there is less to define; this primarily is intended to allow room for future elaboration of the style sheet.

The DSSSL code for the scrap-title-mode is identical with that used in the print style sheet, so we simply re-use it.

<HTML auxiliary definitions (ID: SCRAP23)>=

    <Define scrap-title-mode>

4.4.5. Original programlisting Code

<Original programlisting HTML code (ID: SCRAP24)>=

    ($verbatim-display$
     %indent-programlisting-lines%
     %number-programlisting-lines%)

4.4.6. Original xref HTML Code

<Original xref HTML code (ID: SCRAP25)>=

    (let* ((endterm   (attribute-string (normalize "endterm")))
             (linkend   (attribute-string (normalize "linkend")))
             (target    (element-with-id linkend))
             (xreflabel (if (node-list-empty? target)
                            #f
                            (attribute-string (normalize "xreflabel") target))))
        (if (node-list-empty? target)
            (error (string-append "XRef LinkEnd to missing ID '" linkend "'"))
            (make element gi: "A"
                  attributes: (list
                               (list "HREF" (href-to target)))
                  (if xreflabel
                      (literal xreflabel)
                      (if endterm
                          (if (node-list-empty? (element-with-id endterm))
                              (error (string-append
                                      "XRef EndTerm to missing ID '"
                                      endterm "'"))
                              (with-mode xref-endterm-mode
                                (process-node-list (element-with-id endterm))))
                          (cond
                           ((or (equal? (gi target) (normalize "biblioentry"))
                                (equal? (gi target) (normalize "bibliomixed")))
                            ;; xref to the bibliography is a special case
                            (xref-biblioentry target))
                           ((equal? (gi target) (normalize "co"))
                            ;; callouts are a special case
                            ($callout-mark$ target #f))
                           ((equal? (gi target) (normalize "listitem"))
                            ;; listitems are a special case
                            (if (equal? (gi (parent target)) (normalize "orderedlist"))
                                (literal (orderedlist-listitem-label-recursive target))
                                (error (string-append "XRef to LISTITEM only supported in ORDEREDLISTs"))))
                           ((equal? (gi target) (normalize "question"))
                            ;; questions and answers are (yet another) special case
                            (make sequence
                              (literal (gentext-element-name target))
                              (literal (gentext-label-title-sep target))
                              (literal (question-answer-label target))))
                           ((equal? (gi target) (normalize "answer"))
                            ;; questions and answers are (yet another) special case
                            (make sequence
                              (literal (gentext-element-name target))
                              (literal (gentext-label-title-sep target))
                              (literal (question-answer-label target))))
                           ((equal? (gi target) (normalize "refentry"))
                            ;; so are refentrys
                            (xref-refentry target))
                           ((equal? (gi target) (normalize "glossentry"))
                            ;; as are glossentrys
                            (xref-glossentry target))
                           ((equal? (gi target) (normalize "author"))
                            ;; and authors
                            (xref-author target))
                           ((equal? (gi target) (normalize "authorgroup"))
                            ;; and authorgroups
                            (xref-authorgroup target))
    ; this doesn't really work very well yet
    ;                      ((equal? (gi target) (normalize "substeps"))
    ;                       ;; and substeps
    ;                       (xref-substeps target))
                           (else
                            (xref-general target))))))))

Chapter 5. Sample Literate Program

5.1. The Sample Document

This is a very simple document illustrating the use of the DBLP literate programming system. It provides an example of how to construct a simple input file for the system.

<sample.sgm (ID: SCRAP26)>=

    <!DOCTYPE article
     PUBLIC "-//Mark Wroth//DTD DocBook V4.1-Based Extension Literate Programming 1.0//EN">
    <article id="sample-lp">
      <title>A Sample DocBook-Based Literate Program</title>
      <section>
        <title>Introduction</title>
        <para>This is a sample document illustrating basic use of the
        DocBook-based literate programming tool.</para>
        <para>This document combines human-readable documentation of
        a computer program with the actual computer-readable source
        code.  Depending on how it is processed, it becomes either
        printed (on-line) documentation of the program, or the actual
        source submitted to the computer for compilation (or
        interpretation, depending on the language).</para>
      </section>
      <section>
       <title>Source Code</title>

       <para>The source code is divided into a number of
       <quote>scraps</quote>, each containing a discrete fragment of
       code.  These scraps are assembled into code sections by
       concatenating the header scrap with the various continuation
       scraps, in an order defined by the programmer. File sections are
       written to the file indicated by the programmer, while definition
       sections are inserted at a place or places defined by the
       programmer. </para>

       <para>The first code scrap defines a file output, specifically
       to <filename>sample.code</filename>.</para>

       <programlisting
         id="scrap1"
         file="sample.code"
         continuedin="scrap2"
         >
    -- This is sample code in an imaginary language
    -- Taken from the first scrap
    if a < b then
      <xref linkend="scrap3">
    fi
       </programlisting>

       <para>The next code scrap is a continuation of the first
       scrap.</para>

       <programlisting
        id="scrap2"
        continuedfrom="scrap1"
       >
    -- This is continued code, taken from the second scrap
    --
    set c = a & b
     greater than: >
       </programlisting>

       <para>The following code section is an example of a definition
       scrap, and will be included in a file output scrap.</para>
       <programlisting
        id="scrap3"
        xreflabel="The Third Scrap"
        continuedin="scrap4">
    -- Yet more program code from the third scrap
       </programlisting>

       <para>Finally, we have a continuation scrap continuing
       a definition scrap.</para>

       <programlisting
         id="scrap4"
         continuedfrom="scrap3"
       >
    -- This is scrap 4, which continues scrap 3
    -- It should appear where scrap 3 was inserted.
       </programlisting>
     </section>
    </article>

As with the DTD, the fact that this output section is written in SGML means that the literate programming file as actually written makes extensive use of entities for those characters that have syntactic meaning. This situation is one of the few where the human documentation and the computer source are noticeably different.

Chapter 6. System Performance

Generally speaking, verifying the functionality of the literate programming system consists of checking that the tangle branch correctly produces the intended code, and the weave branch produces appropriate human-readable documentation.

6.1. Sample Code Output

The code output file sample.code produced from the sample document looks like this:

-- This is sample code in an imaginary language
-- Taken from the first scrap
if a < b then
  -- Yet more program code from the third scrap
   -- This is scrap 4, which continues scrap 3
-- It should appear where scrap 3 was inserted.

fi
   -- This is continued code, taken from the second scrap
--
set c = a & b
greater than: >

This demonstrates the key functions needed in the tangle branch, namely that:

Code sections are correctly assembled from the separate scraps identified in the source code.
Definition sections are inserted into code sections at the location identified by the xref tag pointing to the head of the definition section.
File output sections are written to disk with the desired file names.
Syntactically significant characters (to SGML) including <, >, and &, are written correctly in the output file.
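
The first three of these behaviours can be summarised in a small Python model. This is a sketch of what the tangle branch is observed to do, not the DSSSL implementation described in Chapter 3; the names Scrap, Xref, expand_section and tangle are hypothetical, the scrap bodies are abbreviated, and the cycle check is an added safeguard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Xref:
    """A reference to the head scrap of a definition section."""
    linkend: str


@dataclass
class Scrap:
    """One programlisting: strings and Xref items in document order."""
    body: list
    file: Optional[str] = None         # set only on the head of a file section
    continuedin: Optional[str] = None  # id of the next scrap, if any


def expand_section(scraps, head_id, _active=frozenset()):
    """Concatenate a section's scraps, inserting referenced sections in place."""
    out = []
    seen = set(_active)
    current = head_id
    while current is not None:
        if current in seen:
            # Added safeguard against circular references.
            raise ValueError("scrap " + repr(current) + " is part of a cycle")
        seen.add(current)
        scrap = scraps[current]
        for part in scrap.body:
            if isinstance(part, Xref):
                # A definition section is inserted where the xref appears.
                out.append(expand_section(scraps, part.linkend, frozenset(seen)))
            else:
                out.append(part)
        current = scrap.continuedin
    return "".join(out)


def tangle(scraps):
    """Map each output file name to the text of its assembled section."""
    return {s.file: expand_section(scraps, sid)
            for sid, s in scraps.items() if s.file is not None}


# Abbreviated stand-in for the sample document's four scraps.
SAMPLE = {
    "scrap1": Scrap(["if a < b then\n  ", Xref("scrap3"), "\nfi\n"],
                    file="sample.code", continuedin="scrap2"),
    "scrap2": Scrap(["set c = a & b\n"]),
    "scrap3": Scrap(["-- third\n"], continuedin="scrap4"),
    "scrap4": Scrap(["-- fourth\n"]),
}
```

In this model tangle(SAMPLE) yields a single entry for sample.code in which the text of scraps 3 and 4 sits between the if and fi lines, followed by the text of scrap 2, which is the same ordering seen in the output above.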

The handling of whitespace in the code scraps is not quite what I expected, but appears to be consistent and reasonable. It appears that whitespace is (correctly) transcribed to the output file, except that an SGML record end (CR-LF pair, under Windows) following the programlisting start tag is not transcribed to the output.
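
Stated as code, the observed rule is simply that one leading record end is dropped and everything else is kept. The following Python sketch models that observation only; the function name drop_leading_record_end is hypothetical, and nothing here is taken from the SGML parser or the tangle style sheet.

```python
def drop_leading_record_end(content):
    """Remove a single record end directly after the start tag, if present.

    content is the raw text between the programlisting start and end tags.
    Both CR-LF and bare LF record ends are handled; all other whitespace,
    including later line breaks, is left untouched.
    """
    for record_end in ("\r\n", "\n"):
        if content.startswith(record_end):
            return content[len(record_end):]
    return content
```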

6.2. Sample Woven Output

The weave DSSSL style sheets produce two sets of human-readable documentation from sample.sgm: sample.rtf and a set of HTML files. The HTML files are divided and named by the conventions of the HTML Modular DocBook Style Sheets, which (without customization) produce multiple small files.

In both cases, the weave outputs reasonably produce verbatim program listings, with header and continuation information that matches the actual structure defined by the sample file.

The typographic treatment of the header and continuation is acceptable, although the amount of vertical space introduced in both the print and HTML versions between the header and the body of the programlisting is larger than I would prefer.

6.3. Evaluation

Overall, these implementations of a DTD, tangle, and weave are functional. They do not produce really high-quality typographical output, but they do reasonably display the structure of the code sections. Pending further experience with the system, this seems to be an acceptable implementation.

Appendix A. Acronyms

DSSSL: Document Style Semantics and Specification Language. DSSSL is an ISO standard defining how to specify transformations from SGML documents to page-oriented output renderings.
DTD: Document Type Definition, Document Type Declaration
HTML: Hypertext Markup Language
ISO: International Organization for Standardization
OASIS: Organization for the Advancement of Structured Information Standards
OSNL: Optional Singleton Node List
SGML: Standard Generalized Markup Language
SNL: Singleton Node List
XML: Extensible Markup Language

Appendix B. Bibliography

David Carlisle, Re: Issues with literate programming DSSSL Script, DSSSList Digest, Volume 3, Number 241, Thu, 16 Dec 1999 16:27:27 GMT.

Donald Knuth, Literate Programming, Center for the Study of Language and Information, 1992.

Christopher R. Maden, Re: (dsssl) "Default" processing rule?, DSSSList Digest, Volume 4, Number 143, Mon, 02 Apr 2001 22:36:33 (-0700).

OASIS, DocBook.

<http://www.oasis-open.org/docbook>

Norman Walsh, The Modular DocBook Stylesheets.

<http://sourceforge.net/projects/docbook>

Appendix C. Installation and Usage Tips

Catalog Files: The use of catalog files makes installation of the various processing applications and DTDs much easier, by allowing the specifications of where the various files are located in the system file structure to be separated from the documents that need to use them.
Using indirection in catalog files—having a catalog file point to a separate catalog file, rather than independently listing each of the system entities needed—makes the maintenance of the system much easier. A master catalog file, kept in a convenient location, points to catalog files for each of the various packages installed. These package catalog files can be kept in the installation directories. The local catalog file for a document can then reference all of the standard supporting files with a single entry.
The user environment variable SGML_CATALOG_FILES may also be used to cause Jade to point to the needed catalog file or files.

[1]	Literate Programming, originally published in The Computer Journal, May 1984, quoted in [[KNUTH92]p. 99]
[2]	The boundary between "literate programming" systems and "improved commenting" mechanisms is somewhat subjective. However, for the purposes of this discussion, a system is considered a literate programming system if it offers the capabilities listed above.
[3]	There are a number of efforts related to literate programming in SGML or XML, and Robin Cover's excellent web page provides links to many of them. However, with the possible exception of SWEB, whose author disclaims publication and citation of the work, none of them appear to me to be usable general literate programming systems.
[4]	Actually, even this is an overstatement: the editor has to be able to produce SGML input files compatible with the SGML and DSSSL processors being used. While this is usually a plain text file, there are other ways to implement SGML systems.
[5]	The basic suggestion to use a formatting-instruction to address the problem came from David Carlisle `<davidc@nag.co.uk>` in a post to the DSSSList, Vol 3, Number 241, although the actual implementation does not follow his suggestions precisely.
[6]	It is exactly this application that caused us to define the programlisting scraps as a doubly-linked list.