\documentclass{report}
\usepackage{hyperref,makeidx,moreverb,rcs}
\makeindex
\RCS$Id: filter.w,v 1.4 2000/04/18 04:40:21 Mark Exp $
\RCS$Revision: 1.4 $
\RCS$Date: 2000/04/18 04:40:21 $
\title{`Filter': A Grep-like Search Utility}
\author{Joel Polowin and Mark Wroth}
\newcommand{\nuweb}{Nuweb}
\newcommand{\filter}{\texttt{filter}\index{filter}}
\renewcommand{\Diamond}{\relax}
%\title{{\.{filter v. 2.0}    Latest mod: 21:25 Feb 28 1994}
\def\pipe{\hbox{$\vert$}}
\newcommand{\filename}[1]{\texttt{#1}}
\newcommand{\str}[1]{\texttt{"#1"}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}\maketitle

\chapter{User Documentation}

\section{Introduction}

This is {\filter} revision~2.0 dated 28~February 1994, a
\filename{grep}-like text searcher for multiple simultaneous keyword tests.

Copyright \copyright\ 1994 by Joel Polowin, Department of Chemistry,
Queen's University, Kingston, Ontario, Canada.  Permission granted for
free use and distribution; I want credit/blame for writing it.

E-mail: {\tt polowin@@silicon.chem.queensu.ca},

{\tt polowinj@@qucdn.queensu.ca}, 

{\tt Joel.Polowin@@p4.f107.n249.z1.fidonet.org} .

\begin{quotation}
  The \nuweb\ version of this file should be blamed on Mark Wroth,
  {\tt mark@@astrid.upland.ca.us}.  Joel was foolish enough to allow
  me to port his program to \nuweb\ on the Amiga, a permission which I
  have extended to a Windows NT version.  Kudos for the program goes
  to him; blame for the port belongs to me.  

  This is Revision~\RCSRevision, dated \RCSDate\ of the literate
  programming version.

  Literate programming documentation copyright \copyright\ 1995, 2000
  by Mark Wroth.  Permission is granted for use and distribution under
  the terms set above by the code author.
\end{quotation}

If you see something wrong with it or it fails to work, \emph{please} 
let one of us know!

\section{Syntax}

\texttt{filter [filename] [filename ...] string [string ...]}
    
where each string (default maximum of 2000 strings) is a term to be
searched for in lines (default maximum of 1200 characters) of the
file(s) \texttt{filename}, prefixed by one of the following characters:
\begin{description}
\item[+]  to show lines which contain string
\item[-]  to show lines which do not contain string
\item[=]  to show lines which contain string, case sensitive
\item[\_]  (underscore) to show lines which do not contain string,
               case sensitive
\end{description}               
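
As an illustration of the four prefixes, the following C fragment is a
minimal sketch of how a single prefixed term could be tested against a
line.  The names \texttt{contains\_nocase} and \texttt{term\_passes} are
hypothetical and do not appear in \filter\ itself, which instead
lowercases the line once and then uses \texttt{strstr}.  Or-linking is
not shown here.

\begin{verbatim}
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: case-insensitive substring test. */
static int contains_nocase(const char *line, const char *needle)
{
    size_t n = strlen(needle);

    for (;; line++) {
        size_t i = 0;
        while (i < n && line[i] != '\0' &&
               tolower((unsigned char)line[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n) return 1;          /* whole needle matched here */
        if (*line == '\0') return 0;   /* ran out of line */
    }
}

/* Returns 1 if the line satisfies a prefixed term such as "+hawk". */
int term_passes(const char *term, const char *line)
{
    const char *text = term + 1;       /* string after the prefix */

    switch (term[0]) {
    case '+': return  contains_nocase(line, text);
    case '-': return !contains_nocase(line, text);
    case '=': return  strstr(line, text) != NULL;
    case '_': return  strstr(line, text) == NULL;
    default:  return 0;                /* not a valid prefix */
    }
}
\end{verbatim}

With this sketch, \texttt{term\_passes("+hawk", "A Hawk flew")} is true,
while \texttt{term\_passes("=Vert", "vert field")} is false because the
\str{=} prefix is case-sensitive.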

A string as above may be further prefixed with the letter \texttt{o}
to print the line if the current OR the preceding condition is true.

A string including blanks and the prefix may be enclosed in double quotes
on most systems.  Your operating system may have other ways of dealing
with special characters.

\filter\ determines the first string which is a search term instead of
a file name by its beginning with one of the characters
\texttt{+-=\_}.  For files whose names begin with one of these
characters, see below.  Otherwise, the first search term must begin
with one of these, as that first term cannot be \texttt{or}-linked to
a preceding term.

For strings that begin with \str{\$} or \str{\&} (usually names of
files of search terms), see below.

\subsection{Examples}

\texttt{filter * +hawk +handsaw o+hound}

searches all files in the current directory and prints lines that
contain the string \str{hawk} and at least one of \str{handsaw} and
\str{hound}.  This assumes that the operating system and compiler
accept wild-carded file names; else \filter\ will be looking for a
file named \str{*}.  For DOS, one would use \str{*.*}.

\texttt{filter armorial =Vert +argent -gules \_Or -azure -purpur +foil
  > tempfile.txt}
     
searches the file \str{armorial} for lines that contain the string
\str{Vert} (case-sensitive) and \str{argent} (upper or lower case) but
not \str{gules} (upper or lower case) and not \str{Or} (case-sensitive)
and not \str{azure} or \str{purpur} (upper or lower case) and \emph{do}
contain \str{foil} (upper or lower case); the resulting lines are saved
in file \str{tempfile.txt}.
  
 
\texttt{type temp1.txt \pipe\ filter +aardvark "o+winged pig" o+wombat 
 "\_\pipe B\pipe"}

The file \str{temp1.txt} is fed through the \filter\ program, which
passes lines that contain \str{aardvark} (upper or lower case) or
the string \str{winged pig} or \str{wombat} (upper or lower
case) and do \emph{not} contain \str{\pipe B\pipe}.  The result is
printed on the screen.  Note use of quotation marks in the command
line to include the space in \str{winged pig} and the special
character \pipe\ in \str{\pipe B\pipe}.


\subsection{File names beginning with one of \str{+-=\_}}

If you absolutely \emph{must} use text file names that begin with one of
these characters, use the character twice when specifying the file
name to \filter.  Thus, the file name \texttt{-stdev.c} would be
written \str{--stdev.c}; \str{++junk.c} would be written
\str{++++junk.c}.

What \filter\ does is to go through each term in the command line and
count the identical flag characters that begin it; half of them,
rounded down, are then removed.  An even count specifies a text file
name; an odd count designates a search term.
\str{+junk} has one flag character, is not changed (shortened by
\texttt{1/2 -> 0} characters), and is a search term: print lines
containing \str{junk}.  \str{++junk} has two flag characters, is
shortened to \str{+junk}, and is read as a text file name.
\str{+++junk} is shortened to \str{++junk}, and is a search
term: print lines containing the string \str{+junk}.
\str{+=junk} has one flag character and is a search term: print
lines containing the string \str{=junk}.
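
The counting rule can be sketched in C as follows.  This is an
illustration only: the function name \texttt{classify\_arg} and its
interface are hypothetical, and \filter\ itself performs the same
arithmetic directly on \texttt{argv}.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical helper: counts the identical flag characters that
   begin arg, stores a pointer past the dropped half in *rest, and
   returns 1 for a search term or 0 for a text file name. */
int classify_arg(const char *arg, const char **rest)
{
    size_t n = 0;
    char c = arg[0];

    if (c == '+' || c == '-' || c == '=' || c == '_')
        for (n = 1; arg[n] == c; n++)
            ;               /* length of the run of identical flags */
    *rest = arg + n / 2;    /* drop half the run, rounded down */
    return (int)(n % 2);    /* odd run: search term */
}
\end{verbatim}

An argument with no flag character at all has a run length of zero,
which is even, so it is treated as a file name and left unchanged.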

This means that wild-carded file names that match files whose names
begin with one of \str{+-=\_} will cause trouble.  I'm sorry; the
telepathic monitors of most computer systems are not
software-addressable.  A compulsive urge to use files whose names
begin with punctuation or mathematical symbols can now be treated
successfully in a majority of cases.

Search terms specified in files (see below) are not themselves in the
command line, and if they begin with one of \str{+-=\_} those
characters should not be doubled.  Search-term file expansion takes
place after {\filter} determines which command-line strings are file
names.


\subsection{`\$' and `\&' usually flag search-term file names}

A string which begins with \str{\$} or \str{\&} will be expanded
as the name of a file containing a list of search terms.  For example,
the string \str{+\$critters} tells {\filter} to look for a file
\filename{critters}; lines from that file are taken as search terms.
{\filter} adds the prefix to each term, depending on whether the file
name is specified with \str{\$} or \str{\&} and which of \str{+-=\_}
precedes it.  With \str{\$}, \str{+} and \str{=} give or-linked terms,
so that text file lines will be printed if any search-term file line
is matched; \str{-} and \str{\_} are not or-linked, so that text file
lines are printed only if no search-term file line is matched.  With
\str{\&}, \str{+} and \str{=} are not or-linked, so that \emph{all}
search-term lines must be matched to print a text line; while \str{-}
and \str{\_} \emph{are} or-linked, so that any search-term line
\emph{not} matched will allow text-line printing.
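
The choice of prefix for expanded terms can be summarised by a small
lookup, sketched here in C.  The function name
\texttt{expansion\_prefix} is hypothetical.  Note that the first term
read from the file keeps the original prefix unchanged; the prefix
returned here applies to the second and later terms.

\begin{verbatim}
#include <stddef.h>

/* sign is one of + - = _ ; kind is '$' or '&'.
   Returns the prefix for the second and later expanded terms,
   or NULL if sign is not a valid prefix character. */
const char *expansion_prefix(char sign, char kind)
{
    int any = (kind == '$');   /* '$' means "any", '&' means "all" */

    switch (sign) {
    case '+': return any ? "o+" : "+";
    case '=': return any ? "o=" : "=";
    case '-': return any ? "-"  : "o-";
    case '_': return any ? "_"  : "o_";
    default:  return NULL;
    }
}
\end{verbatim}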

To search for actual text strings beginning with `\$' or `{\&}',
double the flag characters.  Thus, to search for the string
\str{\$100}, use the search string \str{+\$\$100}.  To expand
file names beginning with those characters, use three of them:
search-term file \str{\${\&}junk} would be specified with something
like \str{-\$\$\${\&}junk}.  In general, when a search term begins
with a flag character, double each flag character of that kind
beginning the term, and if the term is a file name, add an extra flag
character.
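
The doubling rule for \str{\$} and \str{\&} follows the same
halve-and-test-parity pattern as the file-name flags.  The C fragment
below is a sketch under that reading; the name
\texttt{strip\_expansion\_flags} is hypothetical, and the term is assumed
to be held in writable storage.

\begin{verbatim}
#include <string.h>

/* term is a prefixed term such as "+$$100", modified in place.
   Half of the leading run of '$' or '&' (rounded down) is removed.
   Returns 1 if the run was odd, meaning the text after the first
   remaining flag character names a search-term file. */
int strip_expansion_flags(char *term)
{
    size_t n, drop;
    char c;

    if (term[0] == '\0')
        return 0;
    c = term[1];
    if (c != '$' && c != '&')
        return 0;
    for (n = 1; term[1 + n] == c; n++)
        ;                              /* count identical flag chars */
    drop = n / 2;
    memmove(term + 1, term + 1 + drop, strlen(term + 1 + drop) + 1);
    return (int)(n % 2);
}
\end{verbatim}

Applied to \str{+\$\$100} this leaves \str{+\$100} and returns 0 (a
literal search string); applied to \str{-\$\$\${\&}junk} it leaves
\str{-\$\${\&}junk} and returns 1, so the file name is the text starting
at the third character, \str{\${\&}junk}.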

Search-term files may contain file names, which will be expanded in turn.
For this reason, initial `\$' and `{\&}' characters must be doubled even in
nested search-term files.

Note that the or-linking logic can get seriously messed up when terms
beginning with `-' or `\_' are expanded carelessly, as \filter\ has no
good sense of logical precedence.  If file \str{human} contains
\str{man} and \str{woman}, then \str{o-\$human} would expand
as \str{o-man -woman}.
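
The left-to-right evaluation behind this warning can be sketched in C.
The function \texttt{line\_selected} and its array arguments are
hypothetical; the sketch assumes that each run of or-linked terms forms
one group and that every group must be satisfied, which is how the
matching loop in the implementation behaves.

\begin{verbatim}
/* pass[i] is 1 if term i is satisfied by the line; or_next[i] is 1
   if term i+1 was written with an 'o' prefix.  n is the term count.
   Returns 1 if the line would be printed. */
int line_selected(const int *pass, const int *or_next, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        if (!pass[i] && !or_next[i])
            return 0;               /* a whole or-group failed */
        if (pass[i])
            while (i < n && or_next[i])
                i++;                /* skip the rest of the group */
    }
    return 1;
}
\end{verbatim}

For three terms \texttt{A o-man -woman} the or-link flags are
\texttt{\{1,0,0\}}, so the test is (A or not \str{man}) and not
\str{woman}, rather than A or (neither \str{man} nor \str{woman}).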

Examples:

\texttt{filter armorial +\$animal o+\$vegetable \_\$mineral}

   If file \filename{animal} reads
\begin{verbatim}
   $human
   reptile
   amphibian
\end{verbatim}

   and file \filename{human} reads
\begin{verbatim}
   man
   woman
\end{verbatim}
   
   and file \filename{vegetable} reads
\begin{verbatim}
   tree
   grain
\end{verbatim}

   and file \filename{mineral} reads
\begin{verbatim}
   rock
   dirt
\end{verbatim}

   then the above will be expanded to:
   
   \texttt{filter armorial +man o+woman o+reptile o+amphibian o+tree
     o+grain \_rock \_dirt}
   
   \texttt{filter armorial +\${\&}beastie +{\&}{\&}{\&}doggie
     -\$dragon}

   If file \filename{\&beastie} reads
\begin{verbatim}
   unicorn
   $dragon
   manticore
\end{verbatim}

   and file \filename{\&doggie} reads
\begin{verbatim}
   terrier
   hound
   $$paniel
\end{verbatim}

   and file \filename{dragon} reads
\begin{verbatim}
   wyvern
   dragon
   lizard
\end{verbatim}

   then the above will be expanded to
   
   \texttt{filter armorial +unicorn o+wyvern o+dragon o+lizard
     o+manticore +terrier +hound +\$\$paniel -wyvern -dragon -lizard}
   
   which will print lines from file \str{armorial} that contain:
   any of: \str{unicorn}, \str{wyvern}, \str{dragon},
   \str{lizard}, \str{manticore}; and ALL of: \str{terrier},
   \str{hound}, \str{\$paniel}; and none of: \str{wyvern},
   \str{dragon}, \str{lizard}.

\section{Revision History}

\subsection{Code}

Version 1.0 September 1992.

1.1 Sep '92 fixed minor bugs

1.2 Sep '92 added 'or'-linking to keywords

1.4 Oct '92 fixed a minor error in string lengths, added size DEFINEs

1.5 Jan '94 increased string lengths, fixed a Stupid Newbie Error re:
            assumption that *argv[] was writable

2.0 Feb '94 added search-term file expansion and multiple text file
            capability, including wildcards when system permits



\subsection{Documentation}

$Log: filter.w,v $
Revision 1.4  2000/04/18 04:40:21  Mark
Basic documentation complete.

Revision 1.3  2000/04/18 04:21:48  Mark
Document compiles.


\chapter{Implementation}

\section{The Program}

@O filter.c @{
#define LENGTH 1201 /* 1 more than max number of characters per line */
#define ARGS 2000

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

@<Print the syntax message@>
@<Lowercase a string@>

int main(int argc, char *argv[])
{

@<Variables and types of the main function@>

@<Parse the input line@>

   k=0;
   do
      {
      k++;
      if(firststring==1) infile=stdin;
      else
        if(!(infile=fopen(argv[k],"r")))
           {
           fprintf(stderr,"Can't open file %s for reading.\n",argv[k]);
           syntax();
           }
      l=0;

      for (;;)
        {
        if(NULL==fgets(line,LENGTH,infile)) break;
        if(LENGTH==strlen(line)+1 && line[LENGTH-2]!='\n') fprintf(stderr,"* Warning: truncated line \n%s\n",line);
        if(lowcase) strlow(strcpy(lowline,line));
        for(i=1; i<=nostring; i++)
           {
           test=0;
           switch(*myargv[i])
              {
              case '=':
              case '_':
                 if(NULL!=strstr(line,(myargv[i]+1))) test=1;
                 break;
              default:
                 if(NULL!=strstr(lowline,(myargv[i]+1))) test=1;
                 break;
              }
           if(test!=flag[i] && orflag[i]==0) break;
           if(test==flag[i])
              while(orflag[i]==1) i++;
           }
        if(i>nostring)
           {
           if(firststring>2 && l==0)
              {
              printf("File %s:\n",argv[k]);
              l=1;
              }
           printf("%s",line);
           }
        }
      if(infile!=stdin) fclose(infile);
      }
   while(k<firststring-1);
   exit(0);
}
@}

@D Lowercase a string @{
void strlow(string)
char *string;
{
   while (*string!='\0')
   {
      *string=tolower((unsigned char)*string);  /* avoid negative char values */
      string++;
   }
}
@| strlow @}
@D Print the syntax message @{
void syntax ()
{
   fprintf(stderr,"Syntax: filter [filename] [filename ...] string [string ...]\n  where each");
   fprintf(stderr," string (max. of %d) is a term to be searched for",ARGS);
   fprintf(stderr," in lines (max.\n  %d chars) in file",LENGTH-1);
   fprintf(stderr," `filename', prefixed by one of the following characters:\n");
   fprintf(stderr,"    +  to show lines which contain string\n");
   fprintf(stderr,"    -  to show lines which do not contain string\n");
   fprintf(stderr,"    =  to show lines which contain string, case sensitive\n");
   fprintf(stderr,"    _  (underscore) to show lines which do not contain string,\n");
   fprintf(stderr,"            case sensitive\n\n");
   fprintf(stderr,"A string as above may be further prefixed with the letter 'o' to\n");
   fprintf(stderr,"  print the line if the current OR the preceding condition is true.");
   fprintf(stderr,"\nA string including blanks and the prefix may be enclosed in");
   fprintf(stderr," double quotes.\nStrings beginning with `$' or `&' designate"); 
   fprintf(stderr," file expansion; see filter20.doc.\n\nExamples:\n  ");
   fprintf(stderr," filter armorial =Vert +argent ");
   fprintf(stderr,"-gules _Or -azure -purpur +foil > tempfile.txt\n");
   fprintf(stderr,"  type temp1.txt | filter +aardvark \042o+winged pig\042 ");
   fprintf(stderr,"o+wombat \042_|B|\042\n\nFilter utility v.2.0 ");
   fprintf(stderr,"(C) 1994 by Joel Polowin, Chem. ");
   fprintf(stderr,"Dept., Queen's University,\nKingston.  ");
   fprintf(stderr,"Permission granted for free ");
   fprintf(stderr,"use; I want credit/blame for writing it.\n");
   fprintf(stderr,"polowin@@silicon.chem.queensu.ca, polowinj@@qucdn.queensu.ca\n");
   exit(1);
}
@| syntax
@}

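The declarations for \texttt{main} are collected in a single scrap here;
ideally they would be accumulated alongside the code that uses them.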
@D Variables and types of the main function @{
   char line[LENGTH],lowline[LENGTH];
   FILE *infile;
   int  i,j,k,l,test,firststring,nostring,lowcase;
   void syntax();
   void strlow();
   char flag[ARGS+1],orflag[ARGS+1];
   char *myargv[ARGS+1];
   int  blocksize;
   char *prefix,*filename;
@| line LENGTH infile firststring nostring lowcase blocksize
@}

\section{Parsing the input line}

At least I \emph{think} I got the input parser ... beware of improper
boundaries.

@D Parse the input line @{
   firststring=0;
   for(i=1; i<argc; i++)
      {
      if(*argv[i]=='=' || *argv[i]=='+' || *argv[i]=='-' || *argv[i]=='_')
        {
        test=*argv[i];
        for(j=1; test==*(argv[i]+j); j++)
           ;
        if(j%2 && firststring==0) 
           firststring=i;
        argv[i]+=j/2;
        }
      }

@<Check for no search string@>

@<Check for too many search strings@>

   lowcase=0;
   
   for (i=0; i<ARGS+1; i++) orflag[i]=0;

   blocksize=0;
   for (i=firststring; i<argc; i++) blocksize+=strlen(argv[i])+1;

   @<Allocate memory for the internal arguement list@>   

   @<Transfer the external arguements to the internal list@>

   for (i=1; i<=nostring; i++)
   {
      j=0;
      switch(*myargv[i])
      {
        case '+':
           strlow(myargv[i]);
           lowcase=1;
        case '=':
           flag[i]=1;
           break;
        case '-':
           strlow(myargv[i]);
           lowcase=1;
        case '_':
           flag[i]=0;
           break;
        case 'O':
        case 'o':
           orflag[i-1]=1;
           myargv[i]++;
           i--;
           j=1;
           break;
        default:
           fprintf(stderr,"Error in string no. %d: %s\n",i-1,myargv[i]);
           syntax();
      }

   if((!j) && (((test=*(myargv[i]+1))=='$') || (test=='&')))
      {
      for(j=1; test==*(myargv[i]+1+j); j++)
      ;  /* count identical flag chars */

      l=j/2;

      for (k=0; *(myargv[i]+k)!='\0'; k++)   /* shift string to delete */
        *(myargv[i]+k+1)=*(myargv[i]+k+l+1);  /* half of flag chars */

      if(j%2) /* an odd number of flag chars: expand file */
        {
        switch(*myargv[i])   /* determine prefix for expanded terms */
           {                 /* from current prefix and expansion type */
           case '+':
              if(test=='$') prefix="o+";
              else prefix="+";
              break;
           case '=':
              if(test=='$') prefix="o=";
              else prefix="=";
              break;
           case '-':
              if(test=='$') prefix="-";
              else prefix="o-";
              break;
           case '_':
              if(test=='$') prefix="_";
              else prefix="o_";
              break;
           default:
              fprintf(stderr,"Bugger-up in program!\n");
              exit(1);
           }

        filename=myargv[i]+2;

        if(!(infile=fopen(filename,"r")))
           {
           fprintf(stderr,"Can't open search-term file %s\n",filename);
           exit(1);
           }

        test=*myargv[i];  /* prefix of current term */

        blocksize=0;  /* figure out how much memory to allocate */
        for(j=0;;j++)   /* for new terms */
           {
           if(NULL==fgets(line,LENGTH,infile)) break;
           if(LENGTH==strlen(line)+1 && line[LENGTH-2]!='\n') fprintf(stderr,
             "* Warning: truncated search term file %s line\n%s\n",
             filename,line);
           if(line[strlen(line)-1]=='\n') line[strlen(line)-1]='\0';
           blocksize+=strlen(line)+1;
           }

        if (j==0)
           {
           fprintf(stderr,"* Warning: empty search term file %s\n",
              filename);
           j=1;
           blocksize=1;  /* room for the NUL ending the single empty term */
           }

        blocksize+=j*strlen(prefix);
        fclose(infile);

        if(nostring+j-1>ARGS)
           {
           fprintf(stderr,"File expansion gives too many terms (%d max)\n",
              ARGS);
           exit(1);
           }

        for(k=nostring; k>i; k--)      /* shift old myargv to make room */
           myargv[k+j-1]=myargv[k];

        if(NULL==(myargv[i]=malloc(blocksize)))
           {
           fprintf(stderr,"Can't allocate memory for search term expansion.\n");
           exit(1);
           }

        if(!(infile=fopen(filename,"r")))
           {
           fprintf(stderr,"Can't open file %s for second read.\n",filename);
           exit(1);
           }

        for(k=0;k<j;k++)
           {
           if(NULL==fgets(line,LENGTH,infile)) line[0]='\0';  /* empty file: empty term */
           if(line[0]!='\0' && line[strlen(line)-1]=='\n') line[strlen(line)-1]='\0';
           if(k==0)
              {
              *myargv[i]=test;
              strcpy(myargv[i]+1,line);
              }
           else
              {
              myargv[i+k]=myargv[i+k-1]+strlen(myargv[i+k-1])+1;
              strcpy(myargv[i+k],prefix);
              strcat(myargv[i+k],line);
              }
           }

        fclose(infile);
        nostring+=j-1;
        i--;
        }
      }
   }

@}

At least one search string is required.  If none is given, print an
error followed by the syntax message, and exit.

@D Check for no search string @{
   if(firststring==0)
      {
      fprintf(stderr,"Must specify a search string.\n");
      syntax();
      }
@}

Similarly, there must be no more than \texttt{ARGS} search strings.
@D Check for too many search strings @{
   nostring=argc-firststring;
   if(nostring>ARGS)
      {
      fprintf(stderr,"Too many search strings specified.\n");
      syntax();
      }

@}
@D Allocate memory for the internal arguement list @{
   if (NULL==(myargv[1]=malloc(blocksize)))
      {
      fprintf(stderr,"Can't allocate memory for string storage.\n");
      exit(1);
      }

@}

@D Transfer the external arguements to the internal list @{
   strcpy(myargv[1],argv[firststring]);

   for(i=2; i<=nostring; i++)
      {
      myargv[i]=myargv[i-1]+strlen(myargv[i-1])+1;
      strcpy(myargv[i],argv[i+firststring-1]);
      }
@}
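
The two scraps above store all search terms back to back in one
allocated block, with each pointer placed just past the terminator of
the previous string.  The following stand-alone C fragment is a sketch
of that layout; the function \texttt{pack\_strings} is hypothetical and
uses zero-based indices, whereas the program's internal list starts at
index~1.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Copies n strings into one malloc'd block and fills out[0..n-1]
   with pointers into it.  Returns the block (free it when done),
   or NULL if n is 0 or the allocation fails. */
char *pack_strings(char *const src[], size_t n, char *out[])
{
    size_t i, total = 0;
    char *block, *p;

    for (i = 0; i < n; i++)
        total += strlen(src[i]) + 1;    /* text plus terminator */
    if (n == 0 || (block = malloc(total)) == NULL)
        return NULL;
    for (i = 0, p = block; i < n; i++) {
        strcpy(p, src[i]);
        out[i] = p;
        p += strlen(p) + 1;             /* next string starts here */
    }
    return block;
}
\end{verbatim}

One consequence of this layout is that a packed term cannot grow in
place, which is presumably why search-term file expansion allocates a
fresh block for the expanded terms.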

\appendix
\chapter{Index}

This index lists identifiers and other significant items by 
section of appearance (not by page).

@f

@m

@u

\printindex

\end{document}