\documentclass{report}
\usepackage{hyperref,makeidx,moreverb,rcs}
\makeindex
\RCS$Id: filter.w,v 1.4 2000/04/18 04:40:21 Mark Exp $
\RCS$Revision: 1.4 $
\RCS$Date: 2000/04/18 04:40:21 $
\title{`Filter': A Grep-like Search Utility}
\author{Joel Polowin and Mark Wroth}
\newcommand{\nuweb}{Nuweb}
\newcommand{\filter}{\texttt{filter}\index{filter}}
\renewcommand{\Diamond}{\relax}
%\title{{\.{filter v. 2.0} Latest mod: 21:25 Feb 28 1994}
\def\pipe{\hbox{$\vert$}}
\newcommand{\filename}[1]{\texttt{#1}}
\newcommand{\str}[1]{\texttt{"#1"}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}\maketitle

\chapter{User Documentation}
\section{Introduction}

This is {\filter} revision~2.0 dated 28~February 1994, a
\filename{grep}-like text searcher for multiple simultaneous keyword
tests. Copyright \copyright\ 1994 by Joel Polowin, Department of
Chemistry, Queen's University, Kingston, Ontario, Canada. Permission
granted for free use and distribution; I want credit/blame for writing
it. E-mail: {\tt polowin@@silicon.chem.queensu.ca},
{\tt polowinj@@qucdn.queensu.ca},
{\tt Joel.Polowin@@p4.f107.n249.z1.fidonet.org}.

\begin{quotation}
The \nuweb\ version of this file should be blamed on Mark Wroth,
{\tt mark@@astrid.upland.ca.us}. Joel was foolish enough to allow me to
port his program to \nuweb\ on the Amiga, a permission which I have
extended to a Windows NT version. Kudos for the program goes to him;
blame for the port belongs to me.

This is Revision~\RCSRevision, dated \RCSDate\ of the literate
programming version. Literate programming documentation copyright
\copyright\ 1995, 2000 by Mark Wroth. Permission is granted for use
and distribution under the terms set above by the code author.
\end{quotation}

If you see something wrong with it or it fails to work, \emph{please}
let one of us know!

\section{Syntax}

\texttt{filter [filename] [filename ...] string [string ...]}

where each string (default max.
of 2000) is a term to be searched for in lines (default max. 600 chars) in
file(s) \texttt{filename}, prefixed by one of the following characters:
\begin{description}
\item[+] to show lines which contain string
\item[-] to show lines which do not contain string
\item[=] to show lines which contain string, case sensitive
\item[\_] (underscore) to show lines which do not contain string,
  case sensitive
\end{description}

A string as above may be further prefixed with the letter \texttt{o} to
print the line if the current OR the preceding condition is true.

A string including blanks and the prefix may be enclosed in double
quotes on most systems. Your operating system may have other ways of
dealing with special characters.

\filter\ recognizes the first search term, as opposed to a file name,
by its beginning with one of the characters \texttt{+-=\_}. For files
whose names begin with one of these characters, see below. The first
search term must begin with one of these characters and not with
\texttt{o}, since a first term cannot be \texttt{or}-linked to a
preceding term. For strings that begin with \str{\$} or \str{\&}
(usually names of files of search terms), see below.

\subsection{Examples}

\texttt{filter * +hawk +handsaw o+hound}
searches all files in the current directory and prints lines that
contain the string \str{hawk} and at least one of \str{handsaw} and
\str{hound}. This assumes that the operating system and compiler accept
wild-carded file names; otherwise \filter\ will look for a file named
\str{*}. For DOS, one would use \str{*.*}.

\texttt{filter armorial =Vert +argent -gules \_Or -azure -purpur +foil > tempfile.txt}
searches the file \str{armorial} for lines that contain the string
\str{Vert} (case-sensitive) and \str{argent} (upper or lower case) but
not \str{gules} (upper or lower case) and not \str{Or} (case-sensitive)
and not \str{azure} or \str{purpur} (upper or lower case) and DO
contain \str{foil} (upper or lower case); the resulting lines are saved
in file \str{tempfile.txt}.

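
The way terms combine in these examples can be sketched in C as
follows. This is a minimal illustration, not the code of
\texttt{filter.c}: the names \texttt{contains\_nocase},
\texttt{term\_holds} and \texttt{line\_passes} are hypothetical, and
the sketch assumes the terms are already separated from the file names.
A line is shown when every group of or-linked terms contains at least
one term that holds.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: case-insensitive substring test. */
static int contains_nocase(const char *line, const char *s)
{
    size_t n = strlen(s);
    for (;; line++) {
        size_t i = 0;
        while (i < n && line[i] != '\0' &&
               tolower((unsigned char)line[i]) == tolower((unsigned char)s[i]))
            i++;
        if (i == n)
            return 1;           /* all of s matched at this position */
        if (*line == '\0')
            return 0;           /* ran out of line */
    }
}

/* Does one term (sign plus text, e.g. "+hawk" or "_Or") hold for line? */
static int term_holds(const char *line, const char *term)
{
    const char *text = term + 1;
    switch (term[0]) {
    case '+': return contains_nocase(line, text);
    case '-': return !contains_nocase(line, text);
    case '=': return strstr(line, text) != NULL;
    case '_': return strstr(line, text) == NULL;
    default:  return 0;         /* not a valid sign */
    }
}

/* A leading 'o' or 'O' links a term to the one before it.  The line
   passes when each group of linked terms has at least one that holds. */
static int line_passes(const char *line, const char *const *terms, size_t n)
{
    size_t i = 0;
    while (i < n) {
        int group_ok = 0;
        do {
            const char *t = terms[i];
            if (*t == 'o' || *t == 'O')
                t++;            /* skip the or-link marker */
            if (term_holds(line, t))
                group_ok = 1;
            i++;
        } while (i < n && (terms[i][0] == 'o' || terms[i][0] == 'O'));
        if (!group_ok)
            return 0;
    }
    return 1;
}
```

With the terms \texttt{+hawk +handsaw o+hound}, a line mentioning a
hawk and a hound passes, while a line mentioning only a hawk does not.
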
\texttt{type temp1.txt \pipe\ filter +aardvark "o+winged pig" o+wombat "\_\pipe B\pipe"}

The file \str{temp1.txt} is fed through the \filter\ program, which
passes lines that contain \str{aardvark} (upper or lower case) or the
string \str{winged pig} or \str{wombat} (upper or lower case) and do
\emph{not} contain \str{\pipe B\pipe}. The result is printed on the
screen. Note the use of quotation marks in the command line to include
the space in \str{winged pig} and the special character \pipe\ in
\str{\pipe B\pipe}.

\subsection{File names beginning with one of \str{+-=\_}}

If you absolutely \emph{must} use text file names that begin with one
of these characters, use the character twice when specifying the file
name to \filter. Thus, the file name \texttt{-stdev.c} would be written
\str{--stdev.c}; \str{++junk.c} would be written \str{++++junk.c}.

\filter\ goes through each term in the command line and counts the
identical flag characters that begin it. Half of them, rounded down,
are removed. An even count specifies a text file name; an odd count
designates a search term. \str{+junk} has one flag character, is not
changed (shortened by \texttt{1/2 -> 0} characters), and is a search
term: print lines containing \str{junk}. \str{++junk} has two flag
characters, is shortened to \str{+junk}, and is read as a text file
name. \str{+++junk} is shortened to \str{++junk}, and is a search term:
print lines containing the string \str{+junk}. \str{+=junk} has one
flag character and is a search term: print lines containing the string
\str{=junk}.

This means that wild-carded file names that match files whose names
begin with one of \str{+-=\_} will cause trouble. I'm sorry; the
telepathic monitors of most computer systems are not
software-addressable. A compulsive urge to use files whose names begin
with punctuation or mathematical symbols can now be treated
successfully in a majority of cases.

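
The counting rule above can be sketched as a small C function. This is
an illustration only, assuming the rule exactly as stated in the text;
\texttt{classify\_term} is a hypothetical name and does not appear in
\texttt{filter.c}. It applies to terms up to and including the first
search term; later terms may also begin with \texttt{o}.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of filter.c.
   Returns 1 if term is a search term (odd number of identical leading
   flag characters) and 0 if it is a text file name.  On return, *rest
   points at the term with half of those flag characters, rounded down,
   removed. */
static int classify_term(const char *term, const char **rest)
{
    size_t n = 0;

    *rest = term;
    if (term[0] == '\0' || strchr("+-=_", term[0]) == NULL)
        return 0;               /* no flag character: a plain file name */
    while (term[n] == term[0])
        n++;                    /* count identical leading flag chars */
    *rest = term + n / 2;       /* drop half of them, rounded down */
    return (int)(n % 2);        /* odd count: search term */
}
```

For \texttt{+++junk} the function reports a search term and leaves
\texttt{++junk}, that is, the sign \texttt{+} followed by the text
\texttt{+junk}; for \texttt{--stdev.c} it reports a file name and
leaves \texttt{-stdev.c}.
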
Search terms specified in files (see below) are not themselves in the
command line, and if they begin with one of \str{+-=\_} those
characters should not be doubled. Search-term file expansion takes
place after {\filter} determines which command-line strings are file
names.

\subsection{`\$' and `\&' usually flag search-term file names}

A search term whose text begins with \str{\$} or \str{\&} will be
expanded as the name of a file containing a list of search terms. For
example, the string \str{+\$critters} tells {\filter} to look for a
file \filename{critters}; lines from that file are taken as search
terms. {\filter} adds a prefix to each term, depending on whether the
file name is specified with \str{\$} or \str{\&} and which of
\str{+-=\_} precedes it. With \str{\$}, \str{+} and \str{=} give
or-linked terms, so that text file lines will be printed if any
search-term file line is matched; \str{-} and \str{\_} are not
or-linked, so that text file lines are printed only if no search-term
file line is matched. With \str{\&}, \str{+} and \str{=} are not
or-linked, so that \emph{all} search-term lines must be matched to
print a text line; while \str{-} and \str{\_} \emph{are} or-linked, so
that any search-term line \emph{not} matched will allow text-line
printing.

To search for actual text strings beginning with `\$' or `{\&}', double
the flag characters. Thus, to search for the string \str{\$100}, use
the search string \str{+\$\$100}. To expand file names beginning with
those characters, use three of them: search-term file
\str{\${\&}junk} would be specified with something like
\str{-\$\$\${\&}junk}. In general, when a search term begins with a
flag character, double each flag character of that kind beginning the
term, and if the term is a file name, add an extra flag character.

Search-term files may contain file names, which will be expanded in
turn. For this reason, initial `\$' and `{\&}' characters must be
doubled even in nested search-term files.

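
The prefix rules for `\$' and `{\&}' expansion can be sketched as
follows. This is a simplified illustration under stated assumptions:
the terms are passed in as an array instead of being read from a file,
nested search-term files are not handled, and the names
\texttt{link\_prefix} and \texttt{expand\_terms} are hypothetical. The
first expanded term keeps the sign of the original term; later terms
receive the or-linked or plain prefix.

```c
#include <stdio.h>

/* Hypothetical helper: prefix for the second and later expanded terms,
   chosen from the sign (+ - = _) and the expansion kind ($ or &). */
static const char *link_prefix(char sign, char kind)
{
    if (kind != '$' && kind != '&')
        return NULL;
    switch (sign) {
    case '+': return kind == '$' ? "o+" : "+";
    case '=': return kind == '$' ? "o=" : "=";
    case '-': return kind == '$' ? "-"  : "o-";
    case '_': return kind == '$' ? "_"  : "o_";
    default:  return NULL;
    }
}

/* Writes the expanded terms, separated by blanks, into out.
   Returns 0 on success, -1 on a bad sign or kind or a full buffer. */
static int expand_terms(char sign, char kind,
                        const char *const *terms, size_t n,
                        char *out, size_t outsize)
{
    const char *prefix = link_prefix(sign, kind);
    size_t used = 0, i;

    if (prefix == NULL || outsize == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        int w;
        if (i == 0)             /* first term keeps the original sign */
            w = snprintf(out + used, outsize - used, "%c%s", sign, terms[i]);
        else
            w = snprintf(out + used, outsize - used, " %s%s", prefix, terms[i]);
        if (w < 0 || (size_t)w >= outsize - used)
            return -1;          /* output did not fit */
        used += (size_t)w;
    }
    return 0;
}
```

Given the terms \texttt{man} and \texttt{woman}, sign \texttt{+} with
`\$' yields \texttt{+man o+woman}, while sign \texttt{-} with `\$'
yields \texttt{-man -woman}.
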
Note that the or-linking logic can get seriously messed up when terms
beginning with `-' or `\_' are expanded carelessly, as \filter\ has no
good sense of logical precedence. If file \str{human} contains
\str{man} and \str{woman}, then \str{o-\$human} would expand as
\str{o-man -woman}.

Examples:

\texttt{filter armorial +\$animal o+\$vegetable \_\$mineral}

If file \filename{animal} reads
\begin{verbatim}
$human
reptile
amphibian
\end{verbatim}
and file \filename{human} reads
\begin{verbatim}
man
woman
\end{verbatim}
and file \filename{vegetable} reads
\begin{verbatim}
tree
grain
\end{verbatim}
and file \filename{mineral} reads
\begin{verbatim}
rock
dirt
\end{verbatim}
then the above will be expanded to:

\texttt{filter armorial +man o+woman o+reptile o+amphibian o+tree o+grain \_rock \_dirt}

\texttt{filter armorial +\${\&}beastie +{\&}{\&}{\&}doggie -\$dragon}

If file \filename{\&beastie} reads
\begin{verbatim}
unicorn
$dragon
manticore
\end{verbatim}
and file \filename{\&doggie} reads
\begin{verbatim}
terrier
hound
$$paniel
\end{verbatim}
and file \filename{dragon} reads
\begin{verbatim}
wyvern
dragon
lizard
\end{verbatim}
then the above will be expanded to

\texttt{filter armorial +unicorn o+wyvern o+dragon o+lizard o+manticore +terrier +hound +\$\$paniel -wyvern -dragon -lizard}

which will print lines from file \str{armorial} that contain:
any of: \str{unicorn}, \str{wyvern}, \str{dragon}, \str{lizard},
\str{manticore};
and ALL of: \str{terrier}, \str{hound}, \str{\$paniel};
and none of: \str{wyvern}, \str{dragon}, \str{lizard}.

\section{Revision History}
\subsection{Code}

Version 1.0 September 1992.

1.1 Sep '92 fixed minor bugs

1.2 Sep '92 added 'or'-linking to keywords

1.4 Oct '92 fixed a minor error in string lengths, added size DEFINEs

1.5 Jan '94 increased string lengths, fixed a Stupid Newbie Error re:
assumption that *argv[] was writable

2.0 Feb '94 added search-term file expansion and multiple text file
capability, including wildcards when system permits

\subsection{Documentation}

$Log: filter.w,v $

Revision 1.4 2000/04/18 04:40:21 Mark
Basic documentation complete.

Revision 1.3 2000/04/18 04:21:48 Mark
Document compiles.

\chapter{Implementation}
\section{The Program}

@O filter.c
@{
#define LENGTH 1201 /* 1 more than max \# characters */
#define ARGS 2000
#include
#include
#include
#include
@ @
void main(argc,argv)
int argc;
char *argv[];
{
@ @
k=0;
do
  {
    k++;
    if(firststring==1)
      infile=stdin;
    else if(!(infile=fopen(argv[k],"r")))
      {
        fprintf(stderr,"Can't open file %s for reading.\n",argv[k]);
        syntax();
      }
    l=0;
    for (;;)
      {
        if(NULL==fgets(line,LENGTH,infile)) break;
        if(LENGTH==strlen(line)+1)
          fprintf(stderr,"* Warning: truncated line \n%s\n",line);
        if(lowcase) strlow(strcpy(lowline,line));
        for(i=1; i<=nostring; i++)
          {
            test=0;
            switch(*myargv[i])
              {
                case '=':
                case '_': if(NULL!=strstr(line,(myargv[i]+1))) test=1;
                          break;
                default:  if(NULL!=strstr(lowline,(myargv[i]+1))) test=1;
                          break;
              }
            /* term failed and no or-linked alternative follows: reject */
            if(test!=flag[i] && orflag[i]==0) break;
            /* term held: skip the terms or-linked after it */
            if(test==flag[i]) while(orflag[i]==1) i++;
          }
        if(i>nostring)
          {
            if(firststring>2 && l==0)
              {
                printf("File %s:\n",argv[k]);
                l=1;
              }
            printf("%s",line);
          }
      }
    if(infile!=stdin) fclose(infile);
  }
while(k tempfile.txt\n");
fprintf(stderr," type temp1.txt | filter +aardvark \042o+winged pig\042 ");
fprintf(stderr,"o+wombat \042_|B|\042\n\nFilter utility v.2.0 ");
fprintf(stderr,"(C) 1994 by Joel Polowin, Chem. ");
fprintf(stderr,"Dept., Queen's University,\nKingston. ");
fprintf(stderr,"Permission granted for free ");
fprintf(stderr,"use; I want credit/blame for writing it.\n");
fprintf(stderr,"polowin@@silicon.chem.queensu.ca, polowinj@@qucdn.queensu.ca\n");
exit(1);
}
@| syntax @}

We should accumulate these...

@D Variables and types of the main function
@{
char line[LENGTH],lowline[LENGTH];
FILE *infile;
int i,j,k,l,test,firststring,nostring,lowcase;
void syntax();
void strlow();
char flag[ARGS+1],orflag[ARGS+1];
char *myargv[ARGS+1];
int blocksize;
char *prefix,*filename;
@| line LENGTH infile firststring nostring lowcase blocksize @}

\section{Parsing the input line}

At least I \emph{think} I got the input parser ... beware of improper
boundaries.

@D Parse the input line
@{
firststring=0;
for(i=1; i @ lowcase=0; for (i=0; i @
for (i=1; i<=nostring; i++)
  {
    j=0;
    switch(*myargv[i])
      {
        case '+': strlow(myargv[i]); lowcase=1;   /* fall through */
        case '=': flag[i]=1; break;
        case '-': strlow(myargv[i]); lowcase=1;   /* fall through */
        case '_': flag[i]=0; break;
        case 'O':
        case 'o': orflag[i-1]=1; myargv[i]++; i--; j=1; break;
        default:  fprintf(stderr,"Error in string no. %d: %s\n",i-1,myargv[i]);
                  syntax();
      }
    if((!j) && (((test=*(myargv[i]+1))=='$') || (test=='&')))
      {
        for(j=1; test==*(myargv[i]+1+j); j++) ; /* count identical flag chars */
        l=j/2;
        for (k=0; *(myargv[i]+k)!='\0'; k++)    /* shift string to delete */
          *(myargv[i]+k+1)=*(myargv[i]+k+l+1);  /* half of flag chars */
        if(j%2)      /* an odd number of flag chars: expand file */
          {
            switch(*myargv[i])  /* determine prefix for expanded terms */
              {                 /* from current prefix and expansion type */
                case '+': if(test=='$') prefix="o+"; else prefix="+"; break;
                case '=': if(test=='$') prefix="o="; else prefix="="; break;
                case '-': if(test=='$') prefix="-"; else prefix="o-"; break;
                case '_': if(test=='$') prefix="_"; else prefix="o_"; break;
                default:  fprintf(stderr,"Bugger-up in program!\n");
                          exit(1);
              }
            filename=myargv[i]+2;
            if(!(infile=fopen(filename,"r")))
              {
                fprintf(stderr,"Can't open search-term file %s\n",filename);
                exit(1);
              }
            test=*myargv[i];   /* prefix of current term */
            blocksize=0;       /* figure out how much memory to allocate */
            for(j=0;;j++)      /* for new terms */
              {
                if(NULL==fgets(line,LENGTH,infile)) break;
                if(LENGTH==strlen(line)+1)
                  fprintf(stderr,
                    "* Warning: truncated search term file %s line\n%s\n",
                    filename,line);
                if(line[strlen(line)-1]=='\n') line[strlen(line)-1]='\0';
                blocksize+=strlen(line)+1;
              }
            if (j==0)
              {
                fprintf(stderr,"* Warning: empty search term file %s\n",
                  filename);
                j=1;
              }
            blocksize+=j*strlen(prefix);
            fclose(infile);
            if(nostring+j-1>ARGS)
              {
                fprintf(stderr,"File expansion gives too many terms (%d max)\n",
                  ARGS);
                exit(1);
              }
            for(k=nostring; k>i; k--)   /* shift old myargv to make room */
              myargv[k+j-1]=myargv[k];
            if(NULL==(myargv[i]=malloc(blocksize)))
              {
                fprintf(stderr,"Can't allocate memory for search term expansion.\n");
                exit(1);
              }
            if(!(infile=fopen(filename,"r")))
              {
                fprintf(stderr,"Can't open file %s for second read.\n",filename);
                exit(1);
              }
for(k=0;kARGS)
  {
    fprintf(stderr,"Too many search strings specified.\n");
    syntax();
  }
@}

@D Allocate memory for the internal arguement list
@{
if (NULL==(myargv[1]=malloc(blocksize)))
  {
    fprintf(stderr,"Can't allocate memory for string storage.\n");
    exit(1);
  }
@}

@D Transfer the external arguements to the internal list
@{
strcpy(myargv[1],argv[firststring]);
for(i=2; i<=nostring; i++)
  {
    myargv[i]=myargv[i-1]+strlen(myargv[i-1])+1;
    strcpy(myargv[i],argv[i+firststring-1]);
  }
@}

\appendix
\chapter{Index}

This index lists identifiers and other significant items by section of
appearance (not by page).

@f

@m

@u

\printindex
\end{document}