
#*  

ABOUT 

  Creating an [ebnf] style language with [nom] as the compile target.
  I will use a W3C ebnf style with no commas between tokens.

  It would be nice to have a more natural language that *targets* [nom].

  This script compiles simple [ebnf] to [nom]. It is the first example of
  using nom as the target of a nom script. Another strange oed://corollary
  arises: we can use this new language to implement a compiler for
  itself.

BUGS

  lit: 'a'; compiles the same as a. Is this ok?

  I really need "replace ^'text' 'new';" and "replace 'old'$ 'new';"
  where ^ and $ are anchors to the start and end of the workspace.
  Without this, a lot of my compilation is potentially buggy.
  For example, with lookahead compilation, I want to replace the first
  few tokens of a sequence, but I cant be sure I am only replacing the
  first.
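
  The problem can be sketched outside of nom. The Python below is only an
  illustration: the helper names replace_start and replace_end are
  hypothetical, and nom has no such anchored commands at present. An
  unanchored replace changes every occurrence, while an anchored one only
  touches the start or end of the workspace.

```python
def replace_start(workspace: str, old: str, new: str) -> str:
    """Replace old with new only if the workspace begins with old."""
    if workspace.startswith(old):
        return new + workspace[len(old):]
    return workspace


def replace_end(workspace: str, old: str, new: str) -> str:
    """Replace old with new only if the workspace ends with old."""
    if old and workspace.endswith(old):
        return workspace[:len(workspace) - len(old)] + new
    return workspace


workspace = "b*c*b*"
# unanchored: every "b*" token is rewritten, which is the bug
print(workspace.replace("b*", "a*"))
# anchored: only the first token is rewritten
print(replace_start(workspace, "b*", "a*"))
```

  With anchors, the lookahead compilation could rewrite the leading tokens
  of a sequence without disturbing a later token of the same name.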

  
DONE

  Compiling uneven alternation sequences using the tape variable LHS and the ';'
  token attribute to save the partially compiled code.
  - finished attrule parsing to assign to @2 @3 etc.
  - variables like $server, which will then interpolate in strings
  - but user defined vars dont interpolate.

TODO
 

TESTING

  * first working program
  >> pep -f syntagma.pss -i "[:alpha:]+{color:'blue'|'green';delete;}digit:[0-9]+;lit:';';delete; option = color digit ';'; eof { print 'at eof\n';exit 0;} " > junk.pss

  The syntagma program below seems to compile correctly with
  this syntagma.pss script. See the phrases section for lots of syntagma
  syntax.

  * an example program, with lexing and a grammar rule 
  --------
    # comments can be written with '#'
    # * multiline comments are also
       ok, following a ebnf style * #
    [:alpha:]+ {
      color:'blue'|'green'|'red'|'orange';
      word: *;    # define default token within a lex block
    }

    space: [:space:];    # a space token contains a single \r\n\t or ' ' etc
    integer: [0-9]+ ; 
    # double quotes, single quotes or classes can be used
    # lit is a special lexing keyword
    lit: ';'|":"|[@#$];  

    # keywords like 'not','empty' or 'delete' are not case sensitive
    NOT EMPTY { delete; }  

    # parse rules start here. The syntagma grammar knows how to 
    # work out where lexing ends and parsing begins. The parse rule assignment
    # operator is '='
    option = color integer ';' ; 

    # parse rules can have alternations with |
    # literal tokens like ':' can be used in parse rules, but must be defined
    # above with the 'lit' or 'literal' lexing keyword
    position = integer ':' integer | integer ';' integer ; 
    # parse-rules can have code that executes when the rule matches
    position = integer ':' integer | integer ';' integer {
      print "found position at line $line!\n"; 
    }
    EOF { print 'at eof\n'; exit 0;} 
  ,,,,

  >> pep -f syntagma.pss -i 'com = word param; block = word newword;'
  * sample output of syntagma.pss when compiling with nom script 
  ------+
    # sample input BNF rules (white-space doesnt matter):
    #   com = word param ; 
    #   block = word newword ;
    # output (produced by this script)
    pop;pop;
    "word*param*" {
      clear; add "com*"; push; .reparse
    }
    push;push;
    pop;pop;
    "word*newword*" {
      clear; add "block*"; push; push; push; .reparse
    }
    push;push
  ,,,,

  This is pretty cool, because we now have an ebnf-to-nom compiler
  that produces executable and translatable (to go/java/tcl/python/ruby etc)
  [nom] code. The language has a lexing and parsing syntax.

  The syntagma language may not be as efficient as hand coded [nom] because
  it does redundant "pushes" nom://push and "pops" nom://pop between
  code blocks, but it is easier to write and probably less prone to 
  errors. 
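
  The shape of the sample output above can be sketched in a few lines.
  The Python below is only an illustration of the pop/test/push pattern,
  not how syntagma.pss itself works. The function name compile_rule is
  hypothetical, and token attributes and literal tokens are ignored.

```python
def compile_rule(lhs: str, rhs: list[str]) -> str:
    """Return nom code that reduces the tokens in rhs to the token lhs."""
    pops = "pop;" * len(rhs)
    pushes = "push;" * len(rhs)
    # nom parse tokens are delimited with a trailing star
    pattern = "".join(token + "*" for token in rhs)
    return (
        f"{pops}\n"
        f'"{pattern}" {{\n'
        f'  clear; add "{lhs}*"; push; .reparse\n'
        f"}}\n"
        f"{pushes}"
    )


print(compile_rule("com", ["word", "param"]))
```

  The trailing pushes restore the stack when the test fails, which is
  where the redundant pushes and pops between code blocks come from.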

  * compiling syntax for syntagma
  ----
    link = quotedtext url { @1 = "<a href=$1>$2</a>"; }
  ,,,,

  I may also allow '.' as a string concatenator. $1 refers to 
  the attribute of the first token on the RHS (right-hand side) of the 
  bnf grammar rule. The compiling block takes the place of the ';' in
  the syntax above.

ALTERNATION

  Alternation has now been implemented (18 may 2026), including for
  unequal length alternation branches with no following code block.
  Alternation where all branches have the same length also works with
  code blocks such as

  >> a = b e | c d { print "parsing"; }

  I have been thinking about how to implement alternation in syntagma
  and nom* for quite some time, and I have come up with a few promising
  ideas. Originally I thought that this was going to be impossible
  or nearly impossible, but it isnt.

  * parsing acrobatics and alternation
  --------
    example1: a = b c | d e f ;
    compile: 
      pop;pop; "b*c*" { clear; add "a*"; push; .reparse } push;push;
      pop;pop;pop; "d*e*f*" { clear; add "a*"; push; .reparse } push;push;push;
    example2: a = b c | d e { @1 = "$1 and $2"; }
    compile:
      pop;pop; "b*c*","d*e*" { 
        clear; get; add " and "; ++; get; --; put;
        clear; add "a*"; push; .reparse 
      } push;push;
  ,,,

  The second example is probably only useful if there are the same
  number of parse tokens in each branch of the alternation?
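
  The two compile shapes above can be sketched as follows. This is a
  simplified illustration in Python, assuming no rule block and ignoring
  token attributes. The function names are hypothetical, and the real
  script compiles uneven branches tail-wise (see the history section).

```python
def pattern(tokens: list[str]) -> str:
    """Quote a token sequence as a nom stack test."""
    return '"' + "".join(t + "*" for t in tokens) + '"'


def reduction(lhs: str, tests: str, length: int) -> str:
    """One pop/test/push block that reduces matching tokens to lhs."""
    return (
        "pop;" * length
        + f' {tests} {{ clear; add "{lhs}*"; push; .reparse }} '
        + "push;" * length
    )


def compile_alternation(lhs: str, branches: list[list[str]]) -> list[str]:
    """Return nom lines for: lhs = branch | branch | ... (no rule block)."""
    lengths = {len(branch) for branch in branches}
    if len(lengths) == 1:
        # same length: join the tests with the nom ',' (OR) operator
        tests = ",".join(pattern(branch) for branch in branches)
        return [reduction(lhs, tests, lengths.pop())]
    # unequal lengths: one separate block per branch
    return [reduction(lhs, pattern(b), len(b)) for b in branches]


for line in compile_alternation("a", [["b", "c"], ["d", "e", "f"]]):
    print(line)
```

  Equal length branches can share one block because the same number of
  pops exposes every branch, which is why a shared code block is easy
  there and hard for uneven branches.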

  * use a variable on the tape, like this in nom
  >> begin { mark 'LHS'; ++ }

NOTES

  Below is a remarkably simple way to implement 'lookahead' in 
  syntagma.

  I had the idea of lookahead grouping in rules. This is implemented,
  but not with ruleblocks yet.
  >> a = b +(c|d|e) ;

  So b will reduce to a, but only if b is followed by c, d or e.
  The parse stack would be: sequence '=' rsequence lookahead ;

  This could compile as
  >> "b*c*","b*d*","b*e*" { replace "b*" "a*"; push; push; .reparse }

  But there is a problem with the replace if there is another b* pattern
  in the workspace. I could solve that with some fancy putting and getting.
  This system preserves the lookahead tokens (which is of course necessary).
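
  A rough sketch of that compilation in Python, assuming a single LHS
  token and one lookahead token per branch (the name compile_lookahead is
  hypothetical). It emits the same unanchored replace, so it shares the
  problem just described.

```python
def compile_lookahead(lhs: str, rhs: list[str], lookahead: list[str]) -> str:
    """Return a nom block for: lhs = rhs +(look | look | ...) ;"""
    prefix = "".join(t + "*" for t in rhs)
    # one stack test per lookahead token, joined with the nom ',' operator
    tests = ",".join(f'"{prefix}{look}*"' for look in lookahead)
    # after the replace the workspace holds the LHS token and the
    # lookahead token, so two pushes are needed
    return f'{tests} {{ replace "{prefix}" "{lhs}*"; push; push; .reparse }}'


print(compile_lookahead("a", ["b"], ["c", "d", "e"]))
```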

  phantom tokens within blocks are nice. This can be used to enforce
  what sort of things can go in that block, eg, lexrules, textrules

  Since the nom* engine or pep* is completely text based, there
  is only one data type, so [:digit:]+ matches a string of digits,
  but they remain text.
  
  Classes are very simple, such as [a-z] or [abc] or [:alnum:], so
  you cant actually combine them like [a-gxyz]. That wont work.
  This is because nom classes are (too) simple. 

TOKENS

  textrules* can only be used inside a block like [:alpha:]+ { ... } because
  the compiler first has to read a block of text to match multi-character
  text. lexrules can occur anywhere in the lexing section, or in blocks in the
  parsing section.  Actions can be multiple

   LHS* left-hand-side of the ebnf rule, before the '='
   RHS* right-hand-side of the rule
   alt* an alternation of sequences, eg: a | b | c
   var*        a variable like $counter $line $server. can be user defined.
   attvar*     a numeric variable like $1,$2 etc referring to a token attribute
   class*    a simple class of characters [a-z] [abcd] [:space:]
   charquoted* is a single quoted character like: 'x'
   quoted*     text between quotes: 'and' 
   interp*     makes special vars interpolate in text
   sequence*   a sequence/list of tokens before the '=' in a rule
   rsequence*  a sequence of parse tokens after the '='
   lattribute* attribute of token on LHS of rule.
   token*    one grammar token (alphabetic word)
   action*   print,delete,quit etc can go in the lex block or rule block
   attrule*  an assignment to a token attribute, eg @1 = $1.$2;
   ruleblock*  code within the {...} after a rule*
   rule*     one grammar parse rule like 'command = name semicolon ;' 
   ruleset*  a list of grammar rules 
   lexrule*     lexing rule, eg: word: [:alpha:]+ ; 
   lexruleset*  a set of lexing rules 
   textrule*    lexing rule involving text like 'and'
   textruleset* a set of textrules (and lexrules) - equivalent to ruleblock* for
                the lexing blocks.
   notset*   used in lookaheads for negativity.
   andset*   using AND logic with classes/charquoted/quoted 
             example: [a-z] and begins 'x'
   orset*    an OR set of classes,quotes etc eg, 'a'|'word'|[a-z]|'b'
   charset*  an OR set of chars eg: 'a'|'b'|'\n' 
   :=*       for attribute assignment.

   literal tokens:
   to       for lexing up to and including end delimiter
   between  for lexing before an end delimiter
   begins   for text begins-with
   ends     for text ends-with
   and      for AND logic
   ... many others
   {} grouping 
   () for grouping
   +(...) lookahead groups.
   |  alternation or OR logic
   +  for classes and lexing 
   =  for grammar reduction
   :  for tokenisation (lexing) assignment
   ;  for statement end
   
PROPOSED PHRASES

  # create a time variable
  var $time;
  var $server = 'ssh://etc';

PHRASES

  lexing rules are indicated by the ':' assignment, and 
  grammar rules by the '=' assignment.

  The following phrases appear to be compiling
  ------

  # single line comments allowed
  #<star> multiline comments between these <star>#
  print 'hello'; exit 3; quit; delete; 
  # at eof, delete the pattern space, print text and exit with code '4'
  EOF { delete; print 'yes'; exit 4; }
  # delete all instances of 'green' in the pattern space text.
  delete 'green'; 
  # delete one char from the left of the pattern space
  ltrim; 
  print 'line: $line, char: $char'; # interpolate line number with $line etc
  # use the accumulator counter
  print 'counter is $counter';
  # double quotes are allowed.
  print "hello";
  # print text with a newline at the end
  println "hi"; 
  # interpolate special variables in the print string.
  println "the line count is $line";
  # make token 'capword' if the text begins with A-Z
  [:alpha:]+ { capword: begins [A-Z]; }
  lit: 'x'; lit: [0-4]|'a'|'b'|';' ;
  literal: [(){}] ;  # braces as literal tokens

  begins '<' and ends '>' { print "tag"; tag: *; }
  # make token 'word' for all alpha numeric sequences 
  word: [:alnum:]+ ;
  name: * ;          # default lex rule
  match empty { exit; }
  match 'abcd' { print 'hi'; exit; }
  match not empty { print 'Extra char on line $line'; exit 2; }
  [A-Z]+ { 
    # match a,aa,aaa,aaaa etc, same as [a]
    alist: only 'a';
    # match a,ba,ab,aa,bb, etc, same as [ab]
    ablist: only 'ab';
    list: [ab];   # the same 
  }

  [:digit:]+ {
    # 'not only' doesnt work because it compiles to ![0] 
    0number: begins '0' and not only [0]; 
  }
  # for all alphabetic sequences, if the text begins with '<'
  # and ends with '>' then, if the text begins with '<a ' make a 
  # "link" parse token, and if not, make a "tag" parse token
  [:alpha:]+ {
    match begins '<' and ends '>' { 
      link: begins '<a ';
      tag: *;
    }
  }

  match empty { print 'missing char at char $char'; exit 2; }
  punct: NOT [:alnum:]+ ;  # negated classes
  x: not 'a';
  register: '[' to ']' ;   # 1st item is only 1 char presently
  name: [.:] to '.' ;      # from '.' or ':' to the next '.' 
  item: "/" to ":end" ;  # 2nd item can be a string
  item: '/' TO '/' ;     # ?? same but throws error if no end '/'  
  file: '/' between [:space:] ;  # up to but not including any space char.
  [:alpha:]+ {
    keyword: 'is'|'to'|'go'; 
    name: 'tree';
    name: [:alpha:];   # this is the default, no plus required
  }
  [a-z]+ {
    num: 'one'|'two';
    # print an error message and quit if no matches
    print 'invalid word\n'; exit 2;
  }

  # negated class blocks
  NOT [:space:]+ { 
    key: '/find/';
    print 'not a space'; exit; 
  }  

  space: ' ';
  newline : '\n';
  # literal: [;:] ;      # def of literal tokens (only in lex part)

  # ------------------------
  # the parsing section - these rules must all come after the
  # lexing rules above

  # check the value of the second word in this parse rule
  phrase = word word {
    "green" == $2 { ...}
    [:space:] == $2 { ...}
    not begins "the" == $story { ...}
  }

  # alternation with same length sequences
  block = '[' statement ']' | '[' statementset ']' ;
  # alternation same length RHS sequences and rule block
  a b = '[' c ']' | '[' c ']' { print "alternation\n"; } 
  # alternation with unequal length sequences, but no rule block.
  a b = c d | e f g;
  # optionals between <...> 
  a = b < x y | p q > c;

  # lookahead syntax with +(...)
  a = ex '*' ex +('/'|'*') ;
  # look ahead with negative rules but tokens must be quoted which
  # is silly unless we are dealing with literal tokens
  a b = c d +(not "f" and not "j");
  a = c d +(not ';' and not '.');

  o = colour shape +(';' | block) ;
  option = name digit ';' ;  # use literal char token in parse rule
  object = colour ':' shape; # lit token, but must define earlier
  () = space word;       # just delete tokens in the parse section

  # check if the stack contains only a list token at end of file
  eof { stack (list) { print "list found\n"; }}
  # check if the parse stack is list or number or float. 
  eof { parse (list|number|float) { print "list found\n"; }}
  EOF: words = words word;   # only reduces at end of stream
  EOF {
    name = first second;
  }
  ,,,,

HISTORY

  25 may 2026
    working on lookahead rules with a rule code block. nearly 
    complete except for lookahead attribute copy. Just need to 
    get 'push;' list from the LHS token, but may need a variable.
    Or do a fancy "clop;" etc using .reparse continually until
    only "push;" left????
  24 may 2026

    need to turn "charquoted*" into "token*" on the RHS of parse
    rules. Then 'not token*' becomes 'nottoken' 

    optionals seem easier. the hardest is lookahead with rule
    blocks. Optionals can have a block, but it can only have
    actions and lexrules not attrules because we dont know 
    how long the sequence is.

  23 may 2026

    made begin blocks and vardefs etc
    made alternation groups, working for stack(altgroup) but need 
    to parse with " lhs = rsequence (altgroup| etc ". This is 
    so I can build the nom compiled code. I can store the compiled
    code in the altgroup token.

    made a println printline function. made print and println work with
    interpolated text (itext*) token. reformed the comparison syntax to 
    allow no comparison == operator. made some debug rules in the error
    section. 
    todo:

     - begin blocks
     - variable declarations with "var $name;" or "var $name := 'text'; "
     - var decs should go in the begin block.

  22 may 2026
    made a string interpolation token interp*
    need to make begin blocks. redo $1=='green' parsing to make
    it more flexible.
    need to do var declarations in the begin block.

    * made a check attribute value and variable syntax like this
    -------
      "green" == $2 { ...}
      [:space:] == $2 { ...}
      not begins "the" == $story { ...}
    ,,,

  21 may 2026
    lookahead rules with no ruleblock seem to be compiling well.
    Need to add ruleblock, also alternations.
    Also, need to add +(not token) syntax, and 
    +(not ';' and not x) which is a negative lookahead syntax.

  19 may 2026
    made @1,@2 etc. work

    had the idea of lookahead grouping in rules eg
    >> a = b (c|d|e);
    so b will reduce to a, but only if b is followed by c,d or e
    The parse stack would be: sequence '=' rsequence lookahead ;

    This would compile as
    >> "b*c*","b*d*","b*e*" { replace "b*" "a*"; push; push; .reparse }
    but there is a problem with the replace if there is another b* pattern.

  18 may 2026
    wrote the example script /eg/s.url.pss which shows lots of nice
    syntagma syntax.

    implemented unequal length alternation lists with no following 
    code block, such as
    >> a = b c d | e f | g | h | i j ;
    The compilation technique is nothing short of amazing even to me,
    who wrote it. The parse-reduction is actually 'tail-wise' so 
    that the branches of the alternation start reducing when the 
    ';' literal token is seen. Each branch has a list of 'pop's
    saved in the preceding '|' token attribute, which also indicates
    how many tokens are in that branch. For example, with 
    "...| h | i j ..." the "ij" branch has "pop;pop;" saved in the 
    previous '|' literal token and the "h" branch has "pop;" saved 
    in its previous | token. 

    So the nom code compares the 2 pop lists in each '|' token
    and if they are different (meaning the token sequence lists are
    of different lengths) then it immediately compiles the nom
    code for "i j" and saves it in the ';' token attribute (actually
    it adds it to that attribute). So the following 
    >> pop;pop; "i*j*"{ ...LHS...} 
    is added to the ';' attribute and parsing continues.
    But in order to get the code for the LHS* token it actually 
    has to use a "tape variable", which is just a named tape array
    cell at the top of the tape. This is because of the following
    parse sequence
    >> LHS '=' rsequence | rsequence | ... | rsequence | rsequence ';'
    Because of the tail-reduction, nom has no idea where LHS is on 
    the stack, and we have to do tail-reductions because of code blocks

    

  17 may 2026
    making attrules for assigning token attributes with @1 := "$2 .. $3";
    lots of progress. alternations with code blocks working.
    rewrote rule parsing, which is now much better and allows
    alternation. I think the nom//until command should really have
    a class argument as well as text, eg: until [abc];
    
  15 may 2026
    added double quotes eg token: "a"|"b";
    still cant do alternation in parse rules like:
    >> colour = r g b | c m k ;
    but see the alternation notes section for a way to do it.

    tidied up parsing of 'a' to 'b' etc. made 'match' sometimes
    optional (need to complete). made ignore rule. made better
    grammar* final token parsing. still need a way to match parse
    stack at eof? or try:
    ------
      eof { token = token { print 'yes'; }}
    ,,,,

  14 may
    added 'only': only 'a' means [a]+
    add AND logic, eg: 123number: [:digit:] AND begins [123] 
    which lexes the token '123number' if the text consists only
    of digits and begins with 1,2 or 3.

    added "begins" and "ends" and "not begins" and "begins not" 
    and so on.
  13 may 2026
    I think this is almost good enough to write a sed syntax checker
    as an example of what it can do.

    also need to do, actual composition rules like 
    >> obj = colour shape { $0 := $1.'\n'.$2 ; }

    lots of new syntax, eg: match, star '*' match empty {}
    {} = space word ;  delete token sequences
    word: * ;   # default lexing rules, matches everything even empty

  12 may 2026
    started to adapt this from the toybnf.pss script. A lot
    of progress, all sorts of lexing syntax is now working - see
    the phrases section above. lots and lots of progress - literal 
    tokens, actions like print,exit,delete etc

*#
  begin {
    # I need this variable for variable length sequence alternation
    # such as: a = b c | d ;
    mark "LHS"; clear; ++;
    mark "LHSpush"; clear; ++;
  }

  read; put;
  # line-relative char numbers, but this is overridden by
  # the [:space:] hoover.
  [\n] { nochars; }

  # multiline comments follow the format of nom. (* ... *) look nicer 
  # but I may want to do something with () later
  "#" { 
    (eof) { clear; .reparse } read; 
    !B"#*" { "#\n" { clear; .reparse } whilenot [\n]; } 
    B"#*" { 
      clear; add "starting at line "; lines; put;
      clear; until "*#";
      !E"*#" { 
        clear; add "unterminated multiline comment '#* ... *#'\n"; get;
        print; quit;
      }
    }
    put; clear; add "comment*"; push; .reparse 
  }

  # ignore white-space
  [:space:] { while [:space:]; clear; }

  # literal tokens, () for lookahead token set grouping
  # many of these literal tokens contain "pop;" list which is
  # put there by the rsequence token rules and the notset token    
  # so I will clear the attribute 
  [@0+:{}|()] { add "*"; push; --; put; ++; .reparse }

  # these are used for optionals. like () and +() and | they
  # can also contain a pop; list which indicates the length of the 
  # rsequence which follows. 

  '<','>' { add "*"; push; --; put; ++; .reparse }

  # lex '=' '==' etc
  '=' { while [=]; add "*"; push; --; put; ++; .reparse }

  # I store unequal alternation sequence compiled code here 
  ';' { clear; add " "; put; clear; add ";*"; push; }

  # alternation corresponds directly to noms ',' operator
  # I store pop; lists here, so I need to add the "," nom OR 
  # operator by hand
  '|' { clear; put; clear; add "|*"; push; }

  # the star means everything or anything, not sure about this?
  '*' { clear; add "!''"; put; clear; add "star*"; push; }

  # variables 
  # examples: $1 $2 or $name 
  # 
  "$" { 
    clear; while [:alnum:]; put;
    [:digit:] { 
      nop;
      clip; !"" {
        clear; add "Attribute values ($1,$2,$3...) maximum $9\n";
        print; quit;
      }
      get; 

      # mushroom replacement technique
      replace "9" "++;8;--"; replace "8" "++;7;--";
      replace "7" "++;6;--"; replace "6" "++;5;--";
      replace "5" "++;4;--"; replace "4" "++;3;--";
      replace "3" "++;2;--"; replace "2" "++;1; --";
      replace "1" " get"; add ";"; 
      # remove extra space from lone get.
      " get;" { clop; } put;

      clear; add "attvar*"; push; .reparse
    }


    # special variables. Maybe should have a different syntax
    "line","char","counter","text" {
      # integer accumulator
      "counter" { clear; add "count;"; }
      # automatic number of lines read from input
      "line" { clear; add "lines;"; }
      # automatic number of characters read from input
      "char" { clear; add "chars;"; }
      # text of current tape cell
      "text" { clear; add "get;"; }
      put; clear; add "var*"; push; .reparse
    }

    # I can make the fetch code here, or make it when the var* token
    # is actually used. Same applies above. I am relying on 
    # replace '; get;' '; put;'; 
    # for assignment??

    
    clear; add 'mark "here"; go "'; get; add '"; get; go "here";'; put;
    clear; add "var*"; push; .reparse 
  }

  # digits for token attribute assignment 1-9,
  [1-9] { put; clear; add "digit*"; push; .reparse }

  # [:digit:] { while [:digit:]; put; clear; add "number*"; push; .reparse }

  [:alpha:] { 
    # add the default nom parse token delimiter '*'
    while [:alpha:]; put;

    # these are keywords, but I dont like the capital errors
    # case insensitive. This means that tokens cant use these
    # words??
    lower;

    "begin","parse","stack","only","and","begins","ends",
    "var","match","txt","empty","not","to",
    "check","ignore","next","between","twixt","lit","literal","eof",
    "print","println","trim","ltrim","rtrim","delete","del","exit","quit" { 

      # put the nom command in the attribute
      # fix: divide into 'commands' and others.
      # but should function work on variables, like trim($1) etc????
      "ltrim" { clear; add "clop"; put; clear; add "ltrim"; } 
      "rtrim" { clear; add "clip"; put; clear; add "rtrim"; } 
      "trim" { clear; add "clip; clop"; put; clear; add "trim"; } 
      "exit","quit" { clear; add "quit"; put; clear; add "exit"; }
      "del","delete" { clear; add "clear"; put; clear; add "delete"; }

      "var" { clear; add "declare"; } 
      "parse" { clear; add "stack"; }
      "and" { clear; add "."; put; clear; add "and"; } 
      "begins" { clear; add "B"; put; clear; add "begins"; } 
      "ends" { clear; add "E"; put; clear; add "ends"; } 
      "not" { clear; add "!"; put; clear; add "not"; } 
      "empty" { clear; add "''"; put; clear; add "empty"; } 
      "eof" { clear; add "(eof)"; put; clip; clop; } 
      "twixt" { clear; add "between"; }
      "literal" { clear; add "lit"; }
      add "*"; push; .reparse
    }
    # case sensitive
    clear; get;
    # normal token
    add "*"; put; clear; add "token*"; push; 
  }

  
  "'" { 
    until "'"; put; 
    "''" { clear; add "empty single quote\n"; print; quit; }
    !E"'" { clear; add "unfinished single quote\n"; print; quit; }
    clip; clop; clip; 
    # either 'x' or '\n' etc
    "","\\" { clear; add "charquoted*"; push; .reparse }
    clear; add "quoted*"; push; .reparse
  }

  # try to allow double quotes
  '"' { 
    until '"'; 
    '""' { clear; add "empty double quote\n"; print; quit; }
    !E'"' { clear; add "unfinished double quote\n"; print; quit; }
    # convert to single quotes
    clip; clop; unescape '"'; escape "'"; put;
    clear; add "'"; get; add "'"; put;
    clip; clop; clip; 
    # either 'x' or '\n' etc
    "","\\" { clear; add "charquoted*"; push; .reparse }
    clear; add "quoted*"; push; .reparse
  }

  "[" { 
    until "]"; put; 
    "[]" { clear; add "empty class\n"; print; quit; }
    !E"]" { clear; add "unfinished class\n"; print; quit; }
    clear; add "class*"; push; .reparse 
  }

  !"" { 
    put; clear;
    add "! [syntagma]\n";
    add " bad character '"; get; add "'"; 
    add " at line:"; lines; add " char:"; chars; add "\n";
    add " I just can't go on... sorry, goodbye";
    print; quit;
  }

parse>
  # show the parse-stack reductions. a doubled hash makes it easier
  # to remove from the output with sed '/^##/d'
  add "## line:"; lines; add " char:"; chars; add " "; print; clear; 
  unstack; print; stack; 
  (eof) { add " EOF"; } 
  # show last attribute value if required for debugging.
  add " ("; --; get; ++; add ")"; replace "\n" "\n##    ";
  add "\n"; print; clear;

  # ---------------
  # ERROR parsing. search for 'one token' etc to find these

  # -------------------
  # errors: one token
  pop;

  # -------------------
  # errors: two tokens
  pop;

  !B"var*".!B"attvar*".!B"lattribute*".!B"token*".!B"lit*" {
    E"*:*" {
      clear; add "
      # ----------------------------------------
      # Syntagma :
      #   ':' is the lexing assignment operator. It should be
      #   preceded by a token name or the keyword 'lit'. 
      #   ':' is also used in the attribute assignment token := 
      #   Please dont use 'lit' or any other keyword as a token name 
      #   because the syntagma script will not compile, sorry.
      #    example:  lit: '.'|',';  # correct
      #    example: name: '.'|',';  # correct
      #    example: @1 := '$1 / $2';# correct
      #      wrong: var: [a-z]+;    # var is a keyword
      #
      #    keywords are:
      #     begin,parse,stack,only,and,begins,ends,var,match,txt,empty,not,to,
      #     check,ignore,next,between,twixt,lit,literal,eof,
      #     print,println,trim,ltrim,rtrim,delete,del,exit,quit
      # ----------------------------------------
      "; replace "\n    " "\n"; add "\n";
      add " ?* - "; get; add "\n";
      add " :* - "; ++; get; --; add "\n";
      print; quit;

    }
  }

  "begins*class*" {
    clear; 
    add "# The 'begins' keyword cannot be combined with text classes \n";
    add "# on line "; lines; add "\n";
    print; quit;
  }

  "|*)*" {

    clear; add "
    # ----------------------------------------
    # Syntagma :
    #   an alternation | followed by a group bracket ) is 
    #   probably an error. What do you think?
    # ----------------------------------------
    "; replace "\n    " "\n"; add "\n";
    add " |* - "; get; add "\n";
    add " )* - "; ++; get; --; add "\n";
    print; quit;

  }

  # -------------------
  # errors: three tokens
  pop;

  B"+(*lookgroup*",B"(*altgroup*" { !"+(*lookgroup*".!"(*altgroup*" {
    !E")*".!E"|*" {
      replace "*" " "; ++; ++; ++; put; --; --; --;
      clear; add "
      # ----------------------------------------
      # Syntagma: 
      #   brackets appear to be mismatched: +( and ( should be
      #   terminated with )
      # ----------------------------------------
      "; replace "\n      " "\n"; add "\n";
      add "  (* or +(* - "; get; add "\n";
      add "     group* - "; ++; get; --; add "\n";
      add "         ?* - "; ++; ++; get; --; --; add "\n"; 
      add "parse stack - "; ++; ++; ++; get; --; --; --; add "\n";
      print; quit;
    }}
  }

  B"<*altgroup*".!"<*altgroup*" {
    !E">*".!E"|*" {
      replace "*" " "; ++; ++; ++; put; --; --; --;
      clear; add "
      # ----------------------------------------
      # Syntagma: 
      #   brackets appear to be mismatched: < should be
      #   terminated with >
      # ----------------------------------------
      "; replace "\n      " "\n"; add "\n";
      add "         <* - "; get; add "\n";
      add "  altgroup* - "; ++; get; --; add "\n";
      add "         ?* - "; ++; ++; get; --; --; add "\n"; 
      add "parse stack - "; ++; ++; ++; get; --; --; --; add "\n";
      print; quit;
    }
  }


  # -------------------
  # errors: four tokens
  pop;

  # -------------------
  # errors: five tokens
  pop;

  # -------------------
  # errors: six tokens
  pop;

  # -------------------
  # errors: seven tokens
  pop;

  # -------------------
  # errors: 8 tokens or less
  pop;

  # -------------------
  # errors: 9 tokens or less
  pop;

  # -------------------
  # errors: 10 tokens or less
  pop;

  # -------------------
  # errors: 11 tokens or less
  pop;

  # no lexing 
  (eof) {

    # incomplete programs 
    "token*","sequence*","notset*","nottoken*","notsequence*","var*","attvar*",
    "attrule*","tomatch*","betweenmatch*","pattern*","orset*",
    "charset*","andset*","class*","quoted*","charquoted*","number*" {
      swap; add " is a '"; get; add "'"; add "\n\n"; print; quit;
    }

    "LHS*=*rsequence*+(*lookgroup*)*{*ruleblock*" {
      clear; add "
      # ----------------------------------------
      # Syntagma: 
      #   lookahead tokens with code 
      # ----------------------------------------
      "; replace "\n      " "\n"; add "\n";
      add "       LHS* - "; get; add "\n";
      add "         =* - "; ++; get; add "\n";
      add " rsequence* - "; ++; get; add "\n"; 
      add "        +(* - "; ++; get; add "\n"; 
      add " lookgroup* - "; ++; get; add "\n"; 
      add "         )* - "; ++; get; add "\n"; 
      add "         {* - "; ++; get; add "\n"; 
      add " ruleblock* - "; ++; get; add "\n"; 
      print; quit;
    }

    "LHS*=*rsequence*<*altgroup*>*rsequence*",
    "LHS*=*rsequence*<*rsequence*>*rsequence*",
    "LHS*=*rsequence*<*notset*>*rsequence*" {
      clear; add "
      # ----------------------------------------
      # Syntagma: 
      #   optionals between <...>
      # ----------------------------------------
      "; replace "\n      " "\n"; add "\n";
      add "       LHS* - "; get; add "\n";
      add "         =* - "; ++; get; add "\n";
      add " rsequence* - "; ++; get; add "\n"; 
      add "         <* - "; ++; get; add "\n"; 
      add "rseq/group* - "; ++; get; add "\n"; 
      add "         >* - "; ++; get; add "\n"; 
      add " rsequence* - "; ++; get; add "\n"; 
      print; quit;
    }

    # print each token and attribute value.
    "textmatch*{*" {
      clear; 
      add "textmatch* - "; get; add "\n";
      add "        {* - "; ++; get; --; add "\n"; 
      print; quit;
    }

    # left hand side of parse rule.
    "LHS*=*" {
      clear; 
      add " LHS* - "; get; add "\n";
      add "   =* - "; ++; get; add "\n"; 
      print; quit;
    }


    # print each token and attribute value. +( should have a list of pops
    # this should help debugging lookahead syntax
    "(*altgroup*)*","(*altgroup*|*","<*altgroup*>*","<*altgroup*|*" {
      clear; add "
      # ----------------------------------------
      # Syntagma: 
      #   alternation groups are used for optionals, lookaheads, and 
      #   alternation within a rule. Usually each 'branch' of the alternation
      #   needs to have the same number of tokens or literal characters.
      # ----------------------------------------
      "; replace "\n      " "\n"; add "\n";
      add "   <* or (* - "; get; add "\n";
      add "  altgroup* - "; ++; get; --; add "\n";
      add ">* |* or )* - "; ++; ++; get; --; --; add "\n"; 
      print; quit;
    }

    # print each token and attribute value. +( should have a list of pops
    # this should help debugging lookahead syntax
    "+(*lookgroup*)*","+(*lookgroup*|*" {
      clear; 
      add "       +(* - "; get; add "\n";
      add "lookgroup* - "; ++; get; --; add "\n";
      add "  |* or )* - "; ++; ++; get; --; --; add "\n"; 
      print; quit;
    }

    # alternation groups and optionals 
    "LHS*=*rsequence*(*altgroup*)*","LHS*=*rsequence*<*altgroup*>*",
    "LHS*=*rsequence*(*rsequence*)*","LHS*=*rsequence*<*rsequence*>*" {
      clear; add "
      # ----------------------------------------
      # Syntagma:
      #   nearly a parse rule. <..> is used for optionals and 
      #   (..) is used for grouping alternatives
      # ----------------------------------------
      "; replace "\n      " "\n"; add "\n";
      add "      LHS* - "; get; add "\n";
      add "        =* - "; ++; get; add "\n";
      add "rsequence* - "; ++; get; add "\n";
      add "  <* or (* - "; ++; get; add "\n";
      add " altgroup* - "; ++; get; add "\n";
      add "  >* or )* - "; ++; get; add "\n"; 
      print; quit;
    }

    # debug lookahead groups
    "LHS*=*rsequence*+(*lookgroup*)*;*" {
      clear; add "
      # ----------------------------------------
      # Syntagma:
      # ----------------------------------------
      "; replace "\n      " "\n"; add "\n";
      add "      LHS* - "; get; add "\n";
      add "        =* - "; ++; get; add "\n";
      add "rsequence* - "; ++; get; add "\n";
      add "       +(* - "; ++; get; add "\n";
      add "lookgroup* - "; ++; get; add "\n";
      add "        )* - "; ++; get; add "\n"; 
      add "        ;* - "; ++; get; add "\n"; 
      print; quit;
    }

    "textmatch*==*var*" {
      clear; add "
      # ----------------------------------------
      # Syntagma syntax:
      #  the program is incomplete. 
      # ----------------------------------------
      "; replace "\n      " "\n";
      # print each token and attribute value. 
      add "textmatch* - "; get; add "\n";
      add "       ==* - "; ++; get; --; add "\n";
      add "      var* - "; ++; ++; get; --; --; add "\n"; 
      print; quit;
    }

    "LHS*=*rsequence*" {
      clear; add "
      # ----------------------------------------
      # Syntagma:
      #   you wrote a partial program. add a ';' to complete the rule
      #   and a lexing rule as well.
      # ----------------------------------------
      "; replace "\n      " "\n";
      add "      LHS* - "; get; add "\n";
      add "        =* - "; ++; get; --; add "\n";
      add "rsequence* - "; ++; ++; get; --; --; add "\n"; 
      print; quit;
    }


    "{*action*}*" {
      clear; add "
      # ----------------------------------------
      # Syntagma syntax:
      #  the program is incomplete. 
      # ----------------------------------------
      "; replace "\n      " "\n";
      # print each token and attribute value. 
      add "      {* - "; get; add "\n";
      add " action* - "; ++; get; --; add "\n";
      add "      }* - "; ++; ++; get; --; --; add "\n"; 
      print; quit;
    }

    "(*notset*)*" {
      clear; add "
      # ----------------------------------------
      # Syntagma syntax:
      #  a 'notset' is for negative lookaheads and groups
      #  example: (not (a b) and not (b c))
      # ----------------------------------------
      "; replace "\n      " "\n";
      # print each token and attribute value. 
      add "      (* - "; get; add "\n";
      add " notset* - "; ++; get; --; add "\n";
      add "      )* - "; ++; ++; get; --; --; add "\n"; 
      print; quit;
    }




    "rule*","ruleset*" {
      clear;
      add "\n";
      add "# ----------------------------------------\n";
      add "# Syntagma friendly advice: \n";
      add "#   You need at least 1 lexing rule with your parsing rules\n";
      add "# Example (a well-known esoteric language):\n";
      add "#   lit: '['|']';         # lex literal tokens []\n";
      add "#   inst: [-+><.,];       # lex instructions -+><.,\n";
      add "#   block = '[' inst ']' | '[' prog ']' | '[' ']'; # a parse rule \n";
      add "#   prog = inst inst | inst block | prog inst | prog block; \n";
      add "#   eof { ()=prog { print 'valid BF program \\n'; exit;}} \n";
      add "# ----------------------------------------\n\n";
      get; add "\n\n"; 
      print; quit;
    }

    "textrule*","textruleset*" {
      clear; add "
      # ----------------------------------------
      # Syntagma syntax therapy:
      #  text-rules are for using within blocks either in the lexing phase
      #  or the parsing phase, here is an example:
      #    [:alpha:]+ { shape: 'circle'|'square'; word:*; }
      # ----------------------------------------
      "; replace "\n      " "\n";
      get; add "\n\n"; print; quit;
    }

    # interpolated text
    "print*itext*" {
      clear; add "
      # ----------------------------------------
      # Interpolated text: 
      #    example:  
      #    print '$1:$2'; 
      # ----------------------------------------
      "; replace "\n      " "\n";
      ++; get; --; add "\n\n"; print; quit;
    }

  }


  push;push;push;push;push;push;push;push;push;push;push;

  # end of error parsing
  #-----------------------
  # 1 token parsing
  pop;

  # currently ignoring comments, but it would be nice to transfer
  # them to the compiled nom code.
  "comment*" { clear; .reparse }

  #-----------------------
  # 2 token parsing
  pop;

  "not*token*" {
    clear; add '!"'; ++; get; --; add '"'; put;
    clear; add "nottoken*"; push; .reparse
  }

  "(*nottoken*","+(*nottoken*","(*notsequence*","+(*notsequence*" {
    clear; add "(*notset*"; push; push; .reparse   
  }

  # a phantom beginblock
  "begin*{*" {
    add "beginblock*"; push; push; push; .reparse
  }
  
  # append actions, lex rules and variable definitions to the beginblock
  "beginblock*action*","beginblock*lexrule*","beginblock*vardef*" {
    clear; get; !"" { add "\n"; } ++; get; --; put;
    clear; add "beginblock*"; push; .reparse
  }

  # integrate the begin block into the script.
  "start*lexrule*" {
    clear; get; ++; get; --; put;
    clear; add "lexruleset*"; push; .reparse
  }

  # some simple literal token combinations

  # +( will be the lookahead group token. This will also store the
  # list of pop;pop;... just like = and | and ( - if I do alternation groups
  # which I will.
  "+*(*" { clear; put; add "+(*"; push; .reparse }

  # this is the token attribute assignment operator
  ":*=*" {
    clear; add ":=*"; push; .reparse 
  }

  # simplifying parse rules with context token unification
  # example: begins 'x' and ends 'y' {
  # example: 'a'|'b'|[1-9] {
  "andset*{*","charset*{*","orset*{*","quoted*{*","star*{*",
  "empty*{*","charquoted*{*","class*{*" {
    clear; add "textmatch*{*"; push; push; .reparse
  }

  # simplifying parse rules with context token unification
  # example: begins 'x' and ends 'y' == $1 {...}
  # example: "green" == $colour {...}
  # compile: clear; mark "here"; go "colour"; get; go "here"; "green" {...}
  # example: begins "green" == $3 {...}
  # compile: clear; ++;++; get; --;--; B"green" {...}

  # comparisons of variables with textmatches like classes, strings, etc
  # or just comparison with the pattern-space
  B"andset*",B"orset*",B"quoted*",B"star*",B"empty*",B"charquoted*",B"class*" {
    E"==*",E"var*",E"attvar*" {
      E"==*" { clear; add "textmatch*==*"; }
      E"var*" { clear; add "textmatch*var*"; }
      E"attvar*" { clear; add "textmatch*attvar*"; }
      push; push; .reparse
    }
  }

  
  # example: $count '1' (or) == empty (or) $1 [0-9]
  B"==*",B"var*",B"attvar*" {
    E"andset*",E"orset*",E"quoted*",E"star*",E"empty*",E"charquoted*",E"class*" {
      push; clear; add "textmatch*"; push; .reparse
    }
  }

  # reverse the order of comparisons 
  "var*textmatch*","attvar*textmatch*" {
    clear; get; ++; swap; --; put;
    clear; add "textmatch*var*"; push; push; .reparse
  }

  # reinsert the comparison operator == and copy attribute 
  # the comparison operator only serves to make the parse stack more
  # comprehensible.
  "textmatch*var*" {
    clear; ++; get; ++; put; --; clear; put; --; 
    clear; add "textmatch*==*var*"; push; push; push; .reparse
  }

  # use a lexblock phantom token here? no because that allows
  # empty lex rule blocks which seems silly.

  # context-induced parse-token simplification 
  "class*and*","quoted*and*","charquoted*and*" {
     clear; add "andset*and*"; push; push; .reparse
  }

  # this is a nice way to ensure that only the right sort of 
  # tokens can go into a block {...} that follows a parse-reduction rule
  # I am not sure if I should allow lex rules here but I will for now.
  # example: a = b c { exit; }
  "ruleblock*action*","ruleblock*textrule*","ruleblock*attrule*",
  "ruleblock*lexrule*" {
    # join token with newline unless the 1st is a phantom ruleblock
    clear; get; !"" { add "\n"; } ++; get; --; put;
    clear; add "ruleblock*"; push; .reparse
  }

  # interpolate variables into text.
  "print*quoted*" { add "interp*"; push; push; push; .reparse }
  "print*charquoted*" { clear; add "print*quoted*"; push; push; .reparse }
  "println*quoted*" { add "interp*"; push; push; push; .reparse }
  "println*charquoted*" { clear; add "println*quoted*"; push; push; .reparse }

  # This is a way to interpolate variables into a string.
  "quoted*interp*" {
    clear; get; 
    # need to normalise quotes for interpolation
    B"'".E"'" { clip;clop; } B'"'.E'"' { clip;clop; } put;
    clear; add 'add "'; get; add '"'; put;
    
    # special line and char and counter 'variables'
    # the number of lines read from the input stream
    replace "$line" '"; lines; add "';
    # the number of chars read from the input stream
    replace "$char" '"; chars; add "';
    # access the pep machine accumulator
    replace "$counter" '"; count; add "';
    # text is the text in the current tape cell 
    replace "$text" '"; get; add "';
    # get the parse-stack?
    # replace "$stack" "'; ++;++;++;put;--;--;--; d;stack;swap;get; add '";

    # can I replace any variable here?
    
    # the $n variables which are token attribute values
    replace "$1" '"; get; add "';
    replace "$2" '"; ++; get; --; add "';
    replace "$3" '"; ++;++; get; --;--; add "';
    replace "$4" '"; ++;++;++; get; --;--;--; add "';
    replace "$5" '"; ++;++;++;++; get; --;--;--;--; add "';
    replace "$6" '"; ++;++;++;++;++; get; --;--;--;--;--; add "';

    # an optimisation!! remove empty add commands
    # replace "add '';" ""; replace 'add "";' '';

    # remove extra space from lone get.
    " get;" { clop; } add ";"; put;
    clear; add "itext*"; push; .reparse
  }
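
  # A worked trace of the "quoted*interp*" rule above, derived by hand
  # from the replace commands (a sketch, not pep output). It assumes the
  # quoted attribute is '$1:$2' including its quotes.
  #   after quote stripping :  $1:$2
  #   after add/get/add     :  add "$1:$2"
  #   after replace "$1"    :  add ""; get; add ":$2"
  #   after replace "$2"    :  add ""; get; add ":"; ++; get; --; add ""
  #   after add ";"         :  add ""; get; add ":"; ++; get; --; add "";
  # The empty add commands stay in the output because the optimisation
  # that removes them is commented out.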

  # LHS token attribute assignment
  # example: @3
  # compile: ++;++; put; --;--;
  # example: @4
  # compile: ++;++;++; put; --;--;--;
  "@*digit*" {
    clear; add "@"; ++; get; --; 
    replace "@9" "++;@8;--"; replace "@8" "++;@7;--";
    replace "@7" "++;@6;--"; replace "@6" "++;@5;--";
    replace "@5" "++;@4;--";
    replace "@4" "++;@3;--"; replace "@3" "++;@2;--";
    replace "@2" "++; @1; --"; replace "@1" "put"; add ";";

    put;
    clear; add "lattribute*"; push; .reparse
  }
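
  # A hand trace of the replace chain in the "@*digit*" rule above for
  # the input "@3" (a sketch, not pep output):
  #   @3  ->  ++;@2;--  ->  ++;++; @1; --;--  ->  ++;++; put; --;--
  # and the final add ";" gives: ++;++; put; --;--;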

  # variable length alternation sequences will compile code into 
  # this '{' token attribute, so I want to make sure that it is 
  # empty. no, fix: 
  "rsequence*{*" { 
    clear; ++; put; --; add "rsequence*{*"; 
    # dont reparse because you get an infinite loop
  }

  # set up rsequence parsing, also in alternation-groups
  # example: a = b c ( e f | g h ) i j ; # alternation group
  # example: a = b c +( e f | g h );   # lookahead group
  # example: a = b < e | h > x y;      # rsequence in and after optional 
  "=*token*","|*token*","(*token*",")*token*",
  "+(*token*","<*token*",">*token*" {
    push;
    # store pop list in = attribute
    clear; --; add "pop;"; put; ++;
    clear; add '"'; get; add '"'; 
    # dont double-wrap not-tokens in quotes
    B'"!"'.E'""' { clip; clop; } put;
    # reverse not ends with. fix:
    B'E!' { clop; clop; put; clear; add "!E"; get; } put;
    clear; add "rsequence*"; push; .reparse
  }

  # just put a pop; list in | this is used by lookgroups etc 
  "|*charquoted*" {
    clear; add "pop;"; put; clear; add "|*charquoted*";
  }

  # set up rsequence parsing with literals, also in alternations
  # see the 3 token rule for '|*charquoted*' etc
  # also for lookahead groups
  # example: '=' rsequence = '=' name ;
  # example: '(' rsequence = '(' name ;
  "=*charquoted*","(*charquoted*","+(*charquoted*","<*charquoted*" {
    push;
    # convert 'x' to x* for literal tokens
    clear; get; 
    # fix: also handle negated literal characters? these are
    # useful in lookaheads and other circumstances.
    # but I think I need a separate token. notcharquoted*
    # example: !";" -> !";*" ????
    
    clip; clop; add "*"; 
    # fix: # B"'",B"!'" { add "'"; } 
    put; 
    # store pop list in = or ( or +( or < attribute
    clear; --; add "pop;"; put; ++;
    clear; add '"'; get; add '"'; put;
    clear; add "rsequence*"; push; .reparse
  }



  # get the next character into the pattern space or nothing if EOF
  # example: next;
  # compile: !(eof) { read; }

  "next*;*" {
    clear; add "!(eof) { read; }"; put;
    clear; add "lexrule*"; push; .reparse
  }

  "between*to*","between*not*","between*ends*","between*begins*" {
    replace "between*" ""; clip; put;
    clear; add "can't mix 'between' and '"; get; 
    add "' keywords (at line "; lines; add ")\n"; print; quit;
  }
  
  "to*between*","to*not*","to*ends*","to*begins*" {
    replace "to*" ""; clip; put;
    clear; add "can't mix 'to' and '"; get; 
    add "' keywords (at line "; lines; add ")\n"; print; quit;
  }
  # reduce number of tokens
  # example: [a-z] to -> parse: pattern*to*
  "class*to*","charquoted*to*","quoted*to*" {
    clear; add "pattern*to*"; push; push; .reparse
  }

  # example: to '/end' -> parse: to*pattern*
  # no classes here, because nom://until cant do it.
  "to*charquoted*","to*quoted*" {
    clear; add "to*pattern*"; push; push; .reparse
  }

  # example: [a-z] between -> parse: pattern*between*
  "class*between*","charquoted*between*","quoted*between*" {
    clear; add "pattern*between*"; push; push; .reparse
  }

  # example: between [:space:] -> parse: between*pattern*
  "between*charquoted*","between*quoted*" {
    clear; add "between*pattern*"; push; push; .reparse
  }


  
  # the 'match' keyword is actually optional; it is just supposed
  # to emphasise that some text is being matched.
  "match*class*","match*quoted*","match*charquoted*",
  "match*eof*","match*empty*","match*orset*","match*andset*",
  "match*tomatch*","match*betweenmatch*" {
     clop;clop;clop;clop;clop;clop; push; get; --; put; ++; 
     clear; .reparse
  }


  # syntactic sugar, 
  # example: only 'abc'  or 'a'
  # compile: [abc] or [a]
  "only*quoted*","only*charquoted*" {
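    clear; ++; get; --;
    # each test needs its own B prefix, otherwise "E" and "!" are
    # tested for equality with the workspace.
    B"B",B"E",B"!" { 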
      clear; add "cant combine 'only' with 'begins/ends/not'\n";
      print; quit;
    } 
    clip; clop; put;
    clear; add "["; get; add "]"; put;
    clear; add "class*"; push; .reparse
  }

  # allow negation of classes etc
  "not*class*","not*quoted*","not*charquoted*","not*empty*","not*eof*" {
    replace "not*" ""; ++; ++; put; --; --;
    clear; add "!"; ++; get; --; put; 
    clear; ++; ++; get; --; --; push; .reparse 
  }

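  # allow negation of tokens, wrap in quotes
  # note: the "not*token*" rule at the start of the 2 token section
  # reduces to nottoken* and reparses first, so this block looks unreachable.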
  "not*token*" {
    clear; add '!"'; ++; get; --; add '"'; put; 
    clear; add "token*"; push; .reparse 
  }


  # text begins with and text ends with. 
  "begins*quoted*","begins*charquoted*","ends*quoted*","ends*charquoted*" {
    B"ends*" { replace "ends*" ""; }
    B"begins*" { replace "begins*" ""; }
    push;
    --; get; ++; get; 
    # 'begins-not' needs to be 'not-begins' in nom etc
    B"E!" { clop; clop; put; clear; add "!E"; get; } 
    B"B!" { clop; clop; put; clear; add "!B"; get; } 
    # print; quit;
    --; put; ++;
    clear; .reparse
  }

  "comment*comment*" { 
    clear; get; add "\n"; ++; get; --; put;
    clear; add "comment*"; push; .reparse 
  }

  # how to include comments
  #*
  "comment*lexrule*","lexrule*comment*","lexruleset*comment*" { 
    clear; get; add "\n"; ++; get; --; put;
    clear; add "lexruleset*"; push; .reparse 
  }
  *#

  "token*token*","sequence*token*" {
    # count tokens to calculate "push;" later
    a+;
    clear; get; ++; get; --; put; 
    clear; add "sequence*"; push; .reparse
  }

  # allow literal chars in sequences if they have already been 
  # declared with lit: [abc]; (or) lit: ';'|':';
  # eg: option = word number ';' ;
  "token*charquoted*","sequence*charquoted*" {
    # count tokens to calculate "push;" later
    a+;
    # convert 'x' to x* for literal tokens
    clear; ++; get; clip; clop; add "*"; put; --;
    clear; get; ++; get; --; put; 
    clear; add "sequence*"; push; .reparse
  }

  # allow literal chars to begin sequences if they have already been 
  # declared with lit: [abc]; (or) lit: ';'|':';
  # eg: option = '(' obj ')' ;
  # charquoted.sequence should not occur.
  # charquoted.charquoted is handled by its own rule below, which
  # converts both literals.
  "charquoted*token*","charquoted*sequence*" {
    # count tokens to calculate "push;" later
    a+;
    # convert 'x' to x* for literal tokens
    clear; get; clip; clop; add "*"; put;
    clear; get; ++; get; --; put; 
    clear; add "sequence*"; push; .reparse
  }

  # eg: opt = '(' ')' ;
  "charquoted*charquoted*" {
    a+;
    # convert 'x' to x* for literal tokens
    clear; get; clip; clop; add "*"; put;
    clear; ++; get; clip; clop; add "*"; put; --;
    clear; get; ++; get; --; put; 
    clear; add "sequence*"; push; .reparse
  }

  # need to construct the LHS here. using 'stack' is much easier
  # but feels a bit lazy, and I quite like being reminded how many 
  # tokens I am pushing.
  # example: a b c = 
  # compile: "clear; add 'a*b*c*'; push;push;push; .reparse"
  #      or: "clear; add 'a*b*c*'; stack; .reparse"
  "token*=*","sequence*=*" {
    # later have to transform this count number into
    # push; or push;push; etc
    clear; get; a+; count; put; clear; 
    # reset the token counter for the RHS 
    zero; 
    clear; add 'clear; add "'; get; add '#;';
    # 6 token limit for left-hand-side which is more than enough
    # look-ahead or context?
    replace "1#;" '"; push;';
    replace "2#;" '"; push;push;';
    replace "3#;" '"; push;push;push;';
    replace "4#;" '"; push;push;push;push;';
    replace "5#;" '"; push;push;push;push;push;';
    replace "6#;" '"; push;push;push;push;push;push;';
    add " .reparse"; put;
    # save into top of tape for variable length alternations
    mark "here"; go "LHS"; put; go "here";
    clear; add "LHS*=*"; push; push; .reparse
  }
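
  # A hand trace of the rule above for the left-hand side "a b =" (a
  # sketch, not pep output). It assumes the sequence attribute is "a*b*"
  # and that the counter was zero before the sequence was parsed.
  #   counter: 1 after "token*token*", 2 after the a+ in this rule
  #   after get; a+; count  :  a*b*2
  #   after add '#;'        :  clear; add "a*b*2#;
  #   after replace "2#;"   :  clear; add "a*b*"; push;push;
  #   after add " .reparse" :  clear; add "a*b*"; push;push; .reparse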

  #*
  no, old rule, remove
  "token*;*","sequence*;*" {
    clear; get; a+; count; put;
    clear; zero; add "RHS*"; push; .reparse
  }
  *#

  # just simplify parse rules, while maintaining the separation
  # between the lexing and parsing sections.
  "lexrule*rule*" { clear; add "lexruleset*rule*"; }
  "lexruleset*rule*" { clear; add "lexruleset*ruleset*"; }

  "lexruleset*ruleset*".(eof) { 
    clear; 
    add "# -------------------------------------\n";
    add "# nom script created by www.nomlang.org/eg/syntagma.pss\n\n";
    add "begin { nop; }\nread; put; \n"; get; 

    # if the parser doesn't consume or delete characters from the 
    # input stream, then it is an error. stop the show.
    add "\n!'' { \n";
    add "  put; clear; add 'unlexed character \"'; get; add '\" ';\n";
    add "  add 'at line '; lines; add ' of input.\\n'; \n";
    add "  add 'All characters in the input should be lexed or ignored\\n'; \n";
    add "  print; clear; zero; a-; a-; quit; \n";
    add "}";
    add "\n\nparse>\n"; 
    add "# show the parse-stack reductions.\n";
    add 'add "## line:"; lines; add " char:"; chars; ';
    add 'add " "; print; clear; \n';
    add 'unstack; print; stack; (eof) { add " EOF"; }  \n';
    add '# show last attribute if required.\n';
    add '# add " ("; --; get; ++; add ")"; \n';
    add '# replace "\\n" "\\n##      ";\n';
    add 'add "\\n"; print; clear;\n';
    ++; get; --; put; 
    clear; add "grammar*"; push; .reparse
  }

  # lists of textrules (eg: keyword:'to'|'is'|'a';)
  # if we mix lexrules with text then they become textrules
  "textrule*action*","textruleset*action*",
  "textrule*textrule*","textrule*lexrule*","textruleset*textrule*",
  "lexrule*textrule*","lexruleset*textrule*","textruleset*lexrule*" {
    # dont add a newline to a phantom block
    clear; get; !"" { add "\n"; } ++; get; --; put;
    clear; add "textruleset*"; push; .reparse
  }

  "lexrule*lexrule*","lexruleset*lexrule*","action*lexrule*",
  "lexrule*action*","lexruleset*action*" {
    clear; get; add "\n"; ++; get; --; put;
    clear; add "lexruleset*"; push; .reparse
  }

  "rule*rule*","ruleset*rule*","rule*action*",
  "action*rule*","ruleset*action*" {
    clear; get; add "\n"; ++; get; --; put;
    clear; add "ruleset*"; push; .reparse
  }

  "delete*;*","trim*;*","ltrim*;*","rtrim*;*" {
    clear; get; add "; put; "; put;
    clear; add "action*"; push; .reparse
  }
  "exit*;*" {
    clear; get; add ";"; put;
    clear; add "action*"; push; .reparse
  }


  "action*action*" {
    clear; get; add "\n"; ++; get; --; put;   
    clear; add "action*"; push; .reparse
  }

  # do not allow actionblock to contain attribute rules.
  "actionblock*action*","actionblock*lexrule*" {
    clear; get; add "\n"; ++; get; --; put;   
    clear; add "actionblock*"; push; .reparse
  }

  #-----------------------
  # 3 token parsing
  pop;

  "notset*and*notsequence*","notset*and*nottoken*" {
     clear; get; ++; ++; add "."; get; --; --; put;
     clear; add "notset*"; push; .reparse 
  }

  # an actionblock* cannot contain attrules* because we dont know the
  # length of the sequence.
  "altgroup*>*{*",">*rsequence*{*" {
    push; push; push; clear; put;
    add "actionblock*"; push; .reparse
  }

  "declare*var*;*" {
    # the var already has fetch code in it...
    clear; ++; get; --; 
    replace 'mark "here"; go' 'mark';
    replace 'get; go "here";' ''; add '++;'; put;
    clear; add "vardef*"; push; .reparse
  }

  # reverse the order
  "var*==*textmatch*","attvar*==*textmatch*" {
    clear; get; ++;++; swap; --;--; put;
    clear; add "textmatch*==*var*"; push; push; push; .reparse
  }

  # A phantom textruleset to start the block
  "==*attvar*{*","==*var*{*","rsequence*)*{*" {
    push; push; push; 
    put; add "textruleset*"; push; .reparse
  }

  # Let's check for empty braces (because of the phantom token above).
  # fix: put in the error section?
  "{*textruleset*}*","{*ruleblock*}*" {
    ++; swap; "" {
      add "Empty block braces {} found at line "; lines; add "\n"; 
      print; quit;
    } 
    swap; --;
  }


  # for deleting tokens and maybe checking, I was using 0 but I need that
  # for a number.
  # example: () = a b ;
  # compile: "a*b*" { clear; .reparse }
  "(*)*=*" {
    clear; add "clear; .reparse"; put;
    clear; add "LHS*=*"; push; push; .reparse 
  }

  #*
  # a lookahead grouping, for a single token sequence. this is not as
  # useful as (a|b|c) for lookaheads.
  # example: a = b (c d);
  # compile: "b*c*" { replace "b*c*d*" "a*c*d*"; push; push; .reparse }
  #      or: B"b*".E"c*" { replace "b*" "a*"; push; push; .reparse }
    now look at the more complicated
    example: a b = c d e (a b|c e); # must be equal length alternations
    compile: 
    pop;pop;pop;
    B"c*d*e*".!"c*d*e*" { 
      # add a start marker like '#'
      E"a*b*",E"c*e*" { 
        # add start marker, somehow
        replace "#c*d*e*" "a*b*"; push;push;push;push; 
        # !! now need to copy attributes from a b and c e to new
        # positions. this will be challenging.
        .reparse
      }
    }
    push;push;push;
  *#

  # sequence alternations within (..) and <...> are called altgroups
  # example: ('.' x | ',' y )
  "(*rsequence*|*","<*rsequence*|*" {
    replace "rsequence*" "altgroup*"; push; push; push;
    clear; --; --; get; 
    put; ++; ++; clear; .reparse
  }

  "+(*rsequence*)*","+(*rsequence*|*" {
    replace "rsequence*" "lookgroup*"; push; push; push;
    clear; --; --; add "E"; get; 
    # reverse not-ends-with, for not tokens for example
    B'E!' { clop; clop; put; clear; add "!E"; get; }
    put; ++; ++; 
    clear; .reparse
  }

  # I need this to avoid a clash with lexing alternations (charset* token)
  # because a charset = charset | charset;
  "rsequence*|*charquoted*","altgroup*|*charquoted*","lookgroup*|*charquoted*" {
    push; push;
    # convert 'x' to "x*" for literal tokens
    # convert !'x' to !'x*' for negated literal tokens
    clear; get; clip; clop; add "*"; put; 
    clear; add '"'; get; add '"'; put;
    # store pop list in = attribute
    clear; --; add "pop;"; put; ++;
    clear; add "rsequence*"; push; .reparse
  }

  # part of the new rule parsing code. An rsequence is a sequence
  # of tokens on the right hand side of the = 

  # example: = a b c 
  # compile: = rseq
  # example: + ( '.' b | c d )
  # compile: +(*rsequence*|*rsequence*)*

  "=*rsequence*token*","|*rsequence*token*",
  "(*rsequence*token*","+(*rsequence*token*","<*rsequence*token*",
  ")*rsequence*token*",">*rsequence*token*" {
    # save the context token
    push;
    # store pop list in the '=' or '|' attribute. This will be used
    # for compilation later, but also to check rsequence lengths
    clear; --; get; add "pop;"; put; ++;
    # wrap sequence in quotes
    clear; get; clip; ++; get; add '"'; --; put;
    clear; add "rsequence*"; push; .reparse
  }

  # rule sequences with literals
  # example: = a 'c' 
  # compile: = rseq
  # example: ( a b '#' 
  "=*rsequence*charquoted*","|*rsequence*charquoted*",
  "(*rsequence*charquoted*","+(*rsequence*charquoted*",
  "<*rsequence*charquoted*" {
    # save the context token
    push;

    # convert 'x' to x* for literal tokens
    clear; ++; get; clip; clop; add "*"; put; --;
    # store pop list in the '=' or '|' attribute. This will be used
    # for compilation later, but also to check rsequence lengths
    clear; --; get; add "pop;"; put; ++;
    # wrap sequence in quotes
    clear; get; clip; ++; get; add '"'; --; put;
    clear; add "rsequence*"; push; .reparse
  }


  # the second item can be a string because we compile to 'until'
  # but second item cant be a class because of 'untils' limitations
  # this compiles an incomplete snippet that will be completed later
  # example: '[' to ']' 
  # example: [:;] to '/end'
  # compile: '[' { until ']'; put; 
  "pattern*to*pattern*" {
    clear; get; add ' { until '; ++; ++; get; --; --;
    add '; put; '; put; clear; add "tomatch*"; push; .reparse 
  }

  # this compiles an incomplete snippet that will be completed later
  # example: '[' between [:space:] 
  # compile: '[' { whilenot [:space:]; put; 
  # example: [:;] until '/'
  # bug: this is allowing 'a' between 'ab' because everything is a 
  #  pattern. 
  "pattern*between*pattern*" {
    clear; ++; ++; get; 
    # convert from quoted to class
    B"'".E"'" { 
      clip; clop; "]" { clear; add "\\]"; } put;
      clear; add "["; get; add "]"; put; 
    }
    --; --; 
    clear; get; add ' { whilenot '; ++; ++; get; --; --;
    add '; put; '; put; clear; add "betweenmatch*"; push; .reparse 
  }


  # and logic, but this cannot be mixed with OR | logic  
  # remember that the quoted* class* and charquoted* attributes can 
  # actually contain 'not' logic and 'begins with' logic etc, so these
  # tokens are somewhat badly managed. 
  # fix: I can remove all the class.and.quoted rules etc because this
  # is dealt with by:
  #   >> andset and = class and | quoted and | charquoted and ;
  "andset*and*quoted*","andset*and*class*","andset*and*charquoted*",
  "class*and*quoted*","class*and*class*","class*and*charquoted*",
  "quoted*and*quoted*","quoted*and*class*","quoted*and*charquoted*",
  "charquoted*and*quoted*","charquoted*and*class*",
  "charquoted*and*charquoted*" {
    clear; get; ++; get; ++; get; --; --; put;
    clear; add "andset*"; push; .reparse
  }

  "delete*quoted*;*","delete*charquoted*;*" {
     clear; add "replace "; ++; get; --; add " '';"; put;
     clear; add "action*"; push; .reparse
  }
  
  # print statements
  # example: print 'error at line: $line \n';
  # compile: clear; add 'error at line:'; lines; add "\n"; print; clear;
  "print*itext*;*" {
    clear; add "clear; "; ++; get; --; add " print; clear;"; put;
    clear; add "action*"; push; .reparse
  }

  # the same but adds a newline
  "println*itext*;*" {
    clear; add "clear; "; ++; get; --; add " add '\\n'; print; clear;"; put;
    clear; add "action*"; push; .reparse
  }

  # example: print 'x';
  "print*charquoted*;*" {
    clear; add "clear; add "; ++; get; --; add "; print; clear;"; put;
    clear; add "action*"; push; .reparse
  }

  # delete from the input stream all following matching chars
  "ignore*class*;*" {
    clear; 
    add "# ignore-rule \n";
    add "while "; ++; get; add "; "; get; --; add " { clear; }"; put;
    clear; add "lexrule*"; push; .reparse
  }

  # eg: EOF: name = capital lowerchars; 
  # example: eof: print "hi";
  "eof*:*rule*","eof*:*action*" {
    replace ":*" "{*"; add "}*";
    push; push; push; push; .reparse
  }

  # simplify lex parsing, 
  # textrules and lexrules can only occur in the lexing phase of
  # the syntagma script.
  "{*textrule*}*","{*lexrule*}*","{*lexruleset*}*" {
    clear; add "{*textruleset*}*"; 
    push; push; push; .reparse
  }

  # indent code in braces
  "{*textruleset*}*","{*action*}*","{*ruleset*}*",
  "{*ruleblock*}*","{*beginblock*}*" {
    push; push; push;
    add "\n"; --; --; get; replace "\n" "\n  "; put; ++; ++;
    clear; pop; pop; pop; 
  }

  # orsets
  "quoted*|*quoted*","quoted*|*charquoted*","quoted*|*class*" {
    clear; get; add ","; ++; ++; get; --; --; put;
    clear; add "orset*";
  }
  "charquoted*|*quoted*","class*|*quoted*" {
    clear; get; add ","; ++; ++; get; --; --; put;
    clear; add "orset*";
  }

  "orset*|*quoted*","orset*|*charquoted*","orset*|*class*" {
    clear; get; add ","; ++; ++; get; --; --; put;
    clear; add "orset*";
  }

  # but these should be able to be expressed by classes like [ab\n]
  # charsets eg: 'a'|'b'|'\n'
  #          eg: [a-z]|'x'|'y'
  "charquoted*|*charquoted*","charquoted*|*class*","charset*|*charquoted*",
  "class*|*charquoted*","charset*|*class*","class*|*class*" {
    clear; get; add ","; ++; ++; get; --; --; put;
    clear; add "charset*";
  }

  # eg: exit 4;
  # compile: zero; a+; a+; a+; a+; quit;
  "exit*digit*;*" {
    clear; add "zero; "; ++; get; --; add "#";
    # a rather silly trick (only rewrites the digits 0 to 5). todo: negative numbers
    replace "5#" "4# a+;"; replace "4#" "3# a+;";
    replace "3#" "2# a+;"; replace "2#" "1# a+;";
    replace "1#" "0# a+;"; replace "0#" ""; 
    add " quit;"; put;
    clear; add "action*"; push; .reparse
  }
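
  # A hand trace of the replace chain in the "exit*digit*;*" rule above
  # for "exit 2;" (a sketch, not pep output; it assumes the digit
  # attribute is "2"):
  #   zero; 2#  ->  zero; 1# a+;  ->  zero; 0# a+; a+;  ->  zero;  a+; a+;
  # and the final add gives: zero;  a+; a+; quit;
  # Digits above 5 have no replace line, so their "n#" marker appears to
  # be left in the compiled code.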

  #-----------------------
  # 4 token parsing
  pop;

  # allow negation of sequences of tokens on the right-hand-side of 
  # a parse rule. These can be used in "notsets" which are and logic
  # sets of negated tokens or sequences of tokens. 
  # example: e = e '*' e +(not ('*' e) and not ('/' e));
  "not*(*rsequence*)*" {
    clear; ++; ++; add "!"; get; --; --; put;
    # put the pop; list in the previous invisible token ( ) | = etc
    # clear; ++; get; --; --; put; ++;
    clear; add "notsequence*"; push; .reparse
  }

  # like awk's BEGIN blocks
  "begin*{*beginblock*}*" {
    clear; add "begin {"; ++;++; get; --;--; add "\n}\n"; put;
    clear; add "start*"; push; .reparse
  }

  # alternation group parsing. need to check for unequal length sequences.
  # example: (a b | c '.')  
  # compile: "a*b*","c*.*"

  "altgroup*|*rsequence*|*","altgroup*|*rsequence*)*",
  "altgroup*|*charquoted*|*","altgroup*|*charquoted*)*",
  "altgroup*|*rsequence*>*","altgroup*|*charquoted*>*" {
    # a push list is already in | - see 2 token rule for charquoted.
    replace "|*rsequence*" ""; replace "|*charquoted*" ""; 
    push; push;

    # workspace is clear. get the pop; lists in '(' and '|'. The (* token
    # is just before the altgroup* token, but not visible here.
    --; get; --; --; 
    !(==) { 
      clear; 
      add "\n";
      add "The sequences in the alternation group were of unequal length\n";
      add "(rule on line "; lines; add ") \n";
      add "This is currently not allowed in alternation groups (x|y|z) \n";
      print; quit;
    }
    ++; ++; ++;
    clear; --; --; get; ++; ++; add ","; get; --; --; put; ++; ++;
    clear; .reparse
  }

  # lookahead group parsing. need to check for unequal length sequences.
  # example: a b | c '.' |  (or) a b | c '.' )
  # compile: E"a*b*",E"c*.*"
  "lookgroup*|*rsequence*|*","lookgroup*|*rsequence*)*" {
    # todo: handle "lookgroup*|*charquoted*)*" here as well, but that needs
    # a push list in '|' - see the 2 token rule
    replace "|*rsequence*" ""; replace "|*charquoted*" ""; 
    push; push;

    # workspace is clear. get the pop; lists in +( and | .The +(* token
    # is just before the lookgroup* token, but not visible here.
    --; get; --; --; 
    !(==) { 
      clear; 
      add "\n";
      add "The sequences in the lookahead alternation were of unequal length\n";
      add "(rule on line "; lines; add ") \n";
      add "This is not allowed in lookahead groups +( ...) \n";
      print; quit;
    }
    ++; ++; ++;

    clear; --; --; get; ++; ++; add ",E"; get; --; --; put; ++; ++;
    clear; .reparse
  }


  # make a phantom 'ruleblock*' token, to help with parsing. A phantom
  # token is a token created with an empty attribute value, and without actually
  # parsing anything from the input stream. It must be created in a 
  # particular context, and must avoid interfering with other parse rules.

  # I need to use this also in the lexblocks because it is so good
  "LHS*=*rsequence*{*","LHS*=*alt*{*","+(*lookgroup*)*{*","(*altgroup*)*{*" {
    push; push; push; push; clear; put;
    add "ruleblock*"; push; .reparse
  }

  #*
  # assign text to LHS token attributes by getting attributes from the RHS
  # example: a = b c { @1 := "$1 : $2"; }
    compile: 
      pop;pop; "b*c*" {
        clear; get; add " : "; ++;get;--; put;
        clear; add "a*"; push; .reparse
      }
      push;push;
  *# 

  # example: @1 := "$1 : $2";
  # compile: clear; get; add " : "; ++;get;--; put; clear;
  "lattribute*:=*quoted*;*" {
    clear; ++; ++; 
    add "clear; add "; get; 

    # special line and char and counter 'variables'
    # the number of lines read from the input stream
    replace "$line" "'; lines; add '";
    # the number of chars read from the input stream
    replace "$char" "'; chars; add '";
    # access the pep machine accumulator
    replace "$counter" "'; count; add '";
    # text is the text in the current tape cell 
    replace "$text" "'; get; add '";
    # print the parse-stack
    replace "$stack" "'; ++;++;++;put;--;--;--; d;stack;swap;get; add '";

    # the $n variables
    replace "$1" "'; get; add '";
    replace "$2" "'; ++; get; --; add '";
    replace "$3" "'; ++;++; get; --;--; add '";
    replace "$4" "'; ++;++;++; get; --;--;--; add '";
    replace "$5" "'; ++;++;++;++; get; --;--;--;--; add '";
    replace "$6" "'; ++;++;++;++;++; get; --;--;--;--;--; add '";
    replace "$7" "'; ++;++;++;++;++;++; get; --;--;--;--;--;--; add '";
    replace "$8" "'; ++;++;++;++;++;++;++; get; --;--;--;--;--;--;--; add '";
    replace "$9" "'; ++;++;++;++;++;++;++;++; get; --;--;--;--;--;--;--;--; add '";

    # put;
    # now get @n code
    --; --; add ";\n"; get;
    # an optimisation!! remove empty add commands
    replace "add '';" ""; replace 'add "";' '';

    put;
    clear; add "attrule*"; push; .reparse
  }

  # new LHS/RHS rule parsing
 
  # in the parse token attributes for '=' and '|', just preceding
  # the rsequence, we have stored the pop list 'pop;pop;etc'. we
  # can compare these to check if the sequences are the same length.
  # if they are different lengths, then the compilation procedure is
  # quite different, and in the case of '{' possibly nonsensical.
  # if they are unequal we will store a flag in the 1st "UNEQUAL"
  # this needs some rethinking...  
  # example: a = b c | e f ;
  # compile: pop;pop; "b*c*","e*f*" { clear; add "a*"; push; .reparse } push;push;
  # example: a = b | e f ;
  # compile: 
  #   pop; "b*" { clear; add "a*"; push; .reparse } push;
  #   pop;pop; "e*f*" { clear; add "a*"; push; .reparse } push;push;
  #
  # as can be seen, the second compilation is more tricky
  # because we have to separate into 2 blocks. I believe that the 
  # 2nd example requires a variable LHS stored on the tape, because we
  # need to grab that var as soon as we find an unequal sequence....

  "rsequence*|*rsequence*;*","rsequence*|*alt*;*" {
    # save token sequence in tape cell above ';' attribute
    ++;++;++;++; put; --;--;--;--;
    clear; --; get; ++; ++;   
    # tape test
    # a trick to keep the poplist but flag the alternation as having
    # unequal length sequences. 
    # Here I could compile uneven sequences into the '{' token attribute
    # and remove "|*alt*" Then when completing the rule, I check '{' for
    # compiled code and include it.
    # !(==) { --; --; replace "pop;" "unequal;"; put; ++; ++; }
    
    # --------------------------------
    # attempting to compile unequal alternations to the ';' attribute
    !(==) { 
      clear; 
      # get the pop; list from the '|' token attribute
      --; ++; add "\n"; get; add "\n"; 
      # add the token match list
      ++; get; add " {\n  "; 
      mark "here"; go "LHS"; get; go "here";
      add "\n}"; ++; swap; get; put; 

      # print; quit;
      --; --; 
      # get the pop; list from '|' attribute and make a push; list
      clear; add "\n"; get; replace "pop;" "push;"; 
      ++; ++; swap; get; --; --;

      # put all compiled code into new ';' attribute
      put; --; 
      clear; add "rsequence*;*"; push; push; .reparse
    }

    # print; quit;
    # restore token sequence. dont need to reparse
    clear; --; ++;++;++;++; get; --;--;--;--;
  }

  "rsequence*|*rsequence*;*","rsequence*|*alt*;*" {
    # compose alternation
    clear; get; add ","; ++; ++; get; --; --; put;
    # copy ';' attribute down, this may contain compiled code
    # for unequal length alt sequences
    clear; ++; ++; ++; get; --; --; put; --;
    clear; add "alt*;*"; push; push; .reparse
  }


  # I think variable length sequences in alternations for rules
  # that have a composition block {} are nonsensical, so I will disallow
  # them here
  "rsequence*|*rsequence*{*","rsequence*|*alt*{*" {
    # save token sequence in { or ; attribute
    ++;++;++; put; --;--;--; clear; --; get; ++; ++;   
    !(==) { 
      clear; 
      add "\n";
      add "The sequences in the alternation were of unequal length\n";
      add "(alternation on line "; lines; add ") \n";
      add "This is not allowed in parse-token reduction rules that \n";
      add "have a following block\n";
      print; quit;
    }
    # restore token sequence. dont need to reparse
    clear; --; ++; ++; ++; get; --;--;--;
  }

  # tail-reduction of RHS token sequences before '{' or ';'
  # this is quite elegant because the rsequences have already been
  # wrapped in quotes.
  # example: ... c d | e g {
  # compile: "c*d*","e*g*" {
  "rsequence*|*rsequence*{*","rsequence*|*alt*{*" {
    clear; get; add ","; ++; ++; get; --; --; put;
    clear; add "alt*{*"; push; push; .reparse
  }

  # this could also be an unequal alternation list with code in
  # the ';' attribute.

  # compile a complete rule. The pop;pop; list is stored in the '='
  # LHS should already have its compiled code
  # NOTE: that the ';' token attribute will contain code for unequal 
  # length sequences, and so should be added here.
  # example: a = d e ;
  # compile: pop;pop; "d*e*" { clear; add "a*"; push; .reparse } push;push;
  "LHS*=*rsequence*;*" {

    clear; ++; get; add "\n"; ++; get; add " "; --; --; 
    add "{\n  "; get; add "\n}\n"; put;

    # here: build the push;push; list and add to nom code
    clear; ++; get; --; replace "pop;" "push;"; swap; get; 

    # add the unequal sequence compiled code (from the ';' attribute)
    ++; ++; ++; get; --; --; --; put;
    clear; add "rule*"; push; .reparse
  }
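
  # worked trace for 'a = d e ;' (a sketch, assuming the attributes are as
  # the comments above describe: LHS* holds 'clear; add "a*"; push; .reparse',
  # '=' holds 'pop;pop;', rsequence* holds '"d*e*"' and ';' is empty).
  # After the rule runs, the rule* attribute should hold:
  #   pop;pop;
  #   "d*e*" {
  #     clear; add "a*"; push; .reparse
  #   }
  #   push;push;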

  # normally the rsequences can be same or different lengths.
  # the compilation for unequal length sequences is pretty special.
  # it involves creating separate blocks for each branch and 
  # compiled nom code is saved in the ';' attribute and copied with
  # that token.
  # example: a = b c | e f ;
  # compile: 
  #   pop;pop; "b*c*","e*f*" { clear; add "a*"; push; .reparse } push;push;
  # example: a = b | e f ;
  # compile: 
  #   pop; "b*" { clear; add "a*"; push; .reparse } push;
  #   pop;pop; "e*f*" { clear; add "a*"; push; .reparse } push;push;
  #
  # as can be seen, the second compilation is more tricky

  "LHS*=*alt*;*" {
    #* remove:
    # check for unequal length sequences in the alternation
    # this is obsolete code, since unequal sequences are compiled
    clear; ++; get; B"unequal;" {  
      clear; 
      add "\n";
      add "The sequences in the alternation were of unequal length\n";
      add "(alternation on line "; lines; add ") ";
      add "... \n";
      print; quit;
    } --; 
    *#

    # build code with pop;pop; list and token match list
    clear; ++; get; add "\n"; ++; get; add " "; --; --; 
    add "{\n  "; get; add "\n}\n"; put; clear;

    # here: build the push;push; list and do swap;get;
    ++; get; --; replace "pop;" "push;"; swap; get;

    # add the unequal sequence compiled code (from the ';' attribute)
    ++; ++; ++; get; --; --; --; 
    put;
    clear; add "rule*"; push; .reparse
  }

  # eg: match '<' to '>' { tag: '<a>'|'<b>'; }
  # compile: 
  #  '<' { until '>'; put; '<a>','<b>' 
  #        { put; clear; add "tag*"; push; .reparse } }

  "tomatch*{*textruleset*}*","betweenmatch*{*textruleset*}*",
  "tomatch*{*action*}*","betweenmatch*{*action*}*" {
    clear; 
    add "# lex-rule \n";
    get; replace "{" "{\n  "; 
    # not needed here???
    replace "while !" "whilenot "; 
    # indenting is done above
    ++; ++; get; --; --;
    add '\n}'; put;
    clear; add "lexrule*"; push; .reparse 
  }

  # the second item can be a string because we compile to 'until'
  # but the second item can't be a class because of the limitations of 'until'
  # example: register: '[' to ']' ;
  # example: register: [:;] to '/end' ;
  # compile: '[' { until ']'; put; clear; add "register*"; push; .reparse }
  "token*:*tomatch*;*" {
    clear; ++; ++; get; --; --; 
    add ' clear; add "'; get; add '"; push; .reparse }'; put;
    clear; add "lexrule*"; push; .reparse 
  }

  # example: register: '[' between [:space:] ;
  # compile: '[' { whilenot [:space:] ; put; clear; add "register*"; push; .reparse }
  "token*:*betweenmatch*;*" {
    clear; ++; ++; get; --; --; 
    add ' clear; add "'; get; add '"; push; .reparse }'; put;
    clear; add "lexrule*"; push; .reparse 
  }


  # this allows a default token for all text, in this context 
  # I want the * to create a default token name even if the 
  # pattern space is empty. But in 'match * { ... }' it only matches
  # if pattern space is not empty, silly???? fix

  # example: shape: * ;
  # compile: clear; add 'shape*'; push; .reparse
  "token*:*star*;*" {
    clear; 
    add "clear; add '"; get; add "'; push; .reparse"; put;
    clear; add "lexrule*"; push; .reparse
  }

  # example: lit: [,.;];
  # compile: [,.;] { add "*"; push; .reparse }
  # example: lit: ','|':'|'x' ;
  # compile: ',',':','x' { add "*"; push; .reparse }
  "lit*:*class*;*","lit*:*charset*;*","lit*:*charquoted*;*" {
    clear; ++; ++; get; --; --; 
    add " { add '*'; push; .reparse }"; put;
    clear; add "lexrule*"; push; .reparse
  }

  # for empty do 'match empty { etc }'
  # eg: EOF { name = capital lowerchars; }
  "eof*{*rule*}*","eof*{*ruleset*}*" {
    clear; add "(eof) {"; ++; ++; get; --; --; add "\n}\n"; put;
    clear; add "rule*"; push; .reparse
  }

  # example: eof { print 'yes'; }
  "eof*{*action*}*" {
    clear; add "(eof) {"; ++; ++; get; --; --; add "\n}\n"; put; 
    clear; add "action*"; push; .reparse
  }


  # eg: EOF { letter: [a-z]; print 'bye'; exit 2; } 
  "eof*{*textruleset*}*" {
    clear; add "(eof) {"; ++; ++; get; --; --; add "\n}\n"; put;
    clear; add "lexrule*"; push; .reparse
  }

  # lex tokens with AND and OR | logic
  # example: keyword: 'is';
  # compile: 'is' { put; clear; add "keyword*"; push; .reparse }
  # example: logic: 'is'|'or'|'and';
  # compile: 'is','or','and' { put; clear; add "logic*"; push; .reparse }
  # example: 0number: [:digit:] AND begins '0';
  # compile: [:digit:].B'0' { put; clear; add "0number*"; push; .reparse }
  "token*:*quoted*;*","token*:*orset*;*","token*:*andset*;*" {
    clear; ++; ++; get; --; --;
    add ' { put; clear; add "'; get; add '"; push; .reparse }'; put;
    clear; add "textrule*"; push; .reparse 
  }

  # example: char: [:alpha:];
  # compile: [:alpha:] { clear; add "char*"; push; .reparse }
  "token*:*class*;*" {
    clear; ++; ++; get; --; --;
    add ' { clear; add "'; get; add '"; push; .reparse }'; put;
    clear; add "lexrule*"; push; .reparse 
  }

  # example: space: ' ';
  # compile: ' ' { clear; add "space*"; push; .reparse }
  "token*:*charquoted*;*","token*:*charset*;*" {
    clear; ++; ++; get; --; --;
    add ' { clear; add "'; get; add '"; push; .reparse }'; put;
    clear; add "lexrule*"; push; .reparse 
  }

  
  #*
  fix: remove
  "andset*{*textruleset*}*","andset*{*action*}*",
  "charset*{*textruleset*}*","charset*{*action*}*",
  "orset*{*textruleset*}*","orset*{*action*}*",
  "quoted*{*textruleset*}*","quoted*{*action*}*",
  "star*{*textruleset*}*","star*{*action*}*",
  "empty*{*textruleset*}*","empty*{*action*}*",
  "charquoted*{*textruleset*}*","charquoted*{*action*}*",
  "class*{*textruleset*}*","class*{*action*}*" {
    clear; get; add " {"; 
    ++; ++; get; --; --; add "\n}"; put; 
    clear; add "lexrule*"; push; .reparse 
  }
  *#

  # example: match 'ok' { print 'bye!'; exit 0; }
  # compile: 'ok' { ... }
  "textmatch*{*textruleset*}*","textmatch*{*action*}*" {
    clear; get; add " {"; 
    ++; ++; get; --; --; add "\n}"; put; 
    clear; add "lexrule*"; push; .reparse 
  }


  #-----------------------
  # 5 token parsing
  pop;

  # allow negation of sequences of tokens on the right-hand side of 
  # a parse rule. These can be used in "notsets", which are AND-logic
  # sets of negated tokens or sequences of tokens. 
  # example: e = e '*' e +(not ('*' e) and not ('/' e));

  B";*",B"<*",B">*",B"(*",B")*",B"|*",B"=*" {
    E"not*(*rsequence*)*" {
      clear; ++; ++; add "!"; get; --; --; put;
      # put the pop; list in the previous token ( ) | = etc
      clear; ++; get; --; --; put; ++;
      clear; add "notsequence*"; push; .reparse
    }
  }

  # this should not interpolate the quoted text
  "declare*var*:=*quoted*;*","declare*var*:=*charquoted*;*" {
    # the var already has fetch code in it...
    clear; ++; get; --; replace 'mark "here"; go' 'mark';
    replace 'get; go "here";' ''; 
    add "add "; ++; ++; ++; get; --; --; --; add '; ++;'; put;
    clear; add "vardef*"; push; .reparse
  }

  # example: word: [:alpha:]+ ;
  # compile: [:alpha:] { while [:alpha:]; put; clear; add "word*"; push; .reparse }
  # example: word: ![a-z]+ ;
  # compile: ![a-z] { whilenot [a-z]; put; clear; add "word*"; push; .reparse }
  "token*:*class*+*;*" {
    clear; ++; ++; get; add ' { while '; get; --; --; 
    # while ![a-z]; is not valid nom syntax (sadly) 
    replace "while !" "whilenot ";
    add '; put; clear; add "'; get; 
    add '"; push; .reparse }'; put;
    clear; add "lexrule*"; push; .reparse 
  }

  # eg: [a-z]+ { colour: 'green'|'blue'|'x'; }
  # compile: 
  #  [a-z] { 
  #    while [a-z]; put; 
  #    'green','blue','x' { put; clear; add "colour*"; push; .reparse }
  #  }
  "class*+*{*textrule*}*","class*+*{*textruleset*}*",
  "class*+*{*lexrule*}*","class*+*{*lexruleset*}*" {
    clear; get; 
    add ' {\n  while '; get; replace "while !" "whilenot ";
    add '; put;'; 
    # indenting is done above
    ++; ++; ++; get; --; --; --;
    add '\n}'; put;
    clear; add "lexrule*"; push; .reparse 
  }

  #-----------------------
  # 6 token parsing
  pop;

  # a syntax to check the value of a token attribute
  # I am avoiding reversing the test because that will match "textmatch{...}"
  # example: "green" == $2 { ... } 
  # compile: clear; ++; get; --; "green" { ... }
  "textmatch*==*attvar*{*textruleset*}*","textmatch*==*var*{*textruleset*}*" {
    # change '++;++; get; --; etc' to '++; swap; --;'
    # change 'go "xxx"; get; go "here" etc' to 'go "xxx"; swap; '
    # then add a swap at the end of the block. This preserves the token sequence?
    clear; ++;++; get; replace "get;" "swap;"; put; --;--;
    clear; 
    add "clear;\n"; ++;++; get; add '\n'; --;--; get; 
    add ' { '; ++;++;++;++; get; --;--;--;--; add "\n}\n";
    ++;++; get; --;--;
    put;
    clear; add "textrule*"; push; .reparse
  }

  # compile a complete rule with a following block. 
  # The pop;pop; list is stored in the '='
  # LHS should already have its compiled code. The code in the block
  # should be compiled before the LHS code.
  # example: a = d e { print 'reduced!\n'; }
  # compile: 
  #  pop;pop; "d*e*" { 
  #    clear; add "reduced!\n"; print; clear;
  #    clear; add "a*"; push; .reparse 
  #  } 
  "LHS*=*rsequence*{*ruleblock*}*","LHS*=*alt*{*ruleblock*}*" {
    # make push list and store in '{' attribute
    clear; ++; get; replace "pop;" "push;"; ++; ++; put; --; --; --; 
    clear; ++; get; add "\n"; ++; get; add " "; --; --; 
    add "{ "; 
    # add block code
    ++; ++; ++; ++; get; --; --; --; --; add "\n  ";
    # add LHS code
    get; add "\n}\n"; 
    ++; ++; ++; get; --; --; --; put;
    clear; add "rule*"; push; .reparse
  }

  #*
  # compile a complete rule with alternation and a following block. 

  # example: a = d e | f g { print 'reduced!\n'; }
  # compile: 
  #  pop;pop; "d*e*","f*g*" { 
  #    clear; add "reduced!\n"; print; clear;
  #    clear; add "a*"; push; .reparse 
  #  } 
  *#
  
  #-----------------------
  # 7 token parsing
  pop;

  "stack*(*rsequence*)*{*textruleset*}*","stack*(*altgroup*)*{*textruleset*}*",
  "stack*(*rsequence*)*{*ruleblock*}*","stack*(*altgroup*)*{*ruleblock*}*" {
    clear; 
    add "clear; unstack;\n"; ++;++; get; ++;++;++;
    add " {"; get; add "\n}\nstack;"; 
    --;--;--;--;--; put;
    clear; add "rule*"; push; .reparse
  }

  # parse optionals where there is no following sequence
  # andsets may be of some use here, but I won't worry for now.
  # example: a = b c [ d | e];
  # this is compiled into 2 separate nom blocks.
  # actually just delegate to rs < altgroup > rs ;
  "LHS*=*rsequence*<*altgroup*>*;*","LHS*=*rsequence*<*rsequence*>*;*",
  "LHS*=*rsequence*<*notset*>*;*" {
    # make a push list for rsequence and optionals, save in ';'
    clear; ++; get; ++; ++; get; ++; ++; ++; 
    replace "pop;" "push;"; put; --; --; --; --; --; --; 
    clear; 
    # get pop; list and sequence
    ++; get; add "\n"; ++; get; add " {\n  "; 
    --; --; get; add "\n}\n";
    # now get optional pop; list and sequence alternation
    ++; ++; ++; get; add "\n"; 
    # begins-with rsequence
    --; add "B"; get; add ".!"; get; add " {\n  " ;
    # ends-with optional sequence alternation
    # I believe the replace below is safe because "," wont occur anywhere
    # else in the compiled code. Also, works for [rsequence] but not 
    # for andsets.
    ++; ++; add "E"; get; replace '","' '",E"';
    add " {\n    ";
    # add LHS compiled code
    --; --; --; --; get; add "\n  }\n}\n";
    # get the saved push list
    ++; ++; ++; ++; ++; ++; get; add "\n"; 
    --; --; --; --; --; --;
    put; clear; add "rule*"; push; .reparse
  }
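
  # sketch of the code this rule should build for the example above
  # (a = b c with optional d or e). The attribute values are assumptions
  # for illustration: LHS* holds 'clear; add "a*"; push; .reparse',
  # '=' holds 'pop;pop;', rsequence* holds '"b*c*"', '<' holds 'pop;'
  # and altgroup* holds '"d*","e*"'.
  #   pop;pop;
  #   "b*c*" {
  #     clear; add "a*"; push; .reparse
  #   }
  #   pop;
  #   B"b*c*".!"b*c*" {
  #     E"d*",E"e*" {
  #       clear; add "a*"; push; .reparse
  #     }
  #   }
  #   push;push;push;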

  # I may have to keep the LHS push; list in a variable 
  # because I need to access it separately to align the tape
  # pointer to the start of the look up group; Or use stack?
  "LHS*=*rsequence*+(*lookgroup*)*;*","LHS*=*rsequence*+(*andset*)*;*" {
    E"andset*)*;*" {
      # this is a bit dubious..fix:
      clear; ++;++;++;++; get; replace "!'" "!E'"; 
      # also add a 1 pop list for the andset.
      put; --; clear; add "pop;"; put; --;--;--;
    }
    # temporarily fix the LHS
    clear; get; 
    replace "clear; add " ""; replace " .reparse" ""; put;

    # make an rsequence push; list from '=' and store in ';'
    clear; ++; get; replace "pop;" "push;"; ++;++;++;++;++; put;
    --; --; --; --; --; --;
    clear;

    # make a lookgroup push; list from '+(' and store in ')'
    clear; ++; ++; ++; get; replace "pop;" "push;"; ++; ++; put;
    --; --; --; --; --;

    clear;
    # construct pop; list at top of parse block
    # this list consists of rsequence length + lookahead length
    ++; get; ++; ++; get; add "\n";

    # match the rsequence
    # example: B"a*b*c*".!"a*b*c*" {
    --; add "B"; get; add ".!"; get; add " {\n  ";

    # match the lookahead group
    # example: E"x*y*",E"p*q*" {
    ++; ++; get; add " {\n";

    # build the replace command, and push list. The push list is
    # the LHS length + Lookahead length.
    # example: replace "a*b*c*" "new*"; push;push;push;

    # I am just going to check for rsequence in lookahead and halt if true.
    # but this clobbers the current attribute
    add "    put; replace "; --; --; 
    swap; clop; swap; add '"*'; get; add ' ""; !(==) {\n';
    add "      clear; add 'lookahead contains reduction sequence.\\n';\n";
    add "      add 'This is an error condition. Please modify \\n';\n";
    add "      add 'the syntagma grammar. \\n'; print; quit;\n";
    add "    }\n";
    add '    replace "'; get; --; --; add " "; 
    # here I could try to use a trick to avoid multiple replace
    # but I cant do it, because LHS has '"a*b*"; push;push;'
    # example:
    #   replace "*a*b*" "****a*b*"; 
    #   replace "a*b*" "new*"; 
    #   replace "****a*b*" "*a*b*"; 
    #   push;push;push;
    # This trick should avoid replacing sequences that dont start
    # the workspace. But a ^ anchor for replace would be better.

    # build the push list
    get; add "\n    ";  
    ++; ++; ++; ++; ++; get; 
    add " .reparse";

    #*
    # ?? copy down all attributes in lookgroup\n";
    # need to realign to the end of rsequence...do this by
    # subtracting the push; list in LHS, but this push list also
    # has the name,... 
      A dodgy strategy: add "add ";get LHS attrib, get +( attribute, now we have
      >> add "l*g*"; push;push;pop;pop;
      >> replace '"; push;' '";clear;push;';
      now we have 
      >> add "l*g*";clear;push; ... pop;
      >> replace "push;" "++;"; replace "pop;" "--;";
      now we have
      >> add "...";clear;--;--;++;++;
      and this will realign the pointer?
    *# 

    add "\n  }\n}\n";

    # build final push; list from ')' and ';' 
    # which is rsequence + lookahead lengths
    get; ++; get; add "\n";
    # print; 
    --; --; --; --; --; --;
    put;
    # clear; add "LHS*=*rsequence*+(*lookgroup*)*;*";
    clear; add "rule*"; push; .reparse
  }

  # ---------------------
  # 8 token parsing
  pop;
  # parse optionals 
  # example: a = b c [ x y | e f ] c;
  # this is compiled into 2 separate nom blocks.

  "LHS*=*rsequence*<*altgroup*>*rsequence*;*",
  "LHS*=*rsequence*<*rsequence*>*rsequence*;*",
  "LHS*=*rsequence*<*notset*>*rsequence*;*" {
    # make a push list for rsequence and optionals, save in ';'
    clear; ++; get; ++; ++; get; ++; ++; get; ++; ++;
    replace "pop;" "push;"; put; --; --; --; --; --; --; --;
    clear; 
    # get pop; list and sequence
    ++; get; add "\n"; ++; get; add " {\n  "; 
    --; --; get; add "\n}\n";
    # now get optional pop; list and sequence alternation
    ++; ++; ++; get; add "\n"; 
    # begins-with rsequence
    --; add "B"; get; add ".!"; get; add " {\n  " ;
    # ends-with optional sequence alternation
    # I believe the replace below is safe because "," wont occur anywhere
    # else in the compiled code. Also, works for [rsequence] but not 
    # for andsets.
    ++; ++; add "E"; get; replace '","' '",E"';
    add " {\n    ";
    # add LHS compiled code
    --; --; --; --; get; add "\n  }\n}\n";
    # get the saved push list from the ';' attribute (7 cells up)
    ++; ++; ++; ++; ++; ++; ++; get; add "\n"; 
    --; --; --; --; --; --; --;
    put; clear; add "rule*"; push; .reparse
  }


  # ---------------------
  # 9 token parsing
  pop;

  # lookahead with a rule block. To achieve this we need to copy attributes
  # from the lookahead tokens to their new positions in the parse stack,
  # if the LHS sequence is shorter than the RHS sequence (rsequence).
  # This copy procedure is somewhat verbose. I think I will prohibit the
  # LHS token sequence being longer than the RHS sequence, because it 
  # complicates the attribute copy, and it doesn't seem very useful anyway.

  # The tokens '=' and '+(' contain a list of pops which indicates the
  # length of the following sequence, or alternation group members.
  # I will probably use a variable to hold the pop; or push; list for
  # the LHS since there is nowhere to save it.

  # example: a = b c +(x y | p q) { @1 := "$1/$2"; println 'found a'; }
  # example: n m = b c +('.' | ',') { @1 := "$1/$2"; println 'found a'; }
  "LHS*=*rsequence*+(*lookgroup*)*{*ruleblock*}*" {
    clear; 
    # first create a complete pop; list for rsequence and lookgroup
    # and save it in the '{' token.
    ++; get; ++; ++; get; ++; ++; ++; put; mark "poplist";
    --; --;--; --; --; --; add "\n";
    # build block test eg: B"b*c*" { E"x*y*",E"p*q*" { ...
    ++; ++; add "B"; get; add " {\n  "; ++; ++; get; add " {\n";
    # build the replace, eg: replace "b*c*" "a*";
    --; --; add "    replace "; get; add " "; --; --; 
    swap; replace "clear; add " ""; replace " .reparse" ""; 

    # ------------------------------------
    # some serious juggling here. need the push later to 
    # calculate the length difference between the LHS and rsequence.
    replace '"; push;' '"; #push;'; swap;
    get; add "\n    ";

    # build code to: save new token sequence lower on tape and push later.
    mark "here"; go "poplist"; swap;
    replace "pop;" "++;"; swap; get; add " put; ";
    swap; replace "++;" "--;"; swap; get;
    go "here";
    # add compiled ruleblock code
    add "\n    # ----------";
    add "\n    # code block in {}";
    ++;++;++;++;++;++;++;
    # re-indent the code
    swap; replace "\n" "\n  "; swap; get; 
    --;--;--;--;--;--;--;

    # copy lookahead token attributes. this is the trickiest part.
    add "\n    # ----------";
    add "\n    # copy lookahead attributes";
    # first create a "diff" pop; list (difference in length between LHS
    # and RHS sequences.)
    
    # compare LHS and rsequence push/pop lists
    # we need to build this somewhere else.
    add "\n    #"; 

    #*
    swap; 
    # lets pretend 'LHS' contains just push; list for LHS 
    ++; get;    # eg: "pop;pop;push;push;push;" for "a b= c d e..."
    replace "push;push;pop;pop;" ""; replace "push;pop;" "";
    # now only "pop;" list. eg "pop;pop;"
    replace "pop;" "++;"; swap; get; add " get;";
    swap; replace "++;" "--;"; swap; get; add "put; "; 
    # save code eg: "++;++; get; --;--; put;" (for 2 token difference)
    put;
    # get +( pop; list (lookahead length)

    # if 1 lookahead token
    "pop;" { go "diff"; get; }
    # if 2 lookahead tokens
    "pop;pop;" { 
      go "diff"; get; add "\n";
      add "++;"; get; add "--;"; add "\n";
    }
    # etc for more lookahead tokens
    --;
    swap;
    *#

    #replace '"; push;' '";\npush;'; swap; get;

    # build code to: get saved new token sequence from tape and push.
    add "\n    clear; ";
    mark "here"; go "poplist"; swap;
    replace "--;" "++;"; swap; get; add " get; ";
    swap; replace "++;" "--;"; swap; get;
    go "here";
    add " stack;\n  }\n}\n";

    # get the pop list and convert to push list
    ++;++;++;++;++;++; swap; replace "--;" "push;"; swap; get;
    --;--;--;--;--;--;
    put; 
    clear; add "rule*"; push; .reparse
  }

  # ---------------------
  # 10 token parsing
  pop;

  # some errors at eof. no see above
  (eof) {
    nop;
  }

  (eof) {

    "start*","action*","grammar*" {
      clear; get; add "\n\n"; print; quit;
    }

    # if no parse rules, make an empty one and a grammar 
    "lexruleset*","lexrule*" { 
      push; 
      add "\n# empty rule added to grammar\n"; put;
      clear; add "rule*"; push; .reparse 
    }

    add "\n[strange parse]\n"; print; quit;
  }
  
  push;push;push;push;push;push;push;push;push;push;
  
