Pep & Nom


<quote>

28 may 2026

introducing a language for parsing, syntagma

The word “grammar” evokes, for many people, the idea of useless formal knowledge which is inflicted upon students as a kind of punishment. Language learners usually think they should learn the grammar of the new language but don’t want to. Computer programmers usually don’t look at the formal Backus-Naur grammar of the coding language they are learning or using, and it probably wouldn't help them much if they did, just as it doesn't help (human) language learners to study the grammar of the language they are learning.

In fact, studying the grammar of a language is arguably counterproductive to actually learning it, because it pushes the learner’s brain into a “meta” mode of analysis rather than the reactive mode of listen/repeat/express. Infants never study grammar, yet they learn their own language or languages fluently, in the same way that they never consult a dictionary to find the meaning of a word. Despite this, Grammar was one of the three subjects of the “Trivium”, which medieval European schools considered of first-order importance. The Greek word gramma means simply “letter” (something drawn or written); grammata is the plural, “letters”.

If grammar is not of importance in language learning, why is it considered so important? My answer is that it represents the structure of textual or linguistic patterns. So it is, essentially, the patterns of the patterns, and that is an interesting idea for people to cogitate on.

In the last few weeks, I have been developing a language which I call syntagma*. The development and ideas have flowed easily because syntagma* builds on the ideas and structure of pep* and nom*, and the syntagma compiler uses nom to compile into nom. Because nom* is a language for recognising or compiling “simple” languages, it gives rise to all sorts of self-reflexive and recursive patterns. I have often noted these before but here are some concrete examples:

The final example is what I am currently working on, and it feels like a very expressive way to talk about linguistic and textual (alphabetic) patterns. Syntagma uses a format that is similar to the extended Backus-Naur form but it also includes blocks (between braces {..}) that manipulate the textual attributes of the parse tokens. The textual manipulations are very simple (concatenation, character trim, etc.) but they are sufficient because the structure of the pattern is already contained in the parse tokens of the grammar.
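To make the idea of "parse tokens with textual attributes" a little more tangible, here is a minimal sketch in Python. It is not syntagma syntax and not how the syntagma compiler is implemented; the names `reduce_once` and `parse_words`, and the `word`/`list` tokens, are hypothetical and chosen only for illustration. Each stack entry is a (token name, text) pair, and when a rule matches, a small action builds the new text by trimming and concatenating the old ones.

```python
# Hypothetical illustration: each stack entry is a (token_name, text_attribute) pair.

def reduce_once(stack):
    """Try one reduction on the top of the stack; return True if a rule fired."""
    names = [name for name, _ in stack]
    # rule: list = word word | list word
    # action: join the two text attributes with ", " (simple concatenation)
    if len(stack) >= 2 and names[-1] == "word" and names[-2] in ("word", "list"):
        (_, left), (_, right) = stack[-2], stack[-1]
        stack[-2:] = [("list", left + ", " + right)]
        return True
    return False

def parse_words(text):
    """Shift one 'word' token per whitespace-separated chunk, reducing as we go."""
    stack = []
    for raw in text.split():
        # character trim: strip trailing punctuation from the token's text
        stack.append(("word", raw.strip(".,;")))
        while reduce_once(stack):
            pass
    return stack

if __name__ == "__main__":
    print(parse_words("alpha, beta; gamma."))
```

The point of the sketch is that the action never has to re-inspect the structure of the input: by the time it runs, the rule has already established that the top of the stack is a `word` preceded by a `word` or a `list`, so plain string operations are enough.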

Here is a concrete example.

check text for balanced nested brackets and braces.

    # turn brackets into 'literal' parse tokens using class [...] and
    # alternation | syntax. literal tokens must be defined here, before
    # they can be used in the parse section of the script.
    literal: [{}()] | '[' | ']'; 
    # delete all other input (spaces, uppercase letters etc)
    delete;

    # define the parse rules for nested balanced brackets 
    # literal tokens can be put in quotes "" or '' and must be if they
    # are special characters. Other parse tokens (nest,list)
    # are written without quotes.
    nest = '(' ')' | '{' '}' | '[' ']' |
      '(' nest ')' | '{' nest '}' | '[' nest ']' |
      '(' list ')' | '{' list '}' | '[' list ']';

    # the list parse token is not necessary (I could just write "nest = nest nest;")
    # but it may make the semantics of the grammar clearer (a list is not a nest!)
    list = nest nest | list nest;

    # print a success/fail message at the end of input 
    eof { 
      stack (nest|list) { println "Balanced brackets!"; exit; } 
      println "Terribly sorry, but you're not balanced"; exit 1;
    }
  

The code above solves the nested and balanced brackets problem that is often used as an elementary parsing challenge, because arbitrarily deep nesting is just beyond what classical regular expressions can recognise. The syntagma script is a “recogniser” or “syntax checker” in the sense that it only determines whether the input adheres to the given grammatical rules; it doesn't try to translate the input into some different output, but it could. The code looks to me clear and expressive of the grammatical structure and doesn't have too much unnecessary 'noise' (apart from the comments).
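For readers who want to compare with a general-purpose language, the same three rules can be mimicked by a small shift-reduce loop. The following Python is only a sketch under assumptions: it is not the output of any syntagma or nom translator, the function names are hypothetical, and it assumes that reductions are tried after every token and that the input is accepted only when a single `nest` or `list` token remains at the end (so empty input is rejected).

```python
PAIRS = {"(": ")", "{": "}", "[": "]"}

def reduce_stack(stack):
    """Apply the grammar rules to the top of the stack until none match."""
    changed = True
    while changed:
        changed = False
        # nest = '(' ')' | '{' '}' | '[' ']'
        if len(stack) >= 2 and PAIRS.get(stack[-2]) == stack[-1]:
            stack[-2:] = ["nest"]
            changed = True
        # nest = '(' nest ')' | '(' list ')' and the same for {} and []
        elif (len(stack) >= 3 and stack[-2] in ("nest", "list")
              and PAIRS.get(stack[-3]) == stack[-1]):
            stack[-3:] = ["nest"]
            changed = True
        # list = nest nest | list nest
        elif (len(stack) >= 2 and stack[-1] == "nest"
              and stack[-2] in ("nest", "list")):
            stack[-2:] = ["list"]
            changed = True

def is_balanced(text):
    """Return True if the brackets in text reduce to one nest or list token."""
    stack = []
    for ch in text:
        if ch not in "(){}[]":
            continue  # plays the role of 'delete;' in the script
        stack.append(ch)  # shift a literal token
        reduce_stack(stack)
    return stack in (["nest"], ["list"])

if __name__ == "__main__":
    sample = "{}[{(){}[()]}](text ignored)([()]{{}})"
    print("Balanced brackets!" if is_balanced(sample) else "Not balanced")
```

Even in this compact form the Python version needs explicit stack indexing for every rule, which is the 'noise' that the grammar notation avoids.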

Now, this syntagma script can be run immediately using the interpreter which translates syntagma ↦ perl and then runs with the given input, for example:

interpret a syntagma script with input.
 echo "{}[{(){}[()]}](text ignored)([()]{{}})" | ./engine.perl.sh -f script.txt 

Also, this code can be compiled to nom with any of the following:


   pep -f syntagma.pss script.txt > script.pss
   cat script.txt | ./syntagma.tonom.pl > script.pss
   cat script.txt | ./syntagma.tonom.lua > script.pss
   # or use any other language for which a nom translator exists.
  

Then the nom code can be interpreted with input


   pep -f script.pss input.txt; 
   # or
   pep -f script.pss -i "some input"
  

Or the nom code can be compiled to another language

compile a syntagma (in nom) script to a rust executable

   pep -f nom.torust.pss script.pss > script.rs
   rustc -o script.exe script.rs
   cat input.txt | ./script.exe
  

or compile to a scripting language

   pep -f nom.toperl.pss script.pss > script.pl
   cat input.txt | perl script.pl
   # or
   pep -f nom.tolua.pss script.pss > script.lua
   cat input.txt | lua script.lua
  

Or do the same for any other language for which a working translator exists in the nom translation folder.

This very technical blog post was essentially about the expressive power of syntagma* and how it builds on the structure of the nom language to provide a much more intuitive language for expressing and manipulating textual and linguistic patterns.