Why is the sun (le soleil) masculine and the moon (la lune) feminine? Because they married each other and they belong together even though they are opposites. Cadaques Jordi
How to debug ℕ𝕠𝕞 scripts.
Since ℙ𝕖𝕡 🙵 ℕ𝕠𝕞 is a parsing and compiling system, two processes take place (though not concurrently) during the execution of a ℕ𝕠𝕞 script: the reduction of tokens on the grammar parse stack, and the “assembling” of the attributes of those tokens to create the translated/transpiled/compiled output text.
Scripts often need debugging, because the ℙ𝕖𝕡 virtual machine is not trivial and ℕ𝕠𝕞 is a relatively “low-level” language, in the sense that each nom command corresponds to a pep machine instruction. See the script /eg/toyBNF.pss for the beginning of a “higher-level” language which compiles to nom.
Luckily, there are a number of techniques for debugging nom scripts, which are detailed below.
Watching the parse-stack reductions is possibly the most useful debugging technique for ensuring that the grammar you have designed (for whatever language or pattern you wish to recognise or transpile) is functioning properly.
As the world's leading (and possibly only) expert on the Nom language, I use this technique more than any other.
Because ℙ𝕖𝕡 🙵 ℕ𝕠𝕞 is a “text-filter” style system (which writes its output to “stdout”), we can print the parse stack and the line/character number after each reduction and watch the grammar in action.
The “print” statements are placed just after the “parse>” label. The two lines below should probably be included in every non-trivial script; when the script is working well, they can be commented out.
parse>
add "# "; lines; add ":"; chars; add " "; print; clear;
unstack; print; stack; add "\n"; print; clear;
The add command above puts a '#' hash character at the beginning of each debug line. This should usually be changed to the comment character (or characters) of the output language, so that the script output can still be tested while the parse-stack token reductions are being visualised.
The “less” program (or “more” on MS Windows) makes it possible to search for a particular token by name and watch it being reduced. We can also search by input line number.
pep -f eg/script.pss file.txt | less
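The same search can also be scripted. The following Python sketch is a hypothetical helper (trace_for is not part of pep); it assumes the debug lines have exactly the shape printed by the two parse> lines above, “# <line>:<char> <token>*<token>*...”, with “#” as the comment prefix, and it skips every other output line.

```python
import re
import sys
from typing import Iterable, Iterator, List, Tuple

# One debug line from the parse> block, e.g. "# 3:17 article*noun*".
# Assumes "#" is the comment prefix added with: add "# ";
TRACE_LINE = re.compile(r"^# (\d+):(\d+) (.*)$")


def trace_for(output: Iterable[str], token: str) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (input line, input char, stack tokens) for each debug line
    whose parse stack contains `token`."""
    for raw in output:
        match = TRACE_LINE.match(raw.rstrip("\n"))
        if match is None:
            continue  # ordinary translated output, not a debug line
        line_no, char_no, stack = match.groups()
        # "article*noun*" -> ["article", "noun"]
        tokens = [name for name in stack.split("*") if name]
        if token in tokens:
            yield int(line_no), int(char_no), tokens


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: pep -f eg/script.pss file.txt | python3 trace_for.py nounphrase
    for line_no, char_no, tokens in trace_for(sys.stdin, sys.argv[1]):
        print(f"{line_no}:{char_no} {'*'.join(tokens)}*")
```

Piping the pep output into this helper (the file name trace_for.py is also hypothetical) prints only the stack snapshots that contain the given token, together with the input position where each reduction happened.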
The state command is an extremely useful nom command which displays the internal state of the pep machine at the moment the command executes. This is possibly the second most useful debugging technique after watching the parse-stack reductions.
Some (possibly all) of the translation scripts at /tr/ also implement this command.
For some reason, I was going to remove this command from the Nom language. I have no idea why.
In the script below, state displays the machine whenever the two tokens on top of the stack are “alnumeric*punctuation*”, just before they are reduced to “garble*”.
read;
[:punct:] {
while [:punct:]; put;
clear; add "punctuation*"; push; .reparse
}
[:alnum:] {
while [:alnum:]; put;
clear; add "alnumeric*"; push; .reparse
}
!"" { add " <--- what is this?"; print; quit; }
parse>
pop;pop;
"alnumeric*punctuation*" {
state; clear; add "garble*"; push; .reparse
}
push;push;
(eof) {
unstack; print; quit;
}
The ℙ𝕖𝕡 interpreter also includes a fully interactive debug mode, which is activated with the -I switch. This interactive debugger has a whole list of commands to step through the script, or to run the script until a certain point and then view the state of the pep virtual machine.
This facility is the 'big mamma' of debug techniques and hopefully you will not need to use it too often. It is a bit like having to use gdb to debug a C program.
pep -If someScript input.txt
pep -Ia asm.pp someScript
pep -a asm.pp someScript
Now you can step through the compiled program “asm.pp” and watch as it parses and compiles “someScript”. Generally, use “rr” to run the whole script, and “rrw text” to run the script until the workspace is some particular text. This helps to narrow down where the asm.pp compiler is not parsing the input script correctly.
Once in an interactive “pep” session, there are many commands to run and debug a script. Type hh to get a full list of available commands. For example:
- count - execute the next instruction in the program (step)
- mark - view the state of the machine (stack/workspace/registers/tape/program)
- rrw <text> - run the script until the workspace is exactly some text
- rre <text> - run the script until the workspace ends with some text
- rrc <num> - run the script with <num> characters of input
- rr - run the whole script from the current instruction
- go.read - reset the virtual machine and input stream (but not the compiled program)
If the script did not compile properly, there will be only one instruction (quit), but this almost never happens these days (2025).
Probably the most primitive technique is simply to use the print command to show the contents of the workspace at a given point, and to comment out problematic lines and blocks. The nom multi-line comment syntax #* ... *# is very useful for this.
Usually it is more enlightening to use the state command than a simple print command.
A common error is forgetting to clear the workspace before building a new parse token, as in this block:
"article*noun*" {
# !!! no clear.
add "nounphrase*"; push; .reparse
}
As a matter of habit, I write the clear command at the beginning of the line where I am going to create a new parse token. This reminds me that I always need to clear the workspace before using it for something different:
pop;pop;
"bow*arrow*" {
# write clear at the start of the line to
# avoid forgetting it.
clear; get; add " shoot "; ++; get; --; put;
clear; add "defender*"; push; .reparse
}
"sentence*" { get; ++; put; } # unbalanced tape increment
Generally, if you increment the tape pointer with ++, then you will have to decrement it with -- in the same block. This ensures that the stack and the tape remain synchronised. There are exceptions to this rule, since you are free to write your scripts however you want and to use the virtual machine in any way you wish.
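The reason for the ++/-- rule can be seen in a small Python model of the stack and tape. The model is an assumption, not pep's implementation: push moves the tape pointer forward one cell, pop moves it back, get/put read and write the current cell, and the two structures count as “synchronised” when the pointer equals the stack depth. It replays the balanced “bow*arrow*” block and the unbalanced “sentence*” block shown earlier. The class and function names are hypothetical.

```python
from typing import List


class PepModel:
    """Toy model of the pep stack and tape (an assumption, not pep's source):
    push/pop move the tape pointer with the stack, ++/-- move it alone."""

    def __init__(self) -> None:
        self.stack: List[str] = []
        self.tape: List[str] = [""] * 8  # a few cells are enough for this demo
        self.pointer = 0
        self.workspace = ""

    def put(self) -> None:  # nom: put
        self.tape[self.pointer] = self.workspace

    def get(self) -> None:  # nom: get (appends the current cell)
        self.workspace += self.tape[self.pointer]

    def incr(self) -> None:  # nom: ++
        self.pointer += 1

    def decr(self) -> None:  # nom: --
        self.pointer -= 1

    def push(self, token: str) -> None:
        self.stack.append(token)
        self.pointer += 1

    def pop(self) -> str:
        self.pointer -= 1
        return self.stack.pop()

    def synchronised(self) -> bool:
        return self.pointer == len(self.stack)


def read_token(m: PepModel, text: str, token: str) -> None:
    """Store the attribute text, then push its token (put; ... push;)."""
    m.workspace = text
    m.put()
    m.push(token)


def balanced() -> PepModel:
    m = PepModel()
    read_token(m, "bow", "bow*")
    read_token(m, "arrow", "arrow*")
    m.pop()
    m.pop()                   # pop; pop;  -> pointer back on the "bow" cell
    m.workspace = ""          # clear;
    m.get()                   # get;
    m.workspace += " shoot "  # add " shoot ";
    m.incr()                  # ++;
    m.get()                   # get;
    m.decr()                  # --;
    m.put()                   # put;
    m.push("defender*")       # clear; add "defender*"; push;
    return m


def unbalanced() -> PepModel:
    m = PepModel()
    read_token(m, "words", "sentence*")
    m.pop()
    m.workspace = ""
    m.get()                   # get;
    m.incr()                  # ++;  (no matching --)
    m.put()                   # put;
    m.push("sentence*")
    return m


b, u = balanced(), unbalanced()
print(b.synchronised(), repr(b.tape[0]))          # True 'bow shoot arrow'
print(u.synchronised(), u.pointer, len(u.stack))  # False 2 1
```

In the unbalanced case the pointer ends one cell past the stack depth, so (in this model) later get/put commands would read and write the attribute cell of the wrong token.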
add "noun*verb*noun*"; push; push; # << error, 3 tokens, 2 pushes
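A toy Python model of push shows what the error above, and a forgotten clear, leave behind. It assumes that push moves only the first '*'-terminated token from the front of the workspace onto the stack; the function is a hypothetical sketch, not pep's code.

```python
from typing import List


def push(workspace: str, stack: List[str]) -> str:
    """Toy model of nom's push (an assumption, not pep's code): move the first
    '*'-terminated token from the front of the workspace onto the stack and
    return what is left in the workspace."""
    end = workspace.find("*")
    if end == -1:
        return workspace  # no complete token; this model leaves it alone
    stack.append(workspace[: end + 1])
    return workspace[end + 1:]


# add "noun*verb*noun*"; push; push;   -> one token is never pushed
stack: List[str] = []
ws = push("noun*verb*noun*", stack)
ws = push(ws, stack)
print(stack, repr(ws))  # ['noun*', 'verb*'] 'noun*'

# pop; pop; then add "nounphrase*" without a clear first
stack2: List[str] = []
ws2 = push("article*noun*" + "nounphrase*", stack2)
print(stack2, repr(ws2))  # ['article*'] 'noun*nounphrase*'
```

In both cases the wrong token ends up on the stack and text is left in the workspace, which the next commands in the script will then work on.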
"article*noun*" {
clear; add "nounphrase*"; push;
# error! no '.reparse' command
}
The .reparse command is important for ensuring that all grammar reductions take place. It also acts as an if/else logic structure, because code in the same block after the .reparse command will not execute.
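To see why a missing .reparse can leave reductions undone, here is a toy shift-reduce loop in Python. It is a simplification with a hypothetical grammar: each rule replaces tokens on top of the stack, the rules are tried in script order after every token is pushed, and with reparse=True the loop jumps back to the first rule after each reduction, which is roughly what .reparse does by returning to parse>. Without it, rules that appear earlier in the script are not retried.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], str]

# Hypothetical grammar, in "script order": each rule reduces the tokens on
# top of the stack to one new token (the '*' suffixes are omitted).
RULES: List[Rule] = [
    (("nounphrase", "verbphrase"), "sentence"),
    (("verb", "nounphrase"), "verbphrase"),
    (("article", "noun"), "nounphrase"),
]


def parse(tokens: Sequence[str], rules: List[Rule], reparse: bool) -> List[str]:
    stack: List[str] = []
    for token in tokens:
        stack.append(token)  # a newly read token is pushed
        i = 0
        while i < len(rules):
            pattern, result = rules[i]
            if tuple(stack[-len(pattern):]) == pattern:
                del stack[-len(pattern):]
                stack.append(result)
                if reparse:
                    i = 0  # like .reparse: jump back to "parse>"
                    continue
            i += 1  # without .reparse, only later rules are still tried
    return stack


words = ["article", "noun", "verb", "article", "noun"]  # "the dog chased the cat"
print(parse(words, RULES, reparse=True))   # ['sentence']
print(parse(words, RULES, reparse=False))  # ['nounphrase', 'verb', 'nounphrase']
```

With the reduction loop, the last “article noun” pair cascades all the way up to “sentence”; without it, the input ends with three unreduced tokens on the stack.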
"."{ clear; } print; clear; # << error: no read in script
Two “pop” commands do not guarantee that there are two tokens in the workspace: the stack may be empty, or may contain only one token.
# snippet ...
pop; pop;
B"noun*".!"noun*".!E"verb*" {
# process tokens here.
}
Often we expect a certain order of tokens, without realising that an extra token has already been parsed and pushed onto the stack.
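The following Python sketch models the snippet above. pop_two imitates pop; pop; under the assumption that pop prepends the popped token to the workspace and does nothing on an empty stack. guarded is a direct reading of the test B"noun*".!"noun*".!E"verb*", where “.” means and. Both names are hypothetical.

```python
from typing import List


def pop_two(stack: List[str]) -> str:
    """Model of 'pop; pop;' (assumption: each pop prepends the top token to the
    workspace, and pop on an empty stack does nothing)."""
    workspace = ""
    for _ in range(2):
        if stack:
            workspace = stack.pop() + workspace
    return workspace


def guarded(workspace: str) -> bool:
    """A reading of the nom test  B"noun*".!"noun*".!E"verb*"  ('.' = and)."""
    return (
        workspace.startswith("noun*")        # B"noun*"  begins with noun*
        and workspace != "noun*"             # !"noun*"  is not just noun*
        and not workspace.endswith("verb*")  # !E"verb*" does not end with verb*
    )


for stack in (["noun*"], ["noun*", "verb*", "noun*"], ["article*", "noun*", "noun*"]):
    ws = pop_two(stack)
    print(repr(ws), guarded(ws))
# 'noun*' False        <- only one token was on the stack
# 'verb*noun*' False   <- an earlier token was already on the stack
# 'noun*noun*' True
```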
In extreme cases, you may wish to see how a script is “compiled” into the ℙ𝕖𝕡 assembly language. This is only necessary if you suspect that there is a bug in the compiler; the debugging techniques mentioned above are more practical.
pep -f compile.pss script
The compiled script will be printed to stdout and saved in sav.pp, or an error message will be displayed if the script has a syntax error.
Sometimes the command above is useful for finding errors in a script which are not caught while the script is being loaded.
Use the script eg/nom.reference.syntax.pss to check the syntax of a script. It currently (April 2025) has much better error checking than bumble.sf.net/books/pars/compile.pss, the nom script from which the nom compiler is generated. At some point I will rewrite the compiler to use a “modern”, up-to-date grammar and error checking.
pep -f eg/nom.reference.syntax.pss script.pss
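The pipeline below keeps the output of eg/mark.latex.simple.pss up to (and including) the first line containing “%% ---”, strips everything up to the first colon and the final '*' from each line, puts each '*'-separated name on its own line, and removes duplicates. This appears to be a quick way to list the distinct token names found in a script's debug output.
pep -f eg/mark.latex.simple.pss pars-book.txt | sed '/%% ---/q;' | sed 's/^[^:]*: *//;s/\* *$//' | tr '*' '\n' | sort | uniq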
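For readers without sed and tr, a rough Python equivalent of the sed/tr/sort/uniq pipeline follows. It mirrors each stage: stop after the first line containing “%% ---” (which sed still prints), strip up to the first colon, drop the final '*', split on '*', then sort and deduplicate. Unlike sort, it orders by code point rather than by locale. The name distinct_tokens is hypothetical.

```python
import re
import sys
from typing import Iterable, List


def distinct_tokens(lines: Iterable[str], stop_marker: str = "%% ---") -> List[str]:
    r"""Rough Python equivalent of:
    sed '/%% ---/q;' | sed 's/^[^:]*: *//;s/\* *$//' | tr '*' '\n' | sort | uniq
    """
    names = set()
    for line in lines:
        line = line.rstrip("\n")
        body = re.sub(r"^[^:]*: *", "", line, count=1)  # s/^[^:]*: *//
        body = re.sub(r"\* *$", "", body, count=1)      # s/\* *$//
        names.update(body.split("*"))                   # tr '*' '\n'
        if stop_marker in line:                         # sed q: print this line, then quit
            break
    return sorted(names)                                # sort | uniq (code-point order)


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: pep -f eg/mark.latex.simple.pss pars-book.txt | python3 distinct_tokens.py '%% ---'
    for name in distinct_tokens(sys.stdin, sys.argv[1]):
        print(name)
```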