A concatenative programming language https://sloum.colorfield.space/docs/nimf/
 
 
 
sloum f6d9a2c385 Adds syntactic sugar for variable access 4 months ago
examples Adds to cd and pwd to builtins. Updates readme. Adds install-lib command to interpreter 4 months ago
lib Adds syntactic sugar for variable access 4 months ago
termios Adds basic termios features to enable key input and get window size 3 years ago
.gitignore Reworks output, provides string support, accepts some current limitations, and adds cl file read 3 years ago
Makefile Smaller binary when using makefile 2 years ago
README.md Adds syntactic sugar for variable access 4 months ago
TODO.md Adds to cd and pwd to builtins. Updates readme. Adds install-lib command to interpreter 4 months ago
builtins.go Adds to cd and pwd to builtins. Updates readme. Adds install-lib command to interpreter 4 months ago
files.go Beginning of rewrite with lex/parse phases and new scanner, less forthy string, etc. Strings arent working right, but other things are. 5 months ago
go.mod Adds bitwise operations, timestamp generation, and random lib 3 years ago
go.sum Smaller binary when using makefile 2 years ago
helpers.go Adds to cd and pwd to builtins. Updates readme. Adds install-lib command to interpreter 4 months ago
main.go Adds syntactic sugar for variable access 4 months ago
memory.go Cleans up readme, removes all calls to panic and replaces them with managed error messages, removes dead words from word map 5 months ago
names.go Removes one more dead function 5 months ago
nimf.1 Updates the manpage 2 years ago
tokenizer.go Adds syntactic sugar for variable access 4 months ago

README.md

nimf

Nimf is an interpreted language written in golang. It bears no relationship to the language 'nim', which is very different. Nimf is an implementation of a concatenative language in a more or less forth-ish style. It has been created mostly as a learning exercise, but is definitely usable for certain types of programming tasks as well as for educational purposes.

After languishing for a few years while I worked on my other language (slope), I found some time to return to nimf and write a proper lex/parse cycle. This simplified a lot of the execution, allowed for better debugging/error tracing, and hopefully will be a better foundation to build on in the future. These changes do have the effect of moving away from the pseudo-register based code approach for certain parse features; for example, strings are now a parsed item, and not a flag that is toggled to change the interpreter mode.

Building

A Go compiler is required. I am not certain how old a version will work, but there are no external dependencies and I imagine anything >= 1.11 should be fine. Nimf was developed with Go 1.12 and 1.13.

A BSD compatible makefile has been included and contains a few targets of interest:

  • make will build the interpreter's binary in the current directory
  • make install will install the binary, manpage, and standard library to the /usr/local file tree (adjustable via the PREFIX variable)
    • This may require administrator privileges (sudo or the like) depending on your system setup
  • make install-local will build the binary in the current directory and install the standard library to either $XDG_DATA_HOME/nimf or ~/.local/share/nimf
    • This is useful on systems where you do not have administrator privileges
  • make install-lib will install the standard library to the /usr/local file tree
    • This is generally desirable when developing core libraries and not wanting to rebuild everything every time
  • make install-lib-local
    • Like the previous one, but installs to either $XDG_DATA_HOME/nimf or ~/.local/share/nimf

Running go install is also an option, but it won't help you with the standard library or the manpage.

Resources

Nimf is currently hosted in a git repository which can be found at git.rawtext.club/nimf-lang/nimf. The main documentation is this README. However, full API documentation is coming soon and will be hosted over https/gopher/gemini.

The Language

Things will look at first glance more or less like forth (nimf stands for: nimf is mostly forth). Anything inside of ( and ) is a comment. Nimf only supports this type of commenting, which works over multiple lines and can be anywhere inside a line, but at present cannot be nested.

Upon launching the REPL only the builtins will be loaded. These powerful words provide the building blocks for all of the nimf libraries, but lack a lot of the niceties of using the modules/libraries. nimf comes with a number of modules that can be loaded using inline, which will bring all of the variables and words into the global scope. Most modules also call other modules. For example, the text module inlines the std module. That means that if you only inline text, you also have access to std. However, it is recommended to be explicit and load all of the modules you intend to use rather than relying on hidden loading. There is no performance cost to calling inline on a module that has already been loaded: it will not be reloaded and no additional parsing will need to be done.

Hello World

"text" inline
"Hello, world" str.print-buf

In the above we first inline the text library. Nimf makes a distinction between builtins, which are coded in Go, and module functions, which are coded directly in Nimf from the primitives offered by the builtins. The text library contains words prefixed with ch and str (for character and string, respectively).

After inlining the text module we start a string literal with " and finish it with another ". Note that unlike in a traditional forth system, strings in nimf are handled by the lexer/parser and not via " as a word itself (early versions of nimf did have " as a word, but it was changed to syntax in order to have more useful strings). The string is saved to temporary string storage (starting at memory address 50). The word str.print-buf prints the string currently held in temporary string storage. The temporary storage is used by many words and should not be counted on to hold a string for any longer than the next word call. The string could instead be saved to a variable:

"text" inline
"Hello, world" svar myString
myString str.print

In the above we start out like before, but instead of just printing we first call svar myString. svar reserves enough memory for the string currently in the temporary string buffer and copies that buffer over to that memory. It then creates a reference to that memory via the word that comes after svar, in this case myString. We then use myString, which puts the address of the string onto the stack, and str.print, which takes the address from the top of the stack and prints the string stored there. str.print is also used by str.print-buf, which is implemented as str.buf-addr str.print (with str.buf-addr being the address of the temporary string buffer). As such, we could also have done: "Hello, world" str.buf-addr str.print. There are lots of options.

Math

Each of the below will leave the result on the stack. We could call , after each one to print the result, but here we will call .s at the end to view the stack itself.

5 6 +      ( 11 )
3 9 -      ( -6 )
2 4 *      (  8 )
7 2 /      (  3 )
7 2 %      (  1 )
7 2 /%     (  1, 3 )

.s         ( Prints: <7> [ 11, -6, 8, 3, 1, 1, 3 ] )

The .s builtin prints the stack. The first value in the example above (<7>) is how many items are on the stack, followed by the stack itself.

There are a number of numerical helpers available in the num module ("num" inline) as well as some expanded comparison operators in the std module (on its own nimf provides =, >, and <).

Subroutines

: squared ( n -- n*n :: squares a number ) dup * ;

The : starts a subroutine declaration, everything between ( and ) is a comment, dup makes a copy of the top item on the stack, * multiplies the TOS by the item under it, and ; ends the subroutine definition.

We can now call it like so:

5 squared .

The . word drops and prints the top item on the stack (TOS) followed by a space. To print without the trailing space, use , instead. The above would output 25 followed by a space.

Variables

Variables default to being sized to a single cell (size of int on your system, likely either 32 or 64 bits). From this basic variable you can store numbers, characters, flags, other memory addresses, etc. You can also extend variables for use in more complex structures or easily store and retrieve strings in memory.

A major limitation, and thus an adjustment when coming to nimf from many other languages, is that words do not get their own scope for ordinary variables. Apart from the local variables described below, a variable cannot be created within a subroutine and must exist in global space. To work around this limitation, you will often see a word do three things: move the contents of a global variable onto the return stack at the beginning of the word, use that variable's memory during operation, and at the end of the word move the values from the return stack back into their variables. This lets you use global variables to hold local values inside a word and then restore their previous state when you are done. If this sounds complicated or is a little beyond your usage of the language at present, don't worry: you will likely know when you need it, and it will make more sense then. For an example of this kind of value passing via the return stack, look at the text module. At the very bottom it defines two words that move the text module's values onto and off of the return stack, and they are called in many of the words in the text module.
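The save/restore pattern is easier to see with two words that share one global cell. Below is a Go model of the idea, not nimf code. `Machine`, `counter`, `sumTo` and `sumOfSums` are hypothetical. Each word stashes the global's current value on the return stack, uses the cell freely, and then puts the old value back. That is why `sumOfSums` can call `sumTo` in the middle of its own loop without losing its place.

```go
package main

import "fmt"

// Machine is a toy model: a few global memory cells plus a return stack.
type Machine struct {
	mem    []int
	rstack []int
}

func (m *Machine) pushR(n int) { m.rstack = append(m.rstack, n) }

func (m *Machine) popR() int {
	n := m.rstack[len(m.rstack)-1]
	m.rstack = m.rstack[:len(m.rstack)-1]
	return n
}

// counter is the address of a hypothetical global variable shared by several words.
const counter = 0

// sumTo borrows the global counter cell as scratch space and restores the
// caller's value before returning.
func (m *Machine) sumTo(n int) int {
	m.pushR(m.mem[counter]) // stash the current value on the return stack
	m.mem[counter] = 0
	total := 0
	for m.mem[counter] < n {
		m.mem[counter]++
		total += m.mem[counter]
	}
	m.mem[counter] = m.popR() // put the caller's value back
	return total
}

// sumOfSums also uses counter, and calls sumTo while its own loop is running.
func (m *Machine) sumOfSums(n int) int {
	m.pushR(m.mem[counter])
	m.mem[counter] = 0
	total := 0
	for m.mem[counter] < n {
		m.mem[counter]++
		total += m.sumTo(m.mem[counter]) // safe: sumTo restores counter
	}
	m.mem[counter] = m.popR()
	return total
}

func main() {
	m := &Machine{mem: make([]int, 1)}
	m.mem[counter] = 42
	fmt.Println(m.sumTo(4), m.sumOfSums(3), m.mem[counter]) // 10 10 42
}
```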

Naming

Variables can be named anything you like with the following exceptions:

  • A variable, or a word for that matter, cannot contain only digits (ex. 23) as the interpreter will treat this as an integer
  • A variable or word name cannot take the form of a decimal, hexadecimal, octal, or binary number. Using digits in a variable or word name is fine, so long as the name does not match the established patterns for these number forms
  • A variable or word name cannot contain whitespace
  • A variable cannot start with @, !, s@, or s!. These are used as syntactic sugar for quick access to variable values
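The rules above can be expressed as a small Go checker. This is a sketch under an explicit assumption: "looks like a number" is approximated with Go's own integer-literal prefixes via `strconv.ParseInt(tok, 0, 64)`, which accepts 0x, 0o or a leading 0, and 0b. nimf's tokenizer may recognize slightly different forms. `ValidVarName` is a made-up helper.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// looksNumeric approximates "takes the form of a decimal, hexadecimal, octal, or
// binary number" using Go's integer-literal prefixes (an assumption, see above).
func looksNumeric(tok string) bool {
	_, err := strconv.ParseInt(tok, 0, 64)
	// An out-of-range literal is still a number, just too big to fit.
	return err == nil || errors.Is(err, strconv.ErrRange)
}

// sugarPrefixes are the prefixes reserved for variable-access shorthand.
var sugarPrefixes = []string{"s@", "s!", "@", "!"}

// ValidVarName applies the naming rules listed above to a candidate variable name.
func ValidVarName(name string) bool {
	if name == "" || looksNumeric(name) {
		return false
	}
	if strings.IndexFunc(name, unicode.IsSpace) >= 0 {
		return false
	}
	for _, p := range sugarPrefixes {
		if strings.HasPrefix(name, p) {
			return false
		}
	}
	return true
}

func main() {
	for _, n := range []string{"my-var", "v2", "23", "0x1F", "0b101", "my var", "@count", "s!name"} {
		fmt.Printf("%-8q %v\n", n, ValidVarName(n))
	}
}
```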

As a convention, not enforced at a code level, private words can be created by using the following variable/word naming scheme: module-name.private.name. For example: url.private.port would be part of the url module and is intended to only be used internally so is marked private, it is then given the name port. Using private in this way excludes any variables and words containing .private. from the word listing provided with the word words. In reality you can still call private words if you like, but private is one way a developer can provide intent to other developers.

General Variables

var myvar
myvar .       ( The address of myvar: 101234 )
myvar get .   ( The value of myvar: 0 )
5 myvar set
myvar get .   ( The value of myvar: 5 )
8 myvar +!    ( Adds 8 to the value of myvar - not to the address )
myvar get .   ( The value of myvar: 13 )

The first line, above, adds the name myvar to the dictionary and assigns it a memory address. The second line prints the address of myvar. The third line prints the value at that address. All variables can be thought of as pointers, for those familiar with the term. You can get the value stored at an address with get or @ (they are the same). Since we have not given myvar a value yet, the value is 0. The set or ! builtins update a memory address. The following line shows that the value has been updated. The +! subroutine adds to the value at an address and the last line shows the result of this update.
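Thinking of variables as pointers into an array of cells may help. Here is a Go model of the example above. It is not nimf's memory layout: addresses start at 0 here rather than at values like 101234, and `Memory`, `Var`, `Get`, `Set` and `AddStore` are made-up names. `Set` takes the value first and the address second, the same order as 5 myvar set.

```go
package main

import "fmt"

// Memory is a toy model of nimf's cell memory and dictionary.
type Memory struct {
	cells []int
	next  int            // next unreserved cell
	dict  map[string]int // variable name -> address
}

func NewMemory(size int) *Memory {
	return &Memory{cells: make([]int, size), dict: map[string]int{}}
}

// Var reserves one zero-valued cell and records its address under name, like `var name`.
func (m *Memory) Var(name string) int {
	addr := m.next
	m.dict[name] = addr
	m.next++
	return addr
}

// Get reads the cell at addr (get / @).
func (m *Memory) Get(addr int) int { return m.cells[addr] }

// Set stores value at addr (set / !).
func (m *Memory) Set(value, addr int) { m.cells[addr] = value }

// AddStore adds n to the value at addr, not to the address itself (+!).
func (m *Memory) AddStore(n, addr int) { m.cells[addr] += n }

func main() {
	m := NewMemory(16)
	myvar := m.Var("myvar")
	fmt.Println(m.Get(myvar)) // 0
	m.Set(5, myvar)
	fmt.Println(m.Get(myvar)) // 5
	m.AddStore(8, myvar)
	fmt.Println(m.Get(myvar)) // 13
}
```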

Extending Variables

Nimf has the ability to store variables that take up multiple cells. This can be done with the allot keyword. Using allot will allow you to create something like arrays and is used often for managing strings.

: ? ( addr -- ) @ . ;

var myArray         ( Reserve a memory address )
5 allot             ( myArray is already 1 cell, '5 allot' adds 5 more for a total of 6)
5 myArray set       ( Set myArray to 5, so that the length of the array can be referenced )
9 myArray 1 + set   ( Set the first offset, the first non-length value, to 9 by referencing `myArray 1 +` )
2 myArray 2 + set   ( Set the second offset to 2 by referencing `myArray 2 +` )

myArray ? myArray ++ ? myArray 2 + ?   ( 5 9 2 )

In the above example we first create a simple way to print the value at a memory location via the word ?.

After creating ? we create a variable myArray. We then allot 5 extra cells for this variable. allot does not take a memory address; it just expands the most recently created variable. We then set the first three cells of the "array" (the length cell and the first two values), reaching each one by adding an offset to myArray. Lastly we use our newly created ? to view the value at each offset position.

allot operates in an increasing manner on the next available memory. You cannot define a variable (A), then another variable (B), and go back and allot more for (A). You must allot before creating any other variables. Note that inlining a module may also create new variables, which would make it impossible to allot more for a variable you created earlier, so be aware of this limitation of allot when creating additional variables or inlining modules. Best practice is to initialize a var or svar and then immediately allot.
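Why allot only ever grows the most recent variable is clearer with a model of a bump allocator. The sketch below is in Go and is not nimf's implementation. `Heap`, `Var`, `Allot` and the bookkeeping maps are hypothetical. In the model, allot takes no address: it simply advances the shared "next free cell" pointer and credits the cells to whichever variable was created last.

```go
package main

import (
	"errors"
	"fmt"
)

// Heap is a toy bump allocator: memory is handed out in increasing order.
type Heap struct {
	size   int            // total cells available
	next   int            // next unreserved cell
	last   string         // most recently created variable
	addr   map[string]int // name -> first cell
	length map[string]int // name -> cells reserved
}

func NewHeap(size int) *Heap {
	return &Heap{size: size, addr: map[string]int{}, length: map[string]int{}}
}

// Var reserves a single cell, like `var name`.
func (h *Heap) Var(name string) (int, error) {
	if h.next >= h.size {
		return 0, errors.New("out of memory")
	}
	a := h.next
	h.addr[name] = a
	h.length[name] = 1
	h.last = name
	h.next++
	return a, nil
}

// Allot extends whichever variable was created most recently; it takes no address.
func (h *Heap) Allot(n int) error {
	if h.last == "" {
		return errors.New("allot: no variable to extend")
	}
	if n < 0 || h.next+n > h.size {
		return errors.New("allot: not enough memory")
	}
	h.next += n
	h.length[h.last] += n
	return nil
}

func main() {
	h := NewHeap(100)
	a, _ := h.Var("A")
	b, _ := h.Var("B")
	_ = h.Allot(5) // extends B, not A
	fmt.Println(a, b, h.length["A"], h.length["B"]) // 0 1 1 6
}
```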

String Variables

"text" inline

"hello" svar hi         ( Puts 'hello' into the temporary string buffer, reserves memory space, copies the string into it, and adds a new word, 'hi' )
hi .                    ( Memory address: 63214 )
hi str.print            ( Outputs: hello )
str.print-buf           ( Outputs: hello )
"hola" str.print-buf    ( Outputs: hola )
hi str.print            ( Outputs: hello )
hi @ .                  ( Outputs: 5, the length of the string )
hi 2 + @ emit           ( Outputs: e, the second char )

We first inline the text module so that we have access to the print-oriented words (which could easily be defined on your own without the need for text, but that is not in scope here). We then create a string hello. That puts hello into the temporary string buffer. Calling str.print-buf or str.buf-addr str.print would print out hello. Instead we call svar hi, which secures enough memory to hold the string found in the temporary string buffer and copies the string, including its length, to the memory location secured by svar. Following the assignment is an example of printing the string as well as printing the temporary string buffer (which still contains the same string). We then add a new string to the temporary string buffer and print it, then print hi to show that they now differ. Outputting the value at hi gives the length of the string. Lastly we output the value at hi+2 as a character via emit, which converts an integer to a character and outputs it. This yields e.

Strings are stored in memory like so:

"hello" svar hi
hi .   ( Address: 920 )
hi @   (   Value:   5 )

- - - - - - - - - - - -

Address: | 920 | 921 | 922 | 923 | 924 | 925 |
Value:   |   5 | 104 | 101 | 108 | 108 | 111 |
                 'h'   'e'   'l'   'l'   'o'

The words found in the text module know to treat the first value as the length and the rest of them as characters. For example, calling: 108 emit would print l. Mapping strings in this way allows you to get part of a string based on a simple offset value. The third character of the above example can be acquired very easily: hi 3 + @. Simply add 3 to the address and get the value. In that light, arrays and strings can be thought of as 1 indexed, as opposed to the often more common 0 indexed.
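The length-prefixed layout in the table can be modeled in a few lines of Go. This is a sketch, not the text module. `StoreString`, `CharAt` and `LoadString` are hypothetical. The model stores one character per cell, as the table shows for ASCII. Storing Go runes for non-ASCII text is an assumption about how nimf handles such characters. `CharAt` uses the same 1-based offset as hi 3 + @.

```go
package main

import "fmt"

// StoreString writes s at addr as [length, c1, c2, ...] and returns the number of cells used.
func StoreString(mem []int, addr int, s string) int {
	r := []rune(s)
	mem[addr] = len(r)
	for i, c := range r {
		mem[addr+1+i] = int(c)
	}
	return len(r) + 1
}

// CharAt returns the i-th character using 1-based offsets from the string's address.
func CharAt(mem []int, addr, i int) rune { return rune(mem[addr+i]) }

// LoadString reads the length cell and then that many character cells.
func LoadString(mem []int, addr int) string {
	n := mem[addr]
	r := make([]rune, n)
	for i := range r {
		r[i] = rune(mem[addr+1+i])
	}
	return string(r)
}

func main() {
	mem := make([]int, 1000)
	used := StoreString(mem, 920, "hello")
	fmt.Println(used, mem[920:926])          // 6 [5 104 101 108 108 111]
	fmt.Println(string(CharAt(mem, 920, 3))) // l
	fmt.Println(LoadString(mem, 920))        // hello
}
```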

Much like simple variables, strings also have getters and setters. Calling hi get-string, referencing the above example, would copy the value of the string that hi references into the temporary string buffer. Moving in the other direction we can move a string from the temporary string buffer to a memory address with set-string: "hola" 2300 set-string. That example would move the string 'hola' to memory address 2300. This is useful, but dangerous: you need to know that enough memory is writable at that spot to support the string. Otherwise you risk overwriting memory you might be using for something else. So, be careful. It is often useful to allot more memory than you need via var stringSpace 500 allot or the like. Then you can always overwrite up to 500 characters when assigning strings to the variable stringSpace. You can use var and allot to manage string memory in a more fine grained way and svar when you have a string already and just want it moved into memory.
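The danger with set-string is easiest to see when a second variable sits right after the reserved space. Below is a Go model of that situation. `setStringUnchecked` writes wherever it is told, which matches how set-string is described. `setStringChecked` is a hypothetical safer variant with an extra capacity argument that nimf's set-string does not take. Capacity counts the length cell, so var stringSpace 500 allot corresponds to 501 cells.

```go
package main

import "fmt"

// setStringUnchecked writes the length cell and characters starting at addr
// without knowing how much room was reserved there.
func setStringUnchecked(mem []int, addr int, s string) {
	r := []rune(s)
	mem[addr] = len(r)
	for i, c := range r {
		mem[addr+1+i] = int(c)
	}
}

// setStringChecked is a hypothetical variant: capacity is the number of cells
// reserved at addr (1 length cell + allotted cells).
func setStringChecked(mem []int, addr, capacity int, s string) error {
	need := len([]rune(s)) + 1
	if need > capacity {
		return fmt.Errorf("string needs %d cells but only %d are reserved", need, capacity)
	}
	setStringUnchecked(mem, addr, s)
	return nil
}

func main() {
	mem := make([]int, 10)
	const buf, neighbor = 0, 5 // buf: `var` + 4 allotted cells; neighbor: the next variable
	mem[neighbor] = 42

	fmt.Println(setStringChecked(mem, buf, 5, "hello")) // refused: needs 6 cells
	fmt.Println(mem[neighbor])                          // 42, untouched

	setStringUnchecked(mem, buf, "hello")
	fmt.Println(mem[neighbor]) // 111 ('o'): the neighbor was overwritten
}
```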

Local Variables

Local variables are the only kind of variable that can be created within a word definition. In fact, they can only be defined inside of a word definition. When the word finishes executing all of the words that it is composed of, the local variables will be cleared from memory automatically.

: local-example
  local x
  5 x set
  x , space cr ( print the memory address of `x` )
  x get , ( print the value of `x` )
;

local-example
x ,

In the above example we create a word local-example. In that word we use the local word to create a variable called x. The value held at memory address x is set to 5, and the word then prints the memory address x itself followed by the value stored there. We call the word after defining it. Once it has printed the value, the word completes and its local memory is cleared. When x is referenced outside of the word, x will not be found in the word dictionary and an error will be thrown (unless there is a global variable named x).

Some things to note:

  • Local variables can have the same name as a global variable. The local one will always be used when both exist inside of a word.
  • Local variables are only accessible in the word they reside in, sort of. The memory addresses they use will be available during the execution of the word containing the local, and thus all words called within that word will have the memory available as well. However, those words do not have access to the variable name (x in the above example). So if you want to use that memory in another word, simply put its address on the stack for the next word to operate on.
  • If a local is created within a loop, a new memory address will be assigned to that local variable at each loop iteration, so the name is rebound to the new address each time. The memory that it previously occupied still exists until the end of the word, though. This is a tricky quirk to utilize, but know that it is possible.
  • You can still allot more space for a local variable (in order to, for example, store a string or array), but to do so you must use the lallot word rather than allot, as they operate on different pointers to the same memory space.
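The scoping rules in the list above can be modeled in Go as a stack of per-word frames. This is a sketch only. `Scopes`, `Enter`, `Exit`, `Local` and `Lookup` are hypothetical, and the addresses are made up. The model covers four behaviors. Locals shadow globals. A called word sees only its own locals. Re-creating a local rebinds the name to a fresh cell. Exiting the word releases every local cell at once by resetting a separate local pointer, which loosely mirrors the note about lallot using a different pointer.

```go
package main

import (
	"errors"
	"fmt"
)

// frame is one executing word's set of local names.
type frame struct {
	names map[string]int // local name -> address
	base  int            // local pointer value when the word was entered
}

// Scopes is a toy model of name lookup with locals.
type Scopes struct {
	globals  map[string]int
	frames   []frame
	localPtr int // next free local cell, separate from the global pointer
}

func NewScopes(localBase int) *Scopes {
	return &Scopes{globals: map[string]int{}, localPtr: localBase}
}

// Enter runs when a word starts executing.
func (s *Scopes) Enter() {
	s.frames = append(s.frames, frame{names: map[string]int{}, base: s.localPtr})
}

// Local binds name to a fresh cell. Calling it again (say, inside a loop)
// rebinds the name; the earlier cell stays reserved until Exit.
func (s *Scopes) Local(name string) (int, error) {
	if len(s.frames) == 0 {
		return 0, errors.New("local used outside of a word definition")
	}
	addr := s.localPtr
	s.localPtr++
	s.frames[len(s.frames)-1].names[name] = addr
	return addr, nil
}

// Exit releases every local cell the word created.
func (s *Scopes) Exit() {
	top := s.frames[len(s.frames)-1]
	s.frames = s.frames[:len(s.frames)-1]
	s.localPtr = top.base
}

// Lookup checks only the innermost word's locals, then globals.
func (s *Scopes) Lookup(name string) (int, bool) {
	if n := len(s.frames); n > 0 {
		if addr, ok := s.frames[n-1].names[name]; ok {
			return addr, true
		}
	}
	addr, ok := s.globals[name]
	return addr, ok
}

func main() {
	s := NewScopes(5000)
	s.globals["x"] = 120
	s.Enter()
	a1, _ := s.Local("x")
	a2, _ := s.Local("x") // as if created again on the next loop iteration
	got, _ := s.Lookup("x")
	fmt.Println(a1, a2, got) // 5000 5001 5001
	s.Exit()
	got, _ = s.Lookup("x")
	fmt.Println(got, s.localPtr) // 120 5000
}
```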

Variable Syntactic Sugar

As of version 1.04 there is a shorthand for getting the value of a variable onto the stack or storing a value in a variable. It works with string variables as well, but it cannot retrieve the full range of a variable that has been expanded via allot.

var my-var     ( create the variable `my-var` )
20 !my-var     ( store 20 in my-var )
@my-var ,      ( put the value of my-var on TOS and print it )

Basically, you can prepend @ or ! to a variable name and the interpreter will expand it; for example, 20 !my-var becomes 20 my-var !. The string versions work the same way, but you use s@my-var and s!my-var respectively. This is not a required feature; it is just a way to make code a tiny bit less verbose and to indicate in one word what is happening with a variable.

Branching

What would a programming language be without branching? Branching (if, else, then) can only be used within word definitions and works as follows:

"text" inline
"num" inline

: mySubroutine ( n -- )
  dup 25 > if
    . "is greater than 25" str.print-buf
    exit
  then

  dup num.positive? if
    . "is greater than 0" str.print-buf
  else
    . "is less than or equal to zero" str.print-buf
  then
;

50 mySubroutine     ( Output: 50 is greater than 25 )

In the very contrived branching above we put 50 on the stack. We then compare 50 and 25 via the > word, which will put -1 on the stack if 50 is greater than 25 or 0 if not. if will branch based on the value on TOS. In this case it is truthy (-1, or any value other than 0) so it enters the first branch and prints out 50 is greater than 25. Nesting can occur by adding a new conditional inside an if or else. Remember to use dup to duplicate the value on the top of the stack if you will want to use it beyond the conditional.
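The flag convention described above fits in two tiny Go helpers. They are hypothetical and only model the described behavior: comparisons leave -1 or 0, and if treats any non-zero value as true. The last line shows the usual Forth reason for choosing -1. It has every bit set, so bitwise AND/OR of two such flags also behaves like logical and/or. Whether std's and and or are bitwise is not stated here.

```go
package main

import "fmt"

// Flag turns a comparison result into the value > is described as leaving
// on the stack: -1 for true, 0 for false.
func Flag(b bool) int {
	if b {
		return -1
	}
	return 0
}

// Truthy reports whether if would take its first branch: any non-zero value counts.
func Truthy(n int) bool { return n != 0 }

func main() {
	fmt.Println(Flag(50 > 25), Flag(5 > 25))      // -1 0
	fmt.Println(Truthy(-1), Truthy(7), Truthy(0)) // true true false
	// -1 has all bits set, so bitwise & and | act as logical and/or on flags.
	fmt.Println(Flag(true)&Flag(true), Flag(true)&Flag(false), Flag(false)|Flag(true)) // -1 0 -1
}
```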

The branching in nimf is currently a little funky in its implementation. Deep nesting can often have unexpected results. This is actively being worked on. Guard clauses, such as the one above (the first if, which uses exit inside), are encouraged as a way to reduce code complexity. Not all situations will allow for using exit, which leaves the current word immediately (similar to a return in a C-based language, except that it doesn't return anything since the stack is a persistent structure). Using guards and exit avoids most of the current pitfalls with branching.

The std library can be inlined to provide a number of conditional logic constructs including and, or, 0=, !=, =, >=, <=, 0<, etc. The num module also contains some useful items. In the above example num.positive? was used to see if a number was greater than zero. You can, of course, just use dup 0 > instead of num.positive? but some words that use mostly symbols can be hard to remember and a clearly named word like num.positive? can improve code readability should you need to come back to it at a later time.

Branching utilizes the return stack, so be careful when using the return stack inside of an if or else block. Anything you put on the return stack should be taken off before the conditional segment you are in ends (so before else or then if you are in the truthy segment and before then if in the falsy segment). Care should also be taken when using the return stack in nested if blocks.

Loops

Nimf currently only supports one type of loop: do [...] loop. do marks the beginning of a loop. The code within a do will always be run at least once, unless the loop is wrapped in an if [...] then construct. The loop keyword eats TOS and, if the value is truthy, returns to do; otherwise the loop ends and execution continues after the do [...] loop construct. Like branching, loops can only be used within subroutines:

"std" inline

: to100 ( n -- )
  dup 100 <=
  if
    do
      dup .
      ++
      dup 100 <=
    loop
  then
  drop
;

1 to100   ( 1 2 3 4 5 6 7 8 [...] )

In the above example we inline the std lib (to gain access to ., ++, and <=) and create a subroutine to100. The subroutine first checks that TOS is less than or equal to 100, if not it just drops TOS and ends. If so it enters a loop, duplicates and outputs TOS, increments TOS, duplicates TOS and checks to see if it is still less than or equal to 100... if so it loops, if not it leaves the loop. It then drops top of stack.
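The control flow of to100 maps directly onto a guarded do-while. The Go sketch below records the printed numbers so they can be checked. It is an illustration, and `to100` here is a Go function, not the nimf word. The comments pair each line with the nimf words it stands for. The body runs before the flag is tested, just as loop consumes the flag only after the body.

```go
package main

import "fmt"

// to100 mirrors the nimf word: an if guard around a do ... loop, which runs
// its body first and only then tests the flag that loop consumes.
func to100(n int) []int {
	var printed []int
	if n <= 100 { // dup 100 <= if
		for { // do
			printed = append(printed, n) // dup .
			n++                          // ++
			if !(n <= 100) {             // dup 100 <= loop
				break
			}
		}
	} // then
	return printed // drop (n itself is discarded)
}

func main() {
	fmt.Println(to100(95))  // [95 96 97 98 99 100]
	fmt.Println(to100(101)) // []
}
```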

This is a fairly basic example and shows simple looping and conditionals. More complex loops may require the use of counters or other stored information. For an example that uses variables, look in lib/std.nh for prints, a string-printing word (it actually needs some work to make sure previous variable states don't get overwritten. That improvement is coming soon TM)

Similar to branching, take care when using the return stack inside a loop. Anything you put on the return stack after do should be taken off before reaching loop.

Errors

Throwing an error can be done as follows:

: errorTest ( -- )
  1 2 +
  "Random error" error
  5 *
;

Running errorTest above will push 3 onto the stack and then throw an error stating Error: Random error. Code execution stops there, so 5 * is never run. If you are running in interactive mode, the stack will be cleared and all operation flags reset. If you are running from a file, execution will cease and your program will exit with a non-zero exit code.

It is also possible to exit a program early without an error message via the halt word. halt eats the top of the stack and exits the program, using the value it received as the program's exit code.

The Interpreter

Syntax: nimf [options] [filepath]

Nimf can be run with or without a file as input. When a file is provided as input the interpreter will run the contents of the file without command prompts or interactivity beyond what was coded in the file and will exit when the file has completed (or an error will be displayed).

Running nimf without any filepath will launch nimf in interactive mode. The user will be presented a repl and will be able to input code and see results in real time.

Nimf works fine with shebang lines (ex. #! /usr/bin/env nimf) and can thus run executable nimf scripts directly.

Runtime Options

When invoking nimf in either interactive or file mode the following command line options are available:

  • -memory [int] The number of memory cells to run nimf with (default: 250000, min: 34999)
  • -stack-depth [int] The depth of the two stacks (data and return, default: 250, min: 1)
  • -run [string] Pass in a string of commands to run as a one liner, similar to python's -c flag
  • -h Print command help and exit
  • -v Print the version number and exit
  • -limit-io Run the interpreter in a mode that does not allow filesystem access
  • -install-mod [string] Install a module to the local lib from a path or url; supports http(s), gemini, gopher, and local files. Note that this should be a single file, not a repo or directory

Examples:

nimf -memory 50000 -stack-depth 335 ./my-file.nf

nimf -run '1 2 + 3 4 * 5 .s'

Files

By convention nimf files that are meant to be executed have the filetype .nf (nimf file). If a file is meant to be inlined into another file and only contains variable declarations/allotment and word definitions it should end in .nh (nimf header).

Modules

When using the inline builtin you can often just use the name of the file you are wanting to inline. Nimf will search for the file in the following order:

  • Your local directory: ./[filename]
  • A lib directory in the local folder: ./lib/[filename]
  • The system lib folder: /usr/local/lib/nimf/[filename]

If the filename you are inlining does not have a suffix, nimf will look for [filename].nh. So when running:

"std" inline

The interpreter is looking for std.nh in each of the three locations and loading the first match it finds. If a module is requested via inline and nimf has already loaded it, the inline command will be ignored (no extra searching or processing will be performed).

Examples

At present the repo does come with one example file. You can run it, from the nimf directory, like so:

make
./nimf ./examples/ascii.nf

You should see a nicely formatted table of the printable ascii characters appear on your screen.

Additionally, the module gopher can be inlined in interactive mode to provide a minimalistic but usable gopher client interface:

"gopher" inline

gopher.visit  ( will query for host and path )

( ... Prints the text file, parsed gopher map, or an error )

5 gopher.follow  ( will follow link #5 )

( ... Prints the text file, parsed gopher map, or an error )

gopher.back  ( will return to, and print, the previous page )

( ... )

6 gopher.url?  ( will print the address that link 6 would take you to )

The gopher client uses the minimal TCP API available to nimf; more about it will be written in a future version of this document.

Syntax highlighting

If you are a vim user, a syntax plugin for nimf is available here. It includes basic indentation rules as well as syntax highlighting for various structures.

If anyone wants to make an emacs or nano syntax that would be awesome. My text editor hermes (based on Kilo, by antirez) can be easily set up to highlight nimf syntax as well.