nimf
Nimf is an interpreted language written in Go. It bears no relationship to the language 'nim', which is very different. Nimf is an implementation of a concatenative language in a more or less forth-ish style. It has been created mostly as a learning exercise, but is definitely usable for certain types of programming tasks as well as for educational purposes.
After languishing for a few years while I worked on my other language (slope), I found some time to return to nimf and write a proper lex/parse cycle. This simplified a lot of the execution, allowed for better debugging/error tracing, and hopefully will be a better foundation to build on in the future. These changes do have the effect of moving away from the pseudo-register based code approach for certain parse features; for example, strings are now a parsed item, and not a flag that is toggled to change the interpreter mode.
Building
A Go compiler is required. I am not certain how old a version will work, but there are no external dependencies and I imagine anything >= 1.11 should be fine. Nimf was developed with Go 1.12 and 1.13.
A BSD compatible makefile has been included and contains a few targets of interest:
- make will build the interpreter's binary in the current directory
- make install will install the binary, manpage, and standard library to the /usr/local file tree (adjustable via the PREFIX variable)
  - This may require administrator privileges (sudo or the like) depending on your system setup
- make install-local will build the binary in the current directory and install the standard library to either $XDG_DATA_HOME/nimf or ~/.local/share/nimf
  - This is useful on systems where you do not have administrator privileges
- make install-lib will install the standard library to the /usr/local file tree
  - This is generally desirable when developing core libraries and not wanting to build everything every time
- make install-lib-local
  - Like the previous one, but installs to either $XDG_DATA_HOME/nimf or ~/.local/share/nimf
Running go install is also an option, but it won't help you out with the standard lib and the manpage.
Resources
Nimf is currently hosted in a git repository which can be found at git.rawtext.club/nimf-lang/nimf. The main documentation is this README. However, full api documentation is coming soon and will be hosted over https/gopher/gemini.
The Language
Things will look at first glance more or less like forth (nimf stands for: nimf is mostly forth). Anything inside of ( and ) is a comment. Nimf only supports this type of commenting, which works over multiple lines and can be anywhere inside a line, but at present cannot be nested.
Upon launching the REPL only the builtins will be loaded. These powerful words provide the building blocks for all of the nimf libraries, but lack a lot of the niceties of using the modules/libraries. Nimf comes with a number of modules that can be loaded using inline, which will bring all of the variables and words into the global scope. Most modules also call other modules. For example, the text module inlines the std module. That means that if you only inline text, you also have access to std. However, it is recommended to be explicit and load all of the modules you intend to use rather than relying on hidden loading. There is no performance cost to calling inline on a module that has already been loaded: it will not be reloaded and no additional parsing will need to be done.
Hello World
"text" inline
"Hello, world" str.print-buf
In the above we first inline the text library. Nimf makes a distinction between builtins, which are coded in Go, and module functions, which are coded directly in Nimf from the primitives offered by the builtins. The text library contains words prefixed with ch and str (for character and string, respectively).
After inlining the text module we start a string literal with " and finish the string literal with ". Note that unlike in a traditional forth system, strings in nimf are handled by a lexer/parser and are not handled via " as a word itself (early versions of nimf did have " as a word, but it was changed to syntax in order to have more useful strings). The string is saved to temporary string storage (starting at memory address 50). The word str.print-buf is used to print the string currently held in temporary string storage. The temporary storage gets used by many words and should not be counted on to hold a string for any longer than the next word call. The string could instead be saved to a variable:
"text" inline
"Hello, world" svar myString
myString str.print
In the above we start out like before, but instead of just printing we first call svar myString. svar reserves enough memory for the string currently in the temporary string buffer and copies that buffer over to that memory. It then creates a reference to that memory via the word that comes after svar, in this case myString. We then use myString, which puts the address of the string onto the stack, and str.print, which takes the address off of the top of the stack and prints the string stored at that address. str.print is used in str.print-buf, which is implemented as str.buf-addr str.print (with str.buf-addr being the address of the temporary string buffer). As such, we could also have done: "Hello, world" str.buf-addr str.print. There are lots of options.
Math
Each of the below will leave the result on the stack. We could call , after each one to print the result, but here we will call .s at the end to view the stack itself.
5 6 + ( 11 )
3 9 - ( -6 )
2 4 * ( 8 )
7 2 / ( 3 )
7 2 % ( 1 )
7 2 /% ( 1, 3 )
.s ( Prints: <7> [ 11, -6, 8, 3, 1, 1, 3 ] )
The .s builtin prints the stack. The first value in the example above (<7>) is how many items are on the stack, followed by the stack itself.
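For readers more at home in Go (the language the interpreter is written in), the stack effects above can be modelled with a slice. This is only an illustrative sketch and not how nimf itself is implemented: the stack type and the push, pop, apply, and run names are made up for the example, and integer division is assumed to truncate, as the 7 2 / example suggests.

```go
package main

import "fmt"

// stack is a hypothetical stand-in for nimf's data stack.
type stack []int

func (s *stack) push(n int) { *s = append(*s, n) }

func (s *stack) pop() int {
	old := *s
	n := old[len(old)-1]
	*s = old[:len(old)-1]
	return n
}

// apply pops two operands (b is TOS, a is the item under it)
// and pushes the result, or both results for "/%".
func (s *stack) apply(op string) {
	b, a := s.pop(), s.pop()
	switch op {
	case "+":
		s.push(a + b)
	case "-":
		s.push(a - b)
	case "*":
		s.push(a * b)
	case "/":
		s.push(a / b)
	case "%":
		s.push(a % b)
	case "/%":
		s.push(a % b) // remainder goes on first
		s.push(a / b) // quotient ends up on top
	}
}

// run replays the six lines of the math example and returns the stack.
func run() stack {
	var s stack
	steps := []struct {
		a, b int
		op   string
	}{
		{5, 6, "+"}, {3, 9, "-"}, {2, 4, "*"},
		{7, 2, "/"}, {7, 2, "%"}, {7, 2, "/%"},
	}
	for _, st := range steps {
		s.push(st.a)
		s.push(st.b)
		s.apply(st.op)
	}
	return s
}

func main() {
	s := run()
	fmt.Printf("<%d> %v\n", len(s), []int(s)) // <7> [11 -6 8 3 1 1 3]
}
```

Note the operand order in apply: the first value popped is the one that was pushed last, which is why 3 9 - gives -6 rather than 6.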
There are a number of numerical helpers available in the num module ("num" inline) as well as some expanded comparison operators available in the std module (on its own nimf provides =, >, and <).
Subroutines
: squared ( n -- n*n :: squares a number ) dup * ;
The : starts a subroutine declaration, everything between ( and ) is a comment, dup makes a copy of the top item on the stack, * multiplies the TOS by the item under it, and ; ends the subroutine definition.
We can now call it like so:
5 squared .
The . word drops and prints the top item on the stack (TOS) and adds a space after it; to avoid the space use the , word instead. The above would output 25.
Variables
Variables default to being sized to a single cell (size of int on your system, likely either 32 or 64 bits). From this basic variable you can store numbers, characters, flags, other memory addresses, etc. You can also extend variables for use in more complex structures or easily store and retrieve strings in memory.
A major limitation, and thus an adjustment when coming to nimf from many other languages, is that variables created with var or svar have no local scope. Such a variable cannot be created within a subroutine and must exist in global space (the local word, covered under Local Variables, is the exception). To work around this limitation you will often see a word move the contents of a global variable onto the return stack at the beginning of the word, then use that variable/memory during operation, then at the end of the word move the values from the return stack back to their variables. This methodology allows you to use global variables with local values in a local scope and then return their state after you are done. If this sounds complicated or is a little beyond your usage for the language at present: don't worry. You will likely know when you need it and it will likely make more sense then. To see an example of how that sort of value passing via the return stack might work you can look at the text module. At the very bottom it defines two words that move the text module values onto and off of the return stack. They are called in a lot of the words in the text module.
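The save/use/restore pattern described above can be pictured with a rough Go analogy. Everything here is hypothetical (the scratch variable, the returnStack slice, and the countTo function do not exist in nimf); the point is only the shape of the technique: push the global's current value, borrow the global, then pop the old value back.

```go
package main

import "fmt"

var (
	scratch     int   // hypothetical global variable shared by several words
	returnStack []int // stand-in for nimf's return stack
)

// countTo borrows scratch as a loop counter but leaves it as it found it.
func countTo(n int) int {
	returnStack = append(returnStack, scratch) // save the caller's value
	scratch = 0
	for scratch < n {
		scratch++
	}
	result := scratch
	last := len(returnStack) - 1
	scratch = returnStack[last] // restore the saved value
	returnStack = returnStack[:last]
	return result
}

func main() {
	scratch = 42
	fmt.Println(countTo(3), scratch) // 3 42
}
```

The caller's value of scratch survives the call even though countTo overwrote it while working, which is the effect the text module's two helper words are described as providing.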
Naming
Variables can be named anything you like with the following exceptions:
- A variable, or a word for that matter, cannot contain only digits (ex. 23) as the interpreter will treat this as an integer
- A variable or word name cannot take the form of a decimal, hexadecimal, octal, or binary number. Using numbers in a var name or word is fine, so long as they do not match the established patterns for these number forms
- A variable or word name cannot contain whitespace
- A variable cannot start with @, !, s@, or s!. These are used as syntactic sugar for quick access to variable values
As a convention, not enforced at a code level, private words can be created by using the following variable/word naming scheme: module-name.private.name. For example: url.private.port would be part of the url module and is intended to only be used internally so is marked private; it is then given the name port. Using private in this way excludes any variables and words containing .private. from the word listing provided with the word words. In reality you can still call private words if you like, but private is one way a developer can provide intent to other developers.
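The listing rule is simple enough to sketch in a few lines of Go. This is a guess at the behaviour from the description above, not the interpreter's code; visibleWords and the sample dictionary (url.parse in particular) are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// visibleWords drops any name containing ".private." from a word listing.
// The names stay callable; they are only hidden from the listing.
func visibleWords(all []string) []string {
	var out []string
	for _, w := range all {
		if !strings.Contains(w, ".private.") {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	dict := []string{"url.parse", "url.private.port", "str.print"}
	fmt.Println(visibleWords(dict)) // [url.parse str.print]
}
```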
General Variables
var myvar
myvar . ( The address of myvar: 101234 )
myvar get . ( The value of myvar: 0 )
5 myvar set
myvar get . ( The value of myvar: 5 )
8 myvar +! ( Adds 8 to the value of myvar - not to the address )
myvar get . ( The value of myvar: 13 )
The first line, above, adds the name myvar to the dictionary and assigns it a memory address. The second line prints the address of myvar. The third line prints the value at that address. All variables can be thought of as pointers, for those familiar with the term. You can get the value stored at an address with get or @ (they are the same). Since we have not given myvar a value yet, the value is 0. The set or ! builtins update a memory address. The following line shows that the value has been updated. The +! subroutine adds to the value at an address and the last line shows the result of this update.
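If the pointer framing helps, here is a small Go model of the same steps. It is a sketch under assumptions, not nimf's memory implementation: the memory type, its method names, and the address 3 are all made up for illustration.

```go
package main

import "fmt"

// memory is a hypothetical flat array of cells; a variable is just an index into it.
type memory []int

// get mirrors `addr get` (or `addr @`).
func (m memory) get(addr int) int { return m[addr] }

// set mirrors `val addr set` (or `val addr !`).
func (m memory) set(val, addr int) { m[addr] = val }

// addTo mirrors `val addr +!`: it changes the stored value, not the address.
func (m memory) addTo(val, addr int) { m[addr] += val }

func main() {
	mem := make(memory, 16)
	myvar := 3 // made-up address standing in for the one `var myvar` would assign

	fmt.Println(mem.get(myvar)) // 0
	mem.set(5, myvar)
	fmt.Println(mem.get(myvar)) // 5
	mem.addTo(8, myvar)
	fmt.Println(mem.get(myvar)) // 13
}
```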
Extending Variables
Nimf has the ability to store variables that take up multiple cells. This can be done with the allot keyword. Using allot will allow you to create something like arrays and is used often for managing strings.
: ? ( addr -- ) @ . ;
var myArray ( Reserve a memory address )
5 allot ( myArray is already 1 cell, '5 allot' adds 5 more for a total of 6)
5 myArray set ( Set myArray to 5, so that the length of the array can be referenced )
9 myArray 1 + set ( Set the first offset, the first non-length value, to 9 by referencing `myArray 1 +` )
2 myArray 2 + set ( Set the second offset to 2 by referencing `myArray 2 +` )
myArray ? myArray ++ ? myArray 2 + ? ( 5 9 2 )
In the above example we first create a simple way to print the value at a memory location via the word ?.
After creating ? we create a variable myArray. We then allot 5 extra cells for this variable. allot does not take a memory address, it just expands the most recently created variable. We then set cells in the "array" to values, which is done by providing an offset to myArray. Lastly we use our newly created ? to view the value at each offset position.
allot operates in an increasing manner on the next available memory. You cannot define a variable (A) then another variable (B) and go back and allot more for (A). You must allot before creating any other variables. Note that it is possible that inlining a module may also assign new variables and make it so that you can no longer allot for a variable you created, so be aware of this limitation of allot when creating additional variables or inlining modules. Best practice is to initialize a var or svar and then immediately allot.
String Variables
"text" inline
"hello" svar hi ( Puts 'hello' into the temporary string buffer, reserves memory space, copies the string into it, and adds a new word, 'hi' )
hi . ( Memory address: 63214 )
hi str.print ( Outputs: hello )
str.print-buf ( Outputs: hello )
"hola" str.print-buf ( Outputs: hola )
hi str.print ( Outputs: hello )
hi @ . ( Outputs: 5, the length of the string )
hi 2 + @ emit ( Outputs: e, the second char )
We first inline the text module so that we have access to the print oriented words (which could easily be defined on your own without the need for text, but that is not in scope here). We then create a string hello. That puts hello into the temporary string buffer. Calling str.print-buf or str.buf-addr str.print would print out hello. Instead we call svar hi, which secures enough memory to hold the string found in the temporary string buffer and copies the string, including its length, to the memory location secured by svar. Following the assignment is an example of printing the string as well as printing the temporary string buffer (which still contains the same string). We then add a new string to the temporary string buffer and print it, then print hi to show that they now differ. Outputting the value at hi gives the length of the string. Lastly we output the value at hi+2 as a character via emit, which will convert an integer to a character and output it. This yields e.
Strings are stored in memory like so:
"hello" svar hi
hi . ( Address: 920 )
hi @ ( Value: 5 )
- - - - - - - - - - - -
Address: | 920 | 921 | 922 | 923 | 924 | 925 |
Value: | 5 | 104 | 101 | 108 | 108 | 111 |
'h' 'e' 'l' 'l' 'o'
The words found in the text module know to treat the first value as the length and the rest of them as characters. For example, calling: 108 emit would print l. Mapping strings in this way allows you to get part of a string based on a simple offset value. The third character of the above example can be acquired very easily: hi 3 + @. Simply add 3 to the address and get the value. In that light, arrays and strings can be thought of as 1 indexed, as opposed to the often more common 0 indexed.
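The length-prefixed layout shown in the table can be reproduced with a short Go sketch. The helper names storeString and loadString are invented here, the input is assumed to be ASCII, and the address 920 is simply the one used in the table; none of this is taken from the interpreter's source.

```go
package main

import "fmt"

// storeString writes s at addr as a length cell followed by one cell per byte.
func storeString(mem []int, addr int, s string) {
	mem[addr] = len(s)
	for i := 0; i < len(s); i++ {
		mem[addr+1+i] = int(s[i])
	}
}

// loadString reads the length cell at addr and rebuilds the string from
// the cells that follow it.
func loadString(mem []int, addr int) string {
	n := mem[addr]
	buf := make([]byte, n)
	for i := 0; i < n; i++ {
		buf[i] = byte(mem[addr+1+i])
	}
	return string(buf)
}

func main() {
	mem := make([]int, 1000)
	hi := 920
	storeString(mem, hi, "hello")

	fmt.Println(mem[hi : hi+6])          // [5 104 101 108 108 111]
	fmt.Println(string(rune(mem[hi+3]))) // l
	fmt.Println(loadString(mem, hi))     // hello
}
```

Because cell 0 of the string holds the length, offset 3 lands on the third character, which is the 1 indexed behaviour described above.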
Much like simple variables, strings also have getters and setters. Calling hi get-string, referencing the above example, would copy the value of the string that hi references into the temporary string buffer. Moving in the other direction we can move a string from the temporary string buffer to a memory address with set-string: "hola" 2300 set-string. That example would move the string 'hola' to memory address 2300. This is useful, but dangerous: you need to know that enough memory is writable at that spot to support the string. Otherwise you risk overwriting memory you might be using for something else. So, be careful. It is often useful to allot more memory than you need via var stringSpace 500 allot or the like. Then you can always overwrite up to 500 characters when assigning strings to the variable stringSpace. You can use var and allot to manage string memory in a more fine grained way and svar when you have a string already and just want it moved into memory.
Local Variables
Local variables are the only kind of variable that can be created within a word definition. In fact, they can only be defined inside of a word definition. When the word finishes executing all of the words that it is composed of, the local variables will be cleared from memory automatically.
: local-example
  local x
  5 x set
  x , space cr ( print the memory address of `x` )
  x get , ( print the value of `x` )
;
local-example
x ,
In the above example we create a word local-example. In that word we use the local word to create a variable called x. The value held at memory address x is updated to 5. The address x itself is printed first, followed by the value stored there. We call the word after defining it. After it prints the memory address of x and its value it completes the word and clears the memory. When x is referenced outside of the word, x will not be found in the word dictionary and an error will be thrown (unless there is a global variable named x).
Some things to note:
- Local variables can have the same name as a global variable. The local one will always be used when both exist inside of a word.
- Local variables are only accessible in the word they reside in, sort of. The memory addresses they use will be available during the execution of the word containing the local, and thus all words called within that word will have the memory available as well. However, they do not have access to the variable name (x in the above example). So if you want to use that memory in another word, simply put the value on the stack for the next word to operate on.
- If a local is created within a loop, a new memory address will be assigned to that local variable at each loop iteration. So the naming will be overwritten. The memory that it previously occupied still exists until the end of the word though. This is a tricky quirk to utilize, but know that it is possible.
- You can still allot more space for a local variable (in order to, for example, store a string or array), but to do so you must use the lallot word, rather than allot, as they operate on different pointers to the same memory space.
Variable Syntactic Sugar
As of version 1.04 there is a way to get the value from a variable onto the stack or to store a value in a variable name. It works with string variables as well, but does not work to get full ranges of variables that have been expanded via allot.
var my-var ( create the variable `my-var` )
20 !my-var ( store 20 in my-var )
@my-var , ( put the value of my-var on TOS and print it )
Basically, you can prepend a variable with @ or ! and the interpreter will expand 20 !my-var, for example, to 20 my-var !. The string versions work the same, but you use s@my-var and s!my-var respectively. This is not a required feature and is just a way to make code a tiny bit less verbose and indicate in one word what is happening with a variable.
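To make the rewrite concrete, here is one way such an expansion could be done over a token list, sketched in Go. This is not taken from nimf's source: expandSugar is a hypothetical function, and the check against a set of known variable names is an assumption added so that ordinary words such as != are left alone. The s@ and s! forms are not handled, since their exact expansion is not spelled out here.

```go
package main

import (
	"fmt"
	"strings"
)

// expandSugar rewrites @name into "name @" and !name into "name !",
// but only when name is a known variable (an assumption of this sketch).
func expandSugar(tokens []string, vars map[string]bool) []string {
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if len(t) > 1 && (t[0] == '@' || t[0] == '!') && vars[t[1:]] {
			out = append(out, t[1:], t[:1])
			continue
		}
		out = append(out, t)
	}
	return out
}

func main() {
	vars := map[string]bool{"my-var": true}
	src := strings.Fields("20 !my-var @my-var ,")
	fmt.Println(strings.Join(expandSugar(src, vars), " ")) // 20 my-var ! my-var @ ,
}
```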
Branching
What would a programming language be without branching? Branching (if, else, then) can only be used within word definitions and functions as follows:
"text" inline
"num" inline
: mySubroutine ( n -- )
  dup 25 > if
    . "is greater than 25" str.print-buf
    exit
  then
  dup num.positive? if
    . "is greater than 0" str.print-buf
  else
    . "is less than or equal to zero" str.print-buf
  then
;
50 mySubroutine ( Output: 50 is greater than 25 )
In the very contrived branching above we put 50 on the stack. We then compare 50 and 25 via the > word, which will put -1 on the stack if 50 is greater than 25 or 0 if not. if will branch based on the value on TOS. In this case it is truthy (-1, or any value other than 0) so it enters the first branch and prints out 50 is greater than 25. Nesting can occur by adding a new conditional inside an if or else. Remember to use dup to duplicate the value on the top of the stack if you will want to use it beyond the conditional.
The branching in nimf is currently a little funky in its implementation. Deep nesting can often have unexpected results. This is actively being worked on. Guard clauses, such as the one above (the first if, where an exit is used inside), are encouraged as a way to reduce code complexity. Not all situations will allow for using exit, which leaves the current word immediately (similar to a return in a C based language, except that it doesn't return anything since the stack is a persistent structure). Using guards and exit avoids most of the current pitfalls with branching.
The std library can be inlined to provide a number of conditional logic constructs including and, or, 0=, !=, =, >=, <=, 0<, etc. The num module also contains some useful items. In the above example num.positive? was used to see if a number was greater than zero. You can, of course, just use dup 0 > instead of num.positive? but some words that use mostly symbols can be hard to remember and a clearly named word like num.positive? can improve code readability should you need to come back to it at a later time.
Branching utilizes the return stack, so be careful when using the return stack inside of an if or else block. Anything you put on the return stack should be taken off before the conditional segment you are in ends (so before else or then if you are in the truthy segment and before then if in the falsy segment). Care should also be taken when using the return stack in nested if blocks.
Loops
Nimf currently only supports one type of loop: do [...] loop. do marks the beginning of a loop. The code within a do will always be run at least once, unless it is surrounded by an if [...] then construct. The loop keyword eats TOS and if the value is truthy will return to do, otherwise the loop will end and execution will continue outside of the do [...] loop construct. Like branching, loops can only be used within subroutines:
"std" inline
: to100 ( n -- )
  dup 100 <=
  if
    do
      dup .
      ++
      dup 100 <=
    loop
  then
  drop
;
1 to100 ( 1 2 3 4 5 6 7 8 [...] )
In the above example we inline the std lib (to gain access to ., ++, and <=) and create a subroutine to100. The subroutine first checks that TOS is less than or equal to 100; if not it just drops TOS and ends. If so it enters a loop, duplicates and outputs TOS, increments TOS, duplicates TOS and checks to see if it is still less than or equal to 100. If so it loops, if not it leaves the loop. It then drops top of stack.
This is a fairly basic example and shows simple looping and conditionals. More complex loops may require the use of counters or other stored information. For an example that uses variables look in lib/std.nh for prints, which we used above for string printing (it actually needs some work to make sure previous variable states don't get overwritten; that improvement is coming soon TM).
Similar to branching, take care when using the return stack inside a loop. Anything you put on the return stack after do should be taken off before reaching loop.
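Since do [...] loop tests its condition at the end, it behaves like a post-test (do/while style) loop, and the if [...] then wrapper is what prevents the first pass. A Go rendering of the to100 word may make that shape easier to see. It is an analogy only: the Go function collects the values it would print instead of using a stack, and nothing here is taken from the interpreter.

```go
package main

import "fmt"

// to100 mirrors the to100 word: a guard first, then a body that always runs
// at least once and repeats while the flag computed at the end is truthy.
func to100(n int) []int {
	var printed []int
	if n <= 100 { // the if ... then guard around do ... loop
		for {
			printed = append(printed, n) // dup .
			n++                          // ++
			if !(n <= 100) {             // dup 100 <= loop
				break
			}
		}
	}
	return printed
}

func main() {
	fmt.Println(to100(98))  // [98 99 100]
	fmt.Println(to100(101)) // []
}
```

Without the guard, calling the word with 101 would still print 101 once before the end-of-loop test had a chance to stop it.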
Errors
Throwing an error can be done as follows:
: errorTest ( -- )
1 2 +
"Random error" error
5 *
;
Running errorTest above will result in 3 being left on TOS and an error message being thrown stating Error: Random error. Code execution will stop there, so the 5 * that follows is never run. If you are running in interactive mode, the stack will be cleared and all operation flags reset. If you are running from a file, execution will cease and your program will exit with a non-0 exit code.
It is also possible to exit a program early without an error message via the halt word. halt eats top of stack and exits the program, setting the value it received from top of stack as the exit code for the program.
The Interpreter
Syntax: nimf [options] [filepath]
Nimf can be run with or without a file as input. When a file is provided as input the interpreter will run the contents of the file without command prompts or interactivity beyond what was coded in the file and will exit when the file has completed (or an error will be displayed).
Running nimf without any filepath will launch nimf in interactive mode. The user will be presented a repl and will be able to input code and see results in real time.
Nimf works fine with shebang lines (ex. #! /usr/bin/env nimf) and can thus run executable nimf scripts directly.
Runtime Options
When invoking nimf in either interactive or file mode the following command line options are available:
-memory [int]
  The number of memory cells to run nimf with (default: 250000, min: 34999)
-stack-depth [int]
  The depth of the two stacks (data and return, default: 250, min: 1)
-run [string]
  Pass in a string of commands to run as a one liner, similar to python's -c flag
-h
  Print command help and exit
-v
  Print the version number and exit
-limit-io
  Run the interpreter in a mode that does not allow filesystem access
-install-mod [string]
  Install a module to the local lib from a path or url; supports http(s), gemini, gopher, and local files. Note that this should be a single file, not a repo or directory
Examples:
nimf -memory 50000 -stack-depth 335 ./my-file.nf
nimf -run '1 2 + 3 4 * 5 .s'
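The single-dash long options above happen to match the style of Go's standard flag package, so the two numeric options can be sketched with it. This is a guess at the shape, not nimf's actual option handling: parseOptions and the options struct are hypothetical, and treating a value below the documented minimum as an error is an assumption (only the minimums themselves are documented).

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the two numeric settings from the list above.
type options struct {
	memory     int
	stackDepth int
}

// parseOptions parses -memory and -stack-depth and returns whatever
// arguments are left over (such as a file path).
func parseOptions(args []string) (options, []string, error) {
	fs := flag.NewFlagSet("nimf", flag.ContinueOnError)
	memory := fs.Int("memory", 250000, "number of memory cells")
	depth := fs.Int("stack-depth", 250, "depth of the data and return stacks")
	if err := fs.Parse(args); err != nil {
		return options{}, nil, err
	}
	// Assumption: values under the documented minimums are rejected.
	if *memory < 34999 {
		return options{}, nil, fmt.Errorf("memory must be at least 34999, got %d", *memory)
	}
	if *depth < 1 {
		return options{}, nil, fmt.Errorf("stack-depth must be at least 1, got %d", *depth)
	}
	return options{memory: *memory, stackDepth: *depth}, fs.Args(), nil
}

func main() {
	args := []string{"-memory", "50000", "-stack-depth", "335", "./my-file.nf"}
	opts, rest, err := parseOptions(args)
	fmt.Println(opts.memory, opts.stackDepth, rest, err)
}
```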
Files
By convention nimf files that are meant to be executed have the filetype .nf (nimf file). If a file is meant to be inlined into another file and only contains variable declarations/allotment and word definitions it should end in .nh (nimf header).
Modules
When using the inline builtin you can often just use the name of the file you are wanting to inline. Nimf will search for the file in the following order:
- Your local directory: ./[filename]
- A lib directory in the local folder: ./lib/[filename]
- The system lib folder: /usr/local/lib/nimf/[filename]
If the filename you are inlining does not have a suffix, nimf will look for [filename].nh. So when running:
"std" inline
The interpreter is looking for std.nh in each of the three locations and loading the first match it finds. If a module is requested via inline and nimf has already loaded it, the inline command will be ignored (no extra searching or processing will be performed).
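The lookup rules above can be summarised in a short Go sketch. It is a model of the documented behaviour only: candidatePaths and the loader type are hypothetical names, the system path is the default /usr/local one listed above, and no files are actually opened.

```go
package main

import (
	"fmt"
	"path"
)

// candidatePaths lists where a module name would be looked for, in order.
// A name without a suffix gets ".nh" appended first.
func candidatePaths(name string) []string {
	if path.Ext(name) == "" {
		name += ".nh"
	}
	return []string{
		path.Join(".", name),
		path.Join(".", "lib", name),
		path.Join("/usr/local/lib/nimf", name),
	}
}

// loader remembers which modules have already been inlined.
type loader struct{ loaded map[string]bool }

// inline reports whether the module still needed loading;
// a repeat request is a no-op.
func (l *loader) inline(name string) bool {
	if l.loaded[name] {
		return false
	}
	l.loaded[name] = true
	return true
}

func main() {
	fmt.Println(candidatePaths("std")) // [std.nh lib/std.nh /usr/local/lib/nimf/std.nh]

	l := &loader{loaded: map[string]bool{}}
	fmt.Println(l.inline("std"), l.inline("std")) // true false
}
```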
Examples
At present the repo does come with one example file. You can run it, from the nimf directory, like so:
make
./nimf ./examples/ascii.nf
You should see a nicely formatted table of the printable ascii characters appear on your screen.
Additionally, the module gopher can be inlined in interactive mode to provide a minimalistic but usable gopher client interface:
"gopher" inline
gopher.visit ( will query for host and path )
( ... Prints the text file, parsed gopher map, or an error )
5 gopher.follow ( will follow link #5 )
( ... Prints the text file, parsed gopher map, or an error )
gopher.back ( will return to, and print, the previous page )
( ... )
6 gopher.url? ( will print the address that link 6 would take you to )
The gopher client uses the minimal TCP api available to nimf, more about which will be written in a future version of this document.
Syntax highlighting
If you are a vim user, a syntax plugin for nimf is available here. It includes basic indentation rules as well as syntax highlighting for various structures.
If anyone wants to make an emacs or nano syntax that would be awesome. My text editor hermes (based on Kilo, by antirez) can be easily set up to highlight nimf syntax as well.