A new language based on MINT
I looked at implementing MINT (Minimum INTerpreter) for the 6502 processor. Although it is a good concept, some of the implementation decisions seemed arbitrary to me. For instance, the arithmetic operator / is also used for many other purposes, which may lead to ambiguity, and white space becomes significant in order to resolve it. When I raised my concerns about some aspects of MINT and suggested “fixes”, the response was that I was designing a different language.
So .. why not.
This paper outlines a different language based on MINT. The intent is to have 100% compatibility with MINT so a MINT program can be automatically translated to this (as yet unnamed) language. There are some suggested extensions to MINT, which may be bloat or may be useful. This will be borne out in implementation. But it doesn’t hurt to dream.
This paper assumes some knowledge of MINT; my reference was found at https://github.com/orgMINT/MINT
Numbers can be entered in decimal, limited to the range 0 to 32,767, and are assumed positive. A negative number can be created by using the suffix operator ! (negate). This avoids ambiguity with the operator minus (-).
Alternately, a number can be entered as up to 4 hexadecimal digits (0..9,A..F) preceded by a hash sign (#).
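To make the two literal forms concrete, here is a minimal Python sketch of how a single number might be read. The function name parse_number is hypothetical, and two points are my assumptions rather than part of the design: hex digits must be upper case (lower case letters are data elements), and a hex value of #8000 or above is read as a negative 16-bit number.

```python
def parse_number(text):
    """Parse one numeric literal and return it as a signed 16-bit value.

    Forms handled (hypothetical helper based on the description above):
      123    decimal, 0..32767
      #1F    one to four hexadecimal digits
      123!   either form followed by the negate suffix
    """
    negate = text.endswith("!")
    body = text[:-1] if negate else text
    if body.startswith("#"):
        digits = body[1:]
        # Assumption: only upper case A..F, since a..f name data elements.
        if not 1 <= len(digits) <= 4 or any(c not in "0123456789ABCDEF" for c in digits):
            raise ValueError("hex literal needs 1 to 4 digits from 0..9, A..F")
        value = int(digits, 16)
    else:
        if not body or any(c not in "0123456789" for c in body):
            raise ValueError("decimal literal must contain only digits")
        value = int(body)
        if value > 32767:
            raise ValueError("decimal literal out of range 0..32767")
    if negate:
        value = -value
    # Wrap into 16 bits, then interpret the bit pattern as signed.
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value
```

For example, parse_number("42!") gives -42 and parse_number("#FF") gives 255.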
Text is stored as an array of 8-bit values. A text constant is entered enclosed in double quotes, e.g. “here is some text”, and is referenced by the address of the array that holds it.
Just like MINT, there are 26 named data elements referenced by the lower case English character set (a...z) and 26 named functions referenced by the upper case English character set (A...Z).
For the 6502 implementation, a data element is 16 bits. The use of the element is arbitrary, but the language supports the following:
An integer
  For arithmetic purposes this is interpreted as a signed value, -32,768 to 32,767
  For boolean purposes this is interpreted as 0 = FALSE, any other value = TRUE
A memory address
  For data, the address of an array
  For execution, the address of a function
The named functions are also 16 bits. The only legal values are 0 (unassigned) or the address of a function.
An array can either be an array of 8 or 16 bit values.
8 bit values are normally used to store text, each element being one character
16 bit values have the same supported usage as named data elements
A 16 bit array is defined using square brackets [...], an 8 bit array is preceded by backslash \[…] or created as a text string.
An array is fixed size. It can be populated on creation, or filled later. If it is populated on creation, the size is assumed to be the number of elements, e.g. [3,4,5] has 3 defined elements so it is created as a 3-element array. Otherwise, the size of the array can be assigned, e.g. [1,2:16] creates an array of 16 elements of which only the first two entries are populated.
Arrays should be created once. If an array creation occurs in repeated code, every repetition causes the creation of a new array, consuming memory.
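The sizing rule can be modelled in a few lines of Python. This is only a sketch: make_array is a hypothetical name, and filling the unpopulated entries with 0 is my assumption, since the draft does not say what they contain.

```python
def make_array(initial, size=None, bits=16):
    """Model an array declaration as a fixed-size Python list.

    initial -- the values listed in the declaration
    size    -- the value after ':' in the declaration, or None to default
               to the number of listed values
    bits    -- 16 for [...] arrays, 8 for byte arrays
    """
    if size is None:
        size = len(initial)
    if size < len(initial):
        raise ValueError("declared size is smaller than the listed contents")
    mask = (1 << bits) - 1
    # Assumption: entries that are not populated start out as 0.
    return [value & mask for value in initial] + [0] * (size - len(initial))
```

Here make_array([3, 4, 5]) models [3,4,5] and make_array([1, 2], 16) models [1,2:16].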
The language is stack based, with the stack being a series of 16 bit elements. It is assumed that all operations use the stack, although this is not a necessary condition. Whenever a data element or function is referenced, either its contents or its address is placed on the stack. Similarly, stack elements can be assigned to (moved to) a data element or function.
Whenever a code block is invoked, the procedure stack has the return address for the calling procedure.
The language is designed so white space is normally ignored. Some MINT elements have been reassigned to other purposes because of this. White space is any non printing character (e.g. TAB, CR, LF etc) and space.
A comment is introduced by semicolon (;) and terminated either by another semicolon or one of the line ending characters CR or LF. This allows constructs such as:
; an example of interspersing comments
; if ; a > b ; then ; (b : a)
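The comment rule is simple enough to sketch as a small state machine. The Python below is an illustration only; strip_comments is a hypothetical name, and it assumes a semicolon never appears inside a text literal, which the draft does not yet address.

```python
def strip_comments(source):
    """Remove ; comments, which end at the next ; or at CR or LF."""
    kept = []
    in_comment = False
    for ch in source:
        if in_comment:
            if ch == ";":
                in_comment = False          # closing semicolon is dropped
            elif ch in "\r\n":
                in_comment = False
                kept.append(ch)             # keep the line ending itself
        elif ch == ";":
            in_comment = True
        else:
            kept.append(ch)
    return "".join(kept)
```

Applied to the second line of the example above, only the code a > b and (b : a) survives, plus white space that the interpreter ignores anyway.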
A block of code is enclosed in brackets (). Code blocks have many purposes:
Defining a named function: e.g. P(<function code>) assigns the function to the letter P. This requires P to not be previously assigned. Subsequent mentions of P invoke the function regardless of whether it is followed by a bracket. If it is desired to unassign P (why?), this can be achieved by 0:P
As conditional code to be executed depending on the value at the top of stack. E.g. 4(a1+:a) will add 1 to the data element ‘a’ and store it back in ‘a’ 4 times.
In an array as an unnamed function. For instance: [(a>b(b:a))14] creates an array of 2 elements, the first one being a function (including a conditional code block) and the second the value 14
As ELSE code: <value>(<code block 1>)(<code block 2>). If value is zero, code block 1 is not invoked and code block 2 is invoked instead.
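The conditional and ELSE uses can be modelled together. The sketch below is my reading rather than settled design: it assumes a non-zero value runs the first block that many times (so a boolean result of 1 runs it once) and that the second block runs only when the value is zero. The name run_block is hypothetical.

```python
def run_block(value, block, else_block=None):
    """Model <value>(<code block 1>)(<code block 2>).

    block and else_block are Python callables standing in for code blocks.
    """
    if value == 0:
        if else_block is not None:
            else_block()
        return
    # Assumption: the block runs `value` times, as in 4(a1+:a).
    # A negative value runs nothing here; the draft does not cover that case.
    for _ in range(value):
        block()
```

With this model, run_block(1, then_code, else_code) behaves as IF/THEN and run_block(0, then_code, else_code) as ELSE.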
A program normally consists of a series of function names and code, followed by the code to be executed when the program is invoked. Execution starts when all of the following conditions are met: (1) there is some code other than functions to execute, (2) the count of opening brackets ( minus the count of closing brackets ) is zero, and (3) the user types <ENTER> (CR or hex #0D).
This allows single line programs such as:
M(‘The number is ‘.)42M (prints “The number is 42”)
But if the invoked code (42M in the example) is to be spread over multiple lines, then it is enclosed in brackets so the first ENTER doesn’t trigger execution.
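The three conditions can be checked mechanically. The following Python sketch is one possible reading, not the implementation: ready_to_execute is a hypothetical name, a definition is taken to be an unassigned upper case letter immediately followed by an opening bracket at the outermost level, and the input is assumed to have comments removed and no brackets inside text literals.

```python
def ready_to_execute(buffer):
    """Return True when the typed buffer should start executing.

    buffer is everything typed so far, including the key just pressed.
    """
    if not buffer.endswith("\r"):
        return False                      # (3) ENTER has not been typed
    text = "".join(buffer.split())        # white space is ignored
    depth = 0
    defined = set()
    in_definition = False
    has_code = False
    for i, ch in enumerate(text):
        starts_definition = (
            depth == 0
            and "A" <= ch <= "Z"
            and ch not in defined
            and text[i + 1:i + 2] == "("
        )
        if starts_definition:
            defined.add(ch)
            in_definition = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                in_definition = False
        elif not in_definition:
            has_code = True               # (1) something other than a definition
    return has_code and depth == 0        # (2) brackets are balanced
```

A line holding only a function definition therefore does not trigger execution, and neither does a line that leaves a bracket open.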
A comma acts as a separator where there could be ambiguity. For example, because white space is ignored, specifying 1 2 3 would result in the value 123 being placed top of stack. To place three stack items, they are specified as 1,2,3. White space can be added for readability 1, 2, 3. If desirable, a comma can be added elsewhere to aid readability, even if not required. For instance 1,2!3 is accepted by the parser but is more readable as 1, 2!, 3
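A short Python sketch shows how ignoring white space and treating the comma as a separator interact. It handles only decimal digits, commas and the ! suffix; the name push_number_list is hypothetical and nothing here is meant as the real parser.

```python
def push_number_list(source):
    """Return the stack (top at the end) left by a list of decimal numbers."""
    stack = []
    digits = ""

    def flush():
        nonlocal digits
        if digits:
            stack.append(int(digits))
            digits = ""

    for ch in source:
        if ch.isspace():
            continue                      # white space never separates numbers
        if ch in "0123456789":
            digits += ch
        elif ch == ",":
            flush()                       # comma ends the current number
        elif ch == "!":
            flush()                       # an operator also ends the number
            if not stack:
                raise ValueError("nothing on the stack to negate")
            stack[-1] = -stack[-1]
        else:
            raise ValueError(f"unexpected character {ch!r}")
    flush()
    return stack
```

The input 1 2 3 yields a single item, while 1,2!3 and 1, 2!, 3 both yield the same three items.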
The assignment operator is colon (:), equivalent of MINT exclamation (!). Unlike MINT it is usually a prefix operator. This change is to ease the task of interpretation. In MINT, ab! means ‘assign the value of a to b’ but the meaning of b is not clear until the ! is encountered. In this language, the interpreter encounters the : and is ‘forewarned’ that the subsequent item is an address. Its operation is to place the value on the top of the data stack (at the time of encountering the :) at a memory location specified by the subsequent address. Note the use of the operator when wishing to store items in an array (see below).
It can also be used within an array declaration to define the size of the array. Otherwise the array size defaults to the contents when declared.
In looping code, it can be used to assign a loop counter.
Most operations are performed on two data stack elements; the shift and negate operators act on the top element only.
- subtraction: subtract the top element from the second element and drop the top element
/ division: divide the top element into the second element leaving the quotient as the top element and the remainder as the second element
+ addition: add the top element to the second element and drop the top element
* multiplication: multiply the second element by the top element and drop the top element
{ shift left: multiply the top element by 2
} shift right: divide the top element by 2
! negate: replace the top element with its two's complement (x! is the equivalent of 0x-)
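As a check on the stack effects, here is a small Python model of the arithmetic operators. It is a sketch under assumptions the draft does not settle: results wrap to 16 bits, division uses Python's divmod (which rounds toward negative infinity), and shift right keeps the sign. The names signed and arith are hypothetical.

```python
MASK = 0xFFFF  # data elements are 16 bits wide

def signed(value):
    """Interpret a 16-bit pattern as a signed number."""
    value &= MASK
    return value - 0x10000 if value & 0x8000 else value

def arith(op, stack):
    """Apply one arithmetic operator to stack (a list, top at the end)."""
    if op in ("{", "}", "!"):
        top = signed(stack.pop())
        if op == "{":
            result = top * 2
        elif op == "}":
            result = top >> 1             # assumption: sign-preserving shift
        else:
            result = -top
        stack.append(result & MASK)
        return
    top = signed(stack.pop())
    second = signed(stack.pop())
    if op == "+":
        stack.append((second + top) & MASK)
    elif op == "-":
        stack.append((second - top) & MASK)
    elif op == "*":
        stack.append((second * top) & MASK)
    elif op == "/":
        # Assumption: Python's divmod rounding; a zero divisor raises an error.
        quotient, remainder = divmod(second, top)
        stack.append(remainder & MASK)    # remainder becomes the second element
        stack.append(quotient & MASK)     # quotient becomes the top element
    else:
        raise ValueError(f"unknown operator {op!r}")
```

Running 17,5/ through this model leaves the remainder 2 below the quotient 3, and 5! gives the same bit pattern as 0,5-.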
The two data stack elements are each treated as an array of 16 bits. If both elements are boolean values in the 0 or 1 form left by the comparison operators, these operators give a correct boolean result.
& 16-bit bitwise AND of the two elements, the top element is dropped
| 16-bit bitwise OR of the two elements, the top element is dropped
^ 16-bit bitwise XOR of the two elements, the top element is dropped
Boolean operators leave either a 0 (false) or 1 (true) on the stack.
> 16-bit comparison GT: Top of stack is compared with the second element, the top element is dropped and the second element replaced by the result of IF(top>second)
< 16-bit comparison LT: Top of stack is compared with the second element, the top element is dropped and the second element replaced by the result of IF(top<second)
= 16-bit comparison EQ: Top of stack is compared with the second element, the top element is dropped and the second element replaced by the result of IF(top=second)
~ 1-bit NOT: The least significant bit of the top of stack element is flipped. If a 16-bit NOT is required, this can be achieved by 1!^
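The difference between the 1-bit NOT and the 1!^ idiom is easy to see in a small model. The Python below is illustrative only, and the function names are hypothetical.

```python
MASK = 0xFFFF  # stack elements are 16 bits wide

def bitwise(op, stack):
    """Apply &, | or ^ to the top two elements; the top element is dropped."""
    top = stack.pop()
    second = stack.pop()
    if op == "&":
        result = second & top
    elif op == "|":
        result = second | top
    elif op == "^":
        result = second ^ top
    else:
        raise ValueError(f"unknown operator {op!r}")
    stack.append(result & MASK)

def negate(stack):
    """The ! operator: two's complement of the top element."""
    stack.append((-stack.pop()) & MASK)

def not_1bit(stack):
    """The ~ operator: flip only the least significant bit."""
    stack.append(stack.pop() ^ 1)

def not_16bit(stack):
    """The sequence 1!^ : push 1, negate it to #FFFF, then XOR."""
    stack.append(1)
    negate(stack)
    bitwise("^", stack)
```

Starting from #1234, not_16bit leaves #EDCB. Note that not_1bit turns 6 into 7, both of which count as TRUE, so ~ only behaves as a logical NOT on the values 0 and 1.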
Although this language is stack oriented, specific stack operations are unlikely to be common. They are therefore two character symbols, the first being $ and the second being the second letter of the operation
$r drop the top member of the stack DrOP
$u duplicate the top member of the stack DuP
$v OvER - take the 2nd member of the stack and copy to top of the stack
$w swap the top 2 members of the stack SwAP
$e stack depth [presumably placing the number of items on the stack before the operation, on the stack ** NOTE ** - what is this used for??]
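For reference, the five stack operations can be modelled on a Python list with the top of stack at the end. This is a sketch; stack_op is a hypothetical name, and $e follows the "presumably" reading above by pushing the depth as it was before the operation.

```python
def stack_op(symbol, stack):
    """Apply one $-prefixed stack operation to stack (top at the end)."""
    if symbol == "$r":                    # DrOP
        stack.pop()
    elif symbol == "$u":                  # DuP
        stack.append(stack[-1])
    elif symbol == "$v":                  # OvER
        stack.append(stack[-2])
    elif symbol == "$w":                  # SwAP
        stack[-1], stack[-2] = stack[-2], stack[-1]
    elif symbol == "$e":                  # depth before the operation
        stack.append(len(stack))
    else:
        raise ValueError(f"unknown stack operation {symbol!r}")
```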
. print the number on the stack as a decimal
#. print the number on the stack as a hexadecimal
[. print the contents of the 8-bit array whose address is top of stack as if it were a string of ASCII characters.
` print the literal string between ` and `
@ The top of stack is assumed to be the address of a piece of code to be executed. @ causes the current address to be placed on the procedure stack and transfers control to the specified address. For instance to display day of week assuming 0=Monday when the day number is in variable d, an array can be declared early:
[('Mon')('Tue')('Wed')('Thu')('Fri')('Sat')('Sun')]:w ; w is an array of days
Then the day can be displayed by
'Today is 'd?w@
Note: In this case, procedures were used to print the day name for illustration. This could also be achieved using text strings:
["Mon","Tue","Wed","Thu","Fri","Sat","Sun"]:w ; w is an array of days
:
:
'Today is 'd?w[.
When declaring the text strings it is prudent to separate each string with a comma. At this point in the draft it isn’t specified that two successive quotes collapse to form a single quote instead of terminating the string. But that might happen.
? Indicates the top of stack is an array index. It will be followed by another index or an array identifier. It is used to locate the address of a data item. Indexes are evaluated in reverse order. Consider the array matrix [[1,2,3][4,5,6][7,8,9]]:y where each element of the outer array is another array. The value 6 is the third item in the second nested array of the named array ‘y’. To retrieve that value, it is specified as 2?1?y (indexes start at zero for the first entry). The indexes may be formulas, for instance if x has the value 4 then the item could be specified as x2-?x3-?y. Note that if data is to be assigned to the element, the assignment operator : immediately precedes the array identifier. To change the value 6 in the previous example to 12 the operation is 12,2?1?:y
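The reverse evaluation order can be illustrated with nested Python lists standing in for arrays of array addresses. This is a sketch only; locate, fetch and store are hypothetical names, and the indexes are passed in the order they are written in the source, so 2?1?y becomes [2, 1].

```python
def locate(indexes, array):
    """Return (container, position) for the element addressed by indexes.

    indexes are in source order; the last one written applies to the
    named array first, which is the reverse evaluation described above.
    """
    container = array
    for index in reversed(indexes[1:]):
        container = container[index]
    return container, indexes[0]

def fetch(indexes, array):
    """Model 2?1?y : read the addressed element."""
    container, position = locate(indexes, array)
    return container[position]

def store(value, indexes, array):
    """Model 12,2?1?:y : write the value into the addressed element."""
    container, position = locate(indexes, array)
    container[position] = value
```

With y set to [[1, 2, 3], [4, 5, 6], [7, 8, 9]], fetch([2, 1], y) reads the 6, and store(12, [2, 1], y) replaces it with 12.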
Looping operates in the same way as MINT – the number of times a code block is executed is determined by the value on the stack immediately before encountering the code block. Unlike MINT, there is no predefined variable containing the count of loops performed. Instead, the variable, if needed, is defined as the first item in the code block and is preceded by a colon.
e.g. 12(:jj.) will print the numbers 1 to 12
This has advantages and disadvantages compared to MINT – the advantage is no overhead if the loop count is not needed, and loops can be nested to any depth (within the constraints of the system) and still be provided with loop counters. The disadvantage is there is a limited supply of variables. This disadvantage can be overcome utilising the stack e.g.
ijk; stacks the variables ijk
4(:i3(:j2(:k ;stuff to do 24 times e.g.; ijk**.)))
;return variables; :k:j:i
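The save and restore pattern in this example can be traced with a Python model. It is a sketch under two assumptions: the loop counter starts at 1, as the 12(:jj.) example implies, and the starting values of i, j and k are arbitrary placeholders. The name loop is hypothetical.

```python
def loop(count, variables, counter, body):
    """Run body count times, storing the pass number in variables[counter].

    counter is a variable name, or None when no :x prefix is given.
    """
    for n in range(1, count + 1):
        if counter is not None:
            variables[counter] = n
        body()

variables = {"i": 100, "j": 200, "k": 300}   # placeholder starting values
stack = []
printed = []                                  # stands in for the . operator

# ijk : push the current values so they survive the loops
stack.extend([variables["i"], variables["j"], variables["k"]])

# 4(:i3(:j2(:k ijk**.)))
loop(4, variables, "i", lambda:
     loop(3, variables, "j", lambda:
          loop(2, variables, "k", lambda:
               printed.append(variables["i"] * variables["j"] * variables["k"]))))

# :k:j:i : pop in reverse order to restore the saved values
for name in "kji":
    variables[name] = stack.pop()
```

The innermost block runs 24 times, and afterwards i, j and k hold the values they had before the loops started.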
Some MINT operations are not yet considered. These are conceptual thoughts:
Operations like read a character, output a character, manipulating ports could be assigned to backslash (\) operations as it is utilised above only to introduce a byte array. The need for a “byte mode” seems unnecessary as operations on arrays need to store more information than the actual array data – one being the array size. Another could be the array type and there are only two. Maybe there are other considerations that will crop up as the project advances.
How this would work has not been considered in detail.
There are quite a few MINT operations prefixed by forward slash (/). It is proposed that those which are required be prefixed by percent (%) instead, as this is not used elsewhere.