Berkeley Pascal PX Implementation Notes

2. Operations

2.1. Naming conventions and operation summary


.PP
As discussed in section 1.10,
the main interpreter loop decodes the first word of the interpreter
instruction,
using the first byte as an operation code,
and places the second byte,
the ``subop'',
in register 3.
The subop may be used to index the display,
as a small constant,
or to indicate one of several relational operators.
In the cases where a constant is needed, but it
is not small enough to fit in the byte sub-operator,
a zero is placed there and the constant follows in the next word.
Zero is easily tested for,
as the instruction which places the
subop in r3 sets the condition code flags,
and this condition code is still available after the transfer
to an operation code sequence.
A construction like
.DS
.mD
_OPER:
	\fBbne\fR	1f
	\fBmov\fR	(lc)+,r3
1:
	...
.DE
.IP
is all that is needed to effect this packing of data.
This technique saves a great deal of space in the Pascal
.I obj
object code.
.PP
Table 2.1 gives the codes used in the instruction descriptions
to indicate the kind of inline data expected by each instruction.
.KF
.TS
box center;
c s
l | l
ci | aw(3.25i).
Table 2.1 \- Inline data type codes
_
Code	Description
=
a	T{
.fi
An address offset is given in the word
following the instruction.
T}
_
l	T{
An index into the display, ready as an offset or a guaranteeably small integer,
is given in the sub-operation code.
T}
_
r	T{
A relational operator encoded as described
in section 2.3 is given in the subop.
T}
_
s	T{
A small integer is
placed in the subop, or in the next word
if it is zero or too large.
T}
_
v	T{
Variable length inline data.
T}
_
w	T{
A word value in the following word.
T}
_
"	T{
An inline constant string.
T}
.TE
.KE
.PP
Before giving a list of the machine opcodes,
it is useful to note the naming conventions in the interpreter for typed
operations.
Machine instructions which have numeric operands use a simple and uniform
naming convention in which a suffix on the root operation name indicates
the type of operands expected.
These are given in Table 2.2.
Here the expression ``a above b'' means that `a' is on top of the
stack with `b' below it.
Short integers are 2 byte integers,
and long integers are 4 byte integers.
.TS
box center;
c s s
c s s
l l l
c ap-2 a.
Table 2.2 \- Operator Suffices
.sp
Unary operator suffices
.sp .1i
Suffix	Example	Argument type
2	NEG2	Short integer
4	SQR4	Long integer
8	ABS8	Real
.sp
.T&
c s s
l l l
c ap-2 a.
Binary operator suffices
.sp .1i
Suffix	Example	Argument type
2	ADD2	Two short integers
24	MUL24	Short above long integer
42	REL42	Long above short integer
4	DIV4	Two long integers
28	DVD28	Short integer above real
48	REL48	Long integer above real
82	SUB82	Real above short integer
84	MUL84	Real above long integer
.sp
.T&
c s s
l l l
c ap-2 a.
Other Suffices
.sp .1i
Suffix	Example	Argument types
T	ADDT	Sets
G	RELG	Strings
.TE
.PP
We now give the list of machine operations with a reference
to the appropriate sections and a short description of each.
The character `*' at the end of a name indicates that all
operations with the root prefix before the `*' are summarized by
the one entry.
.br
.ne 15
.TS H
box center;
c s s
lw(14) | lw(12) | lw(40)
lp-2 | a | l.
Table 2.3 \- Machine operations
_
Mnemonic	Reference	Description
=
.TH
.so fig2.1.n
.TE
.bp

2.2. Basic control operations

ABORT


.IP
This operator is used to halt execution immediately with an IOT process
fault.
It is used only for debugging
.I px
and is never generated by the translator
.I pi.

HALT


.IP
Corresponds to the Pascal procedure
.I halt ;
causes execution to terminate with a post-mortem backtrace as if a run-time
error had occurred.

BEG w1,w2,"


.IP
Causes the second part of the block mark to be created, and
.I w1
bytes of local variable space to be allocated and cleared to zero.
Stack overflow is detected here.
.I W2
is the first line of the body of this section for error traceback,
and he inline string (length 8) the character representation of its name.

NODUMP w


.IP
Equivalent to
.SM BEG ,
and used to begin the main program when the ``p''
option is disabled so that the post-mortem backtrace will be inhibited.

END


.IP
Complementary to the operators
.SM CALL
and
.SM BEG ,
exits the current block, calling the procedure
.I blkexit
to flush buffers for and release any local files.
Restores the environment of the caller from the block mark.
If this is the end for the main program, all files are
.I flushed,
the profile data file is written if necessary, and
the routine
.I psexit
which prints the statistics if desired (and does not return) is called.

CALL l,a


.IP
Saves the current line number, return address, and active display entry pointer
.I dp
in the first part of the block mark, then transfers to the entry point
given by the relative address
.I a ,
which is the beginning of a
.B procedure
or
.B function
at level
.I l.

PUSH s


.IP
Clears
.I s
bytes on the stack for, e.g., the return value of a
.B function
just before calling the function.

POP s


.IP
Pop
.I s
bytes off the stack.
Used, e.g., after a
.B function
or
.B procedure
returns to remove the arguments from the stack.

TRA a


.IP
Transfer control to relative address
.I a
as a local
.B goto
or part of a structured statement.

LINO s


.IP
Set current line number to
.I s.
For consistency, check that the expression stack is empty
as it should be (as this is the start of a statement.)
This consistency check will fail only if there is a bug in the
interpreter or the interpreter code has somehow been damaged.
Increment the statement count and if it exceeds the statement limit,
generate a fault.

GOTO l,a


.IP
Transfer conrol to address
.I a
which is in the block at level
.I l
of the display.
This is a non-local
.B goto.
Causes each block to be exited as if with
.SM END ,
flushing and freeing files with
.I blkexit,
until the current display entry is at level
.I l.

SDUP


.IP
Duplicate the one word integer on the top of
the stack.
This is used mostly for constructing sets.
See section 2.11.

2.3. If and relational operators

IF a


.IP
The interpreter conditional transfers all take place using this operator
which examines the Boolean value on the top of the stack.
If the value is
.I true ,
the subsequent code is executed,
otherwise control transfers to the specified address.

REL* r


.IP
These take two arguments on the stack,
and the sub-operation code indicates which relational operation is to
be performed, coded as follows with `a' above `b' on the stack:
.DS
.mD
.TS
lb lb
c a.
Code	Operation
_
0	a = b
2	a <> b
4	a < b
6	a > b
8	a <= b
10	a >= b
.TE
.DE
.IP
Each operation does a number of tests to set the condition code
appropriately and then does an indexed branch based on the
sub-operation code to a test of the condition here specified,
pushing a Boolean value on the stack.
.IP
Consider the statement fragment:
.DS
.mD
\*bif\fR a = b \*bthen\fR
.DE
.IP
If
.I a
and
.I b
are integers this generates the following code:
.DS
.TS
lp-2w(8) l.
RV4	\fIa\fR
RV4	\fIb\fR
REL4	\&=
IF	\fIElse part offset\fR
.sp
.T&
c s.
\fI\&... Then part code ...\fR
.TE
.DE

2.4. Boolean operators


.IP
The Boolean operators
.SM AND ,
.SM OR ,
and
.SM NOT
manipulate values on the top of the stack.
All Boolean values are kept in single bytes in memory,
or in single words on the stack.
Zero represents a Boolean \fIfalse\fP, and one a Boolean \fItrue\fP.

2.5. Rvalue, constant, and assignment operators

RV* l,a


.IP
The rvalue operators load values on the stack.
They take a block number as a subop and load the appropriate
number of bytes from that block at the offset specified
in the following word onto the stack. As an example, consider
.SM RV4 :
.DS
.mD
_RV4:
	\fBmov\fR	_display(r3),r0
	\fBadd\fR	(lc)+,r0
	\fBsub\fR	$4,sp
	\fBmov\fR	sp,r2
	\fBmov\fR	(r0)+,(r2)+
	\fBmov\fR	(r0)+,(r2)+
	\fBreturn\fR
.DE
.IP
Here the interpreter first generates the source address in r0 by adding the
display entry to the offset in the next instruction word.
It then reserves a long integer space on the stack (4 bytes)
and moves the data from the source onto the stack.
The pseudo-operation ``return''
takes the interpreter back to the main interpreter loop.
Note that the sub-operation code is already in
r3 and multiplied by 2 to be immediately usable as a word index
into the display.

CON* r


.IP
The constant operators load a value onto the stack from inline code.
Small integer values are condensed and loaded by the
.SM CON1
operator, which is given by
.DS
.mD
_CON1:
	\fBmov\fR     r3,-(sp)
	\fBreturn\fR
.DE
.IP
Here note that little work was required as the required constant
had already been placed in register 3.
For longer constants, more work is required;
the operator
.SM CON
takes a length specification in the subop and can be used to load
strings and other variable length data onto the stack.

AS*


.IP
The assignment operators are similar to arithmetic and relational operators
in that they take two operands, both in the stack,
but the lengths given for them indicate
first the length of the value on the stack and then the length
of the target in memory.
The target address in memory is under the value to be stored.
Thus the statement
.DS
i := 1
.DE
.IP
where
.I i
is a full-length, 4 byte, integer,
will generate the code sequence
.DS
.TS
lp-2w(8) l.
LV	\fIi\fP
CON1	1
AS24
.TE
.DE
.IP
Here
.SM LV
will load the address of
.I i,
which is actually given as a block number in the subop and an
offest in the following word,
onto the stack, occupying a single word.
.SM CON1 ,
which is a single word instruction,
then loads the constant 1,
which is in its subop,
onto the stack.
Since there are not one byte constants on the stack,
this becomes a 2 byte, single word integer.
The interpreter then assigns a length 2 integer to a length 4 integer using
.SM AS24 \&.
The code sequence for
.SM AS24
is given by:
.DS
.mD
_AS24:
	\fBmov\fR	(sp)+,r1
	\fBsxt\fR	r0
	\fBmov\fR	(sp)+,r2
	\fBmov\fR	r0,(r2)+
	\fBmov\fR	r1,(r2)
	\fBreturn\fR
.DE
.IP
Thus the interpreter gets the single word off the stack,
extends it to be a 4 byte integer in two registers,
gets the target address off the stack,
and finally stores the parts of the value in the target.
This is a typical use of the constant and assignment operators.

2.6. Addressing operations

LV l,w


.IP
The most common operation performed by the interpreter
is the ``lvalue'' or ``address of'' operation.
It is given by:
.DS
.mD
_LV:
	\fBmov\fR	_display(r3),r0
	\fBadd\fR	(lc)+,r0
	\fBmov\fR	r0,-(sp)
	\fBreturn
.DE
.IP
It calculates an address in the block specified in the subop
by adding the associated display entry to the
offset which appears in the following word.

OFF s


.IP
The offset operator is used in field names.
Thus to get the address of
.LS
p^.f1
.LE
.IP
.I pi
would generate the sequence
.DS
.mD
.TS
lp-2w(8) l.
RV	\fIp\fP
OFF	\fIf1\fP
.TE
.DE
.IP
where the
.SM RV
loads the value of
.I p,
given its block in the subop and offset in the following word,
and the interpreter then adds the offset of the field
.I f1
in its record to get the correct address.
.SM OFF
takes its argument in the subop if it is small enough.

NIL


.IP
The example above is incomplete, lacking a check for a
.B nil
pointer.
The code generated would, in fact, be
.DS
.TS
lp-2w(8) l.
RV	\fIp\fP
NIL
OFF	\fIf1\fP
.TE
.DE
.IP
where the
.SM NIL
operation checks for a
.I nil
pointer and generates the appropriate runtime error if it is.

INX* s,w,w


.IP
The operators
.SM INX2
and
.SM INX4
perform subscripting.
For example, the statement
.DS
a[i] := 2.0
.DE
.IP
with
.I i
a short integer, such as a subrange ``1..1000'',
and
.I a
an
``array [1..1000] of real''
would generate
.DS
.TS
lp-2w(8) l.
LV	\fIa\fP
RV2	\fIi\fP
INX2	8,1,999
CON8	2.0
AS8
.TE
.DE
.IP
Here the
.SM LV
operation takes the address of
.I a
and places it on the stack.
The value of
.I i
is then placed on top of this on the stack.
We then perform an indexing of the array address by the
length 2 index (a length 4 index would use
.SM INX4 )
where the individual elements have a size of 8 bytes.
The code for 
.SM INX2
is:
.DS
.mD
_INX2:
	\fBtst\fR	r3
	\fBbne\fR	1f
	\fBmov\fR	(lc)+,r3
1:
	\fBmov\fR	(sp)+,r1
	\fBsub\fR	(lc)+,r1
	\fBbmi\fR	1f
	\fBcmp\fR	r1,(lc)+
	\fBbgt\fR	1f
	\fBmul\fR	r3,r1
	\fBadd\fR	r1,(sp)
	\fBreturn
1:
	\fBerror\fR	ESUBSCR
.DE
.IP
Here the index operation subtracts the constant value 1 from the
supplied subscript,
this being the low bound of the range of permissible subscripts.
If the result is negative,
or if the normalized subscript then exceeds 999, which
is the maximum permissible subscript if the first is numbered 0,
the interpreter generates a subscript error.
Otherwise, the interpreter multiplies the offset by 8 and adds it to the address
which is already on the stack for
.I a ,
to address ``a[i]''.
Multi-dimension subscripts are translated as a sequence of single subscriptings.

IND*


.IP
For indirect references through
.B var
parameters and pointers,
the interpreter has a set of indirection operators which convert a pointer
on the stack into a value on the stack from that address.
different
.SM IND
operators are necessary because of the possibility of different
length operands.

2.7. Arithmetic operators


.IP
The interpreter has a large number of arithmetic operators.
All operators produce results long enough to prevent overflow
unless the bounds of the base type are exceeded.
No overflow checking is done on arithmetic, but divide by zero
and mod by zero are detected.

2.8. Range checking


.IP
The interpreter has a number of range checking operators.
The important distinction among these operators is between values whose
legal range begins at 0 and those which do not begin at 0, i.e. with
a subrange variable whose values range from 45 to 70.
For those which begin at 0, a simpler ``logical'' comparison against
the upper bound suffices.
For others, both the low and upper bounds must be checked independently,
requiring two comparisons.

2.9. Case operators


.IP
The interpreter includes three operators for
.B case
statements which are used depending on the width of the 
.B case
label type.
For each width, the structure of the case data is the same, and
is represented in the following figure.
.sp 1
.KF
.so fig2.2.n
.KE
.sp 1
.IP
The
.SM CASEOP
case statement operators do a sequential search through the
case label values.
If they find the label value, they take the corresponding entry
from the transfer table and cause the interpreter to branch to the
indicated statement.
If the specified label is not found, an error results.
.IP
The
.SM CASE
operators take the number of cases as a subop
if possible.
Three different operators are needed to handle single byte,
word, and double word case transfer table values.
For example, the
.SM CASEOP1
operator has the following code sequence:
.DS
.mD
_CASEOP1:
	\fBbne\fR	1f
	\fBmov\fR	(lc)+,r3
1:
	\fBmov\fR	lc,r0
	\fBadd\fR	r3,r0
	\fBadd\fR	r3,r0
	\fBmov\fR	r3,r2
	\fBtst\fR	(sp)+
1:
	\fBcmpb\fR	(r0)+,-2(sp)
	\fBbeq\fR	5f
	\fBsob\fR	r3,1b
	\fBerror\fR	ECASE
5:
	\fBsub\fR	r3,r2
	\fBadd\fR	r2,r2
	\fBadd\fR	lc,r2
	\fBadd\fR	(r2),lc
	\fBreturn
.DE
.IP
Here the interpreter first computes the address of the beginning
of the case label value area by adding twice the number of case label
values to the address of the transfer table, since the transfer
table entries are full word, 2 byte, address offsets.
It then searches through the label values, and generates an ECASE
error if the label is not found.
If the label is found, we calculate the index of the entry in
the transfer table which is desired and then add that offset
to the interpreter location counter.

2.10. Operations supporting pxp


.IP
For the purpose of execution profiling the following operations
are defined.

PXPBUF w


.IP
Causes the interpreter to allocate a count buffer
with
.I w
counters, each of which is a 4 byte integer,
and to clear the counters to 0.
The count buffer is placed within an image of the
.I pmon.out
file as described in the
.I "PXP Implementation Notes."
The contents of this buffer will be written to the file
.I pmon.out
when the program terminates.

COUNT s


.IP
Increments the counter specified by
.I s.

TRACNT w,a


.IP
Used at the entry point to procedures and functions,
combining a transfer to the entry point of the block with
an incrementing of its entry count.

2.11. Set operations


.IP
The set operations 
set union
.SM ADDT,
intersection
.SM MULT,
and the set relationals
.SM RELT
are straightforward.
The following operations are more interesting.

CARD s


.IP
Takes the cardinality of a set of size
.I s
bytes on top of the stack, leaving a 2 byte integer count.
.SM CARD
uses a table of 4-bit population counts to count set bits
in each 4-bit nibble of each byte in the set.

CTTOT s,w,w


.IP
Constructs a set.
This operation requires a non-trivial amount of work,
checking bounds and setting individual bits or ranges of bits.
This operation sequence is very slow,
and motivates the presence of the operator
.SM INCT
below.
The arguments to
.SM CTTOT
include the number of elements
.I s
in the constructed set,
the lower and upper bounds of the set,
the two
.I w
values,
and a pair of values on the stack for each range in the set, single
elements in constructed sets being duplicated with
.SM SDUP
to form degenerate ranges.

IN s,w,w


.IP
The operator
.B in
for sets.
The value
.I s
specifies the size of the set,
the two
.I w
values the lower and upper bounds of the set.
The value on the stack is checked to be in the set on the stack,
and a Boolean value of
.I true
or
.I false
replaces the operands.

INCT


.IP
The operator
.B in
on a constructed set without constructing it.
The left operand of
.B in
is on top of the stack followed by the number of pairs in the
constructed set,
and then the pairs themselves, all as single word integers.
Pairs designate runs of values and single values are represented by
a degenerate pair with both value equal.
A typical situation for this operator to be generated is
.LS
\fBif\fR ch \fBin\fR ['+', '-', '*', '/']
.LE
.IP
or
.LS
\fBif\fR ch \fBin\fR ['a'..'z', '$', '_']
.LE
.IP
These situations are very common in Pascal, and
.SM INCT
makes them run much faster in the interpreter,
as if they were written as an efficient series of
.B if
statements.

2.12. Miscellaneous


.IP
Other miscellaneous operators which are present in the interpreter
are
.SM ASRT
which causes termination if the Boolean value on the stack is not
.I true,
and
.SM STOI ,
.SM STOD ,
.SM ITOD ,
and
.SM ITOS
which convert between different length arithmetic operands for
use in aligning the arguments in
.B procedure
and
.B function
calls, and with some untyped built-ins, such as
.SM SIN
and
.SM COS \&.
.IP
Finally, if the program is run with the run-time testing disabled, there
are special operators for
.B for
statements
and special indexing operators for arrays
which have individual element size which is a power of 2.
The code can run significantly faster using these operators.

2.13. Functions and procedures


.IP
.I Px
has a large number of built-in procedures and functions.
The mathematical functions are taken from the standard
system library.
The linear congruential random number generator is described in
the
.I "Berkeley Pascal User Manual"
.IP
The procedures
.I linelimit
and
.I dispose
are included here but currently ignored.
One surprise is that the built-ins
.I pack
and
.I unpack
are here and quite complex, functioning as a memory to memory
move with a number of semantic checks.
They do no ``unpacking'' or ``packing'' in the true sense, however,
as the interpreter supports no packed data types.