The 12864 Microprocessor
March 17, 1991
This is a daydream...let me call it the 12864....
Let me start by saying I am prejudiced in favor of the 6809 microprocessor,
created by Motorola. That it was the best of its day was confirmed when NASA
decided to use it in the Space Shuttle's main computers. I personally feel that
the 6809 should have had wider acceptance in the personal-computer market, and
that Motorola snubbed its potential by introducing the 68000 too quickly. Only
recently, with the widespread use of the 32-bit microprocessors, has the 6809
really become outclassed. So it is time to move on, time to create a new best
microprocessor of the day. Since this is currently only my own dream, it has
been greatly influenced by what I know of the 6809, and also what I have learned
about the 68020. I do not mean to ignore any worthy contributions from other
microprocessors; that is in fact the main reason for this essay! I am sharing
my dream in hopes that it may be catching....
The 12864 is a 128/64-bit microprocessor. It has 64 address lines, and all
registers are 64 bits wide. But it also has 128 data lines, and this is why:
First, being able to handle this many bits at once means that the 12864 doesn't
need a coprocessor; most coprocessors only handle 80 bits or so. Therefore the
12864 also doesn't need a secondary instruction set telling it how to talk to a
coprocessor. A second reason for having a 128-bit data path leads to further
simplification of the microprocessor: All its instructions have been carefully
designed to fit within 128 bits, so that a single memory-access can provide the
12864 with a whole instruction. To make this still more efficient, the computer
that incorporates a 12864 will be required to have 128-bit-wide memory, and not
the common 8-bit-wide or 9-bit-wide memory of most of today's microcomputers.
This means that the 64-bit Program Counter or PC register is always incremented
just once for each instruction pulled from the memory. The 12864 is not much of
an evolutionary offshoot from previous microprocessors; it's a radical mutation.
Only in the efficiency of its instruction set does it relate to the 6809....
With 128-bit memory, design decisions made in the 6809 and 68020 are greatly
simplified in the 12864. Example: Because the 6809 fetched instructions only 8
bits at a time, there were two distinct groups of Branch instructions: an 8-bit
branch and a 16-bit branch. Machine code that used 8-bit branches as often as
possible was both shorter and faster than code that always used 16-bit branches,
because only one byte of memory and 1 clock-cycle of time were needed for 8-bit
branching-data, while 2 bytes and 2 cycles were needed for 16-bit data. (Not to
mention that 8-bit-branch INSTRUCTIONS were themselves only 8 bits, while most
16-bit branches also had 16-bit opcodes.) And in the 68020 processor, although
there are 8-bit, 16-bit, and 32-bit branch instructions, the last, 32-bit type
requires an extra fetch of data from the memory. But the 12864 processor needs
only one size of branch instruction, because any 64-bit branch-distance will
always fit into a one-clock-cycle 128-bit opcode+data fetch.
Likewise, because any 64-bit address in the memory can be part of a 128-bit
fetch, there is no longer any need for a special Direct Page or DP register. In
the 6809 the DP register offered an 8-bit way to access part of the memory; thus
the longer and slower 16-bit way of specifying memory locations did not always
have to be used. This is not a problem in the 12864.
Now what about the choice to use 64-bit-addressing? This represents about
18.4 quintillion addresses (18,446,744,073,709,551,616 addresses, to be exact),
far beyond any reasonable projection of any computer's memory needs -- including
virtual memory! Not to mention that since each address holds 128 bits of data,
we are actually talking about 295 quintillion (8-bit) bytes of memory!
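The two figures above are easy to check. Here is a minimal sketch in Python (my choice of language for illustration only; the variable names are not part of the 12864 design):

```python
# Address-space arithmetic for 64 address lines and 128-bit-wide memory.
ADDRESS_BITS = 64
WORD_BITS = 128                      # bits stored at each address

addresses = 2 ** ADDRESS_BITS        # number of distinct addresses
bytes_per_word = WORD_BITS // 8      # sixteen 8-bit bytes per address
total_bytes = addresses * bytes_per_word

print(f"{addresses:,} addresses")
print(f"{total_bytes:,} bytes")
```

The second number is 2 to the 68th power, which is where the "295 quintillion" comes from.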
Nevertheless, there are some possibly valid reasons for this choice: First,
since the design of this processor is not yet completely fixed, and belongs to
nobody, it might be that it could tickle the fancy of a number of different chip
manufacturers, and lead to an Industry-Wide Standard Design. Naturally, it makes
sense for the 12864 assembly language instruction set to become standardized and
non-proprietary, also. Therefore a second reason for choosing 64-bit addressing
is simply that it would take longer to put this complex chip into production --
and that hopefully gives the software developers plenty of time to convert their
existing software to run on this admittedly incompatible processor. Thus, both
the new computers and their software could arrive at the same time! Finally, a
third reason for jumping straight to 64-bit addressing is that the architecture
of the new computers can be designed with that in mind. Simply because 64 bits
represents such a tremendous enhancement, making it the immediate goal means it
can remain a standard far into the future....
Now let's get into some of the details of the 12864. The total number of
registers of all types will be about 45, give or take a few. This number can
be decided after the Condition-Code/Status Register has had its bits defined.
As stated earlier, every register is 64 bits wide, including CCS. In the CCS
register a number of bits are necessary for various processor functions; just
how many depends on the total list of functions that will be designed. For the
purposes of this essay, let us examine the CCS register of the 68020: It is 16
bits wide, of which 12 are defined and 4 are undefined. If we start with a 64-
bit CCS and only use 12 of them for such things as result-of-instruction flags,
interrupt masks, etc, then that leaves 52 bits that can be equated to the entire
register set of the 12864 microprocessor. However, it is certain that some of
those 52 bits will be dedicated to other processor functions (but I don't know
what other dreamers will add to this), and so the number of registers is yet unknown.
In case you are wondering why match the bits of CCS with the register set,
the answer involves the interrupt system. Whenever an interrupt or exception or
other special event occurs, the processor can automatically save on a stack all
the registers that are specified in the CCS register. The processor saves time
because none of those interrupt-type handling routines need include instructions
to specifically save and recover the registers they use. In fact, if the 12864
computer system's main power-up/initialization routine includes defining such a
list of registers in CCS, then all interrupt-type routines can be written using
only those registers. Different boot software, different registers. Note that
2 registers, the Program Counter and CCS, which ALWAYS are saved, do NOT need to
be matched to bits in CCS, and so the 12864 can have 2 more registers than the
simple count of available bits in CCS implies.
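To make the idea concrete, here is a rough sketch in Python of how the automatic save might behave. The bit layout is purely my assumption for illustration (12 low bits of flags, as in the 68020 example, with the bit just above them selecting register 0, and so on); the essay does not fix any of this, and the function names are hypothetical.

```python
# Assumed layout: the low 12 bits of CCS are flags and masks;
# bit (12 + n) set means "save register n automatically".
FLAG_BITS = 12

def registers_to_save(ccs, register_count):
    """Return the register numbers whose save-bit is set in CCS."""
    return [n for n in range(register_count)
            if (ccs >> (FLAG_BITS + n)) & 1]

def enter_interrupt(ccs, regs, pc, stack):
    """Push the selected registers, then PC and CCS (which are always saved)."""
    for n in registers_to_save(ccs, len(regs)):
        stack.append(("R", n, regs[n]))
    stack.append(("PC", None, pc))
    stack.append(("CCS", None, ccs))
```

A boot routine that sets only the save-bits for registers 3 and 7 would then get exactly those two registers, plus PC and CCS, stacked on every interrupt.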
The next thing to discuss is the actual list of registers. A major element
in the design of the 12864 is that as far as the programming instruction set is
concerned, all registers are treated equal. But as far as the microcode and the
hardware are concerned, some are more equal than others.... For the sake of this
discussion, let us assume that there are 45 registers, numbered from 0 to 44.
Suppose that Register 33 is the Program Counter, while Register 17 is just an
ordinary general-purpose register. The hardware will always use 33 as a pointer
to the current instruction about to be executed, and the hardware will always
adjust 33 to point it at the appropriate next instruction. But the instruction
set will not distinguish 33 from 17! A Logical-OR instruction that manipulates
a group of bits inside 17 can just as easily manipulate bits inside 33, simply
by specifying 33 instead of 17, in the Logical-OR instruction. Just because
this is something that might be disastrous to the program is no reason to keep
it from being possible! Let the assembly-language programming tool be written
so that it catches such dubious instructions, and warns the programmer! The big
advantage of this scheme is that it leads to an extremely significant reduction
in the total complexity of both the instruction set and the microcode. Examples
later on in this essay may make this more clear.
Let us now examine the bit-format of some of the instructions. By far the
majority of the instructions will have a single format that offers astounding
programming potential (well, what do you expect with 128 bits to play with!)....
Actually, most of this instruction-group format fits into 64 bits, numbered 0 to
63, and defined as follows:
Bits 63-58: These 6 bits hold the actual generic instruction. Of course
this means that there are only 64 such instructions, but if you have any doubts
about this being enough, you don't yet realize how generic they are!
Bits 57-46 are divided into three groups of 4 bits each, hereinafter to be
referred to as 'admode fields', short for 'addressing mode'. Since these fields
have 4 bits, it follows that there are 16 different addressing modes. They will
be explained shortly. The first admode field, bits 57-54, tells the processor
where to find the first chunk of data needed for some instruction, say a SUB.
The second admode field, bits 53-50, tells the 12864 where to find the second
chunk of data; obviously a SUB instruction needs data that can be subtracted.
And the third admode field, bits 49-46, tells the processor where to put the
result of the SUB. Perhaps you now see that with 16 addressing modes for each
admode field, a simple generic SUB instruction can encompass both registers and
memory in quite a few different combinations!
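The bit positions given so far can be pulled out of a 64-bit instruction word with simple shifts and masks. This is only a sketch in Python; the function and the field names in the result are my own labels, not part of the design.

```python
def decode_top_fields(word):
    """Split bits 63-46 of a 64-bit generic instruction word."""
    return {
        "opcode":     (word >> 58) & 0x3F,  # bits 63-58: one of 64 generics
        "first_src":  (word >> 54) & 0xF,   # bits 57-54: first admode field
        "second_src": (word >> 50) & 0xF,   # bits 53-50: second admode field
        "dest":       (word >> 46) & 0xF,   # bits 49-46: third admode field
    }
```

For example, a word built with generic 5, first admode 4, second admode 0, and third admode 8 decodes back to those four numbers.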
For convenience, let us call the admode fields GET1, GET2, and PUT. A
list of proposed addressing modes follows, and if it is adopted, there will be a
few restrictions on the use of two of them. Should the list be modified during
later design stages of the 12864, these restrictions may still apply. The modes
subject to restriction are marked with * symbols; the limitations are detailed
at the end of the list. The admodes are numbered 0 to 15 in binary.
 Direct Modes 0 to 3                     Semi-Direct Modes 4 to 7
 0000 Register Data+16\9\10\3bit Offset  0100 Reg Address + 16\9bit Offset
 0001 Register Data+16\9\10\3bit Adjust  0101 Reg Address + 16\9bit Adjust
 0010 GET1=PUTmode, or GET2 or PUT=NONE  0110 Reg Addr+(Reg+10\3bit Adj) Offset
*0011 Immediate 64bit Data              *0111 Absolute 64bit Address

 Indirect Modes 8 to 15
 1000 [Reg + 16\9bit Offset],LSig64bits  1100 [Reg + 16\9bit Offset],MSig64bits
 1001 [Reg + 16\9bit Adjust],LSig64bits  1101 [Reg + 16\9bit Adjust],MSig64bits
 1010 [Reg+(Reg+10\3bit Adj)Offst],LS64  1110 [Reg+(Reg+10\3bit Adj)Offst],MS64
 1011 [Reg],LS64+(Reg+10\3bit Adj)Offst  1111 [Reg],MS64+(Reg+10\3bit Adj)Offst

Register Data: The value in a register is considered to be data.
Reg Address: The value in a register is considered to be an address.
16\9bit, 10\3bit: 16 or 9 or 10 or 3 bits of twos-complement information,
sign-extended to 64 bits (internally) by 12864 processor.
64bit: 64 bits of information fetched with the 64-bit generic instruction.
Adjust: Value in register is modified, using twos-complement information.
If info is negative, register adjusted BEFORE instruction executed.
If info is positive, register adjusted AFTER instruction executed.
Offset: Similar to Adjust, but register not modified. Computation of the
Offset is always performed before instruction is executed.
NONE: No data or address at all.
[ ]: Value inside brackets is an address. Information at that address
is in turn used as an address.
,LSig64bits ,LS64: An address holds 128 bits of data, of which the Least
Significant 64 bits are selected for the instruction.
,MSig64bits ,MS64: The Most Significant 64 bits at an address.
(Reg): Distinguishes a second register that this addressing mode uses.
* Recall design decision that limits instructions to 128 bits, including 64
bits of Immediate Data or Absolute Address. It's quite obvious that only
one Admode Field can get to use those 64 bits. It also works out that
if different admodes exist in all three Admode Fields, then none of the
*-admodes may be placed in any Admode Field. And in any instruction that
uses data acquired through the GET2 field, or in which GET1 is different
from PUT...such instructions exclude the Absolute-64bit-Address mode from
the PUT field. (Of course, Immediate Data mode is always excluded from
the PUT field.) More details of these limits will be provided later; for
now it might be noted that the reason that no 64bit-Offset modes exist is
to avoid a lot of trouble. It makes the programmer use more registers
for indexing, but eliminates much competition between the Admode Fields
for the use of the 64 bits that accompany the instruction. Besides, you
might be surprised by how well other instructions can replace any 64-bit
Offset modes! Anyway the 12864 processor will probably have 30 or more
general-purpose registers (registers the hardware doesn't always modify
for specific purposes, like CCS or the Stacks---or have their contents
used for other purposes, like pointing at cache or program data). It may
be easy to find enough available registers for most address-pointing.
Now for some descriptions of the 16 admodes and their consequences:
Direct Modes 0 to 3 all specify data the 12864 processor has on hand, in
a register, or just loaded along-with or as-part-of the instruction. Obviously,
these modes can be executed more quickly than the Semi-Direct or Indirect Modes.
(0) In this admode the data needed by the current instruction is in one
or two of the registers. The 12864 processor has 128 data lines; just because
every register is only 64 bits wide is no reason to limit its ability to process
128 bits. ANY TWO registers may be put together, in any order, to make a place
that holds 128 bits! (Okay, I exaggerated; the 12864 will have both a 'Boss'
mode and a 'Peon' mode. Only the Boss mode can put ANY two registers together;
in the Peon mode a lot of combinations will be illegal. And, even in the Boss
mode, a lot of combinations will be undesirable, like using a Stack pointer with
a Cache-pointing register; the 12864 Assembler would warn the programmer.) Note
that admode 0 merely declares that one or two registers will be used; the actual
register(s) specified are elsewhere among the many bits of this generic format.
After the processor identifies the register(s) holding the data, an offset will
be applied to that data. The offset quantity shall be used by the 12864 in its
implementation of the instruction; the register(s) holding the data will not be
affected by the offset. The maximum size of the offset is affected by how many
registers are used, and by the type of generic instruction being performed; the
details of this will be provided later. The main purpose of admode 0 is to let
us eliminate the LEA (load effective address) instructions from the processor's
list of 64 generics--but certainly other uses will be found for it.
(1) This admode is very much like admode 0. The only real difference is
that the content of the register(s) IS affected by this admode, which makes the
mode useful in counting loops. One thing to keep in mind is that any negative
adjustment is performed before the whole instruction is implemented, while any
positive adjustment is performed after the overall instruction is implemented.
This admode also helps us eliminate LEA instructions (details later).
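The difference between Offset (admode 0) and Adjust (admode 1) can be sketched in a few lines of Python. This assumes a single 64-bit register held in a dictionary; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def apply_offset(regs, n, offset):
    """Offset: computed before use; the register itself is untouched."""
    return (regs[n] + offset) & MASK64

def apply_adjust(regs, n, adjust):
    """Adjust: return the value the instruction sees, and update regs[n].

    Negative adjust: register is changed BEFORE use (pre-decrement).
    Positive adjust: register is changed AFTER use (post-increment).
    """
    if adjust < 0:
        regs[n] = (regs[n] + adjust) & MASK64
        return regs[n]
    seen = regs[n]
    regs[n] = (regs[n] + adjust) & MASK64
    return seen
```

With register 7 holding 100, an Adjust of -58 hands 42 to the instruction and leaves 42 in the register; an Adjust of +4 hands over 100 and leaves 104; an Offset of -58 hands over 42 and leaves the register at 100.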
(2) This is the only admode with a double meaning. If admode 2 is used
within the GET1 field, then it means that the first chunk of data, needed by the
instruction, is currently in the place specified by the PUT field. Thus data at
some location, after manipulation, will return to that location. If we exactly
specify the same admode in both the GET1 and PUT fields (instead of using admode
2 in GET1), we end up being unable to use Immediate Data at all--you'll see! If
admode 2 is used in the GET2 field, then it means NONE, no data for that part of
the instruction. Operations like LSH (logical shift) use admode 2 in GET2; they
need only one main data chunk since any other data is part of the instruction's
definition. In fact, if any admode besides 2 is in GET2 during a LSH or similar
instruction, then the admode should be ignored, or declared illegal. If admode
2 is in GET2 during a SUB or similar instruction, then the net effect of the SUB
will be equivalent to a TST instruction. (With lots of TST-equivalents, there
need not be a specific TST among the 64 generics. But the 12864 Assembler may
include a TST, and translate it into an equivalent.) Finally, admode 2 in the
PUT field also means NONE, no address. The computed result of the SUB or other
manipulation is not put anywhere, and this is useful, too! The definition of a
CMP (compare) is exactly a SUB that doesn't save the result! So the CMP becomes
another common instruction that the 12864 processor excludes from its list of 64
generics.... Like TST, the 12864 Assembler can include CMP, and translate it to
an equivalent: a destinationless SUB. Similarly, the 6809 BIT operation is an
AND instruction with no destination. Designers unite! The 12864 has a full set
of destinationless instructions--and no extra complexity! Moving on, suppose
admode 2 is in both GET1 and PUT: This is basically a no-operation, NOP. Lots
of ways exist to do a NOP; the 12864 Assembler can include NOP, and translate.
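Here is a small sketch, in Python, of the kind of translation the 12864 Assembler might do for these convenience mnemonics. The tuple format and the function are hypothetical, and using PUT=NONE for TST is my assumption; the text above only says that admode 2 goes in GET2.

```python
NONE = 2   # admode 2 means NONE when it appears in GET2 or PUT

def translate(mnemonic, src1, src2=None):
    """Map a convenience mnemonic to a generic (op, GET1, GET2, PUT) tuple."""
    if mnemonic == "CMP":      # a SUB that does not save its result
        return ("SUB", src1, src2, NONE)
    if mnemonic == "BIT":      # an AND that does not save its result
        return ("AND", src1, src2, NONE)
    if mnemonic == "TST":      # a SUB with no second chunk of data
        return ("SUB", src1, NONE, NONE)   # PUT=NONE is an assumption
    raise ValueError("no translation sketched for " + mnemonic)
```

So translate("CMP", "R5", "R6") comes out as a SUB of R5 and R6 with nowhere to put the answer.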
(3) This is the Immediate Data admode. Since the instruction is often
64 bits long, while 128 bits are always fetched from memory, admode 3 tells the
instruction to use as data the group of 64 bits fetched with the instruction.
Semi-Direct Modes 4 to 7 all specify that the data the processor has on
hand are memory-addresses of the data needed by the instruction. Admodes 4 to 7
are slower than the Direct Modes because the 12864 has to go fetch the data from
the memory, but this process is still faster than the Indirect Modes.
(4) This admode is like admode 0 in operation. The main difference is
that only one register is ever specified, since one register holds 64 bits and
the memory addressing range is 64 bits. But the offset is figured the same way
as admode 0, and the value in the register is not changed. As mentioned, the
result of the offset computation is a memory address; the data at that location
is fetched for use by the instruction.
(5) This admode combines features of admode 1 and admode 4. Again only
one register is specified as an address-pointer, or index (4). An adjustment of
the value in that register will be applied, pre-decrement or post-increment (1).
If you review admodes 1 and 4, this one should be pretty obvious.
(6) The basic addressing mode for doing 64-bit (or any size larger than
16-bit) offsets is admode 6. One register is specified as a pointer (index) to
the general region of memory; a second register is specified that will hold the
offset from the general place to any specific place. Furthermore, this second
register can be given a predecrement or postincrement adjustment, which makes it
easy to skip through tables of data. Note that although the second register is
adjustable, its value is only an offset; the first register remains unchanged.
(7) This admode specifies that the 64 bits fetched along with the 64-bit
generic instruction are the absolute memory address of data the instruction needs.
Indirect Modes 8 to 11 are quite like Indirect Modes 12 to 15: They are
computed the same way, but at some point the data at an address is used as an
address. Now since the data is always 128 bits and addresses are only 64 bits,
which 64 of the 128 do we use? Thus admodes 8-11 use the Least Significant 64
bits of the 128, while admodes 12-15 use the Most Significant 64 bits.
Note that all the admodes that use registers as indexes let the Program
Counter be used as easily as any other register. The 12864 processor needs no
special microcode to provide a host of Program-Counter-Relative admodes, due to
the basic design decision that the instruction set handles all registers equally.
The trick to consistency is for the processor to apply any adjustment or offset
to the chosen index register AFTER incrementing PC past the current operation.
This in turn works due to the design choice to make ALL instructions fit in 128 bits.
Nevertheless, the 12864 Assembler may specifically distinguish Program-Counter-
Relative admodes from the other admodes, and translate appropriately. Finally,
note that it may be undesirable to use the PC register as a data-pointer in any
admode that will adjust the value of the index!
(8) This admode first computes an address in exactly the same way as
admode 4. The 12864 processor then fetches the lowest 64 bits from the memory
at that address, and uses this information as another address. The instruction
will use the data in the memory at the second address.
(9) This admode first computes an address in exactly the same way as
admode 5. Then an address is fetched, and then data, as just described.
(10) This admode first computes an address in exactly the same way as
admode 6. Then an address is fetched, and then data, as just described.
(11) This addressing mode starts by using the value in a register as
an address. The 64 lowest bits in the memory at that address are fetched; they
will be used as a second address. However, before they are used, an offset will
be applied to that second address. A second register is specified, along with
an adjustment. The value in this predecremented/postincremented register is the
64-bit offset that is applied to the second address; the first register's value,
and the memory that held the second address, are not changed by this process.
After computing the new, offset address, the 12864 processor fetches 128 bits of
data from that location in the memory, for the current instruction.
(12) This admode first computes an address in exactly the same way as
admode 4. The 12864 processor then fetches the highest 64 bits from the memory
at that address, and uses this information as another address. The instruction
will use the data in the memory at the second address.
(13) This admode first computes an address in exactly the same way as
admode 5. Then an address is fetched, and then data, as just described.
(14) This admode first computes an address in exactly the same way as
admode 6. Then an address is fetched, and then data, as just described.
(15) This addressing mode starts by using the value in a register as
an address. The 64 highest bits in the memory at that address are fetched; they
will be used as a second address. Then everything proceeds just like admode 11.
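The Indirect Modes can be tried out on a toy memory. The sketch below, in Python, models memory as a dictionary from address to 128-bit word; the function names are mine, and for admodes 11 and 15 the second register's value is passed in already adjusted, to keep the example short.

```python
MASK64 = (1 << 64) - 1

def half(word128, most_significant):
    """Select the upper or lower 64 bits of a 128-bit memory word."""
    if most_significant:
        return (word128 >> 64) & MASK64
    return word128 & MASK64

def indirect_offset(mem, regs, n, offset, most_significant):
    """Admode 8 (LS64) or 12 (MS64): [Reg + Offset], then fetch the data."""
    first = (regs[n] + offset) & MASK64
    second = half(mem[first], most_significant)
    return mem[second]

def indirect_then_index(mem, regs, n, index_value, most_significant):
    """Admode 11 (LS64) or 15 (MS64): [Reg], then add a register offset."""
    second = half(mem[regs[n]], most_significant)
    return mem[(second + index_value) & MASK64]
```

The same 128-bit word can thus hold two different pointers, and the admode number alone picks which one is followed.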
Now to show how LEA (load effective address) needn't be included among
the 64 generics. Consider admode 15: At the end of its computations the 12864
processor has an address which it normally uses, right now, to fetch data, after
which the address is not saved. LEA creates that address and saves it for later
use and re-use (doesn't use it now). Suppose admode 15 specifies register 10
(first), register 7 (second), and an adjustment of -58. The Assembler translates
LEA (with syntax specifying admode 15 and the register info) into an ADD: The
GET1 field is given admode 4, register 10, and a 0 offset; the processor fetches
128 bits from the address (a part of the generic instruction we haven't got to
lets us select the correct 64 bits). The GET2 field is given admode 1, register
7, and an adjust
of -58; the processor modifies the register and gives its content to the ADD.
Then the PUT field specifies where to save result. GET2 might have admode 0 and
get result without modifying register 7. Any LEA can be translated!
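The worked example above (register 10, register 7, adjustment of -58) can be followed step by step in a small Python sketch. Memory is again a toy dictionary of 128-bit words, and the function name and the choice of destination register are hypothetical.

```python
MASK64 = (1 << 64) - 1

def lea_admode15(mem, regs, base, index, adjust, dest):
    """Emulate LEA for admode 15 as the ADD the Assembler would emit."""
    # GET1: admode 4, base register, offset 0: fetch 128 bits, keep the MS64.
    pointer = (mem[regs[base]] >> 64) & MASK64
    # GET2: admode 1, index register with an adjustment.
    if adjust < 0:                        # negative: adjust before use
        regs[index] = (regs[index] + adjust) & MASK64
        index_value = regs[index]
    else:                                 # positive: adjust after use
        index_value = regs[index]
        regs[index] = (regs[index] + adjust) & MASK64
    # PUT: save the sum instead of using it right now to fetch data.
    regs[dest] = (pointer + index_value) & MASK64
    return regs[dest]
```

If register 7 starts at 100 and the upper half of the word that register 10 points at is 0x5000, the saved address is 0x5000 + 42, and register 7 is left holding 42.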
At last we can continue the bit-designations of the generic instruction
format. Been about 1000 bytes per bit of explanation, so far...!
Bits 45-39 specify a Bitfield Size for the instruction. These 7 bits can
hold any number from 0 to 127, and with 0 being interpreted by the processor as
128, it becomes possible for the instruction to operate on any data size from 1
to 128 bits. Even though the registers of the 12864 microprocessor are only 64
bits wide, its Arithmetic/Logic Unit is 128 bits wide, and is able to handle any
data size smoothly. So if the Bitfield Size is 79, then 79 bits will be taken
from the place specified via GET1, manipulated (if the instruction requires it)
with 79 bits from the place that GET2 indicates, and finally a 79-bit result is
sent to the place described by PUT. The 12864 Assembler considers the Bitfield
Size to be optional information; if it is not provided by the programmer, a size
of 128 bits will be assumed. Some Assembler instructions, like LEA, default to
64 bits due to the nature of the instruction (LEA computes a 64-bit address).
MUL will always have two 64-bit inputs and one 128-bit output; DIV will always
have a 128-bit dividend, a 64-bit divisor, and a 128-bit quotient. And whenever
Immediate Data is specified, then either the whole instruction must be limited
to 64 bits, or the processor must allow 64 bits to be used in the manipulation
of 128 bits. (Perhaps we can have both: The processor can have the ability to
do the latter, while the Assembler lets the programmer decide the former.) One
way the programmer can set the Assembler's default to 64 bits would be to simply
specify only one data-holding register in an instruction's syntax.
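Decoding the Bitfield Size is a one-liner. The Python sketch below also shows a value being cut down to that size; treating the bitfield as the LOW bits of the data is my simplifying assumption here, and the function names are hypothetical.

```python
def bitfield_size(field7):
    """Decode the 7-bit size field: 0 means 128, otherwise the value itself."""
    return 128 if field7 == 0 else field7

def truncate(value, field7):
    """Keep only the low `size` bits of a value (assumed bitfield position)."""
    return value & ((1 << bitfield_size(field7)) - 1)
```

With a size field of 79, a 100-bit pattern of all ones is cut down to 79 ones; with a size field of 0, all 128 bits survive.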
Bit 38 of the generic instruction is the Signed Extension Flag. It tells
the processor to treat the result of an operation as a twos-complement number,
if this bit is set. When the result is PUT into its destination, its negative-
ness or positive-ness, as it exists within the Bitfield Size, is extended out to
the Bit-127-mark (the Most Significant Bit is numbered 127; the Least is 0). If
only one register is specified, then sign-extending the result out to the
Bit-63-mark is the thing to do. If the Signed Extension Flag is not set, the result
of the instruction is simply PUT into its destination, and nothing else is done.
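Sign extension from an arbitrary Bitfield Size is straightforward to sketch in Python. The function name and its `width` parameter (128 for two registers or memory, 64 for one register) are my own labels.

```python
def sign_extend(value, size, width=128):
    """Extend a `size`-bit twos-complement value out to `width` bits."""
    value &= (1 << size) - 1
    if (value >> (size - 1)) & 1:          # sign bit set: fill upper bits with 1s
        value |= ((1 << width) - 1) & ~((1 << size) - 1)
    return value
```

A 3-bit result of binary 111 (that is, -1) extends to 128 one-bits, while binary 011 stays 3.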
Bits 37-34 contain the Do-If condition. Practically the whole instruction
set of the 12864 processor is conditional. This lets the programmer avoid a lot
of conditional-Branches that only skip past a few instructions. Where formerly
some code might have: BCS (branch if carry set) followed by a ROT (rotate) that
would be executed if the carry flag was clear, now we can specify Do-the-ROT-If
Carry Clear, and delete the Branch entirely. In fact, with these 4 bits we can
delete the entire collection of Branch operations from the generic instruction
set of the 12864! The Assembler simply translates any Branch to ADD Immediate
Data to the Program Counter, and sets the appropriate Do-If condition-bits. Of
course, most of the time, most instructions will set the Do-If to ALWAYS. With
only 4 bits, only 16 conditions are allowed. This is enough for Motorola's 6809
and 68020; I hope the final design of the 12864 processor won't require more.
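A Branch as "ADD Immediate Data to the Program Counter, Do-If condition" might look like the Python sketch below. The condition names are hypothetical (the essay fixes only the 4-bit width of the field), and `pc` is assumed to already point past the current instruction, as described earlier for Program-Counter-Relative addressing.

```python
MASK64 = (1 << 64) - 1

# Hypothetical subset of the 16 possible Do-If conditions.
CONDITIONS = {
    "ALWAYS":      lambda f: True,
    "CARRY_SET":   lambda f: f["C"],
    "CARRY_CLEAR": lambda f: not f["C"],
    "ZERO":        lambda f: f["Z"],
}

def branch(pc, displacement, condition, flags):
    """Return the new PC: ADD the displacement only if the condition holds."""
    if CONDITIONS[condition](flags):
        return (pc + displacement) & MASK64
    return pc
```

A "branch if carry set" by -4 from PC 100 lands on 96 when Carry is set and simply falls through to 100 when it is clear.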
Bits 33-29 are the Flag Mask bits, the other side of the coin from the Do-
If conditions. If every instruction can be controlled by the flags in the CCS
register, it follows that every instruction should be able to specify which CCS
flags, if any, will be affected as a result of its implementation. In fact, for
the Branch instructions to be properly deleted from the generic instruction set,
it is essential that flag-masking be possible. Traditionally, Branch operations
never affect any flags; translating them into ADD instructions makes it obvious
why we require flag-masking. Now consider again the Do-the-ROT-If Carry Clear
that was previously described: What if the instruction after the ROT is also to
be executed only if the Carry flag is clear? A ROT normally affects the Carry
flag! So we mask the flag; the next instruction can also Do-If Carry Clear. In
the 6809 and the 68020 there are only 5 conditions-of-results flags; I hope the
final design of the 12864 processor won't require more.
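Flag-masking itself is simple to model. In the Python sketch below a masked flag keeps its old value no matter what the instruction would have done to it; whether a set mask bit means "protect" or "affect" is my assumption, and the function name is hypothetical.

```python
def update_flags(old_flags, new_flags, masked):
    """Return the CCS flags after an instruction; masked flags are unchanged."""
    return {name: (old_flags[name] if name in masked else new_flags[name])
            for name in old_flags}
```

So a ROT that would normally set Carry leaves it clear when Carry is masked, and the following instruction can still Do-If Carry Clear.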
Bits 28-0 (yes, all the rest) are devoted to the details of the PUT field.
However, the highest seven of them, Bits 28-22, can have another purpose. There
is a group of operations that perform what we might call 'minor manipulations',
and which may need some minor data. The generic instructions of this class that
I have so far identified are, in alphabetical order: ASL and ASR (arithmetic
shift left and right), COPY, INIT (initialize), ISUB (subtract from an initial
value), LSR (logical shift right; LSL = ASL), ROL and ROR (rotations), and
SWAP. ASL, ASR, LSR, ROL, and ROR need data ranging from 1 to 128; INIT, ISUB,
and sometimes COPY, need twos-complement numbers ranging from -64 to +63. The
specification of 7 bits was decided by the needs of ASL, ASR, LSR, ROL and ROR;
other instructions are merely taking advantage of what is already there. Only
SWAP does not need any of those seven data bits. We could have assigned eight
bits to ASL, etc.; twos-complement numbers from -128 to +127 (with zero = +128)
would let us reduce the list of generic instructions even more. Unfortunately,
we are running out of bits! So we can either assign 5 of 64 generic operations
to various kinds of bit-shift, and use 7 bits to describe the size of the shift
--or we can have 3 generic shift operations and use 8 bits to describe the size
of the shift. But ONLY those 3 generic instructions ever really need that 8th
bit! It seems more reasonable to use an extra 2 of the 64 generic instructions.
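The same seven bits thus get read in two different ways, depending on the generic instruction. A Python sketch follows; encoding a shift of 128 as zero is my assumption (it mirrors the Bitfield Size convention), and the function names are hypothetical.

```python
def shift_count(field7):
    """Shift or rotate count: 1..127 as written; 0 is assumed to mean 128."""
    return 128 if field7 == 0 else field7

def small_constant(field7):
    """INIT/ISUB/COPY constant: 7-bit twos-complement, range -64 to +63."""
    return field7 - 128 if field7 & 0x40 else field7
```

The bit pattern 1111111 is therefore a shift of 127 places to ASL, but the constant -1 to INIT.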
Let's examine some of the capabilities of these 'minor' manipulators:
ASL (and the identical LSL) merely shift bits from Least Significant to
Most Significant. The Bitfield Size determines how many bit-positions will be
involved in the shift. There is also some Bitfield Start data (which we haven't
got to yet, but has to be mentioned NOW) that specifies exactly where among the
128 bits the Bitfield Size is located. The 12864 Assembler needs to scrutinize
these things carefully; we can't let Bit 100 be the Start while the Size is 34
bits, nor let the Size be 52 bits while the Shift is 73 bits! One final thing
about ASL and LSL: Perhaps they shouldn't be so identical. The 6809 processor
defines them so that there's no reasonable difference between an ASL and an LSL.
But the 68020 places a new flag in the CCS register, an eXtend flag designed to
hold a bit of data specifically for arithmetic operations. The Carry flag holds
data for both arithmetic and logical operations. Yet LSL and ASL both affect
the X flag! So perhaps a distinction can reasonably be made: Only ASL should
affect X. (ASR and LSR also have this small irrationality.)
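The sort of scrutiny the Assembler needs to give a shift can be sketched in Python. I am assuming here that the Bitfield Start names the lowest bit of the field and that the field extends upward from it, which is what the Bit 100 example seems to imply; the function name and messages are hypothetical.

```python
def check_shift(start, size, shift, width=128):
    """Return a list of problems the Assembler should report (empty if none)."""
    problems = []
    if start + size > width:
        problems.append("bitfield runs past bit %d" % (width - 1))
    if shift > size:
        problems.append("shift larger than bitfield")
    return problems
```

Both bad examples from the text are caught: Start 100 with Size 34 runs past bit 127, and a Shift of 73 does not fit in a Size of 52.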
ASR and LSR are similar to ASL, of course, their main difference being
that these instructions shift bits from Most Significant to Least. More details
of what they do need not be presented here; they all are common instructions.
But we might note that the power of the 12864 processor lets us get data from
just about any place in the computer (using GET1), shift or otherwise manipulate
any part of that data, and then PUT the result almost anywhere else, all in just
one instruction. The mundane turns into the extraordinary.
INIT lets us initialize a register, or registers, or data at any memory
location, such that it becomes a 64-bit or a 128-bit expansion of any number in
the 7-bit range of -64 to +63. INIT replaces CLR (clear), which initializes a
data-storage place to zero only; now we can initialize to 1, or -1, or to any of
more than a hundred possibilities. Note that INIT never needs any GET1 info.
ISUB replaces both NEG (negate) and NOT, which respectively subtract a
number from 0 or -1. ISUB subtracts numbers from anything in the Initial Value
range of -64 to +63. Finding other uses for this operation is not so important
as consolidating NEG and NOT into one generic instruction. The Assembler will,
of course, retain both NEG and NOT, and translate appropriately.
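The consolidation of NEG and NOT rests on the two's-complement identity that
subtracting from -1 is the same as inverting every bit. A minimal sketch,
assuming 64-bit wraparound (the names isub, neg, and not_ are illustrative):

```python
MASK64 = (1 << 64) - 1


def isub(initial, operand, mask=MASK64):
    """Subtract `operand` from a 7-bit Initial Value, wrapping to 64 bits."""
    if not -64 <= initial <= 63:
        raise ValueError("Initial Value must be in the range -64..+63")
    return (initial - operand) & mask


def neg(operand):
    """NEG as the Assembler might translate it: ISUB with Initial Value 0."""
    return isub(0, operand)


def not_(operand):
    """NOT as the Assembler might translate it: ISUB with Initial Value -1."""
    return isub(-1, operand)


print(hex(neg(5)))
print(hex(not_(0x0F)))
```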
ROL and ROR are pretty much like the shift instructions. The 68020 has
another sort of rotation called ROXL and ROXR, but the 12864 may not need them.
First examine the rotation operation of the 6809: The Carry flag is always part
of the rotation; a bit coming off one end of a byte is moved to Carry by one ROT
and moved out of Carry back into a byte by another ROT. In the 68020 a simple
ROT moves a bit from one end of a location directly to the other end; a copy of
that bit is placed in the Carry Flag. ROX, on the other hand, uses the eXtend
flag the same way the 6809 uses the Carry. In the 12864 processor we can mask
flags that an instruction would normally affect. Suppose the 12864 rotation is
designed to normally flag both X and Carry: If we mask Carry, no copy is sent
there; if we mask X, the bit that normally moves through it simply bypasses it.
(Similarly, the 12864 can have one generic ASL/LSL operation, but the Assembler
can mask the X flag for LSL--if the notion proposed earlier is adopted.)
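Here is one way the single generic rotation could behave, sketched in Python
for a one-bit rotate left. The function name rol1 and its argument order are
made up for illustration; the behaviour follows the supposition above: with X
unmasked the bit travels through X (the ROXL style), with X masked the bit
wraps around directly, and Carry receives a copy unless it is masked.

```python
def rol1(value, width, x_flag, carry, mask_x=False, mask_c=False):
    """Rotate `value` left by one bit; return (result, new_x, new_carry)."""
    top = (value >> (width - 1)) & 1          # bit leaving the high end
    body = (value << 1) & ((1 << width) - 1)  # everything else moves up
    if mask_x:
        incoming = top      # bit bypasses X and wraps straight around
        new_x = x_flag      # X is left untouched
    else:
        incoming = x_flag   # old X enters at the low end
        new_x = top         # departing bit is parked in X
    new_carry = carry if mask_c else top
    return body | incoming, new_x, new_carry


print(rol1(0b1000, 4, x_flag=0, carry=0, mask_x=True))  # plain ROL
print(rol1(0b1000, 4, x_flag=0, carry=0))               # through X
```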
The COPY instruction replaces LoaD, STore, TransFeR, INC, DEC, JMP from
the 6809, and COPY also replaces MOVE from the 68020. Even some LEA operations
can be translated to COPY. The GET1 admode field lets us specify any place in a
12864-based computer from which to fetch data (and any number of bits from 1 to
128); the PUT field lets us specify almost any other place to receive a copy of
those bits of data. What could be simpler and more powerful? To replace INC
and DEC, GET1 can specify admode 2 -- same as PUT. When GET1 holds 2 while COPY
is being processed, the 7-bit Initialize-data will be used to modify the place
specified by PUT. Instead of only -1 or +1, the INC/DEC can now range from -64
to +63 -- even to +64 if the value of zero is interpreted thus (it's no good for
anything else!). Some LEA instructions that the Assembler translates into COPYs
will have admode 0 in GET1, a specified register, and a 16-bit offset ranging
from -32768 to +32767. PUT would specify the same admode and register, and an
offset of zero. Masking the flags is normal for LEA. Larger offsets can become
ADD Immediate Data to a register, with the flags masked. JMP instructions are
translated into COPY to the PC register, with masked flags--and remember that
any JMP can now be conditional! Load and store and transfer and MOVE operations
become COPY memory to register, reg. to mem., reg. to reg., and mem. to mem.
Another 68020 instruction, PEA (push effective address), may be unneeded in the
12864. It has the effect of computing an address and saving it in a place that is
NOT a register, for later use (most likely by the Program Counter, since there
isn't a LEA-to-PC instruction in the 68020). In the 12864 processor, we simply
specify the Program Counter's register-number in the PUT-field data if we want
to LEA-to-PC. Otherwise we can PUT the EA almost anywhere else, for later use.
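The INC/DEC step described above might be coded like this. It is only a
sketch: encode_step and decode_step are hypothetical names, and treating a
zero field as +64 is the optional interpretation floated in the text, not a
settled rule.

```python
def encode_step(step):
    """Encode an INC/DEC amount into the 7-bit Initialize-data field."""
    if step == 64:
        return 0                      # zero is re-used to mean +64
    if step == 0 or not -64 <= step <= 63:
        raise ValueError("step must be -64..-1 or +1..+64")
    return step & 0x7F                # 7-bit two's complement


def decode_step(field):
    """Recover the INC/DEC amount from the 7-bit field."""
    if field == 0:
        return 64
    return field - 128 if field & 0x40 else field


print(encode_step(1), encode_step(-1), encode_step(64))
```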
SWAP is similar to COPY, in that the GET1 data specifies one place while
the PUT data specifies another. However, as the names imply, they do different
things: The 12864 SWAP replaces both the 68020 SWAP and EXG (exchange); data in
the PUT place is sent to the GET1 place, as well as the usual GET1-to-PUT. Two
things to note about SWAP are that register-adjustments of zero, in the specified
admodes, will probably be common, and the CCS flags will usually be masked. But
consider that if the GET1 admode is 2 (same as PUT), then nothing happens. This
may be the ideal thing for the Assembler to translate a NOP into. And if the
flags are NOT masked while the GET1 admode is 2, during the generic SWAP, then
this may be the ideal thing for the Assembler to translate a TST into. (If the
flags aren't masked during a normal SWAP, then they will be affected only by the
data going from the GET1 place to the PUT place.)
Now back to Bits 28-0 of the generic instruction; as mentioned, they hold
the details of the PUT field data; we shall begin with Bits 0-6. These specify
the Bitfield Start for the PUT field, from 0 to 127. After the 12864 processor
analyzes the identity of the place where the result of an instruction is to be
PUT, the Bitfield Start tells it exactly where in that location the result goes.
For most instructions, most of the time, the value here will be Zero.
Bits 7-12 specify the number of the first register needed to identify the
place where the result is PUT. In other words, if Register 7 is the destination
of the data, then a 7 will be here (admode 1 in the PUT field). To modify flag
bits in the CCS register, simply set a Bitfield Size of 5 (for 5 flags), the CCS
register's number here, and a Bitfield Start of zero (assuming the designers put
the CCS flags in the lowest bit-positions of the register). If a memory address
indexed by Register 15 is the data's destination (admode 4 or 5 in PUT), then 15
will be the number placed here. Bits 7-12 can hold any number from 0 to 63, and
as mentioned early in this essay, the 12864 will probably only have 45 registers
or so, total. Anything more than the highest register number would be illegal,
of course, even in the Boss mode! If admode 2 or admode 7 is specified in the
PUT field, then the processor would ignore any register-number in these bits.
Admode 3 would be another, except it is illegal in the PUT field.
Bits 13-28 specify the offset or adjustment to be applied to the register
indicated in Bits 7-12. At least this could be true for instructions OTHER than
ASL, ASR, COPY, etc., because only OTHER instructions never need the 7 bits from
22-28. An index register being used with a ROL instruction can only have a nine
bit offset or adjustment applied to it (in Bits 13-21, of course). HOWEVER, it
can be worse! Bits 13-18 may specify a second register altogether! For admodes
0 and 1, any Bitfield Size 65 or more, we must specify 2 registers. For admodes
6, 10, 11, 14, and 15, a second register is a normal part of address-indexing.
(At least those admodes get a 64-bit offset from the second register, applied to
the first register.) After a second register has been specified, only the bits
from 19-28, or from 19-21, can be used as an offset or adjustment to the second
register (a 10-bit or a 3-bit modification, respectively). Here is a chart:
|2           2|2   1|1         1|1          |             |
|8| | | | | |2|1| |9|8| | | | |3|2| | | | |7|6| | | | | |0|
|     16-bit offset/adjust      |   First   |  Bitfield   |
|   applied to first Register   | Register* |    Start    |
|-------------------------------|           |   for PUT   |
|  ASL, ASR,  |  9-bit off/adj  |           |    only     |
| COPY, INIT, | to 1st Register |           |             |
|  ISUB, LSL, |-----------------|           |             |
|  LSR, ROL,  |3-bit|  Second   |           |             |
|  ROR data   | ad- | Register* | *will be  |             |
|             | just| admode 0, |  ignored  |             |
|             | to  | 1, 6, 10, | in admode |             |
|             | 2nd |  11, 14,  |   2, 7    |             |
|             | Reg.|  and 15   |           |             |
|-------------------|           |           |             |
| 10-bit offset or  |           |           |             |
| adjust to 1st or  |           |           |             |
| 2nd reg, depending|           |           |             |
| on the admode.    |           |           |             |
And just to be complete:
|6         5|5     5|5     5|4     4|4           3|3|3     3|3       2|
|3| | | | |8|7| | |4|3| | |0|9| | |6|5| | | | | |9|8|7| | |4|3| | | |9|
|   12864   | GET1  | GET2  |  PUT  |  Bitfield   | | Do-If |   CCS   |
| Instruc.  | admode| admode| admode|  Size, for  | | Con-  |  Flag   |
|   Code    |       |       |       |   entire    | | dition|  Masks  |
|           |       |       |       |  operation  | |       |         |
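For anyone who wants to experiment with the layout in the two charts above,
here is a small Python sketch that packs and unpacks the first 64 bits of a
generic instruction. It covers only the plain case where Bits 13-28 are one
16-bit offset; the field names, the name "bit38" for the unlabelled bit, and
the functions pack and unpack are all my own inventions. Values are handled
as raw unsigned fields, since how a Size of 128 fits in 7 bits is not settled.

```python
# (name, width in bits), listed from Bit 63 down to Bit 0
FIELDS = [
    ("opcode", 6), ("get1_admode", 4), ("get2_admode", 4), ("put_admode", 4),
    ("bitfield_size", 7), ("bit38", 1), ("do_if", 4), ("flag_masks", 5),
    ("put_offset", 16), ("put_register", 6), ("put_bitfield_start", 7),
]
assert sum(width for _, width in FIELDS) == 64


def pack(**values):
    """Build a 64-bit word from raw field values; missing fields are zero."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a 64-bit word back into a dict of raw field values."""
    fields = {}
    shift = 64
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


word = pack(opcode=5, put_admode=1, put_register=7, put_bitfield_start=33)
print(f"{word:016x}", unpack(word)["put_register"])
```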
Having used up 64 bits of the normal 128-bit fetch by the 12864 processor,
it's obvious that to provide details of the admodes specified for GET1 and GET2,
we will need to use the other 64 bits. Now it has already been stated that they
are supposed to hold Immediate Data or an Absolute Address; the potential for
conflict is obvious! This conflict is the main reason admode 2 was created: It
makes the GET1 field use the admode in the PUT field, thereby eliminating any
need for any specific GET1 information among the second 64 bits of the operation
fetch. And if the GET2 field specifies admode 3 (Immediate Data) or 7 (Absolute
Address), then THAT is all the GET2 information needed, and the instruction can
be properly executed. So the main restrictions of limiting the GET1/GET2/PUT
system to a total of 128 bits are these: (1) We can't combine Immediate Data
with more Immediate Data; (2) We can't combine Immediate Data with data at an
Absolute Address; (3) We can't combine the data at two Absolute Addresses; and
(4) We can't use Immediate Data or an Absolute Address in any instruction where
the GET1 admode is different from PUT. How much does it matter that we can't do
these things? We already can't do them with any current processor, right? What
we CAN do is far more important: Not only can we combine Immediate Data or the
content at an Absolute Address with the content of any register (normal for any
processor), we can also combine our Immediate/Absolute information with the data
at any place in the memory that can be index-referenced -- and save it too! The
typical 12864 program will probably be position-independent, anyway, and seldom
need Absolute Addressing. It likely will start by loading several registers
with the addresses of a number of data tables, all relative to the PC register.
No Immediate Data there! Then the remaining registers will become variable-
holders, and use Immediate Data as needed, just like any
So to be a little more specific about how GET1 and GET2 information is set
among the second group of 64 bits, let's first note that it took all of 29 bits
for the PUT information. Keeping that the same for GET1 and GET2 means that 58
of the 64 bits get assigned real quick! Suppose we assign the Least Significant
32 bits to the GET1 information, and the Most Sig. 32 to the GET2 information.
This leaves 3 bits extra for GET1 and 3 extra bits for GET2. The most obvious
thing to do with the extra bits is to expand the offset/adjustment data (from 16
to 19 bits, for example), but perhaps they can be used for something else. Note
that the ASL, ROL, etc. data takes space away ONLY from the PUT information. A
possible use for one of the extra bits is that of being a flag controlling the
Bitfield Start data: If the flag is zero, then the seven bits hold the number
of the starting bit; if the flag is one, then six of the seven bits specify a
register-number where the information on the starting bit is to be found. It
would have been nice to have had enough bits to do this to the PUT field, but it
may not be missed too much, since the PUT field's Bitfield Start is likely to be
zero most of the time, anyway. So here is one more chart:
The GET2 info duplicates this GET1 info, except Bit-numbers range from 32-63.
|3|3                     1|1         1|1          | |           |
|1|0| | | | | | | | | | |9|8| | | | |3|2| | | | |7|6|5| | | | |0|
| |   18-bit offset/adjust, applied   |   First   | | Register  |
| |         to First Register         | Register  | |  holding  |
| |-----------------------------------|           | |-----------|
| |   12-bit off/adj to   |  Second   |           |  Bitfield   |
| | 1st or 2nd Register,  | Register  |           | Start data  |
| |  depending on admode  |           |           |  for GET1   |
 ^ Flag determining use of register to hold Bitfield Start data
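The GET1 chart can be exercised with a short Python sketch. It handles only
the single-register case (an 18-bit signed offset in Bits 13-30); the names
decode_get_info and sign_extend are hypothetical, and the dictionary keys are
just labels chosen for this illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_get_info(word32):
    """Decode 32 bits of GET1 (or GET2) information, one-register form."""
    info = {
        "offset": sign_extend((word32 >> 13) & 0x3FFFF, 18),  # Bits 13-30
        "first_register": (word32 >> 7) & 0x3F,               # Bits 7-12
    }
    if (word32 >> 31) & 1:
        # Flag set: Bits 0-5 name the register holding the Bitfield Start.
        info["bitfield_start_register"] = word32 & 0x3F
    else:
        # Flag clear: Bits 0-6 are the Bitfield Start itself.
        info["bitfield_start"] = word32 & 0x7F
    return info


# Register 16, offset +20, exact Bitfield Start of 33
print(decode_get_info((20 << 13) | (16 << 7) | 33))
```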
Now let's consider a few things about the 12864 Assembler. Obviously it's
going to recognize many common assembly instructions, and translate them into
the far smaller set of generic instructions recognized by the processor. The set
of 12864 instructions may be enlarged, simply to take advantage of the possible
list of 64. Ordinary instructions like ADD, SUB, ADDC (add with carry), SUBC,
ABCD (add Binary Coded Decimal), SBCD, OR, EOR (exclusive or), and AND may be
supplemented with NOR, ENOR, and NAND. I don't propose to offer a complete list
here; let the Industry decide all the final details. The main thing that needs
some attention right now is the format of the Assembler instructions; each will
occupy a fair amount of space! But this is reasonable, considering that a 12864
instruction will usually equal 2, and often 3, regular-processor instructions.
All the information from 3 regular lines of Assembly code, plus some new stuff,
has to fit on 1 line in this proposed Assembler format:
|Label|Instruc |Bitfield|ASL, etc| GET1 | GET2 | PUT  |Do-If|Flags |Comment
|field|Mnemonic|  Size  |7bt data|admode|admode|admode|cond.|masked| field
The Label field gives this place in a program an optional name, so that it
can be referred to from other places in the program, if desired.
The Instruction Mnemonic is, of course, the name of the instruction.
Bitfield Size (BfSz) is simply a number 1-128; if this part of the Assembly
format is blank, a 128 Size is assumed -- but Admode data may change it to 64.
7-bit data is required in this area whenever the Mnemonic is ASL, ROL, etc.
The nature of this data has already been described. Note exceptions like SWAP
and COPY, which the Assembler knows never need this data. The Assembler offers
INC and DEC instructions that will require 7-bit data; the programmer never need
see this get translated to COPY. Exceptions are peculiar, aren't they?!
Below are examples of the syntax for the addressing modes:
Admode  Syntax          Explanation
0000    16;+20.33       Register 16 has data. A Bitfield Start (BfSt) of 33 is
                        specified, so data extracted from 16 starts at Bit 33
                        (BfSz specifies how many bits). Extracted data will
                        have 20 added to it (register not affected), before
                        being given to the current instruction. An assumed BfSz
                        of 128 would change to 64 (one register specified) minus
                        32 (due to the BfSt). Conflicts cause Assembly errors.
0000    6 9.18          Two registers have data: 6 is Most significant; 9 is
                        Least significant. 9 is the First Register in Bits
                        7-12 of the Specific Information area; recall charts.
                        Data extracted from registers starts at Bit 18.
0000    20:10           Data in register 20. BfSt in register 10. Note that a
                        period denotes exact BfSt data; a colon means a register
                        has the data. BfSt-in-a-register is illegal in PUT.
0000    10 11           Two registers have data. BfSt is assumed to be zero.
                        Note spaces denoting registers. Other admodes will use
                        commas, and admode fields must be tabulation-separated.
0000    7 3;-123        128 bits of data available in registers 7 and 3. BfSz
                        determines how many are extracted. BfSt assumed to be
                        0. 123 subtracted from it before instruction gets it.
0000    URHERE PC:12    The Assembler will accept either 'PC' or the actual
                        number of the PC register (as yet unknown!). Assembler
                        computes offset between content of PC register (what it
                        will be at end of instruction) and place in memory that
                        is designated by label 'URHERE'. Suppose PC register is
0000    PC;-87:12     \ 34, and the programmer knows that the offset is -87:
0000    URHERE 34:12  > would be identical to URHERE PC:12. PC is the only
0000    34;-87:12     / register to which labels can be referenced, because
                        it is the only register that has its value known at all
                        times by the Assembler (relative to Origin of program).
                        Note the :12 means BfSt is in register 12 (no, I don't
                        know why the programmer wants that in this example!).
0001    20+20.33        Check first example; note lack of semicolon here. Plus
                        or minus sign mandatory for all offsets and adjustments.
                        With plus adjust, data first goes from register to the
                        instruction. At END of instruction, data in register
                        is adjusted. With minus adjust, register is adjusted
                        before data extracted from it for use by instruction.
0010                    To specify admode 2, simply leave the field BLANK!
0011    #123456         Immediate data preceded by #. A + or - is optional.
0100    ,20;+20.33      Check first example; note extra comma here. Register 20
                        now being used as index, with an offset of 20 applied to
                        it. The offset value (index+offset) is the address from
                        which data is removed, starting at Bit 33. The data may
                        extend to Bit 127, depending on the BfSz. Note that
                        initial index is always just ONE register.
0100    ,14:2           Register 14 is index of address holding data; register 2
                        has data on where the BfSt is. Specifying offsets,
                        adjustments, or bitfield starts is always optional.
0100    URHERE,PC       Note use of comma. Assembler computes offset between PC
                        and URHERE, as before. In previous example the ADDRESS
                        of URHERE was the information (ignoring fact that BfSt
                        specified in that example made the address useless!);
                        now the memory content at that address is the data.
0101    ,20+20.33       Check similar examples. Here register 20 is an index
                        which is used to fetch data. Afterwards, an adjustment
                        of +20 is applied to the register. If the adjustment is
                        negative, it is applied to the index before the index is
                        used as an address-pointer that tells us where data is.
0110    ,10 12-14:5     Register 10 is the index, holding an address. Register
                        12 has a 64-bit offset to that address. Before offset
                        is applied, register 12 receives adjustment of -14. The
                        address thus found (by applying adjusted offset to
                        register 10) is the address of the data, which will be
                        accessed using the BfSt data in register 5.
0111    >123456         Absolute Address always preceded by > symbol. Out of
                        18.4 quintillion possibilities, this one is pretty low!
1000    [,20;+20]L.33   See admode 0100; the part inside brackets is figured in
                        exactly the same way, resulting in an address. Exactly
                        64 bits are always extracted from that address in this
                        admode. They are the Least Significant 64 bits, as the L
                        indicates. The 64 bits are then used as the address of
                        the data, which starts at Bit 33.
1000    [URHERE,PC]L    The lowest 64 bits of the data at address URHERE are
                        extracted and used as an address. The instruction gets
                        its data from the address thus found. The L (or M, in
                        admodes 12-15) is a mandatory part of the syntax. All
                        programmer does is provide correct syntax; the Assembler
                        will deduce from that syntax the admode number, and the
                        specific info, that are built into the instruction.
1001    [,25-2222]L:13  The value in register 25 is adjusted by -2222 (maximum
                        can be -32768 in PUT before an assembly error occurs, or
                        -131072 in GET1 or GET2), and then the adjusted index is
                        used to fetch an address (least significant 64 bits).
                        In turn the fetched address is used to fetch the needed
                        data, using the BfSt in register 13.
1010    [,5 9-873]L     See the example for admode 6 (0110); bracketed syntax is
                        analyzed the same way, this time using register 5 as the
                        basic index, register 9 as holding the 64-bit offset,
                        and -873 as the adjustment applied to the offset, before
                        the offset is applied to the index. The address thusly
                        computed is the place from which the Least 64 bits are
                        taken and, in turn, are used as the address to fetch the
                        data. Note that -873 is too big for PUT information,
                        but would work as GET1 or GET2 information.
1011    [,18]L 6+3      Value in register 18 is used as an address to fetch an
                        address from the memory. Least significant 64 bits are
                        taken from memory to become an address. Register 6 has
                        a 64-bit offset, which is applied to the extracted
                        address. The thusly-computed new address is the place
                        where data will be found. (I say 'found' or 'fetched',
                        but address is also a possible place to PUT the data.)
                        Afterwards, register 6 is adjusted by +3. This example,
                        if in the PUT admode field, and if the instruction is
                        LSL or one of that group, is using the largest positive
                        allowable adjustment (3 bits, twos-complement). What's
                        the chance of having only 32 generic instructions, so
                        we can move a bit to the PUT information field?
I don't think I need to provide any examples for admodes 12-15; they
are identical to admodes 8-11, with the sole exception that the letter L
in the syntax is replaced by M. The Assembler uses L and M to determine
the correct admode; the 12864 processor uses the admode to determine that
either the Least Significant or the Most Significant 64 bits are to be
taken from the memory and used as an address. This process has absolutely
nothing to do with Bitfield Sizes and Bitfield Starts.
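To show that the Assembler really could deduce the admode number from the
syntax alone, here is a rough Python sketch of a classifier. It is inferred
only from the examples in the table above, so treat every rule in it as an
assumption; classify_admode is a hypothetical name, and a real Assembler
would need far more error checking.

```python
import re

# Trailing Bitfield Start: ".33" (exact) or ":12" (held in a register)
_BFST = re.compile(r"[.:]\w+$")


def classify_admode(field):
    """Guess the 4-bit admode number from one admode field of source text."""
    s = field.strip()
    if not s:
        return 2                          # blank field: same place as PUT
    if s[0] == "#":
        return 3                          # Immediate Data
    if s[0] == ">":
        return 7                          # Absolute Address
    if s[0] == "[":
        inner, _, rest = s[1:].partition("]")
        if not rest or rest[0] not in "LM":
            raise ValueError("L or M is mandatory after ']'")
        base = 8 if rest[0] == "L" else 12
        tail = _BFST.sub("", rest[1:]).strip()
        if tail:                          # offset register outside brackets
            return base + 3
        inner_mode = classify_admode(inner)
        if inner_mode not in (4, 5, 6):
            raise ValueError("bracketed part must be an indexed admode")
        return base + inner_mode - 4
    body = _BFST.sub("", s)
    if "," in body:
        after = body.split(",", 1)[1].strip()
        if " " in after:
            return 6                      # index plus second (offset) register
        signed = "+" in after or "-" in after
        return 5 if signed and ";" not in after else 4
    signed = "+" in body or "-" in body
    return 1 if signed and ";" not in body else 0


for text in ("16;+20.33", "20+20.33", "", ",10 12-14:5", "[,18]L 6+3"):
    print(repr(text), format(classify_admode(text), "04b"))
```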
It should be repeated that these examples are only a proposal; thinking
about them is bound to lead to speculation about how easily the programmer
can make a mistake by forgetting a comma. A whole different syntax might
be created just to reduce the chance of such accidents, perhaps one where
mnemonic letters replace the commas, periods, colons, and semicolons --
even lower-case letters, to prevent confusion between O/offset and 0/Zero.
The syntax shown here simply attempts to make the admode-field information compact.
The next field of the Assembler format, after the PUT admode field, is the
Do-If condition. Two letters suffice to abbreviate the possible conditions (at
least only 2 letters if Motorola's list is used): HI (higher); LS (lower or
same); CC (carry flag clear); CS (carry set); NE (not equal to zero; zero flag
clear); EQ (equal; zero flag set); VC (oVerflow flag clear); VS (oVerflow set);
PL (plus); MI (minus); GE (greater than or equal to zero); LT (less than zero);
GT (greater than zero); and LE (less than or equal to zero). This list totals
14 possible Do-If conditions; with a maximum of 16 allowed, the last two are
usually Do Always and Do Never. For the purpose of the Assembler format, the
Do Always condition can be the default if the Do-If field is simply left blank,
but it wouldn't hurt to allow a DA abbreviation. A DN abbreviation is logically
sensible, but practically almost useless -- a NOP for sure! (If the Assembler
converts NOP to SWAP, as proposed, obviously the Do-If would be Never!). Maybe
some other Do-If condition can be created, just to use that 16th possibility.
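The Do-If test itself is easy to sketch. The flag expressions below are the
usual Motorola definitions for these abbreviations; the table name DO_IF, the
function should_execute, and the DA/DN entries are assumptions for the sake
of illustration, and no 4-bit code numbers are implied.

```python
# Each entry takes the N, Z, V, C flags (as booleans) and says whether to run.
DO_IF = {
    "HI": lambda n, z, v, c: not c and not z,
    "LS": lambda n, z, v, c: c or z,
    "CC": lambda n, z, v, c: not c,
    "CS": lambda n, z, v, c: c,
    "NE": lambda n, z, v, c: not z,
    "EQ": lambda n, z, v, c: z,
    "VC": lambda n, z, v, c: not v,
    "VS": lambda n, z, v, c: v,
    "PL": lambda n, z, v, c: not n,
    "MI": lambda n, z, v, c: n,
    "GE": lambda n, z, v, c: n == v,
    "LT": lambda n, z, v, c: n != v,
    "GT": lambda n, z, v, c: not z and n == v,
    "LE": lambda n, z, v, c: z or n != v,
    "DA": lambda n, z, v, c: True,    # Do Always
    "DN": lambda n, z, v, c: False,   # Do Never
}


def should_execute(cond, n, z, v, c):
    """Evaluate a Do-If abbreviation; a blank field defaults to Do Always."""
    name = cond.strip().upper() or "DA"
    return bool(DO_IF[name](bool(n), bool(z), bool(v), bool(c)))


print(should_execute("", 0, 0, 0, 0))
print(should_execute("EQ", n=0, z=1, v=0, c=0))
```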
After the Do-If field in the Assembler instruction format is the Flag Mask
field. Motorola's flags are abbreviated X, N, Z, V, and C, so simply putting an
appropriate letter (or letters) in this field should tell the Assembler that you
don't want a particular flag to be affected by the current instruction. Simply
entering ZCN without any punctuation should be adequate to specify the Carry,
Zero, and Negative-sign flags, for example. Now consider the opposite notion:
Some Assembler instructions, like LEA, will be translated into other operations,
and the flags will automatically be masked by the Assembler during translation.
In the 6809 processor there are two registers Y and U, which are not treated the
same by LEA instructions. LEAY will affect the Zero flag, while LEAU will not.
The idea is to let register Y be used in counting loops, and it works fine. The
12864 Assembler could allow the same sort of thing: If the programmer specifies
the Z flag in the Mask field during an LEA instruction, then the Assembler WON'T
mask the flag! More precisely, what is happening is the programmer telling the
Assembler to reverse its normal handling of the 12864 flagmask bits. If the
Assembler usually doesn't mask a flag, then it will be masked -- and vice-versa.
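The reversal rule amounts to an exclusive-OR between the Assembler's default
mask for an instruction and the letters the programmer writes. A minimal
sketch, assuming an arbitrary X-N-Z-V-C bit order in the 5-bit mask (the real
order is not specified) and hypothetical names throughout:

```python
# Assumed bit positions within the 5-bit CCS Flag Masks field
FLAG_BITS = {"X": 0b10000, "N": 0b01000, "Z": 0b00100, "V": 0b00010, "C": 0b00001}
ALL_FLAGS = 0b11111


def flag_mask_bits(default_mask, field):
    """Toggle the default mask for every flag letter named in the field."""
    mask = default_mask
    for letter in set(field.strip().upper()):   # repeated letters count once
        mask ^= FLAG_BITS[letter]
    return mask


# An ordinary instruction masks nothing by default; "ZCN" masks three flags.
print(format(flag_mask_bits(0, "ZCN"), "05b"))
# LEA masks everything by default; naming Z lets Z be affected (like LEAY).
print(format(flag_mask_bits(ALL_FLAGS, "Z"), "05b"))
```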
The last field of the Assembler format is the Comment field, in which the
programmer is supposed to explain the purpose of the instruction. This field is
completely ignored by the Assembler, of course, during the task of creating the
machine code for the 12864 processor from the assembly source listing.
And now my two-cents-worth on the hardware of the 12864 computer; if what I
am about to say is really worth as much as two cents, I'll be surprised! The
average computer has a System Clock that controls the timing of everything that
goes on in the computer. The average microprocessor accesses the memory every
(fill in blank) cycles of the System Clock, on the average. The remaining clock
cycles are spent by the processor processing the data it has accessed. Some of
the newer processors have 'preprocessors' built into them, so they can access
the memory significantly more often. The preprocessors begin working on future
instructions before the main processor finishes the current instruction; it is
known as 'pipelining', I believe. The 12864 will be both similar to and different
from this scheme. It'll likely have one main processor for the main instruction,
and 3 subprocessors to handle the data represented by GET1, GET2, and PUT. It
figures that if the average 12864 instruction is as complex as 2 or 3 regular-
processor instructions, the 12864 may have to do as many memory-accesses as 2 or
3 'regulars'. Yet by processing GET1, GET2, and PUT simultaneously, the 12864
is essentially doing the work of the 'pipeliners'. Whether or not pipelining of
the current sort is actually built into the 12864 remains to be seen. In the
meantime, though, the 12864 is still going to spend a number of clock cycles in-
between memory-accesses, during which it is processing the accessed data. Since
it is fairly obvious that the more often a processor can access the memory, the
greater the performance of the computer, the standard trick is to increase the
speed of the System Clock, and to build both processors and memory chips to keep
up. Nevertheless, this does not change the fact that the processor spends many
clock-cycles NOT accessing the memory! And I get the impression that the memory
chips are not keeping up with the processors, in the speed race. So here is my
suggestion: Build the 12864 with a faster clock than the System Clock. It will
have to hold its outside lines open for more than one internal clock cycle each
time its subprocessors access memory (to stay in sync with the System Clock),
but while it is doing that, its main processor can be manipulating previously-
accessed data. With proper planning the 12864 should be able to access memory
almost every cycle of the System Clock, at the memory's maximum possible speed.
I have been saving the thorniest problem for last (at least I think the end
of this essay is approaching!), and it concerns the hardware's management of the
data. The first part of the problem is this: While most 12864 instructions are
128 bits long, many will be fully described in only 64 bits. So do we make the
processor skip the other 64 bits, and move on to the next memory location, or do
we scheme to fit another whole instruction in those 64 bits? My inclination is
to ignore the 64 bits, UNLESS it 'just happens' that two adjacent instructions
in the assembly source listing can both be reduced to 64 bits. In other words,
what the processor would do is load 128 bits, discover that the first 64 of them
comprise a complete instruction, execute that instruction, and test the next 64
bits to see if they also comprise a complete instruction. If they don't, they
will be ignored, and the processor will load 128 bits from the next address. It
would be worth having this scheme just to give the programmers a chance to prove
they are clever enough to always make full use of it. Any programmer who NEVER
attempts to conserve memory should be fired! (And so what if there are more
than 18.4 quintillion memory locations -- waste is waste.)
The other aspect of the memory management problem concerns the Stacks, which
are places where random numbers of registers are temporarily stored. If each
address a Stack register points at holds 128 bits, and each register being saved
is only 64 bits wide, then it seems at first obvious to always put 2 registers
at the Stack address. But many times an odd number of registers will be saved;
what then? The very simplest answer is to always only store 1 register at each
Stack address, and ignore the obvious waste, because this way the processor can
never get confused. The next-simplest answer may be to REQUIRE the programmer
to always PUSH or PULL an even number of registers when using the stack -- even
a JSR (jump to subroutine) instruction would have to save another register with
the Program Counter, just to keep the total even. I think I may recommend this
particular solution (would you believe I have been worrying about this since the
middle of this essay, and just now have come up with the idea?).
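The even-number rule makes the stacking logic almost trivial, as this Python
sketch suggests. The function name pack_stack_slots is hypothetical, and
putting the first register of each pair in the Most Significant half is an
arbitrary choice made only for the illustration.

```python
MASK64 = (1 << 64) - 1


def pack_stack_slots(register_values):
    """Pair up 64-bit register values into 128-bit stack locations."""
    if len(register_values) % 2:
        raise ValueError("an even number of registers must be stacked")
    slots = []
    for i in range(0, len(register_values), 2):
        high = register_values[i] & MASK64
        low = register_values[i + 1] & MASK64
        slots.append((high << 64) | low)    # one full 128-bit location
    return slots


print([hex(slot) for slot in pack_stack_slots([1, 2, 3, 4])])
```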
The bit-code format of instructions like JSR, BSR, PSH, PUL, and MOVEM can't
be the same as the format for most 12864 instructions. The main reason is, as
mentioned, that the instruction has to incorporate a list of registers -- but it
works out OK, because much of the instruction is predefined. Before we get into
any details of that, though, let us examine the Stacking system a little closer.
In the 6809 there are two Stack registers, one of which is always used by the
hardware to save JSR and interrupt information, and one of which the programmer
can use for other things. There are occasions when having two Stacks is really
convenient, notably when moving large blocks of data around. In the 68020 there
are three Stack registers, one for the Boss mode, one for the Interrupt mode,
and one for the Peon. Two bits in the CCS register are devoted to keeping track
of which Stack the hardware is using at the moment, so if it had been wanted, a
fourth Stack could exist in the 68020. This seems worth putting in the 12864.
And another thing: TWO CCS registers! One would be a Boss mode CCS that keeps
track of things like the current Stack being used and interrupt-control flags,
as well as the list of registers to be saved during an Interrupt, as proposed at
the beginning of this essay. The other would have the instruction-result flags
in it and some other stuff. MOST of that other stuff is another register list,
like that in the Boss CCS. Thus when a GSR instruction is used (generic for JSR
and BSR: go to subroutine) a list of registers could be specified that would be
saved in the Peon CCS. Here is a proposed bit-map for GSR:
|6         5|5     5|5     5|4     4|4                |
|3| | | | |8|7| | |4|3| | |0|9| | |6|5| | |.....| | |0|
| Code For  | GET1  | GET2  | Do-If |  Register List  |
|    GSR    |cannot |       | (PUT  |  Note Peon CCS  |
|Instruction|  be   |       | is PC | and PC registers|
|           |admode |       |always)|  not on list;   |
|           |   2   |       |       |  always saved.  |
If GET2 is admode 2 then data specified by GET1 is copied to PC --
equivalent to JSR. If GET2 is any other admode then the data it
specifies is added to the data GET1 specifies, and the result is
copied to PC. If GET1 specifies PC then we have a BSR equivalent.
The CCS instruction-result flags are NEVER affected by this one.
Normal limitations: No adding Immediate Data to Absolute Address!
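A quick Python sketch of the GSR bit-map and the JSR/BSR rule stated in the
notes above. The names decode_gsr and gsr_target are hypothetical, and the
assumption that Bit r of the Register List stands for register number r is
mine; the chart does not say which bit belongs to which register.

```python
MASK64 = (1 << 64) - 1


def decode_gsr(word):
    """Split the first 64 bits of a GSR instruction into its fields."""
    get1 = (word >> 54) & 0xF
    if get1 == 2:
        raise ValueError("GET1 cannot be admode 2 in GSR")
    return {
        "opcode": (word >> 58) & 0x3F,
        "get1_admode": get1,
        "get2_admode": (word >> 50) & 0xF,
        "do_if": (word >> 46) & 0xF,
        # Bits 0-45: one bit per register to be saved (assumed bit r = reg r)
        "registers": [r for r in range(46) if (word >> r) & 1],
    }


def gsr_target(get1_value, get2_admode, get2_value=0):
    """New PC: GET1 data alone (JSR style), or GET1 plus GET2 data."""
    if get2_admode == 2:
        return get1_value & MASK64
    return (get1_value + get2_value) & MASK64


word = (5 << 58) | (0 << 54) | (2 << 50) | (15 << 46) | 0b1010
print(decode_gsr(word)["registers"], hex(gsr_target(0x1000, 2)))
```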
It is worth noting that the Register List, from 0 to 45, is in agreement
with the early estimate of approximately 45 registers total for the 12864. If
there are any registers that we can be sure NEVER need to be saved during a GSR,
even during the Boss mode, then we can have a few more than the 48 implied here.
When executing a GSR, the processor would copy the specified register list to
the Peon CCS register, save them all on the current Stack, THEN save both the PC
and Peon CCS registers. When an Interrupt occurs, the last two registers saved
would always be PC and the Boss CCS (although the Peon CCS would be saved just
before then). One bit in the same place in the two CCS registers would serve to
identify which is which; this bit cannot be allowed to be changed by anything.
Then when the generic RTN (return) instruction is executed, 128 bits of PC and
CCS data would be taken from the memory; the correct CCS would be identified,
and the correct way of returning would follow. One thing to note about RTN from
a subroutine: The instruction is almost completely pre-defined. The only odd
thing is that the values of the instruction-result flags in CCS BEFORE the RTN
occurs have to be preserved while CCS data is being loaded from the Stack during
the actual RTN operation. Unless various flag-masks are set by the programmer!
The bit-coding of RTN only needs 6 bits for the instruction, 4 bits for Do-If,
and 5 bits for flag-masks (flags the programmer does not want preserved during
the RTN from a subroutine); the rest of the 64 bits can be ignored. Programmers
should be wary of specifying any flag-masks for RTN at the end of an Interrupt
handling routine, since here the normal thing for the processor to do is to NOT
preserve the flags as they exist at the end of the Interrupt handler. Masking
them would mean transferring Interrupt data to the interrupted program. This
would be OK if the interrupted program was specifically waiting for such....
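
To make the flag handling of RTN concrete, here is a rough sketch in Python of
how the final CCS might be chosen. It is only an illustration: the position of
the five result flags (FLAG_BITS), the position and polarity of the fixed
identity bit (BOSS_BIT), and the function name rtn_ccs are all assumptions of
this sketch, not part of the proposal.

```python
FLAG_BITS = 0b11111   # ASSUMPTION: five result flags sit in the low bits of CCS
BOSS_BIT = 1 << 63    # ASSUMPTION: fixed identity bit, 1 = Boss CCS, 0 = Peon CCS

def rtn_ccs(current_ccs, stacked_ccs, flag_mask=0):
    """Return the CCS value in effect after RTN (illustrative only)."""
    flag_mask &= FLAG_BITS
    if stacked_ccs & BOSS_BIT:
        # Return from Interrupt: flags come from the Stack by default;
        # a masked flag is carried over from the handler instead.
        keep_current = flag_mask
    else:
        # Return from subroutine: current flags survive by default;
        # a masked flag is reloaded from the Stack instead.
        keep_current = FLAG_BITS & ~flag_mask
    return (stacked_ccs & ~keep_current) | (current_ccs & keep_current)
```

With no flag-masks, a subroutine return keeps all five current flags, while an
Interrupt return restores the stacked CCS unchanged.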
PSH, PUL, and MOVEM-type instructions can all be combined into one generic
instruction, I think, that we can call STAK. The bit-coding for it might be
like this:
    |6         5|5    |5|5     5|4     4|4                    |
    |3| | | | |8|7| | |4|3| | |0|9| | |6|5| | | |.....| | | |0|
    |   STAK    |Con- | | Do-If |  PUT  |  Register List, to  |
    |instruction|trol | |       |       |    be stacked or    |
    |   code    |Bits | |       |       |      unstacked      |
    |           |     | |       |       |                     |
    |           |     | |       |       |                     |
^ ^ ^
PUT specifies the address where the stack is to start. If LOCATION
OF ADDRESS is in memory somewhere, one Control Bit denotes L or M
for the 128 bits at that location, from which the stack's address
will be fetched -- no bitfield specs! After STAK is finished, the
PUT place is given a new value, indicating the new start of the stack.
(Immediate Data still forbidden in PUT, of course.) One Control Bit
specifies top or bottom of stack; another Control bit specifies data
being added to or removed from the stack. As always, only an EVEN
total number of registers may be specified. Bit 54 means that the
Peon CCS register is part of the stack operation. STAK never affects
flags, except when loading CCS from this kind of stack. (I forgot to
say, details of PUT can be in the other 64 bits of the instruction fetch.)
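
As a rough check on the STAK bit-coding, the 64-bit word can be picked apart
with shifts and masks. The sketch below is in Python and rests on some
assumptions: the field boundaries are read off the diagram (bits 57-55 as the
three Control Bits, bit 54 on its own), the opcode value 0x2A is made up, and
counting the Peon CCS toward the "even total" rule is a guess. The names
STAK_FIELDS, decode_stak, stacked_registers and is_even_total are hypothetical.

```python
# Field layout as read from the STAK bit-coding (bit 63 is the most significant).
STAK_FIELDS = {
    "opcode":   (58, 6),   # bits 63-58: STAK instruction code
    "control":  (55, 3),   # bits 57-55: L/M, top/bottom, add/remove
    "peon_ccs": (54, 1),   # bit 54: Peon CCS is part of the operation
    "do_if":    (50, 4),   # bits 53-50: Do-If condition
    "put":      (46, 4),   # bits 49-46: PUT specifier
    "reg_list": (0, 46),   # bits 45-0: one bit per register
}

def decode_stak(word):
    """Split a 64-bit STAK word into a dict of named field values."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, (shift, width) in STAK_FIELDS.items()}

def stacked_registers(fields):
    """List the register numbers selected in the Register List."""
    return [n for n in range(46) if (fields["reg_list"] >> n) & 1]

def is_even_total(fields):
    # ASSUMPTION: the Peon CCS counts toward the "even total" rule.
    return (len(stacked_registers(fields)) + fields["peon_ccs"]) % 2 == 0

# Hypothetical opcode 0x2A; registers 0, 1 and 2 plus the Peon CCS.
word = (0x2A << 58) | (0b101 << 55) | (1 << 54) | (0x3 << 50) | (0x9 << 46) | 0b111
fields = decode_stak(word)
```

Here three listed registers plus the Peon CCS make four 64-bit items, so the
stack operation would move exactly two 128-bit memory words.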
That about wraps it up, I guess. Any inconsistencies you may have noticed
are due to the fact that this is only a proposal, and therefore does not need to
be perfect. Only if the Industry decides to get together to create a standard
microprocessor along these lines would it be necessary to get really finicky on
all the details. And what do I want out of this? First of all, I want to beat
the NIH Syndrome: 'If it is Not Invented Here, we are not interested!' Except
for the fact that computers I own and know well happen to have 6809s in them, I
am not associated in any significant way with any company in the entire computer
industry. I will claim the credit for dreaming up this thing, just to prevent
anyone else from doing so -- and just to prevent any person or any company from
claiming ownership of it, I am quite deliberately placing this whole concept in
the public domain, as of NOW. Thus the whole industry starts off on an equal
basis with respect to the proposed 12864 microprocessor, and there should now be
no barrier to creating an industry-wide standard. I am knowingly forfeiting all
legal claim to any compensation for these ideas, just to prove I seriously want
the Industry to get its act together. On the other hand, any 'royalties of
conscience' that might come my way will be gladly accepted!
March 17, 1991