The 12864 Microprocessor

                                 March 17, 1991



    This is a daydream...let me call it the 12864....                          


    Let me start by saying I am prejudiced in favor of the 6809 microprocessor,

created by Motorola.  That it was the best of its day was confirmed when NASA  

decided to use it in the Space Shuttle's main computers.  I personally feel that

the 6809 should have had wider acceptance in the personal-computer market, and 

that Motorola snubbed its potential by introducing the 68000 too quickly.  Only

recently, with the widespread use of the 32-bit microprocessors, has the 6809  

really become outclassed.  So it is time to move on, time to create a new best 

microprocessor of the day.  Since this is currently only my own dream, it has  

been greatly influenced by what I know of the 6809, and also what I have learned

about the 68020.  I do not mean to ignore any worthy contributions from other  

microprocessors; that is in fact the main reason for this essay!  I am sharing 

my dream in hopes that it may be catching....                                  


    The 12864 is a 128/64-bit microprocessor.  It has 64 address lines, and all

registers are 64 bits wide.  But it also has 128 data lines, and this is why:  

First, being able to handle this many bits at once means that the 12864 doesn't

need a coprocessor; most coprocessors only handle 80 bits or so.  Therefore the

12864 also doesn't need a secondary instruction set telling it how to talk to a

coprocessor.  A second reason for having a 128-bit data path leads to further  

simplification of the microprocessor:  All its instructions have been carefully

designed to fit within 128 bits, so that a single memory-access can provide the

12864 with a whole instruction.  To make this still more efficient, the computer

that incorporates a 12864 will be required to have 128-bit-wide memory, and not

the common 8-bit-wide or 9-bit-wide memory of most of today's microcomputers.  

This means that the 64-bit Program Counter or PC register is always incremented

just once for each instruction pulled from the memory.  The 12864 is not much of

an evolutionary offshoot from previous microprocessors; it's a radical mutation.

Only in the efficiency of its instruction set does it relate to the 6809....   


    With 128-bit memory, design decisions made in the 6809 and 68020 are greatly

simplified in the 12864.  Example:  Because the 6809 fetched instructions only 8

bits at a time, there were two distinct groups of Branch instructions: an 8-bit

branch and a 16-bit branch.  Machine code that used 8-bit branches as often as 

possible was both shorter and faster than code that always used 16-bit branches,

because only one byte of memory and 1 clock-cycle of time was needed for 8-bit 

branching-data, while 2 bytes and 2 cycles were needed for 16-bit data.  (Not to

mention that 8-bit-branch INSTRUCTIONS were themselves only 8 bits, while most 

16-bit branches also had 16-bit opcodes.)  And in the 68020 processor, although

there are 8-bit, 16-bit, and 32-bit branch instructions, the 32-bit type

requires an extra fetch of data from the memory.  But the 12864 processor needs

only one size of branch instruction, because any 64-bit branch-distance will   

always fit into a one-clock-cycle 128-bit opcode+data fetch.                   
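
    A short Python sketch may make the byte-counting concrete. It is only an illustration. The function names are invented here. The 6809 sizes are the standard ones: 2 bytes for a short branch, 3 for LBRA or LBSR, and 4 for the prefixed long conditional branches. The 12864 figure simply follows from the one-instruction-per-128-bit-word rule described above.

```python
def m6809_branch_bytes(offset, conditional=True):
    """Bytes a 6809 relative branch occupies for a given signed offset."""
    if -128 <= offset <= 127:
        # Short form: 8-bit opcode + 8-bit offset (BRA, BEQ, BCS, ...).
        return 2
    if -32768 <= offset <= 32767:
        # Long form: LBRA/LBSR use an 8-bit opcode, the long conditional
        # branches a 16-bit (prefixed) opcode; both carry a 16-bit offset.
        return 4 if conditional else 3
    raise ValueError("offset beyond the 6809's 16-bit range")


def branch_12864_bytes(offset):
    """Hypothetical 12864: every instruction is one 128-bit (16-byte) word."""
    if not -(1 << 63) <= offset < (1 << 63):
        raise ValueError("offset beyond a 64-bit range")
    return 16  # the branch size never depends on the distance


print(m6809_branch_bytes(40), m6809_branch_bytes(4000))  # 2 4
print(branch_12864_bytes(40), branch_12864_bytes(4000))  # 16 16
```

Sixteen bytes is not small. The point is that the choice between branch forms disappears, and so does the assembler logic needed to make that choice.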


    Likewise, because any 64-bit address in the memory can be part of a 128-bit

fetch, there is no longer any need for a special Direct Page or DP register.  In

the 6809 the DP register offered an 8-bit way to access part of the memory; thus

the longer and slower 16-bit way of specifying memory locations did not always 

have to be used.  This is not a problem in the 12864.                          
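
    For readers who never used a 6809, the Direct Page trick amounts to one line of arithmetic. The sketch below shows it next to the 16-bit extended form it was meant to avoid. The function names are invented for illustration.

```python
def m6809_direct_ea(dp, offset8):
    """6809 direct mode: DP supplies the high byte, a 1-byte operand the low."""
    if not (0 <= dp <= 0xFF and 0 <= offset8 <= 0xFF):
        raise ValueError("DP and the direct operand are both 8 bits")
    return (dp << 8) | offset8


def m6809_extended_ea(hi_byte, lo_byte):
    """6809 extended mode: a full 16-bit address (2 bytes) follows the opcode."""
    return (hi_byte << 8) | lo_byte


# Same location, but one operand byte instead of two when DP = 0x20:
print(hex(m6809_direct_ea(0x20, 0x34)), hex(m6809_extended_ea(0x20, 0x34)))
```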


    Now what about the choice to use 64-bit-addressing?  This represents about 

18.4 quintillion addresses (18,446,744,073,709,551,616 addresses, to be exact),

far beyond any reasonable projection of any computer's memory needs -- including

virtual memory!  Not to mention that since each address holds 128 bits of data,

we are actually talking about 295 quintillion (8-bit) bytes of memory!         
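
    The arithmetic is easy to check. Python integers have no size limit, so a quick sketch can reproduce both figures exactly.

```python
addresses = 2 ** 64                 # 64 address lines
bytes_per_address = 128 // 8        # each address holds 128 bits of data
total_bytes = addresses * bytes_per_address

print(f"{addresses:,} addresses")   # 18,446,744,073,709,551,616 addresses
print(f"{total_bytes:,} bytes")     # 295,147,905,179,352,825,856 bytes
```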


    Nevertheless, there are some possibly valid reasons for this choice:  First,

since the design of this processor is not yet completely fixed, and belongs to 

nobody, it might be that it could tickle the fancy of a number of different chip

manufacturers, and lead to an Industry-Wide Standard Design.  Naturally, it makes

sense for the 12864 assembly language instruction set to become standardized and

non-proprietary, also.  Therefore a second reason for choosing 64-bit addressing

is simply that it would take longer to put this complex chip into production --

and that hopefully gives the software developers plenty of time to convert their

existing software to run on this admittedly incompatible processor.  Thus, both

the new computers and their software could arrive at the same time!  Finally, a

third reason for jumping straight to 64-bit addressing is that the architecture

of the new computers can be designed with that in mind.  Simply because 64 bits

represents such a tremendous enhancement, making it the immediate goal means it

can remain a standard far into the future....                                  


    Now let's get into some of the details of the 12864.  The total number of  

registers of all types will be about 45, give or take a few.  This number can  

be decided after the Condition-Code/Status Register has had its bits defined.  

As stated earlier, every register is 64 bits wide, including CCS.  In the CCS  

register a number of bits are necessary for various processor functions; just  

how many depends on the total list of functions that will be designed.  For the

purposes of this essay, let us examine the CCS register of the 68020:  It is 16

bits wide, of which 12 are defined and 4 are undefined.  If we start with a

64-bit CCS and use only 12 of its bits for such things as result-of-instruction

flags, interrupt masks, etc., then that leaves 52 bits that can be matched to

the entire register set of the 12864 microprocessor.  However, it is certain

that some of those 52 bits will be dedicated to other processor functions

(though I don't know what other dreamers will add to this), and so the number

of registers is yet unknown.


    In case you are wondering why the bits of CCS are matched to the register set,

the answer involves the interrupt system.  Whenever an interrupt or exception or

other special event occurs, the processor can automatically save on a stack all

the registers that are specified in the CCS register.  The processor saves time

because none of those interrupt-type handling routines need include instructions

to specifically save and recover the registers they use.  In fact, if the 12864

computer system's main power-up/initialization routine includes defining such a

list of registers in CCS, then all interrupt-type routines can be written using

only those registers.  Different boot software, different registers.  Note that

2 registers, the Program Counter and CCS, which ALWAYS are saved, do NOT need to

be matched to bits in CCS, and so the 12864 can have 2 more registers than the 

simple count of available bits in CCS implies.                                 
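
    The save-on-interrupt idea can be sketched as a walk through a bit mask. Everything specific here is an assumption made for illustration:

    - the low 12 CCS bits hold flags and masks, as in the 68020 count above;
    - the remaining 52 bits map, in order, onto register names;
    - the function and register names are invented.

```python
FLAG_BITS = 12                # assumed: CCS bits 0-11 are flags and masks
SAVE_BITS = 64 - FLAG_BITS    # the other 52 bits each select one register
ALWAYS_SAVED = ["PC", "CCS"]  # saved on every interrupt, no CCS bit needed


def registers_to_save(ccs, register_names):
    """List the registers an interrupt would push, given the CCS contents.

    register_names[i] is the register matched to CCS bit FLAG_BITS + i.
    """
    if len(register_names) > SAVE_BITS:
        raise ValueError("more registers than CCS has save bits")
    chosen = [name for i, name in enumerate(register_names)
              if (ccs >> (FLAG_BITS + i)) & 1]
    return ALWAYS_SAVED + chosen


names = [f"R{i}" for i in range(SAVE_BITS)]
boot_ccs = (1 << 12) | (1 << 15)            # boot code picks R0 and R3
print(registers_to_save(boot_ccs, names))   # ['PC', 'CCS', 'R0', 'R3']
```

Under these assumptions the layout allows at most 52 + 2 = 54 registers, which is the "2 more registers" remark above.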


    The next thing to discuss is the actual list of registers.  A major element

in the design of the 12864 is that as far as the programming instruction set is

concerned, all registers are treated equal.  But as far as the microcode and the

hardware are concerned, some are more equal than others....  For the sake of this

discussion, let us assume that there are 45 registers, numbered from 0 to 44.  

Suppose that Register 33 is the Program Counter, while Register 17 is just an  

ordinary general-purpose register.  The hardware will always use 33 as a pointer

to the current instruction about to be executed, and the hardware will always  

adjust 33 to point it at the appropriate next instruction.  But the instruction

set will not distinguish 33 from 17!  A Logical-OR instruction that manipulates

a group of bits inside 17 can just as easily manipulate bits inside 33, simply 

by specifying 33 instead of 17, in the Logical-Or instruction.  Just because   

this is something that might be disastrous to the program is no reason to keep 

it from being possible!  Let the assembly-language programming tool be written 

so that it catches such dubious instructions, and warns the programmer!  The big

advantage of this scheme is that it leads to an extremely significant reduction

in the total complexity of both the instruction set and the microcode.  Examples

later on in this essay may make this more clear.                                
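
    A toy model may show how little the instruction set has to know about which register is special. Register 33 as the PC follows the example above. The class, the method, and the warning text are invented for this sketch.

```python
MASK64 = (1 << 64) - 1
PC = 33  # numbering taken from the example in the text, not a fixed design


class RegisterFile:
    def __init__(self, count=45):
        self.regs = [0] * count

    def logical_or(self, reg, bits):
        # One code path for every register: the PC gets no special case.
        self.regs[reg] = (self.regs[reg] | bits) & MASK64


def assembler_warnings(dest_reg):
    """Hypothetical assembler check for dubious writes to special registers."""
    if dest_reg == PC:
        return ["warning: this instruction modifies the Program Counter"]
    return []


rf = RegisterFile()
rf.logical_or(17, 0xF0)   # an ordinary register
rf.logical_or(PC, 0x0F)   # legal, but the assembler would complain
```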


    Let us now examine the bit-format of some of the instructions.  By far the 

majority of the instructions will have a single format that offers astounding  

programming potential (well, what do you expect with 128 bits to play with!)....

Actually, most of this instruction-group format fits into 64 bits, numbered 0 to

63, and defined as follows:                                                    


      Bits 63-58: These 6 bits hold the actual generic instruction.  Of course 

this means that there are only 64 such instructions, but if you have any doubts

about this being enough, you don't yet realize how generic they are!           


      Bits 57-46 are divided into three groups of 4 bits each, hereinafter to be

referred to as 'admode fields', short for 'addressing mode'.  Since these fields

have 4 bits, it follows that there are 16 different addressing modes.  They will

be explained shortly.  The first admode field, bits 57-54, tells the processor 

where to find the first chunk of data needed for some instruction, say a SUB.  

The second admode field, bits 53-50, tells the 12864 where to find the second  

chunk of data; obviously a SUB instruction needs data that can be subtracted.  

And the third admode field, bits 49-46, tells the processor where to put the   

result of the SUB.  Perhaps you now see that with 16 addressing modes for each 

admode field, a simple generic SUB instruction can encompass both registers and

memory in quite a few different combinations!                                   
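
    Pulling these four fields out of an instruction word takes only shifting and masking. The sketch below covers just bits 63-46 as defined so far. The helper names are arbitrary choices for illustration, and so is the opcode value 42 in the usage line.

```python
def field(word, hi, lo):
    """Bits hi..lo (inclusive) of an instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def encode_top(generic, get1, get2, put):
    """Pack the generic instruction and the GET1/GET2/PUT admode fields."""
    for value, width in ((generic, 6), (get1, 4), (get2, 4), (put, 4)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    return (generic << 58) | (get1 << 54) | (get2 << 50) | (put << 46)


def decode_top(word):
    return {
        "generic": field(word, 63, 58),  # one of the 64 generic instructions
        "GET1": field(word, 57, 54),     # where the first operand comes from
        "GET2": field(word, 53, 50),     # where the second operand comes from
        "PUT": field(word, 49, 46),      # where the result goes
    }


word = encode_top(42, 2, 3, 0)  # 42 is an arbitrary opcode number
print(decode_top(word))
```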


        For convenience, let us call the admode fields GET1, GET2, and PUT.  A 

list of proposed addressing modes follows, and if it is adopted, there will be a

few restrictions on the use of two of them.  Should the list be modified during

later design stages of the 12864, these restrictions may still apply.  The modes

subject to restriction are marked with * symbols; the limitations are detailed 

at the end of the list.  The admodes are numbered 0 to 15 in binary.           


           Direct Modes 0 to 3                  Semi-Direct Modes 4 to 7

 0000 Register Data+16\9\10\3bit Offset  0100 Reg Address + 16\9bit Offset

 0001 Register Data+16\9\10\3bit Adjust  0101 Reg Address + 16\9bit Adjust

 0010 GET1=PUTmode, or GET2 or PUT=NONE  0110 Reg Addr+(Reg+10\3bit Adj) Offset

*0011 Immediate 64bit Data              *0111 Absolute 64bit Address           


                             Indirect Modes 8 to 15

 1000 [Reg + 16\9bit Offset],LSig64bits  1100 [Reg + 16\9bit Offset],MSig64bits

 1001 [Reg + 16\9bit Adjust],LSig64bits  1101 [Reg + 16\9bit Adjust],MSig64bits

 1010 [Reg+(Reg+10\3bit Adj)Offst],LS64  1110 [Reg+(Reg+10\3bit Adj)Offst],MS64

 1011 [Reg],LS64+(Reg+10\3bit Adj)Offst  1111 [Reg],MS64+(Reg+10\3bit Adj)Offst



   Register Data:     The value in a register is considered to be data.

   Reg Address:       The value in a register is considered to be an address.

   16\9bit, 10\3bit:  16 or 9 or 10 or 3 bits of twos-complement information,

                      sign-extended to 64 bits (internally) by 12864 processor.

   64bit:   64 bits of information fetched with the 64-bit generic instruction.

   Adjust:  Value in register is modified, using twos-complement information.

            If info is negative, register adjusted BEFORE instruction executed.

            If info is positive, register adjusted AFTER instruction executed.

   Offset:  Similar to Adjust, but register not modified.  Computation of the

            Offset is always performed before instruction is executed.

   NONE:    No data or address at all.

   []:      Value inside brackets is an address.  Information at that address

            is in turn used as an address.

   ,LSig64bits ,LS64:  An address holds 128 bits of data, of which the Least

                       Significant 64 bits are selected for the instruction.

   ,MSig64bits ,MS64:  The Most Significant 64 bits at an address.

   (Reg):    Distinguishes a second register that this addressing mode uses.   
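
    The sketch below shows how a short field is sign-extended and how Adjust differs from Offset. The function names are invented. A plain list stands in for the 12864's register set. The encoded field width is passed in explicitly, because the text leaves the choice between the 16\9 and 10\3 widths to be detailed later.

```python
MASK64 = (1 << 64) - 1


def sign_extend(info, bits):
    """Treat the low `bits` of info as twos-complement; return a Python int."""
    info &= (1 << bits) - 1
    return info - (1 << bits) if info >> (bits - 1) else info


def offset_value(regs, r, info, bits):
    """Offset: register value plus the sign-extended info; register unchanged."""
    return (regs[r] + sign_extend(info, bits)) & MASK64


def adjust_value(regs, r, info, bits):
    """Adjust: the register itself changes.

    Negative info is applied BEFORE the instruction uses the value,
    positive info AFTER (pre-decrement / post-increment).
    """
    delta = sign_extend(info, bits)
    if delta < 0:
        regs[r] = (regs[r] + delta) & MASK64
        return regs[r]
    used = regs[r]
    regs[r] = (regs[r] + delta) & MASK64
    return used


regs = [0] * 45
regs[5] = 100
print(adjust_value(regs, 5, 0x1F8, 9), regs[5])  # 9-bit 0x1F8 is -8: 92 92
print(adjust_value(regs, 5, 0x004, 9), regs[5])  # +4 applied after: 92 96
print(offset_value(regs, 5, 0x1F8, 9), regs[5])  # 88 96, register untouched
```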


   *  Recall design decision that limits instructions to 128 bits, including 64

      bits of Immediate Data or Absolute Address.  It's quite obvious that only

      one Admode Field can get to use those 64 bits.  It also works out that

      if different admodes exist in all three Admode Fields, then none of the

      *-admodes may be placed in any Admode Field.  And in any instruction that

      uses data acquired through the GET2 field, or in which GET1 is different

      from PUT...such instructions exclude the Absolute-64bit-Address mode from

      the PUT field.  (Of course, Immediate Data mode is always excluded from

      the PUT field.)  More details of these limits will be provided later; for

      now it might be noted that the reason that no 64bit-Offset modes exist is

      to avoid a lot of trouble.  It makes the programmer use more registers

      for indexing, but eliminates much competition between the Admode Fields

      for the use of the 64 bits that accompany the instruction.  Besides, you

      might be surprised by how well other instructions can replace any 64-bit

      Offset modes!  Anyway the 12864 processor will probably have 30 or more

      general-purpose registers (registers that the hardware neither modifies

      for specific purposes, as it does CCS and the Stack pointers, nor uses for

      other purposes, such as pointing at cache or program data).  It may

      be easy to find enough available registers for most address-pointing.
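
    One way to read these limits is as a checker an assembler might run. This is an interpretation, not the essay's specification. It treats admode 2 (NONE, or "same as PUT" in GET1) as naming no separate place. Under that reading, "GET1 is different from PUT" means GET1 is not admode 2, which is what lets an ADD Immediate to a register pass. The function name and messages are invented.

```python
NONE_OR_SAME = 2            # admode 2: "same as PUT" in GET1, NONE elsewhere
IMMEDIATE, ABSOLUTE = 3, 7  # the two *-admodes
STAR_MODES = {IMMEDIATE, ABSOLUTE}


def star_violations(get1, get2, put):
    """List the *-admode limits an instruction breaks (one possible reading)."""
    problems = []
    fields = (get1, get2, put)
    if put == IMMEDIATE:
        problems.append("Immediate Data can never be a destination")
    if sum(mode in STAR_MODES for mode in fields) > 1:
        problems.append("only one field may use the accompanying 64 bits")
    # Admode 2 names no new place: NONE, or (in GET1) the PUT location.
    named = [mode for mode in fields if mode != NONE_OR_SAME]
    if len(set(named)) == 3 and STAR_MODES & set(named):
        problems.append("three different admodes leave no room for *-admodes")
    if put == ABSOLUTE and (get2 != NONE_OR_SAME or get1 != NONE_OR_SAME):
        problems.append("Absolute Address not allowed in PUT here")
    return problems


print(star_violations(2, 3, 0))  # ADD Immediate to a register: []
print(star_violations(2, 2, 7))  # one-operand operation at an absolute address: []
print(star_violations(0, 7, 1))  # three different admodes: one problem
```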


        Now for some descriptions of the 16 admodes and their consequences:    


        Direct Modes 0 to 3 all specify data the 12864 processor has on hand, in

a register, or just loaded along-with or as-part-of the instruction.  Obviously,

these modes can be executed more quickly than the Semi-Direct or Indirect Modes.


        (0) In this admode the data needed by the current instruction is in one

or two of the registers.  The 12864 processor has 128 data lines; just because 

every register is only 64 bits wide is no reason to limit its ability to process

128 bits.  ANY TWO registers may be put together, in any order, to make a place

that holds 128 bits!  (Okay, I exaggerated; the 12864 will have both a 'Boss'  

mode and a 'Peon' mode.  Only the Boss mode can put ANY two registers together;

in the Peon mode a lot of combinations will be illegal.  And, even in the Boss 

mode, a lot of combinations will be undesirable, like using a Stack pointer with

a Cache-pointing register; the 12864 Assembler would warn the programmer.)  Note

that admode 0 merely declares that one or two registers will be used; the actual

register(s) specified are elsewhere among the many bits of this generic format.

After the processor identifies the register(s) holding the data, an offset will

be applied to that data.  The offset quantity shall be used by the 12864 in its

implementation of the instruction; the register(s) holding the data will not be

affected by the offset.  The maximum size of the offset is affected by how many

registers are used, and by the type of generic instruction being performed; the

details of this will be provided later.  The main purpose of admode 0 is to let

us eliminate the LEA (load effective address) instructions from the processor's

list of 64 generics--but certainly other uses will be found for it.            


        (1) This admode is very much like admode 0.  The only real difference is

that the content of the register(s) IS affected by this admode, which makes the

mode useful in counting loops.  One thing to keep in mind is that any negative 

adjustment is performed before the whole instruction is implemented, while any 

positive adjustment is performed after the overall instruction is implemented. 

This admode also helps us eliminate LEA instructions (details later).          


        (2) This is the only admode with a double meaning.  If admode 2 is used

within the GET1 field, then it means that the first chunk of data, needed by the

instruction, is currently in the place specified by the PUT field.  Thus data at

some location, after manipulation, will return to that location.  If we exactly

specify the same admode in both the GET1 and PUT fields (instead of using admode

2 in GET1), we end up being unable to use Immediate Data at all--you'll see!  If

admode 2 is used in the GET2 field, then it means NONE, no data for that part of

the instruction.  Operations like LSH (logical shift) use admode 2 in GET2; they

need only one main data chunk since any other data is part of the instruction's

definition.  In fact, if any admode besides 2 is in GET2 during an LSH or similar

instruction, then the admode should be ignored, or declared illegal.  If admode

2 is in GET2 during a SUB or similar instruction, then the net effect of the SUB

will be equivalent to a TST instruction.  (With lots of TST-equivalents, there 

need not be a specific TST among the 64 generics.  But the 12864 Assembler may 

include a TST, and translate it into an equivalent.)  Finally, admode 2 in the 

PUT field also means NONE, no address.  The computed result of the SUB or other

manipulation is not put anywhere, and this is useful, too!  The definition of a

CMP (compare) is exactly a SUB that doesn't save the result!  So the CMP becomes

another common instruction that the 12864 processor excludes from its list of 64

generics....  Like TST, the 12864 Assembler can include CMP, and translate it to

an equivalent:  a destinationless SUB.  Similarly, the 6809 BIT operation is an

AND instruction with no destination.  Designers unite!  The 12864 has a full set

of destinationless instructions--and no extra complexity!  Moving on, suppose  

admode 2 is in both GET1 and PUT:  This is basically a no-operation, NOP.  Lots

of ways exist to do a NOP; the 12864 Assembler can include NOP, and translate. 
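
    The Assembler's job here is table lookup. Below is a sketch using a hypothetical `translate` function. It leaves out the register details and shows only the generic instruction and the three admode fields. Choosing SUB for NOP is arbitrary, since any generic with nothing fetched and nothing stored would do.

```python
NONE = 2  # admode 2: NONE in GET2 or PUT, "same place as PUT" in GET1


def translate(mnemonic, get1=NONE, get2=NONE, put=NONE):
    """Map convenience mnemonics onto generic instructions.

    Returns (generic, GET1, GET2, PUT) admodes; register details omitted.
    """
    if mnemonic == "CMP":   # a SUB whose result is not saved
        return ("SUB", get1, get2, NONE)
    if mnemonic == "BIT":   # the 6809 BIT: an AND with no destination
        return ("AND", get1, get2, NONE)
    if mnemonic == "TST":   # SUB with no second operand and no destination
        return ("SUB", get1, NONE, NONE)
    if mnemonic == "NOP":   # nothing fetched, nothing stored
        return ("SUB", NONE, NONE, NONE)
    return (mnemonic, get1, get2, put)  # already a generic instruction


print(translate("CMP", 0, 3))  # ('SUB', 0, 3, 2)
```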


        (3) This is the Immediate Data admode.  Since the instruction is often 

64 bits long, while 128 bits are always fetched from memory, admode 3 tells the

instruction to use as data the group of 64 bits fetched with the instruction.  


        Semi-Direct Modes 4 to 7 all specify that the data the processor has on

hand are memory-addresses of the data needed by the instruction.  Admodes 4 to 7

are slower than the Direct Modes because the 12864 has to go fetch the data from

the memory, but this process is still faster than the Indirect Modes.          


        (4) This admode is like admode 0 in operation.  The main difference is 

that only one register is ever specified, since one register holds 64 bits and 

the memory addressing range is 64 bits.  But the offset is figured the same way

as admode 0, and the value in the register is not changed.  As mentioned, the  

result of the offset computation is a memory address; the data at that location

is fetched for use by the instruction.                                         


        (5) This admode combines features of admode 1 and admode 4.  Again only

one register is specified as an address-pointer, or index (4).  An adjustment of

the value in that register will be applied, pre-decrement or post-increment (1).

If you review admodes 1 and 4, this one should be pretty obvious.              


        (6) The basic addressing mode for doing 64-bit (or any size larger than

16-bit) offsets is admode 6.  One register is specified as a pointer (index) to

the general region of memory; a second register is specified that will hold the

offset from the general place to any specific place.  Furthermore, this second 

register can be given a predecrement or postincrement adjustment, which makes it

easy to skip through tables of data.  Note that although the second register is

adjustable, its value is only an offset; the first register remains unchanged. 


        (7) This admode specifies that the 64 bits fetched along with the 64-bit

generic instruction are the absolute memory address of the data it needs.
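
    The address arithmetic of admodes 4 to 7 can be sketched in a few functions. The sketch assumes that offsets and adjustments arrive already sign-extended as Python integers and that addresses wrap at 64 bits. The function names are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def adjusted(regs, r, adjust):
    """Pre-decrement if adjust < 0, else post-increment; returns value used."""
    if adjust < 0:
        regs[r] = (regs[r] + adjust) & MASK64
        return regs[r]
    used = regs[r]
    regs[r] = (regs[r] + adjust) & MASK64
    return used


def ea_mode4(regs, r, offset):            # Reg Address + Offset
    return (regs[r] + offset) & MASK64


def ea_mode5(regs, r, adjust):            # Reg Address + Adjust
    return adjusted(regs, r, adjust)


def ea_mode6(regs, base, index, adjust):  # Reg Addr + (Reg + Adjust) Offset
    # Only the index register is adjusted; the base register never changes.
    return (regs[base] + adjusted(regs, index, adjust)) & MASK64


def ea_mode7(absolute64):                 # Absolute 64-bit Address
    return absolute64 & MASK64
```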


        Indirect Modes 8 to 11 are quite like Indirect Modes 12 to 15:  They are

computed the same way, but at some point the data at an address is used as an  

address.  Now since the data is always 128 bits and addresses are only 64 bits,

which 64 of the 128 do we use?  Admodes 8-11 use the Least Significant 64

bits of the 128, while admodes 12-15 use the Most Significant 64 bits.         


        Note that all the admodes that use registers as indexes let the Program

Counter be used as easily as any other register.  The 12864 processor needs no 

special microcode to provide a host of Program-Counter-Relative admodes, due to

basic design decision making the instruction set handle all registers equally. 

The trick to consistency is for the processor to apply any adjustment or offset

to the chosen index register AFTER incrementing PC past the current instruction.

This in turn works because of the design choice to make ALL instructions fit in

128 bits.

Nevertheless, the 12864 Assembler may specifically distinguish Program-Counter-

Relative admodes from the other admodes, and translate appropriately.  Finally,

note that it may be undesirable to use the PC register as a data-pointer in any

admode that will adjust the value of the index!                                
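
    Here is a sketch of the PC-relative case. It assumes that "incrementing PC past the current instruction" means adding 1 (one 128-bit word per address) and that a 16-bit Offset field is available. The names are hypothetical.

```python
MASK64 = (1 << 64) - 1
OFFSET_BITS = 16  # assumed: the 16-bit variant of the Offset field


def pc_relative_ea(pc_of_current, offset):
    """Admode 4 using the PC: step past the current instruction, then offset."""
    next_pc = (pc_of_current + 1) & MASK64  # one 128-bit word per instruction
    return (next_pc + offset) & MASK64


def pc_relative_offset(pc_of_current, target):
    """The offset an assembler would encode to reach `target`."""
    offset = target - (pc_of_current + 1)
    limit = 1 << (OFFSET_BITS - 1)
    if not -limit <= offset < limit:
        raise ValueError("too far for a 16-bit offset; use a register (admode 6)")
    return offset


off = pc_relative_offset(100, 200)
print(off, pc_relative_ea(100, off))  # 99 200
```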


          (8) This admode first computes an address in exactly the same way as 

admode 4.  The 12864 processor then fetches the lowest 64 bits from the memory 

at that address, and uses this information as another address.  The instruction

will use the data in the memory at the second address.


          (9) This admode first computes an address in exactly the same way as 

admode 5.  Then an address is fetched, and then data, as just described.       


          (10) This admode first computes an address in exactly the same way as

admode 6.  Then an address is fetched, and then data, as just described.       


          (11) This addressing mode starts by using the value in a register as 

an address.  The 64 lowest bits in the memory at that address are fetched; they

will be used as a second address.  However, before they are used, an offset will

be applied to that second address.  A second register is specified, along with 

an adjustment.  The value in this predecremented/postincremented register is the

64-bit offset that is applied to the second address; the first register's value,

and the memory that held the second address, are not changed by this process.  

After computing the new, offset address, the 12864 processor fetches 128 bits of

data from that location in the memory, for the current instruction.            


          (12) This admode first computes an address in exactly the same way as

admode 4.  The 12864 processor then fetches the highest 64 bits from the memory

at that address, and uses this information as another address.  The instruction

will use the data in the memory at the second address.


          (13) This admode first computes an address in exactly the same way as

admode 5.  Then an address is fetched, and then data, as just described.       


          (14) This admode first computes an address in exactly the same way as

admode 6.  Then an address is fetched, and then data, as just described.       


          (15) This addressing mode starts by using the value in a register as 

an address.  The 64 highest bits in the memory at that address are fetched; they

will be used as a second address.  Then everything proceeds just like admode 11.
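
    Admodes 8 to 15 add one memory read for the pointer. The sketch makes three assumptions:

    - memory is a Python dictionary from 64-bit addresses to 128-bit values, and unwritten locations read as 0;
    - for admodes 8-10 and 12-14, the first address has already been computed as in admodes 4-6;
    - the function names are invented.

```python
MASK64 = (1 << 64) - 1


def half(word128, most_significant):
    """Select the Most or Least Significant 64 bits of a 128-bit memory word."""
    return (word128 >> 64) & MASK64 if most_significant else word128 & MASK64


def indirect_data(memory, first_address, mode):
    """Admodes 8-10 and 12-14, given the address computed as in admodes 4-6."""
    pointer = half(memory.get(first_address, 0), most_significant=mode >= 12)
    return memory.get(pointer, 0)


def indirect_offset_data(memory, regs, r, index, adjust, mode):
    """Admodes 11 and 15: pointer at [Reg], then (Reg + Adjust) as offset."""
    pointer = half(memory.get(regs[r], 0), most_significant=mode == 15)
    if adjust < 0:                       # pre-decrement the second register
        regs[index] = (regs[index] + adjust) & MASK64
        offset = regs[index]
    else:                                # post-increment it
        offset = regs[index]
        regs[index] = (regs[index] + adjust) & MASK64
    # The first register and the pointer word in memory are left unchanged.
    return memory.get((pointer + offset) & MASK64, 0)
```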


          Now to show how LEA (load effective address) needn't be included among

the 64 generics.  Consider admode 15:  At the end of its computations the 12864

processor has an address which it normally uses, right now, to fetch data, after

which the address is not saved.  LEA creates that address and saves it for later

use and re-use (it doesn't use the address now).  Suppose admode 15 specifies

register 10 (first), register 7 (second), and an adjustment of -58.  The

Assembler translates LEA (with syntax specifying admode 15 and the register

info) into an ADD:  The GET1 field is given admode 4, register 10, and a 0

offset; the processor fetches 128 bits from that address (a part of the generic

instruction we haven't reached yet lets us select the correct 64 bits, here the

Most Significant ones).  The GET2 field is given admode 1, register 7, and an

adjustment of -58; the processor modifies the register and gives its content to

the ADD.  Then the PUT field specifies where to save the result.  If register 7

should not change, GET2 can instead use admode 0 with an offset of -58, and get

the same sum without modifying register 7.  Any LEA can be translated!
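
    The claim that the ADD lands on the same address as admode 15 can be checked with a small simulation. The register numbers and the -58 adjustment come from the example above. The memory contents, the starting register values, and the function names are assumptions made for the sketch.

```python
MASK64 = (1 << 64) - 1


def adjusted(regs, r, adjust):
    """Pre-decrement if adjust < 0, else post-increment; returns value used."""
    if adjust < 0:
        regs[r] = (regs[r] + adjust) & MASK64
        return regs[r]
    used = regs[r]
    regs[r] = (regs[r] + adjust) & MASK64
    return used


def ms64(word128):
    return (word128 >> 64) & MASK64


def address_by_mode15(memory, regs, first, second, adjust):
    """The address admode 15 computes just before it would fetch data."""
    pointer = ms64(memory.get(regs[first], 0))
    return (pointer + adjusted(regs, second, adjust)) & MASK64


def lea_as_add(memory, regs, first, second, adjust):
    """The Assembler's ADD: GET1 = admode 4 (first reg, offset 0, MS 64 bits),
    GET2 = admode 1 (second reg, same adjustment)."""
    get1 = ms64(memory.get((regs[first] + 0) & MASK64, 0))
    get2 = adjusted(regs, second, adjust)
    return (get1 + get2) & MASK64


memory = {0x40: (0x9000 << 64) | 0x1111}   # made-up pointer word
regs_a = [0] * 45
regs_a[10], regs_a[7] = 0x40, 100
regs_b = list(regs_a)
print(hex(address_by_mode15(memory, regs_a, 10, 7, -58)))  # 0x902a
print(hex(lea_as_add(memory, regs_b, 10, 7, -58)))         # 0x902a
```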


      At last we can continue the bit-designations of the generic instruction  

format.  That's been about 1000 bytes of explanation per bit, so far...!


      Bits 45-39 specify a Bitfield Size for the instruction.  These 7 bits can

hold any number from 0 to 127, and with 0 being interpreted by the processor as

128, it becomes possible for the instruction to operate on any data size from 1

to 128 bits.  Even though the registers of the 12864 microprocessor are only 64

bits wide, its Arithmetic/Logic Unit is 128 bits wide, and is able to handle any

data size smoothly.  So if the Bitfield Size is 79, then 79 bits will be taken 

from the place specified via GET1, manipulated (if the instruction requires it)

with 79 bits from the place that GET2 indicates, and finally a 79-bit result is

sent to the place described by PUT.  The 12864 Assembler considers the Bitfield

Size to be optional information; if it is not provided by the programmer, a size

of 128 bits will be assumed.  Some Assembler instructions, like LEA, default to

64 bits due to the nature of the instruction (LEA computes a 64-bit address).  

MUL will always have two 64-bit inputs and one 128-bit output; DIV will always 

have a 128-bit dividend, a 64-bit divisor, and a 128-bit quotient.  And whenever

Immediate Data is specified, then either the whole instruction must be limited 

to 64 bits, or the processor must allow 64 bits to be used in the manipulation 

of 128 bits.  (Perhaps we can have both:  The processor can have the ability to

do the latter, while the Assembler lets the programmer decide the former.)  One

way the programmer can set the Assembler's default to 64 bits would be to simply

specify only one data-holding register in an instruction's syntax.             
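
    Here is a sketch of the Bitfield Size rule, with invented names. It ignores the Bitfield Start information (described later) and simply treats the field as starting at bit 0. It also shows why MUL's 128-bit output is always enough.

```python
MASK64 = (1 << 64) - 1


def bitfield_size(field_45_39):
    """7-bit Bitfield Size: 1..127 as written, 0 stands for 128."""
    if not 0 <= field_45_39 <= 0x7F:
        raise ValueError("Bitfield Size is a 7-bit field")
    return field_45_39 or 128


def sized_add(a, b, size):
    """ADD on the low `size` bits only (Bitfield Start ignored here).

    Returns (result, carry_out)."""
    mask = (1 << size) - 1
    total = (a & mask) + (b & mask)
    return total & mask, total >> size


def mul_64x64(a, b):
    """Two 64-bit inputs: the product never needs more than 128 bits."""
    return (a & MASK64) * (b & MASK64)


print(sized_add((1 << 79) - 1, 1, bitfield_size(79)))  # (0, 1)
```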


      Bit 38 of the generic instruction is the Signed Extension Flag.  It tells

the processor to treat the result of an operation as a twos-complement number, 

if this bit is set.  When the result is PUT into its destination, its negative-

ness or positive-ness, as it exists within the Bitfield Size, is extended out to

the Bit-127-mark (the Most Significant Bit is numbered 127; the Least is 0).  If

only one register is specified, then sign-extending the result out to the

Bit-63-mark is the thing to do.  If the Signed Extension Flag is not set, the

result of the instruction is simply PUT into its destination, and nothing else

is done.
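
    The Signed Extension Flag reduces to copying one bit upward. In this sketch the unsigned case returns just the field itself, because the text does not say what happens to the destination bits above the Bitfield Size. The function name is invented.

```python
def put_result(result, size, signed, dest_width):
    """Trim a result to the Bitfield Size and, if the Signed Extension Flag
    is set, copy the field's sign bit up through bit dest_width - 1
    (127 normally, 63 when only one register is the destination)."""
    field_mask = (1 << size) - 1
    field = result & field_mask
    if signed and (field >> (size - 1)) & 1:
        field |= ((1 << dest_width) - 1) & ~field_mask
    return field


print(hex(put_result(0b1010, 4, True, 64)))   # -6 in 4 bits -> 0xfffffffffffffffa
print(put_result(0b1010, 4, False, 64))       # 10
```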


      Bits 37-34 contain the Do-If condition.  Practically the whole instruction

set of the 12864 processor is conditional.  This lets the programmer avoid a lot

of conditional-Branches that only skip past a few instructions.  Where formerly

some code might have:  BCS (branch if carry set) followed by a ROT (rotate) that

would be executed if the carry flag was clear, now we can specify Do-the-ROT-If

Carry Clear, and delete the Branch entirely.  In fact, with these 4 bits we can

delete the entire collection of Branch operations from the generic instruction 

set of the 12864!  The Assembler simply translates any Branch to ADD Immediate 

Data to the Program Counter, and sets the appropriate Do-If condition-bits.  Of

course, most of the time, most instructions will set the Do-If to ALWAYS.  With

only 4 bits, only 16 conditions are allowed.  This is enough for Motorola's 6809

and 68020; I hope the final design of the 12864 processor won't require more.  
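
    The essay does not assign numbers to its 16 Do-If conditions. This sketch borrows the 68000/68020 condition tests, in their usual 4-bit order, purely as a stand-in. It shows a Branch rewritten as a conditional ADD to the PC. It assumes the PC has already stepped past the instruction (+1) when the ADD reads it.

```python
MASK64 = (1 << 64) - 1

# 68000/68020 condition tests in their 4-bit order, used here as a stand-in.
CONDITIONS = [
    ("T",  lambda n, z, v, c: True),
    ("F",  lambda n, z, v, c: False),
    ("HI", lambda n, z, v, c: not c and not z),
    ("LS", lambda n, z, v, c: c or z),
    ("CC", lambda n, z, v, c: not c),
    ("CS", lambda n, z, v, c: c),
    ("NE", lambda n, z, v, c: not z),
    ("EQ", lambda n, z, v, c: z),
    ("VC", lambda n, z, v, c: not v),
    ("VS", lambda n, z, v, c: v),
    ("PL", lambda n, z, v, c: not n),
    ("MI", lambda n, z, v, c: n),
    ("GE", lambda n, z, v, c: n == v),
    ("LT", lambda n, z, v, c: n != v),
    ("GT", lambda n, z, v, c: not z and n == v),
    ("LE", lambda n, z, v, c: z or n != v),
]


def do_if(code, n, z, v, c):
    """Evaluate a 4-bit Do-If condition against the N, Z, V, C flags."""
    return bool(CONDITIONS[code][1](n, z, v, c))


def branch_as_add(pc, displacement, code, flags):
    """'Bcc' rewritten as ADD Immediate to PC, Do-If cc (flags masked)."""
    next_pc = (pc + 1) & MASK64        # PC already past this instruction
    if do_if(code, *flags):
        return (next_pc + displacement) & MASK64
    return next_pc


flags = (False, True, False, False)            # N, Z, V, C: result was zero
print(branch_as_add(100, 10, 7, flags))        # EQ holds, taken: 111
print(branch_as_add(100, 10, 6, flags))        # NE fails, not taken: 101
```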


      Bits 33-29 are the Flag Mask bits, the other side of the coin from the Do-

If conditions.  If every instruction can be controlled by the flags in the CCS 

register, it follows that every instruction should be able to specify which CCS

flags, if any, will be affected as a result of its implementation.  In fact, for

the Branch instructions to be properly deleted from the generic instruction set,

it is essential that flag-masking be possible.  Traditionally, Branch operations

never affect any flags; translating them into ADD instructions makes it obvious

why we require flag-masking.  Now consider again the Do-the-ROT-If Carry Clear

that was previously described:  What if the instruction after the ROT is also to

be executed only if the Carry flag is clear?  A ROT normally affects the Carry 

flag!  So we mask the flag; the next instruction can also Do-If Carry Clear.  In

the 6809 and the 68020 there are only 5 conditions-of-results flags; I hope the

final design of the 12864 processor won't require more.                        
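
    Flag masking can be sketched as a per-flag choice between old and new values. Two things here are assumptions, since the text fixes neither: the bit order (C at bit 0 up through X at bit 4) and the convention that a 1 means "this flag may change".

```python
FLAG_ORDER = ("C", "V", "Z", "N", "X")  # assumed: mask bit 0 = C ... bit 4 = X


def apply_flag_mask(old, new, mask):
    """Flags whose mask bit is 1 take the instruction's new value;
    the rest keep their old value."""
    return {name: (new[name] if (mask >> bit) & 1 else old[name])
            for bit, name in enumerate(FLAG_ORDER)}


before = dict(X=False, N=False, Z=False, V=False, C=False)
after_rot = dict(X=False, N=True, Z=False, V=False, C=True)  # what ROT produced
kept = apply_flag_mask(before, after_rot, 0b01110)  # let N, Z, V change only
print(kept["C"], kept["N"])  # False True: the next Do-If Carry Clear still holds
```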


      Bits 28-0 (yes, all the rest) are devoted to the details of the PUT field.

However, the highest seven of them, Bits 28-22, can have another purpose.  There

is a group of operations that perform what we might call 'minor manipulations',

and which may need some minor data.  The generic instructions of this class that

I have so far identified are, in alphabetical order:  ASL and ASR (arithmetic  

shift left and right), COPY, INIT (initialize), ISUB (subtract from an initial 

value), LSR  (logical shift right; LSL = ASL), ROL and ROR (rotations), and    

SWAP.  ASL, ASR, LSR, ROL, and ROR need data ranging from 1 to 128; INIT, ISUB,

and sometimes COPY, need twos-complement numbers ranging from -64 to +63.  The 

specification of 7 bits was decided by the needs of ASL, ASR, LSR, ROL and ROR;

other instructions are merely taking advantage of what is already there.  Only 

SWAP does not need any of those seven data bits.  We could have assigned eight 

bits to ASL, etc.; twos-complement numbers from -128 to +127 (with zero = +128)

would let us reduce the list of generic instructions even more.  Unfortunately,

we are running out of bits!  So we can either assign 5 of 64 generic operations

to various kinds of bit-shift, and use 7 bits to describe the size of the shift

--or we can have 3 generic shift operations and use 8 bits to describe the size

of the shift.  But ONLY those 3 generic instructions ever really need that 8th 

bit!  It seems more reasonable to use an extra 2 of the 64 generic instructions.
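    A minimal sketch of how those seven bits might be read two different ways.  It assumes that a raw zero means a count of 128 for the shift and rotate operations, and that INIT, ISUB, and COPY read the same bits as a twos-complement number; `decode_7bit` is a hypothetical name.

```python
SHIFT_OPS = {"ASL", "LSL", "ASR", "LSR", "ROL", "ROR"}   # counts 1..128
VALUE_OPS = {"INIT", "ISUB", "COPY"}                     # numbers -64..+63


def decode_7bit(mnemonic: str, raw: int) -> int:
    """Interpret the raw 7-bit field (0..127) for one of the 'minor' manipulators."""
    if not 0 <= raw <= 0x7F:
        raise ValueError("the field is only 7 bits wide")
    if mnemonic in SHIFT_OPS:
        # A shift of zero would be useless, so raw zero stands for 128.
        return 128 if raw == 0 else raw
    if mnemonic in VALUE_OPS:
        # Twos-complement: raw 64..127 represent -64..-1.
        return raw - 128 if raw & 0x40 else raw
    raise ValueError(f"{mnemonic} does not use the 7-bit field")
```

So the same raw value 127 is a rotation of 127 positions for ROL, but the number -1 for INIT.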


      Let's examine some of the capabilities of these 'minor' manipulators:    


        ASL (and the identical LSL) merely shift bits from Least Significant to

Most Significant.  The Bitfield Size determines how many bit-positions will be 

involved in the shift.  There is also some Bitfield Start data (which we haven't

got to yet, but has to be mentioned NOW) that specifies exactly where among the

128 bits the shifted field begins.  The 12864 Assembler needs to scrutinize

these things carefully; we can't let Bit 100 be the Start while the Size is 34 

bits, nor let the Size be 52 bits while the Shift is 73 bits!  One final thing 

about ASL and LSL:  Perhaps they shouldn't be so identical.  The 6809 processor

defines them so that there's no reasonable difference between an ASL and an LSL.

But the 68020 places a new flag in the CCS register, an eXtend flag designed to

hold a bit of data specifically for arithmetic operations.  The Carry flag holds

data for both arithmetic and logical operations.  Yet LSL and ASL both affect  

the X flag!  So perhaps a distinction can reasonably be made:  Only ASL should 

affect X.  (ASR and LSR also have this small irrationality.)                   
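    The two impossible combinations mentioned above are easy for an Assembler to catch.  A minimal sketch, assuming bit positions 0-127 and Bitfield Sizes and shift counts of 1-128; `shift_operand_errors` is a hypothetical helper, not part of any existing assembler.

```python
def shift_operand_errors(bf_start: int, bf_size: int, shift: int) -> list:
    """Return a list of problems with a shift's bitfield and count (empty if fine)."""
    errors = []
    if not 0 <= bf_start <= 127:
        errors.append("Bitfield Start must be 0-127")
    if not 1 <= bf_size <= 128:
        errors.append("Bitfield Size must be 1-128")
    if not 1 <= shift <= 128:
        errors.append("shift count must be 1-128")
    if bf_start + bf_size > 128:
        # e.g. Start 100 with Size 34 would need Bits 100-133
        errors.append("bitfield runs past Bit 127")
    if shift > bf_size:
        # e.g. Size 52 with Shift 73 would push every bit out of the field
        errors.append("shift is larger than the bitfield")
    return errors
```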


        ASR and LSR are similar to ASL, of course, their main difference being 

that these instructions shift bits from Most Significant to Least.  More details

of what they do need not be presented here; they all are common instructions.  

But we might note that the power of the 12864 processor lets us get data from  

just about any place in the computer (using GET1), shift or otherwise manipulate

any part of that data, and then PUT the result almost anywhere else, all in just

one instruction.  The mundane turns into the extraordinary.                    
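    A sketch of such a shift applied to only part of a 128-bit value, with the bits outside the bitfield left alone.  It assumes ASR refills vacated positions with copies of the field's own top bit while LSR refills them with zeros (as the 6809 and 68020 do for whole registers); flag effects are left out, and `shift_field` is a hypothetical name.

```python
MASK128 = (1 << 128) - 1


def shift_field(value: int, op: str, bf_start: int, bf_size: int, count: int) -> int:
    """Apply ASL/LSL, ASR, or LSR to Bits bf_start .. bf_start+bf_size-1 of value."""
    field_mask = (1 << bf_size) - 1
    field = (value >> bf_start) & field_mask
    if op in ("ASL", "LSL"):
        field = (field << count) & field_mask       # bits fall off the top end
    elif op == "LSR":
        field >>= count                             # zeros come in at the top
    elif op == "ASR":
        sign = field >> (bf_size - 1)
        field >>= count
        if sign:
            # Refill the vacated top bits with copies of the field's sign bit.
            field |= field_mask & ~((1 << max(bf_size - count, 0)) - 1)
    else:
        raise ValueError(f"unsupported operation {op}")
    cleared = value & ~(field_mask << bf_start) & MASK128
    return cleared | (field << bf_start)
```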


        INIT lets us initialize a register, or registers, or data at any memory

location, such that it becomes a 64-bit or a 128-bit expansion of any number in

the 7-bit range of -64 to +63.  INIT replaces CLR (clear), which initializes a

data-storage place to zero only; now we can initialize to 1, or -1, or to any of

more than a hundred possibilities.  Note that INIT never needs any GET1 info.  
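    Since INIT stores a sign-extended copy of its 7-bit number, it can be sketched in a few lines.  This assumes the result is wanted as an unsigned 64-bit or 128-bit bit pattern; `init_pattern` is a hypothetical name.

```python
def init_pattern(value: int, width: int) -> int:
    """Return the bit pattern INIT would store for a -64..+63 value."""
    if width not in (64, 128):
        raise ValueError("INIT targets 64-bit or 128-bit places")
    if not -64 <= value <= 63:
        raise ValueError("INIT data must fit in 7 bits (-64..+63)")
    # Masking a Python int with `width` ones yields its twos-complement pattern.
    return value & ((1 << width) - 1)
```

`init_pattern(0, 64)` is the old CLR; `init_pattern(-1, 64)` sets all 64 bits.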


        ISUB replaces both NEG (negate) and NOT, which respectively subtract a 

number from 0 or -1.  ISUB subtracts numbers from anything in the Initial Value

range of -64  to +63.  Finding other uses for this operation is not so important

as consolidating NEG and NOT into one generic instruction.  The Assembler will,

of course, retain both NEG and NOT, and translate appropriately.               
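    The consolidation works because, in twos-complement arithmetic, NEG x is 0 - x and NOT x is -1 - x.  A minimal sketch, assuming results are taken modulo 2 to the power of the destination width; `isub`, `neg`, and `not_` are hypothetical names.

```python
def isub(initial: int, operand: int, width: int = 64) -> int:
    """ISUB: subtract operand from a -64..+63 initial value, modulo 2**width."""
    if not -64 <= initial <= 63:
        raise ValueError("initial value must fit in 7 bits")
    return (initial - operand) & ((1 << width) - 1)


def neg(x: int, width: int = 64) -> int:
    return isub(0, x, width)      # NEG x == 0 - x


def not_(x: int, width: int = 64) -> int:
    return isub(-1, x, width)     # NOT x == -1 - x, i.e. every bit inverted
```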


        ROL and ROR are pretty much like the shift instructions.  The 68020 has

another sort of rotation called ROXL and ROXR, but the 12864 may not need them.

First examine the rotation operation of the 6809:  The Carry flag is always part

of the rotation; a bit coming off one end of a byte is moved to Carry by one ROT

and moved out of Carry back into a byte by another ROT.  In the 68020 a simple 

ROT moves a bit from one end of a location directly to the other end; a copy of

that bit is placed in the Carry Flag.  ROX, on the other hand, uses the eXtend 

flag the same way the 6809 uses the Carry.  In the 12864 processor we can mask 

flags that an instruction would normally affect.  Suppose the 12864 rotation is

designed to normally flag both X and Carry:  If we mask Carry, no copy is sent 

there; if we mask X, the bit that normally moves through it simply bypasses it.

(Similarly, the 12864 can have one generic ASL/LSL operation, but the Assembler

can mask the X flag for LSL--if the notion proposed earlier is adopted.)       
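    A minimal sketch of the proposed rotation, one bit position per step, showing the left rotation only (ROR would mirror it).  It assumes that with X unmasked the rotation passes through X, as the 68020 ROXL does; that with X masked the bit wraps straight around, as in the 68020 ROL; and that Carry receives a copy of each bit leaving the top end unless Carry is masked.  `rol` and its parameters are hypothetical.

```python
def rol(value: int, width: int, count: int, x: int, c: int,
        mask_x: bool = False, mask_c: bool = False) -> tuple:
    """Rotate `value` left; return (new value, new X, new C)."""
    mask = (1 << width) - 1
    for _ in range(count):
        out = (value >> (width - 1)) & 1          # bit leaving the top end
        if mask_x:
            value = ((value << 1) | out) & mask   # bypass X: wrap directly
        else:
            value = ((value << 1) | x) & mask     # old X enters at the bottom
            x = out                               # leaving bit goes into X
        if not mask_c:
            c = out                               # Carry gets a copy
    return value, x, c
```

With both flags unmasked, rotating 1000 by two positions moves the top bit into X and then back into the bottom, just like the 6809's trip through Carry.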


        The COPY instruction replaces LoaD, STore, TransFeR, INC, DEC, JMP from

the 6809, and COPY also replaces MOVE from the 68020.  Even some LEA operations

can be translated to COPY.  The GET1 admode field lets us specify any place in a

12864-based computer from which to fetch data (and any number of bits from 1 to

128); the PUT field lets us specify almost any other place to receive a copy of

those bits of data.  What could be simpler and more powerful?  To replace INC  

and DEC, GET1 can specify admode 2 -- same as PUT.  When GET1 holds 2 while COPY

is being processed, the 7-bit Initialize-data will be used to modify the place 

specified by PUT.  Instead of only -1 or +1, the INC/DEC can now range from -64

to +63 -- even to +64 if the value of zero is interpreted thus (it's no good for

anything else!).  Some LEA instructions that the Assembler translates into COPYs

will have admode 0 in GET1, a specified register, and a 16-bit offset ranging  

from -32768 to +32767.  PUT would specify the same admode and register, and an 

offset of zero.  Masking the flags is normal for LEA.  Larger offsets can become

ADD Immediate Data to a register, with the flags masked.  JMP instructions are 

translated into COPY to the PC register, with masked flags--and remember that  

any JMP can now be conditional!  Load and store and transfer and MOVE operations

become COPY memory to register, reg. to mem., reg. to reg., and mem. to mem.   

Another 68020 instruction, PEA (push effective address), may be unneeded in the

12864.  It has the effect of computing an address and saving it in a place that is

NOT a register, for later use (most likely by the Program Counter, since there 

isn't a LEA-to-PC instruction in the 68020).  In the 12864 processor, we simply

specify the Program Counter's register-number in the PUT-field data if we want 

to LEA-to-PC.  Otherwise we can PUT the EA almost anywhere else, for later use.
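    A sketch of how the Assembler might encode INC and DEC as the 7-bit Initialize-data of a COPY whose GET1 admode is 2, and how the processor might read it back.  It adopts the optional reading in which a raw zero means +64; `incdec_field` and `field_delta` are hypothetical names.

```python
def incdec_field(mnemonic: str, amount: int = 1) -> int:
    """Encode INC/DEC by `amount` as the raw 7-bit field of a COPY (GET1 admode 2)."""
    if mnemonic == "INC":
        delta = amount
    elif mnemonic == "DEC":
        delta = -amount
    else:
        raise ValueError("only INC and DEC are translated here")
    if delta == 64:
        return 0                      # zero is otherwise useless, so it means +64
    if delta == 0 or not -64 <= delta <= 63:
        raise ValueError("adjustment must be -64..-1 or +1..+64")
    return delta & 0x7F               # 7-bit twos-complement


def field_delta(raw: int) -> int:
    """Processor side: recover the signed adjustment from the raw field."""
    if raw == 0:
        return 64
    return raw - 128 if raw & 0x40 else raw
```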


        SWAP is similar to COPY, in that the GET1 data specifies one place while

the PUT data specifies another.  However, as the names imply, they do different

things:  The 12864 SWAP replaces both the 68020 SWAP and EXG (exchange); data in

the PUT place is sent to the GET1 place, as well as the usual GET1-to-PUT.  Two

things to note about SWAP are that register-adjustments of zero, in the specified

admodes, will probably be common, and the CCS flags will usually be masked.  But

consider that if the GET1 admode is 2 (same as PUT), then nothing happens.  This

may be the ideal thing for the Assembler to translate a NOP into.  And if the

flags are NOT masked while the GET1 admode is 2, during the generic SWAP, then 

this may be the ideal thing for the Assembler to translate a TST into.  (If the

flags aren't masked during a normal SWAP, then they will be affected only by the

data going from the GET1 place to the PUT place.)                              
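    A toy model of the generic SWAP shows why GET1 admode 2 gives a NOP, or a TST when the flags are left unmasked.  It assumes places are entries in a Python dictionary holding 64-bit patterns, and computes only N and Z, from the data going from GET1 to PUT; `swap` is a hypothetical name.

```python
def swap(places: dict, get1: str, put: str, flags_masked: bool = True, width: int = 64):
    """Exchange two places; return the N and Z flags, or None if they are masked."""
    moved = places[get1]                           # data travelling GET1 -> PUT
    places[get1], places[put] = places[put], places[get1]
    if flags_masked:
        return None
    return {"N": (moved >> (width - 1)) & 1, "Z": int(moved == 0)}
```

When `get1` and `put` name the same place, the exchange leaves it untouched, so only the flags (if unmasked) record anything.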


      Now back to Bits 28-0 of the generic instruction; as mentioned, they hold

the details of the PUT field data; we shall begin with Bits 0-6.  These specify

the Bitfield Start for the PUT field, from 0 to 127.  After the 12864 processor

analyzes the identity of the place where the result of an instruction is to be 

PUT, the Bitfield Start tells it exactly where in that location the result goes.

For most instructions, most of the time, the value here will be Zero.          


      Bits 7-12 specify the number of the first register needed to identify the

place where the result is PUT.  In other words, if Register 7 is the destination

of the data, then a 7 will be here (admode 1 in the PUT field).  To modify flag

bits in the CCS register, simply set a Bitfield Size of 5 (for 5 flags), the CCS

register's number here, and a Bitfield Start of zero (assuming the designers put

the CCS flags in the lowest bit-positions of the register).  If a memory address

indexed by Register 15 is the data's destination (admode 4 or 5 in PUT), then 15

will be the number placed here.  Bits 7-12 can hold any number from 0 to 63, and

as mentioned early in this essay, the 12864 will probably only have 45 registers

or so, total.  Anything more than the highest register number would be illegal,

of course, even in the Boss mode!  If admode 2 or admode 7 is specified in the 

PUT field, then the processor would ignore any register-number in these bits.  

Admode 3 would be another, except it is illegal in the PUT field.              


      Bits 13-28 specify the offset or adjustment to be applied to the register

indicated in Bits 7-12.  At least this could be true for instructions OTHER than

ASL, ASR, COPY, etc., because only those other instructions never need the 7 bits

in Bits 22-28.  An index register being used with a ROL instruction can only have

a nine-bit offset or adjustment applied to it (in Bits 13-21, of course).  HOWEVER,

it can be worse!  Bits 13-18 may specify a second register altogether!  For

admodes 0 and 1 with any Bitfield Size of 65 or more, we must specify 2 registers.

For admodes 6, 10, 11, 14, and 15, a second register is a normal part of

address-indexing.

(At least those admodes get a 64-bit offset from the second register, applied to

the first register.)  After a second register has been specified, only the bits

from 19-28, or from 19-21, can be used as an offset or adjustment to the second

register (a 10-bit or a 3-bit modification, respectively).  Here is a chart:

           |2           2|2   1|1         1|1          |             |

           |8| | | | | |2|1| |9|8| | | | |3|2| | | | |7|6| | | | | |0|

           |  16-bit offset/adjust         | First     |  Bitfield   |

           |  applied to first Register    | Register* |  Start      |

           |-------------------------------|           |  for PUT    |

           | ASL, ASR,   | 9-bit off/adj   |           |  only       |

           | COPY, INIT, | to 1st Register |           |             |

           | ISUB, LSL,  |-----------------|           |             |

           | LSR, ROL,   |3-bit| Second    |           |             |

           | ROR  data   | ad- | Register* | *will be  |             |

           |             | just| admode 0, | ignored   |             |

           |             | to  | 1, 6, 10, | in admode |             |

           |             | 2nd | 11, 14,   | 2, 7      |             |

           |             | Reg.|  and 15   |           |             |

           |-------------------|           |           |             |

           | 10-bit offset or  |           |           |             |

           | adjust to 1st or  |           |           |             |

           | 2nd reg, depending|           |           |             |

           | on the admode.    |           |           |             |         
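    A sketch of how the 29 PUT bits split up, following the chart.  Whether Bits 22-28 hold 7-bit data, and whether Bits 13-18 name a second register, is not recorded in these bits (the instruction code, PUT admode, and Bitfield Size decide that), so both are passed in as arguments; offsets are assumed to be twos-complement, and all names are hypothetical.

```python
def field(word: int, lo: int, hi: int) -> int:
    """Unsigned value of Bits lo..hi (inclusive) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def signed(value: int, bits: int) -> int:
    """Read an unsigned `bits`-wide value as twos-complement."""
    return value - (1 << bits) if value >> (bits - 1) else value


def decode_put(put: int, has_7bit_data: bool, second_register: bool) -> dict:
    """Split Bits 28-0 of the first word into the PUT details shown in the chart."""
    out = {"bf_start": field(put, 0, 6), "first_reg": field(put, 7, 12)}
    if second_register:
        out["second_reg"] = field(put, 13, 18)
        if has_7bit_data:
            out["adjust_2nd"] = signed(field(put, 19, 21), 3)    # 3-bit adjust
        else:
            out["offset"] = signed(field(put, 19, 28), 10)       # 10-bit off/adj
    elif has_7bit_data:
        out["offset"] = signed(field(put, 13, 21), 9)            # 9-bit off/adj
    else:
        out["offset"] = signed(field(put, 13, 28), 16)           # 16-bit off/adj
    if has_7bit_data:
        out["data7"] = field(put, 22, 28)                        # ASL, ROL, etc. data
    return out
```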


      And just to be complete:                                                 

    |6         5|5     5|5     5|4     4|4           3|3|3     3|3       2|   

    |3| | | | |8|7| | |4|3| | |0|9| | |6|5| | | | | |9|8|7| | |4|3| | | |9|   

    |  12864    | GET1  | GET2  |  PUT  |  Bitfield   | | Do-If |  CCS    |   

    |  Instruc. | admode| admode| admode|  Size, for  | | Con-  |  Flag   |   

    |  Code     |       |       |       |  entire     | | dition|  Masks  |   

    |           |       |       |       |  operation  | |       |         |   
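    The chart above can be turned into a pair of packing and unpacking routines.  A minimal sketch, assuming Bit 38 (unlabelled in the chart) stays zero and that a Bitfield Size of 128 is stored as 0, the same trick used for shift counts; `LAYOUT`, `pack_first_word`, and `unpack_first_word` are hypothetical names.

```python
# (lowest bit, width) of each field in the first 64-bit word, from the chart
LAYOUT = {
    "code":     (58, 6),   # one of 64 generic instructions
    "get1":     (54, 4),   # admodes 0-15
    "get2":     (50, 4),
    "put":      (46, 4),
    "bf_size":  (39, 7),   # 1-128, with 128 stored as 0 (assumption)
    "do_if":    (34, 4),   # 16 conditions
    "flagmask": (29, 5),   # one bit per flag
    "put_info": (0, 29),   # Bits 28-0: PUT details
}


def pack_first_word(**fields: int) -> int:
    unknown = set(fields) - set(LAYOUT)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, (lo, width) in LAYOUT.items():
        value = fields.get(name, 0)
        if name == "bf_size" and value == 128:
            value = 0
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lo
    return word


def unpack_first_word(word: int) -> dict:
    out = {n: (word >> lo) & ((1 << w) - 1) for n, (lo, w) in LAYOUT.items()}
    if out["bf_size"] == 0:
        out["bf_size"] = 128
    return out
```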




      Having used up 64 bits of the normal 128-bit fetch by the 12864 processor,

it's obvious that to provide details of the admodes specified for GET1 and GET2,

we will need to use the other 64 bits.  Now it has already been stated that they

are supposed to hold Immediate Data or an Absolute Address; the potential for  

conflict is obvious!  This conflict is the main reason admode 2 was created:  It

makes the GET1 field use the admode in the PUT field, thereby eliminating any  

need for any specific GET1 information among the second 64 bits of the operation

fetch.  And if the GET2 field specifies admode 3 (Immediate Data) or 7 (Absolute

Address), then THAT is all the GET2 information needed, and the instruction can

be properly executed.  So the main restrictions of limiting the GET1/GET2/PUT  

system to a total of 128 bits are these:  (1) We can't combine Immediate Data  

with more Immediate Data; (2) We can't combine Immediate Data with data at an  

Absolute Address; (3) We can't combine the data at two Absolute Addresses; and 

(4) We can't use Immediate Data or an Absolute Address in any instruction where

the GET1 admode is different from PUT.  How much does it matter that we can't do

these things?  We already can't do them with any current processor, right?  What

we CAN do is far more important:  Not only can we combine Immediate Data or the

content at an Absolute Address with the content of any register (normal for any

processor), we can also combine our Immediate/Absolute information with the data

at any place in the memory that can be index-referenced -- and save it too!  The

typical 12864 program will probably be position-independent, anyway, and seldom

need Absolute Addressing.  It likely will start by loading several registers   

with the addresses of a number of data tables, all relative to the PC register.

No Immediate Data there!  Then the remaining registers will become variable-   

holders, and use Immediate Data as needed, just like any other program.
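    The four restrictions above are simple enough for the Assembler to check from the three admode numbers alone.  A minimal sketch, assuming the Assembler has already written GET1 as admode 2 wherever GET1 and PUT name the same place; `second_word_conflict` is a hypothetical name.

```python
from typing import Optional

IMMEDIATE, ABSOLUTE, SAME_AS_PUT = 3, 7, 2     # admode numbers from the essay


def second_word_conflict(get1: int, get2: int, put: int) -> Optional[str]:
    """Return why the admodes cannot share one 128-bit fetch, or None if they can."""
    if put == IMMEDIATE:
        return "Immediate Data is illegal in the PUT field"
    wide = [a for a in (get1, get2, put) if a in (IMMEDIATE, ABSOLUTE)]
    if len(wide) > 1:
        # restrictions (1)-(3): only one 64-bit constant fits in the second word
        return "only one Immediate Data value or Absolute Address fits"
    if wide and get1 != SAME_AS_PUT:
        # restriction (4): GET1 details would need second-word bits already taken
        return "GET1 must use admode 2 when Immediate/Absolute data is present"
    return None
```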


      So to be a little more specific about how GET1 and GET2 information is set

among the second group of 64 bits, let's first note that it took all of 29 bits

for the PUT information.  Keeping that the same for GET1 and GET2 means that 58

of the 64 bits get assigned real quick!  Suppose we assign the Least Significant

32 bits to the GET1 information, and the Most Sig. 32 to the GET2 information. 

This leaves 3 bits extra for GET1 and 3 extra bits for GET2.  The most obvious 

thing to do with the extra bits is to expand the offset/adjustment data (from 16

to 19 bits, for example), but perhaps they can be used for something else.  Note

that the ASL, ROL, etc. data takes space away ONLY from the PUT information.  A

possible use for one of the extra bits is that of being a flag controlling the 

Bitfield Start data:  If the flag is zero, then the seven bits hold the number 

of the starting bit; if the flag is one, then six of the seven bits specify a  

register-number where the information on the starting bit is to be found.  It  

would have been nice to have had enough bits to do this to the PUT field, but it

may not be missed too much, since the PUT field's Bitfield Start is likely to be

zero most of the time, anyway.  So here is one more chart:


  The GET2 info duplicates this GET1 info, except Bit-numbers range from 32-63.

       |3|3                     1|1         1|1          | |           |

       |1|0| | | | | | | | | | |9|8| | | | |3|2| | | | |7|6|5| | | | |0|

       | |  18-bit offset/adjust, applied    | First     | | Register  |

       | |  to First Register                | Register  | | holding   |

       | |-----------------------------------|           | |-----------|

       | | 12-bit off/adj to     | Second    |           | Bitfield    |

       | | 1st or 2nd Register,  | Register  |           | Start data  |

       | | depending on admode   |           |           | for GET1    |


      Flag determining use of register to hold Bitfield Start data             
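    A sketch of splitting the 32 bits of GET1 details according to the chart; for GET2, the second 64-bit word would first be shifted right by 32.  It assumes twos-complement offsets, and that the caller knows from the admode and Bitfield Size whether a second register is present; the names are hypothetical.

```python
def field(word: int, lo: int, hi: int) -> int:
    """Unsigned value of Bits lo..hi (inclusive) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def signed(value: int, bits: int) -> int:
    """Read an unsigned `bits`-wide value as twos-complement."""
    return value - (1 << bits) if value >> (bits - 1) else value


def decode_get(info: int, second_register: bool) -> dict:
    """Split GET1 details (the low 32 bits of `info`); pass word >> 32 for GET2."""
    info &= 0xFFFFFFFF
    out = {"first_reg": field(info, 7, 12)}
    if field(info, 31, 31):                        # flag: BfSt is in a register
        out["bf_start_reg"] = field(info, 0, 5)
    else:                                          # BfSt given directly, 0-127
        out["bf_start"] = field(info, 0, 6)
    if second_register:
        out["second_reg"] = field(info, 13, 18)
        out["offset"] = signed(field(info, 19, 30), 12)    # 12-bit off/adj
    else:
        out["offset"] = signed(field(info, 13, 30), 18)    # 18-bit off/adj
    return out
```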


    Now let's consider a few things about the 12864 Assembler.  Obviously it's 

going to recognize many common assembly instructions, and translate them into  

the much smaller set of generic instructions recognized by the processor.  The set

of 12864 instructions may be enlarged, simply to take advantage of the possible

list of 64.  Ordinary instructions like ADD, SUB, ADDC (add with carry), SUBC, 

ABCD (add Binary Coded Decimal), SBCD, OR, EOR (exclusive or), and AND may be  

supplemented with NOR, ENOR, and NAND.  I don't propose to offer a complete list

here; let the Industry decide all the final details.  The main thing that needs

some attention right now is the format of the Assembler instructions; each will

occupy a fair amount of space!  But this is reasonable, considering that a 12864

instruction will usually equal 2, and often 3, regular-processor instructions. 

All the information from 3 regular lines of Assembly code, plus some new stuff,

has to fit on 1 line in this proposed Assembler format:


   |Label|Instruc |Bitfield|ASL, etc| GET1 | GET2 | PUT  |Do-If|Flags |Comment

   |field|Mnemonic| Size   |7bt data|admode|admode|admode|cond.|masked| field


    The Label field gives this place in a program an optional name, so that it can

be referred to from other places in the program, if desired.


    The Instruction Mnemonic is, of course, the name of the instruction.       


    Bitfield Size (BfSz) is simply a number 1-128; if this part of the Assembly

format is blank, a 128 Size is assumed -- but Admode data may change it to 64.


    7-bit data is required in this area whenever the Mnemonic is ASL, ROL, etc.

The nature of this data has already been described.  Note exceptions like SWAP 

and COPY, which the Assembler knows never need this data.  The Assembler offers

INC and DEC instructions that will require 7-bit data; the programmer need never

see these get translated to COPY.  Exceptions are peculiar, aren't they?!

      Below are examples of the syntax for the addressing modes:               


Admode  Syntax                    Explanation


0000    16;+20.33      Register 16 has data.  A Bitfield Start (BfSt) of 33 is

                       specified, so data extracted from 16 starts at Bit 33

                       (BfSz specifies how many bits).  Extracted data will

                       have 20 added to it (register not affected), before

                       being given to the current instruction.  An assumed BfSz

                       of 128 would change to 64 (one register specified) minus

                       33 (due to the BfSt), leaving 31 bits.  Conflicts cause
                       Assembly errors.


0000    6 9.18         Two registers have data: 6 is Most significant; 9 is

                       Least significant.  9 is the First Register in Bits

                       7-12 of the Specific Information area; recall charts.

                       Data extracted from registers starts at Bit 18.


0000    20:10          Data in register 20.  BfSt in register 10.  Note that a

                       period denotes exact BfSt data; a colon means a register

                       has the data.  BfSt-in-a-register is illegal in PUT.


0000    10 11          Two registers have data.  BfSt is assumed to be zero.

                       Note spaces denoting registers.  Other admodes will use

                       commas, and admode fields must be tabulation-separated. 


0000    7 3;-123       128 bits of data available in registers 7 and 3.  BfSz

                       determines how many are extracted.  BfSt assumed to be

                       0.  123 subtracted from it before instruction gets it.


0000    URHERE PC:12   The Assembler will accept either 'PC' or the actual

                       number of the PC register (as yet unknown!).  Assembler

                       computes offset between content of PC register (what it

                       will be at end of instruction) and place in memory that

                       is designated by label 'URHERE'.  Suppose PC register is

0000    PC;-87:12    \     34, and the programmer knows that the offset is -87:

0000    URHERE 34:12  >  would be identical to  URHERE PC:12.  PC is the only

0000    34;-87:12    /     register to which labels can be referenced, because

                       it is the only register that has its value known at all

                       times by the Assembler (relative to Origin of program).

                       Note the :12 means BfSt is in register 12 (no, I don't

                       know why the programmer wants that in this example!).


0001    20+20.33       Check first example; note lack of semicolon here.  Plus

                       or minus sign mandatory for all offsets and adjustments.

                       With plus adjust, data first goes from register to the

                       instruction.  At END of instruction, data in register

                       is adjusted.  With minus adjust, register is adjusted

                       before data extracted from it for use by instruction.


0010                   To specify admode 2, simply leave the field BLANK!


0011    #123456        Immediate data preceded by #.  A + or - is optional.


0100    ,20;+20.33     Check first example; note extra comma here.  Register 20

                       now being used as index, with an offset of 20 applied to

                       it.  The offset value (index+offset) is the address from

                       which data is removed, starting at Bit 33.  The data may

                       extend to Bit 127, depending on the BfSz.  Note that

                       initial index is always just ONE register.


0100    ,14:2          Register 14 is index of address holding data; register 2

                       has data on where is the BfSt.  Specifying offsets,

                       adjustments, or bitfield starts is always optional.


0100    URHERE,PC      Note use of comma.  Assembler computes offset between PC

                       and URHERE, as before.  In the previous example the ADDRESS

                       of URHERE was the information (ignoring fact that BfSt

                       specified in that example made the address useless!);

                       now the memory content at that address is the data.


0101    ,20+20.33      Check similar examples.  Here register 20 is an index

                       which is used to fetch data.  Afterwards, an adjustment

                       of +20 is applied to the register.  If the adjustment is

                       negative, it is applied to the index before the index is

                       used as an address-pointer that tells us where data is.


0110    ,10 12-14:5    Register 10 is the index, holding an address.  Register

                       12 has a 64-bit offset to that address.  Before offset

                       is applied, register 12 receives adjustment of -14.  The

                       address thus found (by applying adjusted offset to

                       register 10) is the address of the data, which will be

                       accessed using the BfSt data in register 5.


0111    >123456        Absolute Address always preceded by > symbol.  Out of

                       18.4 quintillion possibilities, this one is pretty low!


1000    [,20;+20]L.33  See admode 0100; the part inside brackets is figured in

                       exactly the same way, resulting in an address.  Exactly

                       64 bits are always extracted from that address in this

                       admode.  They are the Least Significant 64 bits, as the L

                       indicates.  The 64 bits are then used as the address of 

                       the data, which starts at Bit 33.


1000    [URHERE,PC]L   The lowest 64 bits of the data at address URHERE are

                       extracted and used as an address.  The instruction gets

                       its data from the address thus found.  The L (or M, in

                       admodes 12-15) is a mandatory part of the syntax.  All

                       programmer does is provide correct syntax; the Assembler

                       will deduce from that syntax the admode number, and the

                       specific info, that are built into the instruction.


1001    [,25-2222]L:13 The value in register 25 is adjusted by -2222 (maximum

                       can be -32768 in PUT before an assembly error occurs, or

                       -131072 in GET1 or GET2), and then the adjusted index is

                       used to fetch an address (least significant 64 bits).

                       In turn the fetched address is used to fetch the needed

                       data, using the BfSt in register 13.


1010    [,5 9-873]L    See the example for admode 6 (0110); bracketed syntax is

                       analyzed the same way, this time using register 5 as the

                       basic index, register 9 as holding the 64-bit offset,

                       and -873 as the adjustment applied to the offset, before

                       the offset is applied to the index.  The address thusly

                       computed is the place from which the Least 64 bits are

                       taken and, in turn, are used as the address to fetch the

                       data.  Note that -873 is too big for PUT information,

                       but would work as GET1 or GET2 information.


1011    [,18]L 6+3     Value in register 18 is used as an address to fetch an

                       address from the memory.  Least significant 64 bits are

                       taken from memory to become an address.  Register 6 has

                       a 64-bit offset, which is applied to the extracted

                       address.  The thusly-computed new address is the place

                       where data will be found.  (I say 'found' or 'fetched',

                       but address is also a possible place to PUT the data.)

                       Afterwards, register 6 is adjusted by +3.  This example,

                       if in the PUT admode field, and if the instruction is

                       LSL or one of that group, is using the largest positive

                       allowable adjustment (3 bits, twos-complement).  What's

                       the chance of having only 32 generic instructions, so

                       we can move a bit to the PUT information field?


        I don't think I need to provide any examples for admodes 12-15; they

     are identical to admodes 8-11, with the sole exception that the letter L

     in the syntax is replaced by M.  The Assembler uses L and M to determine

     the correct admode; the 12864 processor uses the admode to determine that

     either the Least Significant or the Most Significant 64 bits are to be

     taken from the memory and used as an address.  This process has absolutely

     nothing to do with Bitfield Sizes and Bitfield Starts.


        It should be repeated that these examples are only a proposal; thinking

     about them is bound to lead to speculation about how easily the programmer

     can make a mistake by forgetting a comma.  A whole different syntax might

     be created just to reduce the chance of such accidents, perhaps one where

     mnemonic letters replace the commas, periods, colons, and semicolons --

     even lower-case letters, to prevent confusion between O/offset and 0/Zero.

     This syntax simply attempts to make the admode-field information compact.
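    As a rough check that the syntax above really does pin down the admode, here is a sketch of the deduction the Assembler would make.  It is written only against the examples in this section, assumes labels contain no spaces, commas, semicolons, or signs, and does not validate register numbers or offset ranges; `admode_of` is a hypothetical name.

```python
import re

BFST_SUFFIX = re.compile(r"[.:]\d+$")     # trailing .33 or :12 (Bitfield Start)


def _indexed(s: str) -> int:
    """Admodes 4-6 for text containing a comma (BfSt suffix already removed)."""
    after_comma = s.split(",", 1)[1]
    if " " in after_comma:
        return 6                  # index register plus a second (offset) register
    if ";" in after_comma or not re.search(r"[+-]", after_comma):
        return 4                  # plain index, or index with a fixed offset
    return 5                      # index that is adjusted (+ after, - before)


def admode_of(text: str) -> int:
    s = text.strip()
    if not s:
        return 2                  # blank field means "same as PUT"
    if s[0] == "#":
        return 3                  # Immediate Data
    if s[0] == ">":
        return 7                  # Absolute Address
    if s[0] == "[":
        close = s.index("]")
        base = {"L": 8, "M": 12}[s[close + 1]]     # L or M is mandatory
        rest = BFST_SUFFIX.sub("", s[close + 2:]).strip()
        if rest:                  # a register after the brackets: admode 11 or 15
            return base + 3
        return base + _indexed(s[1:close]) - 4
    s = BFST_SUFFIX.sub("", s)
    if "," in s:
        return _indexed(s)
    if ";" in s or not re.search(r"[+-]", s):
        return 0                  # register(s), possibly with a fixed offset
    return 1                      # register with an adjustment
```

It also shows the danger just mentioned: `,20+20` is admode 5, while the same text with the comma forgotten, `20+20`, quietly becomes admode 1.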


      The next field of the Assembler format, after the PUT admode field, is the

Do-If condition.  Two letters suffice to abbreviate the possible conditions (at

least only 2 letters if Motorola's list is used):  HI (higher); LS (lower or   

same);  CC (carry flag clear); CS (carry set); NE (not equal to zero; zero flag

clear); EQ (equal; zero flag set); VC (oVerflow flag clear); VS (oVerflow set);

PL (plus); MI (minus); GE (greater than or equal to zero); LT (less than zero);

GT (greater than zero); and LE (less than or equal to zero).  This list totals 

14 possible Do-If conditions; with a maximum of 16 allowed, the last two are  

usually Do Always and Do Never.  For the purpose of the Assembler format, the  

Do Always condition can be the default if the Do-If field is simply left blank,

but it wouldn't hurt to allow a DA abbreviation.  A DN abbreviation is logically

sensible, but practically almost useless -- a NOP for sure!  (If the Assembler 

converts NOP to SWAP, as proposed, obviously the Do-If would be Never!).  Maybe

some other Do-If condition can be created, just to use that 16th possibility.  
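    A sketch of the Do-If test, using the standard Motorola definitions of these conditions in terms of the N, Z, V, and C flags.  The 4-bit numbers are the 68000's own condition encoding, borrowed here only as an assumption for the 12864; `CONDITIONS` and `do_if` are hypothetical names.

```python
# mnemonic: (4-bit code as on the 68000, test of the N, Z, V, C flags)
CONDITIONS = {
    "DA": (0b0000, lambda n, z, v, c: True),            # Do Always (68000 "T")
    "DN": (0b0001, lambda n, z, v, c: False),           # Do Never  (68000 "F")
    "HI": (0b0010, lambda n, z, v, c: not c and not z),
    "LS": (0b0011, lambda n, z, v, c: c or z),
    "CC": (0b0100, lambda n, z, v, c: not c),
    "CS": (0b0101, lambda n, z, v, c: c),
    "NE": (0b0110, lambda n, z, v, c: not z),
    "EQ": (0b0111, lambda n, z, v, c: z),
    "VC": (0b1000, lambda n, z, v, c: not v),
    "VS": (0b1001, lambda n, z, v, c: v),
    "PL": (0b1010, lambda n, z, v, c: not n),
    "MI": (0b1011, lambda n, z, v, c: n),
    "GE": (0b1100, lambda n, z, v, c: n == v),
    "LT": (0b1101, lambda n, z, v, c: n != v),
    "GT": (0b1110, lambda n, z, v, c: not z and n == v),
    "LE": (0b1111, lambda n, z, v, c: z or n != v),
}


def do_if(mnemonic: str, n: bool, z: bool, v: bool, c: bool) -> bool:
    """Should the instruction execute?  A blank Do-If field defaults to DA."""
    return bool(CONDITIONS[mnemonic or "DA"][1](n, z, v, c))
```

On the 68000 itself, "always" and "never" take the first two codes, 0000 and 0001, rather than the last two.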


      After the Do-If field in the Assembler instruction format is the Flag Mask

field.  Motorola's flags are abbreviated X, N, Z, V, and C, so simply putting an

appropriate letter (or letters) in this field should tell the Assembler that you

don't want a particular flag to be affected by the current instruction.  Simply

entering  ZCN  without any punctuation should be adequate to specify the Carry,

Zero, and Negative-sign flags, for example.  Now consider the opposite notion: 

Some Assembler instructions, like LEA, will be translated into other operations,

and the flags will automatically be masked by the Assembler during translation.

In the 6809 processor there are two registers Y and U, which are not treated the

same by LEA instructions.  LEAY will affect the Zero flag, while LEAU will not.

The idea is to let register Y be used in counting loops, and it works fine.  The

12864 Assembler could allow the same sort of thing:  If the programmer specifies

the Z flag in the Mask field during an LEA instruction, then the Assembler WON'T

mask the flag!  More precisely, what is happening is the programmer telling the

Assembler to reverse its normal handling of the 12864 flagmask bits.  If the   

Assembler usually doesn't mask a flag, then it will be masked -- and vice-versa.
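    The reversal rule amounts to an exclusive-or of the Assembler's default mask with the letters the programmer typed.  A minimal sketch, assuming the five flags sit in the 68000 CCR order (X, N, Z, V, C from Bit 4 down to Bit 0) and a hypothetical table of defaults in which LEA masks every flag and ADD masks none.

```python
FLAG_BITS = {"X": 4, "N": 3, "Z": 2, "V": 1, "C": 0}   # 68000 CCR order (assumed)


def letters_to_mask(letters: str) -> int:
    """Turn a Flag Mask field such as 'ZCN' into a 5-bit mask."""
    mask = 0
    for ch in letters.upper():
        mask |= 1 << FLAG_BITS[ch]
    return mask


# Hypothetical defaults: flags the Assembler masks on its own for each mnemonic.
DEFAULT_MASK = {"ADD": 0, "LEA": letters_to_mask("XNZVC")}


def flagmask_bits(mnemonic: str, field_text: str) -> int:
    """Letters in the field reverse the Assembler's usual choice for those flags."""
    return DEFAULT_MASK.get(mnemonic, 0) ^ letters_to_mask(field_text)
```

So `LEA` with `Z` in the Mask field leaves only the Zero flag live, much like the 6809's LEAY.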


      The last field of the Assembler format is the Comment field, in which the

programmer is supposed to explain the purpose of the instruction.  This field is

completely ignored by the Assembler, of course, during the task of creating the

machine code for the 12864 processor from the assembly source listing.         


    And now my two-cents-worth on the hardware of the 12864 computer; if what I

am about to say is really worth as much as two cents, I'll be surprised!  The  

average computer has a System Clock that controls the timing of everything that

goes on in the computer.  The average microprocessor accesses the memory every 

(fill in blank) cycles of the System Clock, on the average.  The remaining clock

cycles are spent by the processor processing the data it has accessed.  Some of

the newer processors have 'preprocessors' built into them, so they can access  

the memory significantly more often.  The preprocessors begin working on future

instructions before the main processor finishes the current instruction; it is 

known as 'pipelining', I believe.  The 12864 will be both similar to and different

from this scheme.  It'll likely have one main processor for the main instruction,

and 3 subprocessors to handle the data represented by GET1, GET2, and PUT.  It 

figures that if the average 12864 instruction is as complex as 2 or 3 regular- 

processor instructions, the 12864 may have to do as many memory-accesses as 2 or

3 'regulars'.  Yet by processing GET1, GET2, and PUT simultaneously, the 12864 

is essentially doing the work of the 'pipeliners'.  Whether or not pipelining of

the current sort is actually built into the 12864 remains to be seen.  In the  

meantime, though, the 12864 is still going to spend a number of clock cycles in-

between memory-accesses, during which it is processing the accessed data.  Since

it is fairly obvious that the more often a processor can access the memory, the

greater the performance of the computer, the standard trick is to increase the 

speed of the System Clock, and to build both processors and memory chips to keep

up.  Nevertheless, this does not change the fact that the processor spends many

clock-cycles NOT accessing the memory!  And I get the impression that the memory

chips are not keeping up with the processors, in the speed race.  So here is my

suggestion:  Build the 12864 with a faster clock than the System Clock.  It will

have to hold its outside lines open for more than one internal clock cycle each

time its subprocessors access memory (to stay in sync with the System Clock),  

but while it is doing that, its main processor can be manipulating previously- 

accessed data.  With proper planning the 12864 should be able to access memory 

almost every cycle of the System Clock, at the memory's maximum possible speed.
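    To make the arithmetic of this suggestion concrete, here is a toy model in
Python, offered only as a sketch.  It assumes an integer clock_multiplier
(internal cycles per System Clock cycle), a fixed number of internal
work_cycles spent processing each accessed 128-bit word, accesses that must
begin on a System Clock edge, and perfect overlap of processing with the next
access when overlapped is True.  The parameter names are hypothetical, and no
number here describes real hardware.

```python
import math


def bus_utilisation(clock_multiplier: int, work_cycles: int, overlapped: bool) -> float:
    """Fraction of System Clock cycles that carry a memory access (toy model).

    clock_multiplier: internal clock cycles per System Clock cycle (assumed integer).
    work_cycles: internal cycles spent processing each accessed word (assumed fixed).
    overlapped: True if the main processor works on the previous word while a
        subprocessor holds the outside lines open for the next access.
    """
    access_cycles = clock_multiplier  # one access holds the lines for one System Clock cycle
    if overlapped:
        per_word = max(access_cycles, work_cycles)
    else:
        per_word = access_cycles + work_cycles
    # Each access must begin on a System Clock edge, so round up to whole System cycles.
    system_cycles_per_word = math.ceil(per_word / clock_multiplier)
    return 1 / system_cycles_per_word


if __name__ == "__main__":
    print(bus_utilisation(1, 3, overlapped=False))  # same clock, no overlap: prints 0.25
    print(bus_utilisation(4, 3, overlapped=True))   # 4x internal clock, overlapped: prints 1.0
```

With the same clock and no overlap, only one System Clock cycle in four carries
a memory access; with a four-times-faster internal clock and overlap, every one
does.  If work_cycles exceeds the multiplier (say 6 against 4), utilisation
drops back to 0.5, which is where the "proper planning" comes in.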


    I have been saving the thorniest problem for last (at least I think the end

of this essay is approaching!), and it concerns the hardware's management of the

data.  The first part of the problem is this:  While most 12864 instructions are

128 bits long, many will be fully described in only 64 bits.  So do we make the

processor skip the other 64 bits, and move on to the next memory location, or do

we scheme to fit another whole instruction in those 64 bits?  My inclination is

to ignore the 64 bits, UNLESS it 'just happens' that two adjacent instructions 

in the assembly source listing can both be reduced to 64 bits.  In other words,

what the processor would do is load 128 bits, discover that the first 64 of them

comprise a complete instruction, execute that instruction, and test the next 64

bits to see if they also comprise a complete instruction.  If they don't, they 

will be ignored, and the processor will load 128 bits from the next address.  It

would be worth having this scheme just to give the programmers a chance to prove

they are clever enough to always make full use of it.  Any programmer who NEVER

attempts to conserve memory should be fired!  (And so what if there are more   

than 18.4 quintillion memory locations -- waste is waste.)                     
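    The fetch rule just described can be sketched as follows.  This is a minimal
Python model under assumptions of my own, not a decoder for a real chip: memory
is a list of 128-bit integers, the "first" 64 bits of a word are taken to be its
upper half, and is_short is a hypothetical test for whether a 64-bit half is a
complete instruction (the essay does not say how the processor recognizes one).
Branches are ignored; execution simply walks from one address to the next.

```python
from typing import Callable, List, Tuple

MASK64 = (1 << 64) - 1


def walk(memory: List[int], is_short: Callable[[int], bool]) -> List[Tuple[int, int, int]]:
    """Return (address, size_in_bits, instruction_bits) for each instruction executed."""
    executed = []
    for address, word in enumerate(memory):
        first, second = word >> 64, word & MASK64  # assumed: "first" 64 bits = upper half
        if not is_short(first):
            executed.append((address, 128, word))  # one full 128-bit instruction
            continue
        executed.append((address, 64, first))
        if is_short(second):
            executed.append((address, 64, second))  # a second, packed 64-bit instruction
        # otherwise the second half is ignored and the next address is loaded
    return executed


if __name__ == "__main__":
    is_short = lambda half: half >> 63 == 1  # hypothetical marker bit for 64-bit instructions
    a, b = (1 << 63) | 1, (1 << 63) | 2
    program = [(a << 64) | b, a << 64, 5]
    for entry in walk(program, is_short):
        print(entry)
```

Here the first word holds two packed instructions, the second holds one
instruction plus 64 ignored bits, and the third holds a single 128-bit
instruction.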


    The other aspect of the memory management problem concerns the Stacks, which

are places where varying numbers of registers are temporarily stored.  If each

address that a Stack register points to holds 128 bits, and each register being

saved is only 64 bits wide, then at first it seems obvious to always put 2

registers at each Stack address.  But many times an odd number of registers will

be saved; what then?  The very simplest answer is to store only 1 register at

each Stack address and ignore the obvious waste, because this way the processor

can never get confused.  The next-simplest answer may be to REQUIRE the

programmer to always PUSH or PULL an even number of registers when using the

Stack -- even a JSR (jump to subroutine) instruction would have to save another

register with the Program Counter, just to keep the total even.  I think I may

recommend this particular solution (would you believe I have been worrying about

this since the middle of this essay, and have only just now come up with the

idea?).
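    The even-count rule can be sketched in Python as a Stack of 128-bit slots
that always takes registers in pairs.  This is only an illustration under
assumptions of my own: the Stack is modelled as a Python list rather than a
memory region addressed by a Stack register, the first register of each pair
goes in the upper 64 bits of a slot, and the class name Stack128 is
hypothetical.

```python
from typing import List

MASK64 = (1 << 64) - 1


class Stack128:
    """A Stack whose every slot is 128 bits wide and holds two 64-bit registers."""

    def __init__(self) -> None:
        self.slots: List[int] = []

    def push(self, regs: List[int]) -> None:
        if len(regs) % 2:
            raise ValueError("PUSH must name an even number of registers")
        for i in range(0, len(regs), 2):
            # first register of each pair goes in the upper half (assumed)
            self.slots.append(((regs[i] & MASK64) << 64) | (regs[i + 1] & MASK64))

    def pull(self, count: int) -> List[int]:
        if count % 2:
            raise ValueError("PULL must name an even number of registers")
        regs: List[int] = []
        for _ in range(count // 2):
            slot = self.slots.pop()
            regs[:0] = [slot >> 64, slot & MASK64]  # prepend, restoring the pushed order
        return regs


if __name__ == "__main__":
    stack = Stack128()
    pc, other = 0x1000, 0x2A
    stack.push([pc, other])  # a JSR saving the PC plus one more register
    print(stack.pull(2))     # prints [4096, 42]
```

Because an odd count is rejected before anything is written, the processor
never has to track a half-filled slot, which is the whole point of the rule.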


    The bit-code format of instructions like JSR, BSR, PSH, PUL, and MOVEM can't

be the same as the format for most 12864 instructions.  The main reason is, as 

mentioned, that the instruction has to incorporate a list of registers -- but it

works out OK, because much of the instruction is predefined.  Before we get into

any details of that, though, let us examine the Stacking system a little closer.

In the 6809 there are two Stack registers, one of which is always used by the  

hardware to save JSR and interrupt information, and one of which the programmer

can use for other things.  There are occasions when having two Stacks is really

convenient, notably when moving large blocks of data around.  In the 68020 there

are three Stack registers, one for the Boss mode, one for the Interrupt mode,  

and one for the Peon.  Two bits in the CCS register are devoted to keeping track

of which Stack the hardware is using at the moment, so if it had been wanted, a

fourth Stack could exist in the 68020.  Four Stacks seem worth having in the 12864.

And another thing:  TWO CCS registers!  One would be a Boss mode CCS that keeps

track of things like the current Stack being used and interrupt-control flags, 

as well as the list of registers to be saved during an Interrupt, as proposed at

the beginning of this essay.  The other would have the instruction-result flags

in it and some other stuff.  MOST of that other stuff is another register list,

like that in the Boss CCS.  Thus when a GSR instruction is used (generic for JSR

and BSR:  go to subroutine), a list of registers could be specified, and it would

be recorded in the Peon CCS.  Here is a proposed bit-map for GSR:


             |6         5|5     5|5     5|4     4|4                |

             |3| | | | |8|7| | |4|3| | |0|9| | |6|5| | |.....| | |0|

             | Code For  | GET1  | GET2  | Do-If | Register List   |

             |   GSR     |cannot |       | (PUT  | Note Peon CCS   |

             |Instruction|  be   |       | is PC | and PC registers|

             |           |admode |       |always)| not on list;    |

             |           |   2   |       |       | always saved.   |


        If GET2 is admode 2 then data specified by GET1 is copied to PC --

        equivalent to JSR.  If GET2 is any other admode then the data it

        specifies is added to the data GET1 specifies, and the result is

        copied to PC.  If GET1 specifies PC then we have a BSR equivalent.

        The CCS instruction-result flags are NEVER affected by this one.

        Normal limitations:  No adding Immediate Data to Absolute Address!     
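    The GSR bit-map and the rules under it can be expressed as a small Python
sketch.  The field positions are read straight off the diagram above;
everything else is an assumption: the essay gives no numeric code for GSR or
for the Do-If conditions, "admode 2" is treated as an opaque 4-bit value of 2,
and the new PC is assumed to wrap around at 64 bits when GET2 is added to GET1.
The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
ADMODE_2 = 2  # the essay's "admode 2", treated here as an opaque 4-bit code

# (high bit, low bit) of each GSR field, read off the bit-map
GSR_LAYOUT = {
    "opcode": (63, 58),
    "get1": (57, 54),
    "get2": (53, 50),
    "do_if": (49, 46),
    "regs": (45, 0),  # Register List; PC and Peon CCS are always saved, so not listed
}


def encode_gsr(**fields: int) -> int:
    """Pack named fields into a 64-bit GSR word, rejecting values that do not fit."""
    if fields.get("get1") == ADMODE_2:
        raise ValueError("GET1 cannot be admode 2")
    word = 0
    for name, (hi, lo) in GSR_LAYOUT.items():
        value = fields.get(name, 0)
        if not 0 <= value < (1 << (hi - lo + 1)):
            raise ValueError(f"{name} does not fit in bits {hi}..{lo}")
        word |= value << lo
    return word


def decode_gsr(word: int) -> dict:
    """Split a GSR word back into its fields."""
    return {name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, (hi, lo) in GSR_LAYOUT.items()}


def gsr_target(get1_data: int, get2_mode: int, get2_data: int = 0) -> int:
    """New PC: GET1 alone (JSR-like), or GET1 + GET2 (BSR-like when GET1 is the PC)."""
    if get2_mode == ADMODE_2:
        return get1_data & MASK64
    return (get1_data + get2_data) & MASK64


if __name__ == "__main__":
    word = encode_gsr(opcode=0b101010, get1=3, get2=ADMODE_2, regs=0b1011)
    print(hex(word), decode_gsr(word))
```

Since the CCS flags are never affected by GSR, nothing here touches them; the
only checks are that each field fits its bits and that GET1 is not admode 2.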


    It is worth noting that the Register List, occupying bits 0 through 45, is

in agreement with the early estimate of approximately 45 registers total for the

12864.  The list has room for 46 registers, and the PC and Peon CCS registers

are always saved besides, which makes 48.  If there are any registers that we

can be sure NEVER need to be saved during a GSR, even during the Boss mode, then

we can have a few more than the 48 implied here.

When executing a GSR, the processor would copy the specified register list to  

the Peon CCS register, save the listed registers on the current Stack, THEN save

both the PC and Peon CCS registers.  When an Interrupt occurs, the last two

registers saved would always be PC and the Boss CCS (although the Peon CCS would

be saved just before then).  One bit in the same place in the two CCS registers

would serve to identify which is which; this bit cannot be allowed to be changed

by anything.

Then when the generic RTN (return) instruction is executed, 128 bits of PC and 

CCS data would be taken from the memory; the correct CCS would be identified,  

and the correct way of returning would follow.  One thing to note about RTN from

a subroutine:  The instruction is almost completely pre-defined.  The only odd 

thing is that the values of the instruction-result flags in the CCS BEFORE the

RTN occurs have to be preserved while CCS data is being loaded from the Stack

during the actual RTN operation -- unless the programmer has set flag-masks!

The bit-coding of RTN only needs 6 bits for the instruction, 4 bits for Do-If, 

and 5 bits for flag-masks (flags the programmer does not want preserved during 

the RTN from a subroutine); the rest of the 64 bits can be ignored.  Programmers

should be wary of specifying any flag-masks for RTN at the end of an Interrupt 

handling routine, since here the normal thing for the processor to do is to NOT

preserve the flags, as they exist at the end of the Interrupt handler.  Masking

them would mean transferring Interrupt data to the interrupted program.  This  

would be OK if the interrupted program was specifically waiting for such....
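    The flag handling of RTN can be sketched as follows.  This is one reading of
the text, not a specification: five instruction-result flags are assumed
(matching the 5 flag-mask bits), the PC is assumed to occupy the upper half of
the 128 bits RTN pulls from the Stack, and the position of the
which-CCS-is-this bit (BOSS_ID_BIT here) is hypothetical.  A set mask bit
reverses the default for that flag: after a subroutine the flag comes from the
Stack instead of being preserved, and after an Interrupt it keeps the handler's
value instead of being restored.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1
FLAGS = 0b11111        # five instruction-result flags (assumed, one per flag-mask bit)
BOSS_ID_BIT = 1 << 63  # hypothetical position of the bit that tells the two CCS apart


def split_return_slot(slot: int) -> Tuple[int, int]:
    """Split the 128 bits pulled by RTN into (PC, CCS); PC in the upper half is assumed."""
    return slot >> 64, slot & MASK64


def flags_after_rtn(current: int, stacked: int, flag_mask: int, from_interrupt: bool) -> int:
    """Instruction-result flags in force after RTN.

    current: flags as they stand just before RTN.
    stacked: flags held in the Peon CCS image pulled from the Stack.
    flag_mask: the 5 flag-mask bits from the RTN instruction.
    """
    current, stacked, flag_mask = current & FLAGS, stacked & FLAGS, flag_mask & FLAGS
    if from_interrupt:
        # default: restore the interrupted program's flags; masked flags keep the handler's values
        return (stacked & ~flag_mask) | (current & flag_mask)
    # default: preserve the subroutine's result flags; masked flags come from the Stack
    return (current & ~flag_mask) | (stacked & flag_mask)


if __name__ == "__main__":
    pc, ccs = split_return_slot((0x1000 << 64) | BOSS_ID_BIT)
    from_interrupt = bool(ccs & BOSS_ID_BIT)
    print(hex(pc), from_interrupt, bin(flags_after_rtn(0b10101, 0b01010, 0, from_interrupt)))
```

The identity bit alone decides which default applies, which is why the text
insists that nothing may ever change it.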


    PSH, PUL, and MOVEM-type instructions can all be combined into one generic,

I think, that we can call STAK.  The bit-coding for it might be like this:


       |6         5|5    |5|5     5|4     4|4                    |

       |3| | | | |8|7| | |4|3| | |0|9| | |6|5| | | |.....| | | |0|

       |   STAK    |Con- | | Do-If |  PUT  |  Register List, to  |

       |instruction|trol | |       |       |    be stacked or    |

       |   code    |Bits | |       |       |      unstacked      |

       |           |     | |       |       |                     |

       |           |     | |       |       |                     |

                       ^  ^            ^

       PUT specifies the address where the stack is to start.  If LOCATION

       OF ADDRESS is in memory somewhere, one Control Bit denotes L or M

       for the 128 bits at that location, from which the stack's address

       will be fetched -- no bitfield specs!  After STAK is finished, the

       PUT place is given a new value, indicating the new start of the stack.

       (Immediate Data still forbidden in PUT, of course.)  One Control Bit

       specifies top or bottom of stack; another Control Bit specifies data

       being added to or removed from the stack.  As always, only an EVEN

       total number of registers may be specified.  Bit 54 means that the

       Peon CCS register is part of the stack operation.  STAK never affects

       flags, except when loading CCS from this kind of stack.  (I forgot to

       say: details of PUT can be in the other 64 bits of the instruction fetch.)
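    A decoder for the STAK bit-map, with the even-register check, might look like
the Python sketch below.  The field boundaries come from the diagram above
(bits 57 to 55 are the three Control Bits, bit 54 is the Peon CCS bit); which
Control Bit does which job is not stated, so assigning bit 57 to L/M, bit 56 to
top/bottom and bit 55 to adding/removing is a hypothetical choice, as are the
field names.

```python
def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of an instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_stak(word: int) -> dict:
    """Decode a STAK word, enforcing the even-total rule (Register List plus bit 54)."""
    regs = field(word, 45, 0)
    peon_ccs = field(word, 54, 54)
    total = bin(regs).count("1") + peon_ccs
    if total % 2:
        raise ValueError(f"STAK names {total} registers; the total must be even")
    return {
        "opcode": field(word, 63, 58),
        "use_m_half": bool(field(word, 57, 57)),  # assumed: 0 = L half, 1 = M half
        "bottom": bool(field(word, 56, 56)),      # assumed: 0 = top, 1 = bottom of stack
        "pull": bool(field(word, 55, 55)),        # assumed: 0 = add data, 1 = remove data
        "peon_ccs": bool(peon_ccs),
        "do_if": field(word, 53, 50),
        "put": field(word, 49, 46),
        "regs": [i for i in range(46) if (regs >> i) & 1],
    }


if __name__ == "__main__":
    w = (0b110011 << 58) | (1 << 55) | (1 << 54) | (0b0001 << 50) | (0b0100 << 46) | 0b1
    print(decode_stak(w))  # one listed register plus the Peon CCS: an even total of 2
```

Counting the Peon CCS bit toward the total is what lets a single listed
register pass the even-number rule when the CCS travels with it.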


    That about wraps it up, I guess.  Any inconsistencies you may have noticed 

are due to the fact that this is only a proposal, and therefore does not need to

be perfect.  Only if the Industry decides to get together to create a standard 

microprocessor along these lines would it be necessary to get really finicky on

all the details.  And what do I want out of this?  First of all, I want to beat

the NIH Syndrome:  'If it is Not Invented Here, we are not interested!'  Except

for the fact that computers I own and know well happen to have 6809s in them, I

am not associated in any significant way with any company in the entire computer

industry.  I will claim the credit for dreaming up this thing, just to prevent 

anyone else from doing so -- and just to prevent any person or any company from

claiming ownership of it, I am quite deliberately placing this whole concept in

the public domain, as of NOW.  Thus the whole industry starts off on an equal  

basis with respect to the proposed 12864 microprocessor, and there should now be

no barrier to creating an industry-wide standard.  I am knowingly forfeiting all

legal claim to any compensation for these ideas, just to prove I seriously want

the Industry to get its act together.  On the other hand, any 'royalties of    

conscience' that might come my way will be gladly accepted!                    


                              Vernon Nemitz

                              March 17, 1991