Design of asynchronous CPU

Design Problem: 

In this project, we design an asynchronous processor with the following main features:

  • 10-bit address
  • 16-bit data bus
  • On-chip 1Kbyte ROM
  • On-chip 1Kbyte RAM
  • Instruction Decoder
  • 16 bit ALU
  • A register set of two registers, an Accumulator and a Flag Register.

Block Diagram:

The CPU comprises five units, all of which are designed along the lines of the basic micropipeline structure. The five modules communicate with each other through asynchronous handshaking. The five modules are:

  • Increment PC
  • Fetch
  • Decoder
  • ALU
  • RAM

Apart from these units, there is a register set which comprises Registers A and B, the Accumulator and the Flag Register. The decoder, ALU, RAM and register set modules are finally combined to form a single block, execute.

Screenshot 00.49.36

The block diagram of the execute unit is drawn below.

Screenshot 00.52.57

The Increment PC unit generates the addresses to be read from the Read Only Memory. The adder in this unit adds one to the output value, so on each request to the Increment PC unit the value at its output is incremented by one. This address is fed into the fetch unit, which outputs the instruction stored at that address. The instruction is then decoded in the decoder, which generates the control signals to be fed into the ALU, RAM and the register set.

Instruction Set:

The instruction set has been designed keeping in view that the controller is mainly required to do computation and I/O functions. An instruction for division has not been included because the algorithm can be formulated so as to require division by 2^n only, which can be done with a shift right operation. The instruction set has the following features.
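To illustrate the point about division, the following is a minimal Python sketch (not part of the design itself) showing that dividing an unsigned 16 bit value by 2^n amounts to n one-bit right shifts. The names shr1 and div_pow2 are hypothetical and chosen only for this example.

```python
MASK16 = 0xFFFF  # 16 bit data width


def shr1(x):
    """One-bit logical shift right on a 16 bit word (zero enters the MSB)."""
    return (x & MASK16) >> 1


def div_pow2(x, n):
    """Divide an unsigned 16 bit value by 2**n using n one-bit shifts."""
    for _ in range(n):
        x = shr1(x)
    return x
```

For example, div_pow2(40, 3) performs three shifts and yields 5. Any remainder is discarded, as with integer division.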

All instructions are one-word (i.e. 16 bit) instructions, except one instruction which requires 2 words.

The instructions can be broadly divided into four groups.

  • Data Transfer instructions
  • Arithmetic and logic instructions
  • Branching instructions
  • Control instructions

Increment PC:

The increment PC module has simply an adder as its logical block. The address generated by the adder is latched by the ECR of this module, and the latched address is fed back to the input of the increment PC. One input of the adder is always 1, so that the adder's output is the next address. The adder's output forms the input to the ECR, which latches the next address as soon as a request is made to the module. In this manner the addresses are generated.
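The behaviour described above can be sketched as a small Python model. This is only a behavioural illustration, not the synthesized circuit: the class name IncrementPC, the reset value of 0 and the wrap-around at the 10 bit address limit are assumptions made for the example.

```python
ADDR_MASK = 0x3FF  # 10 bit address


class IncrementPC:
    """Behavioural model: adder plus a latch (ECR) with feedback."""

    def __init__(self):
        # Address currently held by the ECR (reset value of 0 is assumed).
        self.latched = 0

    def adder(self):
        # One adder input is the fed-back latched address, the other is always 1.
        # Wrapping to 10 bits is an assumption of this sketch.
        return (self.latched + 1) & ADDR_MASK

    def request(self):
        # On a request event the ECR latches the adder output.
        self.latched = self.adder()
        return self.latched
```

Calling request() three times on a fresh model returns 1, 2 and 3 in turn, mirroring one new address per handshake request.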

The complete synthesized Increment PC module is shown below:

Screenshot 00.59.42

The simulated output of the increment PC is shown below:

Screenshot 00.59.50

Fetch circuitry:

The logical block of the fetch circuitry consists of the Read Only Memory. The address generated by the Increment PC module is fed into this circuitry as the input. The logical block simply outputs the data stored at that address. This data is the instruction to be executed next, so it is latched by the ECR of this module at its output. The instruction is then fed to the decoder to generate the required control signals.
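As a rough behavioural sketch of this module, the Python model below treats the ROM as a list of 16 bit words indexed by the 10 bit address. The class name Fetch and the organisation of the ROM as 1024 word-wide entries are assumptions for illustration only.

```python
class Fetch:
    """Behavioural model: ROM lookup followed by an output latch (ECR)."""

    def __init__(self, rom_words):
        # ROM contents, one 16 bit instruction word per address (assumed layout).
        self.rom = list(rom_words)
        self.latched_instr = 0

    def request(self, address):
        # The logical block reads the ROM; the ECR latches the result.
        self.latched_instr = self.rom[address & 0x3FF] & 0xFFFF
        return self.latched_instr
```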

The synthesized Fetch module is shown below:

diagram5

Decoder Circuitry:

The overall structure of the decoder is the same as the basic micropipeline structure, apart from the logic block. The logic block of the decoder takes the instruction to be decoded as its input and sends out the control signals. The instruction set of the CPU has been designed so that control signals can be assigned to each and every part of the CPU. The opcode is six bits wide and contains one ALU enable bit and one Memory enable/Register enable bit.

diagram4

When ALU_EN is ‘0’, the ALU is enabled. When the MEM_EN/REG_EN bit is ‘0’, instructions that use the register set (without involving memory) are executed; when this bit is ‘1’, memory operations are enabled. When the ALU_EN bit is ‘0’, the 5 MSBs of the instruction code tell which operation is to be performed by the ALU. For memory or register set operations, the MSB and bits 13 to 11 tell which operation is to be performed. If the operation involves memory, the last 10 bits give the memory address to be accessed; if it is a register set instruction, the last 10 bits don’t matter.

Hence in the logical block in the decoder, the outputs RAM_ADDR, RAM_EN, ALU_EN and ALU_SELECT directly correspond to these bits in the instruction to be decoded. The logical block also sends out control signals to the Register set, namely the ld_a, ld_b, ld_acc, ld_f signals. Before sending these signals to the respective blocks, these are sent to ECR1 that latches the control signals at the output. Hence, they change at the next request only.
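A minimal Python sketch of the field split is given below. It relies only on what is stated above, namely a six bit opcode and a 10 bit address in the last 10 bits of the 16 bit instruction. The function name split_instruction is hypothetical, and the positions of ALU_EN and MEM_EN/REG_EN inside the opcode are deliberately not modelled.

```python
def split_instruction(instr):
    """Split a 16 bit instruction into (opcode, ram_addr)."""
    instr &= 0xFFFF
    opcode = instr >> 10       # upper six bits
    ram_addr = instr & 0x3FF   # lower ten bits, used as RAM_ADDR
    return opcode, ram_addr
```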

Apart from sending the control signals, the decoder also takes the contents of reg A and reg B, the carry flag and the zero flag as inputs from the register set. Although these are destined for the ALU, they are passed through the decoder’s ECR1 so that their values are latched at the input of the ALU.

The complete synthesized decoder module is shown below.

diagram3

Decoder simulation results are shown below:

Screenshot 01.04.49

Arithmetic and Logical Unit (ALU):

An arithmetic and logic unit (ALU) is a digital circuit that performs arithmetic and logical operations. The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers.

The ALU for our asynchronous CPU design is 16 bits wide, i.e. all the arithmetic and logical operations work on 16 bit data. The ALU takes as input the contents of registers A and B and the carry flag, latched via the decoder’s ECR. These are fed directly to the ALU’s logical block, which takes two 16 bit inputs, a carry input and select lines to select the operation, and generates a 32 bit output and a carry out (Cout). This data is sent to ECR2 to be latched at the output. Along with this data, the contents of the databus are also fed to ECR2 so that they too can be latched at the ALU’s output. The databus holds the contents of the register that is to be moved to a separate location when the instruction being executed is a move instruction. In such cases, the logical block of the ALU is disabled via the enable pin and its output is invalid. All these values are finally sent to the RAM module after being latched by ECR2.

Let us now see the logical block of the ALU. The basic block diagram of the logical block in the ALU is shown below:

diagram6

Description of Block Diagram:

The ALU consists of two main units, viz. the Arithmetic Unit and the Logical Unit. Arithmetic operations like addition, subtraction and multiplication are done by the Arithmetic Unit, while logical operations like shifting and rotating bits come under the Logical Unit. Each of the two units has two 16 bit inputs along with a carry in signal and a select signal. The output of the ALU is a 16 bit signal and a carry out. The outputs of the two units are multiplexed together to give a 16 bit output. Another multiplexer is used to multiplex the carry bits from the two units to give the carry output of the ALU. There are a total of 5 select lines. The MSB of the select signal is used to select one of the two basic units. The other 4 bits are used within the arithmetic and logical units to select a particular block.

We will describe both of the basic units one by one:

Logical Unit:

The various operations performed by the logical unit are AND, OR, Shift Left, Shift Right, Arithmetic Shift Right, Rotate Right With Carry, Rotate Left With Carry, Rotate Left and Rotate Right.

Block Diagram and Description: The logical unit has one block for each of the operations and two multiplexers, which multiplex the outputs of the nine blocks to generate the output signal and the carry out of the logical unit.

The various blocks are:

  1. AND Block: It takes two 16 bit signals as inputs and ANDs them together to generate a 16 bit output.

  2. OR Block: It takes two 16 bit signals as inputs and ORs them together to generate a 16 bit output.

  3. Shift Left (SHL): It takes one 16 bit signal as input and shifts it one bit to the left to generate a 16 bit output. Zero is inserted in the LSB position.

  4. Shift Right (SHR): It takes one 16 bit signal as input and shifts it one bit to the right to generate a 16 bit output. Zero is inserted in the MSB position.

  5. Shift Arithmetic Right (SAR): It takes one 16 bit signal as input and shifts it one bit to the right to generate a 16 bit output. It differs from SHR in that, instead of inserting zero in the MSB, the MSB of the input signal is inserted in the MSB of the output signal.

  6. Rotate Right With Carry (RRC): It takes one 16 bit signal and a carry in as inputs and shifts the 16 bit signal one bit to the right through the carry, generating a 16 bit output with a carry out.

  7. Rotate Left With Carry (RLC): It takes one 16 bit signal and a carry in as inputs and shifts the 16 bit signal one bit to the left through the carry, generating a 16 bit output with a carry out.

  8. Rotate Left: It takes one 16 bit signal as input and rotates it one bit to the left to generate a 16 bit output.

  9. Rotate Right: It takes one 16 bit signal as input and rotates it one bit to the right to generate a 16 bit output.
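The shift and rotate blocks listed above can be sketched behaviourally in Python as follows. This is an illustrative model only: the function names are hypothetical, and the sketch assumes that in the rotate-with-carry blocks the carry in enters the vacated bit while the bit shifted out becomes the carry out.

```python
MASK16 = 0xFFFF


def shl(a):
    """Shift left by one; zero enters the LSB."""
    return (a << 1) & MASK16


def shr(a):
    """Logical shift right by one; zero enters the MSB."""
    return (a & MASK16) >> 1


def sar(a):
    """Arithmetic shift right by one; the input MSB is copied into the output MSB."""
    a &= MASK16
    return (a >> 1) | (a & 0x8000)


def rrc(a, cin):
    """Rotate right through carry; returns (result, carry_out)."""
    a &= MASK16
    return (a >> 1) | ((cin & 1) << 15), a & 1


def rlc(a, cin):
    """Rotate left through carry; returns (result, carry_out)."""
    a &= MASK16
    return ((a << 1) | (cin & 1)) & MASK16, a >> 15


def rol(a):
    """Rotate left by one; the MSB wraps around to the LSB."""
    a &= MASK16
    return ((a << 1) | (a >> 15)) & MASK16


def ror(a):
    """Rotate right by one; the LSB wraps around to the MSB."""
    a &= MASK16
    return (a >> 1) | ((a & 1) << 15)
```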

diagram10

Arithmetic Unit:

The various operations performed by the arithmetic unit are Addition, Subtraction, Complement and Multiplication.

  1. Complement: This block takes a 16 bit signal as input and gives a 16 bit output which is the complement of the input signal.

  2. 16 bit Adder: This block takes two 16 bit signals and a carry in as inputs and adds them to generate a 16 bit output along with a carry out. The 16 bit adder consists of four 4 bit adders. Each 4 bit adder is supplied with 4 bits of each input and a carry in taken from the carry out of the previous adder.

  3. 16 bit Multiplier: This block takes two 16 bit signals as inputs and produces an output which is a 32 bit signal.

We used a 4 bit by 4 bit Vedic multiplier to develop an 8 bit by 8 bit multiplier, which is in turn used as the basic unit to construct the 16 bit by 16 bit multiplier. The 8 bit by 8 bit multiplier is as shown.

Firstly, we used four 4 bit by 4 bit Vedic multipliers to obtain the partial products, which are then given to two adder stages to obtain the desired 16 bit output.

This 8 bit by 8 bit multiplier is then used to obtain the partial products of the 16 bit inputs, which are again given to two adder stages to obtain the output of the 16 bit multiplication.

  4. Subtractor: Two 16 bit inputs are subtracted by passing the signal to be subtracted through the complement block and then adding the complemented signal to the signal from which it is to be subtracted. The basic idea is to obtain the signal and its 16 bit complement and feed both to a multiplexer, so that the select line decides whether the signals are added or subtracted.

  5. Clear: The arithmetic operation ‘clear’ clears all the output data, i.e. it results in all bits being “0” in register A. This is achieved directly by using a multiplexer and assigning zero to one of its inputs.
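The adder chain and the complement-based subtraction described above can be sketched in Python as follows. This is a behavioural illustration with hypothetical function names. Note one assumption: the sketch treats the complement block as a bitwise complement and supplies a carry in of 1 to the adder during subtraction, which gives a two's complement difference; the text above does not state how the design handles this.

```python
MASK16 = 0xFFFF


def add4(a, b, cin):
    """4 bit adder slice: returns (4 bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + (cin & 1)
    return total & 0xF, total >> 4


def add16(a, b, cin=0):
    """16 bit adder built from four chained 4 bit adders."""
    result, carry = 0, cin
    for i in range(4):
        # Each slice gets 4 bits of each input and the previous carry out.
        s, carry = add4(a >> (4 * i), b >> (4 * i), carry)
        result |= s << (4 * i)
    return result, carry


def complement16(a):
    """Bitwise complement of a 16 bit word."""
    return ~a & MASK16


def sub16(a, b):
    """a - b via complement and add (carry in of 1 is an assumption)."""
    return add16(a, complement16(b), 1)
```

With these definitions, add16(0xFFFF, 1) gives a zero result with carry out 1, and sub16(5, 3) gives 2.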

fig8
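The hierarchical multiplier construction described in the Arithmetic Unit section can be sketched in Python as below. This is only an arithmetic illustration: mul4 stands in for the 4 bit by 4 bit Vedic multiplier block (its internal structure is not modelled), the function names are hypothetical, and the grouping of the additions into two stages is an assumption about how the adder stages are arranged.

```python
def mul4(a, b):
    """Stand-in for the 4x4 Vedic multiplier block (8 bit product)."""
    return (a & 0xF) * (b & 0xF)


def mul8(a, b):
    """8x8 multiplier from four 4x4 partial products (16 bit product)."""
    al, ah = a & 0xF, (a >> 4) & 0xF
    bl, bh = b & 0xF, (b >> 4) & 0xF
    ll, lh, hl, hh = mul4(al, bl), mul4(al, bh), mul4(ah, bl), mul4(ah, bh)
    cross = lh + hl                          # first adder stage (assumed grouping)
    return ll + (cross << 4) + (hh << 8)     # second adder stage


def mul16(a, b):
    """16x16 multiplier from four 8x8 partial products (32 bit product)."""
    al, ah = a & 0xFF, (a >> 8) & 0xFF
    bl, bh = b & 0xFF, (b >> 8) & 0xFF
    ll, lh, hl, hh = mul8(al, bl), mul8(al, bh), mul8(ah, bl), mul8(ah, bh)
    cross = lh + hl                          # first adder stage (assumed grouping)
    return ll + (cross << 8) + (hh << 16)    # second adder stage
```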

The synthesized ALU module is shown below.

fig9

The ALU simulation results are shown.

Screenshot 01.05.30

RAM module:

The RAM module takes as its inputs the ALU’s output, the memory address to be read or written in the RAM, and the databus contents. The logical block of the RAM is the memory, which is accessed at the addr location. Some control signals also come from the decoder to the RAM, which decide whether to access the RAM or not. A multiplexer is provided for the case in which the contents of the RAM are to be put on the databus; its select line is the same as the enable pin of the RAM. The lower 16 bits of the ALU output are fed into the parity generator and the zero check to generate the flags to be fed into the flag register. The ALU’s output is also fed directly to the ECR of this module. Hence, after the RAM block has performed its operation, the contents of the databus or of the ALU output are valid, depending on the type of instruction.
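The flag generation on the lower 16 bits of the ALU output can be sketched in Python as follows. The function names are hypothetical, and the parity convention (flag set when the number of 1 bits is even) is an assumption, since the text does not specify it.

```python
def zero_flag(alu_out):
    """1 when the lower 16 bits of the ALU output are all zero."""
    return 1 if (alu_out & 0xFFFF) == 0 else 0


def parity_flag(alu_out):
    """1 when the lower 16 bits contain an even number of 1s (assumed convention)."""
    ones = bin(alu_out & 0xFFFF).count("1")
    return 1 if ones % 2 == 0 else 0
```

Only the lower 16 bits are examined, so a 32 bit multiplier result such as 0x10000 still sets the zero flag in this sketch.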

The synthesized RAM module is shown.

Screenshot 01.05.36

Register Set:

The register set of the CPU consists of 4 registers, viz. Register A, Register B, Register ACC and the Flag Register. Each register has a load signal as an input. When the load signal is ‘1’, the register takes the value at its input; otherwise its contents remain the same as before. Hence the register set is implemented as latches. The flag register differs slightly from the other registers in that it has load_flags as an additional input, which sets the flag flip flops to the values obtained from the RAM block and keeps all other bits of the register unchanged.
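A behavioural Python sketch of these registers is given below. The class and method names are hypothetical, and flag_mask (the bit positions occupied by the flags) is an assumption introduced for illustration, since the flag layout is not given in the text.

```python
class Register:
    """16 bit register with a load input."""

    def __init__(self):
        self.value = 0

    def update(self, data_in, load):
        # When load is 1 the register follows its input; otherwise it holds.
        if load == 1:
            self.value = data_in & 0xFFFF
        return self.value


class FlagRegister(Register):
    """Register with an extra load_flags input that touches only the flag bits."""

    def update_flags(self, flag_bits, flag_mask, load_flags):
        # flag_mask marks which bit positions are flags (hypothetical layout).
        if load_flags == 1:
            kept = self.value & ~flag_mask & 0xFFFF
            self.value = kept | (flag_bits & flag_mask)
        return self.value
```

For instance, after loading 0xFF00 into a FlagRegister, calling update_flags(0x0003, 0x000F, 1) changes only the low four bits and leaves 0xFF03.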
