#### Chapter 4 The Processor (Part 1)

王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw

Computer Organization and Architecture, Fall 2010

## The Processor : Datapath and Control

#### Processor



### Outline

#### Introduction (4.1)

Logic Design Conventions (4.2, C.7, C.8, C.11)

- Building a Datapath (4.3)
- A Simple Implementation Scheme (4.4, D.2)

3

Computer Organization and Architecture, Fall 2010

#### The Processor: Datapath & Control

- We're ready to look at an implementation of the MIPS
- Simplified to contain only:
  - > memory-reference instructions: *lw*, *sw*
  - > arithmetic-logical instructions: add, sub, and, or, slt
  - $\succ$  control flow instructions: *beq*, *j*
- Generic Implementation:
  - > use the program counter (PC) to supply instruction address
  - get the instruction from memory
  - > read registers
  - use the instruction to decide exactly what to do
- All instructions use the ALU after reading the registers Why? memory-reference? arithmetic? control flow?

#### **Instruction Execution**

- $\Phi$  PC  $\rightarrow$  instruction memory, fetch instruction
- $\Phi$  Register numbers  $\rightarrow$  register file, read registers
- Depending on instruction class
  - Use ALU to calculate
    - Arithmetic result
    - Memory address for load/store
    - Branch target address
  - Access data memory for load/store
  - $\blacktriangleright$  PC  $\leftarrow$  target address or PC + 4

5 Computer Organization and Architecture, Fall 2010

#### More details: an abstract view

Abstract / Simplified View:



- Two types of functional units:
  - elements that operate on data values (combinational)
  - > elements that contain state (sequential)





### Outline

Introduction (4.1)

Logic Design Conventions (4.2, C.7, C.8, C.11)

- Building a Datapath (4.3)
- A Simple Implementation Scheme (4.4, D.2)

9 Computer Organization and Architecture, Fall 2010

Department of Electrical Engineering, Feng-Chia University

### Logic Design Basics

- Information encoded in binary
  - Low voltage = 0, High voltage = 1
  - One wire per bit
  - > Multi-bit data encoded on multi-wire buses

#### Combinational element

- Operate on data
- Output is a function of input

#### Sequential (state) elements

Store information

#### **Combinational Elements**



### Sequential Elements

- Register: stores data in a circuit
  - Uses a clock signal to determine when to update the stored value
  - Edge-triggered: update when Clk changes from 0 to 1





# Sequential Elements

- Register with write control
  - Only updates on clock edge when write control input is 1
  - Used when stored value is required later



#### Sequential Elements

- Clock methodology
  - > Define when signals can be read and when they can be written.
- Unclocked vs. Clocked
- Clocks used in synchronous logic
  - when should an element that contains state be updated?



15 Computer Organization and Architecture, Fall 2010



#### Edge-Triggered Methodology

Allow a state element to be read and written in the same clock cycle without creating a race that could lead to undetermined data values.



Text Book : C50 ~ C51

#### An unclocked state element

#### The Set-Reset latch

output depends on present inputs and also on past inputs



| S | R | $Q_n$ | Q <sub>n+1</sub> | Function          |
|---|---|-------|------------------|-------------------|
| 0 | 0 | 0     |                  | hold (keep value) |
| 0 | 0 | 1     |                  |                   |
| 0 | 1 | 0     |                  | 0                 |
| 0 | 1 | 1     |                  |                   |
| 1 | 0 | 0     |                  | 1                 |
| 1 | 0 | 1     |                  |                   |
| 1 | 1 | 0     |                  | unstable          |
| 1 | 1 | 1     |                  |                   |

#### Latches and Flip-flops

- Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
- Change of state (value) is based on the clock
- Latches: whenever the inputs change, and the clock is asserted
- Flip-flop: state changes only on a clock edge (edge-triggered methodology)

"logically true", — could mean electrically low

A clocking methodology defines when signals can be read and written — wouldn't want to read a signal at the same time it was being written

> 19 Computer Organization and Architecture, Fall 2010

> > Text Book : C52



Two inputs:

- the data value to be stored (D)
- > the clock signal (C) indicating when to read & store D
- Two outputs:
  - > the value of the internal state (Q) and it's complement





Computer Organization and Architecture, Fall 2010

Department of Electrical Engineering, Feng-Chia University

#### D flip-flop





Text Book : C54

### Register File (Read)



Do you understand? What is the "Mux" above?

#### Abstraction

- Make sure you understand the abstractions!
- Sometimes it is easy to think you do, when you don't



Computer Organization and Architecture, Fall 2010

Text Book : C56

23

#### **Register File (Write)**

Note: we still use the real clock to determine when to write



### Outline

- Introduction (4.1)
- Logic Design Conventions (4.2, C.7, C.8, C.11) Ð
- Building a Datapath (4.3)
- A Simple Implementation Scheme (4.4, D.2) Ф

25

Computer Organization and Architecture, Fall 2010



Department of Electrical Engineering, Feng-Chia University

#### Fetch the instruction

- PC: program counter
- Instruction memory
  - Can be cache



Computer Organization and Architecture, Fall 2010

27

#### Datapath for R-type instructions

Read two registers + ALU operation + write one register



#### Datapath for a load or store

ALU computes the address used in accessing the memory I-type Op Rs Rt 16-bit Immediate ALU operation Δ Read register 1 MemWrite Read data 1 Read Zero register 2 Instruction ALU Registers ALU Write Read result Address data register Read data 2 Write Data data memory Write RegWrite data 16 32 Sign MemRead extend 29 Computer Organization and Architecture, Fall 2010

#### Datapath for memory instructions and **R-type** instructions

Combine what we have so far Ð



### Datapath for a branch

#### Another ALU computes the branch target address



#### Building the Datapath

Use multiplexors to stitch them together



Department of Electrical Engineering, Feng-Chia University

Computer Organization and Architecture, Fall 2010

- Introduction (4.1)
- Logic Design Conventions (4.2, C.7, C.8, C.11)
- Building a Datapath (4.3)
- A Simple Implementation Scheme (4.4, D.2)

Computer Organization and Architecture, Fall 2010

Department of Electrical Engineering, Feng-Chia University

#### Control

- Selecting the operations to perform (ALU, read/write, etc.)
- Controlling the flow of data (multiplexor inputs)
- Information comes from the 32 bits of the instruction
- Instruction Format:

| 31:26    | 05.04                               |                                                                            |                                                                                                  |                                                           |                                                                        |  |  |
|----------|-------------------------------------|----------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|-----------------------------------------------------------|------------------------------------------------------------------------|--|--|
| 01.20    | 25:21                               | 20:16                                                                      | 15:11                                                                                            | 10:6                                                      | 5:0                                                                    |  |  |
| ruction  |                                     |                                                                            |                                                                                                  |                                                           |                                                                        |  |  |
| 35 or 43 | rs                                  | rt                                                                         |                                                                                                  | address                                                   |                                                                        |  |  |
| 31:26    | 25:21                               | 20:16                                                                      | 6 15:0                                                                                           |                                                           |                                                                        |  |  |
| re instr | uction                              |                                                                            |                                                                                                  |                                                           |                                                                        |  |  |
| 4        | rs                                  | rt                                                                         |                                                                                                  | address                                                   |                                                                        |  |  |
| 31:26    | 25:21                               | 20:16                                                                      |                                                                                                  | 15:0                                                      |                                                                        |  |  |
|          | 35 or 43<br>31:26<br>ore instr<br>4 | 35 or 43     rs       31:26     25:21       ore instruction       4     rs | 35 or 43     rs     rt       31:26     25:21     20:16       ore instruction     4     rs     rt | 35 or 43 rs rt   31:26 25:21 20:16   ore instruction 4 rs | 35 or 43rsrtaddress31:2625:2120:1615:0ore instruction4rsrtaddress4rsrt |  |  |

ALU's operation based on instruction type and function code

#### Control

| Example:                                                                   | add \$8, \$17, \$18            |                                                                        |                                                                                                                 |                                                                                                                                   |                                                                                                                                                                     |  |  |
|----------------------------------------------------------------------------|--------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| ор                                                                         | rs                             | rt                                                                     | rd                                                                                                              | shamt                                                                                                                             | funct                                                                                                                                                               |  |  |
| 000000                                                                     | 10001                          | 10010                                                                  | 01000                                                                                                           | 00000                                                                                                                             | 100000                                                                                                                                                              |  |  |
| Example:     Iw     \$1, 100(\$2)       op     rs     rt     16 bit offset |                                |                                                                        |                                                                                                                 |                                                                                                                                   |                                                                                                                                                                     |  |  |
|                                                                            |                                |                                                                        |                                                                                                                 |                                                                                                                                   |                                                                                                                                                                     |  |  |
|                                                                            | op<br>000000<br>Example:<br>op | op     rs       000000     10001       Example:     lw       op     rs | op     rs     rt       000000     10001     10010       Example:     lw     \$1, 100(\$)       op     rs     rt | op     rs     rt     rd       000000     10001     10010     01000       Example:     Iw     \$1, 100(\$2)       op     rs     rt | op     rs     rt     rd     shamt       000000     10001     10010     01000     00000       Example:     lw     \$1, 100(\$2)       op     rs     rt     16 bit of |  |  |

#### ALU control input

| ALU control lines | Function         |
|-------------------|------------------|
| 0000              | AND              |
| 0001              | OR               |
| 0010              | add              |
| 0110              | subtract         |
| 0111              | set on less than |
| 1100              | NOR              |
|                   |                  |



Computer Organization and Architecture, Fall 2010

#### Two level decode

#### ALUOp is assigned based on instruction type

| Instruction | Instruction<br>opcode | ALUOp | Funct<br>field | Desired<br>ALU action | ALU control<br>input |
|-------------|-----------------------|-------|----------------|-----------------------|----------------------|
| lw          | LW (35)               | 00    | XXXXXX         | add                   | 0010                 |
| sw          | SW (43)               | 00    | XXXXXX         | add                   | 0010                 |
| beq         | Branch (4)            | 01    | XXXXXX         | subtract              | 0110                 |
| add         | R-type (0)            | 10    | 100000         | add                   | 0010                 |
| sub         | R-type (0)            | 10    | 100010         | subtract              | 0110                 |
| and         | R-type (0)            | 10    | 100100         | and                   | 0000                 |
| or          | R-type (0)            | 10    | 100101         | or                    | 0001                 |
| slt         | R-type (0)            | 10    | 101010         | Set on less than      | 0111                 |

#### ALU control unit

- Must describe hardware to compute 4-bit ALU control input
  - > given instruction type



ALUOp Assigned from instruction type

- function code for arithmetic -
- > "11" is not used, so "10" and "01" can be "1x" and "x1" respectively.

Describe it using a truth table (can turn into gates):

| ALU      | Funct field |    |    |    |    |    | Operation |      |
|----------|-------------|----|----|----|----|----|-----------|------|
| ALUOp[1] | ALUOp[0]    | F5 | F4 | F3 | F2 | F1 | F0        |      |
| 0        | 0           | Х  | Х  | Х  | Х  | Х  | Х         | 0010 |
| Х        | 1           | Х  | Х  | Х  | Х  | Х  | Х         | 0110 |
| 1        | Х           | Х  | Х  | 0  | 0  | 0  | 0         | 0010 |
| 1        | Х           | Х  | Х  | 0  | 0  | 1  | 0         | 0110 |
| 1        | Х           | Х  | Х  | 0  | 1  | 0  | 0         | 0000 |
| 1        | Х           | Х  | Х  | 0  | 1  | 0  | 1         | 0001 |
| 1        | Х           | Х  | Х  | 1  | 0  | 1  | 0         | 0111 |

37

Computer Organization and Architecture, Fall 2010

Text Book : D4 ~ D6

### ALU control unit

Simple combinational logic



38 Computer Organization and Architecture, Fall 2010



39

Computer Organization and Architecture, Fall 2010

0

0

Text Book : D7 ~ D8

0

1

#### Main control unit

0

0

0

0

1

0

0

1

Х

Х

Truth table for the main control unit

sw

beq

Х

Х

1

|         | Signal name | R-format | lw | SW | beq |
|---------|-------------|----------|----|----|-----|
| Inputs  | Op5         | 0        | 1  | 1  | 0   |
|         | Op4         | 0        | 0  | 0  | 0   |
|         | Op3         | 0        | 0  | 1  | 0   |
|         | Op2         | 0        | 0  | 0  | 1   |
|         | Op1         | 0        | 1  | 1  | 0   |
|         | Op0         | 0        | 1  | 1  | 0   |
| Outputs | RegDst      | 1        | 0  | Х  | Х   |
|         | ALUSrc      | 0        | 1  | 1  | 0   |
|         | MemtoReg    | 0        | 1  | Х  | Х   |
|         | RegWrite    | 1        | 1  | 0  | 0   |
|         | MemRead     | 0        | 1  | 0  | 0   |
|         | MemWrite    | 0        | 0  | 1  | 0   |
|         | Branch      | 0        | 0  | 0  | 1   |
|         | ALUOp1      | 1        | 0  | 0  | 0   |
|         | ALUOp0      | 0        | 0  | 0  | 1   |



#### **Our Simple Control Structure**

- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  - > ALU might not produce "right answer" right away
  - > we use write signals along with clock to determine when to write
- Cycle time determined by length of the longest path



#### We are ignoring some details like setup and hold times

41 Computer Organization and Architecture, Fall 2010

### Single Cycle Implementation

Calculate cycle time assuming negligible delays except:
memory (200ps), ALU and adders (100ps), register file access (50ps)



### Minimum clock period

#### F/F propagation time + datapath delay + setup time



### Performance of Single-Cycle Machines

|            | Instruction<br>mix | Instruction<br>fetch | Register<br>read | ALU<br>operation | Data<br>memory | Register<br>write | Total  |
|------------|--------------------|----------------------|------------------|------------------|----------------|-------------------|--------|
| R-type     | 45%                | 200                  | 50               | 100              | 0              | 50                | 400 ps |
| Load word  | 25%                | 200                  | 50               | 100              | 200            | 50                | 600 ps |
| Store word | 10%                | 200                  | 50               | 100              | 200            | 0                 | 550 ps |
| Branch     | 15%                | 200                  | 50               | 100              | 0              | 0                 | 350 ps |
| Jump       | 5%                 | 200                  | 0                | 0                | 0              | 0                 | 200 ps |