### Chapter 2 Instructions: Language of the Computer (Part 3)

王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw

Computer Organization and Architecture, Fall 2010

Department of Electrical Engineering, Feng-Chia University

- 2.12 Translating and Starting a Program
- 2.14 Arrays versus Pointers
- 2.16 Real Stuff: ARM Instructions
- 2.17 Real Stuff: x86 Instructions
- 2.18 Fallacies and Pitfalls
- 2.19 Concluding Remarks

#### Software

#### Source Program

- Any sequence of statements and/or declarations written in a human-readable programming language (e.g., C++ or assembly language)
- Usually created using a text editor (ASCII file)

#### Object Program

- Produced from a source program by compiling/assembling to intermediate machine code
- Also contains data used by the code at run time, relocation information, program symbols for linking and/or debugging, and other debugging information

#### Executable Program

Machine code directly executed by a computer's CPU




#### Assembler Pseudo-instructions

- Most assembler instructions represent machine instructions one-to-one
- Pseudo-instructions: figments of the assembler's imagination

  - `move $t0, $t1` → `add $t0, $zero, $t1`
  - `blt $t0, $t1, L` → `slt $at, $t0, $t1` followed by `bne $at, $zero, L`
- `$at` (register 1): assembler temporary




### Dynamic linking





#### Arrays vs. Pointers

#### C Code

| Array                                                                                                              | Pointer                                                                                                               |
|--------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
| <pre>void clear1(int array[], int size) {<br>    int i;<br>    for (i = 0; i &lt; size; i += 1)<br>        array[i] = 0;<br>}</pre> | <pre>void clear2(int *array, int size) {<br>    int *p;<br>    for (p = &amp;array[0]; p &lt; &amp;array[size]; p = p + 1)<br>        *p = 0;<br>}</pre> |

#### MIPS Code

| Array version | Comment | Pointer version | Comment |
|---------------|---------|-----------------|---------|
| `move $t0,$zero` | `# i = 0` | `move $t0,$a0` | `# p = &array[0]` |
| `loop1: sll $t1,$t0,2` | `# $t1 = i * 4` | `sll $t1,$a1,2` | `# $t1 = size * 4` |
| `add $t2,$a0,$t1` | `# $t2 = &array[i]` | `add $t2,$a0,$t1` | `# $t2 = &array[size]` |
| `sw $zero,0($t2)` | `# array[i] = 0` | `loop2: sw $zero,0($t0)` | `# Memory[p] = 0` |
| `addi $t0,$t0,1` | `# i = i + 1` | `addi $t0,$t0,4` | `# p = p + 4` |
| `slt $t3,$t0,$a1` | `# $t3 = (i < size)` | `slt $t3,$t0,$t2` | `# $t3 = (p < &array[size])` |
| `bne $t3,$zero,loop1` | `# if (i < size) go to loop1` | `bne $t3,$zero,loop2` | `# if (p < &array[size]) go to loop2` |



#### **ARM & MIPS Similarities**

- ARM: the most popular embedded core
- Similar basic set of instructions to MIPS

|                       | ARM              | MIPS             |
|-----------------------|------------------|------------------|
| Date announced        | 1985             | 1985             |
| Instruction size      | 32 bits          | 32 bits          |
| Address space         | 32-bit flat      | 32-bit flat      |
| Data alignment        | Aligned          | Aligned          |
| Data addressing modes | 9                | 3                |
| Registers             | 37 × 32-bit      | 35 × 32-bit      |
| Input/output          | Memory-mapped    | Memory-mapped    |


### **ARM** introduction

#### A 32-bit RISC architecture

- A large, uniform register file
- Many instructions execute in a single cycle
- A load/store architecture
- Simple addressing modes, with all load/store addresses determined from register contents, not from memory contents
- Uniform, fixed-length instruction fields to simplify instruction decode (32-bit length and 3-address format)
- Other features
  - Control over both the ALU and the shifter in every data-processing instruction, to maximize the use of both
  - Auto-increment and auto-decrement addressing modes to optimize program loops
  - Load and Store Multiple instructions to maximize data throughput
  - Conditional execution of all instructions to maximize execution throughput


### **ARM** register briefs

- ARM has 31 general-purpose 32-bit registers. Only 16 of them, R0 to R15, are visible at any one time.
- ARM has 6 program status registers (PSRs).
- The 16 visible registers are the User mode registers. Only an exception can switch the processor from User mode to another processor mode.
- R14 is the link register (LR): it holds the return address, i.e., the address of the instruction after a Branch and Link (BL).
- R15 is the program counter (PC).
- When read, the PC points two instructions ahead of the instruction being executed (the one in the EXE stage).
- R13 is generally used as the stack pointer (SP). This is a software convention.



### **Register File**


#### Compare and Branch in ARM

#### Uses condition codes for result of an arithmetic/logical instruction

- Negative, zero, carry, overflow
- Compare instructions to set condition codes without keeping the result

#### Each instruction can be conditional

- Top 4 bits of instruction word: condition value
- Can avoid branches over single instructions

**Condition Codes**

| Opcode [31:28] | Mnemonic extension | Meaning | Condition flag state |
|----------------|--------------------|---------|----------------------|
| 0000 | EQ | Equal | Z set |
| 0001 | NE | Not equal | Z clear |
| 0010 | CS/HS | Carry set/unsigned higher or same | C set |
| 0011 | CC/LO | Carry clear/unsigned lower | C clear |
| 0100 | MI | Minus/negative | N set |
| 0101 | PL | Plus/positive or zero | N clear |
| 0110 | VS | Overflow | V set |
| 0111 | VC | No overflow | V clear |
| 1000 | HI | Unsigned higher | C set and Z clear |
| 1001 | LS | Unsigned lower or same | C clear or Z set |
| 1010 | GE | Signed greater than or equal | N set and V set, or N clear and V clear (N == V) |
| 1011 | LT | Signed less than | N set and V clear, or N clear and V set (N != V) |
| 1100 | GT | Signed greater than | Z clear, and either N set and V set, or N clear and V clear (Z == 0, N == V) |
| 1101 | LE | Signed less than or equal | Z set, or N set and V clear, or N clear and V set (Z == 1 or N != V) |
| 1110 | AL | Always (unconditional) | |
| 1111 | (NV) | See Condition code 0b1111 on page A3-5 | |


#### Program status registers



- 4 condition code flags
  - N, Z, C, V flags: Negative, Zero, Carry, oVerflow
- 1 sticky overflow flag
  - Q bit: DSP instruction overflow bit
  - Present in E variants of ARM architecture version 5 and above
- 2 interrupt disable bits
  - I bit: disables normal interrupts (IRQ)
  - F bit: disables fast interrupts (FIQ)
- 1 bit that encodes whether ARM or Thumb instructions are being executed
  - T bit


#### Program status registers (cont.)

| 31 | 30 | 29 | 28 | 27 | 26 to 8   | 7 | 6 | 5 | 4  | 3  | 2  | 1  | 0  |
|----|----|----|----|----|-----------|---|---|---|----|----|----|----|----|
| N  | Z  | C  | V  | Q  | DNM (RAZ) | I | F | T | M4 | M3 | M2 | M1 | M0 |

#### 5 bits that encode the current processor mode

- M[4:0] are the mode bits.

| M[4:0]  | Mode       | Accessible registers                                |
|---------|------------|-----------------------------------------------------|
| 0b10000 | User       | PC, R14 to R0, CPSR                                 |
| 0b10001 | FIQ        | PC, R14_fiq to R8_fiq, R7 to R0, CPSR, SPSR_fiq     |
| 0b10010 | IRQ        | PC, R14_irq, R13_irq, R12 to R0, CPSR, SPSR_irq     |
| 0b10011 | Supervisor | PC, R14_svc, R13_svc, R12 to R0, CPSR, SPSR_svc     |
| 0b10111 | Abort      | PC, R14_abt, R13_abt, R12 to R0, CPSR, SPSR_abt     |
| 0b11011 | Undefined  | PC, R14_und, R13_und, R12 to R0, CPSR, SPSR_und     |
| 0b11111 | System     | PC, R14 to R0, CPSR (ARM architecture v4 and above) |

#### Vector address

| Exception type                                  | Mode       | Normal address | High vector address |
|-------------------------------------------------|------------|----------------|---------------------|
| Reset                                           | Supervisor | 0x00000000     | 0xFFFF0000          |
| Undefined instructions                          | Undefined  | 0x00000004     | 0xFFFF0004          |
| Software interrupt (SWI)                        | Supervisor | 0x00000008     | 0xFFFF0008          |
| Prefetch Abort (instruction fetch memory abort) | Abort      | 0x0000000C     | 0xFFFF000C          |
| Data Abort (data access memory abort)           | Abort      | 0x00000010     | 0xFFFF0010          |
| IRQ (interrupt)                                 | IRQ        | 0x00000018     | 0xFFFF0018          |
| FIQ (fast interrupt)                            | FIQ        | 0x0000001C     | 0xFFFF001C          |


### **Exception process**

When an exception occurs, the banked versions of R14 and the SPSR for the exception mode are used to save state as follows:

- `R14_<exception_mode> = return link`
- `SPSR_<exception_mode> = CPSR`
- `CPSR[4:0] = exception mode number`
- `CPSR[5] = 0` (execute in ARM state)
- If `<exception_mode>` is Reset or FIQ, then `CPSR[6] = 1` (disable fast interrupts); otherwise `CPSR[6]` is unchanged
- `CPSR[7] = 1` (disable normal interrupts)
- `PC = exception vector address`



Text Book : P164

### **Instruction Encoding**



### **Thumb Architecture Extension**

- The Thumb instruction set is a re-encoded subset of the ARM instruction set, and its instructions operate on a restricted view of the ARM registers (R0 to R7, R13, R14, R15).
- Thumb is designed to increase the performance of ARM implementations that use a 16-bit or narrower memory data bus, and to allow better code density than ARM.
- Every Thumb instruction is encoded in 16 bits.
- Most Thumb instructions are executed unconditionally.
- Many Thumb data-processing instructions use a 2-address format (the destination register is the same as one of the source registers).



### Example : ARM vs. Thumb

| Simple C routine | Equivalent ARM assembly | Equivalent Thumb assembly |
|------------------|-------------------------|---------------------------|
| <pre>if (x &gt;= 0)<br>    return x;<br>else<br>    return -x;</pre> | <pre>iabs    CMP   r0,#0      ; Compare r0 to zero<br>        RSBLT r0,r0,#0   ; If r0 &lt; 0 (LT), r0 = 0 - r0<br>        MOV   pc,lr      ; Move link register to PC (return)</pre> | <pre>        CODE16           ; Directive specifying 16-bit (Thumb) instructions<br>iabs    CMP   r0,#0      ; Compare r0 to zero<br>        BGE   return     ; Jump to return if greater than or equal to zero<br>        NEG   r0,r0      ; If not, negate r0<br>return  MOV   pc,lr      ; Move link register to PC (return)</pre> |

| Code  | Instructions | Size (bytes) | Normalised |
|-------|--------------|--------------|------------|
| ARM   | 3            | 12           | 1.0        |
| Thumb | 4            | 8            | 0.67       |


#### Thumb-2

Improved code density with performance and power efficiency.







### Alternative Architectures

#### Design alternative:

- provide more powerful operations
- goal is to reduce number of instructions executed
- danger is a slower cycle time and/or a higher CPI

- *"The path toward operation complexity is thus fraught with peril. To avoid these problems, designers have moved toward simpler instructions"* 

Let's look (briefly) at IA-32 (x86)


### The Intel x86 ISA

- Evolution with backward compatibility
  - 8080 (1974): 8-bit microprocessor
    - Accumulator, plus 3 index-register pairs
  - 8086 (1978): 16-bit extension to 8080
    - Complex instruction set (CISC)
  - 8087 (1980): floating-point coprocessor
    - Adds FP instructions and register stack
  - 80286 (1982): 24-bit addresses, MMU
    - Segmented memory mapping and protection
  - 80386 (1985): 32-bit extension (now IA-32)
    - Additional addressing modes and operations
    - Paged memory mapping as well as segments

### The Intel x86 ISA

#### Further evolution...

- i486 (1989): pipelined, on-chip caches and FPU
  - Compatible competitors: AMD, Cyrix, ...
- Pentium (1993): superscalar, 64-bit datapath
  - Later versions added MMX (Multi-Media eXtension) instructions
  - The infamous FDIV bug
- Pentium Pro (1995), Pentium II (1997)
  - New microarchitecture (see Colwell, *The Pentium Chronicles*)
- Pentium III (1999)
  - Added SSE (Streaming SIMD Extensions) and associated registers
- Pentium 4 (2001)
  - New microarchitecture
  - Added SSE2 instructions


### The Intel x86 ISA

#### And further...

- AMD64 (2003): extended architecture to 64 bits
- EM64T, Extended Memory 64 Technology (2004)
  - AMD64 adopted by Intel (with refinements)
  - Added SSE3 instructions
- Intel Core (2006)
  - Added SSE4 instructions, virtual machine support
- AMD64 (announced 2007): SSE5 instructions
  - Intel declined to follow, instead...
- Advanced Vector Extension (announced 2008)
  - Longer SSE registers, more instructions
- If Intel didn't extend with compatibility, its competitors would!
  - Technical elegance ≠ market success

### x86 Overview

#### Complexity:

- Instructions from 1 to 15 bytes long
- one operand must act as both a source and destination
- one operand can come from memory
- complex addressing modes
  - e.g., "base or scaled index with 8 or 32 bit displacement"

#### Saving grace:

- the most frequently used instructions are not too difficult to build
- compilers avoid the portions of the architecture that are slow
"what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective"

#### **x86 Instruction Encoding**

#### Variable length encoding

- Postfix bytes specify addressing mode
- Prefix bytes modify operation
  - Operand length, repetition, locking, ...

**Typical x86 instruction formats** (field widths in bits)

| Instruction | Fields |
|-------------|--------|
| a. JE EIP + displacement | JE (4), Condition (4), Displacement (8) |
| b. CALL | CALL (8), Offset (32) |
| c. MOV EBX, [EDI + 45] | MOV (6), d (1), w (1), r/m Postbyte (8), Displacement (8) |
| d. PUSH ESI | PUSH (5), Reg (3) |
| e. ADD EAX, immediate | ADD (4), Reg (3), w (1), Immediate (32) |
| f. TEST EDX, #42 | TEST (7), w (1), Postbyte (8), Immediate (32) |




### **Implementing IA-32**

- Complex instruction set makes implementation difficult
  - Hardware translates instructions to simpler microoperations
    - Simple instructions: 1–1
    - Complex instructions: 1–many
  - Microengine similar to RISC
  - Market share makes this economically viable
- Comparable performance to RISC
  - Compilers avoid complex instructions



### Fallacies

#### Powerful instruction ⇒ higher performance

- Fewer instructions required
- But complex instructions are hard to implement
  - May slow down all instructions, including simple ones
- Compilers are good at making fast code from simple instructions

#### Use assembly code for high performance

- But modern compilers are better at dealing with modern processors
- More lines of code ⇒ more errors and less productivity


Text Book : P175

### Fallacies

#### Backward compatibility ⇒ instruction set doesn't change

- But they do accrete more instructions



#### Pitfalls

#### Sequential words are not at sequential addresses

- Increment by 4, not by 1!

#### Keeping a pointer to an automatic variable after procedure returns

- e.g., passing a pointer back via an argument
- The pointer becomes invalid when the stack is popped



#### **Concluding Remarks**

#### Instruction complexity is only one variable

- lower instruction count vs. higher CPI / lower clock rate

#### Design Principles:

- simplicity favors regularity
- smaller is faster
- make the common case fast
- good design demands compromise

#### Instruction set architecture

- a very important abstraction indeed!

#### Required instruction groups

- Arithmetic and logic operations
- Load/store
- Control transfer


Text Book : P179

#### **Concluding Remarks**

- Measure MIPS instruction executions in benchmark programs
  - Consider making the common case fast
  - Consider compromises

| Instruction class | MIPS examples                        | SPEC2006 Int | SPEC2006 FP |
|-------------------|--------------------------------------|--------------|-------------|
| Arithmetic        | add, sub, addi                       | 16%          | 48%         |
| Data transfer     | lw, sw, lb, lbu,<br>lh, lhu, sb, lui | 35%          | 36%         |
| Logical           | and, or, nor, andi,<br>ori, sll, srl | 12%          | 4%          |
| Cond. Branch      | beq, bne, slt,<br>slti, sltiu        | 34%          | 8%          |
| Jump              | j, jr, jal                           | 2%           | 0%          |



Imagine memory as a long block of boxes that store data. Each box is labeled with an **address**. A **pointer** is simply a variable that holds a particular address. An **array** is a group of contiguous boxes that can be accessed by their index values. Array and pointer variables are mostly the same; we're going to highlight one of the ways they are different.


