## SRI BALAJI CHOCKALINGAM ENGINEERING COLLEGE, ARNI

William Stallings
Computer Organization
and Architecture

**Chapter 3 Instruction Cycle Review System Buses** 

Faculty Name: T.Karthikeyan
HOD/AP/CSE

# Simple Bus Architecture

A simplified motherboard of a personal computer (top view):



# **Architecture Review - Program Concept**

- General-purpose hardware can do different tasks, given the correct control signals
- Instead of re-wiring, supply a new set of control signals



## What is a program?

#### **Software**

- A sequence of steps
- For each step, an arithmetic or logical operation is done

#### **Function of Control Unit**

- For each operation a unique code is provided
  - e.g. ADD, MOVE
- A hardware segment accepts the code and issues the control signals

We have a computer!

## **Components**

- **Central Processing Unit**
  - Arithmetic and Logic Unit
- Data and instructions need to get into the CPU and results out
- Temporary storage of code and results is needed

# Computer Components: Top Level View



Buffers



PC = Program counter
IR = Instruction register
MAR = Memory address register
MBR = Memory buffer register
I/O AR = I/O address register
I/O BR = I/O buffer register

## **Simplified Instruction Cycle**

#### **Two steps:**

- Fetch
- Execute



## **Fetch Cycle**

- Program Counter (PC) holds the address of the next instruction to fetch
- Processor fetches the instruction from the memory location pointed to by the PC
- Instruction is loaded into the Instruction Register (IR)
- Processor interprets the instruction and performs the required actions

### **Execute Cycle**

- Processor-memory
  - Data transfer between CPU and main memory
- Processor-I/O
  - Data transfer between CPU and I/O module
- Data processing
- Control
  - Alteration of sequence of operations
  - e.g. jump
- Combination of the above

## **Hypothetical Machine**

```
| Opcode | Address |
  0 .. 3   4 ... 15    (bit positions)
```

**Integer Format** - Data range?

```
| S | Magnitude |
  0   1 ..... 15    (bit positions)
```
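To answer the data-range question: with one sign bit and a 15-bit magnitude, the representable integers run from -(2^15 - 1) to +(2^15 - 1), that is -32767 to +32767, with two encodings of zero. The sketch below checks this, assuming bit 0 in the diagram is the most significant bit; the function names are illustrative only.

```python
WORD_BITS = 16
MAG_BITS = WORD_BITS - 1           # 15 magnitude bits
MAX_MAG = (1 << MAG_BITS) - 1      # largest magnitude: 32767


def encode_sign_magnitude(value: int) -> int:
    """Pack a Python int into a 16-bit sign-magnitude word."""
    if abs(value) > MAX_MAG:
        raise OverflowError(f"{value} does not fit in {MAG_BITS} magnitude bits")
    sign = 1 if value < 0 else 0
    return (sign << MAG_BITS) | abs(value)


def decode_sign_magnitude(word: int) -> int:
    """Unpack a 16-bit sign-magnitude word into a Python int."""
    magnitude = word & MAX_MAG
    return -magnitude if (word >> MAG_BITS) & 1 else magnitude


print(-MAX_MAG, MAX_MAG)  # smallest and largest representable integers
```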

#### **Registers**

- Program counter (PC), instruction register (IR), accumulator (AC)

#### **Partial List of Opcodes**

- 0010 = Store AC to Memory
- 0101 = Add to AC from Memory
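The fetch and execute steps can be traced with a small simulator. This is a minimal sketch, not a definitive model. It assumes a 4-bit opcode in the top bits of a 16-bit word and a 12-bit address, and it assumes an extra opcode `0001` (load AC from memory) that is not in the partial list above. The memory contents are illustrative, and addition is treated as plain unsigned arithmetic for simplicity.

```python
from typing import Dict

OP_LOAD = 0b0001    # assumed opcode: load AC from memory (not in the list above)
OP_STORE = 0b0010   # store AC to memory
OP_ADD = 0b0101     # add to AC from memory


def run(memory: Dict[int, int], pc: int, steps: int) -> Dict[str, int]:
    """Run `steps` fetch-execute cycles and return the final register values."""
    ac = 0
    ir = 0
    for _ in range(steps):
        # Fetch: read the word addressed by PC into IR, then increment PC.
        ir = memory[pc]
        pc += 1
        # Execute: split IR into opcode (top 4 bits) and address (low 12 bits).
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == OP_LOAD:
            ac = memory[address]
        elif opcode == OP_STORE:
            memory[address] = ac
        elif opcode == OP_ADD:
            ac = (ac + memory[address]) & 0xFFFF   # simplification: unsigned add
        else:
            raise ValueError(f"unknown opcode {opcode:04b}")
    return {"pc": pc, "ir": ir, "ac": ac}


# Illustrative program: load word 0x940, add word 0x941, store the sum at 0x941.
memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}
registers = run(memory, pc=0x300, steps=3)
print(hex(memory[0x941]))   # 0x5
```

Each loop iteration is one instruction cycle: the PC is incremented during fetch, so it already points at the next instruction when the current one executes.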

### **Example of Program Execution**



# **Modifications to Instruction Cycle**

#### **Simple Example**

- Always added one to PC
- Entire operand fetched with instruction

#### **More complex examples**

- Instruction set design might require a repeat trip to memory to fetch the operand
  - In particular, if memory address range exceeds word size

# Instruction Cycle - State Diagram



Start Here

## Introduction to Interrupts

- We will have more to say about interrupts later!
- Interrupts are a mechanism by which other modules (e.g. I/O) may interrupt the normal sequence of processing
- Four general classes of interrupts
  - Program, e.g. overflow, division by zero
  - Timer
    - Generated by internal processor timer
  - I/O, from I/O controller
  - Hardware failure
    - e.g. memory parity error
- Particularly useful when one module is much slower than another, e.g. disk access (milliseconds) vs. CPU (microseconds or faster)

### **Interrupt Examples**



(a) No interrupts

(b) Interrupts; short I/O wait

(c) Interrupts; long I/O wait

### **Interrupt Cycle**

- Added to instruction cycle
- Processor checks for interrupt
  - Indicated by an interrupt signal
- If no interrupt, fetch next instruction
- If interrupt pending:
  - Suspend execution of current program
  - Save context (what does this mean?)
  - Set PC to start address of interrupt handler routine
  - Process interrupt
  - Restore context and continue interrupted program
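The interrupt cycle can be sketched as one extra check at the end of each instruction cycle. The sketch below is a simplified model under stated assumptions: memory holds instruction labels rather than machine words, `HALT` and `IRET` are hypothetical marker instructions, and the "context" is reduced to the PC alone. On a real processor the context would also include registers and status flags.

```python
from typing import List


def run_with_interrupt(memory: List[str], handler_start: int,
                       raise_at: int) -> List[str]:
    """Trace executed instructions; an interrupt signal arrives while the
    instruction at address `raise_at` is executing."""
    pc = 0
    saved_context = None
    interrupt_pending = False
    interrupts_enabled = True
    trace = []
    while True:
        address = pc
        instruction = memory[address]        # fetch
        pc += 1
        if instruction == "HALT":            # hypothetical end-of-program marker
            return trace
        if instruction == "IRET":            # hypothetical return from handler
            pc = saved_context["pc"]         # restore context
            interrupts_enabled = True
        else:
            trace.append(instruction)        # execute
        if address == raise_at:
            interrupt_pending = True         # interrupt signal asserted
        # Interrupt cycle: check for a pending interrupt before the next fetch.
        if interrupt_pending and interrupts_enabled:
            saved_context = {"pc": pc}       # save context
            pc = handler_start               # jump to the handler routine
            interrupt_pending = False
            interrupts_enabled = False


memory = ["i0", "i1", "i2", "HALT", "h0", "h1", "IRET"]
print(run_with_interrupt(memory, handler_start=4, raise_at=1))
```

The handler instructions `h0` and `h1` appear in the trace between `i1` and `i2`, and the interrupted program then continues from the saved PC.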

# Instruction Cycle (with Interrupts) - State Diagram



## **Multiple Interrupts**

- Disable interrupts (sequential processing)
  - Processor will ignore further interrupts whilst processing one interrupt
  - Interrupts handled in sequence as they occur
- Define priorities (nested processing)

## **Multiple Interrupts - Sequential**



Disabled Interrupts – Nice and Simple

## **Multiple Interrupts - Nested**



How to handle state with an arbitrary number of interrupts?
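One common answer to the question above is a stack: each time a handler is preempted, its context is pushed, and it is popped again when the preempting handler finishes. The sketch below simulates this in unit time steps. The device names, priorities, arrival times, and the fixed service time are illustrative assumptions, and a higher number means higher priority.

```python
import heapq
from typing import List, Tuple


def simulate_nested(arrivals: List[Tuple[int, str, int]],
                    service_time: int, end_time: int) -> List[Tuple[int, str]]:
    """arrivals: (time, name, priority) tuples.
    Returns (time, activity) pairs marking each change of running activity."""
    arrivals = sorted(arrivals)
    pending = []     # max-heap on priority (stored negated for heapq)
    stack = []       # saved contexts of preempted activities
    current = {"name": "user", "priority": 0, "remaining": None}
    timeline = []
    nxt = 0
    for t in range(end_time):
        while nxt < len(arrivals) and arrivals[nxt][0] == t:
            _, name, priority = arrivals[nxt]
            heapq.heappush(pending, (-priority, t, name))
            nxt += 1
        if current["remaining"] == 0:
            current = stack.pop()              # handler done: restore context
        if pending and -pending[0][0] > current["priority"]:
            neg_priority, _, name = heapq.heappop(pending)
            stack.append(current)              # preempt: save context
            current = {"name": name, "priority": -neg_priority,
                       "remaining": service_time}
        if not timeline or timeline[-1][1] != current["name"]:
            timeline.append((t, current["name"]))
        if current["remaining"] is not None:
            current["remaining"] -= 1          # run the handler for one time unit
    return timeline


events = [(10, "printer", 2), (15, "comm", 5), (20, "disk", 4)]
print(simulate_nested(events, service_time=10, end_time=45))
```

In this run the printer handler is preempted by the higher-priority communications handler at t=15, the disk interrupt waits until t=25, and the printer handler only resumes at t=35. Because the stack grows with each preemption, the nesting depth is bounded by the number of distinct priority levels.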

# Sample Time Sequence of Multiple Interrupts



### Connecting

- All the units must be connected
- Different type of connection for different type of unit
  - Memory
  - Input/Output
  - CPU

#### **Memory Connection**

- Memory typically consists of N words of equal length, addressed from 0 to N-1
- Receives and sends data
  - To Processor
- Receives addresses (of locations)
- Receives control signals
  - Read
  - Write
  - Timing

# Input/Output Connection(1)

- Functionally similar to memory from an internal viewpoint
- Instead of N words as in memory, we have M ports
- Output
  - Receive data from computer
  - Send data to peripheral
- Input
  - Receive data from peripheral
  - Send data to computer
## Input/Output Connection(2)

- Receive control signals from computer
- Send control signals to peripherals
  - e.g. spin disk
- Send interrupt signals (control)

#### **CPU Connection**

- Sends control signals to other units
- Reads instruction and data
- Writes out data (after processing)
- Receives (& acts on) interrupts

#### **Buses**

- There are a number of possible interconnection systems. The most common structure is the **bus**
- Single and multiple bus structures are most common
- e.g. Control/Address/Data bus (PC)
- e.g. Unibus (DEC-PDP), which replaced the Omnibus

#### What is a Bus?

- A communication pathway connecting two or more devices
- Usually broadcast
  - Everyone listens, must share the medium
  - Master: can read/write exclusively, only one master
  - Slave: everyone else. Can monitor data but not produce
- Often grouped
  - A number of channels in one bus
  - e.g. a 32-bit data bus is 32 separate single-bit channels
- Power lines may not be shown
- Three major buses: data, address, control

#### **Bus Interconnection Scheme**



#### **Data Bus**

- Carries data
  - Remember that there is no difference between "data" and "instruction" at this level
- Width is a key determinant of performance
  - 8, 16, 32, 64 bit

#### **Address bus**

- Identifies the source or destination of data
  - In general, the address specifies a specific memory address or a specific I/O port
- e.g. CPU needs to read an instruction (data) from a given location in memory
- Bus width determines maximum memory capacity of system
  - 8086 has a 20-bit address bus but a 16-bit word size, giving a 64K directly addressable address space
  - But it could address up to 1MB using a segmented memory model
    - RAM: 00000 to BFFFF, ROM: C0000 to FFFFF
    - DOS only allowed the first 640K to be used, with the remaining memory reserved for BIOS and hardware controllers. A high-memory manager was needed to "break the 640K barrier"
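The arithmetic behind these figures can be checked directly: an n-bit address selects one of 2^n locations, and the 8086 segmented model forms a 20-bit physical address by shifting a 16-bit segment left by 4 bits and adding a 16-bit offset. A minimal sketch, with illustrative function names:

```python
def address_space(address_bits: int) -> int:
    """Number of distinct locations an address of this width can select."""
    return 1 << address_bits


def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


print(address_space(16))                       # 65536 locations (64K)
print(address_space(20))                       # 1048576 locations (1MB)
print(hex(physical_address(0xC000, 0x0000)))   # 0xc0000, start of the ROM region
print(hex(640 * 1024))                         # 0xa0000, the 640K boundary
```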

#### **Control Bus**

- Control and timing information
  - Determines what modules can use the data and address lines
  - If a module wants to send data, it must (1) obtain permission to use the bus, and (2) transfer data, which might be a request for another module to send data
- Control lines include
  - I/O read
  - I/O write

### Big and Yellow?

- What do buses look like?
  - Parallel lines on circuit boards
  - Strip connectors on mother boards
    - e.g. PCI
- Limited by physical proximity: time delays, fan out, and attenuation are all factors for long buses

#### **Single Bus Problems**

- Lots of devices on one bus leads to:
  - Propagation delays
    - Long data paths mean that co-ordination of bus use can adversely affect performance: with bus skew, data arrives at slightly different times
    - The bus becomes a bottleneck if aggregate data transfer approaches bus capacity. Could increase bus width, but that is expensive
  - Device speed
    - Bus can't transmit data faster than the slowest device
    - Slowest device may determine bus speed!
      - Consider a high-speed network module and a slow serial port on the same bus; the bus must run at the slow serial port's speed so that the port can process data directed to it
- Most systems use multiple buses to overcome these problems

# Traditional (ISA) (with cache)



This approach breaks down as I/O devices need higher performance

# High Performance Bus – Mezzanine Architecture

Addresses higher speed I/O devices by moving up in the hierarchy



# Bridge Based Bus Architecture

Figure: Bridging with dual Pentium II Xeon processors on Slot 2. Labeled components include two 400-MHz cores with 512KB-2MB caches, a 100-MHz system bus, the Intel 440GX AGPset (host bridge) with up to 2GB of 100-MHz SDRAM and AGP 2X graphics, a 33-MHz PCI bus, and a PCI-to-ISA bridge, along with SCSI, IDE, USB, and ISA buses for the slower devices. (Source: http://www.intel.com.)

# **Direct Memory Access**

# DMA Transfer from Disk to Memory Bypasses the CPU



### DMA Flowchart for a Disk Transfer



# **Bus Types**

- Dedicated
- Multiplexed
  - Time division multiplexing in this case
  - Advantage: fewer lines
  - Disadvantages
    - Reduced ultimate performance

## **Bus Arbitration**

- More than one module may want to control the bus
  - e.g. I/O module may need to send data to memory and to the CPU
- But only one module may control the bus at one time
  - Arbitration decides who gets to use the bus
  - Arbitration must be fast or I/O devices might lose data
- Arbitration may be centralized or distributed

### **Centralized Arbitration**

- Single hardware device is responsible for allocating bus access
- May be part of CPU or separate

### **Distributed Arbitration**

- No single arbiter
- Each module may claim the bus
- Proper control logic on all modules so they cooperate in sharing the bus
- Purpose of both distributed and centralized arbitration is to designate the master
- The recipient of a data transfer is the slave
- Many types of arbitration algorithms: round-robin, priority, etc.
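The two algorithms named above differ in fairness. A fixed-priority arbiter always favors the same module, whereas a round-robin arbiter rotates the starting point after each grant. A minimal sketch, assuming each module exposes a single request line and that module 0 has the highest fixed priority; the class and function names are illustrative:

```python
from typing import List, Optional


def priority_arbiter(requests: List[bool]) -> Optional[int]:
    """Grant the bus to the lowest-numbered requesting module."""
    for module, wants_bus in enumerate(requests):
        if wants_bus:
            return module
    return None


class RoundRobinArbiter:
    """Grant the bus to the next requesting module after the last winner."""

    def __init__(self, n_modules: int) -> None:
        self.n = n_modules
        self.last = n_modules - 1      # so that module 0 is checked first

    def grant(self, requests: List[bool]) -> Optional[int]:
        for step in range(1, self.n + 1):
            module = (self.last + step) % self.n
            if requests[module]:
                self.last = module     # this module becomes the bus master
                return module
        return None


requests = [True, True, False]
arbiter = RoundRobinArbiter(3)
print([priority_arbiter(requests) for _ in range(3)])   # module 0 wins every time
print([arbiter.grant(requests) for _ in range(3)])      # modules 0 and 1 alternate
```

With fixed priority, module 1 is starved for as long as module 0 keeps requesting. Round-robin avoids that at the cost of keeping a little state in the arbiter.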

# **Bus Arbitration**

Figure: (a) Simple centralized bus arbitration; (b) centralized arbitration with priority levels; (c) decentralized bus arbitration. (Adapted from [Tanenbaum, 1999].)

- Daisy chaining of devices
- Devices to the left have higher priority
- What if a device breaks?

# **Bus Arbitration Implementations – Centralized**

#### **Centralized**

- If a device wants the bus, it asserts bus request
- Arbiter decides whether or not to send bus grant
- If a device wants the bus, it uses it and does not propagate bus grant down the line. Otherwise it propagates the bus grant.
- Devices electrically closest to the arbiter get first priority

#### **Centralized with Multiple Priority Levels**

- Can add multiple priority levels, each with its own grant line, for a more flexible system. Arbiter can issue bus grant on only the highest priority line
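The daisy-chain behavior described above can be sketched as a grant signal walking down the line until a requesting device absorbs it. This is an illustrative model only. It assumes index 0 is the device electrically closest to the arbiter, and that level 0 is the highest-priority request/grant pair, which the slides do not state.

```python
from typing import List, Optional, Tuple


def daisy_chain_grant(wants_bus: List[bool]) -> Optional[int]:
    """Return the position of the device that keeps the bus grant."""
    if not any(wants_bus):
        return None               # no bus request asserted, so no grant is issued
    for position, wants in enumerate(wants_bus):
        if wants:
            return position       # device uses the bus and stops the grant here
        # Otherwise the device propagates the grant to its neighbor.
    return None


def multilevel_grant(levels: List[List[bool]]) -> Optional[Tuple[int, int]]:
    """Grant on the highest-priority level that has a request.

    levels[0] is assumed to be the highest-priority chain."""
    for level, chain in enumerate(levels):
        winner = daisy_chain_grant(chain)
        if winner is not None:
            return level, winner
    return None


print(daisy_chain_grant([False, True, True]))               # 1
print(multilevel_grant([[False, False], [True, True]]))     # (1, 0)
```

The model also shows why a broken device matters: a device that fails to propagate the grant cuts off every device further down the chain.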

# **Bus Arbitration Implementation – Decentralized**

#### **Decentralized**

- If a device doesn't want the bus, it propagates bus grant down the line
- If a device wants the bus and bus grant is on, it propagates a negative bus grant
- It then asserts busy and begins the transfer
- Leftmost device that wants the bus gets it

# **Timing**

- Co-ordination of events on bus
- Synchronous
  - A single 1-0 is a bus cycle
  - All devices can read clock line
  - Usually a single cycle for an event

## 100 MHz Bus Clock



100 million cycles per second, so 1 cycle takes (1/100,000,000) seconds = 0.00000001 s = 10 ns
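The clock period is the reciprocal of the clock frequency. A quick check of the 100 MHz figure, with an illustrative helper name:

```python
def clock_period_ns(frequency_hz: float) -> float:
    """Duration of one bus cycle in nanoseconds."""
    return 1e9 / frequency_hz


print(clock_period_ns(100e6))   # 10.0
```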

In reality, the clock is a bit more sawtoothed



# **Synchronous Timing Diagram Read Operation Timing**



# Synchronous - Disadvantages

- Although synchronous clocks are simple, there are some disadvantages
  - Everything is done in multiples of the clock, so something finishing in 3.1 cycles takes 4 cycles
  - Faster devices can't run at their capacity; all devices are tied to a fixed clock rate
- One solution: use an asynchronous bus
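The rounding cost on a synchronous bus can be made concrete: any operation occupies a whole number of cycles, so its duration is rounded up to the next clock edge. A small sketch, assuming a 100 MHz bus clock for the example; the function name is illustrative:

```python
import math


def synchronous_time_ns(work_cycles: float, clock_hz: float) -> float:
    """Time an operation occupies on a synchronous bus, in nanoseconds."""
    whole_cycles = math.ceil(work_cycles)    # must wait for the next clock edge
    return whole_cycles * 1e9 / clock_hz


print(math.ceil(3.1))                      # 4 cycles
print(synchronous_time_ns(3.1, 100e6))     # 40.0 ns on the bus for 31 ns of work
```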

# **Asynchronous Bus**

- No clock
- Occurrence of one event on the bus follows and depends on a previous event
- Requires tracking of state, hard to debug, but potential for higher performance
- Also used with networking
  - Problem with "drift" and loss of synchronization
  - Some use self-clocking codes, e.g. Ethernet

# **Asynchronous Timing Diagram**

