Pipelining: Basic Idea

- More systematically: pipeline the execution of multiple instructions
- Analogy: "Assembly line processing" of instructions

- Idea:
  - Divide the instruction processing cycle into distinct "stages" of processing
  - Ensure there are enough hardware resources to process one instruction in each stage
  - Process a different instruction in each stage
    - Instructions consecutive in program order are processed in consecutive stages

- Benefit: Increases instruction processing throughput (1/CPI)
- Downside: Start thinking about this…
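
As a rough illustration of the throughput benefit, here is a minimal sketch in Python comparing the time to process N instructions without and with an ideal pipeline. The stage count, per-stage latency, and instruction count are made-up numbers chosen for illustration, not values from the slides.

    # Minimal sketch: ideal pipelining throughput, assuming a 5-stage pipeline
    # and a uniform 1 ns per stage (illustrative numbers).
    NUM_STAGES = 5
    STAGE_LATENCY_NS = 1.0
    N_INSTRUCTIONS = 1000

    # Non-pipelined: each instruction goes through all stages before the next starts.
    t_serial = N_INSTRUCTIONS * NUM_STAGES * STAGE_LATENCY_NS

    # Ideal pipeline: NUM_STAGES cycles to fill the pipe, then one instruction
    # completes every cycle.
    t_pipelined = (NUM_STAGES + (N_INSTRUCTIONS - 1)) * STAGE_LATENCY_NS

    print(f"serial:    {t_serial:.0f} ns")
    print(f"pipelined: {t_pipelined:.0f} ns")
    print(f"speedup:   {t_serial / t_pipelined:.2f}x (approaches {NUM_STAGES} for large N)")

The speedup approaches the number of stages only under the ideal-pipeline assumptions discussed next.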
An Ideal Pipeline

- Goal: Increase throughput with little increase in cost (hardware cost, in the case of instruction processing)

- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)

- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
Pipelining Instruction Processing

Remember: The Instruction Processing Cycle
1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)
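
To visualize how these five stages overlap across instructions, here is a minimal Python sketch that prints which stage each instruction occupies in each cycle, assuming an ideal pipeline with no stalls; the instruction stream is hypothetical.

    # Minimal sketch: stage occupancy per cycle in an ideal 5-stage pipeline
    # with no stalls. The instruction stream is a made-up example.
    STAGES = ["IF", "ID/RF", "EX/AG", "MEM", "WB"]
    instructions = ["add", "lw", "sub", "sw"]   # hypothetical instructions

    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(total_cycles):
        row = []
        for i, insn in enumerate(instructions):
            stage_idx = cycle - i               # instruction i enters IF at cycle i
            if 0 <= stage_idx < len(STAGES):
                row.append(f"{insn}:{STAGES[stage_idx]}")
        print(f"cycle {cycle + 1}: " + "  ".join(row))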
Remember: An Ideal Pipeline

- Goal: Increase throughput with little increase in cost (hardware cost, in the case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
Instruction Pipeline: Not An Ideal Pipeline

- Identical operations ... NOT!
  ⇒ different instructions → not all need the same stages
  Forcing different instructions to go through the same pipe stages
  → external fragmentation (some pipe stages idle for some instructions)

- Uniform suboperations ... NOT!
  ⇒ different pipeline stages → not the same latency
  Need to force each stage to be controlled by the same clock
  → internal fragmentation (some pipe stages are too fast but all take the same clock cycle time; see the sketch after this list)

- Independent operations ... NOT!
  ⇒ instructions are not independent of each other
  Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results
  → pipeline stalls (pipeline is not always moving)
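
To make the internal fragmentation point concrete, here is a minimal Python sketch showing that the clock cycle is set by the slowest stage, so faster stages sit idle for part of every cycle. The per-stage latencies are made-up numbers, not values from the slides.

    # Minimal sketch: internal fragmentation when all stages share one clock.
    # Stage latencies are illustrative assumptions.
    stage_latency_ns = {"IF": 0.8, "ID/RF": 0.6, "EX/AG": 1.0, "MEM": 1.2, "WB": 0.5}

    cycle_time = max(stage_latency_ns.values())     # clock set by the slowest stage
    for stage, latency in stage_latency_ns.items():
        idle = cycle_time - latency
        print(f"{stage:>6}: busy {latency} ns, idle {idle:.1f} ns per cycle")
    print(f"clock cycle time = {cycle_time} ns (set by the slowest stage)")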
Issues in Pipeline Design

- Balancing work in pipeline stages
  - How many stages and what is done in each stage
- Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
  - Handling dependences
    - Data
    - Control
  - Handling resource contention
  - Handling long-latency (multi-cycle) operations
- Handling exceptions, interrupts
- Advanced: Improving pipeline throughput
  - Minimizing stalls
Causes of Pipeline Stalls

- Stall: A condition when the pipeline stops moving
- Resource contention
- Dependences (between instructions)
  - Data
  - Control
- Long-latency (multi-cycle) operations
Dependences and Their Types

- Also called "dependency" or, less desirably, "hazard"
- Dependences dictate ordering requirements between instructions
- Two types
  - Data dependence
  - Control dependence
- Resource contention is sometimes called resource dependence
  - However, this is not fundamental to (dictated by) program semantics, so we will treat it separately
Handling Resource Contention

- Happens when instructions in two pipeline stages need the same resource

- Solution 1: Eliminate the cause of contention
  - Duplicate the resource or increase its throughput
    - E.g., use separate instruction and data memories (caches)
    - E.g., use multiple ports for memory structures

- Solution 2: Detect the resource contention and stall one of the contending stages
  - Which stage do you stall?
  - Example: What if you had a single read and write port for the register file?
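
For the single-port register-file question above, here is a minimal Python sketch of the detect-and-stall option. The pipeline model and the policy of letting the older (writeback) instruction proceed are assumptions made for illustration.

    # Minimal sketch: detect-and-stall for a register file with a single port
    # shared by decode-stage reads and writeback-stage writes. The "stall the
    # younger instruction" policy is an illustrative assumption.
    def arbitrate_regfile_port(decode_wants_read, writeback_wants_write):
        """Return which stage gets the port this cycle and which must stall."""
        if decode_wants_read and writeback_wants_write:
            # Contention: let the older instruction (in WB) proceed,
            # stall decode for one cycle.
            return {"grant": "WB", "stall": "ID/RF"}
        if writeback_wants_write:
            return {"grant": "WB", "stall": None}
        if decode_wants_read:
            return {"grant": "ID/RF", "stall": None}
        return {"grant": None, "stall": None}

    print(arbitrate_regfile_port(decode_wants_read=True, writeback_wants_write=True))
    # -> {'grant': 'WB', 'stall': 'ID/RF'}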
Data Dependences

- Types of data dependences
  - Flow dependence (true data dependence – read after write)
  - Output dependence (write after write)
  - Anti dependence (write after read)

- Which ones cause stalls in a pipelined machine?
  - For all of them, we need to ensure the semantics of the program are preserved
  - Flow dependences always need to be obeyed because they constitute true dependence on a value
  - Anti and output dependences exist due to the limited number of architectural registers
    - They are dependences on a name, not a value
    - We will later see what we can do about them
Data Dependence Types

Flow dependence (Read-after-Write, RAW):
  r3 ← r1 op r2
  r5 ← r3 op r4

Anti dependence (Write-after-Read, WAR):
  r3 ← r1 op r2
  r1 ← r4 op r5

Output dependence (Write-after-Write, WAW):
  r3 ← r1 op r2
  r5 ← r3 op r4
  r3 ← r6 op r7
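
To tie the three cases together, here is a minimal Python sketch that classifies the dependences of a younger instruction on an older one as RAW, WAR, or WAW. Modeling an instruction as just a destination register plus source registers is a simplification made for this sketch.

    # Minimal sketch: classify the data dependences of a younger instruction on
    # an older one. Instructions are modeled as (dest_reg, [src_regs]).
    def classify_dependences(older, younger):
        older_dest, older_srcs = older
        younger_dest, younger_srcs = younger
        deps = []
        if older_dest in younger_srcs:
            deps.append("RAW (flow)")     # younger reads what older writes
        if younger_dest in older_srcs:
            deps.append("WAR (anti)")     # younger overwrites what older reads
        if younger_dest == older_dest:
            deps.append("WAW (output)")   # both write the same register
        return deps

    # r3 <- r1 op r2 followed by r5 <- r3 op r4: flow (RAW) dependence on r3
    print(classify_dependences(("r3", ["r1", "r2"]), ("r5", ["r3", "r4"])))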
Data Dependence Handling

How to Handle Data Dependences
- Anti and output dependences are easier to handle
  - Write to the destination in one stage and in program order
- Flow dependences are more interesting

- Five fundamental ways of handling flow dependences
  - Detect and wait until the value is available in the register file
  - Detect and forward/bypass the data to the dependent instruction
  - Detect and eliminate the dependence at the software level
    - No need for the hardware to detect the dependence
  - Predict the needed value(s), execute "speculatively", and verify
  - Do something else (fine-grained multithreading)
    - No need to detect
Interlocking

- Detection of dependence between instructions in a pipelined processor to guarantee correct execution
- Software-based interlocking vs. hardware-based interlocking
- MIPS acronym?
Approaches to Dependence Detection (I)

- Scoreboarding
  - Each register in the register file has a Valid bit associated with it
  - An instruction that is writing to the register resets the Valid bit
  - An instruction in the Decode stage checks whether all of its source and destination registers are Valid
    - Yes: No need to stall… no dependence
    - No: Stall the instruction

- Advantage:
  - Simple: 1 bit per register
- Disadvantage:
  - Need to stall for all types of dependences, not only flow dependences
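
Here is a minimal Python sketch of the valid-bit scoreboard described above. The register count and the instruction representation are assumptions made for illustration.

    # Minimal sketch: a valid-bit scoreboard. One Valid bit per register; a
    # writer clears the bit, and decode stalls until all of an instruction's
    # source and destination registers are valid again.
    NUM_REGS = 8                     # illustrative register count
    valid = [True] * NUM_REGS        # one Valid bit per architectural register

    def try_issue(dest, srcs):
        """Let an instruction leave decode only if all its registers are valid."""
        if not all(valid[r] for r in srcs + [dest]):
            return False             # stall: a needed register is still pending
        valid[dest] = False          # in-flight instruction will write this register
        return True

    def writeback(dest):
        valid[dest] = True           # value is now in the register file

    print(try_issue(3, [1, 2]))      # r3 <- r1 op r2 issues; r3 marked not valid
    print(try_issue(5, [3, 4]))      # r5 <- r3 op r4 stalls (r3 not valid yet)
    writeback(3)
    print(try_issue(5, [3, 4]))      # now it can issue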
Approaches to Dependence Detection (II)

- Combinational dependence check logic
  - Special logic that checks whether any instruction in a later stage is supposed to write to any source register of the instruction that is being decoded
  - Yes: stall the instruction/pipeline
  - No: no need to stall… no flow dependence

- Advantage:
  - No need to stall on anti and output dependences

- Disadvantage:
  - Logic is more complex than a scoreboard
  - Logic becomes more complex as we make the pipeline deeper and wider (flash-forward: think superscalar execution)
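
Here is a minimal Python sketch of this check, comparing the decoding instruction's source registers against the destination registers of instructions still in flight in later stages. The latch layout and field names are assumptions made for illustration.

    # Minimal sketch: combinational flow-dependence check. Decode stalls only if
    # a later-stage instruction will write one of its source registers.
    def must_stall(decode_srcs, later_stage_dests):
        """decode_srcs: source registers of the instruction in decode.
        later_stage_dests: destination registers of instructions in EX, MEM, WB
        (None for stages holding no register-writing instruction)."""
        return any(d in decode_srcs for d in later_stage_dests if d is not None)

    # r5 <- r3 op r4 in decode while r3 <- r1 op r2 is still in EX: stall.
    print(must_stall({"r3", "r4"}, ["r3", None, "r7"]))    # True
    # r3 <- r1 op r2 in decode while another write to r3 is in flight (WAW):
    # no stall with this check, unlike the scoreboard.
    print(must_stall({"r1", "r2"}, ["r3", None, None]))    # False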
Once You Detect the Dependence in Hardware

- What do you do afterwards?

- Observation: The dependence between two instructions is detected before the communicated data value becomes available

- Option 1: Stall the dependent instruction right away
- Option 2: Stall the dependent instruction only when necessary → data forwarding/bypassing
Data Forwarding/Bypassing

- Problem: A consumer (dependent) instruction has to wait in the decode stage until the producer instruction writes its value to the register file

- Goal: We do not want to stall the pipeline unnecessarily

- Observation: The data value needed by the consumer instruction can be supplied directly from a later stage in the pipeline (instead of only from the register file)

- Idea: Add additional dependence check logic and data forwarding paths (buses) to supply the producer's value to the consumer right after the value is available

- Benefit: The consumer can move through the pipeline until the point where the value can be supplied → less stalling
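
Here is a minimal Python sketch of the forwarding idea. Forwarding from EX/MEM and MEM/WB pipeline latches, and preferring the younger producer, are assumptions based on a typical 5-stage design rather than details stated on the slides.

    # Minimal sketch: operand selection with forwarding. A source operand comes
    # from the most recent in-flight producer (EX/MEM first, then MEM/WB)
    # instead of the register file when possible.
    def read_operand(src_reg, regfile, ex_mem, mem_wb):
        """ex_mem / mem_wb: (dest_reg, value) pipeline-latch contents, or None."""
        if ex_mem and ex_mem[0] == src_reg:
            return ex_mem[1]      # forward from the instruction just executed
        if mem_wb and mem_wb[0] == src_reg:
            return mem_wb[1]      # forward from the instruction about to write back
        return regfile[src_reg]   # no in-flight producer: read the register file

    regfile = {"r1": 10, "r2": 20, "r3": 0}
    # Producer r3 <- r1 + r2 sits in EX/MEM with value 30; the consumer reads r3.
    print(read_operand("r3", regfile, ex_mem=("r3", 30), mem_wb=None))   # 30
    print(read_operand("r1", regfile, ex_mem=("r3", 30), mem_wb=None))   # 10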
A Special Case of Data Dependence

Control Dependence
- Question: What should the fetch PC be in the next cycle?
- Answer: The address of the next instruction
  - All instructions are control dependent on previous ones. Why?

- If the fetched instruction is a non-control-flow instruction:
  - Next fetch PC is the address of the next-sequential instruction
  - Easy to determine if we know the size of the fetched instruction
- If the fetched instruction is a control-flow instruction:
  - How do we determine the next fetch PC?
- In fact, how do we know whether or not the fetched instruction is a control-flow instruction?
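
To make the next-fetch-PC question concrete, here is a minimal Python sketch of the choice the fetch stage faces every cycle. The fixed 4-byte instruction size and the simplified branch model are assumptions made for illustration; the point is that the branch inputs are not known at fetch time.

    # Minimal sketch: choosing the next fetch PC, assuming fixed 4-byte
    # instructions and a simplified branch model.
    INSN_SIZE = 4

    def next_fetch_pc(pc, is_branch, branch_taken, branch_target):
        """The catch: is_branch, branch_taken, and branch_target are resolved
        in later pipeline stages, not at fetch time -- which is exactly why
        control dependences disrupt pipeline flow."""
        if is_branch and branch_taken:
            return branch_target
        return pc + INSN_SIZE          # next-sequential instruction

    print(hex(next_fetch_pc(0x1000, is_branch=False, branch_taken=False, branch_target=None)))
    print(hex(next_fetch_pc(0x1004, is_branch=True, branch_taken=True, branch_target=0x2000)))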