Pipelining: Basic Idea

- More systematically: pipeline the execution of multiple instructions
- Analogy: "Assembly line processing" of instructions

- Idea:
  - Divide the instruction processing cycle into distinct "stages" of processing
  - Ensure there are enough hardware resources to process one instruction in each stage
  - Process a different instruction in each stage
    - Instructions consecutive in program order are processed in consecutive stages

- Benefit: Increases instruction processing throughput (1/CPI)
- Downside: Start thinking about this…
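
As a rough illustration of the throughput benefit, here is a minimal sketch in Python comparing the time to process N instructions without and with an ideal pipeline. The stage count, per-stage latency, and instruction count are made-up numbers chosen for illustration, not values from the slides.

    # Minimal sketch: ideal pipelining throughput, assuming a 5-stage pipeline
    # and a uniform 1 ns per stage (illustrative numbers).
    NUM_STAGES = 5
    STAGE_LATENCY_NS = 1.0
    N_INSTRUCTIONS = 1000

    # Non-pipelined: each instruction goes through all stages before the next starts.
    t_serial = N_INSTRUCTIONS * NUM_STAGES * STAGE_LATENCY_NS

    # Ideal pipeline: NUM_STAGES cycles to fill the pipe, then one instruction
    # completes every cycle.
    t_pipelined = (NUM_STAGES + (N_INSTRUCTIONS - 1)) * STAGE_LATENCY_NS

    print(f"serial:    {t_serial:.0f} ns")
    print(f"pipelined: {t_pipelined:.0f} ns")
    print(f"speedup:   {t_serial / t_pipelined:.2f}x (approaches {NUM_STAGES} for large N)")

The speedup approaches the number of stages only under the ideal-pipeline assumptions discussed next.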
An Ideal Pipeline

- Goal: Increase throughput with little increase in cost (hardware cost, in the case of instruction processing)

- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)

- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
Pipelining Instruction Processing

Remember: The Instruction Processing Cycle
1. Instruction fetch (IF)
2. Instruction decode and register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB)
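
To visualize how these five stages overlap across instructions, here is a minimal Python sketch that prints which stage each instruction occupies in each cycle, assuming an ideal pipeline with no stalls; the instruction stream is hypothetical.

    # Minimal sketch: stage occupancy per cycle in an ideal 5-stage pipeline
    # with no stalls. The instruction stream is a made-up example.
    STAGES = ["IF", "ID/RF", "EX/AG", "MEM", "WB"]
    instructions = ["add", "lw", "sub", "sw"]   # hypothetical instructions

    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(total_cycles):
        row = []
        for i, insn in enumerate(instructions):
            stage_idx = cycle - i               # instruction i enters IF at cycle i
            if 0 <= stage_idx < len(STAGES):
                row.append(f"{insn}:{STAGES[stage_idx]}")
        print(f"cycle {cycle + 1}: " + "  ".join(row))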
Remember: An Ideal Pipeline

- Goal: Increase throughput with little increase in cost (hardware cost, in the case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing "cycle"?
Instruction Pipeline: Not An Ideal Pipeline

- Identical operations ... NOT!
  ⇒ different instructions → not all need the same stages
  Forcing different instructions to go through the same pipe stages
  → external fragmentation (some pipe stages idle for some instructions)

- Uniform suboperations ... NOT!
  ⇒ different pipeline stages → not the same latency
  Need to force each stage to be controlled by the same clock
  → internal fragmentation (some pipe stages are too fast but all take the same clock cycle time; see the sketch after this list)

- Independent operations ... NOT!
  ⇒ instructions are not independent of each other
  Need to detect and resolve inter-instruction dependencies to ensure the pipeline provides correct results
  → pipeline stalls (pipeline is not always moving)
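
To make the internal fragmentation point concrete, here is a minimal Python sketch showing that the clock cycle is set by the slowest stage, so faster stages sit idle for part of every cycle. The per-stage latencies are made-up numbers, not values from the slides.

    # Minimal sketch: internal fragmentation when all stages share one clock.
    # Stage latencies are illustrative assumptions.
    stage_latency_ns = {"IF": 0.8, "ID/RF": 0.6, "EX/AG": 1.0, "MEM": 1.2, "WB": 0.5}

    cycle_time = max(stage_latency_ns.values())     # clock set by the slowest stage
    for stage, latency in stage_latency_ns.items():
        idle = cycle_time - latency
        print(f"{stage:>6}: busy {latency} ns, idle {idle:.1f} ns per cycle")
    print(f"clock cycle time = {cycle_time} ns (set by the slowest stage)")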
Issues in Pipeline Design

- Balancing work in pipeline stages
  - How many stages and what is done in each stage
- Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
  - Handling dependences
    - Data
    - Control
  - Handling resource contention
  - Handling long-latency (multi-cycle) operations
- Handling exceptions, interrupts
- Advanced: Improving pipeline throughput
  - Minimizing stalls
Causes of Pipeline Stalls

- Stall: A condition when the pipeline stops moving
- Resource contention
- Dependences (between instructions)
  - Data
  - Control
- Long-latency (multi-cycle) operations
Dependences and Their Types

- Also called "dependency" or, less desirably, "hazard"
- Dependences dictate ordering requirements between instructions
- Two types
  - Data dependence
  - Control dependence
- Resource contention is sometimes called resource dependence
  - However, this is not fundamental to (dictated by) program semantics, so we will treat it separately
Handling Resource Contention

- Happens when instructions in two pipeline stages need the same resource

- Solution 1: Eliminate the cause of contention
  - Duplicate the resource or increase its throughput
    - E.g., use separate instruction and data memories (caches)
    - E.g., use multiple ports for memory structures

- Solution 2: Detect the resource contention and stall one of the contending stages
  - Which stage do you stall?
  - Example: What if you had a single read and write port for the register file?
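
For the single-port register-file question above, here is a minimal Python sketch of the detect-and-stall option. The pipeline model and the policy of letting the older (writeback) instruction proceed are assumptions made for illustration.

    # Minimal sketch: detect-and-stall for a register file with a single port
    # shared by decode-stage reads and writeback-stage writes. The "stall the
    # younger instruction" policy is an illustrative assumption.
    def arbitrate_regfile_port(decode_wants_read, writeback_wants_write):
        """Return which stage gets the port this cycle and which must stall."""
        if decode_wants_read and writeback_wants_write:
            # Contention: let the older instruction (in WB) proceed,
            # stall decode for one cycle.
            return {"grant": "WB", "stall": "ID/RF"}
        if writeback_wants_write:
            return {"grant": "WB", "stall": None}
        if decode_wants_read:
            return {"grant": "ID/RF", "stall": None}
        return {"grant": None, "stall": None}

    print(arbitrate_regfile_port(decode_wants_read=True, writeback_wants_write=True))
    # -> {'grant': 'WB', 'stall': 'ID/RF'}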
Data Dependences

- Types of data dependences
  - Flow dependence (true data dependence – read after write)
  - Output dependence (write after write)
  - Anti dependence (write after read)

- Which ones cause stalls in a pipelined machine?
  - For all of them, we need to ensure the semantics of the program are preserved
  - Flow dependences always need to be obeyed because they constitute true dependence on a value
  - Anti and output dependences exist due to the limited number of architectural registers
    - They are dependences on a name, not a value
    - We will later see what we can do about them
Data Dependence Types

Flow dependence (Read-after-Write, RAW):
  r3 ← r1 op r2
  r5 ← r3 op r4

Anti dependence (Write-after-Read, WAR):
  r3 ← r1 op r2
  r1 ← r4 op r5

Output dependence (Write-after-Write, WAW):
  r3 ← r1 op r2
  r5 ← r3 op r4
  r3 ← r6 op r7
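
To tie the three cases together, here is a minimal Python sketch that classifies the dependences of a younger instruction on an older one as RAW, WAR, or WAW. Modeling an instruction as just a destination register plus source registers is a simplification made for this sketch.

    # Minimal sketch: classify the data dependences of a younger instruction on
    # an older one. Instructions are modeled as (dest_reg, [src_regs]).
    def classify_dependences(older, younger):
        older_dest, older_srcs = older
        younger_dest, younger_srcs = younger
        deps = []
        if older_dest in younger_srcs:
            deps.append("RAW (flow)")     # younger reads what older writes
        if younger_dest in older_srcs:
            deps.append("WAR (anti)")     # younger overwrites what older reads
        if younger_dest == older_dest:
            deps.append("WAW (output)")   # both write the same register
        return deps

    # r3 <- r1 op r2 followed by r5 <- r3 op r4: flow (RAW) dependence on r3
    print(classify_dependences(("r3", ["r1", "r2"]), ("r5", ["r3", "r4"])))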
Data Dependence Handling

How to Handle Data Dependences
- Anti and output dependences are easier to handle
  - Write to the destination in one stage and in program order
- Flow dependences are more interesting

- Five fundamental ways of handling flow dependences
  - Detect and wait until the value is available in the register file
  - Detect and forward/bypass the data to the dependent instruction
  - Detect and eliminate the dependence at the software level
    - No need for the hardware to detect the dependence
  - Predict the needed value(s), execute "speculatively", and verify
  - Do something else (fine-grained multithreading)
    - No need to detect
Interlocking

- Detection of dependence between instructions in a pipelined processor to guarantee correct execution
- Software-based interlocking vs. hardware-based interlocking
- MIPS acronym?
Approaches to Dependence Detection (I)

- Scoreboarding
  - Each register in the register file has a Valid bit associated with it
  - An instruction that is writing to the register resets the Valid bit
  - An instruction in the Decode stage checks whether all of its source and destination registers are Valid
    - Yes: No need to stall… no dependence
    - No: Stall the instruction

- Advantage:
  - Simple: 1 bit per register
- Disadvantage:
  - Need to stall for all types of dependences, not only flow dependences
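
Here is a minimal Python sketch of the valid-bit scoreboard described above. The register count and the instruction representation are assumptions made for illustration.

    # Minimal sketch: a valid-bit scoreboard. One Valid bit per register; a
    # writer clears the bit, and decode stalls until all of an instruction's
    # source and destination registers are valid again.
    NUM_REGS = 8                     # illustrative register count
    valid = [True] * NUM_REGS        # one Valid bit per architectural register

    def try_issue(dest, srcs):
        """Let an instruction leave decode only if all its registers are valid."""
        if not all(valid[r] for r in srcs + [dest]):
            return False             # stall: a needed register is still pending
        valid[dest] = False          # in-flight instruction will write this register
        return True

    def writeback(dest):
        valid[dest] = True           # value is now in the register file

    print(try_issue(3, [1, 2]))      # r3 <- r1 op r2 issues; r3 marked not valid
    print(try_issue(5, [3, 4]))      # r5 <- r3 op r4 stalls (r3 not valid yet)
    writeback(3)
    print(try_issue(5, [3, 4]))      # now it can issue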
Approaches to Dependence Detection (II)

- Combinational dependence check logic
  - Special logic that checks whether any instruction in a later stage is supposed to write to any source register of the instruction that is being decoded
  - Yes: stall the instruction/pipeline
  - No: no need to stall… no flow dependence

- Advantage:
  - No need to stall on anti and output dependences

- Disadvantage:
  - Logic is more complex than a scoreboard
  - Logic becomes more complex as we make the pipeline deeper and wider (flash-forward: think superscalar execution)
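
Here is a minimal Python sketch of this check, comparing the decoding instruction's source registers against the destination registers of instructions still in flight in later stages. The latch layout and field names are assumptions made for illustration.

    # Minimal sketch: combinational flow-dependence check. Decode stalls only if
    # a later-stage instruction will write one of its source registers.
    def must_stall(decode_srcs, later_stage_dests):
        """decode_srcs: source registers of the instruction in decode.
        later_stage_dests: destination registers of instructions in EX, MEM, WB
        (None for stages holding no register-writing instruction)."""
        return any(d in decode_srcs for d in later_stage_dests if d is not None)

    # r5 <- r3 op r4 in decode while r3 <- r1 op r2 is still in EX: stall.
    print(must_stall({"r3", "r4"}, ["r3", None, "r7"]))    # True
    # r3 <- r1 op r2 in decode while another write to r3 is in flight (WAW):
    # no stall with this check, unlike the scoreboard.
    print(must_stall({"r1", "r2"}, ["r3", None, None]))    # False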
Once You Detect the Dependence in Hardware

- What do you do afterwards?

- Observation: The dependence between two instructions is detected before the communicated data value becomes available

- Option 1: Stall the dependent instruction right away
- Option 2: Stall the dependent instruction only when necessary → data forwarding/bypassing
Data Forwarding/Bypassing

- Problem: A consumer (dependent) instruction has to wait in the decode stage until the producer instruction writes its value to the register file

- Goal: We do not want to stall the pipeline unnecessarily

- Observation: The data value needed by the consumer instruction can be supplied directly from a later stage in the pipeline (instead of only from the register file)

- Idea: Add additional dependence check logic and data forwarding paths (buses) to supply the producer's value to the consumer right after the value is available

- Benefit: The consumer can move through the pipeline until the point where the value can be supplied → less stalling
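
Here is a minimal Python sketch of the forwarding idea. Forwarding from EX/MEM and MEM/WB pipeline latches, and preferring the younger producer, are assumptions based on a typical 5-stage design rather than details stated on the slides.

    # Minimal sketch: operand selection with forwarding. A source operand comes
    # from the most recent in-flight producer (EX/MEM first, then MEM/WB)
    # instead of the register file when possible.
    def read_operand(src_reg, regfile, ex_mem, mem_wb):
        """ex_mem / mem_wb: (dest_reg, value) pipeline-latch contents, or None."""
        if ex_mem and ex_mem[0] == src_reg:
            return ex_mem[1]      # forward from the instruction just executed
        if mem_wb and mem_wb[0] == src_reg:
            return mem_wb[1]      # forward from the instruction about to write back
        return regfile[src_reg]   # no in-flight producer: read the register file

    regfile = {"r1": 10, "r2": 20, "r3": 0}
    # Producer r3 <- r1 + r2 sits in EX/MEM with value 30; the consumer reads r3.
    print(read_operand("r3", regfile, ex_mem=("r3", 30), mem_wb=None))   # 30
    print(read_operand("r1", regfile, ex_mem=("r3", 30), mem_wb=None))   # 10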
A Special Case of Data Dependence

Control Dependence
- Question: What should the fetch PC be in the next cycle?
- Answer: The address of the next instruction
  - All instructions are control dependent on previous ones. Why?

- If the fetched instruction is a non-control-flow instruction:
  - Next fetch PC is the address of the next-sequential instruction
  - Easy to determine if we know the size of the fetched instruction
- If the fetched instruction is a control-flow instruction:
  - How do we determine the next fetch PC?
- In fact, how do we know whether or not the fetched instruction is a control-flow instruction?
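
To make the next-fetch-PC question concrete, here is a minimal Python sketch of the choice the fetch stage faces every cycle. The fixed 4-byte instruction size and the simplified branch model are assumptions made for illustration; the point is that the branch inputs are not known at fetch time.

    # Minimal sketch: choosing the next fetch PC, assuming fixed 4-byte
    # instructions and a simplified branch model.
    INSN_SIZE = 4

    def next_fetch_pc(pc, is_branch, branch_taken, branch_target):
        """The catch: is_branch, branch_taken, and branch_target are resolved
        in later pipeline stages, not at fetch time -- which is exactly why
        control dependences disrupt pipeline flow."""
        if is_branch and branch_taken:
            return branch_target
        return pc + INSN_SIZE          # next-sequential instruction

    print(hex(next_fetch_pc(0x1000, is_branch=False, branch_taken=False, branch_target=None)))
    print(hex(next_fetch_pc(0x1004, is_branch=True, branch_taken=True, branch_target=0x2000)))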