Wednesday, 8 April 2020

What is Pipelining in Computer Architecture?

Pipelining: Basic Idea

More systematically:

Pipeline the execution of multiple instructions
Analogy: “Assembly line processing” of instructions

Idea:

- Divide the instruction processing cycle into distinct "stages" of processing
- Ensure there are enough hardware resources to process one instruction in each stage
- Process a different instruction in each stage
  - Instructions consecutive in program order are processed in consecutive stages
- Benefit: Increases instruction processing throughput (1/CPI)
- Downside: Start thinking about this…
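The throughput benefit can be made concrete with a little arithmetic. The sketch below (with hypothetical numbers: a 5-stage pipeline, 1000 instructions, ideal conditions with no stalls) compares how many cycles a sequential machine and a pipelined machine need:

```python
def total_cycles(n_instructions, n_stages, pipelined):
    """Cycles to process n_instructions through an n_stages-stage machine.

    Non-pipelined: each instruction takes n_stages cycles, one at a time.
    Ideal pipeline: n_stages cycles to fill the pipe, then one
    instruction completes every cycle.
    """
    if pipelined:
        return n_stages + (n_instructions - 1)
    return n_stages * n_instructions

# Hypothetical example: 5 stages, 1000 instructions.
seq = total_cycles(1000, 5, pipelined=False)   # 5000 cycles
pipe = total_cycles(1000, 5, pipelined=True)   # 1004 cycles
speedup = seq / pipe                           # approaches 5 as n grows
```

For a large number of instructions, the ideal speedup approaches the number of stages, which is why CPI approaches 1 in an ideal pipeline.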






An Ideal Pipeline

- Goal: Increase throughput with little increase in cost (hardware cost, in the case of instruction processing)
- Repetition of identical operations
  - The same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
- Repetition of independent operations
  - No dependencies between repeated operations
- Uniformly partitionable suboperations
  - Processing can be evenly divided into uniform-latency suboperations (that do not share resources)
- Fitting examples: automobile assembly line, doing laundry
  - What about the instruction processing cycle?







Pipelining Instruction Processing

Remember: The Instruction Processing Cycle

1. Instruction fetch (IF)
2. Instruction decode and
    register operand fetch (ID/RF)
3. Execute/Evaluate memory address (EX/AG)
4. Memory operand fetch (MEM)
5. Store/writeback result (WB) 
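With these five stages, an ideal pipeline overlaps consecutive instructions so that instruction i occupies stage s during cycle i + s. A minimal sketch that lays out this timing as a diagram:

```python
STAGES = ["IF", "ID/RF", "EX/AG", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Rows of stages per cycle for an ideal 5-stage pipeline:
    instruction i enters IF at cycle i, so its row starts with i
    empty cells (None) followed by the five stages in order."""
    rows = []
    for i in range(n_instructions):
        rows.append([None] * i + STAGES[:])
    return rows

for i, row in enumerate(pipeline_diagram(3)):
    cells = [(s or "  ").ljust(5) for s in row]
    print(f"I{i}: " + " ".join(cells))
```

Reading the diagram column by column shows that from cycle 4 onward every stage holds a different instruction, which is exactly the "assembly line" picture from above.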











Remember: An Ideal Pipeline

nGoal: Increase throughput with little increase in cost (hardware cost, in case of instruction processing)
nRepetition of identical operations
     qThe same operation is repeated on a large number of different inputs (e.g., all laundry loads go through the same steps)
nRepetition of independent operations
     qNo dependencies between repeated operations
nUniformly partition able sub-operations
     qProcessing an be evenly divided into uniform-latency suboperations (that do not share resources)
nFitting examples: automobile assembly line, doing laundry 
   qWhat about the instruction processing cycle?

Instruction Pipeline: Not An Ideal Pipeline

- Identical operations ... NOT!
  - Different instructions do not all need the same stages
  - Forcing different instructions to go through the same pipe stages
    → external fragmentation (some pipe stages idle for some instructions)

- Uniform suboperations ... NOT!
  - Different pipeline stages do not have the same latency
  - Need to force each stage to be controlled by the same clock
    → internal fragmentation (some pipe stages are too fast but all take the same clock cycle time)

- Independent operations ... NOT!
  - Instructions are not independent of each other
  - Need to detect and resolve inter-instruction dependences to ensure the pipeline provides correct results
    → pipeline stalls (pipeline is not always moving)
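Internal fragmentation is easy to quantify: with one shared clock, every stage is stretched to the latency of the slowest stage. A small sketch, with hypothetical stage latencies:

```python
def cycle_time_and_waste(stage_latencies_ns):
    """With one shared clock, every stage gets the latency of the
    slowest stage; the per-stage difference is internal fragmentation
    (time the faster stages sit idle each cycle)."""
    cycle = max(stage_latencies_ns)
    waste = [cycle - t for t in stage_latencies_ns]
    return cycle, waste

# Hypothetical latencies for IF, ID, EX, MEM, WB (in ns):
cycle, waste = cycle_time_and_waste([200, 100, 150, 250, 100])
# cycle = 250 ns, set by the slowest (MEM) stage;
# the ID and WB stages each waste 150 ns per cycle
```

This is why pipeline designers try to balance stage latencies: the clock period is dictated by the slowest stage, not the average one.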

Issues in Pipeline Design

- Balancing work in pipeline stages
  - How many stages and what is done in each stage
- Keeping the pipeline correct, moving, and full in the presence of events that disrupt pipeline flow
  - Handling dependences
    - Data
    - Control
  - Handling resource contention
  - Handling long-latency (multi-cycle) operations
- Handling exceptions, interrupts
- Advanced: Improving pipeline throughput
  - Minimizing stalls

Causes of Pipeline Stalls

- Stall: A condition when the pipeline stops moving
- Resource contention
- Dependences (between instructions)
  - Data
  - Control
- Long-latency (multi-cycle) operations

Dependences and Their Types

- Also called "dependency" or, less desirably, "hazard"
- Dependences dictate ordering requirements between instructions
- Two types
  - Data dependence
  - Control dependence
- Resource contention is sometimes called resource dependence
  - However, this is not fundamental to (dictated by) program semantics, so we will treat it separately

Handling Resource Contention


- Happens when instructions in two pipeline stages need the same resource

- Solution 1: Eliminate the cause of contention
  - Duplicate the resource or increase its throughput
  - E.g., use separate instruction and data memories (caches)
  - E.g., use multiple ports for memory structures

- Solution 2: Detect the resource contention and stall one of the contending stages
  - Which stage do you stall?
  - Example: What if you had a single read and write port for the register file?
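For the single-ported register file question, one plausible policy (a toy sketch, not the only answer) is to stall the younger instruction: stalling writeback would back up every older instruction behind it, so the decode-stage read waits instead.

```python
def resolve_rf_port_contention(id_wants_read, wb_wants_write):
    """Toy arbitration for a register file with one shared port.
    Policy (an assumption for illustration): the WB stage wins,
    because stalling writeback would stall all older instructions;
    the younger instruction in ID/RF waits a cycle."""
    if id_wants_read and wb_wants_write:
        return "stall ID, grant port to WB"
    if wb_wants_write:
        return "grant port to WB"
    if id_wants_read:
        return "grant port to ID"
    return "port idle"

print(resolve_rf_port_contention(True, True))   # stall ID, grant port to WB
```

Granting the port to WB also has a correctness benefit: the value lands in the register file sooner, so the stalled reader may find it there next cycle.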

Data Dependences

- Types of data dependences
  - Flow dependence (true data dependence – read after write)
  - Output dependence (write after write)
  - Anti dependence (write after read)
- Which ones cause stalls in a pipelined machine?
  - For all of them, we need to ensure the semantics of the program are correct
  - Flow dependences always need to be obeyed because they constitute true dependence on a value
  - Anti and output dependences exist due to the limited number of architectural registers
    - They are dependences on a name, not a value
    - We will later see what we can do about them


Data Dependence Types

Flow dependence
  r3 ← r1 op r2    Read-after-Write
  r5 ← r3 op r4    (RAW)

Anti dependence
  r3 ← r1 op r2    Write-after-Read
  r1 ← r4 op r5    (WAR)

Output dependence
  r3 ← r1 op r2    Write-after-Write
  r5 ← r3 op r4    (WAW)
  r3 ← r6 op r7
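The three definitions above reduce to simple set comparisons on register names. A minimal classifier sketch, representing each instruction as a (destination, sources) pair:

```python
def dependences(producer, consumer):
    """Classify data dependences from `producer` to a later `consumer`.
    Each instruction is (dest_reg, [src_regs])."""
    p_dst, p_srcs = producer
    c_dst, c_srcs = consumer
    found = []
    if p_dst in c_srcs:
        found.append("flow (RAW)")    # consumer reads producer's result
    if c_dst in p_srcs:
        found.append("anti (WAR)")    # consumer overwrites producer's source
    if c_dst == p_dst:
        found.append("output (WAW)")  # both write the same register
    return found

# r3 <- r1 op r2 followed by r5 <- r3 op r4: the second reads r3
print(dependences(("r3", ["r1", "r2"]), ("r5", ["r3", "r4"])))
```

Note that only the RAW case involves an actual value flowing between the instructions; WAR and WAW are name clashes, matching the "name, not a value" point above.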

Data Dependence Handling

How to Handle Data Dependences

- Anti and output dependences are easier to handle
  - Write to the destination in one stage and in program order
- Flow dependences are more interesting

- Five fundamental ways of handling flow dependences
  - Detect and wait until the value is available in the register file
  - Detect and forward/bypass data to the dependent instruction
  - Detect and eliminate the dependence at the software level
    - No need for the hardware to detect the dependence
  - Predict the needed value(s), execute speculatively, and verify
  - Do something else (fine-grained multithreading)
    - No need to detect

Interlocking

- Detection of dependences between instructions in a pipelined processor to guarantee correct execution
- Software-based interlocking vs. hardware-based interlocking
- MIPS acronym? (Microprocessor without Interlocked Pipeline Stages)

Approaches to Dependence Detection (I)

- Scoreboarding
  - Each register in the register file has a Valid bit associated with it
  - An instruction that is writing to the register resets the Valid bit
  - An instruction in the Decode stage checks if all its source and destination registers are Valid
    - Yes: No need to stall… no dependence
    - No: Stall the instruction

- Advantage:
  - Simple: 1 bit per register

- Disadvantage:
  - Need to stall for all types of dependences, not only flow dependences
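The scoreboard mechanism described above can be sketched in a few lines (register indices and stage names are illustrative):

```python
class Scoreboard:
    """Minimal valid-bit scoreboard sketch: one Valid bit per
    architectural register."""
    def __init__(self, n_regs=32):
        self.valid = [True] * n_regs

    def decode(self, dest, srcs):
        """Called in the Decode stage: stall unless all source and
        destination registers are Valid; on issue, reset the
        destination's Valid bit (a write is now in flight)."""
        if not all(self.valid[r] for r in srcs + [dest]):
            return "stall"
        self.valid[dest] = False
        return "issue"

    def writeback(self, dest):
        """Called in WB: the value is now in the register file."""
        self.valid[dest] = True

sb = Scoreboard()
print(sb.decode(dest=3, srcs=[1, 2]))  # issue: r3's Valid bit is reset
print(sb.decode(dest=5, srcs=[3, 4]))  # stall: r3 not yet written back
```

Checking the destination register too (not just the sources) is what makes this scheme stall on anti and output dependences as well, which is exactly the disadvantage noted above.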


Approaches to Dependence Detection (II)

- Combinational dependence check logic
  - Special logic that checks if any instruction in later stages is supposed to write to any source register of the instruction that is being decoded
  - Yes: Stall the instruction/pipeline
  - No: No need to stall… no flow dependence

- Advantage:
  - No need to stall on anti and output dependences

- Disadvantage:
  - Logic is more complex than a scoreboard
  - Logic becomes more complex as we make the pipeline deeper and wider (flash-forward: think superscalar execution)
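The combinational check amounts to comparing the decoding instruction's source registers against the destination registers of all in-flight instructions. A minimal sketch (stage layout is illustrative; `None` marks a stage whose instruction writes no register):

```python
def must_stall(decode_srcs, later_stage_dests):
    """Combinational flow-dependence check: stall the decoding
    instruction if any instruction in a later pipeline stage will
    write one of its source registers. Anti and output dependences
    need no stall here, because writes happen in one stage, in
    program order."""
    pending_writes = {d for d in later_stage_dests if d is not None}
    return any(s in pending_writes for s in decode_srcs)

# Decoding instruction reads r3 and r4; the instruction in EX
# writes r3, so the decoder must stall.
print(must_stall(["r3", "r4"], ["r3", None, "r7"]))  # True
```

Each entry in `later_stage_dests` corresponds to one comparator per source per stage in hardware, which is why the logic grows with pipeline depth and width.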


Once You Detect the Dependence in Hardware

- What do you do afterwards?

- Observation: Dependence between two instructions is detected before the communicated data value becomes available

- Option 1: Stall the dependent instruction right away
- Option 2: Stall the dependent instruction only when necessary → data forwarding/bypassing


Data Forwarding/Bypassing

- Problem: A consumer (dependent) instruction has to wait in the decode stage until the producer instruction writes its value in the register file

- Goal: We do not want to stall the pipeline unnecessarily

- Observation: The data value needed by the consumer instruction can be supplied directly from a later stage in the pipeline (instead of only from the register file)

- Idea: Add additional dependence check logic and data forwarding paths (buses) to supply the producer's value to the consumer right after the value is available

- Benefit: The consumer can move in the pipeline until the point the value can be supplied → less stalling
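The core of the added logic is bypass-mux control: for each source operand in EX, pick the youngest in-flight producer. A minimal sketch for a classic 5-stage pipeline (register names are illustrative):

```python
def forward_source(src_reg, ex_mem_dest, mem_wb_dest):
    """Select where an EX-stage source operand comes from, preferring
    the youngest producer (sketch of bypass-mux select logic)."""
    if src_reg == ex_mem_dest:
        return "EX/MEM bypass"   # value computed just last cycle
    if src_reg == mem_wb_dest:
        return "MEM/WB bypass"   # value about to be written back
    return "register file"       # no in-flight producer: normal read

# Two in-flight instructions both wrote r3; the EX/MEM (younger)
# copy is the architecturally correct one to forward.
print(forward_source("r3", ex_mem_dest="r3", mem_wb_dest="r3"))
```

Checking EX/MEM before MEM/WB matters for correctness: if both stages hold writes to the same register, the younger write is the value the consumer must see.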

A Special Case of Data Dependence

Control Dependence

- Question: What should the fetch PC be in the next cycle?
- Answer: The address of the next instruction
  - All instructions are control dependent on previous ones. Why?

- If the fetched instruction is a non-control-flow instruction:
  - The next fetch PC is the address of the next-sequential instruction
  - Easy to determine if we know the size of the fetched instruction
- If the fetched instruction is a control-flow instruction:
  - How do we determine the next fetch PC?

- In fact, how do we know whether or not the fetched instruction is a control-flow instruction?
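Written as logic, the next-PC computation looks trivial; the catch the questions above point at is that none of its inputs are known at fetch time. A toy sketch (addresses and instruction size are hypothetical):

```python
def next_fetch_pc(pc, instr_size, is_control_flow, taken, target):
    """Toy next-PC logic. The difficulty: at fetch time the machine
    does not yet know whether the instruction is a control-flow
    instruction, whether it is taken, or what its target is --
    those are resolved in later stages (hence branch prediction)."""
    if is_control_flow and taken:
        return target
    return pc + instr_size  # next-sequential instruction

print(hex(next_fetch_pc(0x1000, 4, False, False, None)))   # 0x1004
print(hex(next_fetch_pc(0x1000, 4, True, True, 0x2000)))   # 0x2000
```

Resolving those three unknowns, or predicting them, is the subject of control-dependence handling in later lectures.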





