ASSIGNMENT – 3


Q1: Given the following fragment:


Loop: LD R1, 0(R2) ; load R1 from address 0+R2

DADDI R1, R1, #1 ; R1 = R1 + 1

SD R1, 0(R2) ; store R1 at address 0+R2

DADDI R2, R2, #4 ; R2 = R2 + 4

DSUB R4, R3, R2 ; R4 = R3 – R2

BNEZ R4, Loop ; branch to loop if R4 != 0


Assume that the initial value of R3 is R2 + 16 and that branch target address calculation and condition evaluation are completed by end of decode stage.


Answer the following:


  1. Identify all the data dependences (this includes true data dependence, name dependence and output dependence) in the code above whether or not they cause hazards.

  2. Show the timing of this instruction sequence for the 5-stage RISC pipeline without any forwarding but assuming that a register write and read to the register file in the same clock cycle “forwards” the value. Assume that the branch is handled by flushing the pipeline (i.e., all instructions in the pipeline after branch fetch are deleted and fresh instruction fetch happens after branch condition and target address are decided). If each memory reference takes 1 cycle, how many cycles does this loop take to execute.

  3. If full forwarding is implemented, show the timing for the 5-stage pipeline. Further, assume that the branch is handled by predicting it as not taken. How many cycles does this loop take to execute with these conditions?

  4. Assume branch is predicted as taken and draw the timing for the 5-stage pipeline. How many cycles does the loop take to execute with this change?


Q2: Assume that we are using the floating point pipeline with the FP adder taking 4 cycles, FP multiplier taking 7 cycles, the unpipelined divider taking 25 cycles and the regular integer execute stage taking 1 cycle.

  1. Give an example (other than the example in the book) where we end up with a WAW hazard.

  2. Give an example where there will be structural hazard for register file and DM. Remember that regular add/multiply etc. do NOT need any access to DM in MEM stage. The MEM stage is used only by Load/Store instructions.


Q3: A computer implemented with a single cycle implementation has a clock cycle time of 7 ns. Then, the stages are split by functionality but the stages take different amount of time. The times are as follows: IF – 1ns, ID – 1.5 ns, EX – 1ns, MEM – 2ns and WB – 1.5ns. The pipeline register delay is 0.1ns. Anwer the following:

  1. What is the clock cycle time of the 5-stage pipeline?

  2. If there is a stall every 4 instructions, what is the CPI of the new implementation?

  3. What is the speedup of the pipelined machine over the single cycle implementation?


Q4: Consider the usage of critical word first and early restart on L2 cache misses. The L2 cache has 64B blocks and a refill path of 16B wide (i.e., it transfers 16B at a time from the main memory to the L2 cache). The time taken to receive the first 16 byte block from main memory takes 120 cycles and each additional 16B block take another 16 cycles. How many more cycles would it take to service an L2 cache with and without critical word first and early restart? Assume the data needed is in the very first 16B block for early restart and anywhere within the block for critical word first. If the data needed is in the 3rd 16B block, what will be the time taken for early restart and critical word first?