One technique for minimizing the effect of control dependencies is to separate the point where the branch takes effect from the point where the branch condition is tested. The branch instruction tests its condition. If the test succeeds, the PC is modified, but the modification does not take effect immediately. This delayed branch allows one or more instructions following the branch to execute in the pipeline whether or not the branch is taken.
In the MIPS CPU, the branch operation is delayed by one instruction. The position immediately following a branch or jump is called the delay slot. The MAL assembler hides the delayed branch by inserting an instruction into the delay slot of each branch or jump. By default, the inserted instruction is a no-op, an instruction that does nothing.
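The default no-op insertion can be sketched as a single pass over the instruction stream. This is a minimal illustration in Python, not the MAL assembler's actual implementation: the function name `fill_delay_slots`, the one-instruction-per-string format, and the abbreviated mnemonic set are all assumptions made for the example.

```python
# Hypothetical, abbreviated set of MIPS branch and jump mnemonics.
# A real assembler recognizes more (including pseudo-instructions).
BRANCH_OR_JUMP = {"beq", "bne", "bltz", "bgez", "blez", "bgtz",
                  "j", "jal", "jr", "jalr"}


def fill_delay_slots(lines):
    """Return a new instruction list with a no-op after every branch or jump.

    Each element of `lines` is assumed to hold one instruction,
    optionally preceded by a label such as "loop:".
    """
    out = []
    for line in lines:
        out.append(line)
        parts = line.split()
        if not parts:
            continue
        mnemonic = parts[0]
        # Skip over a leading label to find the mnemonic.
        if mnemonic.endswith(":"):
            mnemonic = parts[1] if len(parts) > 1 else ""
        if mnemonic in BRANCH_OR_JUMP:
            out.append("nop")  # fill the delay slot with a do-nothing instruction
    return out


source = [
    "loop: addi $t0, $t0, -1",
    "bne $t0, $zero, loop",
    "add $t1, $t1, $t2",
]
for line in fill_delay_slots(source):
    print(line)
```

In this sketch the no-op lands directly after the `bne`, so the `add` is no longer in the delay slot and executes only when the branch falls through.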
Earlier sections describing the branch instruction stated that the PC is incremented when the branch is fetched, so the branch offset is relative to the instruction after the branch. With a delayed branch, the instruction following the branch is always executed before the PC is modified to perform the branch.
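The timing described above can be made concrete with a toy interpreter. This is a minimal sketch in Python, not a model of real MIPS hardware: the tiny instruction set (`li`, `addi`, `beq`, `nop`), the word-indexed PC, and the `run` function are assumptions made for illustration. The branch offset is taken relative to the instruction after the branch, and with `delayed=True` the new PC takes effect only after the delay-slot instruction has executed.

```python
def run(program, delayed=True, max_steps=100):
    """Execute a toy program and return (registers, list of executed PCs).

    Instructions are tuples:
      ("li", rd, imm)        rd = imm
      ("addi", rd, rs, imm)  rd = rs + imm
      ("beq", rs, rt, off)   if rs == rt, branch to (pc + 1) + off
      ("nop",)
    Unset registers read as 0. A branch placed in a delay slot is not
    handled meaningfully by this sketch.
    """
    regs = {}
    pc = 0
    pending = None  # target of a taken branch that has not yet taken effect
    trace = []
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            break
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        target = None
        if op == "li":
            regs[args[0]] = args[1]
        elif op == "addi":
            regs[args[0]] = regs.get(args[1], 0) + args[2]
        elif op == "beq":
            if regs.get(args[0], 0) == regs.get(args[1], 0):
                target = (pc + 1) + args[2]  # offset relative to next instruction
        elif op != "nop":
            raise ValueError(f"unknown op: {op}")
        if pending is not None:
            # This instruction was in the delay slot: the branch takes effect now.
            next_pc = pending
            pending = None
        if target is not None:
            if delayed:
                pending = target  # defer the PC change by one instruction
            else:
                next_pc = target  # immediate branch, for comparison
        pc = next_pc
    return regs, trace


program = [
    ("li", "r1", 1),           # 0
    ("li", "r2", 1),           # 1
    ("beq", "r1", "r2", 2),    # 2: taken, target = 3 + 2 = 5
    ("addi", "r3", "r0", 10),  # 3: delay slot
    ("addi", "r3", "r3", 100), # 4: skipped when the branch is taken
    ("addi", "r3", "r3", 1),   # 5: branch target
]

print(run(program, delayed=True))   # executes PCs 0, 1, 2, 3, 5; r3 == 11
print(run(program, delayed=False))  # executes PCs 0, 1, 2, 5;    r3 == 1
```

The only difference between the two runs is instruction 3: with the delayed branch it executes even though the branch is taken, which is why the final value of `r3` differs.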
The delayed branch is a difficult topic to grasp. In the DLX 5-stage pipeline, we have found that it is easy to misunderstand the purpose of filling the branch delay slot with a single necessary instruction. Our goal is to remove the mystery of delayed branches with examples and explanations that clarify the topic. We consider only machines whose delayed branches have a single-instruction delay, the case that the Hennessy and Patterson book explains in great detail.
In some examples, it is hard to see why a particular instruction should be placed after the branch. It can also be confusing that a single instruction is enough to absorb the stall that would otherwise occur while a branch instruction is executed.
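One common way to choose the instruction placed after the branch is to take the instruction just before the branch, provided the branch does not depend on its result. The sketch below illustrates that single rule in Python. It is an assumption-laden illustration, not a full scheduler: the tuple format `(op, dst, srcs)`, the names `fill_from_before` and `movable`, and the restriction to a straight-line block with no labels between the moved instruction and the branch are all hypothetical simplifications.

```python
BRANCHES = {"beq", "bne"}  # hypothetical, abbreviated branch set
NOP = ("nop", None, ())


def movable(prev, branch, before_prev):
    """True if `prev` may be moved into the delay slot of `branch`."""
    if prev == NOP or prev[0] in BRANCHES:
        return False
    if before_prev is not None and before_prev[0] in BRANCHES:
        return False  # prev already occupies an earlier branch's delay slot
    return prev[1] not in branch[2]  # branch must not read what prev writes


def fill_from_before(block):
    """Replace the no-op after a branch with the instruction before the branch.

    Each instruction is a tuple (op, dst, srcs). The block is assumed to be
    straight-line code with a no-op already placed in every delay slot.
    """
    out = []
    i = 0
    while i < len(block):
        ins = block[i]
        has_nop_slot = (ins[0] in BRANCHES
                        and i + 1 < len(block)
                        and block[i + 1] == NOP)
        before_prev = out[-2] if len(out) >= 2 else None
        if has_nop_slot and out and movable(out[-1], ins, before_prev):
            prev = out.pop()
            out += [ins, prev]  # branch first, then prev in the delay slot
            i += 2              # skip the no-op that prev replaced
        else:
            out.append(ins)
            i += 1
    return out


block = [
    ("addi", "r1", ("r1",)),      # r1 is tested by the branch: must stay put
    ("add", "r3", ("r3", "r2")),  # independent of the branch: can move
    ("bne", None, ("r1", "r0")),
    NOP,
]
for ins in fill_from_before(block):
    print(ins)
```

Here the `add` moves into the delay slot and the no-op disappears, so the block is one instruction shorter and the slot does useful work. If the instruction before the branch wrote a register that the branch tests, this sketch leaves the no-op in place.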
With the help of key term definitions, it will be easier to learn how to unroll a loop as well as reschedule it. Then, determine which instruction