Pipelining in ARM
ARM Pipelining
Pipelining is the technique RISC (Reduced Instruction Set Computer) processors use to speed up instruction execution: while one instruction is being executed, the following instructions are simultaneously being decoded and fetched. This keeps the CPU and the memory system working continuously. Each ARM family has its own pipeline design.
Pipelining is a design technique that improves the efficiency of instruction processing in computer processors and microcontrollers. It suits ARM devices because RISC keeps individual instructions simple and shifts complexity to the compiler, so every instruction can be split into the same short stages. Each stage takes one cycle, so an instruction needs n cycles to pass through an n-stage pipeline.
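The cycle arithmetic can be made concrete with a minimal sketch. It assumes an ideal pipeline (one cycle per stage, no stalls or flushes), and the function names are illustrative rather than taken from any ARM tool. The first instruction needs n cycles to pass through n stages, and each later instruction completes one cycle after the previous one.

```python
def pipelined_cycles(num_stages: int, num_instructions: int) -> int:
    """Cycles to complete a run of instructions on an ideal pipeline.

    Assumption: one cycle per stage, no stalls, no flushes.
    """
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles; every following
    # instruction finishes one cycle after the one before it.
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_stages: int, num_instructions: int) -> int:
    """Cycles needed if each instruction must finish before the next starts."""
    return num_stages * num_instructions


if __name__ == "__main__":
    # Ten instructions on a three-stage pipeline.
    print(pipelined_cycles(3, 10))    # 12
    print(unpipelined_cycles(3, 10))  # 30
```

Under these assumptions, the longer the instruction stream, the closer the pipelined core gets to completing one instruction per cycle.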
Working of Pipeline
- Fetch loads an instruction from memory.
- Decode identifies the instruction to be executed.
- Execute processes the instruction and writes the result back to a register.
- Overlapping these stages across different instructions speeds up execution.
- Pipelining increases throughput because the core can complete an instruction every cycle.
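The overlap described above can be sketched as a small simulation that reports which instruction occupies each stage in every cycle. This is an idealized model with no stalls or flushes. The mnemonics in the usage example are placeholders, and `pipeline_timeline` is a name chosen for this illustration.

```python
STAGES = ["Fetch", "Decode", "Execute"]


def pipeline_timeline(instructions, stages=STAGES):
    """Return one dict per cycle mapping stage name -> instruction (or None).

    Assumption: ideal pipeline, each instruction advances one stage per cycle.
    """
    if not instructions:
        return []
    total_cycles = len(stages) + len(instructions) - 1
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            # The instruction fetched in cycle i reaches stage `depth`
            # in cycle i + depth.
            index = cycle - depth
            in_range = 0 <= index < len(instructions)
            row[stage] = instructions[index] if in_range else None
        timeline.append(row)
    return timeline


if __name__ == "__main__":
    program = ["ADD", "SUB", "CMP"]  # placeholder instruction names
    for number, row in enumerate(pipeline_timeline(program), start=1):
        cells = "  ".join(f"{stage}={row[stage] or '-':<4}" for stage in STAGES)
        print(f"cycle {number}: {cells}")
```

In the third cycle of this example all three stages are busy at once, which is the point at which the pipeline starts completing one instruction per cycle.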
ARM pipeline characteristics
- An instruction is not considered processed by the ARM pipeline until it has passed completely through the execute stage.
- In ARM state, the PC always points to the address of the executing instruction plus 8 bytes.
- In Thumb state, the PC always points to the address of the executing instruction plus 4 bytes.
- Executing a branch instruction, or branching by writing directly to the PC, causes the ARM core to flush its pipeline.
- An instruction that is already in the execute stage will complete even if an interrupt has been raised.
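The PC offsets listed above follow from the pipeline depth: while one instruction executes, the fetch stage is already two instructions ahead, which is 2 x 4 bytes in ARM state and 2 x 2 bytes in Thumb state. A minimal sketch of that arithmetic, where the function name and the example address are hypothetical:

```python
ARM_PC_OFFSET = 8    # ARM state: two 4-byte instructions ahead
THUMB_PC_OFFSET = 4  # Thumb state: two 2-byte instructions ahead


def visible_pc(instruction_address: int, thumb: bool = False) -> int:
    """Value an executing instruction sees when it reads the PC.

    Assumption: classic three-stage fetch/decode/execute behaviour,
    with addresses wrapped to 32 bits.
    """
    offset = THUMB_PC_OFFSET if thumb else ARM_PC_OFFSET
    return (instruction_address + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    address = 0x8000  # hypothetical address of the executing instruction
    print(hex(visible_pc(address)))              # 0x8008
    print(hex(visible_pc(address, thumb=True)))  # 0x8004
```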
ARM 7
- It has a three-stage pipeline (fetch, decode, execute), as seen in the figure.
- An instruction passes through the pipeline in three cycles.
- It uses the basic fetch and execute (F&E) cycle.
- Because its pipeline is the shortest, the ARM 7 has the lowest throughput of the families described here.
- It handles 32-bit data.
ARM 9
- The ARM 9 pipeline resembles that of the ARM 7 but has five stages, so an instruction takes five cycles to pass through:
- Fetch: retrieves the instruction from memory.
- Decode: decodes the instruction fetched in the previous cycle.
- ALU: executes the decoded instruction.
- LS1 (Memory): loads or stores the data specified by a load or store instruction.
- LS2 (Write): extracts and zero- or sign-extends the data loaded by a byte or half-word load instruction.
- Throughput is 10% to 13% higher than the ARM 7's, thanks to the additional stages and improved efficiency.
- The core frequency of the ARM 9 is somewhat higher than that of the ARM 7.
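The zero and sign extension performed in the LS2 stage can be illustrated with a short sketch. It models only the arithmetic on the loaded value, not the hardware, and `extend_loaded_value` is a name invented for this example.

```python
def extend_loaded_value(raw: int, width_bits: int, signed: bool) -> int:
    """Extend a loaded byte (8) or half-word (16) to a 32-bit register value.

    Zero extension fills the upper bits with 0. Sign extension copies the
    top bit of the loaded value into all of the upper bits.
    """
    if width_bits not in (8, 16):
        raise ValueError("width_bits must be 8 or 16")
    mask = (1 << width_bits) - 1
    value = raw & mask                    # extract the byte or half-word
    sign_bit = 1 << (width_bits - 1)
    if signed and (value & sign_bit):
        value |= 0xFFFFFFFF & ~mask       # fill the upper bits with 1s
    return value


if __name__ == "__main__":
    print(hex(extend_loaded_value(0x80, 8, signed=False)))    # 0x80
    print(hex(extend_loaded_value(0x80, 8, signed=True)))     # 0xffffff80
    print(hex(extend_loaded_value(0x8000, 16, signed=True)))  # 0xffff8000
```

The signed case corresponds to signed byte and half-word loads, where a negative 8-bit or 16-bit value must stay negative once it sits in a 32-bit register.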
ARM 10
- It has a six-stage pipeline, so an instruction takes six cycles to pass through.
- It is similar to the ARM 9 pipeline, but adds an issue stage that determines whether the instruction is ready to be decoded in the current cycle.
- Its throughput is almost twice that of the ARM 7.
- Its core frequency is higher than that of the ARM 9.
Note: Depending on the number of instructions processed per cycle, the number of pipeline stages may increase or decrease.