
Lec 15 - Pipelining

College: College of Science for Women     Department: Department of Computer Science     Stage: 1
Course instructor: Ahmed Mohammed Hussein Al-Ghazali       07/04/2019 06:59:26
1. PIPELINING
In each execution cycle, the control unit would have to wait until the instruction is fetched from memory. Furthermore, the ALU would have to wait until the required operands are fetched from memory. Because processor speeds are increasing much faster than memory speeds, keeping the control unit and the ALU idle while the system fetches instructions and data wastes these resources. How can we avoid this situation? Suppose we can prefetch the instruction, that is, read it before the control unit needs it. Prefetched instructions are typically placed in a set of registers called the prefetch buffers, so the control unit does not have to wait. How do we prefetch? Because instructions are normally executed in sequence, we can fetch the next instruction while the control unit is busy decoding the current one. Pipelining generalizes this concept of overlapped execution. Similarly, prefetching the required operands avoids the idle time experienced by the ALU.
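The effect of a prefetch buffer can be seen in a tiny simulation. This is a minimal sketch, assuming (for illustration only) that a fetch and a decode each take exactly one cycle; the function name `fetch_decode_cycles` is hypothetical. Without prefetching, the fetch unit only works while the control unit is idle; with prefetching, the next fetch overlaps the current decode.

```python
from collections import deque


def fetch_decode_cycles(program, prefetch):
    """Cycles needed to fetch and decode every instruction in `program`.

    Assumption: one fetch and one decode each take exactly one cycle.
    """
    buffer = deque()  # prefetch buffer: fetched but not yet decoded instructions
    pc = 0            # index of the next instruction to fetch
    decoded = 0
    cycles = 0
    while decoded < len(program):
        cycles += 1
        # Control unit: decode an instruction fetched in an earlier cycle, if any.
        decoding = bool(buffer)
        if decoding:
            buffer.popleft()
            decoded += 1
        # Fetch unit: with prefetching it runs every cycle, overlapped with the
        # decode; without it, it runs only while the control unit is idle.
        if pc < len(program) and (prefetch or not decoding):
            buffer.append(program[pc])
            pc += 1
    return cycles


program = [f"I{i}" for i in range(1, 7)]
print(fetch_decode_cycles(program, prefetch=False))  # 12
print(fetch_decode_cycles(program, prefetch=True))   # 7
```

For six instructions the control unit sits idle for six of the 12 cycles without prefetching, but for only one of the 7 cycles (the very first fetch) with it.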

Figure 1 shows how pipelining helps us improve efficiency. Instruction execution can be divided into five parts. In pipelining terminology, each part is called a stage. For simplicity, let's assume that each stage takes the same time to execute (say, one cycle). As shown in Figure 1, each stage spends one cycle executing its part of the execution cycle and then passes the instruction on to the next stage. Let's trace the execution of this pipeline during the first few cycles. During the first cycle, the first stage S1 fetches the instruction; all other stages are idle. During Cycle 2, S1 passes the first instruction I1 to stage S2 for decoding and initiates the next instruction fetch. Thus, during Cycle 2, two of the five stages are busy: S2 decodes I1 while S1 fetches I2. During Cycle 3, stage S2 passes instruction I1 to stage S3 to fetch any required operands. At the same time, S2 receives I2 from S1 for decoding while S1 fetches the third instruction. This process repeats in each stage. As you can see, after the first four cycles (that is, from Cycle 5 on) all five stages are busy. This state is called the pipeline full condition.
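The trace above can be reproduced programmatically. The sketch below assumes, as the text does, one cycle per stage and no stalls; under that assumption instruction i enters S1 in cycle i, so stage S(s+1) holds instruction cycle - s. The five stages and six instructions come from the text; the names `occupancy`, `N_STAGES`, and `N_INSTRUCTIONS` are illustrative.

```python
N_STAGES = 5        # S1..S5, as in the text
N_INSTRUCTIONS = 6  # I1..I6, as in Figure 1


def occupancy(cycle, n_instr=N_INSTRUCTIONS, n_stages=N_STAGES):
    """Instruction number held by each stage S1..Sk in `cycle` (None = idle).

    Assumes an ideal pipeline: one cycle per stage and no stalls.
    """
    row = []
    for s in range(n_stages):
        i = cycle - s  # instruction that entered S1 exactly s cycles ago
        row.append(i if 1 <= i <= n_instr else None)
    return row


# The last instruction leaves S5 in this cycle.
last_cycle = N_INSTRUCTIONS + N_STAGES - 1
for c in range(1, last_cycle + 1):
    row = occupancy(c)
    cells = "  ".join(f"I{i}" if i is not None else "--" for i in row)
    full = "  <- pipeline full" if None not in row else ""
    print(f"Cycle {c:2}: {cells}{full}")
```

Each printed row lists stages S1 to S5 from left to right, so Cycle 5 shows I5 in S1 and I1 in S5, the first cycle in which every stage is busy.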

Figure 1. Pipelined execution of six instructions (I1 to I6) in five stages (S1 to S5).

Figure 1 shows that the execution of instruction I1 is completed in Cycle 5. After that, one instruction completes in every cycle, so all six instructions finish by Cycle 10. Without pipelining, the six instructions would have taken 6 × 5 = 30 cycles.
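The cycle counts above follow from a short argument. Writing k for the number of stages and n for the number of instructions (this notation is ours, not the lecture's), and assuming one cycle per stage with no stalls, the first instruction needs k cycles to pass through the pipeline and each of the remaining n - 1 instructions completes one cycle after its predecessor:

```latex
\begin{aligned}
T_{\text{pipelined}}   &= k + (n - 1) = 5 + (6 - 1) = 10 \text{ cycles} \\
T_{\text{unpipelined}} &= n\,k = 6 \times 5 = 30 \text{ cycles}
\end{aligned}
```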
Notice from this description that pipelining does not speed up execution of individual instructions; each instruction still takes five cycles to execute. However, pipelining increases the number of instructions executed per unit time; that is, instruction throughput increases.
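The distinction between latency and throughput can be made concrete with a few lines of code. This sketch assumes an ideal k-stage pipeline (one cycle per stage, no stalls) running n instructions; the function names are illustrative. Every instruction still has a latency of k cycles, but as n grows the throughput approaches one instruction per cycle and the speedup over unpipelined execution approaches k.

```python
def pipelined_cycles(n, k):
    """Cycles for n instructions on an ideal k-stage pipeline."""
    return k + (n - 1)


def unpipelined_cycles(n, k):
    """Cycles when each instruction finishes all k stages before the next starts."""
    return n * k


def speedup(n, k):
    return unpipelined_cycles(n, k) / pipelined_cycles(n, k)


k = 5  # five stages, as in Figure 1
for n in (1, 6, 100, 10_000):
    cycles = pipelined_cycles(n, k)
    print(f"n={n:>6}: latency {k} cycles/instr, total {cycles:>6} cycles, "
          f"throughput {n / cycles:.3f} instr/cycle, speedup {speedup(n, k):.2f}x")
```

With n = 6 the speedup is 30 / 10 = 3; the full factor of 5 is only approached for long instruction streams, because the first k - 1 cycles are spent filling the pipeline.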
Pipelining is an architectural technique designed to keep all parts of the circuit working at all times, so that no part of the circuit is stalled waiting for another part. Pipelining allows overlapped execution to improve throughput.

1.1. STALLS IN PIPELINE
Unfortunately, the scenario presented in the previous section is a little too simplistic. There are two drawbacks to that simple pipeline: bus contention among instructions and non-sequential program execution. Both problems may increase the average execution time of the instructions in the pipeline.
Bus contention can occur whenever an instruction needs to access an item in memory. For example, if a "mov( reg, mem);" instruction needs to store data in memory while a "mov( mem, reg);" instruction is reading data from memory, contention for the address and data buses may develop, since the CPU will be trying to fetch and write data in memory simultaneously. One simple way to handle bus contention is through a pipeline stall. When faced with contention for the bus, the CPU gives priority to the instruction furthest along in the pipeline and suspends fetching opcodes until the current instruction fetches (or stores) its operand. This causes the newer instruction in the pipeline to take two cycles to execute rather than one (see Figure 2).

                T1       T2       T3       T4       T5       T6       T7       T8       T9
Instruction #1  Opcode   Decode   Address  Values   Compute  Store
Instruction #2           Opcode   Decode   Address  Values   Compute  Store
Instruction #3                    Opcode   Decode   Address  (stall)  Values   Compute  Store

A pipeline stall occurs at T6 because instruction #1 is attempting to store a value to memory at the same time instruction #2 is attempting to read a value from memory. Instruction #3 appears to take two clock cycles to execute because of the pipeline stall.

FIGURE 2. A PIPELINE STALL
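The stall policy described above (one bus transaction per cycle, priority to the instruction furthest along) can be sketched as a small in-order scheduler. This is an illustrative model under several assumptions: the stage names are taken from Figure 2; opcode fetches are assumed to be served from a prefetch buffer and are not modeled as bus users; and the workload is hypothetical, with instruction #1 writing memory in its Store stage and instruction #3 reading memory in its Values stage. The figure's annotation names instruction #2 as the reader; the read is placed in instruction #3 here because that is where the stall appears in the timeline. The function name `schedule` is also hypothetical.

```python
# Stage names as they appear in Figure 2.
STAGES = ["Opcode", "Decode", "Address", "Values", "Compute", "Store"]


def schedule(bus_stages):
    """Return, per instruction, a dict mapping stage name -> cycle (1-based).

    bus_stages[i] is the set of stages in which instruction i uses the memory
    bus. Only one bus transaction is allowed per cycle. Instructions are
    scheduled in program order, so an older instruction (further along in the
    pipeline) always claims the bus first and wins any conflict.
    """
    timeline = []
    bus_busy = set()  # cycles in which the bus is already claimed
    for i, uses_bus in enumerate(bus_stages):
        prev = timeline[-1] if timeline else None
        times = {}
        t = i + 1  # ideal issue cycle: one new instruction per cycle
        for s, stage in enumerate(STAGES):
            if s > 0:
                # A stage starts after this instruction finishes the previous one.
                t = max(t, times[STAGES[s - 1]] + 1)
            if prev is not None:
                if s + 1 < len(STAGES):
                    # In-order pipeline: enter a stage only after the previous
                    # instruction has moved on to the following stage.
                    t = max(t, prev[STAGES[s + 1]])
                else:
                    t = max(t, prev[stage] + 1)
            if stage in uses_bus:
                # Stall (stay in the current stage) until the bus is free.
                while t in bus_busy:
                    t += 1
                bus_busy.add(t)
            times[stage] = t
        timeline.append(times)
    return timeline


# Hypothetical workload: #1 stores to memory, #2 uses only registers,
# #3 reads an operand from memory.
for n, times in enumerate(schedule([{"Store"}, set(), {"Values"}]), start=1):
    print(f"Instruction #{n}: " + " ".join(f"{s}@T{times[s]}" for s in STAGES))
```

Under these assumptions the third instruction's Values stage is pushed from T6 to T7 because instruction #1 owns the bus in T6, so instruction #3 finishes in T9 instead of T8, matching the timeline in Figure 2. With no bus users at all, the three instructions would finish in T6, T7, and T8.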


Fortunately, the intelligent use of a cache system can eliminate many pipeline stalls like the ones discussed above. However, even with a cache, it is not always possible to avoid stalling the pipeline. What you cannot fix in hardware, you can take care of with software: if you avoid unnecessary memory accesses, you reduce bus contention and your programs will execute faster. Likewise, using shorter instructions also reduces bus contention and the possibility of a pipeline stall.

