Chip multi-processors using a micro-threaded model
Hasasneh, Nabil M.
Thesis or dissertation
- © 2006 Nabil M Hasasneh. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.
Most microprocessor chips today use an out-of-order (OOO) instruction execution mechanism. This mechanism allows superscalar processors to extract reasonably high levels of instruction-level parallelism (ILP). The most significant problem with this approach is the large instruction window and the logic needed to issue instructions from it. This logic includes generating wake-up signals for waiting instructions and a selection mechanism for issuing them. A wide issue width also requires a large multi-ported register file, so that each instruction can read and write its operands simultaneously. Neither structure scales well with issue width, which leads to poor performance relative to the number of gates used. Furthermore, to obtain this ILP, instructions must be executed speculatively.
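A back-of-the-envelope count makes the scaling problem concrete. The sketch below is a first-order model, not a result from the thesis. It assumes two source operands per instruction, CAM-style wake-up in which every broadcast result tag is compared against every waiting operand, and the common rule of thumb that a register cell's area grows with the square of its port count. The function name and the window size of 16 entries per issue slot are hypothetical.

```python
def ooo_issue_cost(window, width):
    """First-order structure counts for a conventional out-of-order core.

    Illustrative assumptions (not from the thesis):
    - wake-up: each cycle up to `width` result tags are broadcast and compared
      against both source-operand tags of every instruction in the window;
    - register file: each issued instruction reads two operands and writes one;
    - area: a register cell's area grows roughly with the square of its ports.
    """
    tag_comparators = window * 2 * width
    rf_ports = 3 * width
    relative_cell_area = rf_ports ** 2
    return tag_comparators, rf_ports, relative_cell_area


# Hypothetical sizing: the window grows with issue width (16 entries per slot).
for width in (1, 2, 4, 8):
    comparators, ports, area = ooo_issue_cost(16 * width, width)
    print(f"width={width}: {comparators} comparators, {ports} ports, cell area x{area}")
```

Under these assumptions, doubling the issue width (with a proportionally larger window) quadruples both the wake-up comparator count and the area of each register cell.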
An alternative, which avoids this complexity in instruction issue and eliminates speculative execution, is the microthreaded model. This model fragments sequential code at compile time and executes the fragments out of order while maintaining in-order execution within each fragment. These fragments, called microthreads, capture both ILP and loop concurrency. Fragments can be interleaved on a single processor to tolerate operand latency, or distributed across many processors to achieve speedup. The major advantage of this model is that it provides sufficient information to implement a penalty-free distributed register file organisation.
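How interleaving tolerates latency can be shown with a toy timing model. This is a minimal sketch under simplifying assumptions that are not taken from the thesis: each microthread is a list of instruction latencies, each instruction depends on the one before it in the same thread, and a single-issue processor switches round-robin to any thread whose next instruction is ready. The function name and the example latencies are hypothetical.

```python
def interleaved_cycles(threads):
    """Cycles until all microthreads finish on one single-issue processor.

    Assumptions (illustrative only): each thread is a list of instruction
    latencies; every instruction depends on its predecessor in the same thread
    (in-order execution within a fragment); at most one instruction issues per
    cycle, chosen round-robin among threads whose next instruction is ready.
    """
    n = len(threads)
    pc = [0] * n        # index of the next instruction in each thread
    ready_at = [0] * n  # cycle at which each thread's next instruction may issue
    cycle = finish = start = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        for k in range(n):
            t = (start + k) % n
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                latency = threads[t][pc[t]]
                pc[t] += 1
                ready_at[t] = cycle + latency
                finish = max(finish, cycle + latency)
                start = (t + 1) % n  # round-robin: next search begins after t
                break
        cycle += 1  # a cycle with no ready thread is simply an idle cycle
    return finish


fragment = [1, 10, 1]  # e.g. an add, a 10-cycle load, a dependent add
print(interleaved_cycles([fragment * 4]))  # one thread doing all the work: 48
print(interleaved_cycles([fragment] * 4))  # four interleaved microthreads: 18
```

The same twelve instructions take 48 cycles as one dependent sequence but 18 cycles when split into four microthreads, because the processor issues from other threads while each 10-cycle load is outstanding.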
However, the scalability of the microthreaded register file, in terms of the number of logical read and write ports it requires, has not yet been established. This thesis examines the distribution and frequency of accesses to the asynchronous (non-pipelined) ports of the synchronising memory and provides a detailed analysis and evaluation of this issue. Based on an analysis of a range of code kernels, it concludes that a distributed shared synchronising memory could be implemented with five ports per processor: three ports support single instruction issue per cycle, and the other two asynchronous ports are able to handle all other demands on the local register file.
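The port figures above can be put in perspective with simple arithmetic. In this sketch the 3 + 2 port split comes from the analysis summarised above, while the division of the three pipeline ports into two reads and one write, the centralised comparison point, and the square-law area rule are assumptions for illustration only; the function names are hypothetical.

```python
# From the abstract: 3 pipeline ports + 2 asynchronous ports per processor.
# Assumption: the pipeline ports are 2 reads + 1 write for single issue.
PIPELINE_PORTS = 3
ASYNC_PORTS = 2


def distributed_ports(n_procs):
    """Ports on each processor's local register file: independent of n_procs."""
    return PIPELINE_PORTS + ASYNC_PORTS


def centralised_ports(n_procs):
    """Hypothetical comparison: one shared file serving n single-issue processors
    (pipeline ports only; any asynchronous ports would add to this)."""
    return PIPELINE_PORTS * n_procs


def relative_cell_area(ports):
    """Rule-of-thumb assumption: cell area grows with the square of the port count."""
    return ports ** 2


for n in (4, 16, 64):
    d, c = distributed_ports(n), centralised_ports(n)
    print(f"{n:3d} processors: local file {d} ports (area x{relative_cell_area(d)}), "
          f"centralised file {c} ports (area x{relative_cell_area(c)})")
```

The comparison shows that the local file stays at five ports however many processors are added, whereas a single shared file would need ports in proportion to the processor count.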
In the microthreaded CMP, a broadcast bus is also used for thread creation and to replicate the compiler-defined global state into each processor's local register file, instead of accessing a centralised register file for global variables. The key problem is that simultaneous access to this bus by multiple processors causes contention and unfair communication between processors. Therefore, to avoid processor contention and to exploit the advantages of asynchronous communication, this thesis presents a scalable and partitionable asynchronous bus arbiter for use with chip multiprocessors (CMPs), together with its pre-layout simulation results in VHDL. It is shown that this arbiter can easily be extended to support large numbers of processors and can be used for arbitration purposes in chip multiprocessors.
Furthermore, the microthreaded model requires dynamic register allocation and a hardware scheduler that can support hundreds of microthreads per processor and their associated microcontexts. To fully support this model, the scheduler must handle thread creation, context switching and thread rescheduling on every machine cycle, which is a significant challenge. This thesis presents scalable implementations and evaluations of these support structures, and investigates the feasibility of large-scale CMPs by giving detailed area estimates of these structures in 0.07-micron technology.
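The per-cycle duties described above (thread creation, context switching, rescheduling and dynamic register allocation) can be illustrated with a small behavioural model. This is a hedged sketch, not the hardware described in the thesis: the class and method names, the fixed-size register blocks, the one-waiter-per-register rule and the round-robin ready queue are all assumptions chosen for clarity. Its point is that each operation reduces to a constant-time update of a queue or table, the property a hardware scheduler needs in order to act on every machine cycle.

```python
from collections import deque


class MicrothreadScheduler:
    """Behavioural sketch of a per-processor microthread scheduler (hypothetical).

    Each thread owns a slot and a fixed-size block of registers (its microcontext).
    Registers carry a full/empty flag: a thread that reads an empty register is
    suspended until the value is written, which returns it to the ready queue.
    """

    def __init__(self, n_slots, n_regs, regs_per_thread):
        self.k = regs_per_thread
        self.free_slots = deque(range(n_slots))
        self.free_blocks = deque(range(0, n_regs - regs_per_thread + 1, regs_per_thread))
        self.full = [False] * n_regs  # synchronisation flag per register
        self.base = {}                # slot -> base register of its microcontext
        self.waiting = {}             # register -> slot suspended on it (one waiter assumed)
        self.ready = deque()          # slots eligible to issue, in round-robin order

    def create(self):
        """Thread creation: allocate a slot and a register block, or return None."""
        if not self.free_slots or not self.free_blocks:
            return None
        slot = self.free_slots.popleft()
        base = self.free_blocks.popleft()
        self.base[slot] = base
        for r in range(base, base + self.k):
            self.full[r] = False  # a fresh microcontext starts with empty registers
        self.ready.append(slot)
        return slot

    def issue(self):
        """Context switch: take the next ready thread for this cycle."""
        return self.ready.popleft() if self.ready else None

    def requeue(self, slot):
        """The issued thread can continue: reschedule it at the back of the queue."""
        self.ready.append(slot)

    def is_full(self, slot, offset):
        return self.full[self.base[slot] + offset]

    def suspend(self, slot, offset):
        """The issued thread read an empty register: park it on that register."""
        self.waiting[self.base[slot] + offset] = slot

    def write(self, slot, offset):
        """A result arrives (e.g. from a long-latency load): fill it, wake any waiter."""
        r = self.base[slot] + offset
        self.full[r] = True
        waiter = self.waiting.pop(r, None)
        if waiter is not None:
            self.ready.append(waiter)

    def kill(self, slot):
        """Thread termination: release its register block and slot for reuse."""
        self.free_blocks.append(self.base.pop(slot))
        self.free_slots.append(slot)


s = MicrothreadScheduler(n_slots=4, n_regs=16, regs_per_thread=4)
a, b = s.create(), s.create()
t = s.issue()               # thread a issues first
if not s.is_full(t, 2):
    s.suspend(t, 2)         # a waits on register 2 of its microcontext
s.requeue(s.issue())        # thread b issues and stays ready
s.write(a, 2)               # the value arrives and a is rescheduled
print(list(s.ready))        # [1, 0]
```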
- Department of Engineering, The University of Hull
- Jesshope, C. R.; Bell, Ian M.
- Sponsor (Organisation): Jāmiʻat al-Khalīl; Israel. Miśrad ha-ḥinukh ṿeha-tarbut