The
architecture of Pentium Microprocessor:
Question 2. Discuss
the architectural trends in
today’s microprocessor.
Introduction to Today’s
Microprocessor Trends
From their humble beginning
25 years ago, microprocessors
have proliferated into an astounding
range of chips, powering devices
ranging from telephones to supercomputers.
Today, microprocessors for personal
computers get widespread attention and
have enabled Intel to become
the world's largest semiconductor
maker. In addition, embedded
microprocessors are at the heart
of a diverse range of devices
that have become staples of
affluent consumers worldwide.
The past decade has seen the
evolution of microprocessor
packaging from a simple protective
scheme to a complex combination
of different elements that enable
microprocessor performance while
still providing the basic function
of protection. Packaging today's
microprocessor on the one hand
entails tailoring the package
to enable microprocessor performance,
a complex task considering the
rapid rate of microprocessor
performance growth. The challenge
is one of both schedule and
technical complexity. On the
other hand, the package forms
the interface between the microprocessor
and the external world of the
motherboard and the computing
system. In this capacity, package
design must allow for an easy
interface and must meet a diverse
set of form factor requirements.
The package provides a conduit
for the microprocessor through
a space transformation allowing
small-scale features on the
silicon to be electrically connected
to the external environment.
This is a challenging geometrical
problem and requires that packaging
interconnection densities
closely track the evolution
of microprocessor interconnection
densities. In connecting the
die to the motherboard, the
package must also ensure that
the connections do not unduly
inhibit the microprocessor performance
by introducing unnecessary electrical
impediments usually referred
to as package “parasitics.”
As microprocessors have evolved,
they have increased in speed,
which in turn demands increasingly
sophisticated power delivery
schemes. Another consequence
of microprocessor evolution
has been increasing power dissipation.
Package design must now provide
a path for thermal dissipation,
requiring a better understanding
of the thermal characteristics
of packaging materials and design.
Package design also requires
a good understanding of the
structural characteristics of
the package to ensure it is
designed for reliability and
robustness. Attention is increasingly
focussed today on understanding
the electrical, thermal and
mechanical characteristics of
packaging to optimize all these
aspects.
The package is also the interface
that connects the microprocessor
to the motherboard. In this
capacity it must have a compatible
interface to allow for easy
acceptance on the motherboard
as well as the system design.
The form factor of the package
is a critical element for easy
interface to the motherboard.
The requirements are usually
different in different market
segments and often drive the
need for form factors that are
tailored to these different
segments. For instance, package
height is a key constraint in
the mobile market, where a slim,
lightweight package is critical
to success. On the other hand,
the ability to dissipate high
power, and hence the features that
enable it, are critical in
a server or desktop market segment.
Cost, compatibility and fit
within the computer system are
key parameters that must be
designed for in making a microprocessor
successful. This challenges
us to develop multiple solutions
and technologies concurrently,
each geared towards a specific
market segment.
Aside from the challenges of
package design, there is a need
to develop efficient and cost-effective
manufacturing processes that
allow us to meet the schedule
and volume demands of today's
marketplaces. These have presented
us with interesting challenges
in understanding the manufacturability,
testability and reliability
of packaging.
MAJC is an example of the design
architecture of today’s
microprocessors. The current trends
are easy to see by analysing
the architecture of MAJC.
MAJC (pronounced "magic")
is an acronym for "Microprocessor
Architecture for Java Computing."
MAJC is a microprocessor architecture
designed to meet the broadband
demands of the 21st century.
Addressing the challenge of
high bandwidth and the need
for state-of-the-art computational
performance, MAJC architecture
is characterized by:
· Scalability to take
full advantage of advances in
semiconductor technology.
· Broad scalability to
systems with large numbers of
processors.
· A new standard of
performance for applications
with DSP or New Media computational
needs.
· Focus on bandwidth
throughput.
Processor Needs into
the 21st Century
Several microprocessor trends
were identified and accommodated
in the design of the MAJC Architecture:
· Convergence of communication
media and computers (audio,
video, and data) requires processors
to compute information at wire
speed.
· Advancements in semiconductor
technology will provide rapidly increasing
resources on each microprocessor
chip.
· As microprocessors
are used in increasingly disparate
applications from smart cards
to supercomputers, there is great
value in the ability to create
a wide span of implementations
from a given processor architecture.
· Software, over time,
will become independent of specific
instruction sets; Just-In-Time
(JIT) compilation techniques
are expected to predominate
for general-purpose processors
and eliminate binary compatibility
issues.
· Bandwidth between
processors, memory, and I/O
devices needs to be available
to move information in real-time.
· The content processed
by computers is becoming increasingly
media-rich; DSP-like functions
are required to process this
media content.
Features of Today’s Microprocessors.
· Modular Architecture
To support the creation of a
wide range of implementations
the architecture supports modular
implementations. A basic implementation
might comprise a single processor
unit with four functional units.
By replicating those design
elements, an implementation
can be built that includes a
few or even hundreds of processors,
each with four functional units,
each of which can operate on
many data items simultaneously
with parallel-operation (SIMD)
instructions. Conversely, a
tiny application-specific implementation
can be derived from the basic
one by trimming the complement
of functional units down to
one or two and/or removing hardware
support for any instructions
not needed in its target application.
· Software Portability
The architecture was designed
to efficiently execute code
generated by installation-time
or just-in-time (JIT) compilation
techniques. It may be the first
commercial architecture designed
without a requirement for binary
compatibility between generations.
This allows implementations
to evolve over time without
accumulating the baggage required
to support old binaries, as
traditional architectures have
always done. Instead, software
portability across implementations
is obtained through use of architecture-neutral
means of software distribution.
· Multiple Levels
of Parallelism
The architecture provides the
ability to exploit parallelism
at many levels: at the data
word level through SIMD instructions,
at the instruction level through
multiple functional units per
processor, at the thread-of-execution
level through support for multithreaded
software, and at the system
level through its intrinsic
support for "MPs-on-a-chip"
(multiple processor units per
implementation). An implementation
with more than one functional
unit per processor unit provides
MSIMD: multiple single-instruction
multiple-data parallelism.
· Multiple
Processor Units per Cluster
Although a MAJC implementation
can be a single processor unit,
the architecture explicitly
incorporates the concept of
multiple processors per implementation.
Given 21st century semiconductor
density, each such array of
processor units or "processor
cluster" can be implemented
on a single chip. As semiconductor
technology advances, clusters
with more processors per chip
can be implemented.
· Multiple Functional
Units per Processor Unit
Every MAJC processor unit can
issue multiple instructions
simultaneously, one to each
of its functional units. Most
implementations are expected
to provide two to four functional
units per processor unit.
· Multithreaded
Software
Execution of multithreaded software
comes naturally given the architecture's
ability to execute multiple
threads simultaneously on multiple
processor units. MAJC implementations
with hardware support for vertical
microthreading can efficiently
execute multiple threads on
each processor unit.
· SIMD Instructions
At the lowest level of parallelism,
MAJC architecture provides SIMD
(Single Instruction/Multiple
Data) or "vector"
instructions. A SIMD instruction
executing in a single functional
unit can perform the same
operation on multiple data items
simultaneously.
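As a rough illustration of the idea
(not MAJC's actual instruction set),
the Python sketch below treats one
64-bit word as four 16-bit lanes and
adds two such words lane by lane.
The lane width, the lane count, and
the names pack_lanes, unpack_lanes,
and simd_add16 are all assumptions
made for this example.

```python
LANE_BITS = 16                      # assumed lane width
LANES = 4                           # assumed lanes per 64-bit word
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(values):
    """Pack four 16-bit values into one word (lane 0 in the low bits)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack_lanes(word):
    """Split a word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add16(a, b):
    """Add lane by lane; each lane wraps on its own, no carry crosses lanes."""
    sums = [x + y for x, y in zip(unpack_lanes(a), unpack_lanes(b))]
    return pack_lanes(sums)


a = pack_lanes([1, 2, 3, 0xFFFF])
b = pack_lanes([10, 20, 30, 1])
print(unpack_lanes(simd_add16(a, b)))   # [11, 22, 33, 0]
```

One call to simd_add16 stands in for
one instruction that produces four
results, which is the sense in which
a single functional unit works on
multiple data items at once.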
· Integral Support
for Media-Rich Data
The MAJC architecture is particularly
well-suited for processing media-rich
content because it directly
supports common media data types
and can process multiple simultaneous
operations on that data. Processing
power is multiplied on three
levels: Single Instruction/Multiple
Data (SIMD) DSP-like instructions
in each functional unit, multiple
functional units per processor
unit, and multiple processor
units per processor cluster.
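The three levels multiply together,
which a small calculation makes
concrete. The sketch below is only
an upper bound under the assumption
that every lane of every functional
unit is busy on every cycle; the
function name and the example
numbers are hypothetical and do not
describe any particular MAJC chip.

```python
def peak_ops_per_cycle(processor_units, functional_units, simd_lanes):
    """Upper bound on data operations per cycle across all three levels."""
    return processor_units * functional_units * simd_lanes


# Hypothetical cluster: 2 processor units, 4 functional units each,
# 4 SIMD lanes per functional unit.
print(peak_ops_per_cycle(2, 4, 4))   # 32
```

Real code rarely keeps every unit
busy, so sustained throughput would
be lower than this figure.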
· Balanced Performance:
Processor versus Memory and
I/O
A MAJC implementation is designed
to utilize several techniques
to balance processor speed with
access to external memory and
I/O devices:
· Hundreds of general-purpose
registers per processor unit,
which reduce the frequency of
memory accesses
· Load-Group instructions,
which increase bandwidth into
the processor by simultaneously
loading multiple registers from
memory or an I/O device
· Store buffering, which
increases bandwidth out of the
processor by optimizing Store
operations initiated by software
· Data Type-Independent
Registers
The general-purpose register
file in a MAJC implementation
is datatype-agnostic: any register
can hold information of any
data type and be accessed by
any instruction. In particular,
there is no distinction between
integer and floating-point registers.
This allows registers to be
allocated as needed by each
application, without restrictions
imposed by hardware partitioning
of the register set.
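A minimal sketch of what
"datatype-agnostic" means in
practice: each register is just a
bit pattern, and the instruction
decides how to interpret it. The
RegisterFile class, its 32-bit
register width, and its method
names are assumptions for this
illustration, not part of the MAJC
definition.

```python
import struct


class RegisterFile:
    """Toy register file: every register holds 32 raw bits."""

    def __init__(self, count=8):
        self.bits = [0] * count

    def write_int(self, r, value):
        self.bits[r] = value & 0xFFFFFFFF

    def read_int(self, r):
        return self.bits[r]

    def write_float(self, r, value):
        # Store the IEEE 754 single-precision bit pattern of the value.
        self.bits[r] = struct.unpack(">I", struct.pack(">f", value))[0]

    def read_float(self, r):
        # Reinterpret the stored bits as a single-precision float.
        return struct.unpack(">f", struct.pack(">I", self.bits[r]))[0]


rf = RegisterFile()
rf.write_float(3, 1.0)
print(hex(rf.read_int(3)))   # 0x3f800000
rf.write_int(3, 42)          # the same register now holds an integer
```

Because any register can hold either
kind of value, a program that is
mostly integer or mostly
floating-point can use the whole
file rather than half of it.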
· Instruction
Grouping
Grouping instructions across
multiple functional units can
be performed dynamically in
hardware (as in a superscalar
processor), statically by a
compiler, or by some combination
of the two. Rather than devoting
valuable chip area to hardware
grouping logic, MAJC relies
primarily on software compilers
to group instructions across
functional units.
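Static grouping by a compiler can be
sketched as follows. This is a
deliberately simple in-order, greedy
packer, not the algorithm used by
any real MAJC compiler: it starts a
new group when the current one is
full or when an instruction reads or
writes a register already written in
the current group. The instruction
format (name, destination, sources)
and the function name
group_instructions are assumptions.

```python
def group_instructions(instrs, width=4):
    """Pack instructions into groups of at most `width`.

    instrs: list of (name, dest, sources) tuples in program order.
    Instructions in one group must not depend on each other's results.
    """
    groups, current, written = [], [], set()
    for name, dest, sources in instrs:
        conflict = dest in written or any(s in written for s in sources)
        if current and (len(current) == width or conflict):
            groups.append(current)          # close the current group
            current, written = [], set()
        current.append(name)
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("add", "r1", ["r2", "r3"]),
    ("mul", "r4", ["r5", "r6"]),
    ("sub", "r7", ["r1", "r4"]),   # needs the results of add and mul
    ("ld",  "r8", ["r9"]),
]
print(group_instructions(program))   # [['add', 'mul'], ['sub', 'ld']]
```

Doing this work at compile time is
what lets the hardware skip the
dependency-checking logic that a
superscalar processor would need.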
· Data and Address
Size
A MAJC implementation may implement
either 32- or 64-bit addressing
and data operations, as dictated
by the needs of its target applications.
· Context Switch
Optimization
Process (task) context switch
time can be reduced by using
the architecture's "register
dirty bits", which allow
an operating system to minimize
the number of registers saved
and restored during a context
switch.
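The effect of dirty bits can be
sketched in a few lines. The model
below assumes one dirty bit per
register, set on every write and
cleared when the context is saved;
the text above does not state the
actual granularity, so treat that,
along with the RegisterContext class
and its method names, as
assumptions.

```python
class RegisterContext:
    """Toy register set with one dirty bit per register (assumed)."""

    def __init__(self, count=32):
        self.regs = [0] * count
        self.dirty = 0              # bit r is set if register r was written

    def write(self, r, value):
        self.regs[r] = value
        self.dirty |= 1 << r

    def save_dirty(self):
        """Return only the registers written since the last save."""
        saved = {
            r: self.regs[r]
            for r in range(len(self.regs))
            if (self.dirty >> r) & 1
        }
        self.dirty = 0
        return saved


ctx = RegisterContext()
ctx.write(3, 7)
ctx.write(10, 9)
print(ctx.save_dirty())   # {3: 7, 10: 9}
```

Here a task that touched two of
thirty-two registers costs two
saves rather than thirty-two, which
is the saving the operating system
is after.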
· Memory Byte Order
The MAJC architecture's native
byte order is "big-endian";
that is, multibyte values are
stored in memory with the most
significant byte at the lowest
address and the least significant
byte at the highest address.
However, a MAJC implementation
can manipulate data stored in
any memory byte-order (notably
"little-endian").
The BYTESHUFFLE instruction
can reorder bytes efficiently
in an arbitrary manner. Also,
an implementation may define
an Alternate Space Identifier
(ASI) dedicated to performing
automatic byte reordering whenever
corresponding Load and Store
from Alternate Address Space
instructions are executed.
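The two byte orders, and the idea of
an arbitrary byte shuffle, can be
shown with a short Python sketch.
The function shuffle_bytes is a
hypothetical stand-in for the
concept only; the operands and
encoding of the real BYTESHUFFLE
instruction are not described here
and are not modelled.

```python
def shuffle_bytes(data, pattern):
    """Return bytes where output[i] = data[pattern[i]]."""
    return bytes(data[i] for i in pattern)


value = 0x11223344

# Big-endian: most significant byte at the lowest address.
big = value.to_bytes(4, "big")
print(big.hex())        # 11223344

# Reversing the four bytes yields the little-endian layout.
little = shuffle_bytes(big, [3, 2, 1, 0])
print(little.hex())     # 44332211
```

A full reversal is only one possible
pattern; any other reordering of the
same bytes is expressed by passing a
different index list.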