| Cache
Mapping and Associativity
A very important factor in determining
the effectiveness of the level
2 cache relates to how the cache
is mapped to the system memory.
What this means in brief is that
there are many different ways
to allocate the storage in our
cache to the memory addresses
it serves. Let's take as an example
a system with 512 KB of L2 cache
and 64 MB of main memory. The
burning question is: how do we
decide how to divvy up the 16,384
address lines in our cache amongst
the "huge" 64 MB of
memory?
There are three different ways
that this mapping can generally
be done. The choice of mapping
technique is so critical to the
design that the cache is often
named after this choice:
Direct Mapped Cache:
The simplest way to allocate
the cache to the system memory
is to determine how many cache
lines there are (16,384 in
our example) and just chop
the system memory into the
same number of chunks. Then
each chunk gets the use of
one cache line. This is called
direct mapping. So if we have
64 MB of main memory addresses,
each cache line would be shared
by 4,096 memory addresses
(64 M divided by 16 K). |
Fully Associative
Cache:
Instead of hard-allocating
cache lines to particular
memory locations, it is possible
to design the cache so that
any line can store the contents
of any memory location. This
is called fully associative
mapping. |
N-Way
Set Associative Cache:
"N" here is
a number, typically 2, 4,
8 etc. This is a compromise
between the direct mapped
and fully associative designs.
In this case the cache is
broken into sets where each
set contains "N"
cache lines, let's say 4.
Then, each memory address
is assigned a set, and can
be cached in any one of those
4 locations within the set
that it is assigned to. In
other words, within each set
the cache is associative,
and thus the name. |
This design means that there
are "N" possible places
that a given memory location may
be in the cache. The tradeoff
is that there are "N"
times as many memory locations
competing for the same "N"
lines in the set. Let's suppose
in our example that we are using
a 4-way set associative cache.
So instead of a single block of
16,384 lines, we have 4,096 sets
with 4 lines in each. Each of
these sets is shared by 16,384
memory addresses (64 M divided
by 4 K) instead of 4,096 addresses
as in the case of the direct mapped
cache. So there is more to share
(4 lines instead of 1) but more
addresses sharing it (16,384 instead
of 4,096).
Conceptually, the direct mapped
and fully associative caches are
just "special cases"
of the N-way set associative cache.
You can set "N" to 1
to make a "1-way" set
associative cache. If you do this,
then there is only one line per
set, which is the same as a direct
mapped cache because each memory
address is back to pointing to
only one possible cache location.
On the other hand, suppose you
make "N" really large;
say, you set "N" to
be equal to the number of lines
in the cache (16,384 in our example).
If you do this, then you only
have one set, containing all of
the cache lines, and every memory
location points to that huge set.
This means that any memory address
can be in any line, and you are
back to a fully associative cache.
Comparison of Cache Mapping
Techniques
There is a critical tradeoff
in cache performance that has
led to the creation of the various
cache mapping techniques described
in the previous section. In order
for the cache to have good performance
you want to maximize both of the
following:
Hit Ratio: You
want to increase as much as possible
the likelihood of the cache containing
the memory addresses that the
processor wants. Otherwise, you
lose much of the benefit of caching
because there will be too many
misses.
Search Speed:
You want to be able to determine
as quickly as possible if you
have scored a hit in the cache.
Otherwise, you lose a small amount
of time on every access, hit or
miss, while you search the cache.
Now let's look at the three cache
types and see how they fare:
Direct Mapped Cache:
The direct mapped cache is the
simplest form of cache and the
easiest to check for a hit. Since
there is only one possible place
that any memory location can be
cached, there is nothing to search;
the line either contains the memory
information we are looking for,
or it doesn't.
Unfortunately, the direct mapped
cache also has the worst performance,
because again there is only one
place that any address can be
stored. Let's look again at our
512 KB level 2 cache and 64 MB
of system memory. As you recall
this cache has 16,384 lines (assuming
32-byte cache lines) and so each
one is shared by 4,096 memory
addresses. In the absolute worst
case, imagine that the processor
needs 2 different addresses (call
them X and Y) that both map to
the same cache line, in alternating
sequence (X, Y, X, Y). This could
happen in a small loop if you
were unlucky. The processor will
load X from memory and store it
in cache. Then it will look in
the cache for Y, but Y uses the
same cache line as X, so it won't
be there. So Y is loaded from
memory, and stored in the cache
for future use. But then the processor
requests X, and looks in the cache
only to find Y. This conflict
repeats over and over. The net
result is that the hit ratio here
is 0%. This is a worst case scenario,
but in general the performance
is worst for this type of mapping.
Fully Associative Cache:
The fully associative cache has
the best hit ratio because any
line in the cache can hold any
address that needs to be cached.
This means the problem seen in
the direct mapped cache disappears,
because there is no dedicated
single line that an address must
use.
However (you knew it was coming),
this cache suffers from problems
involving searching the cache.
If a given address can be stored
in any of 16,384 lines, how do
you know where it is? Even with
specialized hardware to do the
searching, a performance penalty
is incurred. And this penalty
occurs for all accesses to memory,
whether a cache hit occurs or
not, because it is part of searching
the cache to determine a hit.
In addition, more logic must be
added to determine which of the
various lines to use when a new
entry must be added (usually some
form of a "least recently
used" algorithm is employed
to decide which cache line to
use next). All this overhead adds
cost, complexity and execution
time.
N-Way Set Associative
Cache:
The set associative cache is a
good compromise between the direct
mapped and set associative caches.
Let's consider the 4-way set associative
cache. Here, each address can
be cached in any of 4 places.
This means that in the example
described in the direct mapped
cache description above, where
we accessed alternately two addresses
that map to the same cache line,
they would now map to the same
cache set instead. This set has
4 lines in it, so one could hold
X and another could hold Y. This
raises the hit ratio from 0% to
near 100%! Again an extreme example,
of course. As for searching, since
the set only has 4 lines to examine
this is not very complicated to
deal with, although it does have
to do this small search, and it
also requires additional circuitry
to decide which cache line to
use when saving a fresh read from
memory. Again, some form of LRU
(least recently used) algorithm
is typically used.
Here's a summary table of the
different cache mapping techniques
and their relative performance:
Cache
Type
|
Hit
Ratio |
Search
Speed |
| Direct
Mapped |
Good |
Best |
| Fully
Associative
|
Best |
Moderate |
| N-Way
Set Associative |
Very Good,
Better as N Increases
|
Good,
Worse as N Increases |
|
 |