| Direct
Mapped Cache
Direct mapped cache works like
this. Picture cache as an array
with elements. These elements
are called "cache blocks."
Each cache block holds a "valid
bit" that tells us if anything
is contained in the line of this
cache block or if the cache block
has not yet had any memory put
into it. There is also a block
of data called a "line"
which stores a recently used block
of data. Perhaps most importantly,
there is a number called a "tag"
composed of more significant bits
of the memory address from which
the line's data came from.
You can think about the direct
mapped cache this way. Each row
in the table to the left represents
a cache block. We have our valid
bit which tells us if this cache
block currently holds data. Next
is our tag, a number that tell
us where in memory this bit is
from. After that, we have our
line, which is the data that we
have stored in cache.
The number to the right is just
the cache index. Cache blocks
are indexed like elements in an
array, and there is a cache index
associated with each cache block.
As you see, I show blocks N through
N+4, where N is some integer such
that these are valid cache blocks
for this particular cache.
As a nota bene, you must know
that for dirct mapped cache, cache
sizes and line sizes (and thus
the number of cache blocks) are
made in powers of two. It works
out better for the hardware design
if this is the case. Therefore,
we're not going to see direct
mapped cache's with 15 kilobytes,
or 700 bytes, or what have you.
We're working only in powers of
two for direct mapped cache.
For a direct mapped cache, suppose
we have a cache size of 2M bytes
with a line size of 2L bytes.
That means that there are 2M-L
cache blocks. Now, we use the
address of the data we're trying
to access to determine which of
these cache blocks we should look
at to determine if the data we
want is already in memory.
Working in the Fields
32-M
bits |
M-L
bits |
L
bits |
Tag
|
Index |
Byte
Select |
We use the address in this way.
Certain bits in our 32 bit memory
address have special significance.
We group the bits into three distinct
groups we call fields. In this discussion,
M and L are defined just as they
were a paragraph ago.
The rightmost L bits of the address
is the byte select field. If data
is found in a cache block, since
addresses are byte addressed,
we use this field to determine
which of the bytes in a line we're
trying to access.
The M-L bits just to the left
of the byte select field is called
the index field. As we said before,
we have 2M-L cache blocks in our
cache. These cache blocks are
ordered, just as in an array.
There is a first one (with index
0) and a last one (with index
M-L-1). If we're given an address,
the value represented by the bits
of the index field tell us which
of these cache blocks we should
examine.
The leftmost 32-M bits of the
address, just to the left of the
index field, is called the tag
field. This field is put into
the tag part of the cache block,
and that uniquely identifies from
where in main memory the data
in the line of this cache block
came from. If we're looking at
a cache block, we presumably know
what index of this cache is, and
therefore in conjunction with
the cache tag we can compose the
addresses from which the data
in the line came from. When we
store a line of data into a cache
block, we store the value represented
by the tag field into the tag
part of the cache block.
As a side note, it is not written
in stone that the index field
must be composed of the bits immediately
to the left of the byte select
field. However, the way that memory
accesses tend to work, it happens
that if the index field is in
the bits immediately to the left
of the byte select field, when
we look for data, it is in cache
rather than in main memory a larger
proportion of the time. While
there might be some special cases
in which having the index field
elsewhere would yield a higher
hit ratio, in general it is best
to have the index field immediately
to the left of the byte select
field.
When we're checking cache to
determine if the memory at the
address we're looking for is actually
in cache, we take the address,
and extract the index field. We
find the corresponding cache block.
We then check to see if this cache
block is valid. If it's not valid,
then nothing is stored in this
cache block, so we obviously must
reference main memory. However,
if the valid bit is set, then
we compare the tag in the cache
block to the tag field of our
address. If the tags are the same,
then we know that data we're looking
for and the data in the cache
block are the same. Therefore,
we may access this data from cache.
At this point we take the value
in the byte select field of the
address. Suppose this value is
N. We are then looking for byte
N in this line (hence the name
byte select).
Example
However, these things are always
more clear when we talk about
a specific example. Let's talk
about my first computer, a Macintosh
IIcx. Suppose we have a direct
mapped cache whose cache lines
are 32 bytes (25 bytes) in length,
and suppose that the cache size
is 512 bytes (29 bytes). Since
there are 512 bytes in the cache,
and 32 bytes per cache block,
that means that there are 512/32
= 16 cache blocks.
In reality, a 512 byte cache is
a very small cache; you'd never
find a cache that small in today's
computers. However, it makes the
example that follows managable.
Now, suppose we have addresses
that are thirty-two bits in length.
In this example, taking M and
L to have the same definitions
as they did at the beginning of
our discussion on direct mapped
cache, M = 9 and L = 5. So, our
byte select field is the rightmost
5 bits. Our index field is the
next M-L = 4 bits. Our tag field
is the leftmost 32-M = 23 bits.
This actually makes sense. Since
we have 16 cache blocks, we can
uniquely represent the 16 possibilities
of cache blocks we may address
by 4 bits, which is the size of
our index field. Similarly, we
have 32 bytes that we can access
in the data potion of our cache
block since line size is 32 bytes,
and we may uniquely address these
32 bytes with 5 bits, and indeed
that is the size of our byte select
field.
To illustrate the working of
this, I'll go step by step through
a series of memory accesses. In
my table represenation of cache
I omitted the cache line to save
space, because it really doesn't
matter what the line holds. The
data in cache is the data in cache,
and that's all there is to it.
We're interested in what memory
is accessed where, because that
is what affects the workings of
the cache. That changes the state
of the valid bits and the tag
numbers of our cache blocks.
For this example, I provide a
series of tables, which represents
the cache blocks, and their corresponding
valid bits and tags. The first
number in a row in the table is
the index of this particular cache
block. The next number is the
valid bit. The rightmost number
is the tag, which tells us where
the data in the line (which I
do not show) is from.
Initially
The table for the initial conditions
shows that the valid bits are
all set to zero, because nothing
has yet been stored in cache.
Step 1
For step one, suppose we access
the memory at address 0x0023AF7C.
The looking at that in binary,
that is 000000000010001110101111011111002.
If we separate these bits into
the lengths of the fields we have
determined with dashes between
each field, that is more easily
read as 00000000001000111010111-1011-11100.
So, our index is 10112=1110. We
look at index 11. Is there anything
in there yet? We see there is
not. Therefore, we load the data
contained at memory addresses
0x0023AF60 through 0x0023AF7F
into the 32 byte line of the cache
block with index 11.
Note that the memory addresses
between those two addresses are
the memory addresses that would
have the same tag and index fields
if we broke the addresses into
their respective fields.
Now that the data is loaded into
the line, we set the tag for this
cache block to be the same as
the tag of our address, since
this cache block will now hold
a line of data from the addresses
with the same index and tag. This
tag is 10001110101112=456710.
We also must set our valid bit
to true since this cache block
now holds a block of data. These
changes are reflected in the table
for step 1.
After these changes and the memory
loads, we know that the data we
want is in cache block 11. Our
byte select field is 111002 =
2810, so we get the 29th byte
of our line.
Since 0 would address the first
byte, 28 addresses the 29th byte.
It's exactly as though the cache
line is an array of chars (a byte
length quantity), and the byte
select is the index of the particular
element of that array that we're
looking at right now.
Now, this is a cache miss, but
there are different sorts of cache
misses. Notice that no data has
been loaded into cache yet. That
is because no memory accesses
have taken place. Because we're
just starting out, this is called
a "compulsory miss."
We cannot help but not have data
in this cache block yet, because
we just started.
Step 2
After step 1, suppose
we have another memory access,
this time at memory address 0x0023AE4F.
Ignoring the intervening binary
operations, this works out to
have a byte select field of 15,
an index field of 2, and a tag
field of 4567.
If we look at cache block 2,
we see that it is not valid. Similar
to last time, we load the 32 bytes
of memory from addresses 0x0023AE40
to 0x0023AE5F into the line of
this cache block, change the tag
to 4567, and set the valid bit
to true.
The tag field is the same as
in the previous operation; this
was intentional on my part. As
we see, the fact that the tag
is the same here as it was last
time is of no significance, since
the two are part of completely
different indexed cache blocks;
they are unrelated insofar as
cache is concerned.
Similar to last time, the data
we want is in the cache line.
Our byte select field is 15, so
we access the 16th byte (since
0 would address the first byte)
of the line in cache block 2.
Even though we've already had
a memory access in step two, this
miss is also called a compulsory
miss because the cache block that
we're loading into hasn't yet
had any data loaded into it.
Step 3
Suppose now we have a memory access
at address 0x148FE054 (in binary,
00010100100011111110000 0010 10100).
This works out to have a byte
select field of 101002=2010, index
field of 00102=210, and a tag
field of 000101001000111111100002=67377610.
We look at cache block 2. We
see that it is valid. We then
compare the tags. 673776 is not
equal to 4567, which is the tag
of cache block 2. This means that,
even though there is data in this
cache block, the data we want
is not in here. We must load the
data, and we do so just as if
this block had not been valid.
We load the data at addresses
0x147FE040 to 0x147FE05F into
the line at cache block 2. We
set the tag to 673776. We don't
need to set the valid bit to one
since it already is one. Now this
cache block holds a new portion
of data.
As you see, we had two pieces
of data that wanted to be in the
same cache block: there was the
data we loaded in the previous
step, and the data we just loaded
in this step. Now, this is not
a conflict miss. What would be
a conflict miss is if we tried
to load the data that used to
be in the cache (the data that
we just replaced) back into cache.
When this sort of miss occurs,
that is called a "conflict
miss," or rather a "collision."
We also see one of the primary
weaknesses of the direct mapped
cache. At the end of step two
we had two blocks used up, which
means that we're using 64 bytes
of our 512 byte cache. However,
in this step, step three, because
of a conflict miss we overwrote
one of those blocks even though
we had 448 bytes unused in our
cache. Though this example is
completely contrived, this sort
of cache inefficiency is a common
problem with direct mapped cache.
Once we load the data into this
cache and set the tag field appropriately,
we can find the byte that we want
using the byte select field, at
index 2 (the 3rd byte).
Step 4
Now suppose we try to access memory
address 0x0023AF64. This works
out to have byte select field
of 001002=410, index field of
10112=1110, and tag field of 10001110101112=456710.
We look at cache block 11, and
it is valid, so we compare the
tags. The tags are the same, it
turns out, so we finally have
a cache hit! The data that we
want is in this cache block. Since
our byte select field is 4, we
access the fifth byte of this
block's cache line. We don't modify
cache at all; tags and valid bits
remain the same, and we needn't
pull any data from main main memory
since the memory we want has already
been loaded into cache.
This block was loaded with data
in step one. The memory access
for step one was 0x0023AF7C. Now,
this step's memory access, 0x0023AF64,
is obviously a different address
than that, but 0x0023AF64 and
0x0023AF7C are pretty close. Since
in the memory access for 0x0023AF7C
we loaded that cache block's line
with memory from 0x0023AF60 to
0x0023AF7F (32 bytes in all),
the data for 0x0023AF64 could
be found there. So, two memory
accesses may access the same data
in cache even though they are
not exactly the same.
This set of tables trace the 16
cache blocks of the cache example
through each of the memory accesses
described above. Each "step"
table reflects the state of cache
at the end of each step.
Wow. That just kept going, didn't
it?
Practice Problems
These are just a couple of questions
that demonstrate the most basic
concepts of direct mapped cache.
The answers are provided after
all the questions. These aren't
terribly difficult, but they should
get you thinking about cache.
All of these questions pertain
to direct mapped cache, in a computer
with 32-bit addressing.
Questions
1. If we have a cache size of
4KB with a 16 byte line size,
what will be the sizes of the
three fields (tag, line, and byte
select) which we divide our address
into?
2. Suppose our fields are arranged
as described above (first tag
field, then index field, then
byte select field) in a cache
of 16KB with 32 byte line sizes.
What would be the values of each
of the three fields for the following
addresses?
a.
0x00B248AC
b.
0x5002AEF3
0x10203000
0x0023AF7C
Answers
Our cache size is 4KB = 212 bytes,
with a line size of 16 bytes =
24 bytes. Therefore, our byte
select field is 4 bits, our index
field is 12-4 = 8 bits, which
leaves 20 bits for our tag field.
This is actually two problems
in one. We must first determine
the sizes of each of the fields.
Our cache size is 16KB = 214 bytes,
with a line size of 32 bytes =
25 bytes. So, our byte select
field is 5 bits, our index field
is 14-5 = 9 bits, which leaves
18 bits for our tag field.
For 0x00B248AC, tag is 0x2C9,
index is 0x45, and byte select
is 0xC.
For 0x5002AEF3, tag is 0x1400A,
index is 0x177, and byte select
is 0x13.
For 0x10203000, tag is 0x4080,
index is 0x180, and byte select
is 0x0.
For 0x0023AF7C, tag is 0x8E, index
is 0x17B, and byte select is 0x1C.

|