We use the address in this way. Certain bits in our 32 bit memory address have special significance. We group the bits into three distinct groups we call fields. In this discussion, M and L are defined just as they were a paragraph ago. The rightmost L bits of the address is the byte select field. If data is found in a cache block, since addresses are byte addressed, we use this field to determine which of the bytes in a line we're trying to access.
The M-L bits just to the left of the byte select field is called the index field. As we said before, we have 2M-L cache blocks in our cache. These cache blocks are ordered, just as in an array. There is a first one (with index 0) and a last one (with index M-L-1). If we're given an address, the value represented by the bits of the index field tell us which of these cache blocks we should examine.
The leftmost 32-M bits of the address, just to the left of the index field, is called the tag field. This field is put into the tag part of the cache block, and that uniquely identifies from where in main memory the data in the line of this cache block came from. If we're looking at a cache block, we presumably know what index of this cache is, and therefore in conjunction with the cache tag we can compose the addresses from which the data in the line came from. When we store a line of data into a cache block, we store the value represented by the tag field into the tag part of the cache block.
As a side note, it is not written in stone that the index field must be composed of the bits immediately to the left of the byte select field. However, the way that memory accesses tend to work, it happens that if the index field is in the bits immediately to the left of the byte select field, when we look for data, it is in cache rather than in main memory a larger proportion of the time. While there might be some special cases in which having the index field elsewhere would yield a higher hit ratio, in general it is best to have the index field immediately to the left of the byte select field.
When we're checking cache to determine if the memory at the address we're looking for is actually in cache, we take the address, and extract the index field. We find the corresponding cache block. We then check to see if this cache block is valid. If it's not valid, then nothing is stored in this cache block, so we obviously must reference main memory. However, if the valid bit is set, then we compare the tag in the cache block to the tag field of our address. If the tags are the same, then we know that data we're looking for and the data in the cache block are the same. Therefore, we may access this data from cache. At this point we take the value in the byte select field of the address. Suppose this value is N. We are then looking for byte N in this line (hence the name byte select).
However, these things are always more clear when we talk about a specific example. Let's talk about my first computer, a Macintosh IIcx. Suppose we have a direct mapped cache whose cache lines are 32 bytes (25 bytes) in length, and suppose that the cache size is 512 bytes (29 bytes). Since there are 512 bytes in the cache, and 32 bytes per cache block, that means that there are 512/32 = 16 cache blocks.
In reality, a 512 byte cache is a very small cache; you'd never find a cache that small in today's computers. However, it makes the example that follows managable.
Now, suppose we have addresses that are thirty-two bits in length. In this example, taking M and L to have the same definitions as they did at the beginning of our discussion on direct mapped cache, M = 9 and L = 5. So, our byte select field is the rightmost 5 bits. Our index field is the next M-L = 4 bits. Our tag field is the leftmost 32-M = 23 bits.
This actually makes sense. Since we have 16 cache blocks, we can uniquely represent the 16 possibilities of cache blocks we may address by 4 bits, which is the size of our index field. Similarly, we have 32 bytes that we can access in the data potion of our cache block since line size is 32 bytes, and we may uniquely address these 32 bytes with 5 bits, and indeed that is the size of our byte select field.
To illustrate the working of this, I'll go step by step through a series of memory accesses. In my table represenation of cache I omitted the cache line to save space, because it really doesn't matter what the line holds. The data in cache is the data in cache, and that's all there is to it. We're interested in what memory is accessed where, because that is what affects the workings of the cache. That changes the state of the valid bits and the tag numbers of our cache blocks.
For this example, I provide a series of tables, which represents the cache blocks, and their corresponding valid bits and tags. The first number in a row in the table is the index of this particular cache block. The next number is the valid bit. The rightmost number is the tag, which tells us where the data in the line (which I do not show) is from.
The table for the initial conditions shows that the valid bits are all set to zero, because nothing has yet been stored in cache.
For step one, suppose we access the memory at address 0x0023AF7C. The looking at that in binary, that is 000000000010001110101111011111002. If we separate these bits into the lengths of the fields we have determined with dashes between each field, that is more easily read as 00000000001000111010111-1011-11100. So, our index is 10112=1110. We look at index 11. Is there anything in there yet? We see there is not. Therefore, we load the data contained at memory addresses 0x0023AF60 through 0x0023AF7F into the 32 byte line of the cache block with index 11.
Note that the memory addresses between those two addresses are the memory addresses that would have the same tag and index fields if we broke the addresses into their respective fields.
Now that the data is loaded into the line, we set the tag for this cache block to be the same as the tag of our address, since this cache block will now hold a line of data from the addresses with the same index and tag. This tag is 10001110101112=456710. We also must set our valid bit to true since this cache block now holds a block of data. These changes are reflected in the table for step 1.
After these changes and the memory loads, we know that the data we want is in cache block 11. Our byte select field is 111002 = 2810, so we get the 29th byte of our line.
Since 0 would address the first byte, 28 addresses the 29th byte. It's exactly as though the cache line is an array of chars (a byte length quantity), and the byte select is the index of the particular element of that array that we're looking at right now.
Now, this is a cache miss, but there are different sorts of cache misses. Notice that no data has been loaded into cache yet. That is because no memory accesses have taken place. Because we're just starting out, this is called a "compulsory miss." We cannot help but not have data in this cache block yet, because we just started.
After step 1, suppose we have another memory access, this time at memory address 0x0023AE4F. Ignoring the intervening binary operations, this works out to have a byte select field of 15, an index field of 2, and a tag field of 4567.
If we look at cache block 2, we see that it is not valid. Similar to last time, we load the 32 bytes of memory from addresses 0x0023AE40 to 0x0023AE5F into the line of this cache block, change the tag to 4567, and set the valid bit to true.
The tag field is the same as in the previous operation; this was intentional on my part. As we see, the fact that the tag is the same here as it was last time is of no significance, since the two are part of completely different indexed cache blocks; they are unrelated insofar as cache is concerned.
Similar to last time, the data we want is in the cache line. Our byte select field is 15, so we access the 16th byte (since 0 would address the first byte) of the line in cache block 2.
Even though we've already had a memory access in step two, this miss is also called a compulsory miss because the cache block that we're loading into hasn't yet had any data loaded into it.
Suppose now we have a memory access at address 0x148FE054 (in binary, 00010100100011111110000 0010 10100). This works out to have a byte select field of 101002=2010, index field of 00102=210, and a tag field of 000101001000111111100002=67377610.
We look at cache block 2. We see that it is valid. We then compare the tags. 673776 is not equal to 4567, which is the tag of cache block 2. This means that, even though there is data in this cache block, the data we want is not in here. We must load the data, and we do so just as if this block had not been valid.
We load the data at addresses 0x147FE040 to 0x147FE05F into the line at cache block 2. We set the tag to 673776. We don't need to set the valid bit to one since it already is one. Now this cache block holds a new portion of data.
As you see, we had two pieces of data that wanted to be in the same cache block: there was the data we loaded in the previous step, and the data we just loaded in this step. Now, this is not a conflict miss. What would be a conflict miss is if we tried to load the data that used to be in the cache (the data that we just replaced) back into cache. When this sort of miss occurs, that is called a "conflict miss," or rather a "collision."
We also see one of the primary weaknesses of the direct mapped cache. At the end of step two we had two blocks used up, which means that we're using 64 bytes of our 512 byte cache. However, in this step, step three, because of a conflict miss we overwrote one of those blocks even though we had 448 bytes unused in our cache. Though this example is completely contrived, this sort of cache inefficiency is a common problem with direct mapped cache.
Once we load the data into this cache and set the tag field appropriately, we can find the byte that we want using the byte select field, at index 2 (the 3rd byte).
Now suppose we try to access memory address 0x0023AF64. This works out to have byte select field of 001002=410, index field of 10112=1110, and tag field of 10001110101112=456710.
We look at cache block 11, and it is valid, so we compare the tags. The tags are the same, it turns out, so we finally have a cache hit! The data that we want is in this cache block. Since our byte select field is 4, we access the fifth byte of this block's cache line. We don't modify cache at all; tags and valid bits remain the same, and we needn't pull any data from main main memory since the memory we want has already been loaded into cache.
This block was loaded with data in step one. The memory access for step one was 0x0023AF7C. Now, this step's memory access, 0x0023AF64, is obviously a different address than that, but 0x0023AF64 and 0x0023AF7C are pretty close. Since in the memory access for 0x0023AF7C we loaded that cache block's line with memory from 0x0023AF60 to 0x0023AF7F (32 bytes in all), the data for 0x0023AF64 could be found there. So, two memory accesses may access the same data in cache even though they are not exactly the same.
This set of tables trace the 16 cache blocks of the cache example through each of the memory accesses described above. Each "step" table reflects the state of cache at the end of each step.
Wow. That just kept going, didn't it?
These are just a couple of questions that demonstrate the most basic concepts of direct mapped cache. The answers are provided after all the questions. These aren't terribly difficult, but they should get you thinking about cache. All of these questions pertain to direct mapped cache, in a computer with 32-bit addressing.
- If we have a cache size of 4KB with a 16 byte line size, what will be the sizes of the three fields (tag, line, and byte select) which we divide our address into?
- Suppose our fields are arranged as described above (first tag field, then index field, then byte select field) in a cache of 16KB with 32 byte line sizes. What would be the values of each of the three fields for the following addresses?
Our cache size is 4KB = 212 bytes, with a line size of 16 bytes = 24 bytes. Therefore, our byte select field is 4 bits, our index field is 12-4 = 8 bits, which leaves 20 bits for our tag field.
This is actually two problems in one. We must first determine the sizes of each of the fields. Our cache size is 16KB = 214 bytes, with a line size of 32 bytes = 25 bytes. So, our byte select field is 5 bits, our index field is 14-5 = 9 bits, which leaves 18 bits for our tag field.
For 0x00B248AC, tag is 0x2C9, index is 0x45, and byte select is 0xC.
For 0x5002AEF3, tag is 0x1400A, index is 0x177, and byte select is 0x13.
For 0x10203000, tag is 0x4080, index is 0x180, and byte select is 0x0.
For 0x0023AF7C, tag is 0x8E, index is 0x17B, and byte select is 0x1C.