51

I am taking a System Architecture course and I have trouble understanding how a direct mapped cache works.

I have looked in several places, but each explains it in a different manner, which gets me even more confused.

What I cannot understand is what the tag and the index are, and how they are selected.

The explanation from my lecture is: "The address is divided into two parts: the index (e.g. 15 bits), used to address the (32K) RAMs directly; the rest of the address, the tag, is stored and compared with the incoming tag."

Where does that tag come from? It cannot be the full address of the memory location in RAM, since that would render the direct mapped cache useless (compared with a fully associative cache).

Thank you very much.

Kimi
Percentage
  • You should read this topic in the book "Computer Architecture" by Morris Mano; it is explained well there. – Jatin Khurana Apr 11 '13 at 16:34
  • Please go through the link below; I think it will help you to clearly understand the concept of direct mapping in a cache and the parameters like tag and index: http://csciwww.etsu.edu/tarnoff/labs4717/x86_sim/direct.html – user2891771 Oct 17 '13 at 18:32
  • @user2891771 In the link you referenced, in the diagram, why are the last two blocks not consecutive (both 2^n − 1)? That doesn't follow the pattern where the block identifiers were consecutive: 1, 2, 3, ... – committedandroider May 31 '15 at 08:02
  • @user2891771 And in the same link, what does "memory" in "tag uniquely identifying where that block came from in memory" mean (what kind of memory)? – committedandroider May 31 '15 at 08:04
  • See also [tag and index terminology for associative caches](https://web.archive.org/web/20131031045450/http://semipublic.comp-arch.net/wiki/Cache_Ways_and_Sets). Link from [Paul Clayton's archive of Andy Glew's CompArch wiki](https://sites.google.com/site/paulclaytonplace/andy-glew-s-comparch-wiki). – Peter Cordes Aug 30 '16 at 01:35

4 Answers

96

Okay. So let's first understand how the CPU interacts with the cache.

There are three layers of memory (broadly speaking): cache (generally made of SRAM chips), main memory (generally made of DRAM chips), and storage (generally magnetic, like hard disks). Whenever the CPU needs data from some particular location, it first searches the cache to see if it is there. The cache lies closest to the CPU in the memory hierarchy, so its access time is the smallest (and its cost the highest); if the data the CPU is looking for is found there, it constitutes a 'hit', and the data is obtained from there for use by the CPU. If it is not there, the data has to be moved from main memory into the cache before the CPU can access it (the CPU generally interacts only with the cache), which incurs a time penalty.

So, to find out whether the data is in the cache or not, various mapping schemes are applied; the direct mapped cache is one of them. For simplicity, let's assume a memory system with 10 cache memory locations (numbered 0 to 9) and 40 main memory locations (numbered 0 to 39). This picture sums it up:

[Figure: 40 main memory locations (0–39) mapped onto 10 cache locations (0–9)]

There are 40 main memory locations available, but only up to 10 can be accommodated in the cache at a time. So, by some means, the incoming request from the CPU needs to be redirected to a cache location. That raises two problems:

  1. How to redirect? Specifically, how to do it in a predictable way that will not change over time?

  2. If the cache location is already filled with some data, the incoming request from the CPU has to determine whether the address it requires data from is the same as the address whose data is stored in that location.

In our simple example, we can redirect with a simple rule. Given that we have to map 40 main memory locations, numbered serially from 0 to 39, to 10 cache locations, numbered 0 to 9, the cache location for memory location n can be n % 10. So 21 corresponds to 1, 37 corresponds to 7, and so on. That becomes the index.

But 37, 17, and 7 all correspond to 7, so to differentiate between them we need the tag. Just as the index is n % 10, the tag is int(n / 10). Now 37, 17, and 7 all have the same index, 7, but different tags: 3, 1, and 0. That is, the mapping is completely specified by the two values, tag and index.

So now, if a request comes for address location 29, it translates to a tag of 2 and an index of 9. The index corresponds to a cache location number, so cache location no. 9 is queried to see whether it contains any data and, if so, whether the associated tag is 2. If yes, it's a cache hit, and the data is fetched from that location immediately. If the location is empty, or the tag is not 2, it contains the data of some other memory address, not 29 (an address with the same index, such as 9, 19, or 39). That is a cache miss, and the data from location no. 29 in main memory has to be loaded into the cache at location 9 (with its tag set to 2, evicting whatever data was there before), after which it is fetched by the CPU.
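To make this concrete, here is a minimal sketch in C of the toy model above. Everything in it (the 10-slot cache, the fake read_main_memory helper, the addr * 100 data) is an illustrative assumption, not real hardware behavior:

```c
#include <stdbool.h>
#include <stdio.h>

#define CACHE_SLOTS 10            /* cache locations 0..9 */

/* One cache slot: a valid flag, the stored tag, and the cached data. */
struct slot {
    bool valid;
    int  tag;
    int  data;
};

static struct slot cache[CACHE_SLOTS];

/* Stand-in for main memory (40 locations); the data is just made up. */
static int read_main_memory(int addr) { return addr * 100; }

/* Read an address through the cache, loading it on a miss. */
static int cache_read(int addr)
{
    int index = addr % CACHE_SLOTS;   /* which slot to look in   */
    int tag   = addr / CACHE_SLOTS;   /* distinguishes 7, 17, 37 */

    if (cache[index].valid && cache[index].tag == tag) {
        printf("hit  for address %2d (index %d, tag %d)\n", addr, index, tag);
    } else {
        printf("miss for address %2d (index %d, tag %d)\n", addr, index, tag);
        cache[index].valid = true;    /* evict whatever was here */
        cache[index].tag   = tag;
        cache[index].data  = read_main_memory(addr);
    }
    return cache[index].data;
}

int main(void)
{
    cache_read(29);   /* miss: slot 9 is empty          */
    cache_read(29);   /* hit:  tag 2 matches            */
    cache_read(9);    /* miss: same index 9, tag 0 != 2 */
    return 0;
}
```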

SexyBeast
12

Let's use an example: a 64-kilobyte cache with 16-byte cache lines has 4096 different cache lines (65,536 / 16 = 4096).

You need to break the address down into three different parts.

  1. The lowest bits are used to tell you the byte within a cache line when you get it back; this part isn't directly used in the cache lookup. (Bits 0-3 in this example.)
  2. The next bits are used to INDEX the cache. If you think of the cache as a big column of cache lines, the index bits tell you which row you need to look in for your data. (Bits 4-15 in this example.)
  3. All the other bits are TAG bits. These bits are stored in the tag store alongside the data you have stored in the cache, and we compare the corresponding bits of the cache request to what we have stored to figure out whether the data we are caching is the data being requested.

The number of bits you use for the index is log2(number_of_cache_lines) [really it's the number of sets, but in a direct mapped cache there are the same number of lines and sets].
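As a rough sketch of that split, assuming 32-bit addresses and the constants of this example (the variable names and the example address are mine, not from any particular API):

```c
#include <stdint.h>
#include <stdio.h>

/* 64 KB cache / 16-byte lines = 4096 lines.
 * offset: bits 0-3   (log2(16)   = 4 bits)
 * index:  bits 4-15  (log2(4096) = 12 bits)
 * tag:    bits 16-31 (whatever is left)
 */
int main(void)
{
    uint32_t addr   = 0xDEADBEEF;            /* arbitrary example address */
    uint32_t offset =  addr        & 0xF;    /* byte within the line      */
    uint32_t index  = (addr >> 4)  & 0xFFF;  /* which cache line to check */
    uint32_t tag    =  addr >> 16;           /* stored in the tag store   */

    printf("offset=0x%X index=0x%X tag=0x%X\n",
           (unsigned)offset, (unsigned)index, (unsigned)tag);
    return 0;
}
```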

Daniel
Danny
  • I think I got it, but I have another question now. So far I imagined this as a single table where somehow the address and the data should be stored. It seems to me that this cache would better be represented in 3 tables: one is the one with the cache lines, with the TAG, INDEX and selection bits, one is the tag store and one is the data store. When the CPU is trying to access a location, it checks the TAG to see if that is the address it is looking for, it checks if it is still valid, then it loads the data from the data store using the INDEX. – Percentage Apr 12 '13 at 09:25
    @Percentage I don't think you got it. There are only two tables. One for the tags and another for the data. Both use the same index, i.e. you can think of them being just one table. That is all you need. Think about it. – Mackie Messer Apr 12 '13 at 21:51
  • @MackieMesser Just to make sure I understand. Each Cache line has different TAG bits to identify where in RAM the bytes within the cache line are coming from? – committedandroider May 31 '15 at 08:18
  • @committedandroider It's not just the TAG bits but the combination of TAG and INDEX bits which tells you where in main memory the cache line belongs. The clever thing is that you don't need to actually store the INDEX bits because they are always the same for a specific cache line. – Mackie Messer May 31 '15 at 21:59
  • @MackieMesser Like Danny said above, and from what I've learned, aren't the index bits just there to identify which cache line the data is in when you have a cache address? They shouldn't have anything to do with main memory. – committedandroider Jun 01 '15 at 05:43
2

A direct mapped cache is like a table that has rows, also called cache lines, and at least two columns: one for the data and one for the tags.

Here is how it works: a read access to the cache takes the middle part of the address, called the index, and uses it as the row number. The data and the tag are looked up at the same time. Next, the tag is compared with the upper part of the address to decide whether the line is from the same address range in memory and is valid. At the same time, the lower part of the address is used to select the requested data from the cache line (I assume a cache line can hold data for several words).

I emphasized that the data access and the tag access + compare happen at the same time, because that is the key to reducing latency (the purpose of a cache). The data path RAM access doesn't need to be two steps.

The advantage is that a read is basically a simple table lookup and a compare.

But it is direct mapped: for every read address there is exactly one place in the cache where the data can be cached. So the disadvantage is that many other addresses map to the same place and may compete for that cache line.
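A minimal C sketch of that table view (the sizes, the word-addressed lookup, and all names are assumptions for illustration). The point is that the tag column and the data column are read with the same index, which is why hardware can probe both at the same time:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_LINES      256   /* rows of the table            */
#define WORDS_PER_LINE 4     /* words held by one cache line */

/* The two columns, sharing one row number (the index). */
static uint32_t tags[NUM_LINES];
static bool     valid[NUM_LINES];
static uint32_t data[NUM_LINES][WORDS_PER_LINE];

/* addr is a word address here, to keep the arithmetic simple.
 * Returns true on a hit and stores the requested word in *out. */
bool cache_lookup(uint32_t addr, uint32_t *out)
{
    uint32_t word  = addr % WORDS_PER_LINE;                 /* lower part  */
    uint32_t index = (addr / WORDS_PER_LINE) % NUM_LINES;   /* middle part */
    uint32_t tag   = addr / (WORDS_PER_LINE * NUM_LINES);   /* upper part  */

    /* In hardware these two reads happen in parallel: */
    uint32_t candidate = data[index][word];                 /* data path   */
    bool     hit = valid[index] && (tags[index] == tag);    /* tag path    */

    if (hit)
        *out = candidate;
    return hit;
}
```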

deepsubmicron
    Talking about parallelism: One of the noteworthy properties of a direct mapped cache is that the tag path and the data path are independent. In the tag path, reading and comparing the tag to the address are two sequential operations which produce the hit/miss signal. In the data path, there is just one operation. The middle and the lower part of the address form a single address for the data RAM, to produce a single output word. How the RAM is organized internally doesn't really matter. – Mackie Messer Apr 13 '13 at 20:39
  • Yes, thanks for pointing this out. The two steps I described in the data path are just an implementation detail of the RAM. – deepsubmicron Apr 14 '13 at 15:25
  • @deepsubmicron Can you go into more detail on the lower and upper parts of an address? How would you identify what the lower part is and what the upper part is? – committedandroider May 31 '15 at 07:53
  • How wide is the RAM? The number of words in a cache line determines the lower part of the address. The middle part is determined by the number of rows in the RAM. So the remaining bits are the upper part. – deepsubmicron Jun 23 '15 at 10:15
1

I found a good book at the library that offered me the clear explanation I needed, and I will share it here in case some other student stumbles across this thread while searching about caches.

The book is "Computer Architecture: A Quantitative Approach", 3rd edition, by Hennessy and Patterson, page 390.

First, keep in mind that main memory is divided into blocks the size of a cache block. If the cache blocks are 64 bytes and we have 1 GB of RAM, the RAM is divided into 16M blocks of 64 bytes each (2^30 B / 2^6 B = 2^24 blocks).

From the book:

Where can a block be placed in a cache?

  • If each block has only one place it can appear in the cache, the cache is said to be direct mapped. The destination block is calculated using this formula: <RAM Block Address> MOD <Number of Blocks in the Cache>

So, let's assume we have 32 blocks of RAM and 8 blocks of cache.

If we want to store block 12 from RAM to the cache, RAM block 12 would be stored into Cache block 4. Why? Because 12 / 8 = 1 remainder 4. The remainder is the destination block.

  • If a block can be placed anywhere in the cache, the cache is said to be fully associative.

  • If a block can be placed anywhere in a restricted set of places in the cache, the cache is set associative.

Basically, a set is a group of blocks in the cache. A block is first mapped onto a set and then the block can be placed anywhere inside the set.

The formula is: <RAM Block Address> MOD <Number of Sets in the Cache>

So, let's assume we have 32 blocks of RAM and a cache divided into 4 sets (each set having two blocks, meaning 8 blocks in total). This way set 0 would have blocks 0 and 1, set 1 would have blocks 2 and 3, and so on...

If we want to store RAM block 12 in the cache, it would be stored in cache block 0 or 1. Why? Because 12 / 4 = 3 remainder 0. Therefore set 0 is selected, and the block can be placed anywhere inside set 0 (meaning blocks 0 and 1).
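A few lines of C reproducing those two placements (just the arithmetic from the examples above; the 8-block and 4-set figures are the ones assumed in this answer):

```c
#include <stdio.h>

int main(void)
{
    int block = 12;   /* RAM block from the example */

    /* Direct mapped, 8 cache blocks: 12 % 8 = 4, so it lands in block 4. */
    printf("direct mapped:   cache block %d\n", block % 8);

    /* Set associative, 4 sets of 2 blocks: 12 % 4 = 0, so it goes into
     * set 0 and may occupy either of that set's frames (blocks 0 or 1). */
    printf("set associative: set %d\n", block % 4);

    return 0;
}
```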

Now I'll go back to my original problem with the addresses.

How is a block found if it is in the cache?

Each block frame in the cache has an address tag. Just to make it clear, a frame holds both the tag and the data.

The memory address is divided into multiple pieces: tag, index, and offset.

The tag is compared to identify the block inside the cache, the index shows the set in which the block is situated (so it does not need to be stored; it is implied by the frame's position), and the offset is used to select the data.

By "select the data" I mean that in a cache block there will obviously be more than one memory locations, the offset is used to select between them.

So, if you want to imagine a table, these would be the columns:

TAG | DATA 1 | DATA 2 | ... | DATA N

The tag is compared to confirm the block; the index is simply the row number of this table, so it does not need a column of its own, and the offset selects one of the DATA fields.
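As a hedged sketch of that table in C (the sizes are invented, and addresses are treated as word addresses): each row stores only the tag and the data words, because the index is just the row's position and the offset is never stored at all:

```c
#include <stdint.h>

#define NUM_SETS    8   /* rows of the table                 */
#define BLOCK_WORDS 4   /* DATA 1 .. DATA N, with N = 4 here */

/* One row of the table: tag plus data, nothing else. */
struct row {
    uint32_t tag;
    uint32_t data[BLOCK_WORDS];
};

static struct row table[NUM_SETS];

/* Split a word address into the three pieces described above. */
void split(uint32_t addr, uint32_t *tag, uint32_t *index, uint32_t *offset)
{
    *offset = addr % BLOCK_WORDS;                  /* picks a DATA column */
    *index  = (addr / BLOCK_WORDS) % NUM_SETS;     /* picks the row       */
    *tag    = addr / (BLOCK_WORDS * NUM_SETS);     /* stored and compared */
}
```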

I hope that my understanding of this is correct, if it is not please let me know.

Percentage
  • I got a good answer from this link: http://csciwww.etsu.edu/tarnoff/labs4717/x86_sim/direct.html – Percentage Dec 07 '13 at 16:08