Cache Size
Mapping Function
- The cache can hold 64 Kbytes.
- Data is transferred between main memory and cache in blocks of 4 bytes each. This means that the cache is organized as 16K = 2^14 lines of 4 bytes each.
- The main memory consists of 16 Mbytes, with each byte directly addressable by a 24-bit address (2^24 = 16M). Thus, for mapping purposes, we can consider main memory to consist of 4M blocks of 4 bytes each.
Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines.
Three techniques can be used: direct, associative, and set associative.
Direct mapping maps each block of main memory into only one possible cache line. Figure 3.1 illustrates the general mechanism. The mapping is expressed as
i = j modulo m

where
i = cache line number
j = main memory block number
m = number of lines in the cache
Figure 3.1 Direct-Mapping Cache Organization [HWAN93]
The effect of this mapping is that blocks of main memory are assigned to lines of the cache as follows: line 0 holds blocks 0, m, 2m, 3m, ...; line 1 holds blocks 1, m+1, 2m+1, ...; and line m-1 holds blocks m-1, 2m-1, 3m-1, ....
Figure 3.2 shows our example system using direct mapping. In the example, m = 16K = 2^14 and i = j modulo 2^14. The resulting mapping is illustrated in Figure 3.2.
Figure 3.2 Direct Mapping Example
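To make the address breakdown concrete, the following C sketch performs a direct-mapped lookup for the example system, where the 24-bit address splits into an 8-bit tag, a 14-bit line number, and a 2-bit word field. The structure and function names are illustrative, not taken from the text.

#include <stdint.h>
#include <stdio.h>

/* Example system: 24-bit addresses, 4-byte blocks, 16K (2^14) cache lines. */
#define WORD_BITS 2                       /* 2^2 = 4 bytes per block        */
#define LINE_BITS 14                      /* 2^14 = 16K cache lines         */
#define NUM_LINES (1u << LINE_BITS)

/* Hypothetical cache line: valid flag, stored tag, 4 bytes of data. */
struct cache_line {
    int      valid;
    uint32_t tag;
    uint8_t  data[4];
};

static struct cache_line cache[NUM_LINES];

/* Direct mapping: block j maps to line i = j modulo m (m = NUM_LINES). */
int lookup(uint32_t address)
{
    uint32_t block = address >> WORD_BITS;        /* main memory block number j */
    uint32_t line  = block & (NUM_LINES - 1);     /* i = j modulo 2^14          */
    uint32_t tag   = block >> LINE_BITS;          /* remaining 8 address bits   */

    return cache[line].valid && cache[line].tag == tag;   /* hit? */
}

int main(void)
{
    uint32_t addr  = 0x16339C;                    /* arbitrary 24-bit address   */
    uint32_t block = addr >> WORD_BITS;
    printf("block=%u line=%u tag=%u hit=%d\n",
           (unsigned)block,
           (unsigned)(block & (NUM_LINES - 1)),
           (unsigned)(block >> LINE_BITS),
           lookup(addr));
    return 0;
}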
Associative mapping overcomes the disadvantage of direct mapping by permitting each main memory block to be loaded into any line of the cache. Figure 3.3 shows our example using associative mapping.
Figure 3.3 Associative Mapping Example
With associative mapping, there is flexibility as to which block to replace when a new block is read into the cache. Replacement algorithms, discussed later in this section, are designed to maximize the hit ratio.
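As a rough sketch of what an associative lookup involves, the C fragment below treats the full 22-bit block number as the tag and checks every line for a match. Real hardware compares all tags in parallel; the sequential loop and the names used here are only illustrative.

#include <stdint.h>

#define WORD_BITS 2
#define NUM_LINES (1u << 14)              /* 16K lines, as in the example system */

struct cache_line {
    int      valid;
    uint32_t tag;                         /* full 22-bit block number            */
    uint8_t  data[4];
};

static struct cache_line cache[NUM_LINES];

/* Associative mapping: a block may sit in any line, so every tag is checked. */
int assoc_lookup(uint32_t address)
{
    uint32_t tag = address >> WORD_BITS;  /* block number doubles as the tag     */
    for (uint32_t i = 0; i < NUM_LINES; i++)
        if (cache[i].valid && cache[i].tag == tag)
            return 1;                     /* hit  */
    return 0;                             /* miss */
}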
Set associative mapping is a compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages. The relationships are
m = v × k
i = j modulo v

where
i = cache set number
j = main memory block number
m = number of lines in the cache
v = number of sets
k = number of lines in each set
Figure 3.5 shows our example using set associative mapping with two lines in each set, referred to as two-way set associative.
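Under the same example parameters, a two-way organization gives v = 8K = 2^13 sets of k = 2 lines, so the 24-bit address splits into a 9-bit tag, a 13-bit set number, and a 2-bit word field. The C sketch below, with illustrative names, shows how such a lookup might proceed.

#include <stdint.h>

#define WORD_BITS 2
#define SET_BITS  13                       /* v = 2^13 = 8K sets              */
#define NUM_SETS  (1u << SET_BITS)
#define K         2                        /* two lines per set (two-way)     */

struct cache_line {
    int      valid;
    uint32_t tag;                          /* remaining 9 address bits        */
    uint8_t  data[4];
};

static struct cache_line cache[NUM_SETS][K];

/* Set associative mapping: i = j modulo v, then search the k lines of set i. */
int set_assoc_lookup(uint32_t address)
{
    uint32_t block = address >> WORD_BITS;         /* block number j          */
    uint32_t set   = block & (NUM_SETS - 1);       /* i = j modulo v          */
    uint32_t tag   = block >> SET_BITS;

    for (int way = 0; way < K; way++)
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return 1;                              /* hit  */
    return 0;                                      /* miss */
}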
Figure 3.4 k-Way Set Associative Cache Organization
Figure 3.5 Two-Way Set Associative Mapping Example
Replacement Algorithms
- Least recently used (LRU): Replace the block in the set that has been in the cache longest with no reference to it. For two-way set associative, this is easily implemented: each line includes a USE bit. When a line is referenced, its USE bit is set to 1 and the USE bit of the other line in that set is set to 0. When a block is to be read into the set, the line whose USE bit is 0 is used. LRU should give the best hit ratio.
- First-in-first-out (FIFO): Replace the block in the set that has been in the cache longest. FIFO is easily implemented.
- Least frequently used (LFU): Replace the block in the set that has experienced the fewest references.
- Random: A technique not based on usage is to pick a line at random from among the candidate lines.
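The USE-bit scheme for two-way LRU described above could be sketched in C as follows; the structure and helper names are illustrative, not from the text.

#include <stdint.h>

/* Sketch of the USE-bit scheme for one set of a two-way set associative cache.
 * Ways 0 and 1 are the two lines of the set; use_bit[] tracks recency. */
struct two_way_set {
    int      valid[2];
    uint32_t tag[2];
    int      use_bit[2];      /* 1 = most recently referenced line            */
};

/* Mark `way` as most recently used: its USE bit becomes 1, the other's 0. */
void touch(struct two_way_set *set, int way)
{
    set->use_bit[way]     = 1;
    set->use_bit[1 - way] = 0;
}

/* Choose the victim line when a new block must be read into the set:
 * the line whose USE bit is 0 (the least recently used one). */
int victim(const struct two_way_set *set)
{
    return set->use_bit[0] == 0 ? 0 : 1;
}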
Write Policy
The simplest technique is called write through. Using this technique, all write operations are made to main memory as well as to the cache, ensuring that main memory is always valid. Any other processor-cache module can monitor traffic to main memory to maintain consistency within its own cache.
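A minimal sketch of write through, assuming the direct-mapped organization of the earlier example: every write goes to main memory, and the cached copy (if present) is updated as well, so main memory is always valid. The names and layout are illustrative.

#include <stdint.h>

#define WORD_BITS 2
#define LINE_BITS 14
#define NUM_LINES (1u << LINE_BITS)

/* Illustrative direct-mapped cache and main memory for the example system. */
static struct { int valid; uint32_t tag; uint8_t data[4]; } cache[NUM_LINES];
static uint8_t main_memory[1u << 24];                  /* 16 Mbytes           */

/* Write through: the write always reaches main memory, and also updates the
 * cache copy if the addressed block is resident, so memory stays valid. */
void write_byte(uint32_t addr, uint8_t value)
{
    uint32_t block = addr >> WORD_BITS;
    uint32_t line  = block & (NUM_LINES - 1);
    uint32_t tag   = block >> LINE_BITS;

    if (cache[line].valid && cache[line].tag == tag)
        cache[line].data[addr & 3] = value;            /* keep cache current  */
    main_memory[addr] = value;                         /* memory always valid */
}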
In a bus organization in which more than one device has a cache and main memory is shared, a new problem is introduced. Even if a write-through policy is used, the other caches may contain invalid data. A system that prevents this problem is said to maintain cache coherency. Possible approaches to cache coherency include the following.
- Bus watching with write through: Each cache controller monitors the address lines to detect write operations to memory by other bus masters.
- Hardware transparency: Additional hardware is used to ensure that all updates to main memory via cache are reflected in all caches.
- Noncacheable memory: Only a portion of main memory is shared by more than one processor, and this is designated as noncacheable.
Line Size
- Larger blocks reduce the number of blocks that fit into a cache. Because each block fetch overwrites older cache contents, a small number of blocks results in data being overwritten shortly after it is fetched.
- As a block becomes larger, each additional word is farther from the requested word, and therefore less likely to be needed in the near future.
Number of Caches
Multilevel Caches
As logic density has increased, it has become possible to have a cache on the same chip as the processor: the on-chip cache. Because of the short data paths internal to the processor, compared with bus lengths, on-chip cache accesses will complete appreciably faster than would even zero-wait state bus cycles.
The inclusion of an on-chip cache leaves open the question of whether an off-chip, or external, cache is still desirable. The simplest such organization is known as a two-level cache, with the internal cache designated as level 1 (L1) and the external cache designated as level 2 (L2). If there is no L2 cache and the processor makes an access request for a memory location not in the L1 cache, then the processor must access DRAM or ROM memory across the bus. Due to the typically slow bus speed and slow memory access time, this results in poor performance. If an L2 SRAM (static RAM) cache is used, then frequently the missing information can be quickly retrieved. If the SRAM is fast enough to match the bus speed, then the data can be accessed using a zero-wait state transaction, the fastest type of bus transfer.
Two features of contemporary cache design for multilevel caches are noteworthy. First, for an off-chip L2 cache, many designs use a separate data path between the L2 cache and the processor, so as to reduce the burden on the system bus. Second, with the continued shrinkage of processor components, a number of processors now incorporate the L2 cache on the processor chip, improving performance.
The potential savings due to the use of an L2 cache depends on the hit rates in both the L1 and L2 caches. Several studies have shown that, in general, the use of a second-level cache does improve performance (e.g., see [AZIM92], [NOVI93], [HAND98]).
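As a rough way to see how the savings depend on both hit rates, the short C program below computes an average access time from assumed hit rates and latencies. All of the numbers are illustrative assumptions, not figures from the cited studies.

#include <stdio.h>

/* Illustrative average-access-time estimate for a two-level cache.
 * Every hit rate and latency below is an assumed value for this sketch. */
int main(void)
{
    double h1 = 0.90;        /* assumed L1 hit rate                        */
    double h2 = 0.80;        /* assumed L2 hit rate (on L1 misses)         */
    double t1 = 1.0;         /* assumed L1 access time (cycles)            */
    double t2 = 10.0;        /* assumed L2 access time                     */
    double tm = 100.0;       /* assumed main memory access time            */

    /* Average time = hits served by L1, plus L1 misses served by L2,
     * plus accesses that miss in both levels and go to main memory. */
    double avg = h1 * t1
               + (1.0 - h1) * h2 * t2
               + (1.0 - h1) * (1.0 - h2) * tm;

    printf("average access time = %.2f cycles\n", avg);
    return 0;
}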
Unified Versus Split Caches
There are two potential advantages of a unified cache:
- For a given cache size, a unified cache has a higher hit rate than split caches because it balances the load between instruction and data fetches automatically.
- Only one cache needs to be designed and implemented.
Reference: Stallings, W. (2003). Computer Organization & Architecture: Designing for Performance (6th ed.), Elements of Cache Design (pp. 108-119). Upper Saddle River, NJ: Pearson.