CS 220 / ARM / ARM1176JZ-S Technical Reference Mmanual.pdf

Level One Memory System

7.2 Cache organization

Each cache is implemented as a four-way set associative cache of configurable size. The caches are virtually indexed and physically tagged. You can configure the cache sizes in the range of 4 to 64KB. Both the Instruction Cache and the Data Cache can provide two words per cycle for all requesting sources.

Each cache way is architecturally limited to 16KB in size, because of the limitations of the virtually indexed, physically tagged implementation. The number of cache ways is fixed at four, but the cache way size can vary between 1KB and 16KB in powers of 2. The line length is not configurable and is fixed at eight words per line.
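The sizing rules above imply a simple relationship between total cache size, way size, and the number of sets. The following is a minimal sketch of that arithmetic; the function name `cache_geometry` and its return layout are illustrative assumptions, not part of the processor documentation.

```python
LINE_BYTES = 8 * 4   # eight 32-bit words per cache line (fixed)
NUM_WAYS = 4         # number of ways is fixed at four


def cache_geometry(cache_kb):
    """Return (way_bytes, sets_per_way, index_bits) for a cache of cache_kb KB.

    Hypothetical helper: it only restates the arithmetic implied by the text.
    """
    if cache_kb not in (4, 8, 16, 32, 64):
        raise ValueError("cache size must be 4, 8, 16, 32 or 64 KB")
    way_bytes = cache_kb * 1024 // NUM_WAYS      # 1KB to 16KB per way
    sets_per_way = way_bytes // LINE_BYTES       # lines held by one way
    index_bits = sets_per_way.bit_length() - 1   # log2, sets are a power of 2
    return way_bytes, sets_per_way, index_bits


for size in (4, 16, 64):
    print(size, cache_geometry(size))
```

For example, a 16KB cache gives four 4KB ways of 128 lines each, so seven address bits select the set.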

Write operations must occur after the Tag RAM reads and associated address comparisons are complete. A three-entry Write Buffer is included in the cache to enable the written words to be held until they can be written to cache. One or two words can be written in a single store operation. The addresses of these outstanding writes provide an additional input to the Tag RAM comparison for reads.

To avoid a critical path from the Tag RAM comparison to the enable signals for the data RAMs, there is a minimum of one cycle of latency between the determination of a hit to a particular way, and the start of writing to the data RAM of that way. This requires the Data Cache Write Buffer to hold three entries, for back-to-back writes. Accesses that read the dirty bits must also check the Data Cache Write Buffer for pending writes that result in dirty bits being set. The cache dirty bits for the Data Cache are updated when the Data Cache Write Buffer data is written to the RAM. Because the Tag arrays are not accessed during the data RAM writes, they cannot be written at that point, so the dirty bits must be held as a separate storage array. This also permits the dirty bits to be implemented as a small RAM.

The other main operations performed by the cache are cache line refills and Write-Back. These occur to particular cache ways, that are determined at the point of the detection of the cache miss by the victim selection logic.

To reduce overall power consumption, the number of full cache reads is reduced by the sequential nature of many cache operations, especially on the instruction side. On a cache read that is sequential to the previous cache read, only the data RAM set that was previously read is accessed, if the read is within the same cache line. The Tag RAM is not accessed at all during this sequential operation.

To further reduce unnecessary power consumption, only the addressed words within a cache line are read at any time. With the required 64-bit read interface, this is achieved by disabling half of the RAMs on occasions when only a 32-bit value is required. The implementation uses two 32-bit wide RAMs to implement the cache data RAM shown in Figure 7-1 on page 7-4, with the words of each line folded into the RAMs on an odd and even basis. This means that cache refills can take several cycles, depending on the cache line length, which is eight words.
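The odd and even folding described above can be illustrated with a small model of which data RAM is enabled for a given read. This is a sketch only: the function name and the assumption that even-numbered words of a line live in one RAM and odd-numbered words in the other are illustrative, not taken from the hardware description.

```python
def data_rams_enabled(byte_addr, access_bits):
    """Return (even_ram_enabled, odd_ram_enabled) for one cache data read.

    Assumed model: word 0, 2, 4, 6 of a line sit in the "even" RAM and
    word 1, 3, 5, 7 in the "odd" RAM.
    """
    if access_bits == 64:
        return (True, True)            # both 32-bit RAMs supply one word each
    if access_bits != 32:
        raise ValueError("only 32-bit and 64-bit reads are modeled")
    word_in_line = (byte_addr >> 2) & 0x7   # eight words per line
    is_odd = bool(word_in_line & 1)
    return (not is_odd, is_odd)        # the other RAM stays disabled


print(data_rams_enabled(0x100, 32))
print(data_rams_enabled(0x104, 32))
print(data_rams_enabled(0x108, 64))
```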

The control of the level one memory system and the associated functionality, together with other system wide control attributes are handled through the system control coprocessor, CP15. Chapter 3 System Control Coprocessor describes this.

Figure 7-1 on page 7-4 shows the block diagram of the cache subsystem. It does not show the cache refill paths.

ARM DDI 0333H

Copyright © 2004-2009 ARM Limited. All rights reserved.

7-3

ID012410

Non-Confidential, Unrestricted Access

 

 

[Figure 7-1: block diagram of the level one cache. Labels recovered from the figure: CP15 interface, virtual address, write data, RAMSet base address and size, write buffer data (1-2 words), write buffer addresses, Micro TLB, TAGRAM, DATARAM, TCM, way select, comparator, cache miss and hit, data out, Data Abort.]

Figure 7-1 Level one cache block diagram

7.2.1 Features of the cache system

The level one cache system has the following features:

The cache is a Harvard implementation.

The caches are lockable at a granularity of a cache way, using Format C lockdown. See Cache control and configuration on page 3-7.

Cache replacement policies are Pseudo-Random or Round-Robin, as controlled by the RR bit in CP15 register c1. Round-Robin uses a single counter for all sets, that selects the way used for replacement.

Cache line allocation uses the cache replacement algorithm when all cache lines are valid. If one or more lines is invalid, then the invalid cache line with the lowest way number is allocated to in preference to replacing a valid cache line. This mechanism does not allocate to locked cache ways unless all cache ways are locked. See Cache miss handling when all ways are locked down on page 7-6.

Cache lines can contain either Secure or Non-secure data. The NS Tag, which the MicroTLB provides, indicates whether the cache line comes from Secure or Non-secure memory.

Cache lines can be either Write-Back or Write-Through, determined by the MicroTLB entry.

Only read allocation is supported.


The cache can be disabled independently from the TCM, under control of the appropriate bits in CP15 c1. The cache can be disabled in Secure state while enabled in Non-secure state and enabled in Secure state while disabled in Non-secure state.

The CL bit in the system control coprocessor, see c1, Non-Secure Access Control Register on page 3-55, reserves cache lockdown registers for Secure world operation. When the CL bit is 0 the cache lockdown registers are only available in the Secure world. When the CL bit is 1 they are available for both Secure and Non-secure operation.

Data cache misses are nonblocking with three outstanding Data Cache misses being supported.

Streaming of sequential data from LDM and LDRD operations, and for sequential instruction fetches is supported.
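The allocation rules in the feature list above (invalid line with the lowest way number first, locked ways skipped, replacement policy otherwise) can be sketched as a victim selection function. This is a simplified model under stated assumptions: only Round-Robin replacement is shown, the way the single counter skips locked ways is a guess made for illustration, and the all-ways-locked case follows the Way 0 behavior this manual describes for the ARM1176JZ-S. The name `select_victim` is hypothetical.

```python
NUM_WAYS = 4


def select_victim(valid, locked, rr_counter):
    """Pick the way to allocate on a miss.

    valid, locked: sequences of four booleans, one per way.
    rr_counter: value of the single Round-Robin counter (assumed model).
    """
    if all(locked):
        return 0                      # serviced as if Way 0 is not locked
    unlocked = [w for w in range(NUM_WAYS) if not locked[w]]
    for way in unlocked:              # lowest-numbered invalid way wins
        if not valid[way]:
            return way
    # Every unlocked way holds a valid line: fall back to Round-Robin.
    # Mapping the counter onto the unlocked ways is an illustrative choice.
    return unlocked[rr_counter % len(unlocked)]


print(select_victim([True, False, False, True], [False] * 4, 0))
```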

7.2.2 Cache functional description

The cache and TCM exist to perform associative reads and writes on requested addresses. The steps involved in this for reads are as follows:

1. The lower bits of the virtual address are used as the virtual index for the Tag and RAM blocks, including the TCM.

2. In parallel the MicroTLB is accessed to perform the virtual to physical address translation.

3. The physical addresses read from the Tag RAMs and the TCM base address register, and the Write Buffer address registers, in parallel with the NS Tag, are compared with the physical address from the MicroTLB. The processor also compares the NS Tag, that the processor stores in the Tag RAMs along with the physical address, with the NS attribute from the MicroTLB. Both comparisons form hit signals for each of the cache ways.

4. The hit signals are used to select the data from the cache way that has a hit. Any bytes contained in both the data RAMs and the Write Buffer entries are taken from the Write Buffer. If two or three Write Buffer entries are to the same bytes, the most recently written bytes are taken.
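The four read steps above can be sketched as a small software model. It is deliberately simplified and rests on several assumptions: it works at word rather than byte granularity, it ignores the TCM, the tag is taken as the physical address divided by the way size, and `translate` stands in for the MicroTLB. All names (`Line`, `read_word`, `translate`) are hypothetical.

```python
from dataclasses import dataclass

LINE_BYTES = 32  # eight words per line


@dataclass
class Line:
    tag: int       # physical tag
    ns: bool       # NS Tag stored alongside the physical tag
    words: list    # eight 32-bit words


def read_word(ways, write_buffer, vaddr, translate):
    """Return the word at vaddr, or None on a miss.

    ways: list of four ways, each a list of sets holding a Line or None.
    write_buffer: list of (paddr, ns, value) tuples, oldest first.
    translate: function vaddr -> (paddr, ns), a stand-in for the MicroTLB.
    """
    num_sets = len(ways[0])
    index = (vaddr // LINE_BYTES) % num_sets       # step 1: virtual index
    paddr, ns = translate(vaddr)                   # step 2: translation
    tag = paddr // (LINE_BYTES * num_sets)
    word = (paddr % LINE_BYTES) // 4
    for way in ways:                               # step 3: tag and NS compare
        line = way[index]
        if line is not None and line.tag == tag and line.ns == ns:
            value = line.words[word]
            # step 4: a pending write overrides the RAM, newest entry last
            for wb_paddr, wb_ns, wb_value in write_buffer:
                if wb_paddr == (paddr & ~3) and wb_ns == ns:
                    value = wb_value
            return value
    return None
```

A matching physical tag with a different NS attribute is treated as a miss in this model, which mirrors the requirement that both comparisons contribute to the hit signal.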

The steps for writes are as follows:

1. The lower bits of the virtual address are used as the virtual index for the Tag blocks.

2. In parallel, the MicroTLB is accessed to perform the virtual to physical address translation.

3. The physical addresses read from the Tag RAMs and the TCM base address register are compared with the physical address from the MicroTLB. The processor also compares the NS Tag, that it stores in the Tag RAMs along with the physical address, with the NS attribute from the MicroTLB. Both comparisons form hit signals for each of the cache ways.

4. If a cache way, or the TCM, has recorded a hit, then the write data is written to an entry in the Cache Write Buffer, along with the cache way, or TCM, that it must take place to.

5. The contents of the Cache Write Buffer are held until a subsequent write or CP15 operation requires space in the Write Buffer. At this point the oldest entry in the Cache Write Buffer is written into the cache.
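Steps 4 and 5 of the write sequence describe a small first-in, first-out buffer. The sketch below models it with three entries, as stated earlier in this section. It assumes that space is only required when the buffer is already full and a new write arrives; CP15-triggered draining is not modeled. The class and attribute names are hypothetical.

```python
from collections import deque


class CacheWriteBuffer:
    """Toy model of the three-entry Cache Write Buffer."""

    CAPACITY = 3

    def __init__(self):
        self.entries = deque()   # pending writes, oldest on the left
        self.written = []        # entries committed to the cache RAM, in order

    def push(self, target, paddr, data):
        """Record a write that hit in `target` (a cache way or the TCM)."""
        if len(self.entries) == self.CAPACITY:
            # Space is needed: the oldest entry is written into the cache.
            self.written.append(self.entries.popleft())
        self.entries.append((target, paddr, data))


wb = CacheWriteBuffer()
for i in range(4):
    wb.push("way0", 0x100 + 4 * i, i)
print(wb.written)        # the first write has been committed
print(list(wb.entries))  # the three most recent writes are still pending
```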

7.2.3 Cache control operations

c7, Cache operations on page 3-69 describes the cache control operations that are supported by the processor. The processor supports all the block cache control operations in hardware.


Note

The cache operations executed in Secure state might affect all cache lines but cache operations executed in Non-secure state only affect Non-secure lines.

You can restrict the functional size of each cache to 16KB, even when the physical cache is larger. This enables the processor to run software that does not support ARMv6 page coloring restrictions. You enable this feature with the CZ bit, see c1, Auxiliary Control Register on page 3-49.

For more information about ARMv6 page coloring see Restrictions on page table mappings page coloring on page 6-41.
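The reason a 16KB limit removes the page coloring requirement can be shown with a little arithmetic: with four ways, a 16KB cache has 4KB ways, so the virtual index lies entirely within the offset of a 4KB small page. The sketch below assumes 4KB pages and assumes that setting the CZ bit enables the restriction; the function names are hypothetical.

```python
PAGE_BYTES = 4 * 1024   # ARMv6 small page, assumed here
NUM_WAYS = 4


def functional_cache_kb(physical_kb, cz_bit):
    """Cache size seen by software; limited to 16KB when CZ is set (assumed polarity)."""
    return min(physical_kb, 16) if cz_bit else physical_kb


def needs_page_coloring(physical_kb, cz_bit=False):
    """True when the way size exceeds the page size, so index bits are translated bits."""
    way_bytes = functional_cache_kb(physical_kb, cz_bit) * 1024 // NUM_WAYS
    return way_bytes > PAGE_BYTES


print(needs_page_coloring(64))
print(needs_page_coloring(64, cz_bit=True))
```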

7.2.4 Cache miss handling

A cache miss results in the requests required to do the line fill being made to the level two interface, with a Write-Back occurring if the line to be replaced contains dirty data.

The Write-Back data is transferred to the Write Buffer. This is arranged to handle this data as a sequential burst. Because of the requirement for nonblocking caches, additional write transactions can occur during the transfer of Write-Back data from the cache to the Write Buffer. These transactions do not interfere with the burst nature of the Write-Back data. The Write Buffer is responsible for handling the potential Read After Write (RAW) data hazards that might exist from a Data Cache line Write-Back. The caches perform critical word-first cache refilling. The internal bandwidth from the level two data read port to the Data Caches is eight bytes per cycle, and supports streaming.

Cache miss handling when all ways are locked down

The ARM architecture describes the behavior of the cache as being Unpredictable when all ways in the cache are locked down. However, for ARM1176JZ-S processors a cache miss is serviced as if Way 0 is not locked.

7.2.5 Cache disabled behavior

If the cache is disabled, then the cache is not accessed for reads or for writes. This ensures that maximum power savings can be achieved. It is therefore important that before the cache is disabled, all of the entries are cleaned to ensure that the external memory has been updated. In addition, if the cache is enabled with valid entries in it, then it is possible that the entries in the cache contain old data. Therefore, the cache must be disabled with clean and invalid entries.

Cache maintenance operations can be performed even if the cache is disabled. The system can disable the cache in Secure state when it is enabled in Non-secure state and enable the cache in Secure state when it is disabled in Non-secure state.

7.2.6 Unexpected hit behavior

An unexpected hit is where the cache reports a hit on a memory location that is marked as Noncacheable or Shared. The unexpected hit behavior is that these hits are ignored and a level two access occurs. The unexpected hit is ignored because the cache hit signal is qualified by the cacheability.

For writes, an unexpected cache hit does not result in the cache being updated. Therefore, writes appear to be Noncacheable accesses. For a data access, if it lies in the range of memory specified by the Instruction TCM, then the access is made to that RAM rather than to level two memory. This applies to both writes and reads.
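The qualification of the hit signal described in this section can be written as a one-line predicate. This is only an illustration of the stated behavior for the cache, not of the TCM path; the function and parameter names are hypothetical.

```python
def qualified_hit(tag_match, cacheable, shared):
    """Return True only when a tag match may be used as a cache hit.

    A match on a location marked Noncacheable or Shared is an unexpected
    hit: it is ignored, and the access goes to level two instead.
    """
    return tag_match and cacheable and not shared


print(qualified_hit(True, True, False))   # normal hit
print(qualified_hit(True, False, False))  # unexpected hit, ignored
```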


7.3 Tightly-coupled memory

The TCM is designed to provide low-latency memory that can be used by the processor without the unpredictability that is a feature of caches.

You can use such memory to hold critical routines, such as interrupt handling routines or real-time tasks where the indeterminacy of a cache is highly undesirable. In addition you can use it to hold scratch pad data, data types whose locality properties are not well suited to caching, and critical data structures such as interrupt stacks.

You can separately configure the size of the Instruction TCM (ITCM) and the size of the Data TCM (DTCM) to be 0KB, 4KB, 8KB, 16KB, 32KB or 64KB. For each side, ITCM and DTCM:

If you configure the TCM size to be 4KB you get one TCM, of 4KB, on this side.

If you configure the TCM size to be larger than 4KB you get two TCMs on this side, each of half the configured size. So, for example, if you configure an ITCM size of 16KB you get two ITCMs, each of size 8KB.

Table 7-1 lists all possible TCM configurations:

Table 7-1 TCM configurations

Configured TCM size   Number of TCMs   Size of each TCM
0KB                   0                0
4KB                   1                4KB
8KB                   2                4KB
16KB                  2                8KB
32KB                  2                16KB
64KB                  2                32KB

When there are two TCMs on one side, they are implemented as a single RAM to simplify the implementation. This RAM then has a size in the 0-64KB range. The lower part of the RAM corresponds to the TCM called TCM0 and the upper part corresponds to TCM1.
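The configuration rules and the TCM0/TCM1 split can be summarized in a short helper that maps a configured size to the layout inside the single RAM. This is a sketch: the function name is hypothetical, and labeling the lone 4KB TCM as TCM0 is an assumption made for illustration.

```python
def tcm_layout(configured_kb):
    """Return a list of (name, offset_bytes, size_bytes) for one side (ITCM or DTCM)."""
    if configured_kb not in (0, 4, 8, 16, 32, 64):
        raise ValueError("TCM size must be 0, 4, 8, 16, 32 or 64 KB")
    if configured_kb == 0:
        return []                                  # no TCM on this side
    if configured_kb == 4:
        return [("TCM0", 0, 4096)]                 # a single 4KB TCM
    half = configured_kb * 1024 // 2
    # Two TCMs share one RAM: TCM0 is the lower half, TCM1 the upper half.
    return [("TCM0", 0, half), ("TCM1", half, half)]


for size in (0, 4, 16):
    print(size, tcm_layout(size))
```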

You can also configure each individual TCM to contain Secure or Non-secure data. You make this configuration in CP15 register c9, accessible in Secure state only. See c9, Data TCM Non-secure Control Access Register on page 3-94 and c9, Instruction TCM Non-secure Control Access Register on page 3-95 for more information. After reset, all TCMs are configured as Secure.

The TCM Status Register in CP15 c0 describes what TCM options and TCM sizes can be implemented, see c0, TCM Status Register on page 3-24.

Each Data TCM is implemented in parallel with the Data Cache and each Instruction TCM is implemented in parallel with the Instruction Cache. Each TCM has a single movable base address, specified in CP15 register c9, see c9, Data TCM Region Register on page 3-90 and c9, Instruction TCM Region Register on page 3-92.

The size of each TCM can differ from the size of a cache way, but forms a single contiguous area of memory. Figure 7-1 on page 7-4 shows the entire level one memory system. To access each of the TCM region and TCM Access Control registers, the TCM Selection registers are set to the TCM of interest, see c9, TCM Selection Register on page 3-97.
