
Level One Memory System

7.5 TCM and cache interactions

In the event that a TCM and a cache both contain the requested address, it is architecturally Unpredictable which memory the instruction or data is returned from. Such an event is expected to arise only from a failure to invalidate the cache when the base register of the TCM is changed, and so is clearly a programming error. For a Harvard arrangement of caches and TCM, data reads and writes can access any Instruction TCM. This ensures that accesses to literal pools, Undefined instructions, and SVC numbers are possible, and aids debugging. For this reason, an Instruction TCM must behave as a unified TCM, but can be optimized for instruction fetches.

You must not program an Instruction TCM to the same base address as a Data TCM and, if the two RAMs are different sizes, their regions in physical memory must not overlap, because the resulting behavior is architecturally Unpredictable.

In these cases, you must not rely on the behavior of the ARM1176JZ-S processor for code that is intended to be ported to other ARM platforms.

In all cases, no security consideration is necessary because there cannot be a conflict between accesses targeting Secure and Non-secure memory. Any cache line or TCM data is marked as being Secure or Non-secure and no Unpredictable situations can result from this.

7.5.1 Overlapping between TCM regions

Where TCM regions overlap, the access priority is worked out using these rules, starting with the highest priority rule:

1. Where there is an overlap between a DTCM and an ITCM, the DTCM has priority for data accesses.

Note

Instruction accesses to the DTCM are not possible.

2. Where there is an overlap between two TCMs on the same side, TCM0 has priority. This means that DTCM0 has priority over DTCM1, and ITCM0 has priority over ITCM1.

This means that, for data accesses, the priority order if all four TCMs overlap is:

1. DTCM0, highest priority

2. DTCM1

3. ITCM0

4. ITCM1, lowest priority.

For instruction accesses, the priority order is:

1. ITCM0, highest priority

2. ITCM1, lowest priority.

These priority rules are not affected by whether the TCMs are Secure or Non-secure. The only effect of configuring TCMs as Secure or Non-secure is that a Secure TCM cannot overlap a Non-secure TCM.

7.5.2 DMA and core access arbitration

DMA and core accesses to both the Instruction TCM and the Data TCM can occur in parallel. So as not to disrupt the execution of the core, core-generated accesses have priority over those requested by the DMA engine, regardless of the security level of the accesses.

ARM DDI 0333H

Copyright © 2004-2009 ARM Limited. All rights reserved.

7-12

ID012410

Non-Confidential, Unrestricted Access

 


7.5.3 Instruction accesses to TCM

If the Instruction TCM and the Instruction Cache both contain the requested instruction address, the processor returns data from the TCM. The instruction prefetch port of the processor cannot access the Data TCM. If an instruction prefetch misses the Instruction TCM and Instruction Cache but hits the Data TCM, then the result is an access to the level two memory.

An Instruction Memory Barrier (IMB) must be inserted between a write to an Instruction TCM and the execution of the instructions that were written. In addition, any branch prediction mechanism must be invalidated or disabled if a branch in the Instruction TCM is overwritten.

7.5.4 Data accesses to the Instruction TCM

If the Data TCM and the Data Cache both contain the requested data address for a read, the processor returns data from the Data TCM. For a write, the write occurs to the Data TCM. The majority of data accesses are expected to go to the Data Cache or to the Data TCM, but it is necessary for the Instruction TCM to be read or written on occasion.

The processor data port treats the Instruction TCM address ranges as a possible source of data for all memory accesses. This increases the number of address comparisons needed for level one hit generation on the data side, compared with the number required for the instruction-side lookup. This functionality is required for reading literal values and for debug purposes, such as setting software breakpoints.

Access to the Instruction TCM involves a delay of 5-12 cycles in reading or writing the data. This delay enables the Instruction TCM access to be scheduled to take place only when the presence of a hit to the Instruction TCM is known. This saves power and avoids unnecessary delays being inserted into the instruction-fetch side. This delay is applied to all accesses in a multiple operation in the case of an LDM, an LDCL, an STM, or an STCL.

Literal pool accesses

It can take 5-12 cycles for the data port to read data from the Instruction TCM.

Because path lengths are kept short to achieve greater clock speeds, there might sometimes be an increase in latency. Therefore, avoid literal pool accesses inside critical loops that execute from the Instruction TCM. This does not affect code in cache, because the literal pool is loaded into the Data Cache.

Switching penalty between cache and TCM

Normally, an access to the cache or TCM takes a single cycle. However, it can take three cycles in certain cases.

To perform a cache or TCM read in a single cycle, the processor speculatively reads the RAM contents, and does not know whether it read the correct RAM until after the read is complete. To save power, the processor performs the speculative read either to the TCM or to the cache, not to both. If it chose the wrong RAM, the processor must repeat the access to the correct location.

There is a penalty of three clock cycles when the core switches between accessing cache and TCM, for example if it predicts that an access is to TCM but the access is in fact to cache. That is, the first non-sequential access to TCM takes three cycles when the previous access on that side, I-side or D-side, was to cache, and similarly the first non-sequential access to cache takes three cycles when the previous access on that side was to TCM. This is not usually an issue on the I-side, where code does not typically branch between TCM and cacheable areas, but it can be an issue for data.

For example, in the following code:

loop    LDR     r0, [r2], #4    ; reads an item from D-TCM
        LDR     r1, [r3], #4    ; reads an item from D-cache
        ADD     r4, r0, r1      ; perform some calculation on the loaded data
        CMP     r1, r5          ; finished yet?
        BLT     loop

Each iteration of this loop pays the three-cycle penalty twice, because the loads alternate between cache and TCM. This is, of course, an extreme example, and because of hit-under-miss the three-cycle penalty might not stall the integer core. If the same code uses only D-TCM, or only D-cache, each load typically takes one cycle.

This can be important if a performance-critical loop operates on two blocks of data, one in D-TCM and one in main memory, especially if the data is consumed in small blocks of a byte or word rather than multiple words per iteration.

So, if you have all of the Dhrystone code and data in TCM, you get better performance than if you have nearly all of it in TCM, because the accesses that remain in cache incur switching penalties.

The instruction ports are not required to be able to access the Data TCM. An attempt to access addresses in the range covered by a Data TCM from an instruction port does not result in an access to the Data TCM; instead, the instruction is fetched from main memory. Such accesses might result in external aborts in some systems, because the address range might not be supported in main memory.

Instruction TCMs must not be programmed to the same base address as a Data TCM and, if the RAMs are of different sizes, the regions in physical memory of the two RAMs must not overlap, because the resulting behavior is architecturally Unpredictable. If a data access is made to a location that is covered by both an Instruction TCM and a Data TCM, the access is only to the Data TCM.

Table 7-4 summarizes the results of data accesses to TCM and the cache. This also embodies the unexpected hit behavior for the cache that Unexpected hit behavior on page 7-6 describes. In Table 7-4, the Data Cache can only be hit if the memory location being accessed is marked as being Cacheable and Not shareable. A hit to the Data TCM and Instruction TCM refers to hitting an address in the range covered by that TCM.

Table 7-4 Summary of data accesses to TCM and caches

| Data TCM | Data cache | Instruction TCM (a) | Read behavior | Write behavior |
|---|---|---|---|---|
| Hit | Hit | Hit | Read from Data TCM. | Write to Data TCM. No write to the Instruction TCM or Data Cache. No write to level two, even if marked as Write-Through. |
| Hit | Hit | Miss | Read from Data TCM. | Write to Data TCM. No write to Data Cache. No write to level two, even if marked as Write-Through. |
| Hit | Miss | Hit | Read from Data TCM. No linefill to Data Cache, even if marked Cacheable. | Write to Data TCM. No write to Instruction TCM. No write to level two, even if marked as Write-Through. |
| Hit | Miss | Miss | Read from Data TCM. No linefill to Data Cache, even if marked Cacheable. | Write to Data TCM. No write to level two, even if marked as Write-Through. |
| Miss | Hit | Hit | Read from Data Cache. | Write to Data Cache. If Write-Through, write to Instruction TCM. |
| Miss | Hit | Miss | Read from Data Cache. | Write to Data Cache. If Write-Through, write to level two. |
| Miss | Miss | Hit | Read from Instruction TCM. No cache fill, even if marked Cacheable. | Write to Instruction TCM. No write to level two, even if marked as Write-Through. |
| Miss | Miss | Miss | If Cacheable and cache enabled, cache linefill. If Noncacheable or cache disabled, read to level two. | Write to level two. |

a. Excludes unexpected hit.

 

 

Table 7-5 summarizes the results of instruction accesses to TCM and the cache. This also embodies the unexpected hit behavior for the cache that Unexpected hit behavior on page 7-6 describes. In Table 7-5, the Instruction Cache can only be hit if the memory location being accessed is marked as being Cacheable and not shareable. A hit to the Instruction TCM refers to hitting an address in the range covered by that TCM.

Table 7-5 Summary of instruction accesses to TCM and caches

| Instruction TCM | Instruction cache (a) | Data TCM | Read behavior |
|---|---|---|---|
| Hit | Hit | Don’t care | Read from Instruction TCM. No linefill to Instruction Cache, even if marked Cacheable. |
| Hit | Miss | Don’t care | Read from Instruction TCM. No linefill to Instruction Cache, even if marked Cacheable. |
| Miss | Hit | Don’t care | Read from Instruction Cache. |
| Miss | Miss | Don’t care | If Cacheable and cache enabled, cache linefill. If Noncacheable or cache disabled, read to level two. |

a. Excludes unexpected hit.

 

 
