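The Android smartphone market is extremely hot right now and hardware is being upgraded at a furious pace; an ARM Cortex-A9 with 1GB of DDR already seems to be falling behind mainstream configurations. Hardware may be king, but one cannot help wondering whether such powerful hardware is actually being put to full use. For that reason, from now on I will analyze the kernel specifically on the ARM platform.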
Main text
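Two resources are central to Linux memory management: virtual addresses and physical addresses. That may sound like a truism, but memory management really does revolve around these two concepts. If you have no picture yet of how the Linux kernel manages virtual and physical addresses, I suggest skimming reference [2], an excellent and concise book. Reference [1] covers more of the implementation details.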
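The main goal of this article is to give an overall picture of how the kernel's 1GB virtual address space is mapped, including:
1. What exactly is the 1GB kernel virtual address space used for?
2. How it maps to actual physical addresses.
3. Some board-specific macro definitions. I have collected these macros for future reference. Using them, you can easily draw the kernel virtual address space map for your own platform.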
A caveat first: the mapping layout in this example is not necessarily optimal, but it is a real one. Personally, I think quite a few parts of it are open to debate.
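As the figure below shows, the pink region, 0xbf80 0000 ~ 0xc000 0000, is used for modules and pkmap. The board-level macro definitions later in this article show that modules are placed here because they must lie within 32MB of the kernel code segment. I am not yet sure why pkmap is placed in this range; it is used when mapping highmem.
The orange region, 0xc000 0000 ~ 0xe000 0000, maps lowmem (low memory, i.e. zone[Normal]). This is a one-to-one flat mapping: once the kernel has set it up during initialization, its page tables never change. That avoids repeatedly modifying page tables and flushing the TLB. (The TLB can be thought of as a hardware cache of the page tables; if the entry for the virtual address being accessed is in this cache, the CPU does not need to go out to DDR to walk the page tables, which improves access efficiency.) Because this mapping is so efficient, this address range is clearly very precious. The figure shows that, of the 512MB mapped here, 128MB is reserved for PMEM (an Android-specific mechanism for managing physically contiguous memory) and 16MB is reserved for the CP (the memory the modem runs in). The lowmem that is actually usable comes to only about 360MB.
The blue region, 0xe000 0000 ~ 0xf000 0000, maps highmem (high memory, i.e. zone[HighMem]). Because the example has 1GB of DDR, high memory is needed to map part of the physical address space.
The green region, 0xf000 0000 ~ 0xffc0 0000, is the IO mapping area. In kernel space, for example when writing a driver, we need to access the chip's registers (IO space). Some IO mappings are created dynamically in the VMALLOC area through ioremap, and others are mapped statically at system initialization through iotable_init. The figure shows that about 200MB of the static IO mapping area is unused. Isn't that rather wasteful?
The purple region is nothing special; it simply follows the ARM default definitions.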
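To make the sizes of the regions described above easier to check, here is a minimal sketch that computes them from the address ranges quoted in the text. The region labels and the helper names `region_sizes_mb` and `usable_lowmem_mb` are illustrative assumptions, not names taken from the kernel.

```python
# Region boundaries exactly as quoted in the text; the colors refer to the
# original figure. Labels and helper names are illustrative only.
MB = 1 << 20

REGIONS = [
    ("modules + pkmap (pink)", 0xBF800000, 0xC0000000),
    ("lowmem (orange)",        0xC0000000, 0xE0000000),
    ("highmem mapping (blue)", 0xE0000000, 0xF0000000),
    ("IO mapping (green)",     0xF0000000, 0xFFC00000),
]


def region_sizes_mb(regions):
    """Return {name: size in MB} for a list of (name, start, end) tuples."""
    return {name: (end - start) // MB for name, start, end in regions}


def usable_lowmem_mb(lowmem_mb, reserved_mb):
    """Lowmem left over after subtracting the reserved carve-outs."""
    return lowmem_mb - sum(reserved_mb)


if __name__ == "__main__":
    for name, size in region_sizes_mb(REGIONS).items():
        print(f"{name:26s} {size:4d} MB")
    # 128MB for PMEM and 16MB for the CP, as stated in the text.
    print("usable lowmem:", usable_lowmem_mb(512, [128, 16]), "MB")
```

The exact remainder after the PMEM and CP reservations is 368MB, which matches the "about 360MB" figure in the text once other small reservations are allowed for.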
The figure below shows how the kernel virtual address space maps to actual physical addresses.
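For the lowmem region, the flat mapping amounts to adding a constant offset. The sketch below models that arithmetic in userspace, assuming the PAGE_OFFSET (0xC0000000) and PHYS_OFFSET (0x80000000) values listed in the macro table later in this article and the 512MB lowmem size from the text. It is an illustration of the idea, not the kernel's own `__virt_to_phys` implementation.

```python
# Userspace model of the linear lowmem mapping. Values are taken from this
# article's example platform; the function names mirror the kernel helpers
# but this is only an arithmetic sketch.
PAGE_OFFSET = 0xC0000000   # virtual start of lowmem
PHYS_OFFSET = 0x80000000   # physical start of the first RAM bank
LOWMEM_SIZE = 512 << 20    # 512MB of linearly mapped memory


def virt_to_phys(vaddr):
    """Translate a lowmem virtual address to its physical address."""
    if not PAGE_OFFSET <= vaddr < PAGE_OFFSET + LOWMEM_SIZE:
        raise ValueError(f"{vaddr:#x} is outside the linear lowmem mapping")
    return vaddr - PAGE_OFFSET + PHYS_OFFSET


def phys_to_virt(paddr):
    """Translate a physical address inside lowmem back to its virtual address."""
    if not PHYS_OFFSET <= paddr < PHYS_OFFSET + LOWMEM_SIZE:
        raise ValueError(f"{paddr:#x} is not covered by the lowmem mapping")
    return paddr - PHYS_OFFSET + PAGE_OFFSET


if __name__ == "__main__":
    print(hex(virt_to_phys(0xC0000000)))
    print(hex(phys_to_virt(0x81088000)))
```

Because the offset is fixed, no page-table update or TLB flush is ever needed for these addresses. Addresses outside the range (highmem, vmalloc, IO) have no such fixed relation, which is why the sketch rejects them.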
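Now for something more exciting: let's see what is wrong with this mapping.
I actually hit a bug on this platform. During stress testing with monkey test, vmalloc would start failing after the system had been running for a long time. A call to vmalloc failing, while there was still plenty of physical memory left. Remarkable, isn't it?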
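[Error log] The system's graphics module failed when using vmalloc to allocate 1MB of memory
[Analysis]
1. First, look at the basic memory information at that moment. /proc/meminfo shows 156MB of physical memory still available, so memory has not been exhausted. The VMALLOC virtual address space used by vmalloc still has 22MB left, which should also be enough. By design, vmalloc calls alloc_page() to take individual, isolated pages from the buddy system (that is, pages from the 2^0 list). There are plenty of pages at this point, so why does the allocation fail? vmalloc requires the virtual addresses to be contiguous. Could it be that VMALLOC no longer has 1MB of contiguous virtual address space?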
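The vmalloc figures in this step can be read out of /proc/meminfo programmatically. The sketch below parses the `VmallocTotal`, `VmallocUsed` and `VmallocChunk` fields, which kernels of this era report (some later kernels report 0 for some of them). The sample text is made up for illustration, with values chosen only to reproduce the 256MB total, 22MB free and 0x101000-byte largest chunk discussed in this article; it is not the original log.

```python
# Hypothetical /proc/meminfo excerpt: values are illustrative, chosen to match
# the numbers quoted in the article, not copied from the real device log.
SAMPLE_MEMINFO = """\
MemFree:          159744 kB
VmallocTotal:     262144 kB
VmallocUsed:      239616 kB
VmallocChunk:       1028 kB
"""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def vmalloc_summary(fields):
    """Summarise vmalloc address-space usage in kB."""
    total = fields["VmallocTotal"]
    used = fields["VmallocUsed"]
    return {
        "total_kb": total,
        "free_kb": total - used,
        # Largest contiguous free range; the number that matters for one
        # large vmalloc() call.
        "largest_chunk_kb": fields.get("VmallocChunk", 0),
    }


if __name__ == "__main__":
    print(vmalloc_summary(parse_meminfo(SAMPLE_MEMINFO)))
```

The point of separating `free_kb` from `largest_chunk_kb` is that the first can look healthy while the second is already too small for the request.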
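2. With that question in mind, continue by analyzing /proc/vmallocinfo.
According to /proc/vmallocinfo, VMALLOC usage already reaches up to 0xefeff000, so the largest contiguous range still available is 0xf0000000 - 0xefeff000 = 0x101000. Remember how much memory we were trying to allocate? That's right, 0x1a0000 (about 1.6MB). This is the first time I have seen kernel virtual addresses run out. Then why does meminfo still report 22MB of VMALLOC virtual address space? Evidently this virtual address range has also become heavily fragmented.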
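The check performed by hand above can be sketched as a small parser that walks the allocated ranges in /proc/vmallocinfo and reports the largest free gap. The sample lines, the caller names in them, and the VMALLOC start address of 0xE0800000 are hypothetical; only the 0xefeff000 end of the used area, the 0xf0000000 VMALLOC_END and the 0x1a0000 request size come from the article.

```python
import re

# Each /proc/vmallocinfo line starts with "0x<start>-0x<end> <size> ...".
_RANGE = re.compile(r"^0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\s+(\d+)")

# Hypothetical sample: caller names and the first range are made up.
SAMPLE_VMALLOCINFO = """\
0xe0800000-0xe0900000 1048576 example_driver_probe+0x40/0x120 ioremap
0xe0980000-0xefeff000 257421312 example_gfx_alloc+0x58/0x90 pages=62846 vmalloc
"""


def parse_vmallocinfo(text):
    """Return a sorted list of (start, end) tuples for allocated ranges."""
    ranges = []
    for line in text.splitlines():
        m = _RANGE.match(line.strip())
        if m:
            ranges.append((int(m.group(1), 16), int(m.group(2), 16)))
    return sorted(ranges)


def largest_free_gap(ranges, area_start, area_end):
    """Largest contiguous unallocated span, in bytes, inside [area_start, area_end)."""
    best = 0
    cursor = area_start
    for start, end in ranges:
        if end <= area_start or start >= area_end:
            continue  # range lies outside the area of interest
        if start > cursor:
            best = max(best, start - cursor)
        cursor = max(cursor, end)
    if area_end > cursor:
        best = max(best, area_end - cursor)
    return best


if __name__ == "__main__":
    gap = largest_free_gap(parse_vmallocinfo(SAMPLE_VMALLOCINFO),
                           0xE0800000, 0xF0000000)
    request = 0x1A0000
    print(f"largest gap {gap:#x}, request {request:#x}, fits: {request <= gap}")
```

With these inputs the largest gap is the tail between 0xefeff000 and 0xf0000000, which is smaller than the 0x1a0000 request, so the allocation cannot succeed no matter how many free pages exist.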
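So the virtual address space is exhausted, and it looks as if we are at a dead end. In the spirit of investigation, though, we still have to ask why the VMALLOC virtual address range is so heavily used; after all, we set aside 256MB for it. With so much physical memory still free, why not call kmalloc or get_free_pages directly?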
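3. Next, look at how physical memory is distributed at this point
/proc/buddyinfo shows the overall allocation state of the buddy system; /proc/pagetypeinfo gives more information related to fragmentation management.
A quick look at pagetypeinfo: the kernel divides physical memory into zones, and my platform has zone[Normal] and zone[HighMem]. The migrate type is designed to limit memory fragmentation; if it is unfamiliar, see reference [1]. /proc/pagetypeinfo shows that the largest contiguous block we can obtain is 2^7 pages, i.e. 512KB. That evidently cannot satisfy the graphics module's needs, which further explains why graphics ends up using vmalloc so heavily.
/proc/buddyinfo output.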
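The "largest contiguous block" reading can be automated. The sketch below parses /proc/buddyinfo-style lines, where each column after the zone name is the number of free blocks of order 0, 1, 2 and so on, and reports the largest free block per zone. The sample counts are invented for illustration (they are not the original device output), a 4KB page size is assumed, and zones are keyed by name only, which assumes a single memory node.

```python
# Hypothetical /proc/buddyinfo sample; counts are made up so that the highest
# non-empty order in zone Normal is 7, as described in the article.
SAMPLE_BUDDYINFO = """\
Node 0, zone   Normal    812    407    190     64     21      9      4      2      0      0      0
Node 0, zone  HighMem   1520    633    118     12      0      0      0      0      0      0      0
"""


def parse_buddyinfo(text):
    """Return {zone name: [free block count for order 0, 1, 2, ...]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node", "<n>,", "zone", "<name>", count0, count1, ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        zones[parts[3]] = [int(n) for n in parts[4:]]
    return zones


def largest_free_block_kb(counts, page_kb=4):
    """Size in kB of the largest free block, i.e. 2^order pages for the top non-empty order."""
    for order in range(len(counts) - 1, -1, -1):
        if counts[order] > 0:
            return (1 << order) * page_kb
    return 0


if __name__ == "__main__":
    for zone, counts in parse_buddyinfo(SAMPLE_BUDDYINFO).items():
        print(f"{zone:8s} largest free block: {largest_free_block_kb(counts)} kB")
```

A request for physically contiguous memory larger than this value will fail in that zone even if the total number of free pages is large.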
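4. Conclusion
From the analysis above: the graphics module requests contiguous memory from the kernel's buddy system through get_free_pages(). After some time the buddy system becomes heavily fragmented and graphics can no longer obtain physically contiguous memory, so it falls back to vmalloc to get non-contiguous memory from the buddy system. Unfortunately, the VMALLOC virtual address space is exhausted, so the vmalloc call fails even though plenty of physical memory remains.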
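5. Re-examining the memory mapping
One problem here is that too little space was planned for lowmem. By default vmalloc allocates from zone[HighMem], which makes it easy for highmem to fragment. Remember the kernel virtual mapping diagram at the beginning? Don't we have 200MB of virtual address space sitting unused? How nice it would be to map it to lowmem.
I therefore modified the mapping. The biggest change is that lowmem grows from 512MB to 720MB, so the roughly 200MB of previously unused virtual address space is now put to full use.
After the change, look at the buddy information again: the largest contiguous allocation available is 2^15 pages = 128MB. This layout also improves memory utilization.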
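The table below lists some board-specific macro definitions, which determine how the kernel virtual address space is laid out. These days there is rarely a chance to bring up a new chip from scratch, so most people may never look at them. They are nevertheless very important when studying the memory layout, and I have collected them here for future reference. You can also try filling in these macros for your own board, and the whole kernel address space map will emerge.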
Board specific macro definition
Refer to [Documentation/arm/Porting]
Decompressor Symbols
Macro name
description
example
ZTEXTADDR
[arch/arm/boot/compressed/Makefile]
Start address of decompressor. There's no point in talking about virtual or physical addresses here, since the MMU will be off at the time when you call the decompressor code. You normally call the kernel at this address to start it booting. This doesn't have to be located in RAM, it can be in flash or other read-only or read-write addressable medium.
0x0
ZTEXTADDR := $(CONFIG_ZBOOT_ROM_TEXT)
CONFIG_ZBOOT_ROM_TEXT=0x0
ZBSSADDR
[arch/arm/boot/compressed/Makefile]
Start address of zero-initialised work area for the decompressor. This must be pointing at RAM. The decompressor will zero initialize this for you. Again, the MMU will be off.
0x0
ZBSSADDR := $(CONFIG_ZBOOT_ROM_BSS)
CONFIG_ZBOOT_ROM_BSS=0x0
ZRELADDR
[arch/arm/boot/Makefile]
This is the address where the decompressed kernel will be written, and eventually executed. The following constraint must be valid:
__virt_to_phys(TEXTADDR) == ZRELADDR
The initial part of the kernel is carefully coded to be position independent.
Note: the following conditions must always be true:
ZRELADDR == virt_to_phys(PAGE_OFFSET + TEXT_OFFSET)
0x81088000
ZRELADDR := $(zreladdr-y)
zreladdr-y := $(__ZRELADDR)
__ZRELADDR = TEXT_OFFSET + 0x80000000
[arch/arm/mach-pxa/Makefile.boot]
INITRD_PHYS
Physical address to place the initial RAM disk. Only relevant if you are using the bootpImage stuff (which only works on the old struct param_struct).
INITRD_PHYS must be in RAM
Not defined
INITRD_VIRT
Virtual address of the initial RAM disk. The following constraint must be valid:
__virt_to_phys(INITRD_VIRT) == INITRD_PHYS
Not defined
PARAMS_PHYS
Physical address of the struct param_struct or tag list, giving the kernel various parameters about its execution environment.
PARAMS_PHYS must be within 4MB of ZRELADDR
Not defined
Kernel Symbols
PHYS_OFFSET
[arch/arm/include/asm/memory.h]
Physical start address of the first bank of RAM.
#define PHYS_OFFSET PLAT_PHYS_OFFSET
#define PLAT_PHYS_OFFSET UL(0x80000000)
[arch/arm/mach-pxa/include/mach/memory.h]
PAGE_OFFSET
[arch/arm/include/asm/memory.h]
Virtual start address of the first bank of RAM. During the kernel boot phase, virtual address PAGE_OFFSET will be mapped to physical address PHYS_OFFSET, along with any other mappings you supply. This should be the same value as TASK_SIZE.
CONFIG_PAGE_OFFSET
=0xC0000000
TASK_SIZE
[arch/arm/include/asm/memory.h]
The maximum size of a user process in bytes. Since user space always starts at zero, this is the maximum address that a user process can access+1. The user space stack grows down from this address.
Any virtual address below TASK_SIZE is deemed to be user process area, and therefore managed dynamically on a process by process basis by the kernel. I'll call this the user segment.
Anything above TASK_SIZE is common to all processes. I'll call this the kernel segment.
(In other words, you can't put IO mappings below TASK_SIZE, and hence PAGE_OFFSET).
CONFIG_PAGE_OFFSET
-0x01000000
=0xBF000000
TASK_UNMAPPED_BASE
[arch/arm/include/asm/memory.h]
the lower boundary of the mmap VM area
CONFIG_PAGE_OFFSET/3
=0x40000000
MODULES_VADDR
[arch/arm/include/asm/memory.h]
The module space lives between the addresses given by TASK_SIZE and PAGE_OFFSET - it must be within 32MB of the kernel text.
TEXT_OFFSET does not allow to use 16MB modules area as ARM32 branches to kernel may go out of range taking into account the kernel .text size
PAGE_OFFSET
- 8*1024*1024
=0xBF800000
MODULES_END
[arch/arm/include/asm/memory.h]
The highmem pkmap virtual space shares the end of the module area.
0XBFE00000
#ifdef CONFIG_HIGHMEM
#define MODULES_END (PAGE_OFFSET - PMD_SIZE)
#else
#define MODULES_END (PAGE_OFFSET)
#endif
TEXTADDR
Virtual start address of kernel, normally PAGE_OFFSET + 0x8000.
This is where the kernel image ends up. With the latest kernels, it must be located at 32768 bytes into a 128MB region. Previous kernels placed a restriction of 256MB here.
DATAADDR
Virtual address for the kernel data segment. Must not be defined when using the decompressor.
VMALLOC_START
VMALLOC_END
[arch/arm/mach-pxa/include/mach/vmalloc.h]
Virtual addresses bounding the vmalloc() area. There must not be any static mappings in this area; vmalloc will overwrite them. The addresses must also be in the kernel segment (see above). Normally, the vmalloc() area starts VMALLOC_OFFSET bytes above the last virtual RAM address (found using variable high_memory).
#define VMALLOC_END (0xf0000000UL)
The default vmalloc size is 128MB.
vmalloc_min = (VMALLOC_END - SZ_128M);
[defined in arch/arm/mm/mmu.c]
If a vmalloc size is passed in by OSL, it overrides this default.
early_param("vmalloc", early_vmalloc);
[defined in arch/arm/mm/mmu.c]
VMALLOC_OFFSET
[arch/arm/include/asm/pgtable.h]
Offset normally incorporated into VMALLOC_START to provide a hole between virtual RAM and the vmalloc area. We do this to allow out of bounds memory accesses (eg, something writing off the end of the mapped memory map) to be caught. Normally set to 8MB.
#define VMALLOC_OFFSET (8*1024*1024)
CONSISTENT_DMA_SIZE
CONSISTENT_BASE
CONSISTENT_END
[arch/arm/include/asm/memory.h]
Size of DMA-consistent memory region. Must be multiple of 2M, between 2MB and 14MB inclusive.
CONSISTENT_DMA_SIZE = 2MB
CONSISTENT_BASE = 0XFFC00000
CONSISTENT_END = 0XFFE00000
FIXADDR_START
FIXADDR_TOP
FIXADDR_SIZE
[arch/arm/include/asm/fixmap.h]
fixed virtual addresses
#define FIXADDR_START 0xfff00000UL
#define FIXADDR_TOP 0xfffe0000UL
#define FIXADDR_SIZE (FIXADDR_TOP - FIXADDR_START)
PKMAP_BASE
[arch/arm/include/asm/highmem.h]
0XBFE00000
#define PKMAP_BASE (PAGE_OFFSET - PMD_SIZE)
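As a cross-check of the example column, the following sketch recomputes several of the derived values from PAGE_OFFSET using the expressions given in the table. It assumes PMD_SIZE is 2MB, which is what the listed PKMAP_BASE value of 0xBFE00000 implies, and the function name `derive_layout` is a hypothetical helper, not a kernel symbol.

```python
# Recompute the table's example values from its own formulas.
PAGE_OFFSET = 0xC0000000
PMD_SIZE = 2 << 20          # assumed 2MB, consistent with PKMAP_BASE = 0xBFE00000
SZ_128M = 128 << 20
VMALLOC_END = 0xF0000000


def derive_layout(page_offset=PAGE_OFFSET, highmem=True):
    """Return the derived addresses for a given PAGE_OFFSET."""
    return {
        "TASK_SIZE": page_offset - 0x01000000,
        "TASK_UNMAPPED_BASE": page_offset // 3,
        "MODULES_VADDR": page_offset - 8 * 1024 * 1024,
        # With CONFIG_HIGHMEM the last PMD below PAGE_OFFSET is given to pkmap.
        "MODULES_END": page_offset - PMD_SIZE if highmem else page_offset,
        "PKMAP_BASE": page_offset - PMD_SIZE,
        # Default lower bound of the vmalloc area before any override.
        "DEFAULT_VMALLOC_MIN": VMALLOC_END - SZ_128M,
    }


if __name__ == "__main__":
    for name, value in derive_layout().items():
        print(f"{name:20s} {value:#010x}")
```

Changing `page_offset` or the constants at the top is enough to redraw the lower part of the map for a different board, in the spirit of the table above.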