The entire physical memory is modelled as an acyclic graph of MemoryRegion objects.
Sinks (leaves) are RAM and MMIO regions, while the other nodes represent buses, memory controllers, and memory regions that have been rerouted.
In addition to MemoryRegion objects, the memory API provides AddressSpace objects for every root region, and possibly for intermediate MemoryRegions too. These represent memory as seen from the CPU's or a device's viewpoint.
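As a rough sketch (the root_mr parameter, the "example-as" name, and the address 0x1000 are illustrative, not from the text), an AddressSpace is created over a root MemoryRegion and accesses are then dispatched through it:
// sketch: an AddressSpace built over a root region; reads are dispatched through it
static void sketch_read_through_as(MemoryRegion *root_mr)
{
    AddressSpace as;
    uint8_t buf[4];
    MemTxResult res;

    address_space_init(&as, root_mr, "example-as");

    /* read 4 bytes from guest physical address 0x1000 in this address space */
    res = address_space_read(&as, 0x1000, MEMTXATTRS_UNSPECIFIED,
                             buf, sizeof(buf));
    g_assert(res == MEMTX_OK);   /* buf now holds the bytes at 0x1000 */
}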
Every region has the C type MemoryRegion and is initialized by a function matching its kind (see the RAM/MMIO sketch after this list):
- RAM: memory_region_init_ram(), memory_region_init_resizeable_ram(), memory_region_init_ram_from_file(), or memory_region_init_ram_ptr()
- MMIO: memory_region_init_io(), passing it a MemoryRegionOps structure describing the callbacks
- ROM: memory_region_init_rom()
- ROM device: memory_region_init_rom_device()
- IOMMU: memory_region_init_iommu()
- container: memory_region_init()
- alias: memory_region_init_alias()
- reservation region: memory_region_init_io() with NULL callbacks
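A minimal sketch of the two most common cases, RAM and MMIO (the my_dev_* names, sizes, and callback bodies are hypothetical, not from the original text):
// sketch: a device creating one RAM leaf and one MMIO leaf region
static uint64_t my_dev_read(void *opaque, hwaddr addr, unsigned size)
{
    return 0;                           /* hypothetical register read */
}

static void my_dev_write(void *opaque, hwaddr addr,
                         uint64_t val, unsigned size)
{
    /* hypothetical register write: ignored */
}

static const MemoryRegionOps my_dev_ops = {
    .read = my_dev_read,
    .write = my_dev_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
};

static void my_dev_init_regions(Object *owner, MemoryRegion *ram,
                                MemoryRegion *mmio, void *opaque,
                                Error **errp)
{
    /* RAM leaf region, 64 KiB, owned by `owner` */
    memory_region_init_ram(ram, owner, "my-dev.ram", 64 * 1024, errp);

    /* MMIO leaf region, 4 KiB; accesses are routed to my_dev_ops callbacks */
    memory_region_init_io(mmio, owner, &my_dev_ops, opaque,
                          "my-dev.mmio", 0x1000);
}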
.system_memory: container@0-2^48-1
|
+---- lomem: alias@0-0xdfffffff ---> #ram (0-0xdfffffff)
|
+---- himem: alias@0x100000000-0x11fffffff ---> #ram (0xe0000000-0xffffffff)
|
+---- vga-window: alias@0xa0000-0xbffff ---> #pci (0xa0000-0xbffff)
| (prio 1)
|
+---- pci-hole: alias@0xe0000000-0xffffffff ---> #pci (0xe0000000-0xffffffff)
pci (0-2^32-1)
|
+--- vga-area: container@0xa0000-0xbffff
| |
| +--- alias@0x00000-0x7fff ---> #vram (0x010000-0x017fff)
| |
| +--- alias@0x08000-0xffff ---> #vram (0x020000-0x027fff)
|
+---- vram: ram@0xe1000000-0xe1ffffff
|
+---- vga-mmio: mmio@0xe2000000-0xe200ffff
ram: ram@0x00000000-0xffffffff
Above is a (simplified) PC memory map. The 4GB RAM block is mapped into the system address space via two aliases: “lomem” is a 1:1 mapping of the first 3.5GB; “himem” maps the last 0.5GB at address 4GB. This leaves 0.5GB for the so-called PCI hole, which allows a 32-bit PCI bus to exist in a system with 4GB of memory.
The memory controller diverts addresses in the range 640K-768K to the PCI address space. This is modelled using the “vga-window” alias, mapped at a higher priority so it obscures the RAM at the same addresses. The vga window can be removed by programming the memory controller; this is modelled by removing the alias and exposing the RAM underneath.
The pci address space is not a direct child of the system address space, since we only want parts of it to be visible (we accomplish this using aliases). It has two subregions: vga-area models the legacy vga window and is occupied by two 32K memory banks pointing at two sections of the framebuffer. In addition, the vram is mapped as a BAR at address 0xe1000000, and an additional BAR containing MMIO registers, vga-mmio, is mapped after it.
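A sketch of how the lomem/himem part of this map could be wired up (board-style code; the use of get_system_memory() and error_fatal follows common QEMU board code, and the variable names are illustrative):
// sketch: one 4GB RAM block exposed through the lomem/himem aliases above
static void sketch_map_pc_ram(void)
{
    MemoryRegion *sysmem = get_system_memory();
    MemoryRegion *ram = g_new(MemoryRegion, 1);
    MemoryRegion *lomem = g_new(MemoryRegion, 1);
    MemoryRegion *himem = g_new(MemoryRegion, 1);

    /* the 4GB RAM block itself is never mapped directly */
    memory_region_init_ram(ram, NULL, "ram", 0x100000000ULL, &error_fatal);

    /* lomem: first 3.5GB of #ram, mapped 1:1 at guest address 0 */
    memory_region_init_alias(lomem, NULL, "lomem", ram, 0x0, 0xe0000000);
    memory_region_add_subregion(sysmem, 0x0, lomem);

    /* himem: last 0.5GB of #ram, remapped just above 4GB */
    memory_region_init_alias(himem, NULL, "himem", ram, 0xe0000000, 0x20000000);
    memory_region_add_subregion(sysmem, 0x100000000ULL, himem);
}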
A region is created by one of the memory_region_init_*() functions and attached to an object, which acts as its owner or parent. A region can be added to an address space or a container with memory_region_add_subregion(), and removed with memory_region_del_subregion().
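For example, the vga-window behaviour described above could be toggled at runtime with a helper like this (a sketch; chipset_set_vga_window and its arguments are hypothetical):
// sketch: mapping/unmapping the vga-window alias at runtime
static void chipset_set_vga_window(MemoryRegion *sysmem,
                                   MemoryRegion *vga_window, bool enable)
{
    if (enable) {
        /* priority 1 so the alias hides the RAM mapped underneath */
        memory_region_add_subregion_overlap(sysmem, 0xa0000, vga_window, 1);
    } else {
        /* removing the alias exposes the RAM at 0xa0000 again */
        memory_region_del_subregion(sysmem, vga_window);
    }
}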
Priority: higher value wins (2 > 1).
0 1000 2000 3000 4000 5000 6000 7000 8000
|------|------|------|------|------|------|------|------|
A: [ ]
C: [CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC]
B: [ ]
D: [DDDDD]
E: [EEEEE]
Container A @ 0x0-0x8000. Region C @ 0x0-0x6000, priority 1.
Container B @ 0x2000-0x6000, priority 2, with subregions D @ 0x0-0x1000 and E @ 0x2000-0x3000 (offsets within B).
The regions will be seen within this address range:
[CCCCCCCCCCCC][DDDDD][CCCCC][EEEEE][CCCCC]
Overlapping mappings are created with memory_region_add_subregion_overlap(), which takes a priority argument. A priority can be set on any region: RAM, containers, aliases, etc.
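A sketch reproducing the A/B/C/D/E layout above (the names, the use of RAM for the leaves, and error_fatal are illustrative):
// sketch: building the A/B/C/D/E overlap example
static void sketch_overlap_example(void)
{
    MemoryRegion *a = g_new(MemoryRegion, 1);   /* container, 0x8000 bytes */
    MemoryRegion *b = g_new(MemoryRegion, 1);   /* container, 0x4000 bytes */
    MemoryRegion *c = g_new(MemoryRegion, 1);   /* RAM leaf,  0x6000 bytes */
    MemoryRegion *d = g_new(MemoryRegion, 1);   /* RAM leaf,  0x1000 bytes */
    MemoryRegion *e = g_new(MemoryRegion, 1);   /* RAM leaf,  0x1000 bytes */

    memory_region_init(a, NULL, "A", 0x8000);
    memory_region_init(b, NULL, "B", 0x4000);
    memory_region_init_ram(c, NULL, "C", 0x6000, &error_fatal);
    memory_region_init_ram(d, NULL, "D", 0x1000, &error_fatal);
    memory_region_init_ram(e, NULL, "E", 0x1000, &error_fatal);

    /* C at A+0 with priority 1; B at A+0x2000 with priority 2, so B wins */
    memory_region_add_subregion_overlap(a, 0x0000, c, 1);
    memory_region_add_subregion_overlap(a, 0x2000, b, 2);

    /* D and E sit at offsets within container B, i.e. at 0x2000 and 0x4000
     * in A's address range */
    memory_region_add_subregion(b, 0x0000, d);
    memory_region_add_subregion(b, 0x2000, e);
}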
Rules to select a memory region when the guest accesses an address (a lookup sketch follows this list):
- all direct subregions of the root region are matched against the address, in descending priority order:
  - if the subregion is a leaf (RAM or MMIO), the search terminates, returning this leaf
  - if the subregion is a container, the same algorithm is applied recursively inside it
  - if the subregion is an alias, the search continues at the alias target; if a recursive search inside a container or alias finds no match, the search continues with the next subregion in priority order
- if none of the subregions match the address, the search terminates with no match found
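To see these rules in action, memory_region_find() can be used to ask which leaf region services a guest physical address (a sketch; the address parameter and the printf output are illustrative):
// sketch: looking up the leaf region behind a guest physical address
static void sketch_lookup(hwaddr addr)
{
    MemoryRegionSection sec = memory_region_find(get_system_memory(), addr, 4);

    if (sec.mr) {
        /* sec.mr is the matched leaf; offset_within_region is the offset into it */
        printf("hit %s at offset 0x%" PRIx64 "\n",
               memory_region_name(sec.mr),
               (uint64_t)sec.offset_within_region);
        memory_region_unref(sec.mr);    /* memory_region_find takes a reference */
    }
}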
Implementation:
Guest RAM consists of a memory backend (-m [size=]megs) plus hotpluggable guest memory: pc-dimm devices modelling DIMMs, enabled via slots=n and maxmem=size on -m.
The “pc-dimm” and “memory-backend” objects are user-visible parts of guest RAM in QEMU. They can be managed using the QEMU command-line and QMP monitor interface.
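For example (the sizes and ids below are illustrative), a DIMM backed by anonymous host memory can be cold-plugged from the command line:
qemu-system-x86_64 \
    -m 1G,slots=2,maxmem=4G \
    -object memory-backend-ram,id=mem1,size=1G \
    -device pc-dimm,id=dimm1,memdev=mem1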
pc-dimm: defined in hw/mem/pc-dimm.c. A pc-dimm device models a DIMM and must be associated with a “memory-backend” object.
memory-backend: defined in backends/hostmem.c. Contains the actual host memory that backs guest RAM. It can be either anonymous mmapped memory or file-backed mmapped memory. (File-backed guest RAM allows Linux hugetlbfs usage for huge pages on the host, and shared memory so that other host applications can access guest RAM.)
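A sketch of the file-backed case (the hugetlbfs mount point /dev/hugepages and the ids are illustrative): memory-backend-file with share=on gives huge-page-backed, shareable guest RAM:
qemu-system-x86_64 \
    -m 1G,slots=2,maxmem=4G \
    -object memory-backend-file,id=mem2,size=1G,mem-path=/dev/hugepages,share=on \
    -device pc-dimm,id=dimm2,memdev=mem2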
RAMBlock:
Memory inside a “memory-backend” is actually mmapped by a RAMBlock through qemu_ram_alloc() in exec.c.
Each RAMBlock has a pointer to the mmapped memory and a ram_addr_t offset.
The ram_addr_t offset lives in a global namespace and is used to identify the RAMBlock.
However, the ram_addr_t namespace covers only part of the entire guest physical memory space: it is a tightly packed address space containing all RAMBlocks, while some guest physical memory regions, such as reserved memory and memory-mapped I/O, have no ram_addr_t at all.
All RAMBlocks are kept in a global list, RAMList.
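A minimal sketch of how a ram_addr_t maps back to a RAMBlock and a host pointer by walking that list (this mirrors what internal helpers in exec.c do; the function name is made up, and real code holds the RCU read lock around the iteration):
// sketch: resolving a ram_addr_t to its RAMBlock and host pointer
static void *ram_addr_to_host_sketch(ram_addr_t addr)
{
    RAMBlock *block;

    /* iterate the global ram_list.blocks list (caller holds the RCU read lock) */
    RAMBLOCK_FOREACH(block) {
        if (addr >= block->offset &&
            addr < block->offset + block->used_length) {
            /* host mmap of the block plus the offset inside it */
            return block->host + (addr - block->offset);
        }
    }
    return NULL;    /* not RAM: e.g. MMIO or reserved guest physical memory */
}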
Definition of RAMBlock:
// include/exec/ramblock.h
struct RAMBlock {
struct rcu_head rcu;
struct MemoryRegion *mr;
uint8_t *host;
uint8_t *colo_cache; /* For colo, VM's ram cache */
ram_addr_t offset; // Lele: offset used for dirty bitmap
ram_addr_t used_length;
ram_addr_t max_length;
void (*resized)(const char*, uint64_t length, void *host);
uint32_t flags;
/* Protected by iothread lock. */
char idstr[256];
/* RCU-enabled, writes protected by the ramlist lock */
QLIST_ENTRY(RAMBlock) next;
QLIST_HEAD(, RAMBlockNotifier) ramblock_notifiers;
int fd;
size_t page_size;
/* dirty bitmap used during migration */
unsigned long *bmap;
/* bitmap of already received pages in postcopy */
unsigned long *receivedmap;
/* Bitmap of CHERI tag bits */
struct CheriTagMem *cheri_tags;
/*
* bitmap to track already cleared dirty bitmap. When the bit is
* set, it means the corresponding memory chunk needs a log-clear.
* Set this up to non-NULL to enable the capability to postpone
* and split clearing of dirty bitmap on the remote node (e.g.,
* KVM). The bitmap will be set only when doing global sync.
*
* NOTE: this bitmap is different comparing to the other bitmaps
* in that one bit can represent multiple guest pages (which is
* decided by the `clear_bmap_shift' variable below). On
* destination side, this should always be NULL, and the variable
* `clear_bmap_shift' is meaningless.
*/
unsigned long *clear_bmap;
uint8_t clear_bmap_shift;
};
Definition of MemoryRegion:
// include/exec/memory.h
/** MemoryRegion:
*
* A struct representing a memory region.
*/
struct MemoryRegion {
Object parent_obj;
/* private: */
/* The following fields should fit in a cache line */
bool romd_mode;
bool ram;
bool subpage;
bool readonly; /* For RAM regions */
bool nonvolatile;
bool rom_device;
bool flush_coalesced_mmio;
bool global_locking;
uint8_t dirty_log_mask;
bool is_iommu;
RAMBlock *ram_block;        /* backing RAMBlock, for RAM regions */
Object *owner;              /* object this region is attached to */
const MemoryRegionOps *ops; /* MMIO read/write callbacks */
void *opaque;               /* passed to the ops callbacks */
MemoryRegion *container;    /* region this one is a subregion of */
Int128 size;
hwaddr addr;
void (*destructor)(MemoryRegion *mr);
uint64_t align;
bool terminates;
bool ram_device;
bool enabled;
bool warning_printed; /* For reservations */
uint8_t vga_logging_count;
MemoryRegion *alias;        /* for alias regions: the target region */
hwaddr alias_offset;        /* offset into the aliased region */
int32_t priority;           /* resolves overlaps among sibling subregions */
QTAILQ_HEAD(, MemoryRegion) subregions;  /* children, ordered by priority */
QTAILQ_ENTRY(MemoryRegion) subregions_link;
QTAILQ_HEAD(, CoalescedMemoryRange) coalesced;
const char *name;
unsigned ioeventfd_nb;
MemoryRegionIoeventfd *ioeventfds;
};
Q&A
Where are virtual addresses translated to physical addresses? What is the fast path, and what happens without a TLB hit?
Address translation: tlb_vaddr_to_host() in accel/tcg/cputlb.c.
If TLB hit: return the host virtual address that backs the guest address, so the access can go straight to host memory.
If TLB miss: tlb_fill() is called; it can trigger a resize of the TLB, so all of the caller's prior references to the TLB table must be discarded and looked up again via tlb_entry().
// accel/tcg/cputlb.c:
void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
                        MMUAccessType access_type, int mmu_idx)
{
    CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
    target_ulong tlb_addr, page;
    /* ... */
}