100 changes: 100 additions & 0 deletions README.md
@@ -0,0 +1,100 @@
# SMMUv3



## Introduction

A *System Memory Management Unit* (SMMU) performs a task that is analogous to that of an MMU in a PE, translating addresses for DMA requests from system I/O devices before the requests are passed into the system interconnect. It is active for DMA only. Traffic in the other direction, from the system or PE to the device, is managed by other means – for example, the PE MMUs.

<img src="doc/figures/System_MMU_in_DMA_traffic.png" alt="System_MMU_in_DMA_traffic" style="zoom:25%;" />

Several SMMUs might exist within a system. An SMMU might translate traffic from just one device or a set of devices. The SMMU supports two stages of translation in a similar way to PEs supporting the Virtualization Extensions. Each stage of translation can be independently enabled. An incoming address is logically translated from VA to IPA in stage 1, then the IPA is input to stage 2 which translates the IPA to the output PA. Stage 1 is intended to be used by a software entity to provide isolation or translation to buffers within the entity, for example DMA isolation within an OS. Stage 2 is intended to be available in systems supporting the Virtualization Extensions and is intended to virtualize device DMA to guest VM address spaces.



## SMMU device

The form an SMMU takes in an SoC is not fixed; it depends on the chip manufacturer's design. When an SoC is equipped with an SMMU, the system topology changes: some bus masters sit behind the SMMU and become its client devices.

As shown in the figure below, an SoC can contain multiple SMMUs, with different devices attached to different SMMUs. A device that is not connected to an SMMU cannot use one for address translation.

<img src="doc/figures/Example_SMMU_implementations.png" alt="Example_SMMU_implementations" style="zoom:50%;" />



## StreamID

PCIe devices derive their StreamID from the BDF (Bus/Device/Function) triple: `sid = (B << 8) | (D << 3) | F`. For other devices, the StreamID is assigned by the SoC vendor at design time and can typically be found in the device tree (for example, in the `iommus` property).
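The mapping can be sketched as a small helper (a sketch only; the field widths follow the PCIe BDF layout: bus 8 bits, device 5 bits, function 3 bits):

```rust
/// Compute a PCIe StreamID from a Bus/Device/Function triple.
/// Bus occupies bits [15:8], device bits [7:3], function bits [2:0].
fn pcie_stream_id(bus: u32, dev: u32, func: u32) -> u32 {
    assert!(bus < 256 && dev < 32 && func < 8);
    (bus << 8) | (dev << 3) | func
}
```

For example, device `02:01.3` yields `(2 << 8) | (1 << 3) | 3 = 0x20b`.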



## Data Structures

```rust
pub trait PagingHandler: Sized {
    /// Number of StreamID bits; the linear Stream Table holds 2^SID_BITS_SET entries.
    const SID_BITS_SET: u32;
    /// log2 of the Command Queue and Event Queue depth (number of entries).
    const CMDQ_EVENTQ_BITS_SET: u32;
    fn alloc_pages(num_pages: usize) -> Option<PhysAddr>;
    fn dealloc_pages(paddr: PhysAddr, num_pages: usize);
    fn phys_to_virt(paddr: PhysAddr) -> VirtAddr;
    fn flush(start: usize, len: usize);
}
```

### Key Alignment Requirements

**1. Stream Table Base Address Alignment**

**Rule**: When using a linear Stream Table, the base address must be aligned to the table size: `Effective Base Address = ADDR & ~((1 << (LOG2SIZE + 6)) - 1)`, meaning memory allocation must satisfy `2^(LOG2SIZE+6)`-byte boundary alignment.

**Example**: If `SID_BITS_SET=16` (16-bit StreamID width), **4MB alignment** is required (calculation: `2^(16+6)=2^22=4MB`).
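The rule can be checked with a couple of helpers (a sketch; `log2size` corresponds to `SID_BITS_SET` above, and each STE is 64 bytes):

```rust
/// Alignment in bytes required for a linear Stream Table:
/// 2^LOG2SIZE entries of 64 (= 2^6) bytes each.
fn strtab_alignment(log2size: u32) -> u64 {
    1u64 << (log2size + 6)
}

/// True if `addr` is a valid linear Stream Table base for `log2size`.
fn strtab_base_is_aligned(addr: u64, log2size: u32) -> bool {
    addr & (strtab_alignment(log2size) - 1) == 0
}
```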

---

**2. Command Queue Base Address Alignment**

**Rule**: The base address must satisfy: `ADDR % MAX(queue_size, 32) = 0`, where `queue_size = number_of_entries × entry_size` (entry size fixed at 16 bytes).

**Example**: For a queue with `2^8=256` entries: Total size = `256 × 16 = 4096 bytes` (i.e., 4KB), requiring 4KB boundary alignment (since `4096 > 32`, `MAX(4096,32)=4096`).
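The same calculation as a helper (a sketch; `log2_entries` corresponds to `CMDQ_EVENTQ_BITS_SET`):

```rust
/// Alignment in bytes required for a command or event queue base:
/// the larger of the queue size (entries * 16 bytes) and 32 bytes.
fn queue_alignment(log2_entries: u32) -> u64 {
    let queue_size = (1u64 << log2_entries) * 16; // 16 bytes per entry
    queue_size.max(32)
}
```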

---

**3. Misalignment Risks**

If the software-allocated base address violates alignment:

- The SMMU silently truncates the low address bits (e.g., ignoring `ADDR[21:0]` in the 4MB example above) and uses the nearest aligned address.
- This causes Stream Table Entry (STE) lookups or command fetches to read the wrong memory, triggering device DMA access faults or permission violations.
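The truncation the hardware applies can be expressed directly (a sketch for the linear-table case):

```rust
/// Effective base the SMMU uses for a linear Stream Table: the low
/// LOG2SIZE + 6 bits of the programmed address are treated as zero.
fn effective_strtab_base(addr: u64, log2size: u32) -> u64 {
    addr & !((1u64 << (log2size + 6)) - 1)
}
```

With `log2size = 16`, a misaligned base of `0x40_1234` silently becomes `0x40_0000`.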



```rust
const DEFAULT_S2VTCR: u64 = VTCR_EL2::PS::PA_40B_1TB.value
    | VTCR_EL2::TG0::Granule4KB.value
    | VTCR_EL2::SH0::Inner.value
    | VTCR_EL2::ORGN0::NormalWBRAWA.value
    | VTCR_EL2::IRGN0::NormalWBRAWA.value
    | VTCR_EL2::SL0.val(0b01).value
    | VTCR_EL2::T0SZ.val(64 - 39).value;
```

`DEFAULT_S2VTCR` is written to bits `[178:160]` of the STE (Stream Table Entry). Its value must be consistent with the corresponding fields of the Virtualization Translation Control Register at EL2 (`VTCR_EL2`).



## Usage

`base_address` is the base address of the SMMU, which can be obtained from the device tree or datasheet. For example:

- In QEMU's `virt` machine (`VIRT_SMMU`): the memory region starts at `0x09050000` with size `0x20000`
- On Phytium E2000: the memory region starts at `0x30000000` with size `0x800000`

Before initializing the SMMU, its register memory must be mapped as **Device type** in the CPU page tables.

```rust
let mut smmuv3 = SMMUv3::<Smmuv3PagingHandler>::new(base_address as *mut u8);

smmuv3.init(); // Initialization

smmuv3.add_device(streamID, vm.id(), vm.ept_root()); // Configure STE
```
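A minimal `PagingHandler` implementation might look like the sketch below. The trait and address types are restated locally so the snippet stands alone; `DummyHandler`, its constants, and the fixed physical address are hypothetical placeholders, not part of this crate's API:

```rust
// Stand-ins for `memory_addr::{PhysAddr, VirtAddr}` so the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PhysAddr(usize);
#[derive(Clone, Copy, Debug, PartialEq)]
struct VirtAddr(usize);

trait PagingHandler: Sized {
    const SID_BITS_SET: u32;
    const CMDQ_EVENTQ_BITS_SET: u32;
    fn alloc_pages(num_pages: usize) -> Option<PhysAddr>;
    fn dealloc_pages(paddr: PhysAddr, num_pages: usize);
    fn phys_to_virt(paddr: PhysAddr) -> VirtAddr;
    fn flush(start: usize, len: usize);
}

/// Hypothetical OS glue; constants chosen to match the alignment examples above.
struct DummyHandler;

impl PagingHandler for DummyHandler {
    const SID_BITS_SET: u32 = 16; // 2^16 STEs -> 4MB-aligned table
    const CMDQ_EVENTQ_BITS_SET: u32 = 8; // 256-entry command/event queues

    fn alloc_pages(_num_pages: usize) -> Option<PhysAddr> {
        Some(PhysAddr(0x4000_0000)) // placeholder; a real OS allocates frames here
    }
    fn dealloc_pages(_paddr: PhysAddr, _num_pages: usize) {}
    fn phys_to_virt(paddr: PhysAddr) -> VirtAddr {
        VirtAddr(paddr.0) // identity mapping assumed
    }
    fn flush(_start: usize, _len: usize) {
        // a real implementation cleans the data cache over this range
    }
}
```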
Binary file added doc/figures/Example_SMMU_implementations.png
Binary file added doc/figures/System_MMU_in_DMA_traffic.png
25 changes: 25 additions & 0 deletions src/hal.rs
@@ -3,6 +3,29 @@ use memory_addr::{PhysAddr, VirtAddr};
/// The low-level **OS-dependent** helpers that must be provided for
/// [`crate::SMMUv3`].
pub trait PagingHandler: Sized {
/// 6.3.24 SMMU_STRTAB_BASE
/// • When a Linear Stream table is used, that is when SMMU_STRTAB_BASE_CFG.FMT == 0b00, the
/// effective base address is aligned by the SMMU to the table size, ignoring the least-significant bits in the
/// ADDR range as required to do so:
/// ADDR[LOG2SIZE + 5:0] = 0.
/// • When a 2-level Stream table is used, that is when SMMU_STRTAB_BASE_CFG.FMT == 0b01, the
/// effective base address is aligned by the SMMU to the larger of 64 bytes or the first-level table size:
/// ADDR[MAX(5, (LOG2SIZE - SPLIT - 1 + 3)):0] = 0.
/// The alignment of ADDR is affected by the literal value of the respective
/// SMMU_STRTAB_BASE_CFG.LOG2SIZE field and is not limited by SIDSIZE.
/// Note: This means that configuring a table that is larger than required by the incoming StreamID span results
/// in some entries being unreachable, but the table is still aligned to the configured size.
/// For example, SID_BITS_SET = 16, when alloc page alignment is to 2^(16 + 6) = 2^22 = 4MB.
const SID_BITS_SET: u32;

/// 6.3.26 SMMU_CMDQ_BASE
/// • The effective base address is aligned by the SMMU to the larger of the queue size in bytes or 32 bytes,
/// ignoring the least-significant bits of ADDR as required. ADDR bits [4:0] are treated as zero.
/// – Note: For example, a queue with 2^8 entries is 4096 bytes in size, so software must align
/// the allocation, and therefore ADDR, to a 4KB boundary
/// (2^8 * 16 = 4096 bytes: 256 entries, 16 bytes per entry).
const CMDQ_EVENTQ_BITS_SET: u32;

/// Request to allocate contiguous 4K-sized pages.
fn alloc_pages(num_pages: usize) -> Option<PhysAddr>;
/// Request to free allocated physical pages.
@@ -11,4 +34,6 @@ pub trait PagingHandler: Sized {
///
/// Used to access the physical memory directly in page table implementation.
fn phys_to_virt(paddr: PhysAddr) -> VirtAddr;
/// Flush the memory range `[start, start + len)`.
fn flush(start: usize, len: usize);
}
90 changes: 53 additions & 37 deletions src/lib.rs
@@ -1,8 +1,6 @@
//! ARM System Memory Management Unit (SMMU) v3 driver written in Rust.

#![no_std]
#![feature(const_option)]
#![feature(const_nonnull_new)]

#[macro_use]
extern crate log;
@@ -45,7 +43,7 @@ register_structs! {
(0x0020 => CR0: Cr0Reg),
(0x0024 => CR0ACK: Cr0AckReg),
(0x0028 => CR1: Cr1Reg),
(0x002c => CR2: ReadWrite<u32>),
(0x002c => CR2: Cr2Reg),
(0x0030 => _reserved0),
(0x0050 => IRQ_CTRL: ReadWrite<u32>),
(0x0054 => IRQ_CTRLACK: ReadOnly<u32>),
@@ -60,14 +58,14 @@
(0x0090 => CMDQ_BASE: CmdQBaseReg),
(0x0098 => CMDQ_PROD: CmdQProdReg),
(0x009c => CMDQ_CONS: CmdQConsReg),
(0x00a0 => EVENTQ_BASE: ReadWrite<u64>),
(0x00a0 => EVENTQ_BASE: EventQBaseReg),
(0x00a8 => _reserved4),
(0x00b0 => EVENTQ_IRQ_CFG0: ReadWrite<u64>),
(0x00b8 => EVENTQ_IRQ_CFG1: ReadWrite<u32>),
(0x00bc => EVENTQ_IRQ_CFG2: ReadWrite<u32>),
(0x00c0 => _reserved5),
(0x100a8 => EVENTQ_PROD: ReadWrite<u32>),
(0x100ac => EVENTQ_CONS: ReadWrite<u32>),
(0x100a8 => EVENTQ_PROD: EventQProdReg),
(0x100ac => EVENTQ_CONS: EventQConsReg),
(0x100b0 => _reserved6),
(0x20000 => @END),
}
@@ -78,29 +76,29 @@ pub struct SMMUv3<H: PagingHandler> {
base: NonNull<SMMUv3Regs>,
stream_table: LinearStreamTable<H>,
cmd_queue: Queue<H>,
event_queue: Queue<H>,
}

unsafe impl<H: PagingHandler> Send for SMMUv3<H> {}
unsafe impl<H: PagingHandler> Sync for SMMUv3<H> {}

const ARM_SMMU_SYNC_TIMEOUT: usize = 0x1000000;

impl<H: PagingHandler> SMMUv3<H> {
/// Construct a new SMMUv3 instance from the base address.
pub const fn new(base: *mut u8) -> Self {
Self {
base: NonNull::new(base).unwrap().cast(),
stream_table: LinearStreamTable::uninit(),
cmd_queue: Queue::uninit(),
event_queue: Queue::uninit(),
}
}

/// Initialize the SMMUv3 instance.
pub fn init(&mut self) {
let sid_max_bits = self.regs().IDR1.read(IDR1::SIDSIZE);
info!(
"Max SID bits: {}, max SIE count {}",
sid_max_bits,
1 << sid_max_bits
);
info!("Max SID bits: {}, max SIE count {}", sid_max_bits, 1 << sid_max_bits);

if sid_max_bits >= 7
&& self.regs().IDR0.read(IDR0::ST_LEVEL) == IDR0::ST_LEVEL::LinearStreamTable.into()
@@ -109,32 +107,25 @@ impl<H: PagingHandler> SMMUv3<H> {
panic!("SMMUv3: the system must support a 2-level Stream Table");
}

self.stream_table.init(sid_max_bits);

self.regs().STRTAB_BASE.write(
STRTAB_BASE::RA::Enable
+ STRTAB_BASE::ADDR.val(self.stream_table.base_addr().as_usize() as u64 >> 6),
);

self.regs()
.STRTAB_BASE_CFG
.write(STRTAB_BASE_CFG::FMT::Linear + STRTAB_BASE_CFG::LOG2SIZE.val(sid_max_bits));

let cmdqs_log2 = self.regs().IDR1.read(IDR1::CMDQS);
let cmdqs_log2 = H::CMDQ_EVENTQ_BITS_SET;
self.cmd_queue.init(cmdqs_log2);
self.regs().CMDQ_BASE.write(
CMDQ_BASE::RA::ReadAllocate
+ CMDQ_BASE::ADDR.val(self.cmd_queue.base_addr().as_usize() as u64 >> 5)
+ CMDQ_BASE::LOG2SIZE.val(cmdqs_log2 as _),
);

self.regs()
.CMDQ_PROD
.write(CMDQ_PROD::WR.val(self.cmd_queue.prod_value()));
self.regs()
.CMDQ_CONS
.write(CMDQ_CONS::RD.val(self.cmd_queue.cons_value()));

self.stream_table_init();

self.enable();

}

fn enable(&mut self) {
@@ -147,17 +138,32 @@
+ CR1::QUEUE_SH::InnerShareable,
);

self.regs().CR0.write(CR0::SMMUEN::Enable);

const ARM_SMMU_SYNC_TIMEOUT: usize = 0x1000000;
self.regs().CR2.write(CR2::VALID::defaul);
self.regs()
.CR0
.write(CR0::SMMUEN::Enable + CR0::CMDQEN::Enable);

for _timeout in 0..ARM_SMMU_SYNC_TIMEOUT {
if self.regs().CR0ACK.is_set(CR0ACK::SMMUEN) {
if self.regs().CR0ACK.is_set(CR0ACK::SMMUEN)
&& self.regs().CR0ACK.is_set(CR0ACK::CMDQEN)
{
info!("SMMUv3 enabled");
return;
}
}
error!("CR0 write err!");
error!("SMMUv3 enable timed out");
}

pub fn stream_table_init(&mut self) {
self.stream_table.init(H::SID_BITS_SET);

self.regs().STRTAB_BASE_CFG.write(
STRTAB_BASE_CFG::FMT::Linear + STRTAB_BASE_CFG::LOG2SIZE.val(H::SID_BITS_SET),
);
self.regs().STRTAB_BASE.write(
STRTAB_BASE::RA::Enable
+ STRTAB_BASE::ADDR.val(self.stream_table.base_addr().as_usize() as u64 >> 6),
);
}

/// Get the SMMUv3 registers.
@@ -182,32 +188,33 @@ impl<H: PagingHandler> SMMUv3<H> {
while self.cmd_queue.full() {
warn!("Command queue is full, try consuming");
let cmdq_cons = self.regs().CMDQ_CONS.get();
if cmdq_cons & CMDQ_CONS::ERR.mask != 0 {
if cmdq_cons & (CMDQ_CONS::ERR.mask << CMDQ_CONS::ERR.shift) != 0 {
warn!(
"CMDQ_CONS ERR code {}",
(cmdq_cons & CMDQ_CONS::ERR.mask) >> CMDQ_CONS::ERR.shift
(cmdq_cons & (CMDQ_CONS::ERR.mask << CMDQ_CONS::ERR.shift)) >> CMDQ_CONS::ERR.shift
);
}

let cons_value = cmdq_cons & CMDQ_CONS::RD.mask;
let cons_value = cmdq_cons & (CMDQ_CONS::RD.mask << CMDQ_CONS::RD.shift);
self.cmd_queue.set_cons_value(cons_value);
}

self.cmd_queue.cmd_insert(cmd);
self.cmd_queue.cmd_insert(cmd.clone());

self.regs()
.CMDQ_PROD
.write(CMDQ_PROD::WR.val(self.cmd_queue.prod_value()));

while !self.cmd_queue.empty() {
debug!("Command queue is not empty, consuming");
trace!("Command queue is not empty, consuming");
let cmdq_cons = self.regs().CMDQ_CONS.get();
if cmdq_cons & CMDQ_CONS::ERR.mask != 0 {
if cmdq_cons & (CMDQ_CONS::ERR.mask << CMDQ_CONS::ERR.shift) != 0 {
warn!(
"CMDQ_CONS ERR code {}",
(cmdq_cons & CMDQ_CONS::ERR.mask) >> CMDQ_CONS::ERR.shift
(cmdq_cons & (CMDQ_CONS::ERR.mask << CMDQ_CONS::ERR.shift)) >> CMDQ_CONS::ERR.shift
);
}
let cons_value = cmdq_cons & CMDQ_CONS::RD.mask;
let cons_value = cmdq_cons & (CMDQ_CONS::RD.mask << CMDQ_CONS::RD.shift);
self.cmd_queue.set_cons_value(cons_value);
}

@@ -219,9 +226,18 @@
/// Add a passthrough device, updating the stream table.
pub fn add_device(&mut self, sid: usize, vmid: usize, s2pt_base: PhysAddr) {
let cmd = Cmd::cmd_cfgi_ste(sid as u32);
self.add_cmd(cmd, true);

self.stream_table
.set_s2_translated_ste(sid, vmid, s2pt_base);

self.add_cmd(cmd, true);

// Prefetching the STE can reduce the latency of its first lookup.
self.cmd_prefetch(sid);
}

pub fn cmd_prefetch(&mut self, sid: usize) {
let cmd = Cmd::cmd_prefetch_config(sid as u32);
self.add_cmd(cmd, true);
}
}