
Comments (2)

jinbpark commented on July 20, 2024

In-depth analysis

in short

This is a software bug in our RMM code: specifically, a variable is used before it is initialized.

the surface reason

At times, RMM starts a realm execution with x0 set to zero. The kernel boot code uses x0 to convey the address of the FDT.
So, when x0 is zero, Linux's fixmap_remap_fdt() returns an error and the kernel falls into an infinite loop.

// realm-linux kernel code
void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
{
    // in error cases, dt_phys (the address of FDT) is zero, so this function returns NULL.
    if (!dt_phys || dt_phys % MIN_FDT_ALIGN)
        return NULL;
}

// islet rmm log (I added for debugging)
[JB] vcpu x0: 0x0, x1: 0xA, elr: 0x80000000  --> when we try to jump to Realm's kernel start, x0 contains "zero" for some reason.
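
To make the rejected case concrete, here is a minimal Rust sketch of the same early-exit check. The function name fdt_addr_is_valid and the sample addresses are hypothetical, and the alignment value 8 is an assumption standing in for the kernel's MIN_FDT_ALIGN.

```rust
/// Assumed alignment requirement; stands in for the kernel's MIN_FDT_ALIGN.
const MIN_FDT_ALIGN: u64 = 8;

/// Mirrors the early-exit check in fixmap_remap_fdt(): a zero or misaligned
/// physical address is rejected (the C code returns NULL in that case).
fn fdt_addr_is_valid(dt_phys: u64) -> bool {
    dt_phys != 0 && dt_phys % MIN_FDT_ALIGN == 0
}

fn main() {
    // x0 == 0, as in the RMM log above: the check fails.
    println!("0x0        -> {}", fdt_addr_is_valid(0x0));
    // A non-zero, aligned address (hypothetical value) passes.
    println!("0x88000000 -> {}", fdt_addr_is_valid(0x8800_0000));
    // A non-zero but misaligned address is rejected as well.
    println!("0x88000004 -> {}", fdt_addr_is_valid(0x8800_0004));
}
```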

the underlying reason

This is an uninitialized-variable bug involving REC_CREATE and REC_ENTER. Let's walk through what actually went wrong.

  1. REC_CREATE: an uninitialized variable is introduced
// (1) nw-linux kernel code
// The host allocates a page for the rec (the page is not zero-initialized) and gives ISLET RMM the physical address of the page.
rec->rec_page = (void *)__get_free_page(GFP_KERNEL);    

// (2) Rec initialization in ISLET RMM
// The new() function below initializes the rec page that the host provided.
// However, it accidentally leaves "Rec.ripas" uninitialized.
// Even worse, the host kernel does not initialize the rec page either.
// So "Rec.ripas" may contain an unexpected value. In most cases it is zero, but sometimes it is not.
// This is why the bug is triggered only sporadically.
pub struct Rec {
    pub rd: &'static Rd,
    vcpuid: usize,
    ripas: Ripas,
}
pub unsafe fn new(
        rec_addr: usize,
        vcpuid: usize,
        rd: &'static Rd,
    ) -> ManuallyDrop<&'static mut Rec> {
        let rec: &mut Rec = &mut *(rec_addr as *mut Rec);
        rec.vcpuid = vcpuid;
        rec.rd = rd;
        ManuallyDrop::new(rec)
    }
  2. REC_ENTER: x0 (the FDT address) is set to zero
listen!(mainloop, rmi::REC_ENTER, |arg, ret, rmm| {
    ....
    let ripas = rec.ripas_addr();   // (1) we read "Rec.ripas", which was never initialized.
    if ripas > 0 {   // (2) if the uninitialized value happens to be greater than 0,
        rmi.set_reg(rec.rd.id(), rec.id(), 0, 0)?;   // (3) we set the vcpu's "x0" register to zero.
        rmi.set_reg(rec.rd.id(), rec.id(), 1, ripas)?;
        rec.set_ripas(0, 0, 0, 0);
    }
    ....
    // (4) when we run a realm with the above vcpu context, it hits the FDT error described earlier!
     match rmi.run(rec.rd.id(), rec.id(), 0) {
}
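
The two steps above can be reproduced in a small, self-contained Rust model. This is only a sketch under stated assumptions: the Rec layout, the Page type, new_buggy, rec_enter, the 0xAA fill pattern, and the FDT address are all hypothetical stand-ins, not the actual ISLET types. Integer-only fields are used so that reading leftover bytes is well defined.

```rust
/// Simplified stand-in for the RMM's Rec (hypothetical layout). Both fields
/// are plain integers, so any bit pattern in the page is a valid value.
#[allow(dead_code)]
#[repr(C)]
struct Rec {
    vcpuid: u64,
    ripas_addr: u64,
}

/// Plays the role of the host-provided page: 16 bytes, 8-byte aligned.
#[repr(C, align(8))]
struct Page([u8; 16]);

/// Buggy initialization: writes `vcpuid` but never touches `ripas_addr`,
/// so that field keeps whatever bytes the page already held.
fn new_buggy(page: &mut Page, vcpuid: u64) -> &mut Rec {
    // SAFETY: Page matches Rec in size and alignment, its bytes are
    // initialized, and we hold the only reference to it.
    let rec = unsafe { &mut *(page as *mut Page as *mut Rec) };
    rec.vcpuid = vcpuid;
    rec
}

/// Models the REC_ENTER branch: returns the (x0, x1) the vCPU starts with.
fn rec_enter(rec: &mut Rec, x0: u64, x1: u64) -> (u64, u64) {
    let ripas = rec.ripas_addr;
    if ripas > 0 {
        rec.ripas_addr = 0;
        (0, ripas) // x0 is overwritten with zero
    } else {
        (x0, x1)
    }
}

fn main() {
    let fdt_addr: u64 = 0x8800_0000; // hypothetical FDT address held in x0

    // Page with leftover data: the stale field is non-zero, x0 is lost.
    let mut dirty = Page([0xAA; 16]);
    let (x0, _) = rec_enter(new_buggy(&mut dirty, 0), fdt_addr, 0);
    println!("dirty page: x0 = {:#x}", x0);

    // Page that happens to be zero: the bug stays hidden.
    let mut clean = Page([0; 16]);
    let (x0, _) = rec_enter(new_buggy(&mut clean, 0), fdt_addr, 0);
    println!("clean page: x0 = {:#x}", x0);
}
```

The same code path behaves differently depending only on the previous contents of the page, which matches the sporadic nature of the failure.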

fix

Initialize all fields of Rec properly in the first place.
In addition, we need to introduce a safe interface for handling Rd/Rec/Data/..., as we already do for Params via host_pointer.
Such an interface can prevent this kind of bug from happening.

pub unsafe fn new(
        rec_addr: usize,
        vcpuid: usize,
        rd: &'static Rd,
    ) -> ManuallyDrop<&'static mut Rec> {
        let rec: &mut Rec = &mut *(rec_addr as *mut Rec);
        rec.vcpuid = vcpuid;
        rec.rd = rd;
        rec.set_ripas(0, 0, 0, 0);   // This is the fix: initialize ripas explicitly.
        ManuallyDrop::new(rec)
    }
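
One possible shape for a safer constructor, sketched here under assumptions rather than taken from ISLET, is to write the whole struct at once with std::ptr::write. A struct literal must name every field, so forgetting one becomes a compile error instead of a sporadic runtime failure. The Rec layout, Page type, and new_checked name are hypothetical.

```rust
use std::mem::{align_of, size_of};
use std::ptr;

/// Simplified stand-in for the RMM's Rec (hypothetical layout).
#[repr(C)]
struct Rec {
    vcpuid: u64,
    ripas_addr: u64,
}

/// Plays the role of the host-provided page: 16 bytes, 8-byte aligned.
#[repr(C, align(8))]
struct Page([u8; 16]);

// Compile-time checks that a Rec really fits in a Page.
const _: () = assert!(size_of::<Rec>() <= size_of::<Page>());
const _: () = assert!(align_of::<Rec>() <= align_of::<Page>());

/// Initializes every field in one write, regardless of what the page held.
fn new_checked(page: &mut Page, vcpuid: u64) -> &mut Rec {
    let raw = page as *mut Page as *mut Rec;
    // SAFETY: size and alignment are checked above, and the exclusive
    // borrow of `page` guarantees nobody else accesses the memory.
    unsafe {
        // Omitting a field in this literal fails to compile.
        ptr::write(raw, Rec { vcpuid, ripas_addr: 0 });
        &mut *raw
    }
}

fn main() {
    // Even a page full of leftover bytes yields a fully initialized Rec.
    let mut dirty = Page([0xAA; 16]);
    let rec = new_checked(&mut dirty, 3);
    println!("vcpuid = {}, ripas_addr = {:#x}", rec.vcpuid, rec.ripas_addr);
}
```

Compared with assigning fields one by one, this keeps the initialization in a single expression that the compiler can check for completeness.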

from islet.

bokdeuk-jeong commented on July 20, 2024

Update: The symptom still occurs even with the single-core configuration. The workaround above has no effect.

