Code Monkey home page Code Monkey logo

Comments (14)

rennergade avatar rennergade commented on July 17, 2024

I've finished my initial thoughts on this and documented it here.

From talking to Brendan, it seems like the best path forward is to try to work with the EXEC implementation we have, since it's not explicitly broken but inefficient and not exactly compliant (we launch a new cage instead of doing what exec natively does).

My gameplan right now is to consolidate the threading infrastructure to give us the flexibility to set up threads and their relationships correctly, and then debug/clarify the mess that is wait from there.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I've "finished" the initial stage of my refactor, which unified the thread pipelines. I now can run ls through bash to completion without using the hack I threw in before.

It's still hanging in wait with multiple jobs, which I mostly expected. Something like bash ls | cat hangs infinitely, but it after it gives the correct output.

Unfortunately, I earlier thought this meant pipes were working, but after further debugging it seems that ls process is running normally and printing via stdout, and then cat is running separately. This could point to the cause of the hang, but something like bash ls | date also ends up in an infinite wait loop (after printing out both ls and dates outputs to stdout).

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

My next step is to create a minimal ls | cat using fork, dup2, pipe, exec, and wait to try to ascertain if the issues are in any of those underlying syscalls or if they're in edge cases presented by bash.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I found that it's writing to stdout and not to the pipe like it's expected, because it thinks it should be writing in Cage 0, which hasn't had dup2 configure the file descriptors properly.

It seems that the cageid's for stdin/stdout/stderr do not get setup properly because of an alternate route NaCl takes to configure them withe the host system. In the case of a dup followed by an exec, the cage id's for these descriptors get overwritten to zero, resulting in unexpected behavior.

I should put together a fix for this shortly.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I fixed the above stdout descriptor/cage issue. ls is now properly writing into the pipe in bash/the basic example.

From that I uncovered two new issues:

  1. In the basic example, the ls output is being piped, but Lind terminates before cat can even run. cat exec's but all of Lind terminates early in the loader. I need to debug this further but something is telling the initial thread that the program has finished running.
  2. There seems to still be some dup problems when I'm running bash, this time with the stdin of cat not duplicating. Debug so far indicates that the problem is happening in SafePOSIX.

At this point I'm not sure if the "wait" problems I've been having are actually a combo of these in disguise. I'm going to debug these first and see the results.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I've figured out the root of problem 2. The cagetable currently sets unallocated hostfd's (this is the number NaCl uses for desc identification) to zero, but that's also the number set for stdin. This surprisingly doesn't interfere elsewhere, but gives us problems when we try to use dup with stdin.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

Problem 1 is a little hard to track down. The output isn't deterministic and actually runs to completion (at least runs into problem 2 instead of shutting down in the loader) when I record with RR.

In the past when I've seen non-determinism it's usually been an issue with cageid's and the RPC, so I'll start there.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I'm realizing the changes I'm making on top of changing the threading functions are starting to pile up.

I'm going to checkout a branch with just the threading changes and queue up a PR for that to separate these out.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I'm going to use this issue to concentrate on threads, which is still problem one from above:

In the basic example, the ls output is being piped, but Lind terminates before cat can even run. cat exec's but all of Lind terminates early in the loader. I need to debug this further but something is telling the initial thread that the program has finished running.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I was able to fix the sample programs problem of early termination. This was a mistake on my end of not implementing wait in the user code correctly.

My sample and bash are now getting stuck in wait in the same way. It's easiest to talk about this by naming the threads/cages that are spawned:

Cage 1 - Bash
Cage 2 - First Fork (for ls)
Cage 3 - Second Fork (for cat)
Cage 4 - ls (exec'ed)
Cage 5 - cat (exec'ed)

Both bash and the sample are getting stuck when tearing down Cage 3. At this point cages 2, 4, and 5 have all all exited and have gone through the teardown process. This means that both ls and cat have executed, but for some reason the forked cage is getting stuck after the exec'ed cage has run its course.

The line where its getting stuck is here in the teardown process.

This is somewhat due to our weird "new cage for exec" infrastructure, but I believe that this bug is probably pretty fixable. Will continue to dig into this.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I believe I've found the (well at least one, hopefully only one) problem with wait.

When we're creating new nap's here its storing info about the new nap's in both our "master" and "parent" nap's as well. However the index for which these are stored is a simple "num_children" counter that's being incremented or decremented. However, the manner which these cages are constructed and destructed isn't linear so in this case it's overwriting a valid cage, and because of that cages later absence it gets stuck while waiting on it.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

It seems pretty redundant here that we have an array containing children ids when we already have a dynamic array of ptr's to all the children. I think there's a pretty clear refactor here to get rid of the id's array entirely instead of attempting to fix the bug. My feeling is this will be cleaner and won't take much more time than fixing the bug would.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

I removed the ID array and set it so wait relies solely on the dynamic array of children pointers we have.

This has fixed the wait issue! And Bash is able to run simple pipe scripts.

from lind_project.

rennergade avatar rennergade commented on July 17, 2024

This was complete with the wait() merge.

from lind_project.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.