Comments (14)
I've finished my initial thoughts on this and documented it here.
From talking to Brendan, it seems like the best path forward is to try to work with the EXEC implementation we have, since it's not explicitly broken but inefficient and not exactly compliant (we launch a new cage instead of doing what exec natively does).
My gameplan right now is to consolidate the threading infrastructure to give us the flexibility to set up threads and their relationships correctly, and then debug/clarify the mess that is wait from there.
from lind_project.
I've "finished" the initial stage of my refactor, which unified the thread pipelines. I now can run ls through bash to completion without using the hack I threw in before.
It's still hanging in wait with multiple jobs, which I mostly expected. Something like bash ls | cat
hangs infinitely, but it after it gives the correct output.
Unfortunately, I earlier thought this meant pipes were working, but after further debugging it seems that ls process is running normally and printing via stdout, and then cat is running separately. This could point to the cause of the hang, but something like bash ls | date
also ends up in an infinite wait loop (after printing out both ls and dates outputs to stdout).
from lind_project.
My next step is to create a minimal ls | cat
using fork, dup2, pipe, exec, and wait to try to ascertain if the issues are in any of those underlying syscalls or if they're in edge cases presented by bash.
from lind_project.
I found that it's writing to stdout and not to the pipe like it's expected, because it thinks it should be writing in Cage 0, which hasn't had dup2 configure the file descriptors properly.
It seems that the cageid's for stdin/stdout/stderr do not get setup properly because of an alternate route NaCl takes to configure them withe the host system. In the case of a dup followed by an exec, the cage id's for these descriptors get overwritten to zero, resulting in unexpected behavior.
I should put together a fix for this shortly.
from lind_project.
I fixed the above stdout descriptor/cage issue. ls
is now properly writing into the pipe in bash/the basic example.
From that I uncovered two new issues:
- In the basic example, the
ls
output is being piped, but Lind terminates beforecat
can even run.cat
exec's but all of Lind terminates early in the loader. I need to debug this further but something is telling the initial thread that the program has finished running. - There seems to still be some dup problems when I'm running bash, this time with the stdin of
cat
not duplicating. Debug so far indicates that the problem is happening in SafePOSIX.
At this point I'm not sure if the "wait" problems I've been having are actually a combo of these in disguise. I'm going to debug these first and see the results.
from lind_project.
I've figured out the root of problem 2. The cagetable currently sets unallocated hostfd's (this is the number NaCl uses for desc identification) to zero, but that's also the number set for stdin. This surprisingly doesn't interfere elsewhere, but gives us problems when we try to use dup with stdin.
from lind_project.
Problem 1 is a little hard to track down. The output isn't deterministic and actually runs to completion (at least runs into problem 2 instead of shutting down in the loader) when I record with RR.
In the past when I've seen non-determinism it's usually been an issue with cageid's and the RPC, so I'll start there.
from lind_project.
I'm realizing the changes I'm making on top of changing the threading functions are starting to pile up.
I'm going to checkout a branch with just the threading changes and queue up a PR for that to separate these out.
from lind_project.
I'm going to use this issue to concentrate on threads, which is still problem one from above:
In the basic example, the ls output is being piped, but Lind terminates before cat can even run. cat exec's but all of Lind terminates early in the loader. I need to debug this further but something is telling the initial thread that the program has finished running.
from lind_project.
I was able to fix the sample programs problem of early termination. This was a mistake on my end of not implementing wait in the user code correctly.
My sample and bash are now getting stuck in wait in the same way. It's easiest to talk about this by naming the threads/cages that are spawned:
Cage 1 - Bash
Cage 2 - First Fork (for ls)
Cage 3 - Second Fork (for cat)
Cage 4 - ls (exec'ed)
Cage 5 - cat (exec'ed)
Both bash and the sample are getting stuck when tearing down Cage 3. At this point cages 2, 4, and 5 have all all exited and have gone through the teardown process. This means that both ls and cat have executed, but for some reason the forked cage is getting stuck after the exec'ed cage has run its course.
The line where its getting stuck is here in the teardown process.
This is somewhat due to our weird "new cage for exec" infrastructure, but I believe that this bug is probably pretty fixable. Will continue to dig into this.
from lind_project.
I believe I've found the (well at least one, hopefully only one) problem with wait.
When we're creating new nap's here its storing info about the new nap's in both our "master" and "parent" nap's as well. However the index for which these are stored is a simple "num_children" counter that's being incremented or decremented. However, the manner which these cages are constructed and destructed isn't linear so in this case it's overwriting a valid cage, and because of that cages later absence it gets stuck while waiting on it.
from lind_project.
It seems pretty redundant here that we have an array containing children ids when we already have a dynamic array of ptr's to all the children. I think there's a pretty clear refactor here to get rid of the id's array entirely instead of attempting to fix the bug. My feeling is this will be cleaner and won't take much more time than fixing the bug would.
from lind_project.
I removed the ID array and set it so wait relies solely on the dynamic array of children pointers we have.
This has fixed the wait issue! And Bash is able to run simple pipe scripts.
from lind_project.
This was complete with the wait() merge.
from lind_project.
Related Issues (20)
- Implement fsync() HOT 3
- Replace rustposix select()/poll()/epoll() DashSets with bitmaps HOT 1
- Log Based FS Issue HOT 3
- NaCl MemLeaks HOT 1
- Accept-Inet Rust Panic! HOT 5
- Refactor open() to remove TOCTTOU HOT 1
- Postgres Deadlock with Multiple Clients
- More Strace Fixes HOT 2
- Lind Personas Issue Tracker
- Update Lind Wikis HOT 2
- Signal Documentation
- Set environment variables HOT 1
- Failing Signal Tests HOT 2
- Let's isolate xz from sshd
- Create "encasement" library for new 3i interfaces HOT 9
- Create Lind-Repy image
- Fix Lind-Project repos
- Fix gunicorn error HOT 1
- Use rust's clippy utility... HOT 2
- Add careful and miri to our usual Rust development processes...
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from lind_project.