Comments (5)

landley commented on June 24, 2024

I prefer dealing with this sort of thing through the mailing list, I
only really use github as repository hosting.

On 02/25/2016 01:31 AM, drinkcat wrote:

We use toybox-0.7.0 as part of the Chromium OS project, and sometimes
hit an issue when building it on our automated builders (see this issue
https://bugs.chromium.org/p/chromium/issues/detail?id=584542):

|toybox-0.7.0: armv7a-cros-linux-gnueabi-gcc -O2 -O2 -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -g -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -clang-syntax -funsigned-char -Wno-string-plus-int -I . -Os -ffunction-sections -fdata-sections -fno-asynchronous-unwind-tables -fno-strict-aliasing -c toys/posix/tail.c -o generated/obj/tail.o
toybox-0.7.0: scripts/make.sh: line 270: wait: pid 8477 is not a child of this shell
toybox-0.7.0:

Hmmm... PID wrap, maybe?

Makefile:19: recipe for target 'toybox' failed
toybox-0.7.0: make: *** [toybox] Error 1
toybox-0.7.0: * ERROR: sys-apps/toybox-0.7.0::portage-stable failed (compile phase):
toybox-0.7.0: * emake failed |

For some reason we cannot reproduce locally (it only happens on these
builders that are compiling many other packages at the same time).

Neither can I.

Maybe I could do something with a restricted process ID range forcing
quick wrapping, but this seems more a bash problem than my script, so a
workaround's more likely than a proper fix. (I wonder if I can
distinguish this error from a compiler error? Hmmm... 127 is nonexistent
process or job, except how to distinguish "gcc not in $PATH" from "PID
we waited on went bye-bye and took its exit status with it"?)
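
The ambiguity described above can be seen in a minimal sketch. It assumes a POSIX-style shell such as bash; `no_such_compiler_xyz` is a made-up command name, and PID 1 stands in for "a PID that is not a child of this shell". Nothing here is taken from scripts/make.sh.

```shell
#!/bin/sh
# Case 1: the command is not in $PATH, so the shell itself reports 127.
# (no_such_compiler_xyz is a deliberately nonexistent, hypothetical name.)
missing_status=0
no_such_compiler_xyz -c tail.c 2>/dev/null || missing_status=$?

# Case 2: wait on a PID that is not a child of this shell.
# PID 1 is never a child of the script, so it stands in for a "lost" PID.
lost_status=0
wait 1 2>/dev/null || lost_status=$?

echo "command not found: $missing_status"
echo "wait on non-child: $lost_status"
```

Both cases report the same status, so the exit code alone cannot tell a missing compiler from a PID the shell no longer knows about. Telling them apart would need extra information, such as the text of the error message.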

Looking at the code (|scripts/make.sh|), we are wondering about your use
of |$(jobs -rp)|. Wouldn't it be more correct to add jobs to PENDING
using |$!| right after you launch the job (|do_loudly|)?

If you think that'll help, I'm happy to give it a try, sure.

Thanks,

Rob


drinkcat commented on June 24, 2024

On Fri, Feb 26, 2016 at 1:53 PM, Rob Landley wrote:

I prefer dealing with this sort of thing through the mailing list, I
only really use github as repository hosting.

Oh, sorry, adding the list to this reply.

On 02/25/2016 01:31 AM, drinkcat wrote:

We use toybox-0.7.0 as part of the Chromium OS project, and sometimes
hit an issue when building it on our automated builders (see this issue
https://bugs.chromium.org/p/chromium/issues/detail?id=584542):

|toybox-0.7.0: armv7a-cros-linux-gnueabi-gcc -O2 -O2 -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -g -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -clang-syntax -funsigned-char -Wno-string-plus-int -I . -Os -ffunction-sections -fdata-sections -fno-asynchronous-unwind-tables -fno-strict-aliasing -c toys/posix/tail.c -o generated/obj/tail.o
toybox-0.7.0: scripts/make.sh: line 270: wait: pid 8477 is not a child of this shell
toybox-0.7.0:

Hmmm... PID wrap, maybe?

That's what we were wondering about... The builder is building a lot of
other packages at the same time, including Chromium, so it's not unlikely
that the PID space is saturated... Also, the builder retries after the
first failure, and the second try always works (probably when the builder
is less busy...)

Looking at the code (|scripts/make.sh|), we are wondering about your use
of |$(jobs -rp)|. Wouldn't it be more correct to add jobs to PENDING
using |$!| right after you launch the job (|do_loudly|)?

If you think that'll help, I'm happy to give it a try, sure.

I have a commit ready here, that appears to fix the problem:
drinkcat@4c70562

It's a little less aggressive at parallelizing, as it always waits for the
first PID if PENDING is full (instead of refreshing the PENDING list every
time)...
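
As a rough illustration of the approach described above (record |$!| at launch, and wait for the oldest PID once PENDING is full), here is a minimal sketch. It is not the linked commit and not the code in scripts/make.sh: `MAX_JOBS`, `run_task`, `REAPED`, `FAILED`, and the file names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical job limit and placeholder task; not taken from scripts/make.sh.
MAX_JOBS=2
PENDING=""   # space-separated PIDs of launched jobs, oldest first
COUNT=0      # number of PIDs currently in PENDING
REAPED=0     # number of jobs we have waited for
FAILED=0     # set to 1 if any job exits nonzero

run_task() { echo "built $1"; }

for src in a.c b.c c.c d.c e.c; do
  run_task "$src" &
  PENDING="$PENDING $!"        # record the PID right after launching
  COUNT=$((COUNT + 1))
  if [ "$COUNT" -ge "$MAX_JOBS" ]; then
    set -- $PENDING            # split the list into $1, $2, ...
    wait "$1" || FAILED=1      # block on the oldest job only
    shift
    PENDING="$*"
    COUNT=$((COUNT - 1))
    REAPED=$((REAPED + 1))
  fi
done

# Drain whatever is still running.
for pid in $PENDING; do
  wait "$pid" || FAILED=1
  REAPED=$((REAPED + 1))
done
PENDING=""
echo "reaped $REAPED jobs, failed=$FAILED"
```

Because each PID is waited on exactly once and only by the shell that launched it, the list never depends on a later |$(jobs -rp)| snapshot. The cost is the one noted above: a slot stays occupied until the oldest job finishes, even if a newer one has already exited.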

I guess that you prefer I send the patch to the list? Or is a github PR
fine too?

Thanks!

Best,

Nicolas


drinkcat commented on June 24, 2024

On 02/26/2016 12:31 AM, Nicolas Boichat wrote:

On Fri, Feb 26, 2016 at 1:53 PM, Rob Landley wrote:
On 02/25/2016 01:31 AM, drinkcat wrote:
> We use toybox-0.7.0 as part of the Chromium OS project,

P.S. Yay!

> and sometimes
> hit an issue when building it on our automated builders (see this issue
> <https://bugs.chromium.org/p/chromium/issues/detail?id=584542>):
>
> |toybox-0.7.0: armv7a-cros-linux-gnueabi-gcc -O2 -O2 -pipe -march=armv7-a -mtune=cortex-a15 -mfpu=neon -mfloat-abi=hard -g -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -clang-syntax -funsigned-char -Wno-string-plus-int -I . -Os -ffunction-sections -fdata-sections -fno-asynchronous-unwind-tables -fno-strict-aliasing -c toys/posix/tail.c -o generated/obj/tail.o
> toybox-0.7.0: scripts/make.sh: line 270: wait: pid 8477 is not a child of this shell
> toybox-0.7.0:

Hmmm... PID wrap, maybe?

That's what we were wondering about... The builder is building a lot of
other packages at the same time, including Chromium, so it's not
unlikely that the PID space is saturated... Also, the builder retries
after the first failure, and the second try always works (probably when
the builder is less busy...)

Possibly the OS is killing zombies if it wants to reuse that PID before
the zombie is reaped? (Which would be a horrible heuristic because
process exit could happen after a long runtime but right before a new fork.)

Or maybe it's doing so if there are no more free PIDs, instead of
fork failing?

In either case, moving to $! wouldn't fix it. But that also wouldn't
explain why only bash was seeing the problem...

It's an interesting bug and I'd be interested in tracking it down if I
was willing to get sucked into debugging GPLv3 bash. (GPLv2 bash I spent
days tracking down weirdness, ala:

The initial problem:
http://landley.net/notes-2011.html#24-08-2011

Mentioned in passing:
http://landley.net/notes-2011.html#26-08-2011
http://landley.net/notes-2011.html#28-08-2011

Deep dig:
http://landley.net/notes-2011.html#02-09-2011
http://landley.net/notes-2011.html#03-09-2011
http://landley.net/notes-2011.html#04-09-2011

And finally finding it:
http://landley.net/notes-2011.html#05-09-2011

Yes, that's me happily digging through libc, kernel, and back into a
userspace program to find a problem. But if a GPLv3 program is involved,
"it's broken, let's replace it".)

> Looking at the code (|scripts/make.sh|), we are wondering about your use
> of |$(jobs -rp)|. Wouldn't it be more correct to add jobs to PENDING
> using |$!| right after you launch the job (|do_loudly|)?

If you think that'll help, I'm happy to give it a try, sure.

I have a commit ready here, that appears to fix the problem:
drinkcat@4c70562

I pushed a change last night based on your $! suggestion, did that fix
it? (Your patch is using ${%%} to filter, which is interesting. I
couldn't make ${//} work right but maybe that could replace my sed
invocation? Trying to get the number of execs in the dispatch/monitoring
cycle down as small as possible. Then again once it can build under a
toybox shell then it's just a fork() and not an exec, which is cheaper.
Eh, worry about it later...)
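
For reference, here is a minimal sketch of splitting a PID list with parameter expansion alone, so no |sed| process is exec'd. It only illustrates the |%%| and |#| operators on a made-up list; it is not a description of what the linked patch actually does, and the variable names are hypothetical.

```shell
#!/bin/sh
PENDING="8477 8480 8491"   # made-up PIDs for illustration

# %% removes the longest suffix matching " *", leaving the first PID.
first=${PENDING%% *}

# # removes the shortest prefix matching "* ", leaving the rest of the list.
# With a single-element list there is no space to match, so handle it apart.
case $PENDING in
  *" "*) rest=${PENDING#* } ;;
  *)     rest="" ;;
esac

echo "wait for: $first"
echo "still pending: $rest"
```

Both expansions are handled inside the shell, so this adds no fork or exec to the dispatch loop.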

It's a little less aggressive at parallelizing, as it always waits for
the first PID if PENDING is full (instead of refreshing the PENDING list
every time)...

So's the one I did last night. I should poke around on my 8-way machine
and see how it's doing keeping the cpus busy...

I guess that you prefer I send the patch to the list? Or is a github PR
fine too?

What would be really nice is if github gave me a button to get the
"git format-patch" version of the patch at the above URL. But of course
they don't do that, why would they do that?

When github emails me a pull request I can wget and "git am" from
there, so it's usable. (It's then up to the submitter to close said
request, but having a list of old irrelevant pull requests I've already
dealt with one way or another is github's problem, as far as I'm concerned.)

Posting them to the list gives other people the chance to chime in, but
I think we covered that here. :)

Thanks,

Rob


drinkcat commented on June 24, 2024

Followed up on list.


drinkcat commented on June 24, 2024

Also, e17fbf1 seems to fix it.
