Code Monkey home page Code Monkey logo

Comments (19)

shadowhand avatar shadowhand commented on September 26, 2024

The same error happens with god 0.7.22 as well.

from god.

cmelbye avatar cmelbye commented on September 26, 2024

It looks like God is no longer using that "critical=" method of Thread which no longer exists in Ruby 1.9, so it might work now.

from god.

eric avatar eric commented on September 26, 2024

god 0.8 has removed the usage of Thread.critical=.

I believe there may be a couple other problems that need to be solved before 1.9 works, though.

from god.

geoffgarside avatar geoffgarside commented on September 26, 2024

I'm getting this with 0.8.0 on Ruby 1.9.1

    Unhandled exception in driver loop - (NotImplementedError): timeout is not implemented yet 
    /usr/local/lib/ruby/1.9/monitor.rb:92:in `wait'
    /usr/local/lib/ruby19/gems/1.9/gems/god-0.8.0/lib/god/driver.rb:88:in `block in pop'
    /usr/local/lib/ruby/1.9/monitor.rb:190:in `mon_synchronize'
    /usr/local/lib/ruby19/gems/1.9/gems/god-0.8.0/lib/god/driver.rb:82:in `pop' 
    /usr/local/lib/ruby19/gems/1.9/gems/god-0.8.0/lib/god/driver.rb:146:in `block (2 levels) in initialize' 
    /usr/local/lib/ruby19/gems/1.9/gems/god-0.8.0/lib/god/driver.rb:144:in `loop' 
    /usr/local/lib/ruby19/gems/1.9/gems/god-0.8.0/lib/god/driver.rb:144:in `block in initialize'

I've not had a chance to dig into it though, but I'll have a look over the next few days.

from god.

igrigorik avatar igrigorik commented on September 26, 2024

bump

I think this should be promoted to critical...

from god.

eric avatar eric commented on September 26, 2024

igrigorik: Did the fix you tried end up working?

from god.

igrigorik avatar igrigorik commented on September 26, 2024

Partially. I installed your branch which introduces compat19.rb + grabbed thread.rb from latest head of ruby 1.9 to make it work. That solved the "driver loop" exception show above, but there are other bugs still.

Namely, god appears to hang indefinitely when it tries to fork a new process: I see the child that it's trying to spawn in the tree, and it appears to be stuck in that state. (Doesn't happen everytime & unfortunately don't know the exact conditions under which this happens)

from god.

eric avatar eric commented on September 26, 2024

Could you attach gdb to the process and give the output of running "where"?

That's odd that it would be hanging there — there isn't a lot that it does before running exec().

from god.

igrigorik avatar igrigorik commented on September 26, 2024

Attaching to the god process itself yields:

(gdb) where
#0  0x00002b9122f581ad in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000051622e in terminate_atfork_i (key=, val=, current_th=)
    at thread_pthread.c:178
#2  0x00000000004b00b9 in st_foreach (table=0x7dc8f0, func=0x516120 , arg=23970512) at st.c:708
#3  0x0000000000513d88 in rb_thread_atfork () at thread.c:2649
#4  0x000000000046f60e in rb_f_fork (obj=) at process.c:2618
#5  0x00000000004fedef in vm_call_cfunc (th=0x16dc2d0, reg_cfp=0x2aaaae83dc48, num=0, recv=16886520, blockptr=0x2aaaae83dc70, flag=202, me=0x8bbdd0)
    at vm_insnhelper.c:385
#6  0x0000000000511b43 in vm_call_method (th=0x16dc2d0, cfp=dwarf2_read_address: Corrupted DWARF expression.
) at vm_insnhelper.c:510
#7  0x00000000005045d6 in vm_exec_core (th=0x16dc2d0, initial=) at insns.def:994
#8  0x0000000000509a04 in vm_exec (th=0x16dd5b8) at vm.c:1099
#9  0x000000000050a0c3 in invoke_block_from_c (th=0x16dc2d0, block=0x2aaaae83df30, self=16897880, argc=0, argv=0x0, blockptr=0x0, cref=0x0)
    at vm.c:543
#10 0x000000000050a948 in loop_i () at vm.c:573
#11 0x0000000000416ae9 in rb_rescue2 (b_proc=0x50a910 , data1=0, r_proc=0, data2=0) at eval.c:574
#12 0x00000000004fe739 in rb_f_loop (self=16897880) at vm_eval.c:605
#13 0x00000000004fedef in vm_call_cfunc (th=0x16dc2d0, reg_cfp=0x2aaaae83df08, num=0, recv=16897880, blockptr=0x2aaaae83df30, flag=202, me=0x864210)
    at vm_insnhelper.c:385
#14 0x0000000000511b43 in vm_call_method (th=0x16dc2d0, cfp=dwarf2_read_address: Corrupted DWARF expression.
) at vm_insnhelper.c:510
#15 0x00000000005045d6 in vm_exec_core (th=0x16dc2d0, initial=) at insns.def:994
#16 0x0000000000509a04 in vm_exec (th=0x16dd5b8) at vm.c:1099
#17 0x000000000050a0c3 in invoke_block_from_c (th=0x16dc2d0, block=0x142fd40, self=16897880, argc=0, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:543
#18 0x000000000050a5d2 in rb_vm_invoke_proc (th=0x16dc2d0, proc=0x142fd40, self=16897880, argc=0, argv=0x101ac70, blockptr=0x0) at vm.c:590
#19 0x00000000005197d5 in thread_start_func_2 (th=0x16dc2d0, stack_start=0x40613120) at thread.c:387
#20 0x00000000005199ee in thread_start_func_1 (th_ptr=0x16dd5b8) at thread_pthread.c:351
#21 0x00002b9122f541b5 in start_thread () from /lib64/libpthread.so.0
#22 0x00002b9123afa36d in clone () from /lib64/libc.so.6
#23 0x0000000000000000 in ?? ()

Also, strace:

Process 10887 attached - interrupt to quit
futex(0x16dd5b8, FUTEX_WAIT, 2, NULL 
Process 10887 detached

When it's in this state, the process is completely unresponsive.. god log / status, etc., all return "server not available".

from god.

eric avatar eric commented on September 26, 2024

I've committed another method to remove the need to grab thread.rb. It is available here:

http://github.com/eric/god/tree/condition-variable-1.9-support

I'm still trying to figure out why it would be hanging here.

from god.

igrigorik avatar igrigorik commented on September 26, 2024

Eric, don't think that solved it. I'm trying to find the actual sequence that reproduces the error, but so far this it the pattern I'm seeing: startups fine, loads all the processes it should. Then I start randomly killing processes (kill -9) to get god to bring them back up. Works well for the most part, except every once in a while, it get's stuck. In ps I can see that god is trying to fork a process, but it's stuck in that state.. Below is the output of god process itself, and the child that's trying to fork:

god process itself..

God process itself:
-------------------

(gdb) where
#0  0x00002aec325a1376 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000000516561 in native_sleep (th=0x7dc500, tv=0x0) at thread_pthread.c:618
#2  0x0000000000519edc in thread_join_sleep (arg=) at thread.c:770
#3  0x0000000000416714 in rb_ensure (b_proc=0x519e80 , data1=140737486727824, e_proc=0x513170 , 
    data2=140737486727824) at eval.c:671
#4  0x0000000000514c7c in thread_join (target_th=0x1830c70, delay=1e+30) at thread.c:637
#5  0x0000000000514da2 in thread_join_m (argc=, argv=, self=) at thread.c:718
#6  0x00000000004fedef in vm_call_cfunc (th=0x7dc500, reg_cfp=0x2aec324a5a90, num=0, recv=15750920, blockptr=0x0, flag=0, me=0x8d9b60)
    at vm_insnhelper.c:385
#7  0x0000000000511b43 in vm_call_method (th=0x7dc500, cfp=dwarf2_read_address: Corrupted DWARF expression.
) at vm_insnhelper.c:510
#8  0x00000000005045d6 in vm_exec_core (th=0x7dc500, initial=) at insns.def:994
#9  0x0000000000509a04 in vm_exec (th=0x7dc594) at vm.c:1099
#10 0x000000000050a0c3 in invoke_block_from_c (th=0x7dc500, block=0x1619f80, self=8631080, argc=0, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:543
#11 0x000000000050a5d2 in rb_vm_invoke_proc (th=0x7dc500, proc=0x1619f80, self=8631080, argc=0, argv=0x10456a0, blockptr=0x0) at vm.c:590
#12 0x0000000000418ced in rb_exec_end_proc () at eval_jump.c:134
#13 0x0000000000418df4 in ruby_finalize_0 () at eval.c:107
#14 0x0000000000418ea9 in ruby_cleanup (ex=0) at eval.c:142
#15 0x0000000000419209 in ruby_stop (ex=8242580) at eval.c:217
#16 0x000000000046f65d in rb_f_fork (obj=) at process.c:2623
#17 0x00000000004fedef in vm_call_cfunc (th=0x7dc500, reg_cfp=0x2aec324a5c48, num=0, recv=16884920, blockptr=0x2aec324a5c70, flag=0, me=0x8bbdd0)
    at vm_insnhelper.c:385
#18 0x0000000000511b43 in vm_call_method (th=0x7dc500, cfp=dwarf2_read_address: Corrupted DWARF expression.
) at vm_insnhelper.c:510
#19 0x00000000005045d6 in vm_exec_core (th=0x7dc500, initial=) at insns.def:994
#20 0x0000000000509a04 in vm_exec (th=0x7dc594) at vm.c:1099
#21 0x000000000050be7b in vm_call0 (th=0x7dc500, recv=16884920, id=448, argc=1, argv=0x2aec323a60b0, me=0x1217660) at vm_eval.c:62
#22 0x00000000005105cd in rb_funcall2 (recv=16884920, mid=448, argc=1, argv=0x2aec323a60b0) at vm_eval.c:266
#23 0x0000000000449402 in rb_class_new_instance (argc=1, argv=0x2aec323a60b0, klass=) at object.c:1494
#24 0x00000000004fedef in vm_call_cfunc (th=0x7dc500, reg_cfp=0x2aec324a5e00, num=1, recv=16885080, blockptr=0x0, flag=0, me=0x84e330)
    at vm_insnhelper.c:385
#25 0x0000000000511b43 in vm_call_method (th=0x7dc500, cfp=dwarf2_read_address: Corrupted DWARF expression.
) at vm_insnhelper.c:510
#26 0x00000000005045d6 in vm_exec_core (th=0x7dc500, initial=) at insns.def:994
#27 0x0000000000509a04 in vm_exec (th=0x7dc594) at vm.c:1099
#28 0x0000000000509d71 in rb_iseq_eval (iseqval=16382880) at vm.c:1296
#29 0x000000000054e875 in rb_load_internal (fname=16492120, wrap=) at load.c:293
#30 0x000000000054e9cc in rb_f_load (argc=, argv=) at load.c:366
#31 0x00000000004fedef in vm_call_cfunc (th=0x7dc500, reg_cfp=0x2aec324a5f08, num=1, recv=8631080, blockptr=0x0, flag=0, me=0x8c61f0)
    at vm_insnhelper.c:385
#32 0x0000000000511b43 in vm_call_method (th=0x7dc500, cfp=dwarf2_read_address: Corrupted DWARF expression.
) at vm_insnhelper.c:510
#33 0x00000000005045d6 in vm_exec_core (th=0x7dc500, initial=) at insns.def:994

child process:

(gdb) where
#0  0x00002aec325a11ad in pthread_cond_destroy@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000000051622e in terminate_atfork_i (key=, val=, current_th=)
    at thread_pthread.c:178
#2  0x00000000004b00b9 in st_foreach (table=0x7dc8f0, func=0x516120 , arg=23356528) at st.c:708
#3  0x0000000000513d88 in rb_thread_atfork () at thread.c:2649
#4  0x000000000046f60e in rb_f_fork (obj=) at process.c:2618
#5  0x00000000004fedef in vm_call_cfunc (th=0x1646470, reg_cfp=0x2aaaae63bb98, num=0, recv=17068080, blockptr=0x2aaaae63bbc0, flag=202, me=0x8bbdd0)
    at vm_insnhelper.c:385
#6  0x0000000000511b43 in vm_call_method (th=0x1646470, cfp=dwarf2_read_address: Corrupted DWARF expression.
) at vm_insnhelper.c:510
#7  0x00000000005045d6 in vm_exec_core (th=0x1646470, initial=) at insns.def:994
#8  0x0000000000509a04 in vm_exec (th=0x1573918) at vm.c:1099
#9  0x000000000050a0c3 in invoke_block_from_c (th=0x1646470, block=0x2aaaae63bf30, self=17092960, argc=0, argv=0x0, blockptr=0x0, cref=0x0)
    at vm.c:543
#10 0x000000000050a948 in loop_i () at vm.c:573
#11 0x0000000000416ae9 in rb_rescue2 (b_proc=0x50a910 , data1=0, r_proc=0, data2=0) at eval.c:574
#12 0x00000000004fe739 in rb_f_loop (self=17092960) at vm_eval.c:605
#13 0x00000000004fedef in vm_call_cfunc (th=0x1646470, reg_cfp=0x2aaaae63bf08, num=0, recv=17092960, blockptr=0x2aaaae63bf30, flag=202, me=0x864210)
    at vm_insnhelper.c:385
#14 0x0000000000511b43 in vm_call_method (th=0x1646470, cfp=dwarf2_read_address: Corrupted DWARF expression.
) at vm_insnhelper.c:510
#15 0x00000000005045d6 in vm_exec_core (th=0x1646470, initial=) at insns.def:994
#16 0x0000000000509a04 in vm_exec (th=0x1573918) at vm.c:1099
#17 0x000000000050a0c3 in invoke_block_from_c (th=0x1646470, block=0x1815bb0, self=17092960, argc=0, argv=0x0, blockptr=0x0, cref=0x0) at vm.c:543
#18 0x000000000050a5d2 in rb_vm_invoke_proc (th=0x1646470, proc=0x1815bb0, self=17092960, argc=0, argv=0x10470b8, blockptr=0x0) at vm.c:590
#19 0x00000000005197d5 in thread_start_func_2 (th=0x1646470, stack_start=0x40511120) at thread.c:387
#20 0x00000000005199ee in thread_start_func_1 (th_ptr=0x1573918) at thread_pthread.c:351
#21 0x00002aec3259d1b5 in start_thread () from /lib64/libpthread.so.0
#22 0x00002aec3314336d in clone () from /lib64/libc.so.6
#23 0x0000000000000000 in ?? ()

from god.

igrigorik avatar igrigorik commented on September 26, 2024

A few more observations: if I kill the forked process, god responds to commands, but it doesn't bring up the process on which it blocked in the first place. However, if I issue an explicit "god restart proc", it does seem to come back.

trying to figure out the actual pattern

from god.

eric avatar eric commented on September 26, 2024

This is the same issue as #8.

from god.

mojombo avatar mojombo commented on September 26, 2024

The Ruby 1.9 mutex patches are now in God 0.9.0. If anyone has additional feedback on how well it works now, I'd much appreciate it!

from god.

ambethia avatar ambethia commented on September 26, 2024

It works fine for a while, but eventually:

$ ruby --version

ruby 1.9.1p420 (2010-02-04) [x86_64-linux]

$ god --version

Version 0.9.0

$ top

top - 11:31:46 up 1 day,  1:24,  1 user,  load average: 1.35, 1.19, 1.11
Tasks:  89 total,   1 running,  88 sleeping,   0 stopped,   0 zombie
Cpu(s): 52.5%us,  0.3%sy,  0.0%ni, 47.0%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4058360k total,  2327260k used,  1731100k free,   144404k buffers
Swap: 11888060k total,        0k used, 11888060k free,  1877324k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                                                         
 1107 root      20   0 87852  22m 2948 S  100  0.6   1498:55 god                                                                                                                                                                                                              
14996 deploy    20   0  174m  69m 3128 S    5  1.8   0:49.47 ruby                                                                                                                                                                                                             
  822 mongodb   20   0 4210m 1.5g 1.5g S    0 39.7   0:58.81 mongod                                                                                                                                                                                                           
 1189 www-data  20   0  414m 4908 1832 S    0  0.1   0:03.24 apache2
[...]

$ sudo strace -p 1107

Process 1107 attached - interrupt to quit
futex(0xf7c594, FUTEX_WAIT_PRIVATE, 5, NULL

$ sudo gdb --pid=1107

[...]
(gdb) where
\#0  0x00007f08fc6cf5a9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
\#1  0x00000000004e5245 in native_sleep (th=0xf7c500, tv=0x0) at thread_pthread.c:614
\#2  0x00000000004e63b1 in sleep_forever (th=0xf7c500, deadlockable=1) at thread.c:774
\#3  0x00000000004e641a in thread_join_sleep (arg=<value optimized out>) at thread.c:608
\#4  0x0000000000418adb in rb_ensure (b_proc=<value optimized out>, data1=<value optimized out>, e_proc=<value optimized out>, data2=<value optimized out>) at eval.c:680
\#5  0x00000000004e7d5c in thread_join (target_th=0x2128d90, delay=1e+30) at thread.c:641
\#6  0x00000000004e7e42 in thread_join_m (argc=<value optimized out>, argv=<value optimized out>, self=<value optimized out>) at thread.c:722
\#7  0x00000000004d1e12 in vm_call_cfunc (th=<value optimized out>, reg_cfp=<value optimized out>, num=<value optimized out>, id=<value optimized out>, oid=<value optimized out>, recv=2, klass=16421720, flag=0, mn=0xfa66b8, blockptr=0x0) at vm_insnhelper.c:384
\#8  0x00000000004dc019 in vm_call_method (th=0xf7c500, cfp=0x7f08fcaefa10, num=0, blockptr=0x213, flag=0, id=0, mn=0xfa6680, recv=22885408) at vm_insnhelper.c:517
\#9  0x00000000004d6f0c in vm_exec_core (th=0xf7c500, initial=<value optimized out>) at insns.def:999
\#10 0x00000000004d9ec2 in vm_exec (th=<value optimized out>) at vm.c:1082
\#11 0x00000000004db340 in invoke_block_from_c (th=<value optimized out>, proc=<value optimized out>, self=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>, blockptr=<value optimized out>) at vm.c:526
\#12 rb_vm_invoke_proc (th=<value optimized out>, proc=<value optimized out>, self=<value optimized out>, argc=<value optimized out>, argv=<value optimized out>, blockptr=<value optimized out>) at vm.c:573
\#13 0x000000000041a960 in rb_exec_end_proc () at eval_jump.c:133
\#14 0x000000000041aa24 in ruby_finalize_0 () at eval.c:108
\#15 0x000000000041aafe in ruby_cleanup (ex=<value optimized out>) at eval.c:143
\#16 0x000000000041ad39 in ruby_stop (ex=16237972) at eval.c:218
\#17 0x0000000000461107 in rb_f_fork (obj=<value optimized out>) at process.c:2394
\#18 0x00000000004d1e12 in vm_call_cfunc (th=<value optimized out>, reg_cfp=<value optimized out>, num=<value optimized out>, id=<value optimized out>, oid=<value optimized out>, recv=2, klass=16792552, flag=8, mn=0xfbb8e8, blockptr=0x7f08fcaefc18)
    at vm_insnhelper.c:384
\#19 0x00000000004dc019 in vm_call_method (th=0xf7c500, cfp=0x7f08fcaefbf0, num=0, blockptr=0x213, flag=8, id=0, mn=0xfbb878, recv=27407576) at vm_insnhelper.c:517
\#20 0x00000000004d6f0c in vm_exec_core (th=0xf7c500, initial=<value optimized out>) at insns.def:999
\#21 0x00000000004d9ec2 in vm_exec (th=<value optimized out>) at vm.c:1082
---Type <return> to continue, or q <return> to quit---

from god.

igrigorik avatar igrigorik commented on September 26, 2024

Bump. Same issue, still there on 1.9.2 p0, another trace:
https://gist.github.com/9ccfda78c52356e74291

from god.

igrigorik avatar igrigorik commented on September 26, 2024

Good news.. we have a Ruby patch which address this. Full details:
https://gist.github.com/762807#comments

(unfortunately rubylang redmine is down atm)

from god.

dbackeus avatar dbackeus commented on September 26, 2024

Is this fix included in the new 1.9.2 release (p180)?

from god.

igrigorik avatar igrigorik commented on September 26, 2024

According to the changelog @ http://svn.ruby-lang.org/repos/ruby/tags/v1_9_2_180/ChangeLog

Unfortunately, don't see it..

http://redmine.ruby-lang.org/repositories/revision/ruby-19?rev=30272

from god.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.