kostya / eye
Process monitoring tool. Inspired by Bluepill and God.
License: MIT License
It seems the default behavior is not to reload the .eye file with the restart command. Is the only way to do it to stop monitoring and then reload the config file? It would be nice to reload the config file automatically when you need to add new environment variables and such.
I would like to be able to set uid and gid for my app that uses Ruby 1.9.3. I'm moving from Bluepill to Eye, and it would be really helpful if Eye had this functionality. Is support for 1.9.3 impossible with Eye's current structure, or could this be added? When I load eye, I get this error message: :uid not supported (use ruby >= 2.0).
When I add something like:
depend_on 'proxy'
It appears any other checks and transitions in the process are completely ignored. For example, all of the following get ignored:
trigger :transition, to: :starting, do: -> {
  month = 60 * 60 * 24 * 28
  process.wait_for_condition(month, 15) do
    info "Waiting for camera #{url} to be up"
    timeout = 5 * 1_000_000 # 5 seconds in microseconds
    system("ffmpeg -loglevel warning -y -stimeout #{timeout} " \
           "-i #{url} -c:v copy -an -vframes 1 -f rawvideo /dev/null")
  end
}

check :rtsp, every: 30.seconds, times: 2, addr: rtsp_url
Hello, I am trying to get eye to work. I am setting some variables at the top of the file:
RAILS_ENV = ENV['RAILS_ENV'] || 'development'
RAILS_ROOT = ENV['RAILS_ROOT'] || File.expand_path('.')
Eye.config do
  logger File.join(RAILS_ROOT, "log", "eye.log")
end
When I start eye, it complains that "/log/eye.log" does not exist. It seems that it is not picking up the RAILS_ROOT variable.
This only happens in daemon mode; if I run it in the foreground, it works fine. What is wrong? Is there a better way to do this?
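One likely cause (an assumption, not confirmed from the report): daemonizing typically changes the working directory, so the File.expand_path('.') fallback resolves against the daemon's cwd (often "/") rather than where you launched eye, and RAILS_ROOT ends up as "/". A minimal sketch of the effect, with a hypothetical absolute-path fallback:

```ruby
# Demonstrates why a relative fallback breaks under daemonization:
# File.expand_path('.') resolves against the *current* working directory,
# which a daemonized process may have changed to "/".
path = Dir.chdir('/') { File.expand_path('.') }
puts path  # => "/"

# A safer default is an absolute path (this path is hypothetical):
RAILS_ROOT = ENV['RAILS_ROOT'] || '/srv/app/current'
```

With an absolute fallback, the log path no longer depends on where the eye daemon happens to be running.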
I'm trying to follow the Bluepill and God format of splitting the .eye files into separate ones for each service.
Eye.application "production" do
  ENV['RAILS_ENV'] = "production"
  ENV['RAKE_ROOT'] = "/usr/local/rvm/gems/ruby-2.0.0-p247"
  ENV['RAILS_ROOT'] = "/srv/ectasio/current"
  #load ENV['RAILS_ROOT'] + "/config/eye/nginx.eye"
  #load ENV['RAILS_ROOT'] + "/config/eye/mysql.eye"
  #load ENV['RAILS_ROOT'] + "/config/eye/cumulus.eye"
  #load ENV['RAILS_ROOT'] + "/config/eye/solr_jetty.eye"
  load ENV['RAILS_ROOT'] + "/config/eye/sidekiq.eye"
  load ENV['RAILS_ROOT'] + "/config/eye/unicorn.eye"
  load ENV['RAILS_ROOT'] + "/config/eye/private_pub.eye"
end
It would be nice to set global env variables and then load each .eye file separately. What would be the best way to do this?
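One possible shape for this (a sketch, not a confirmed recipe: whether env set on the application is inherited by processes defined in separately loaded files depends on how those files re-open the application, so verify against your setup; all paths here are hypothetical):

```ruby
# Sketch: set shared environment through the DSL's `env` (which nested
# processes inherit) rather than mutating ENV, then load the per-service
# files. Eye.load is the same call shown elsewhere in this thread.
Eye.application "production" do
  env 'RAILS_ENV'  => 'production',
      'RAILS_ROOT' => '/srv/app/current'   # hypothetical path
end

Dir['/srv/app/current/config/eye/*.eye'].sort.each do |f|
  Eye.load(f)
end
```

The Dir glob avoids listing every service file by hand; sorting keeps the load order deterministic.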
Is there a way to notify, without a restart or stop, when observing a process? Also, is there a way to observe the machine's overall memory or CPU and send a notification? I've been scouring the code and docs since yesterday, as well as making some educated guesses, but haven't found anything so far.
Dear Kostya,
Last night I spent a lot of time reading the docs, sources, and old issues, and refactored my code based on your previous comments.
However, even after extensive research, I'm still having several issues getting services started with eye. I hope you can help me fix them with suggestions on how to improve.
I've got eye set up for managing my Rails and Sidekiq processes; however, I wanted to manage eye via Upstart, to start eye on boot and monitor eye in case it dies unexpectedly.
I've tried all three variations on the expect stanza:
- no expect stanza (0 forks)
- expect fork (1 fork)
- expect daemon (2 forks)
Each time, Upstart ends up tracking a different PID than the one eye is running on.
Following Upstart's documentation for establishing the fork count (http://upstart.ubuntu.com/cookbook/#how-to-establish-fork-count), the strace log returns a value of 17, which seems excessively high.
Here's the Upstart config I'm using for reference:
description 'Eye'
start on runlevel [2]
stop on runlevel [016]
expect daemon
exec su -l -c 'eye load /home/deploy/sample.eye' deploy
respawn
Process forking is a bit new to me in Ruby, so I'm hoping someone else can shed some light on this for me.
Once it's solved, I'll gladly work up a wiki page to help document this for future users.
Thanks in advance!
Amazing work on this code!
eye (>= 0) depends on
  activesupport (~> 3.2)
Could you make it support Rails 4?
When I run:
$ bundle exec eye q -s
It hangs forever (well, until the timeout) and when I look at eye.log, I see:
[2014-05-13 19:36:05 +0100] [recorder:52ab35a548616311d3360000:proxy] trigger(transition) Waiting for camera rtsp://username:[email protected]/media/video1 to be up
[2014-05-13 19:36:20 +0100] [recorder:52ab35a548616311d3360000:proxy] trigger(transition) Waiting for camera rtsp://username:[email protected]/media/video1 to be up
[2014-05-13 19:36:40 +0100] [recorder:52ab35a548616311d3360000:proxy] trigger(transition) Waiting for camera rtsp://username:[email protected]/media/video1 to be up
The process in question is:
process :proxy do
  trigger :transition, to: :starting, do: -> {
    month = 60 * 60 * 24 * 28
    process.wait_for_condition(month, 15) do
      info "Waiting for camera #{url} to be up"
      timeout = 5 * 1_000_000 # 5 seconds in microseconds
      system("ffmpeg -loglevel warning -y -stimeout #{timeout} " \
             "-i #{url} -c:v copy -an -vframes 1 -f rawvideo /dev/null")
    end
  }
  daemonize true
  start_command "live555ProxyServer -p #{port} -V #{url}"
  check :rtsp, every: 30.seconds, times: 2, addr: "rtsp://127.0.0.1:#{port}/proxyStream"
end
Hi,
What's the best way to add eye's log files to e.g. logrotate? Does it support a special signal/command to release its file handle and log to a new file?
I have a process group:
group camera_id do
  process :proxy do
    ...
  end

  process :recorder do
    trigger :transition, to: :starting, do: -> {
      p = Eye::Control.process_by_name("proxy")
      process.wait_for_condition(60, 5) do
        p ? p.state_name == :up : false
      end
    }
    ...
  end
end
However, it seems the process_by_name method just returns a random process called "proxy" instead of the process from this group. Is there any way to scope the results to the group camera_id?
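One thing worth trying, with a loud caveat: the lookup below assumes your eye version's Eye::Control exposes a full-name lookup alongside process_by_name, which I have not verified, so check the Eye::Control source first. If it exists, the full "application:group:process" name disambiguates between groups:

```ruby
# Hypothetical sketch: scope the lookup with the full name instead of the
# bare process name. The full-name lookup method and the name components
# are assumptions; verify both against your eye version.
trigger :transition, to: :starting, do: -> {
  full = "recorder:#{camera_id}:proxy"   # app name is hypothetical
  p = Eye::Control.process_by_full_name(full)
  process.wait_for_condition(60, 5) do
    p ? p.state_name == :up : false
  end
}
```

If no such lookup exists, filtering the result of an all-processes listing by group name would be the fallback.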
Every now and then we get this:
/unicorn-4.6.2/lib/unicorn/http_server.rb:263:in `syswrite': Broken pipe (Errno::EPIPE)
and unicorn fails to start. Running eye stop unicorn and then eye start unicorn fixes it.
Here's the config:
RUBY = '/usr/bin/ruby'
rails_env = 'production'
Eye.application('rails_unicorn') do
  process('unicorn') do
    working_dir '/var/www/projects/current'
    if File.exist? File.join(working_dir, 'Gemfile')
      clear_bundler_env
      env 'BUNDLE_GEMFILE' => File.join(working_dir, 'Gemfile')
    end
    env "RAILS_ENV" => rails_env
    # unicorn requires `ruby` to be in PATH (for soft restart)
    env "PATH" => "#{File.dirname(RUBY)}:#{ENV['PATH']}"
    pid_file 'tmp/pids/unicorn.pid'
    stdall 'log/eye-unicorn.log'
    start_command "/usr/local/bin/bundle exec unicorn_rails -c /var/www/projects/current/config/unicorn/#{rails_env}.rb -E #{rails_env} -D"
    stop_command 'kill -QUIT {PID}'
    restart_command 'kill -USR2 {PID}'
    # stop signals: http://unicorn.bogomips.org/SIGNALS.html
    stop_signals [:TERM, 10.seconds]
    start_timeout 30.seconds
    restart_grace 30.seconds
    monitor_children do
      stop_command "kill -QUIT {PID}"
      check :cpu, :every => 30, :below => 75, :times => 3
      check :memory, :every => 30, :below => 500.megabytes, :times => [3,5]
    end
  end
end
The user is deploy, and permissions on the unicorn socket are fine. This process worked with Bluepill without this error message. Any ideas?
I would like to be able to run eye as my deploy user (rather than root) and be able to start nginx (or any other process that needs root privileges) and monitor that process.
Right now, the Process.kill(0, pid) way of determining liveness only works for processes owned by my deploy user, not root. At the moment, I'm starting nginx via a sudo command that my deploy user can run.
I tried to search for solutions to do this in the issues and/or the wiki, but didn't find anything specifically related to this. Is there a best practice that I'm missing? Is the best practice to run as root?
Hi
We found that the Upstart config suggested on the wiki is not good enough: when you issue the stop command, the services eye was monitoring stay up while eye itself gets killed by Upstart, which in turn causes problems when trying to start services (unicorn etc.) that are already up.
To fix this and allow eye to stop all services, we made some (tested) modifications to the Upstart config file:
description "Eye Monitoring System"
start on runlevel [2345]
stop on runlevel [016]
expect fork
kill timeout 60 # when Upstart issues a stop, send SIGTERM and wait 60 sec before sending SIGKILL
setuid deploy
setgid deploy
respawn
# ensure eye's home folder is set (it stores the pid, the state history, and the eye socket file in .eye)
env EYE_HOME=/home/deploy
# log stdout and stderr to /var/log/upstart/eye
console log
# important for unicorn to create a socket folder & set permissions before run
pre-start script
mkdir -p "/var/run/unicorn"
chown -R deploy:deploy "/var/run/unicorn"
end script
# load all eye services - upstart will monitor eye, and eye will monitor its own processes
script
exec /usr/local/bin/eye load /etc/eye/*.eye
end script
# this section ensures services won't stay up while Upstart kills the actual eye process on a stop
pre-stop script
/usr/local/bin/eye stop all # stop all eye services
/bin/sleep 15s # wait 15 sec before Upstart sends SIGTERM to eye
end script
Hi!
We're monitoring/controlling haproxy instances with eye. The restart_command does a so-called 'soft restart' ('/usr/sbin/haproxy -D -f /etc/haproxy/haproxy_test.conf -sf {PID}'), which works so far, and haproxy itself updates the pid file with the new pid, as expected. But eye then ignores the new pid entry and rewrites the pid file with the old instance's pid. That's really bad, because once the old haproxy instance stops (when all connections on it are closed), eye will restart haproxy again and again.
08.04.2014 17:10:51 WARN -- [haproxy_test:haproxy_test] check_alive: pid_file(/var/run/haproxy/haproxy_test.pid) changes by itself (pid:6751) => (pid:7118), not under eye control, so ignored
08.04.2014 17:10:56 WARN -- [haproxy_test:haproxy_test] check_alive: pid_file(/var/run/haproxy/haproxy_test.pid) changes by itself (pid:6751) => (pid:7118), > 120 ago, so rewrited (even if pid_file not under eye control)
Is it possible to add an option so that eye doesn't ignore a pid change?
The corresponding eye config:
# vi: set ft=ruby :
Eye.load("/etc/eye/root/config.rb")

Eye.application "haproxy_test" do
  working_dir "/"
  process "haproxy_test" do
    start_command '/usr/sbin/haproxy -D -f /etc/haproxy/haproxy_test.conf'
    stop_command '/bin/kill -SIGUSR1 {PID}'
    restart_command '/usr/sbin/haproxy -D -f /etc/haproxy/haproxy_test.conf -sf {PID}'
    pid_file '/var/run/haproxy/haproxy_test.pid'
    stdall '/var/log/haproxy.log'
  end
end
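One knob that may help here, with a caveat: the option name below is an assumption based on eye's process defaults, so verify it against your eye version's process options before relying on it. The idea is to shorten the grace period after which a pid_file changed by the process itself is adopted, so eye accepts haproxy's new master pid instead of reverting it:

```ruby
# Hypothetical sketch: adopt a pid_file rewritten by the process itself
# after a short grace period. The option name auto_update_pidfile_grace
# is unverified -- see the caveat above.
process "haproxy_test" do
  start_command '/usr/sbin/haproxy -D -f /etc/haproxy/haproxy_test.conf'
  restart_command '/usr/sbin/haproxy -D -f /etc/haproxy/haproxy_test.conf -sf {PID}'
  pid_file '/var/run/haproxy/haproxy_test.pid'
  auto_update_pidfile_grace 5.seconds
end
```

The "> 120 ago, so rewrited" warning in the log above suggests eye already has such a threshold internally; the question is whether it is exposed per process.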
I have a problem with eye restart ending up with two unicorn master processes. My config is very similar to the unicorn.eye example in this repo.
Using eye start and eye stop seems to work well:
deploy@apps:~$ eye start dagensdoman
command :start sent to [dagensdoman]
deploy@apps:~$ eye info
dagensdoman
unicorn ......................... starting
deploy@apps:~$ eye info
dagensdoman
unicorn ......................... up (19:12, 0%, 75Mb, <8345>)
child-8379 .................... up (19:12, 0%, 71Mb, <8379>)
child-8382 .................... up (19:12, 0%, 71Mb, <8382>)
child-8385 .................... up (19:12, 0%, 71Mb, <8385>)
child-8388 .................... up (19:12, 0%, 71Mb, <8388>)
deploy@apps:~$ eye stop dagensdoman
command :stop sent to [dagensdoman]
deploy@apps:~$ eye info
dagensdoman
unicorn ......................... unmonitored (stop by user at 04 Mar 19:13)
But using eye start and then eye restart seems to end up with two unicorn master processes:
deploy@apps:~$ eye start dagensdoman
command :start sent to [dagensdoman]
deploy@apps:~$ eye info
dagensdoman
unicorn ......................... up (19:15, 0%, 75Mb, <8544>)
child-8551 .................... up (19:15, 0%, 71Mb, <8551>)
child-8554 .................... up (19:15, 0%, 71Mb, <8554>)
child-8556 .................... up (19:15, 0%, 71Mb, <8556>)
child-8560 .................... up (19:15, 0%, 71Mb, <8560>)
deploy@apps:~$ eye restart dagensdoman
command :restart sent to [dagensdoman]
deploy@apps:~$ eye info
dagensdoman
unicorn ......................... restarting
child-8551 .................... up (19:15, 0%, 71Mb, <8551>)
child-8554 .................... up (19:15, 0%, 71Mb, <8554>)
child-8556 .................... up (19:15, 0%, 71Mb, <8556>)
child-8560 .................... up (19:15, 0%, 71Mb, <8560>)
deploy@apps:~$ eye info
dagensdoman
unicorn ......................... up (19:15, 0%, 74Mb, <8544>)
child-8551 .................... up (19:15, 0%, 71Mb, <8551>)
child-8554 .................... up (19:15, 0%, 71Mb, <8554>)
child-8556 .................... up (19:15, 0%, 71Mb, <8556>)
child-8560 .................... up (19:15, 0%, 71Mb, <8560>)
child-8626 .................... up (19:16, 0%, 76Mb, <8626>)
deploy@apps:~$ ps x | grep unicorn
8544 ? Sl 0:04 unicorn master (old) -Dc ./config/unicorn.rb -E production
8551 ? Sl 0:00 unicorn worker[0] -Dc ./config/unicorn.rb -E production
8554 ? Sl 0:00 unicorn worker[1] -Dc ./config/unicorn.rb -E production
8556 ? Sl 0:00 unicorn worker[2] -Dc ./config/unicorn.rb -E production
8560 ? Sl 0:00 unicorn worker[3] -Dc ./config/unicorn.rb -E production
8626 ? Sl 0:04 unicorn master -Dc ./config/unicorn.rb -E production
8662 ? Sl 0:00 unicorn worker[0] -Dc ./config/unicorn.rb -E production
8665 ? Sl 0:00 unicorn worker[1] -Dc ./config/unicorn.rb -E production
8667 ? Sl 0:00 unicorn worker[2] -Dc ./config/unicorn.rb -E production
8670 ? Sl 0:00 unicorn worker[3] -Dc ./config/unicorn.rb -E production
8744 pts/0 S+ 0:00 grep --color=auto unicorn
For some reason the old unicorn master is still alive, and the new unicorn master process (8626) is picked up as a child.
Any ideas?
Thanks,
Martin
I have this config
Eye.application "rails" do
  working_dir '/var/www'

  # global check for all processes
  check :cpu, :below => 90, :times => 3, :every => 30.seconds
  check :memory, :every => 30.seconds, :below => 420.megabytes, :times => 3

  notify :developers, :debug
  trigger :flapping, :times => 3, :within => 2.minute, :retry_in => 15.seconds

  #any_status = [:starting, :restarting, :up, :down, :unmonitored, :stopping]
  #trigger :transition, to: any_status, from: any_status, do: ->{ process.notify :debug, "is #{s.to_s.upcase}" }

  # Notify any transition
  trigger :transition, do: ->{ process.notify :debug, "is #{@transition.event.upcase}" }

  #start_timeout 30.seconds
  #start_grace 30.seconds
  #restart_grace 20.seconds
  #stop_grace 20.seconds

  stop_on_delete true

  process :puma do
    pid_file "/eye/pid/puma.pid"
    stdall '/eye/log/puma.log'
    start_command "bundle exec puma"
    restart_command "kill -USR2 {{PID}}"
    stop_signals [:TERM, 5.seconds, :KILL]
    daemonize true
  end

  process :swf do
    pid_file "/eye/pid/swf.pid"
    stdall '/eye/log/swf.log'
    start_command "bundle exec rake swf:start_workers"
    daemonize true
  end
end
Puma starts, and I can curl it for a few seconds and it properly serves things, but shortly after, it's down, probably because eye killed it. I don't know what could be wrong in the config. Any ideas? I've tried my best to scan the source for an answer, without much luck. Thanks.
Hi,
eye info bla returns an empty line with exit code 0. I think it would be better to output an error message with an exit code > 0. This would be much more useful for scripts or systems like Chef or Puppet to detect whether a process is loaded (at the moment you have to grep for empty lines).
Is there any way to tell one process in a group to start only when another process in the group has already started?
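The pattern shown elsewhere in this thread can be adapted for this: a :transition trigger that holds one process in :starting until the other reports :up. The process names below are placeholders:

```ruby
# Sketch: :b waits (up to 60s, polling every 5s) for :a to be :up
# before its own start proceeds.
process :b do
  trigger :transition, to: :starting, do: -> {
    a = Eye::Control.process_by_name('a')
    process.wait_for_condition(60, 5) do
      a ? a.state_name == :up : false
    end
  }
end
```

Note the caveat raised in another issue here: process_by_name matches on the bare name across groups, so keep the dependent processes uniquely named.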
I'm using eye for Adhearsion with a JRuby wrapper over ahn. Normally the app starts in 5-6 seconds, but with eye I get:
05.03.2014 11:55:13 INFO -- [Eye] client command: ping (0.007122302s)
05.03.2014 11:55:13 INFO -- [Eye] loading: ["/home/letmecallu_voip/eye/vozio.eye"]
05.03.2014 11:55:13 INFO -- [Eye] loading: /home/letmecallu_voip/eye/vozio.eye
05.03.2014 11:55:13 INFO -- [vozio:default] send_command: monitor
05.03.2014 11:55:13 INFO -- [vozio:default] schedule :monitor (reason: monitor by user)
05.03.2014 11:55:13 INFO -- [vozio:default] => monitor (reason: monitor by user)
05.03.2014 11:55:13 INFO -- [vozio:default] starting async with 0.2s chain monitor []
05.03.2014 11:55:13 INFO -- [vozio:ahn] schedule :monitor (reason: monitor by user)
05.03.2014 11:55:13 INFO -- [Eye] loaded: ["/home/letmecallu_voip/eye/vozio.eye"], selfpid <11987>
05.03.2014 11:55:13 INFO -- [vozio:ahn] => monitor (reason: monitor by user)
05.03.2014 11:55:13 INFO -- [vozio:ahn] pid_file not found, starting...
05.03.2014 11:55:13 INFO -- [vozio:default] <= monitor
05.03.2014 11:55:13 INFO -- [vozio:ahn] switch :starting [:unmonitored => :starting](reason: monitor by user)
05.03.2014 11:55:13 INFO -- [vozio:ahn] executing: jruby_ahn -
with start_timeout: 180.0s, start_grace: 2.5s, env: nil, working_dir: /home/letmecallu_voip/voip-server-side/vozio-server-rayo
05.03.2014 11:55:13 INFO -- [Eye] client command: load /home/letmecallu_voip/eye/vozio.eye (0.011918452s)
05.03.2014 11:56:33 WARN -- [Eye::System] [ahn] sending :KILL signal to <960> due to timeout (180s)
05.03.2014 11:56:33 ERROR -- [vozio:ahn] execution failed with #<Timeout::Error: execution expired>; try increasing the start_timeout value (the current value of 180s seems too short)
05.03.2014 11:56:33 ERROR -- [vozio:ahn] process <> failed to start ("#<Timeout::Error: execution expired>")
05.03.2014 11:56:33 INFO -- [vozio:ahn] switch :crashed [:starting => :down](reason: monitor by user)
05.03.2014 11:56:33 INFO -- [vozio:ahn] schedule :check_crash (reason: crashed)
05.03.2014 11:56:33 INFO -- [vozio:ahn] <= monitor
05.03.2014 11:56:33 INFO -- [vozio:ahn] => delete (reason: delete by user)
05.03.2014 11:56:33 INFO -- [vozio:ahn] <= delete
05.03.2014 11:56:33 WARN -- [celluloid] Terminating task: type=:call, meta={:method_name=>:process}, status=:callwait
It would be great to be able to 'reload' a process. Should we add a dedicated reload command, or a way to add user-defined commands via the DSL?
Hi,
reel-eye 0.5.2 is awesome, and was working fine until we recently updated its dependencies.
0.5.2 started crashing on all our machines like so:
uninitialized constant HTTP::Header
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/gems/2.0.0/gems/reel-0.4.0/lib/reel/response.rb:3:in `<class:Response>'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/gems/2.0.0/gems/reel-0.4.0/lib/reel/response.rb:2:in `<module:Reel>'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/gems/2.0.0/gems/reel-0.4.0/lib/reel/response.rb:1:in `<top (required)>'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/gems/2.0.0/gems/reel-0.4.0/lib/reel.rb:18:in `<top (required)>'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/gems/2.0.0/gems/reel-rack-0.1.0/lib/reel/rack/server.rb:3:in `<top (required)>'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:58:in `require'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:45:in `require'
/usr/local/Cellar/ruby/2.0.0-p247/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:45:in `require'
Any idea what's going on? The error only occurs when we try to configure with http enable: true. We tried playing with the versions of celluloid, reel, and reel-rack, but no luck.
reel-rack 0.2.0 came out recently; is this related?
Thank you so much!
I'm trying to run eye under supervisord as a user called "deploy".
I'm getting this error only when not running as root:
/home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/lib/eye/local.rb:10:in `join': no implicit conversion of nil into String (TypeError)
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/lib/eye/local.rb:10:in `dir'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/lib/eye/local.rb:31:in `path'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/lib/eye/local.rb:39:in `socket_path'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/lib/eye/cli/commands.rb:5:in `client'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/lib/eye/cli/commands.rb:9:in `_cmd'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/lib/eye/cli/server.rb:5:in `server_started?'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/lib/eye/cli.rb:58:in `load'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/thor-0.18.1/lib/thor/command.rb:27:in `run'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/thor-0.18.1/lib/thor/invocation.rb:120:in `invoke_command'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/thor-0.18.1/lib/thor.rb:363:in `dispatch'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/thor-0.18.1/lib/thor/base.rb:439:in `start'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/gems/eye-0.5/bin/eye:5:in `<top (required)>'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/bin/eye:23:in `load'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/bin/eye:23:in `<main>'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/bin/ruby_executable_hooks:15:in `eval'
from /home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/bin/ruby_executable_hooks:15:in `<main>'
My supervisor conf:
[program:eye]
command=/home/deploy/.rvm/bin/chefbrowser_eye load /home/deploy/apps/chef-browser/eye/puma.eye
autostart=true
autorestart=true
startsecs=5
startretries=0
stopsignal=TERM
stopwaitsecs=5
user=root
redirect_stderr=false
stdout_logfile=/var/log/apps/eye_stdout.log
stdout_logfile_maxbytes=50MB
stdout_logfile_backups=0
stdout_events_enabled=false
stderr_logfile=/var/log/apps/eye_stderr.log
stderr_logfile_maxbytes=50MB
stderr_logfile_backups=0
directory=/home/deploy/apps/chef-browser
environment=PATH="/home/deploy/.rvm/gems/ruby-2.0.0-p353@chefbrowser/bin:/home/deploy/.rvm/gems/ruby-2.0.0-p353@global/bin:/home/deploy/.rvm/rubies/ruby-2.0.0-p353/bin:/home/deploy/.rvm/bin:/usr/local/bin:/usr/bin:/bin"
Any ideas?
Thanks in advance!
Hi there,
I'm attempting to transfer my Bluepill config over to eye, and one thing we do is use an environment variable for the Rails environment to configure a few things. However, when I try to do this with eye, I can't seem to get the environment variable to pass into the script, no matter what I try.
The way I'd use bluepill before was:
RAILS_ENV=development bluepill load
I've tried this plus a ton of other tricks, but I haven't figured out how to do something similar with eye. Do you have any advice on how to pass in a variable from the command line, so the .eye config file can do different things based on that variable?
Thanks!
I have a running ffmpeg recording segments and logging the last 10 video files in recording.csv (something like this, but just copying from the camera): http://stackoverflow.com/questions/8767727/transcode-and-segment-with-ffmpeg
I would like to make sure the recording.csv file was updated in the last 2 minutes and, if not, to kill a parent live555ProxyServer process (the first process in the process group).
Is this possible?
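The detection half looks doable with a custom checker in the style of the CustomDefer example later in this thread. This is a sketch under assumptions: the class shape mirrors that example, the param signature with a default is assumed to work the same way, and wiring a failure to restart a *different* process in the group would still need its own trigger or a group-level restart action:

```ruby
# Sketch: a checker that fails when the file hasn't been modified within
# max_age seconds. Path and defaults below are hypothetical.
class FreshFile < Eye::Checker::CustomDefer
  param :path, String, true
  param :max_age, Integer, nil, 120

  def get_value
    File.exist?(path) && (Time.now - File.mtime(path)) < max_age
  end

  def good?(value)
    value
  end
end

# usage inside a process block (check name derives from the class name):
#   check :fresh_file, every: 30.seconds, times: 2,
#         path: '/path/to/recording.csv', max_age: 120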
Hi,
Thanks a lot for the gem.
I am trying to figure out how to create the pid file when eye restarts my process, as the process does not create one by default. Is there a check available to see if the process is alive?
Thank you
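If the process can run in the foreground, one approach is to let eye do the daemonizing, in which case eye forks the process and writes the pid file itself. Names and paths below are hypothetical:

```ruby
# Sketch: with daemonize true, eye owns the pid_file, so it is recreated
# on every (re)start; no cooperation from the process is needed.
process :worker do
  start_command 'bundle exec my_worker'  # must stay in the foreground
  pid_file '/tmp/worker.pid'
  daemonize true
end
```

Once the pid_file is in place, eye's normal liveness polling against that pid applies; a separate aliveness check shouldn't be needed.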
In the Linux Standard Base, init-script status exit codes differ depending on whether the service is running. It would be nice to be able to check the exit code in wrapper scripts. http://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
Hi,
eye history for child processes seems odd to me; it produces empty lines:
$ eye i sauspiel
sauspiel
unicorn ......................... up (Jan16, 0%, 149Mb, <26460>)
child-11039 ................... up (Jan16, 0%, 394Mb, <11039>)
child-28576 ................... up (Jan17, 1%, 396Mb, <28576>)
child-9926 .................... up (Jan18, 0%, 373Mb, <9926>)
child-28010 ................... up (Jan18, 0%, 386Mb, <28010>)
child-10222 ................... up (Jan18, 0%, 376Mb, <10222>)
child-4095 .................... up (Jan18, 0%, 397Mb, <4095>)
child-4268 .................... up (Jan18, 2%, 339Mb, <4268>)
child-9074 .................... up (Jan19, 0%, 369Mb, <9074>)
child-6947 .................... up (Jan19, 0%, 386Mb, <6947>)
child-9776 .................... up (Jan19, 0%, 371Mb, <9776>)
child-9961 .................... up (Jan19, 2%, 353Mb, <9961>)
child-13943 ................... up (Jan19, 0%, 390Mb, <13943>)
child-15247 ................... up (Jan19, 0%, 341Mb, <15247>)
child-15335 ................... up (Jan19, 0%, 369Mb, <15335>)
child-18407 ................... up (Jan19, 0%, 341Mb, <18407>)
child-18695 ................... up (Jan19, 0%, 344Mb, <18695>)
child-16495 ................... up (Jan19, 0%, 344Mb, <16495>)
child-16903 ................... up (Jan19, 0%, 352Mb, <16903>)
child-28104 ................... up (Jan19, 0%, 337Mb, <28104>)
child-24127 ................... up (Jan19, 0%, 329Mb, <24127>)
$ eye history sauspiel:unicorn
sauspiel:unicorn:
16 Jan 13:48 - restart (restart by user)
08 Jan 10:28 - ... 10 times (...)
08 Jan 10:28 - restart (restart by user)
08 Jan 10:28 - monitor (monitor by user)
$ eye history sauspiel:unicorn:\*
$
If the server was rebooted uncleanly, the existing ~/.eye directory causes eye to think processes are running when they are not, to kill the wrong processes (based on PID?), or just not to start at all.
A quick fix is to rm -rf ~/.eye followed by eye load recorder.eye after a server reboot, but it would be nice if eye could detect that the machine was rebooted.
I'm not sure how eye's internals work, but maybe it could check the process name as well as the PID before assuming a process is up when it is not?
When calling eye incorrectly, e.g. eye unicorn restart, the command currently returns status 0. It should indicate that it has failed with status 1.
I'm getting a lot of errors like this:
01.04.2014 20:49:13 WARN -- [Eye::System] [puma] sending :KILL signal to <12563> due to timeout (120s)
01.04.2014 20:49:13 ERROR -- [production:services:puma] execution failed with #<Timeout::Error: execution expired>; try increasing the start_timeout value (the current value of 120s seems too short)
01.04.2014 20:49:13 ERROR -- [production:services:puma] process <12541> failed to start ("#<Timeout::Error: execution expired>")
01.04.2014 20:49:13 INFO -- [production:services:puma] switch :crashed [:starting => :down] (reason: crashed)
01.04.2014 20:49:13 INFO -- [production:services:puma] schedule :check_crash (reason: crashed)
01.04.2014 20:49:13 INFO -- [production:services:puma] <= restore
01.04.2014 20:49:13 INFO -- [production:services:puma] => check_crash (reason: crashed)
01.04.2014 20:49:13 WARN -- [production:services:puma] check crashed: process is down
01.04.2014 20:49:13 INFO -- [production:services:puma] schedule :restore (reason: crashed)
01.04.2014 20:49:13 INFO -- [production:services:puma] <= check_crash
01.04.2014 20:49:13 INFO -- [production:services:puma] => restore (reason: crashed)
01.04.2014 20:49:13 INFO -- [production:services:puma] pid_file found, but process <12563> is down, starting...
01.04.2014 20:49:13 INFO -- [production:services:puma] switch :starting [:down => :starting] (reason: crashed)
The process that's being monitored seems to be running fine until eye decides it's not and kills and restarts it.
I've tried extending the start_timeout as suggested, but that's clearly not the problem. How can I troubleshoot this?
Would it be possible to implement Prowl (http://www.prowlapp.com), or, if you know of a better service that allows iPhone/Android push notifications, that one? We currently use God with Prowl notifications to get iPhone push notifications when certain services crash.
Also, on a side note, would it be possible to use the config/environments/production.rb ActionMailer config block with eye? It seems the Eye.config mail block does not allow :username and :password attributes. We use mandrillapp.com to send mail and have it configured with username + password in the ActionMailer config block.
I have this in my Rails controller:
system('bundle exec eye quit')
And this kills my running Rails app, which is not managed by eye. Any ideas what could be causing that and how to prevent it?
I have this check:
require 'rtsp/client'

class Rtsp < Eye::Checker::CustomDefer
  param :addr, String, true

  def initialize(*args)
    super
    @addr = addr
    @rtsp_client = RTSP::Client.new(@addr)
  end

  def get_value
    begin
      if @rtsp_client.describe.code == 200
        check_with_ffmpeg
      else
        false
      end
    rescue
      false
    end
  end

  def good?(value)
    value
  end

  def human_value(value)
    value == true ? 'Ok' : 'Err'
  end

  private

  def check_with_ffmpeg
    system("ffmpeg -loglevel warning -y -stimeout 5000000 " \
           "-i #{@addr} -c:v copy -an -vframes 1 -f rawvideo /dev/null")
  end
end
It works most of the time; however, sometimes I get this in eye.log and eye stops monitoring all processes:
[2014-05-29 21:55:57 +0100] [celluloid] Terminating task: type=:timer, meta=nil, status=:receiving
[2014-05-29 21:55:57 +0100] [recorder:537b68987a616e64da431e00:proxy] check:rtsp task was terminated ["/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/tasks/task_fiber.rb:32:in `terminate'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/actor.rb:404:in `block in cleanup'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/actor.rb:404:in `each'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/actor.rb:404:in `cleanup'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/actor.rb:375:in `shutdown'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/actor.rb:185:in `run'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/actor.rb:157:in `block in initialize'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/thread_handle.rb:13:in `block in initialize'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/internal_pool.rb:100:in `call'", "/home/deployer/xanagent/shared/bundle/ruby/2.1.0/gems/celluloid-0.15.2/lib/celluloid/internal_pool.rb:100:in `block in create'"]
The formatted output looks like this:
[2014-05-29 21:55:57 +0100] [celluloid] Terminating task: type=:timer, meta=nil, status=:receiving
[2014-05-29 21:55:57 +0100] [recorder:537b68e17a616e64de651e00:proxy] check:rtsp task was terminated
celluloid-0.15.2/lib/celluloid/tasks/task_fiber.rb:32:in `terminate'
celluloid-0.15.2/lib/celluloid/actor.rb:404:in `block in cleanup'
celluloid-0.15.2/lib/celluloid/actor.rb:404:in `each'
celluloid-0.15.2/lib/celluloid/actor.rb:404:in `cleanup'
celluloid-0.15.2/lib/celluloid/actor.rb:375:in `shutdown'
celluloid-0.15.2/lib/celluloid/actor.rb:185:in `run'
celluloid-0.15.2/lib/celluloid/actor.rb:157:in `block in initialize'
celluloid-0.15.2/lib/celluloid/thread_handle.rb:13:in `block in initialize'
celluloid-0.15.2/lib/celluloid/internal_pool.rb:100:in `call'
celluloid-0.15.2/lib/celluloid/internal_pool.rb:100:in `block in create'
Any idea why this happens and how to prevent it?
I want to check whether an RTSP server is alive after starting. I have this check:
check :socket, every: 30.seconds, times: 2, timeout: 3.seconds,
addr: "tcp://localhost:12345",
send_data: "DESCRIBE rtsp://localhost/proxyStream RTSP/1.0\nCSeq: 1\n\n",
expect_data: /RTSP\/1\.0 200 OK/
I can see in my RTSP server that it gets a request and sends the correct response, however I see this in my eye.log:
11.03.2014 10:34:10 INFO -- [record:Test:proxy] check:socket [*ReadTimeout<2.0>, *ReadTimeout<2.0>] => Fail
11.03.2014 10:34:10 ERROR -- [record:Test:proxy] NOTIFY: Bounded socket: [*ReadTimeout<2.0>, *ReadTimeout<2.0>] send to [:restart]
...
I think this is happening because there is no way to tell the RTSP server to close the connection after the DESCRIBE command. Is there any way to get eye to force a timeout but still read the response, without counting that as a failure?
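The desired behaviour, "read whatever the server sends within the timeout, then stop, without treating the still-open connection as a failure", can be expressed outside eye with a small standalone helper. This is my own sketch for probing the server, not an eye API; I don't know whether eye's :socket checker can be configured this way.

```ruby
require 'socket'

# Hypothetical standalone helper (not part of eye): collect whatever the
# peer sends within `timeout` seconds, then return it. Hitting the
# deadline with the connection still open is not an error; you simply
# get the bytes that arrived in time.
def read_until_timeout(io, timeout)
  data = +''
  deadline = Time.now + timeout
  loop do
    remaining = deadline - Time.now
    break if remaining <= 0
    break unless IO.select([io], nil, nil, remaining)
    begin
      data << io.read_nonblock(4096)
    rescue IO::WaitReadable
      next
    rescue EOFError
      break
    end
  end
  data
end

# e.g. against the RTSP proxy from the check above:
# sock = TCPSocket.new('localhost', 12345)
# sock.write "DESCRIBE rtsp://localhost/proxyStream RTSP/1.0\nCSeq: 1\n\n"
# read_until_timeout(sock, 3) =~ /RTSP\/1\.0 200 OK/
```

If the response matches even though the server never closes the connection, the problem is purely the checker treating a read timeout as a hard failure.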
I got a config error: undefined method `logger=' for Eye:Module. Maybe it happens because I'm using version 0.3:
$ gem list eye
*** LOCAL GEMS ***
eye (0.3.beta1, 0.2.3)
$ gem uninstall eye -v 0.3.beta1
INFO: gem "eye" is not installed
How can I uninstall the gem?
Hi guys, just want to check whether eye has a configuration option to rotate all log files from code, instead of setting up some third-party process for it.
Sorry for posting a question here, let me know if you have a group or mailing list.
I have been playing with eye for a while but couldn't make email notifications work. We already have an SMTP server running, so our configuration is very simple:
Eye.config do
mail domain: 'mailer.xxx.org', host: 'mailer.xxx.org', port: 25, from_mail: '[email protected]', from_name: 'Eye'
contact :dev, :mail, '[email protected]'
end
Eye.application 'app' do
notify :dev
process :process do
notify :dev
end
end
I cannot see anything in the log. Additionally, when is notify :dev triggered?
Many thanks.
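Before digging further into eye's notifier, it may help to confirm that the SMTP endpoint from the config above is even reachable from the box running eye. A tiny standalone probe (my own helper, not eye code):

```ruby
require 'net/smtp'

# Hypothetical connectivity probe, not part of eye: returns true if an
# SMTP session can be opened (and cleanly closed) against host:port.
def smtp_reachable?(host, port)
  Net::SMTP.start(host, port) { } # opens the session, QUITs on block exit
  true
rescue StandardError
  false
end

# smtp_reachable?('mailer.xxx.org', 25)
```

If this returns false from the eye host, the problem is network or SMTP configuration rather than eye itself.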
Couldn't find anything in the docs or a quick scour of the code. Ideally I would like something like:
bundle exec eye stop resque --wait
which wouldn't return until the signaled process(es) stopped. This is because ultimately I want to be able to do:
bundle exec eye stop all --wait
bundle exec eye quit
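As a stopgap, the waiting can be scripted outside eye by polling its status output. A sketch with my own shell helper; the grep pattern for eye's status line is a guess and should be adapted to what `eye i` actually prints on your version:

```shell
#!/bin/sh
# wait_for TIMEOUT CMD...: re-run CMD every second until it succeeds or
# TIMEOUT seconds have elapsed; the exit status reports which happened.
wait_for() {
  timeout=$1; shift
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# hypothetical usage, assuming `eye i resque` marks running processes with " up ":
# bundle exec eye stop resque
# wait_for 30 sh -c '! bundle exec eye i resque | grep -q " up "'
```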
Hi,
I'm looking for an alternative to bluepill, and eye looks quite nice. But what I'm missing so far are custom check conditions. Are there plans to implement them?
In particular, the cpu checks wouldn't be sufficient for me. eye (and bluepill) determines the cpu usage of a process with 'ps aux'. According to the man page of ps:
CPU usage is currently expressed as the percentage of time spent running during the entire lifetime of a process. This is not ideal, and it does not conform to the standards that ps otherwise conforms to. CPU usage is unlikely to add up to exactly 100%.
So the cpu check will never trigger for a long-running process, because the value doesn't represent the process's current cpu usage (e.g. as reported by top), which could suddenly jump to 100% after many hours or days due to a bug, at which point the process should be restarted. With bluepill I wrote a custom check condition as a workaround:
module Bluepill
  module ProcessConditions
    # Custom condition: sample the instantaneous CPU usage as reported
    # by top, instead of the lifetime average reported by ps.
    class CpuUsagePercental < ProcessCondition
      def initialize(options = {})
        @below = options[:below]
      end

      # Current %CPU of the process: 9th column of top's batch output.
      def run(pid, include_children = false)
        `top -b -p #{pid} -n 1 | grep '#{pid}' | awk '{print $9}'`.chomp.to_i
      end

      def check(value)
        value < @below
      end
    end
  end
end
What do you think?
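The underlying idea, usage over a sampling window instead of over the process lifetime, boils down to differencing two cumulative CPU-tick samples (on Linux, utime + stime from /proc/&lt;pid&gt;/stat). The arithmetic alone, as a sketch rather than an eye or bluepill API:

```ruby
# Sketch of "instantaneous" CPU usage: given cumulative CPU ticks at two
# sample points, compute usage over that window only, the way top does.
# hz is the kernel clock tick rate (`getconf CLK_TCK`, usually 100).
def cpu_percent(ticks_before, ticks_after, interval_seconds, hz = 100)
  used_seconds = (ticks_after - ticks_before).to_f / hz
  (used_seconds / interval_seconds) * 100.0
end

cpu_percent(0, 50, 1.0) # 50 ticks in a 1 s window at HZ=100 -> 50.0
```

This avoids shelling out to top entirely; a checker would just keep the previous sample and compare the delta against the threshold.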
Hi,
Just want to ask how to set up eye to monitor both high and low CPU.
Currently I have it monitoring high CPU like this:
checks :cpu, every: 60, below: 90, times: 15
But I couldn't find any docs about checking for low CPU.
Thanks man,
Son.
Hi.
After testing our eye config on the staging servers (where it was fine) and moving to production, I'm not able to load the config file and start eye.
I got this:
$ bin/eye load script/eye/rp.eye
eye started!
stack level too deep
/usr/local/rvm/gems/ruby-1.9.3-p448/gems/state_machine-1.2.0/lib/state_machine/macro_methods.rb:1
There is not a single line in the log file. I can send the list of gems if necessary. Is there any way to trace which part it failed in?
Thanks
tom
Is it possible to use a custom pid and socket file? I saw the EYE_PID environment variable, but the path given by eye xinfo differs depending on whether eye is running as root or as a user. I want to be able to start eye as a user and then manage it from root :)
Hi!
On one box, eye info reports the wrong process timestamp, e.g.:
$ eye version
Eye v0.5.1 (c) 2012-2014 @kostya
$ eye i sauspiel_delayed_job
sauspiel_delayed_job
dj_rake_task .................... up (12:21, 0%, 159Mb, <24243>)
but the last restart was at 12:42
03.02.2014 12:42:56 INFO -- [sauspiel_delayed_job:dj_rake_task] switch :restarting [:up => :restarting] (reason: bounded memory(500Mb))
03.02.2014 12:42:56 INFO -- [sauspiel_delayed_job:dj_rake_task] switch :stopping [:restarting => :stopping] (reason: bounded memory(500Mb))
$ ps -ef |grep -v grep | grep 24243
deploy 24243 1 7 12:43 ? 00:01:12 ruby ./script/delayed_job start
This time shift of ~20 minutes applies to all monitored processes on this box. On other boxes it is correct. Any hints?
Hi there,
I have a problem with the output of a process managed by eye. The process prints some text to stdout every two seconds, but the log file gets no output until I restart the process via eye. It seems like the output is collected somewhere and written out with a delay.
example process:
require 'logger'
logger = Logger.new('log/hi.log', 1, 1024 * 1024)
loop do
str = (0...50).map { ('a'..'z').to_a[rand(26)] }.join
logger.info str
puts ENV['TEST'] + ENV['TESTT']
puts 123
sleep(2)
end
eye explain output of my configuration:
{:settings=>
{:logger=>["/var/log/eye/eye.log", 3, 1048576],
:mail=>{:host=>"...", :port=>..., :type=>:mail},
:contacts=>{"smefju"=>{:name=>"smefju", :type=>:mail, :contact=>"smefju@...", :opts=>{}}}},
:applications=>
{"Custom processes"=>
{:name=>"Custom processes",
:working_dir=>"/home/app/app/current/",
:stdall=>"/home/app/app/shared/log/eye.log",
:stdout=>"/home/app/app/shared/log/eye.log",
:stderr=>"/home/app/app/shared/log/eye.log",
:notify=>{"smefju"=>:warn},
:triggers=>{:flapping=>{:times=>3, :within=>60, :retry_in=>300, :type=>:flapping}},
:groups=>
{"__default__"=>
{:name=>"__default__",
:working_dir=>"/home/app/app/current/",
:stdall=>"/home/app/app/shared/log/eye.log",
:stdout=>"/home/app/app/shared/log/eye.log",
:stderr=>"/home/app/app/shared/log/eye.log",
:notify=>{"smefju"=>:warn},
:triggers=>{:flapping=>{:times=>3, :within=>60, :retry_in=>300, :type=>:flapping}},
:application=>"Custom processes",
:processes=>
{"hi"=>
{:name=>"hi",
:working_dir=>"/home/app/app/current/",
:stdall=>"/home/app/app/shared/log/hi_stdall.log",
:stdout=>"/home/app/app/shared/log/hi_stdall.log",
:stderr=>"/home/app/app/shared/log/hi_stdall.log",
:notify=>{"smefju"=>:warn},
:triggers=>{:flapping=>{:times=>3, :within=>60, :retry_in=>300, :type=>:flapping}},
:application=>"Custom processes",
:group=>"__default__",
:pid_file=>"/var/run/app/hi.pid",
:start_command=>"ruby processes/hi.rb",
:environment=>{"TEST"=>1, "TESTT"=>2},
:daemonize=>true}}}}}}}
Any idea?
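The symptom, output appearing only on restart and only in daemon mode, is consistent with Ruby buffering stdout when it is redirected to a file instead of a TTY. A minimal demonstration with a pipe standing in for the log file; if this is the cause, adding `$stdout.sync = true` at the top of processes/hi.rb should fix it:

```ruby
r, w = IO.pipe
w.sync = false                 # buffered, like stdout redirected to a file
w.write "hello from the app\n"

# Nothing is readable yet: the bytes sit in Ruby's internal write buffer.
buffered = IO.select([r], nil, nil, 0).nil?

w.flush                        # or set sync = true once at startup
flushed = !IO.select([r], nil, nil, 0).nil?
```

This assumes eye attaches the daemonized process's stdout to the log file via a plain redirect; the Logger output appears promptly because Logger flushes on each write, while bare `puts` does not.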
The code I'm using now is:
trigger :transition, to: :starting, do: -> {
month = 60 * 60 * 24 * 28
process.wait_for_condition(month, 15) do
info "Waiting for camera #{url} to be up"
timeout = 5 * 1_000_000 # 5 seconds in microseconds
system("ffmpeg -loglevel warning -y -stimeout #{timeout} " \
"-i #{url} -c:v copy -an -vframes 1 -f rawvideo /dev/null")
end
}
But I don't want to stop even after a month; I want wait_for_condition to wait endlessly. If the camera is offline, there is no reason to ever start the proxy server process; the condition just needs to be retried every 15 seconds until the camera shows up on the network (if ever).
How would I make wait_for_condition never time out? What is the rationale for wait_for_condition timing out at all? In other words, under what circumstances is it useful to time out and ignore a condition? (Why have a condition in the first place if it will be ignored after a while?)
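I don't know whether wait_for_condition accepts an infinite timeout, but the "retry every 15 seconds, forever" semantics can also be written as a plain loop inside the trigger block. A sketch with the ffmpeg probe stubbed out as a lambda:

```ruby
# Retry a probe at a fixed interval, with no deadline at all.
def wait_forever(interval, &probe)
  sleep(interval) until probe.call
end

attempts = 0
camera_up = -> { (attempts += 1) >= 3 } # stand-in for the ffmpeg check
wait_forever(0.01, &camera_up)
attempts # => 3
```

Whether blocking a trigger indefinitely is safe inside eye's Celluloid actor model is a separate question worth checking before relying on this.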
Hello,
I'm having issues with the default behaviour of creating pid/socket files in $HOME. I have a system-wide installation of eye on an Ubuntu system, so I invoke eye either via sudo from my account or by logging in directly as root. $HOME, however, differs between the two cases, leading to communication issues.
On an LTS system I would put eye's files in /var/run/eye, and I do so in my fork, but of course this path has to be hardcoded and would not suit those who run eye as a normal user. This is why I'm not proposing a pull request.
Maybe it would be appropriate to make the server's directory configurable between a "user" and a "system-wide" mode, and to make the client search for the pid/socket in the home dir and in /var/run by default, possibly with a CLI option to override this behaviour.
If you deem this approach viable, I can prepare a patch and pull request.
Bye,
Would this be possible? Instead of launching external processes I would like to fork different bits of code and monitor them.
Thanks.
When I make my command something like:
start_command 'command_1.py | command_2.rb'
It appears the first command runs and the '|' (pipe) is passed to it as an escaped argument, instead of a shell piping the first command's output into the second.
Any ideas?
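That matches how commands behave when they are exec'd directly (argv form) rather than through a shell: '|' is then just another argument. Ruby makes the difference easy to see (standalone demo, not eye code):

```ruby
# argv form: no shell involved, so the pipe character reaches echo verbatim
argv_out  = IO.popen(["echo", "hi", "|", "cat"], &:read)  # "hi | cat\n"

# string form: a shell parses the command line, so the pipe actually pipes
shell_out = IO.popen("echo hi | cat", &:read)             # "hi\n"
```

Assuming eye execs start_command the same way, a possible workaround is to run the shell yourself, e.g. `start_command "sh -c 'command_1.py | command_2.rb'"`, though this inserts a shell process between eye and your commands, which can complicate pid tracking and daemonize handling.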