monsit's People

Contributors

airekans

monsit's Issues

SIGINT cannot stop the agent.

I got an error when running the agent on a machine.

return_code: 0
msg: "SUCCESS"

^CTraceback (most recent call last):
  File "core.pyx", line 348, in gevent.core.loop.handle_error (gevent/gevent.core.c:6380)
  File "/home/yahuang/programming/Monsit/build/client/out00-PYZ.pyz/gevent.hub", line 291, in handle_error
KeyboardInterrupt
return_code: 0
msg: "SUCCESS"

Please figure out the root cause and check whether it is a bug in the rpc module.

separate master from email sender

Right now the master is responsible for sending email once it detects that a certain condition is met.
Email sending should be separated from the master so that, if the email function breaks, it can be restarted or rewritten independently.

My initial thought is to use a queue to store the requests and process them in a separate, dedicated process.
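
A minimal sketch of the queue idea, assuming a hypothetical `send_email` callable and a `None` sentinel for shutdown; neither is part of monsit. The worker loop only needs a blocking `get()`, so it is shown here with `queue.Queue`, and the same function could run in its own process if it were handed a `multiprocessing.Queue` instead.

```python
import queue


def run_email_worker(requests, send_email):
    """Drain email requests until a None sentinel arrives.

    `requests` is any object with a blocking get() (queue.Queue here;
    multiprocessing.Queue if the worker runs in its own process).
    `send_email` is a hypothetical callable that delivers one request.
    Returns the requests that failed so the caller can decide what to do.
    """
    failed = []
    while True:
        request = requests.get()
        if request is None:  # sentinel: stop the worker
            break
        try:
            send_email(request)
        except Exception:
            # A broken sender must not take the worker (or the master) down.
            failed.append(request)
    return failed


# The master only enqueues; it never calls the sender directly.
pending = queue.Queue()
pending.put({"to": "admin@example.com", "subject": "host down"})
pending.put(None)

sent = []
failed = run_email_worker(pending, sent.append)
```

With this split, the master's detection loop keeps running even when the sender raises, and failed requests are collected instead of being lost.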

Add display setting page

Right now, what is shown on the host information page is hard-coded in the template.
It would be better to let the user configure what to display on that page.

Email notification is not reliable

Email sending is sometimes unreliable, especially when the network is down.
I can check some third-party push services to see whether one of them is suitable.
For example, this one.

Network stat seems wrong

image
As shown in the image, the stat numbers are huge and some of them are negative. This is wrong and should be fixed.
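
The issue does not state the root cause. One common source of huge or negative network rates is subtracting cumulative byte counters across a wraparound, so the following is only a sketch under that assumption. The function names and the 32-bit counter width are illustrative, not taken from monsit.

```python
def counter_delta(previous, current, counter_bits=32):
    """Return bytes transferred between two cumulative counter samples.

    Assumes the counter wraps at 2**counter_bits (an assumption; the
    real width depends on the platform). The result is never negative.
    """
    if current >= previous:
        return current - previous
    # The counter went backwards; treat it as exactly one wraparound.
    return current + (1 << counter_bits) - previous


def bytes_per_second(previous, current, interval_seconds, counter_bits=32):
    """Convert two samples taken `interval_seconds` apart into a rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current, counter_bits) / interval_seconds
```

A counter that was reset (for example after a reboot) also goes backwards and would be misread as a wrap here, so a real fix would likely need to detect resets separately.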

Add admin dashboard

Right now, to add a new field, the user has to edit the DB directly, which is not easy.

Show the top memory/CPU consumer

It would be nice to show the top memory/CPU consumers, so that you know which application is the killer app at a given moment.
How to store this information should be considered.

Change the stat report mechanism

Right now, the agent reports stats with a lot of redundant data, for example the CPU stats.
This makes the stat display code complicated and hard to scale.
I should change each stat to report a single number, which would make it much easier to change and scale.
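
As an illustration of the "single number" idea, here is a sketch that reduces two cumulative CPU time samples to one utilization percentage on the agent side. The dict keys (`user`, `system`, `idle`) are assumptions for the example, not monsit's actual report format.

```python
def cpu_usage_percent(prev, cur):
    """Reduce two cumulative CPU time samples to one utilization number.

    `prev` and `cur` are dicts of cumulative times per state, for example
    {"user": ..., "system": ..., "idle": ...}. The key names are
    illustrative only.
    """
    total = sum(cur.values()) - sum(prev.values())
    if total <= 0:
        return 0.0  # no time elapsed between samples
    idle = cur.get("idle", 0) - prev.get("idle", 0)
    return 100.0 * (total - idle) / total
```

The master and the web page would then only store and plot one float per sample instead of interpreting the per-state breakdown.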

Add option to let user specify the target connection.

Right now the default load balancer uses the flow number to do routing.
But if the user wants to send requests to a certain host, there is no way to do it.
The best way is to add an option in RpcController so that users can control whether a request may be routed to any host or must go to a specific one.
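
A sketch of what the routing decision could look like with such an option. It assumes "flow number" means the count of in-flight requests per host and that the lowest count wins; `pick_connection` and `target_host` are hypothetical names, not the existing RpcController API.

```python
def pick_connection(connections, target_host=None):
    """Choose a host for the next request.

    `connections` maps host name -> number of in-flight requests.
    With no target, pick the host with the lowest flow number; with a
    target, route to that host only. Both names are hypothetical.
    """
    if not connections:
        raise LookupError("no connections available")
    if target_host is not None:
        if target_host not in connections:
            raise LookupError("no connection to %s" % target_host)
        return target_host
    return min(connections, key=connections.get)
```

Failing loudly when the requested host is unavailable, rather than falling back to another host, keeps the "send to this host" contract explicit.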

Add support for cluster monitoring

Right now every host is a single entity, while in reality a host may belong to a cluster.
To monitor a cluster, we should sum up the data from its hosts and show the totals to the user.
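
The summing step could be sketched as follows, assuming per-host stats arrive as dicts of numbers and that a separate host-to-cluster mapping exists. Both data shapes are assumptions made for the example.

```python
from collections import defaultdict


def cluster_totals(host_stats, host_to_cluster):
    """Sum per-host stats into per-cluster totals.

    `host_stats` maps host -> {stat name: value}; `host_to_cluster` maps
    host -> cluster name. Hosts without a cluster are skipped.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for host, stats in host_stats.items():
        cluster = host_to_cluster.get(host)
        if cluster is None:
            continue  # host not assigned to any cluster
        for name, value in stats.items():
            totals[cluster][name] += value
    return {cluster: dict(stats) for cluster, stats in totals.items()}
```

Summing suits additive stats such as memory used or bytes transferred; ratio stats such as CPU percentage would need an average instead.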

Timeout handling causes mail to be sent forever

Should check the timeout handling to find the root cause.
It causes a failure message to keep being sent forever.

Already found the root cause: clock inconsistency between hosts makes the timeout logic behave incorrectly.
Should fix this.
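
One possible fix, sketched under the assumption that the timeout currently compares agent-supplied timestamps against the master's clock: stamp each report with the master's own monotonic clock on arrival and judge timeouts against that alone. `LastSeenTracker` is a hypothetical name.

```python
import time


class LastSeenTracker:
    """Track when each host last reported, using only the master's clock."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen = {}

    def record_report(self, host):
        # Ignore any timestamp inside the report; stamp arrival locally.
        self._last_seen[host] = self._clock()

    def timed_out_hosts(self):
        """Return the hosts whose last report is older than the timeout."""
        now = self._clock()
        return sorted(
            host for host, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

Because both sides of the comparison come from the same clock, skew between hosts can no longer make a healthy host look timed out.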

Improve web page performance

When the user clicks auto-refresh on the machine state page, every chart connects to the server separately to get its data.
This can be improved by collecting all the refresh requests and sending them together.

Auto-reconnect when the connection fails.

Right now when the TcpConnection fails, it will just close the connection and do nothing.
It's better to try to reconnect multiple times, so that when the down-stream service is down, it will still be able to recover later.
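
A minimal retry sketch with exponential backoff. It assumes a hypothetical `connect` callable that returns a connection or raises `OSError`; how this would hook into TcpConnection is not shown, and the attempt count and delays are placeholder values.

```python
import time


def connect_with_retry(connect, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `connect` until it succeeds or attempts run out.

    `connect` is a hypothetical callable that returns a connection or
    raises OSError. The delay doubles after each failure.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

Doubling the delay avoids hammering a down-stream service that is still restarting, while the attempt limit keeps a permanently dead host from being retried forever.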

Make master and agent daemon

Right now the master and agent are not daemons.
To run one as a daemon, I have to use nohup python server.py &, which is not easy to use.
Master and agent should run in daemon mode by default.

Enhance the connection failure detection.

Right now, the only way to detect a connection failure is when the heartbeat times out 3 times.
This is not the best way to detect a connection failure.
We can also use request timeouts to help detect connection failures.
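
One way to combine the two signals, as a sketch: heartbeat timeouts and request timeouts feed the same counter, and any successful response resets it. `FailureDetector` is a hypothetical class, and reusing the threshold of 3 for both signals is an assumption.

```python
class FailureDetector:
    """Mark a connection failed after enough consecutive timeouts.

    Heartbeat timeouts and request timeouts feed the same counter, and
    any successful response resets it. The class is hypothetical.
    """

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_timeouts = 0

    def on_timeout(self):
        """Record a heartbeat or request timeout; return True if failed."""
        self.consecutive_timeouts += 1
        return self.is_failed()

    def on_success(self):
        """Any successful response proves the connection is alive."""
        self.consecutive_timeouts = 0

    def is_failed(self):
        return self.consecutive_timeouts >= self.threshold
```

On a busy connection, request timeouts arrive much more often than heartbeats, so a dead peer would be noticed sooner than with heartbeats alone.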
