chiamon's Introduction

ChiaMon

Example Chia monitoring stack, using mtail, chia_exporter, node_exporter, Prometheus, Loki, and Grafana.

This includes a docker-compose configuration to run everything, but this is primarily intended for development and testing.

WARNING: this is NOT a one-click install; expect to do some work setting everything up for your machine. PLEASE read the notes below and understand what all the services are, what they do, and how they work together.

Chia dashboard

mtail program

The mtail program is in mtail/chialog.mtail. Currently it only collects harvester metrics:

  • chia_harvester_blocks_total: cumulative number of block challenges attempted
  • chia_harvester_plots_total: current number of plots
  • chia_harvester_plots_eligible: cumulative number of plots that passed filter
  • chia_harvester_proofs_total: cumulative number of proofs won
  • chia_harvester_search_time: histogram of proof search times

NOTE you need to set log_level to INFO in your Chia config.yaml to get harvester metrics.
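
As a rough illustration of what these metrics count, the following Python sketch tallies equivalent values from harvester log lines. The log line shape in the sample and the regular expression are assumptions based on typical INFO-level harvester output; they are not copied from mtail/chialog.mtail, and the helper name is hypothetical.

```python
import re

# Assumed shape of a harvester INFO line; not taken from chialog.mtail.
HARVESTER_RE = re.compile(
    r"(?P<eligible>\d+) plots were eligible for farming \S+ "
    r"Found (?P<proofs>\d+) proofs\. "
    r"Time: (?P<seconds>\d+\.\d+) s\. "
    r"Total (?P<total>\d+) plots"
)


def summarize_harvester_log(lines):
    """Tally values analogous to the chia_harvester_* metrics (hypothetical helper)."""
    summary = {
        "blocks_total": 0,    # challenges attempted (counter)
        "plots_total": 0,     # most recent plot count (gauge)
        "plots_eligible": 0,  # plots that passed the filter (counter)
        "proofs_total": 0,    # proofs found (counter)
        "search_times": [],   # raw observations behind the histogram
    }
    for line in lines:
        match = HARVESTER_RE.search(line)
        if match is None:
            continue  # not a harvester lookup line
        summary["blocks_total"] += 1
        summary["plots_total"] = int(match["total"])
        summary["plots_eligible"] += int(match["eligible"])
        summary["proofs_total"] += int(match["proofs"])
        summary["search_times"].append(float(match["seconds"]))
    return summary


# Made-up sample lines in the assumed format.
sample = [
    "2021-05-06T17:26:01.123 harvester chia.harvester.harvester: INFO     "
    "2 plots were eligible for farming 0a1b2c3d4e... Found 0 proofs. "
    "Time: 0.41250 s. Total 120 plots",
    "2021-05-06T17:26:10.456 harvester chia.harvester.harvester: INFO     "
    "1 plots were eligible for farming 5f6a7b8c9d... Found 1 proofs. "
    "Time: 1.25000 s. Total 121 plots",
    "2021-05-06T17:26:11.000 full_node chia.full_node.full_node: INFO     unrelated line",
]
print(summarize_harvester_log(sample))
```

Lines that do not match are skipped, which is also why nothing is counted when the log level is above INFO: the harvester lines never reach the log.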

chia_exporter

The chia_exporter is used to collect metrics from the Chia node RPC API.

Grafana dashboard

The example Grafana dashboard is in grafana/dashboards/Chia.json. It defines a number of variables that will be auto-populated from the node metrics. Grafana dashboards are easily customized to show what you're interested in seeing, in the way you find best; this dashboard is just meant to demonstrate what can be done.

Running on Linux/Mac

The docker-compose file will mount the Chia log from $HOME/.chia/mainnet/log/debug.log. Verify that this location is correct, and set the log level to INFO in the Chia configuration (usually at $HOME/.chia/mainnet/config/config.yaml).
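
A quick pre-flight check along these lines can be sketched in Python. This is only a sketch: it assumes the default paths named above, scans config.yaml with a plain regular expression instead of a YAML parser (to avoid third-party dependencies), and the function names are hypothetical.

```python
import re
from pathlib import Path


def find_log_levels(config_text):
    """Return every log_level value found in a Chia config.yaml text.

    A plain regex is used rather than a YAML parser; it ignores YAML
    structure and skips commented-out lines.
    """
    pattern = re.compile(r"^\s*log_level:\s*['\"]?(\w+)['\"]?", re.MULTILINE)
    return pattern.findall(config_text)


def check_chia_logging(home=None):
    """Report whether the default log and config paths exist and logging is at INFO."""
    home = Path(home) if home is not None else Path.home()
    log_path = home / ".chia" / "mainnet" / "log" / "debug.log"
    config_path = home / ".chia" / "mainnet" / "config" / "config.yaml"
    levels = find_log_levels(config_path.read_text()) if config_path.is_file() else []
    return {
        "log_exists": log_path.is_file(),
        "config_exists": config_path.is_file(),
        "log_levels": levels,
        "info_enabled": bool(levels) and all(lv.upper() == "INFO" for lv in levels),
    }
```

Calling `print(check_chia_logging())` as the user that runs Chia shows whether the paths exist and which log levels were found. If you start docker-compose with sudo, pass your real home directory explicitly, since root's home differs from yours.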

Run:

docker-compose up -d

This will do the following:

  • Build container image with configuration for mtail from source
  • Build container image for chia_exporter from source
  • Download other images from docker hub
  • Run containers in the background, attached to the host network (this makes it easy to communicate with native services, but has some trade-offs. See notes.)

The grafana service provisions the prometheus and loki datasources and a basic dashboard that displays harvester and node metrics.

Access Grafana at http://localhost:3000 and log in with the default admin/admin username and password (you'll be prompted to change the password).

Notes

  • It's highly encouraged to run the node exporter natively rather than in docker - see the discussion in the node_exporter docs. On Ubuntu you can run sudo apt install prometheus-node-exporter, which includes disk SMART monitoring (disk temperatures, etc) as well. If you do run it in Docker, you'll need to bind-mount in any other volumes you want to monitor (add them to the volumes list in docker-compose.yml, e.g. - '/scratch:/scratch'). See issue #3.

  • The docker-compose file uses the $HOME environment variable for the Chia log paths. Verify that these paths are correct, and if you run the docker-compose commands with sudo then you'll have to replace $HOME with the actual path (since root's home is not your home!). Even better, add your user to the docker group so you don't have to use sudo:

      sudo usermod -a -G docker username
    
  • On Mac you'll need to run node_exporter natively, not under Docker: brew install node_exporter. You'll probably need to change the networking setup too, since Docker on Mac runs in a VM. See the windows docker-compose and prometheus configs.

Running on Windows

The node exporter does not work on Windows; instead you need to use the Windows exporter for system metrics. Modified config and example dashboard are in the windows branch. You may also want to review the discussion in issue #2.

These steps will get you to a working setup (but aren't the only way):

Monitoring Multiple Nodes

To monitor multiple nodes (e.g. multiple harvesters), you just need to run the appropriate exporters (e.g. node_exporter and mtail for a harvester) and add them as targets in the prometheus config, for example:

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100', 'harvester1:9100', 'harvester2:9100']
  - job_name: 'mtail'
    static_configs:
      - targets: ['localhost:3903', 'harvester1:3903', 'harvester2:3903']
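
With more than a few harvesters, the target lists can be generated instead of typed by hand. The Python sketch below renders the same two jobs from a host list; the host names are the placeholders from the example above, the ports are the ones shown there, and the function name is hypothetical.

```python
def scrape_config(hosts, exporters):
    """Render Prometheus scrape_configs entries, one job per exporter."""
    lines = []
    for job_name, port in exporters.items():
        targets = ", ".join(f"'{host}:{port}'" for host in hosts)
        lines.append(f"  - job_name: '{job_name}'")
        lines.append("    static_configs:")
        lines.append(f"      - targets: [{targets}]")
    return "\n".join(lines)


# Placeholder host names; replace with your own machines.
hosts = ["localhost", "harvester1", "harvester2"]
# Exporter job name -> port, as used in the example config.
exporters = {"node": 9100, "mtail": 3903}
print(scrape_config(hosts, exporters))
```

The printed text can be pasted under scrape_configs in the prometheus config.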

If you're also running Loki to collect logs, you'll want to run promtail on every node and configure it to push logs to Loki on your monitoring node:

clients:
  - url: http://loki-server:3100/loki/api/v1/push

Copyright & License

Copyright 2021 Kevin Retzke

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

See LICENSE.txt

chiamon's People

Contributors

retzkek, schmiddim

chiamon's Issues

Data jump exponentially

Hi!
I use chiamon on Windows. Everything works well, but after a short time (approx. 10 min) the data from the logs jumps exponentially. If I restart the Docker machine the problem goes away, but after 10 minutes (or the log rotation time) the data jumps again.

UPDATE: It looks like only chiamon_mtail needs a restart for the data flow to normalize. (Some kind of buffer problem?)

chia_exporter problem

chia_exporter builds and runs from the console, but Docker can't deploy it. I gave it full read-write permissions, but it exits every time without an error.
I also set the log level to INFO, but there are no Chia logs :(
Any ideas? Thanks in advance.

PROJECT LOG
Building mtail
[+] Building 1.5s (12/12) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 1.1s
=> [internal] load metadata for docker.io/library/golang:alpine 1.2s
=> [builder 1/3] FROM docker.io/library/golang:alpine@sha256:4dd403b2e7a689adc5b7110ba9cd5da43d216cfcfccfbe2b356 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 35B 0.0s
=> [stage-1 1/3] FROM docker.io/library/alpine@sha256:69e70a79f2d41ab5d637de98c1e0b055206ba40a8145e7bddb55ccc04e 0.0s
=> CACHED [builder 2/3] WORKDIR /build 0.0s
=> CACHED [builder 3/3] RUN apk add --update --no-cache --virtual build-dependencies git make && git clone http 0.0s
=> CACHED [stage-1 2/3] COPY --from=builder /build/mtail/usr/local/bin/mtail /usr/bin/mtail 0.0s
=> CACHED [stage-1 3/3] COPY *.mtail /etc/mtail/ 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:707b7529ec7ed655d7192c04d75c457636ccadc9dfe9cedd3891ab6f96f179fd 0.0s
=> => naming to docker.io/library/chiamon_mtail 0.0s
Building chia_exporter
[+] Building 0.8s (10/10) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.4s
=> [internal] load metadata for docker.io/library/golang:alpine 0.5s
=> [builder 1/3] FROM docker.io/library/golang:alpine@sha256:4dd403b2e7a689adc5b7110ba9cd5da43d216cfcfccfbe2b356 0.0s
=> [stage-1 1/2] FROM docker.io/library/alpine@sha256:69e70a79f2d41ab5d637de98c1e0b055206ba40a8145e7bddb55ccc04e 0.0s
=> CACHED [builder 2/3] WORKDIR /build 0.0s
=> CACHED [builder 3/3] RUN apk add --update --no-cache --virtual build-dependencies git make && git clone http 0.0s
=> CACHED [stage-1 2/2] COPY --from=builder /build/chia_exporter/chia_exporter /usr/bin/chia_exporter 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:cc21dfc65c2cebd8310d9bef2b3343641212c2c811a9eda740ee5595cd2b87c8 0.0s
=> => naming to docker.io/library/chiamon_chia_exporter 0.0s
Building prometheus
[+] Building 1.6s (7/7) FINISHED
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 31B 0.0s
=> [internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/prom/prometheus:latest 1.2s
=> [internal] load build context 0.0s
=> => transferring context: 36B 0.0s
=> [1/2] FROM docker.io/prom/prometheus@sha256:d1a9a86b9a3e60a9ea3cde141bdc936847456acc497e0affe7e288234383efa5 0.0s
=> CACHED [2/2] COPY prometheus.yml /etc/prometheus/prometheus.yml 0.0s
=> exporting to image 0.1s
=> => exporting layers 0.0s
=> => writing image sha256:314e9b91e782886c652a9674baa561c12ad2b938574990453dd51d5b7cd235e3 0.0s
=> => naming to docker.io/library/chiamon_prometheus 0.0s
Building grafana
[+] Building 1.8s (9/9) FINISHED
=> [internal] load build definition from Dockerfile 0.2s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.2s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/grafana/grafana:latest 1.1s
=> [1/4] FROM docker.io/grafana/grafana@sha256:09bb407e26abc38cf010fcac8e13910491213e60278f07878b335257b6f844c1 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 196B 0.0s
=> CACHED [2/4] COPY dashboards /etc/grafana/dashboards/ 0.0s
=> CACHED [3/4] COPY dashboards.yaml /etc/grafana/provisioning/dashboards/ 0.0s
=> CACHED [4/4] COPY datasources.yaml /etc/grafana/provisioning/datasources/ 0.0s
=> exporting to image 0.2s
=> => exporting layers 0.0s
=> => writing image sha256:a08370f9d1c977cc521c00b0f0bfe27f607b907c508b6058ebac29b357386559 0.0s
=> => naming to docker.io/library/chiamon_grafana 0.0s
chiamon_prometheus_1 is up-to-date
Starting chiamon_chia_exporter_1 ...
chiamon_mtail_1 is up-to-date
Starting chiamon_chia_exporter_1 ... done

debug file bug

Since the log file is in INFO mode, many log entries are collected, so after about 80MB of data a new log file is created. This causes mtail to malfunction and report random data for the variables.

The only solution is to recompose the doc file in VSL, which fixes the issue until another new log file is created and the same thing happens again.

Probable solution: maybe it is possible to increase the log file size in Chia, or mtail should automatically restart itself when the new file is created.
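
One way to approach the "restart when a new file is created" idea is a small watchdog that notices when debug.log has been replaced. The Python sketch below detects rotation by comparing the file's device and inode numbers between polls; it is an illustration of that general technique, not a description of how mtail itself handles rotation, and the class and function names are hypothetical.

```python
import os


def file_identity(path):
    """Return (device, inode) for path, or None if it does not exist."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return (st.st_dev, st.st_ino)


class RotationDetector:
    """Detects when the file at `path` has been replaced by a new file."""

    def __init__(self, path):
        self.path = path
        self.identity = file_identity(path)

    def rotated(self):
        """Return True once each time the path starts pointing at a new file."""
        current = file_identity(self.path)
        if current is None or current == self.identity:
            return False  # file missing mid-rotation, or unchanged
        self.identity = current
        return True
```

A polling loop could call `rotated()` every few seconds and, when it returns True, restart the mtail container (for example with `docker restart chiamon_mtail_1`, using whatever container name your setup shows).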

Some mounts are not shown in the list

My field disk is mounted to /mnt/fields/0 and this mountpoint does not show up in mounts, and thus I cannot track disk usage.
I imagine I have to tweak something in grafana/dashboards/Chia.json but I cannot determine where $mounts is defined.

Grafana dashboard source

Hi all,

It's more a question than an issue:
The Grafana dashboard in the source tree looks a bit different from the one shown in the README. Would it be possible to also publish the dashboard from the README? (I think mainly the disk temperature is missing.)
Thanks in advance!
It's a really beautiful and comprehensive dashboard! Thanks once more for this great work!
Cheers, Peter

What is the search time histogram showing?

I'm wondering what exactly the search time histogram is showing. Is it the equivalent of the number of seconds for proving a plot when farming? If not, how can you create a component that shows whether plot lookup times are below the 5 and 30 second marks?

I think a lot of people don't know that if the response time for proving and answering when farming is longer than 30 seconds, they will not earn anything.

Cannot login anymore

Hi, after rebooting the Chia farmer and restarting Chia I cannot seem to log into Grafana anymore. The default credentials also won't work. Is there a way to manually (re)set the credentials? I can access the server it's running on at root level.

Accessing grafana dashboard

I want to be able to check on plots and sync condition while I am away from home.

Do you know any way I can access the Grafana dashboard while I'm away?

If it is possible to add this functionality to ChiaMon, it would be a great addition to the application.

Farming stats from multiple harvesters

I was able to set up everything. Great job putting this together.

I'm wondering if there is a way to add stats about remote harvesters on other machines to the same dashboard, as they connect to the farmer server.

Getting "plots: no data"

I'm running Chiamon on my Ubuntu 20.04 LTS, with Chia 1.15 installed.

In the Grafana Dashboard, the plots panel always shows up as "No data".

I have verified that:

  • my Prometheus is able to talk to all 4 services
  • my Chia logs are set to INFO
  • I'm running node_exporter natively instead of in Docker, per README notes. To do this, I run prometheus-node-exporter in Terminal before running docker-compose up -d.
    -- (note: I've also tried using node_exporter in Docker, but I get same "no plot" error).
  • my Chia client is up and running properly
  • tried reinstalling chiamon (by removing local chiamon directory and then git clone again)
  • verified my debug.log files are in $HOME/.chia/mainnet/log/debug.log

As one additional data point: When I run docker-compose ps, I also see that chiamon_node_exporter_1 has exit code of 1, as it cannot use Port 9100, as I have already run prometheus-node-exporter before running docker-compose up -d to initialize chiamon_node_exporter_1.

Any potential solutions?

mtail fails to recover after log rotation

i'm running a modified configuration, so this might be a "me" problem and not an issue with the project...

here's my setup:

Synology DS2419+ with Docker v18.09.8 (latest via Package Center)

  • chiamon_mtail
  • chiamon_chia_exporter
  • chiamon_node_exporter
  • chia (full node, farmer, and harvester)

PhotonOS with Docker v19.03.15

  • chiamon_prometheus
  • chiamon_grafana

The Issue:

when chia's debug.log rotates, mtail stops updating with new information. the docker image itself stays running, but no updates related to Search Time or new Plots are seen until i restart the chiamon_mtail container.

log rotated at :26, container manually restarted at :36 -

$ ls -l /volume1/chia/appdata/mainnet/log/  
total 123540  
-rwxrwxrwx+ 1 root root   643928 May  6 17:36 debug.log  
-rwxrwxrwx+ 1 root root 20971568 May  6 17:26 debug.log.1  
-rwxrwxrwx+ 1 root root 20971585 May  6 13:01 debug.log.2  
-rwxrwxrwx+ 1 root root 20971565 May  6 09:07 debug.log.3  
-rwxrwxrwx+ 1 root root 20971717 May  6 05:58 debug.log.4  
-rwxrwxrwx+ 1 root root 20971521 May  6 02:24 debug.log.5  
-rwxrwxrwx+ 1 root root 20971583 May  5 23:10 debug.log.6  

no entry in mtail's container log around :26, but does have entries when i restarted the container -

$ sudo docker logs chiamon_mtail_1  
I0506 21:12:21.649568       1 mtail.go:193] Listening on [::]:3903  
I0506 22:12:21.465474       1 store.go:153] Running Store.Expire()  
I0506 23:12:21.444691       1 store.go:153] Running Store.Expire()  
I0507 00:12:21.443927       1 store.go:153] Running Store.Expire()  
I0507 01:12:21.446024       1 store.go:153] Running Store.Expire()  
I0507 01:35:12.859262       1 main.go:155] Received terminated, exiting...  
I0507 01:35:12.859578       1 loader.go:428] END OF LINE  
I0507 01:35:12.859576       1 mtail.go:206] Shutdown requested.  
I0507 01:35:12.859602       1 vm.go:1025] VM "chialog.mtail" finished  
I0507 01:35:56.682599       1 main.go:113] mtail version v3.0.0-rc45-31-g12162ff3 git revision 12162ff338afebf71536cef709f15b0bf5c29b8b go version go1.16.3 go arch amd64 go os linux  
I0507 01:35:56.682685       1 main.go:114] Commandline: ["/usr/bin/mtail" "-progs" "/etc/mtail" "-logs" "/var/log/chia/debug.log" "-logtostderr"]  
I0507 01:35:56.683161       1 store.go:178] Starting metric store expiry loop every 1h0m0s  
I0507 01:35:56.825930       1 loader.go:242] Loaded program chialog.mtail  
I0507 01:35:56.826086       1 tail.go:259] Tailing /var/log/chia/debug.log  
I0507 01:35:56.826239       1 mtail.go:193] Listening on [::]:3903  

Running on Mac?

On a Mac, if I run docker compose up I get:

Error response from daemon: path / is mounted on / but it is not a shared or slave mount

If I instead brew install node_exporter and then run node_exporter, I can see it running:

level=info ts=2021-05-08T14:38:34.107Z caller=node_exporter.go:195 msg="Listening on" address=:9100
level=info ts=2021-05-08T14:38:34.108Z caller=tls_config.go:191 msg="TLS is disabled." http2=false

However, it is unclear what I should change in docker-compose.yml to communicate with this process instead of the Docker node_exporter service already defined.

Any help is appreciated.

Temperature

Your screenshots have temperature blocks. How do I add them?

Windows Version? Docker has problem

So I tried this on Docker using Windows 10 Home with the WSL2 backend, and it has a problem: node_exporter can't mount the / path as shared/slave.

I saw that there is a Windows Exporter for Prometheus available, but I'm not well versed enough to work on it. If possible, can you try to make a Windows version?

Thank you for the great work!

remote setup

Is there a way to configure this so that Prometheus and Grafana are in one location and all the logs and stats get pushed there? My reason is that I have a number of remote harvesters and one remote farmer, so I thought it would be good to have one Docker setup to bring up the stat/log collection per host and another for the display and endpoint of the collected stats and logs.

Node Exporter Volume

Hi guys,

Thanks for chiamon, it's pretty neat. I'm trying to run this on Windows and looking at the docker-compose file:

  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    command:
      - '--path.rootfs=/host'
    pid: host
    volumes:
      - 'C:\:/host:ro,rslave'

What's the volume for the node_exporter meant to be? I can't get it to run, I get:

2021/05/15 12:14:16 chia_exporter version 0.2

2021/05/15 12:14:16 error calling get_network_info: Post "https://localhost:8555/get_network_info": dial tcp [::1]:8555: connect: connection refused
