
athena-core's Introduction

Athena


Athena is a file-processing service. It consumes files stored in files.com (through its API) and runs a series of reports over the downloaded artifacts. It then uses the Salesforce API to perform actions (currently, only posting comments is supported).

Basics

Athena consists of three software components:

  1. athena-monitor: monitors several directories in a files.com account for changes. When new files are found, they are sent to the processor for background processing.

  2. NATS: a lightweight messaging daemon on top of which a pub/sub system can be implemented. It is used to dispatch messages from athena-monitor to athena-processor.

  3. athena-processor: subscribes to messages from the monitor and determines which reports have to be run over each detected file. It then performs an action on Salesforce (such as posting a comment).

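The basic flow of interaction is as follows:

  • athena-monitor detects new files on files.com and publishes a message for each one through NATS.
  • athena-processor receives the message and runs the matching reports.
  • athena-processor performs the resulting action on Salesforce.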

Hacking

To stand up a development environment, you will need:

  • make
  • docker
  • docker-compose
  • golang >= 1.19

To run a Docker-based installation locally, you will need a sandbox account on Salesforce and a sandbox directory on files.com. Supply the following:

  1. The corresponding credentials in creds.yaml:

    db:
      dialect: mysql
      dsn: "athena:athena@tcp(db:3306)/athena?charset=utf8&parseTime=true"
    
    filescom:
      key: "***"
      endpoint: "https://..."
    
    salesforce:
      endpoint: "https://..."
      username: "***"
      password: "***"
      security-token: "***"
  2. The list of directories to monitor in athena-monitor-directories.yaml:

    monitor:
      directories:
        - "/sandbox/..."
        - "/sandbox/..."
  3. The path where report uploads will go, in athena-processor-upload.yaml:

    processor:
      reports-upload-dir: "/sandbox/..."

Finally, run:

make devel

If the Docker build step fails, try re-running the make command without the cache:

NOCACHE=1 make devel

The devel deployment includes a debug container which can be used to inspect the database.

$ docker exec --interactive --tty debug bash
# mysql -h db -u athena -pathena athena
mysql> describe files;
+------------+---------------------+------+-----+---------+----------------+
| Field      | Type                | Null | Key | Default | Extra          |
+------------+---------------------+------+-----+---------+----------------+
| id         | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| created_at | datetime(3)         | YES  |     | NULL    |                |
| updated_at | datetime(3)         | YES  |     | NULL    |                |
| deleted_at | datetime(3)         | YES  | MUL | NULL    |                |
| created    | datetime(3)         | YES  |     | NULL    |                |
| dispatched | tinyint(1)          | YES  |     | 0       |                |
| path       | longtext            | YES  |     | NULL    |                |
+------------+---------------------+------+-----+---------+----------------+
7 rows in set (0.01 sec)

athena-core's People

Contributors

bilboer, dosaboy, freyes, lathiat, nicolasbock, niedbalski


athena-core's Issues

Split long comments

Salesforce limits the length of comments. Athena needs to check whether a comment exceeds the limit and, if so, split it into multiple comments.

avoid conflicts with file mover

Currently, Athena competes with the file mover. The file mover can kick in at any point and move uploaded files before Athena has a chance to download them.

Implement producer / consumer work queue

Currently, Athena uses a publisher/subscriber model in which every subscriber receives every message from the publisher. This model does not allow scaling out the processor, since every processor would work on every job. We need to move to a queue system to allow multiple processor (or monitor) tasks.

Integration testing

Add a simple Go test, using dockertest, that goes through the full workflow:

  1. Create or retrieve a case number in the testing sandbox.
  2. Upload artifacts to files.com that match the case number.
  3. Assert that a comment and a resulting output file exist.

Improve file/case matching

Recently I came across a case where Athena posted the results, but the file was related to another case. The filename was:

/uploads/sosreport-AC163-WZP253106Q5-00330921-2022-03-22-ktveflv.tar.xz

Athena posted the results in case 253106 when it should have posted them in case 00330921.

Summary printing no longer works

Athena is no longer adding the summary to SF. This appears to have started only in the last few days.

It looks like a shell script is failing somewhere.

For example:

tmp/athena/athena-report-sosreports719461722/run-script-925565340: line 5: [: sosreport-juju-a9d6f4-25-lxd-2-00321501-2021-10-22-chpgvbi.short.summary: binary operator expected
No known bugs or issues found on sosreport.

sometimes a sosreport isn't found though it's present

For example, in case 00335429,

https://files.support.canonical.com/files/customers/foo/00335429/sosreport-juju-a9d6f4-25-lxd-2-00335429-2022-04-27-cchvwkw.tar.xz exists and can be downloaded,

but when Athena ran hotsos, we got an empty output:

2022-04-27 00:41 UTC-
-Athena processor: 23af8d3c4d2b subscriber: sosreports has run the following reports.

Summary for report: hotsos - filepath: uploads/sosreport-juju-a9d6f4-25-lxd-2-00335429-2022-04-27-cchvwkw.tar.xz
-------------------------------------------------------------------------
{}


Full hotsos output can be found at: https://files.support.canonical.com/files/customers/athena-reports/sosreport-juju-a9d6f4-25-lxd-2-00335429-2022-04-27-cchvwkw.tar.xz.athena-hotsos.hotsos-full

The full output contains only:

ERROR: invalid path or option '/tmp/athena/athena-report-sosreports911261865/00322012_sosreport-sna2r03c010-2021-11-28-ffopvto/'
No full sosreport generated.

hotsos summary doesn't render HTML entities

In the summary displayed in SF comments, HTML entities aren't getting replaced.

E.g. &lt; is shown instead of <.

I am not sure whether this needs to be addressed on the SF side or in Athena; I am just raising it here.
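If this is fixed on the Athena side, Go's standard `html.UnescapeString` decodes entities such as `&lt;`, `&gt;` and `&amp;`. A minimal sketch follows; the `unescapeSummary` wrapper is hypothetical.

This only treats the symptom. If the summary template is rendered with HTML autoescaping enabled, that alone would produce exactly these entities, and disabling escaping for that template would be the alternative fix. This assumes the Salesforce comment field is displayed as plain text, which the issue suggests since `&lt;` is shown literally.

```go
package main

import (
	"fmt"
	"html"
)

// unescapeSummary decodes HTML entities so the summary is posted as plain
// characters, e.g. "&lt;" becomes "<".
func unescapeSummary(s string) string {
	return html.UnescapeString(s)
}

func main() {
	fmt.Println(unescapeSummary("kernel &lt; 5.15 &amp;&amp; ceph &gt;= 17"))
}
```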

Athena loses connection to SalesForce from time to time

The monitor loses its connection to Salesforce after some time and does not recover.
The error message in the monitor log is similar to:

athena-monitor      | 2023/10/16 13:05:40 [simpleforce] request failed, 401
athena-monitor      | 2023/10/16 13:05:40 [simpleforce] Failed resp.body:  [{"message":"Session expired or invalid","errorCode":"INVALID_SESSION_ID"}]
athena-monitor      | 2023/10/16 13:05:40 [simpleforce] HTTP GET request failed: https://canonical.my.salesforce.com/services/data/v43.0/query?q=SELECT%20Id%2CCaseNumber%2CAccountId%20FROM%20Case%20WHERE%20CaseNumber%20LIKE%20%27%2500371350%25%27
athena-monitor      | {"level":"error","msg":"[simpleforce] Error. http code: 401 Error Message:  Session expired or invalid Error Code: INVALID_SESSION_ID","time":"2023-10-16T13:05:40Z"}

Multiple files tags

Implement a way to distinguish between multiple files generated by a single report command, for example:

            hotsos-short:
              exit-codes: 0 2 127 126
              output-files:
                short: '\.short\.summary$'
                long: '\.long\.summary$'
              run: |
                #!/bin/bash
                git clone --quiet https://github.com/canonical/hotsos.git {{basedir}}/hotsos &>/dev/null
                tar -xf {{filepath}} -C {{basedir}} &>/dev/null
                {{basedir}}/hotsos/hotsos.sh -s --all-logs --short {{basedir}}/$(basename {{filepath}} .tar.xz)/ &>/dev/null
                # Loop over the matches: "[ -s *.short.summary ]" fails with
                # "binary operator expected" when the glob matches more than one file.
                found=0
                for f in *.short.summary; do
                  if [ -s "$f" ]; then
                    cat "$f"
                    found=1
                  fi
                done
                if [ "$found" -eq 0 ]; then
                  echo "No known bugs or issues found on sosreport."
                fi
                rm -f *.short.summary
                exit 0

These files could then be referred to from the template as:

        {%- for file, content in report.files %}
        {% endfor %}

detect simpleforce connection errors

Sometimes the connection to Salesforce gets corrupted and Athena never attempts to reconnect. This leaves Athena in a failed state that requires a manual restart:

2023/08/23 17:34:58 [simpleforce] Failed resp.body: [{"message":"Session expired or invalid","errorCode":"INVALID_SESSION_ID"}]
2023/08/23 17:34:58 [simpleforce] HTTP GET request failed: ##redacted##
{"level":"error","msg":"Failed to upload and save report: foo - error: [simpleforce] Error. http code: 401 Error Message: Session expired or invalid Error Code: INVALID_SESSION_ID","time":"2023-08-23T17:34:58Z"}

add support for retry/kick

Sometimes Athena fails to download a sosreport, for whatever reason. It would be useful to have a way to make it retry.

report corrupt sosreports

Right now, if a sosreport is uploaded but is corrupt, we just don't see hotsos results. Ideally we should get a message if the sosreport is corrupt or fails to extract, so that a re-upload can be requested as early as possible.

Move deployment to Kubernetes

Currently, Athena is deployed using docker-compose. Kubernetes offers advantages such as autoscaling and monitoring.

error starting new script runner

I am seeing the following in the athena-processor logs, typically a few days after a restart:

{"level":"info","msg":"Downloading 'uploads/sosreport-XXX.tar.xz' to '/tmp/athena/athena-report-sosreports865620253'","time":"2023-09-18T07:18:54Z"}
{"level":"error","msg":"Failed to get new runner: Not Found - http-code: 404","time":"2023-09-18T07:18:54Z"}
{"component":"pubsub","duration":266873658,"err":"Not Found - http-code: 404","handler":"athena-processor-30389025ce78","id":"436","level":"error","metadata":{"x-audit-user":""},"msg":"Failed Processing PubSub Msg","stack":"goroutine 23 [running]:
runtime/debug.Stack(0xb28700, 0xc0006c9a68, 0xbdb487)
/opt/hostedtoolcache/go/1.16.15/x64/src/runtime/debug/stack.go:24 +0x9f
github.com/lileio/pubsub/v2/middleware/logrus.Middleware.SubscribeInterceptor.func1(0xcc2990, 0xc000026030, 0xc0004b15ac, 0x3, 0xc000321560, 0xc00069c120, 0x105, 0x120, 0xc0003a60d8, 0xc0000bc8d0, ...)
/home/runner/go/pkg/mod/github.com/lileio/pubsub/[email protected]/middleware/logrus/logrus.go:57 +0x537
github.com/lileio/pubsub/v2.chainSubscriberMiddleware.func1.1(0xcc2990, 0xc000026030, 0xc0004b15ac, 0x3, 0xc000321560, 0xc00069c120, 0x105, 0x120, 0xc0003a60d8, 0xc0000bc8d0, ...)
/home/runner/go/pkg/mod/github.com/lileio/pubsub/[email protected]/subscribe.go:167 +0x170
github.com/lileio/pubsub/v2/providers/nats.(*Nats).Subscribe.func1(0xc00015a230)
/home/runner/go/pkg/mod/github.com/lileio/pubsub/[email protected]/providers/nats/nats.go:120 +0x58b
github.com/nats-io/stan%2ego.(*conn).processMsg(0xc00054a000, 0xc000276120)
/home/runner/go/pkg/mod/github.com/nats-io/[email protected]/stan.go:879 +0x2f9
github.com/nats-io/nats%2ego.(*Conn).waitForMsgs(0xc00054c000, 0xc00041c0c0)
/home/runner/go/pkg/mod/github.com/nats-io/[email protected]/nats.go:2412 +0x342
created by github.com/nats-io/nats%2ego.(*Conn).subscribeLocked
/home/runner/go/pkg/mod/github.com/nats-io/[email protected]/nats.go:3491 +0x4a5
","time":"2023-09-18T07:18:54Z","topic":"sosreports"}

This seems to result in the failure to run any further scripts until a manual restart is performed.

Improve README

Extend the current README with the following information:

  1. System requirements
  2. How to set up keys and the environment
  3. How to run the full workflow
  4. How to create a PR
