diem's Introduction

DIEM

NodeJS Angular Python TypeScript Webpack Docker Kubernetes GitHub Actions

Python, Spark, REST, Scala, Pipelines, Scheduling, API, Custom Jobs, SQL Statements, Openshift, Cloud Native, Machine Learning, Sendgrid, Kubernetes, Slack, Cloud Object Storage, JDBC, Box

Diem can be used to create, display, execute and maintain data transfers between hardware and database platforms. It lets you create and manage transfers and assign them to a schedule so that they execute regularly without human intervention.

Diem provides a front end for Spark ETL (Extract, Transform, Load), an SQL data pipeline that can be used to synchronize data between RDBMS platforms. The pipeline is composed of individual transfer operations called jobs; each job executes SQL statements to select data from a source system and insert or replicate that data on a target system.
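Spark transfers of this kind are usually parallelized by splitting the source query on a numeric column into ranges and reading each range in its own partition, which is also what makes parallel inserts into the target possible. The sketch below generates such range predicates. It is a simplified illustration in the spirit of Spark's JDBC options (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`), not Spark's exact algorithm and not diem's code; the function name `partitionPredicates` is hypothetical.

```javascript
// Simplified sketch: split [lowerBound, upperBound) on an integer column into
// WHERE clauses, one per partition. The first and last clauses are open-ended,
// so rows outside the bounds (and NULLs) are still read by some partition.
function partitionPredicates(column, lowerBound, upperBound, numPartitions) {
  // Never create more partitions than there are values in the range.
  const n = Math.min(numPartitions, upperBound - lowerBound);
  if (n <= 1) {
    return ['1=1']; // a single partition reads everything
  }
  const stride = Math.floor((upperBound - lowerBound) / n);
  const predicates = [];
  let current = lowerBound;
  for (let i = 0; i < n; i++) {
    const lo = current;
    current += stride;
    if (i === 0) {
      predicates.push(`(${column} < ${current} OR ${column} IS NULL)`);
    } else if (i === n - 1) {
      predicates.push(`${column} >= ${lo}`);
    } else {
      predicates.push(`${column} >= ${lo} AND ${column} < ${current}`);
    }
  }
  return predicates;
}

// Each predicate becomes the WHERE clause of one parallel SELECT.
console.log(partitionPredicates('id', 0, 1000, 4));
```

Open-ended first and last ranges trade some balance for completeness: skewed data may make those partitions larger, but no row is silently dropped.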

Diem allows the user to create scripts in the interpreted programming language Python and to create sophisticated schedules using Cron (a job scheduler for Unix systems). The combination of Python and Cron, along with the ability to define and execute custom SQL statements, supports activities ranging from simple data transfers to more sophisticated job streams.
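To illustrate how a five-field Cron expression decides whether a job is due, here is a minimal matcher. It is a sketch under stated assumptions: the names `expandField` and `cronMatches` are hypothetical, diem may well rely on a cron library instead, and only the common syntax is covered (no month or weekday names, no `L`, `W` or `#` extensions).

```javascript
// Field order: minute, hour, day of month, month, day of week (0 = Sunday).
const FIELD_RANGES = [
  [0, 59],
  [0, 23],
  [1, 31],
  [1, 12],
  [0, 6],
];

// Expand one cron field ("*", "5", "1-5", "1,15", "*/15", "8-17/2", "5/20")
// into the set of values it allows.
function expandField(field, [min, max]) {
  const values = new Set();
  for (const part of field.split(',')) {
    const [rangePart, stepPart] = part.split('/');
    const step = stepPart === undefined ? 1 : Number(stepPart);
    let start = min;
    let end = max;
    if (rangePart.includes('-')) {
      [start, end] = rangePart.split('-').map(Number);
    } else if (rangePart !== '*') {
      start = Number(rangePart);
      // A bare value means just that value; "a/n" means "from a to max, every n".
      end = stepPart === undefined ? start : max;
    }
    const valid = [start, end, step].every(Number.isInteger) &&
      step >= 1 && start >= min && end <= max && start <= end;
    if (!valid) throw new Error(`Invalid cron field: ${field}`);
    for (let v = start; v <= end; v += step) values.add(v);
  }
  return values;
}

// True if the Date (in local time) matches the expression.
function cronMatches(expression, date) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error('Expected 5 cron fields');
  const [minutes, hours, days, months, weekdays] =
    fields.map((field, i) => expandField(field, FIELD_RANGES[i]));
  const domMatch = days.has(date.getDate());
  const dowMatch = weekdays.has(date.getDay());
  // Traditional cron rule: when both day fields are restricted (do not start
  // with "*"), a match on either one is enough; otherwise both must match.
  const bothRestricted = !fields[2].startsWith('*') && !fields[4].startsWith('*');
  const dayMatch = bothRestricted ? domMatch || dowMatch : domMatch && dowMatch;
  return minutes.has(date.getMinutes()) && hours.has(date.getHours()) &&
    months.has(date.getMonth() + 1) && dayMatch;
}

// Every 15 minutes during office hours on weekdays; 8 January 2024 is a Monday.
console.log(cronMatches('*/15 8-17 * * 1-5', new Date(2024, 0, 8, 9, 30))); // true
```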

Diem also allows quick and easy definition of connections, and it provides a scheduler and a log display. An interface to Slack can be used to send the results of jobs to a specified Slack channel.

Application Features

| Feature | Feature Summary | Benefits |
| --- | --- | --- |
| Spaces | Support for multiple organisations | Multiple organisations can make use of DIEM, and each org can have its own space. You can even have multiple spaces per org and use them for test, pre-prod or production. |
| Data Transfer | NodePy | Fast transfer of small data sets (< 100k) using pandas, JDBC and SQLAlchemy. |
| Data Transfer | Spark | Bulk transfer of big data using Spark, both PySpark and Scala. Partition your data for parallel inserts. Write your SQL online and easily manage your job. Include it in a pipeline. Get notified via Slack or mail. |
| Custom Code | Write your own Python code | Write your own code using PySpark or Python. Integrate your favorite library and use your JDBC connections, config maps, code snippets and webhooks, all in one place, creating a unique experience. |
| API Services | REST services for external use | Create jobs that provide REST services. Connect external applications to your code and provide REST services for them. |
| Machine Learning | Embed machine learning in your code | Make use of the latest libraries such as SciPy, matplotlib, seaborn and pandas to create machine learning models that can be used in your code. |
| Connections | DB2, Netezza, PostgreSQL and many more | JDBC connections into various sources, easy to add and manage. Secrets are kept secure if personal. |
| Webhooks | Bring in your own webhook | Webhooks can be used to integrate with your applications. You can bring in your Git or Slack webhook and use it in your applications. |
| Slack | Slack integration | Either use the default Slack channels or bring in your own Slack API key. All job progress is logged to your Slack channels. You can even integrate them into your custom jobs and provide custom content and subject messages. |
| Pipelines | Pipelines of jobs | Group your jobs together to form a pipeline. Start the jobs at the same time or in order. Manage dependencies and organize them in steps. |
| Scheduling | Cron schedule | Schedule your jobs using an advanced Cron schedule that can handle any timeframe. |
| Mail | Mail functionality | Send mail to your audience on completion or failure of your job. |
| Mail Integration | Mail functionality for your code | Integrate mail functionality into your code: send data reports as HTML, CSV or XLS to your audience based on your query. Customize headers and body content. |
| Files | Upload, download or integrate files | Each space is connected to its own Cloud Object Storage bucket, which can be integrated into your code. You can also specify any other COS instance. |
| Box | Upload and download from Box | You can now directly download and upload files from Box. |
| Config Maps | Manage parameters and config values | Config maps are very useful because they separate your code from its values. They can be kept private and secure, so you can use them to store your own tokens. |
| Tags | Define your own tags | Set up your own tags for easy job search, classification and job management. |
| Templates | Reusable or shared templates | Your code can be based on a template that you clone from. You can also have shared code that is the same across your jobs and differs only in configuration. |
| Code Snippets | Reusable and shareable code | Create reusable code and share it across your jobs. This lets you reuse code in multiple jobs while maintaining key code centrally. |
| Job Log | Audit trail of completed jobs | Each started job has its own audit trail, so you can go back to view errors and integrate it into your reporting for performance review. |
| Organization | Organization profile | View your profile and your access rights in the organisation. |
| Organizations | Organizations you belong to | See all organisations you belong to. |
| Space Selector | Easily move between spaces | Switch at any time between the organisations you belong to. |
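The Pipelines feature groups jobs into steps, where jobs in the same step can start at the same time and the steps run in order. One way to derive such steps from job dependencies is a level-by-level topological sort (Kahn's algorithm), sketched below. The `{ id, dependsOn }` job shape and the name `planSteps` are hypothetical; this is not how diem necessarily stores or orders pipelines.

```javascript
// Group jobs into ordered steps: a job lands in the first step after all of
// its dependencies have run. Throws on unknown dependencies or cycles.
function planSteps(jobs) {
  const remaining = new Map(jobs.map((job) => [job.id, new Set(job.dependsOn || [])]));
  for (const deps of remaining.values()) {
    for (const dep of deps) {
      if (!remaining.has(dep)) throw new Error(`Unknown dependency: ${dep}`);
    }
  }
  const steps = [];
  while (remaining.size > 0) {
    // Every job with no outstanding dependencies can run in this step.
    const ready = [...remaining]
      .filter(([, deps]) => deps.size === 0)
      .map(([id]) => id);
    if (ready.length === 0) throw new Error('Cycle detected in pipeline dependencies');
    for (const id of ready) remaining.delete(id);
    // Mark the jobs of this step as done for everyone still waiting.
    for (const deps of remaining.values()) {
      for (const id of ready) deps.delete(id);
    }
    steps.push(ready);
  }
  return steps;
}

const steps = planSteps([
  { id: 'extract' },
  { id: 'lookup' },
  { id: 'transform', dependsOn: ['extract', 'lookup'] },
  { id: 'load', dependsOn: ['transform'] },
]);
console.log(JSON.stringify(steps)); // [["extract","lookup"],["transform"],["load"]]
```

Running jobs strictly in order is the special case where every job depends on the previous one, which yields one job per step.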

diem's People

Contributors

dependabot[bot], huineng, imgbotapp, stevemar

diem's Issues

Create Helm chart

Before diem can really be used, we will create a Helm chart that makes the application much easier to install.

update pod annotations for ingress

When you deploy diem on your local cluster in combination with ingress-nginx, the Ingress itself requires the annotation

kubernetes.io/ingress.class: nginx

but the values do not provide a way to set annotations specifically for the Ingress (only for all resources).
Add an entry in the values to allow Ingress-specific annotations.
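A simple rule for the requested values entry is to keep the shared annotations and let an Ingress-specific map extend or override them when the Ingress is rendered. The sketch below only models that precedence; the keys `annotations` and `ingress.annotations` are hypothetical names for the proposed values, not the chart's actual schema, and in the chart itself this logic would live in a template.

```javascript
// Hypothetical values layout: common annotations for every resource, plus
// Ingress-only annotations merged on top for the Ingress resource.
function ingressAnnotations(values) {
  const common = values.annotations || {};
  const ingressOnly = (values.ingress && values.ingress.annotations) || {};
  // The later spread wins, so Ingress-specific keys override common ones.
  return { ...common, ...ingressOnly };
}

const values = {
  annotations: { team: 'data' },
  ingress: { annotations: { 'kubernetes.io/ingress.class': 'nginx' } },
};
console.log(ingressAnnotations(values));
```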

Move cron to diem-operator

Currently we depend on redis-cluster for cron jobs. Redis is an external service that can have outages.
Move the cron job to the diem-operator (a single pod), which will use NATS to send a message every minute to trigger cron jobs.
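The operator side of this proposal can be sketched as a minute ticker aligned to the wall clock. In this sketch `publish(subject, payload)` stands in for the NATS client's publish call, and the subject name `diem.cron.minute` and the payload shape are invented for illustration; subscribers would then evaluate each job's cron expression against the published minute.

```javascript
// Timestamp (ms) of the next whole minute strictly after `now`.
function nextMinuteBoundary(now) {
  return (Math.floor(now / 60000) + 1) * 60000;
}

// Publish one message per minute, aligned to minute boundaries.
// Returns a function that stops the ticker.
function startMinuteTicker(publish, subject = 'diem.cron.minute') {
  let timer;
  let lastTarget = 0;
  const schedule = () => {
    const now = Date.now();
    // Never target the same minute twice, even if a timer fires slightly early.
    // If the process was blocked, missed minutes are skipped, not replayed.
    const target = Math.max(nextMinuteBoundary(now), lastTarget + 60000);
    lastTarget = target;
    timer = setTimeout(() => {
      publish(subject, JSON.stringify({ minute: new Date(target).toISOString() }));
      schedule(); // re-arm from the wall clock so delays do not accumulate
    }, target - now);
  };
  schedule();
  return () => clearTimeout(timer);
}
```

Running this in a single operator pod avoids duplicate ticks; if the operator is ever scaled out, some form of leader election would be needed.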

Mongoose warning

(node:1) [MONGOOSE] DeprecationWarning: Mongoose: the `strictQuery` option will be switched back to `false` by default in Mongoose 7. Use `mongoose.set('strictQuery', false);` if you want to prepare for this change. Or use `mongoose.set('strictQuery', true);` to suppress this warning.
(Use `node --trace-deprecation ...` to show where the warning was created)

Remove URL GETPOST

This is no longer needed, as everything can be done with a custom job and the Python requests module.

config maps

There are some issues:

  1. A person's name cannot be overwritten.
  2. If it is overwritten, check whether that person has access to that diem space; if so, continue, otherwise use the name of the current user.
  3. If the config map is personal, the view should not display the edit button, and the user must not be able to save the document (even redacted).
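Rules 2 and 3 can be captured in two small functions. This is a sketch with hypothetical names (`resolveOwner`, `canEdit`, `hasSpaceAccess`) and data shapes; it also assumes that "personal" means only the owner may edit, which the issue does not state explicitly.

```javascript
// Rule 2: keep an overwritten owner only if that person has access to the
// space; otherwise fall back to the current user.
// `hasSpaceAccess(person, space)` is a hypothetical access-check callback.
function resolveOwner(requestedOwner, currentUser, space, hasSpaceAccess) {
  if (requestedOwner && requestedOwner !== currentUser &&
      hasSpaceAccess(requestedOwner, space)) {
    return requestedOwner;
  }
  return currentUser;
}

// Rule 3 (assumed semantics): a personal config map can only be edited, or
// saved even in redacted form, by its owner. The view would hide the edit
// button when this returns false, and the save handler would reject the request.
function canEdit(configMap, currentUser) {
  return !configMap.personal || configMap.owner === currentUser;
}
```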

icons have disappeared

Because only the Font Awesome solid icon set is now used, some icons need to be updated:

  1. Job detail: the scheduled icon is gone.
  2. Job log: the 3 icons are gone.

Add the ID of the job that failed, and check whether there is an alternative way of handling this error

trace
[
 "@at $job.stop (jobStop)",
 "@at $job.logger (jobLogger) - handleMail",
 "@at $handle.mail (handleMail)",
 "@at $handle.mail (prepareMail)",
 "@at $mail.notifications (sendMail)",
 "@at $mailhanlder (newMail)",
 "@at $mail (sendMail)"
]
body: [object Object]
transid: 54d0c3ba-e2e3-0902-6c98-9840096ac3cf
caller: @at $job.stop (jobStop)
log: $job.start.handler (saveDoc): error

wrong connection

When two spaces use the same connection name and the job is run, the first matching connection is taken instead of the connection of the job's own space.
Make sure the target space is used when the connection is looked up.
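The fix amounts to scoping the lookup by space as well as by name. The in-memory sketch below illustrates this; the field names `space` and `name` are assumptions, and a database query would apply the same idea by including the space in its filter (for example in a Mongoose `findOne` filter).

```javascript
// Look a connection up by (space, name) rather than by name alone, so two
// spaces can reuse the same connection name safely.
function findConnection(connections, spaceId, name) {
  const match = connections.find((c) => c.space === spaceId && c.name === name);
  if (!match) {
    // Failing loudly is safer than silently using another space's connection.
    throw new Error(`Connection "${name}" not found in space ${spaceId}`);
  }
  return match;
}
```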

Create a slackbot

A bot to support DIEM workloads could significantly improve productivity. Tracking, starting and stopping jobs can be done from within Slack. A bot can also provide more granular and personalised messages (metrics, performance, recommendations) than a website can offer. In the backend we can also integrate AI and machine learning.

Some functionality

Features that can interact with DIEM:

  • Start job
  • Stop job
  • Log job
  • Get job errors
  • Get job history
  • Get members
  • Get pipeline jobs?
  • Set schedule (should I go that far?)
  • Replicate job (maybe a nice-to-have)

Something around approvals

  • create approval for job xxx (to be worked out)

Something around machine learning and AI

  • performance metrics
  • job recommendations and configuration settings

It can also provide some generic utilities:

  • guid(): create a unique ID
  • base64
  • encryption code
  • several others

Host it in this GitHub repo, but don't make it part of the Helm chart (yet?). In that case, maybe give it its own Helm chart or some other easy way to install it?
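Command handling for such a bot could start as a small router that maps the first two words of a message to a handler. This sketch is hypothetical throughout: it does not call the Slack API, the handlers only return strings, and the command names simply mirror the feature list above.

```javascript
// Route messages such as "start job 1234" to a handler keyed by "verb noun".
// Only the command words are lower-cased, so job IDs keep their case.
function createRouter(handlers) {
  return function route(text) {
    const [verb = '', noun = '', ...args] = text.trim().split(/\s+/);
    const handler = handlers[`${verb} ${noun}`.toLowerCase()];
    if (!handler) {
      return `Unknown command. Try: ${Object.keys(handlers).join(', ')}`;
    }
    return handler(...args);
  };
}

const route = createRouter({
  'start job': (id) => `Starting job ${id}`,
  'stop job': (id) => `Stopping job ${id}`,
  'log job': (id) => `Fetching log for job ${id}`,
});

console.log(route('Start job AbC123')); // Starting job AbC123
```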

Box: improvements

  • saveFile needs to return the id and name
  • saveFile needs to wrap the upload in try/except and return the error
  • fileInfo: get information about a file
  • deleteFile: delete a file
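The first two points describe a result-or-error contract for `saveFile`. A minimal sketch of that contract follows; `upload` is a hypothetical stand-in for the actual Box upload call, and the resolved `{ id, name }` object it is assumed to return is an illustration, not the Box SDK's documented response.

```javascript
// Always return an object instead of throwing: { id, name } on success,
// { error } on failure, so calling code can branch without try/catch.
async function saveFile(upload, folderId, fileName, content) {
  try {
    const file = await upload(folderId, fileName, content);
    return { id: file.id, name: file.name };
  } catch (err) {
    return { error: err instanceof Error ? err.message : String(err) };
  }
}
```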

security fixes

After a scan using https://cloud.appscan.com/, a few minor things need to be updated:

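  1. Ingress: add the annotation
nginx.ingress.kubernetes.io/configuration-snippet: |
  more_clear_headers "Server";
  2. Express session cookie: add sameSite
// `store` must be an existing session store instance (e.g. a connect-mongo or connect-redis store)
const sessionConfig = {
  secret: 'MYSECRET', // placeholder: use a real secret, e.g. read from an environment variable
  name: 'appName',
  resave: false,
  saveUninitialized: false,
  store: store,
  cookie: {
    sameSite: 'strict', // this is the setting the scan asks for
  },
};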

CVE-2023-29019 investigate and solve

`@fastify/passport` is a port of the passport authentication library for the Fastify ecosystem. Applications using `@fastify/passport` in affected versions for user authentication, in combination with `@fastify/session` as the underlying session management mechanism, are vulnerable to session fixation attacks from network and same-site attackers. Fastify applications rely on the `@fastify/passport` library for user authentication. The login and user validation are performed by the `authenticate` function. When executing this function, the `sessionId` is preserved between the pre-login and the authenticated session. Network and same-site attackers can hijack the victim's session by tossing a valid `sessionId` cookie in the victim's browser and waiting for the victim to log in on the website. As a solution, newer versions of `@fastify/passport` regenerate `sessionId` upon login, preventing the attacker-controlled pre-session cookie from being upgraded to an authenticated session. Users are advised to upgrade. There are no known workarounds for this vulnerability.

reference
GHSA-4m3m-ppvx-xgw9
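The remedy described in the advisory is to regenerate the session ID at login. The framework-free sketch below shows why that defeats fixation: an ID planted before login never becomes an authenticated session. The in-memory `sessions` map and the functions `createSession` and `login` are hypothetical; for the actual vulnerability, the advised fix is upgrading `@fastify/passport`.

```javascript
const { randomBytes } = require('crypto');

// Hypothetical in-memory session store: sessionId -> session data.
const sessions = new Map();

function createSession() {
  const id = randomBytes(16).toString('hex');
  sessions.set(id, { user: null });
  return id;
}

// On successful login, discard the pre-login session and issue a fresh ID.
// An attacker who planted the old ID in the victim's browser is left holding
// an ID that no longer exists, instead of an authenticated session.
function login(oldSessionId, user) {
  const previous = sessions.get(oldSessionId) || {};
  sessions.delete(oldSessionId);
  const newId = createSession();
  // Carry over non-authentication state only if the application needs it.
  sessions.set(newId, { ...previous, user });
  return newId; // send this back to the browser in a new Set-Cookie header
}
```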

Terms of Use And Support

Update the terms of use.
Feed the content from the backend so that it can be updated from the database and tailored per installation.

Do the same for the support page.
