
agola's Issues

Handle remote repository rename

Add an API to update the remote repository path:

  • Manual (entering the new path)
  • Automatic (fetch the remote repo info using the remote repo id and update the path with the newly received one)

Additional notes

How repository renames are currently handled depends on the git source:

  • github: implements redirects for web and git urls, so everything should keep working in agola after a rename until a new repo with the previous name is created (this overrides/removes the automatic redirect)
  • gitea: doesn't provide a github-like redirect mechanism, so after a rename agola will break since git clones won't work
  • gitlab: to investigate

A way to automatically get the new repository path depends on the git source:

  • github: has a "hidden" api, reported as stable, to get a repository by id
  • gitea: currently no way to get a repo by id
  • gitlab: has a stable api to get a repo by id

Project option to disable passing variables to PR

Currently the variable system lets users filter variables by branch, tag or ref, so simply setting a when condition on all branches/tags or on specific refs makes it possible to not pass any variable to pull requests. This is very useful since a malicious user could open a pr adding a run step that executes the env command to print all the environment variables; if some of these variables are defined in the config as from_variable, this could leak secrets.

To make all of this simpler and less error prone, and to avoid requiring users to explicitly define a when condition excluding pull requests, we could add a project option (perhaps enabled by default) to not pass any variable when the run is triggered by a pull request.

This should probably be done only for pull requests from forks, since a pull request from the same repo means the user already has access to the main repo.

Document our view about sharding

The microservice nature of agola will in the future let us shard configstore resources and runservice runs to achieve greater scaling. Since this is a complex topic, let's write a doc about our view on how sharding could be implemented (with its constraints, required changes etc...)

External oidc auth

Currently we can do oauth2 auth using the remote source (gitea, gitlab, github).

Also provide an external (not related to git remote sources) oidc auth.

configstore: garbage collect entries with a nonexistent parent

If a user, org, projectgroup, project etc... is removed, the child resources should be garbage collected.

Currently, on the read side, child resources without a parent should already be ignored (there are tests checking this)

We could also remove all the child resources in the same transaction, but garbage collection could be a better approach when dealing with many child resources, and it's the only way if in the future we split/shard configstore resources across multiple configstores.

integration / e2e tests

Create and improve integration and e2e tests. Currently we have some local integration/e2e tests not yet committed because they contain a lot of hacks. They should be committed just after the launch so we can better detect regressions.

They require an external git source, and currently we are using gitea since it's the easiest to set up inside CI.

Add local auth

Placeholder for local auth. It'll require (at least):

  • Adding a password field to the user (using bcrypt format)
  • Adding email field to the user
  • Email verification
  • 2FA

Run: restart run (or tasks) recalculating variable data

Add the ability to restart a run, or some of its tasks, updating the task variable data. This could be useful if some variables have changed (like auths), so we want to update them and then restart the failed tasks.

Run restart logic lives inside the runservice, and the runservice is designed to know nothing about the upper layers (webhook, git, configstore data); for this reason it receives the task envs and docker auths already populated by the upper layer (currently the gateway).

So we should:

  • Save the original config somewhere (directly inside the runconfig as an annotation, although this could greatly increase its size at every load, or perhaps in a dedicated entry loaded only when needed)
  • Recreate the runconfig tasks that we want to restart and send them to the runservice when calling the recreaterun api
  • The runservice, on a recreaterun request, could verify and update the task Environment and DockerRegistryAuth entries

runscheduler: create run manually

Add an API to create a run manually specifying the project and the remote repository branch/ref and sha.

If no sha is provided, to keep runs reproducible we shouldn't use HEAD but always populate a commit sha. To do this the gitsource should implement a method to fetch the current HEAD and save it in the run annotations and environment (AGOLA_ variables)

k8s driver: use/create a custom service account for task pods

Currently task pods run with the default service account, but to avoid pods being able to talk to the underlying k8s cluster api the account secrets aren't mounted.

An additional measure could be to use another, already existing service account (configured with no permissions or with the agola admin's preferred permissions), or to automatically create one.

runservice: run task groups

Add a new concept of run task group letting users group tasks together. It'll look like a sort of subrun. Task groups can have dependencies between them like tasks (also with on_success and on_failure conditions); a group will start only when its parent is fully completed. This could be useful for multiple use cases:

  • Notification: create a subgroup that fires notification tasks when the primary group finishes (in any state, or only on success or failure)
  • Cleanup/rollback: create a subgroup that will start when a deploy task group fails.

Currently the “rollback” case is already possible but requires some more configuration effort: the user has to make the “rollback” tasks in the above example depend on all the possible deploy tasks, since the failure of any deploy task should trigger a rollback.

The “notification” case would currently be very ugly to implement, since the notification tasks must depend on every other task.

Add ability to require run start approval

Use a when condition to define if a run needs to be approved before starting.

This config should live in the project configuration.

A use case is validation when a contributor opens a PR changing the run config definition with the intent to do bad things (like executing a malicious program, or trying to leak secrets by writing environment variables to stdout, even if the latter is already avoided thanks to the dynamic variables feature)

Another enhancement to the above proposal is to check whether the .agola/config.* file has changed and require an approval only when it has. In this way, if a user changes the config file (for good or malicious reasons), the run will require an approval.

datamanager: better test to handle eventual consistency

Since the "original" s3 and some other implementations (but not ceph rgw or minio) are eventually consistent, the datamanager was designed with this in mind. But since we are testing it with a posix fs, ceph rgw and minio, we aren't affected by any eventual consistency issue.

We should find a way to test possible s3 eventual consistency scenarios to check that everything is handled correctly and fix possible issues.

Organization Teams and Roles (RBAC)

This issue is a placeholder to describe how org teams and roles will be implemented.

Organization Teams

  • Teams are groups of users
  • Teams will be hierarchical. Child teams will inherit users and permissions from parent teams
  • Every team will have permissions on projects/projectgroups in the form of Roles

Roles (RBAC)

  • A role is a group of permitted actions on a specific resource (project/projectgroup)
  • An action maps 1:1 to a gateway api action
  • A group of predefined roles (owner, authorizer etc...) will be provided by default
  • In the future there could be the possibility to define custom roles

Open questions

A special case is Task authorization. We'd like to be able to let some users authorize only some tasks in a Run (i.e. some users can authorize a deployment to a testing environment while only other users can authorize the deployment to production). This will require the ability to describe how a role applies only to a specific task "selector" (task name, regexp etc...)

runconfig: jsonnet external libraries

Let the user provide imports for the jsonnet config from an external source (starting with git).

This will require defining our own import callback and thinking about the import format and the required features:

  • reproducible builds (prefer the use of a commit sha/tag)
  • remote import source authentication (use an existing agola project or provide git credentials? how?)

datamanager/readdb: move readdb parts inside datamanager

Now the runservice and configstore have distinct readdb implementations that share a lot of common parts (events handling, reinit etc...). These parts are closely related to the datamanager, and should live inside it and be used by the readdbs

org: user invitation

Add the ability (with a related option to enable/disable it) to invite a user instead of directly adding them regardless of their consent.

go errors: future steps

With the move from pkg/errors to x/errors we lost the full stack trace and only get the frame at the error line (using : %w).

Now in go tip (the future 1.13) the error frames were also removed, and fmt.Errorf accepts %w everywhere in the format string (not only as : %w). We have some options:

  • we could continue using x/errors (so keep the frame info), but it needs to be updated to accommodate the changes from go 1.13 (there's a pending CL; it currently doesn't compile), and we'd have to do fmt.Errorf("%w", err) at every error return. This will print the frames for every error wrapping. It's not quite a stack trace but it could be enough to understand the callers...

  • Don't use x/errors but just go 1.13, and leverage the new errors Wrapper interface to add our custom error functions that add a stacktrace and other goodies:

    • Uses the new Wrapper interface
    • Add the ability to take stack traces using something like errors.Wrap at "every return", but avoid adding (and printing) a stacktrace for every errors.Wrap if there's already a more detailed stacktrace (taken by a previous wrapping).

external secrets provider

The secret and variable logic has already been designed to handle external secrets providers (like hashicorp vault).

The idea is to be able to support multiple user defined secrets providers (not only one per instance), defining them at the projectgroup/project level.

We just have to implement this integration. This is also related to #13, since we have to carefully think about how the sub projectgroups/projects inherit the secret provider definition (my idea is to not inherit them, or any owner of a child project could use the parent's secret provider)

runservice: more powerful run restart

Now we can restart a run from scratch or from failed tasks.

What's not yet implemented is restarting a run from a user-defined list of tasks, which may also include tasks that already finished successfully.

datamanager: remove old wals

Currently we are keeping all the wals in the object storage (primarily for debugging purposes).

But it's time to implement removal of already applied wals (we should remove all the wals preceding the one required by the oldest DataStatus entry)

*: improve and expose event streaming apis

Currently we have initial event streaming for run events, provided by the runservice, and it's used by the notification service.

Its api and implementation could be improved a lot:

  • ability to resume streaming from a specific event (so a reconnecting client won't lose past events)
  • perhaps read events from the local readdb instead of etcd for improved scaling
  • implement a similar api also for the configstore
  • document and expose this api (and other possible event apis) for additional services (since this is the way to extend agola)

Project creation issue if webhook registration fails

If I try to create a new project in Agola and something goes wrong, like the webhook or deploy key registration failing, it would be better if the project were not registered in Agola.

Currently, if the project creation process goes wrong, we need to delete and recreate the project to recover.

datamanager/configstore: backup and restore

Add the ability to take a backup and restore it. There can be two kinds of backups:

  • A full dump of the resources (in json or other format)
  • A data level backup containing the data files and the wals.

executor: embedded git

It could be useful to provide a git implementation (at least for checkout) inside the task, so users don't have to:

  • build custom images providing the git command or do a custom t.
    or
  • checkout and save the checkout dir in the workspace inside a task using an image providing git and then restore the workspace in another task.

runconfig: move some env variables from static to dynamic environment

All the env variables generated when receiving a webhook or by manual run creation are currently saved as "static" env vars. Static env vars are kept the same across run restarts. Some of them could be moved to a dynamic env and be recalculated when the run is restarted without impacting run reproducibility. Candidates are:

  • AGOLA_SSHPRIVKEY, AGOLA_SSHHOSTKEY, AGOLA_SKIPSSHHOSTKEYCHECK: the remotesource config could have changed, so they should be refetched and updated when submitting a run restart.

  • AGOLA_GIT_HOST, AGOLA_GIT_PORT: This will require fetching the repo info at every run recreation (possible issue with api limits).

Manually create a Run from a project

Add an API to create a run (without waiting for a webhook).

The api will require the user to specify which branch/ref to start the build on. Users could also specify a specific commit sha available in that ref.
If no commit sha is provided the sha referenced by the ref will be used.

agola client config file

Define an agola config file format (like kube config) with multiple "contexts" (referencing an agola api url, token etc...) and make the agola commands use the config file (to avoid specifying the api url and token every time) with the default or a chosen context.

Also add related commands to manage the config file.
