
100daysofcloudbackend's Introduction

Info

This repository holds the source of truth for the backend architecture of the new 100DaysOfCloud website (currently in development). The infrastructure is fully serverless.

Architecture diagram

[Figure: architecture diagram]

Branches

All development work happens on the dev branch. When changes are operational there, they get merged into the staging branch. Once they have been proven to work there, they get merged into the main branch.

CI/CD & Environments

As the cicd folder shows, deployments run through CodePipeline and CodeBuild, which are connected to this repository via webhooks. When changes are pushed to a branch (for example dev), they are deployed automatically into that branch's AWS environment (dev, staging, or prod). CodeBuild deploys the infrastructure with SAM, following the steps in the buildspec.yml file; the setup of the pipeline itself lives in the cicd folder.

The naming scheme of the deployed stacks is hdoc-{frontend/backend}-{stage}-{resource}-{randomID}. An example would be hdoc-backend-dev-cognito-HF34H2FBJ. Resources are not named explicitly; every resource is named automatically by CloudFormation.
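The branch-to-environment mapping and the naming scheme above can be made concrete with a small Python sketch, for example in a script that sorts stacks by stage. The helper names (`stage_for_branch`, `parse_stack_name`) are hypothetical, and the regex assumes the random suffix is alphanumeric, as in the example name; CloudFormation's actual suffix format may vary.

```python
import re
from typing import NamedTuple, Optional

# Branch -> AWS environment (stage), as described in "Branches" and "CI/CD & Environments"
BRANCH_TO_STAGE = {"dev": "dev", "staging": "staging", "main": "prod"}

# hdoc-{frontend/backend}-{stage}-{resource}-{randomID}
STACK_NAME_RE = re.compile(
    r"^hdoc-(?P<component>frontend|backend)-(?P<stage>dev|staging|prod)"
    r"-(?P<resource>.+)-(?P<random_id>[A-Za-z0-9]+)$"
)


class StackName(NamedTuple):
    component: str
    stage: str
    resource: str
    random_id: str


def stage_for_branch(branch: str) -> str:
    """Return the environment a branch deploys into; raises KeyError for other branches."""
    return BRANCH_TO_STAGE[branch]


def parse_stack_name(name: str) -> Optional[StackName]:
    """Split a stack name into its parts, or return None if it doesn't follow the scheme."""
    match = STACK_NAME_RE.match(name)
    if match is None:
        return None
    return StackName(**match.groupdict())


if __name__ == "__main__":
    print(stage_for_branch("main"))  # prod
    print(parse_stack_name("hdoc-backend-dev-cognito-HF34H2FBJ"))
```

The greedy `resource` group backtracks to the last hyphen, so only the final segment is treated as the random ID.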

The CodePipeline (seen in the cicd folder) is manually deployed. When changes are made to it, you have to manually deploy it again with the commands below. The details of the deployment and its stack can be seen in the cicd/samconfig.toml file.

cd cicd/
sam build -t codepipeline-dev.template.yml
sam deploy -t codepipeline-dev.template.yml --profile mfa

Misc

MFA

If you need CLI access to the 100DaysOfCloud AWS account, you have to authenticate your CLI with MFA credentials. Run the following command with a CLI profile that already has access to the account. It returns an access key pair and a session token, which are valid for 12 hours from creation. Then create a CLI profile that uses those credentials, as in the example entry below.

Get MFA credentials example

aws sts get-session-token --serial-number arn:aws:iam::{AWS::AccountId}:mfa/{User} --profile User@100daysofcloud --token-code mfa-code-from-device

Example entry in the ~/.aws/credentials file

[mfa]
aws_access_key_id = example
aws_secret_access_key = example
aws_session_token = example
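Copying the three values by hand every 12 hours is error-prone, so here is a minimal Python sketch that takes the JSON printed by `aws sts get-session-token` (its `Credentials` object contains `AccessKeyId`, `SecretAccessKey` and `SessionToken`) and writes it into the `[mfa]` section of the credentials file. The script and function name are hypothetical, not part of this repository.

```python
import configparser
import json
import sys
from pathlib import Path


def write_mfa_profile(sts_output: str, credentials_file: Path, profile: str = "mfa") -> None:
    """Store the temporary keys from `aws sts get-session-token` under [profile]."""
    creds = json.loads(sts_output)["Credentials"]
    # interpolation=None so '%' characters in existing values are left untouched
    config = configparser.ConfigParser(interpolation=None)
    config.read(credentials_file)  # a missing file is treated as empty
    config[profile] = {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
    credentials_file.parent.mkdir(parents=True, exist_ok=True)
    with credentials_file.open("w") as f:
        config.write(f)


if __name__ == "__main__":
    # Usage: aws sts get-session-token ... | python write_mfa_profile.py
    write_mfa_profile(sys.stdin.read(), Path.home() / ".aws" / "credentials")
```

Note that configparser rewrites the whole file, so any comments in `~/.aws/credentials` are lost; other profiles are preserved.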

CodeBuild GitHub authentication errors

CodeBuild's GitHub integration and authentication can be buggy, so if issues come up you might need to reauthenticate. CodeBuild stores the GitHub credentials as a source credential, which you can list and delete with the commands below. Only one authentication can be present at any given time: you can't authenticate again while one exists, so delete the existing one first.

aws codebuild list-source-credentials --profile mfa --region us-east-1
aws codebuild delete-source-credentials --arn arn:aws:codebuild:us-east-1:{AWS::AccountId}:token/github --profile mfa --region us-east-1
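The same two steps can be scripted with boto3, whose CodeBuild client exposes `list_source_credentials()` (returning `sourceCredentialsInfos` entries with `arn` and `serverType`) and `delete_source_credentials(arn=...)`. This is a minimal sketch; the helper names are hypothetical, and the client is passed in so the logic can be tried without touching AWS.

```python
from typing import Any, List


def github_credential_arns(codebuild: Any) -> List[str]:
    """Return the ARNs of all GitHub source credentials stored in CodeBuild."""
    infos = codebuild.list_source_credentials()["sourceCredentialsInfos"]
    return [info["arn"] for info in infos if info["serverType"] == "GITHUB"]


def delete_github_credentials(codebuild: Any) -> List[str]:
    """Delete every stored GitHub credential so CodeBuild can be reauthenticated."""
    arns = github_credential_arns(codebuild)
    for arn in arns:
        codebuild.delete_source_credentials(arn=arn)
    return arns


if __name__ == "__main__":
    import boto3  # requires an MFA-authenticated profile such as "mfa"

    session = boto3.Session(profile_name="mfa", region_name="us-east-1")
    print(delete_github_credentials(session.client("codebuild")))
```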

Lambdas with SAM in nested stacks

Most resources in the infrastructure live in nested stacks. This keeps the parts clearly separated and avoids hitting per-template resource limits in the future. However, it exposes a missing feature in SAM: the sam build command does not recursively build nested templates, so SAM will not package the external libraries of Lambda functions that are inside nested stacks. Because each Lambda function has its own requirements.txt file, the external libraries have to be installed manually into each function's folder, and SAM then packages and deploys the whole folder. This is solved with a for loop in the pre_build phase of buildspec.yml, which cycles through the Lambda functions' folders and runs pip install -r requirements.txt -t "$PWD" to install the required packages into their respective folders.

For example, the get_user_by_id Lambda function requires the external library simplejson. The buildspec goes into the folder, installs the libraries inside the folder next to the function's code (get_user_by_id.py) and that whole folder will be deployed as one by SAM into the Lambda function.
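For readers less familiar with shell loops, here is a rough Python equivalent of that pre_build step. It assumes each Lambda function sits in a direct subfolder of a given root with its own requirements.txt (the root path is left as a parameter, since it is an assumption here); the function names are hypothetical. Only one folder level is scanned, so requirements.txt files inside previously vendored packages are not picked up.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def lambda_dirs(root: Path) -> List[Path]:
    """Direct subfolders of root that contain their own requirements.txt."""
    return sorted(p.parent for p in root.glob("*/requirements.txt"))


def install_commands(root: Path) -> List[List[str]]:
    """pip commands that install each function's dependencies next to its code."""
    return [
        [sys.executable, "-m", "pip", "install", "-r", str(d / "requirements.txt"), "-t", str(d)]
        for d in lambda_dirs(root)
    ]


def vendor_dependencies(root: Path, dry_run: bool = True) -> None:
    for cmd in install_commands(root):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # fail the build on the first failed install


if __name__ == "__main__":
    vendor_dependencies(Path(sys.argv[1] if len(sys.argv) > 1 else "."), dry_run=False)
```

With the get_user_by_id example, this produces a single command that installs simplejson into the get_user_by_id folder.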

Testing

The Lambda functions are tested at multiple stages, but with the same process. Each PR is tested on GitHub by the workflow in .github/workflows/backend-ci.yml, which runs pytest over all of the unit tests.

100daysofcloudbackend's People

Contributors

what-name, omenking


100daysofcloudbackend's Issues

Current get_user_by_id Lambda code does not work

@syedautherabbas
The current code in sam/api does not work well, or at all. I tried it on my machine and the tests don't run either. It was merged for outside reasons but needs to be improved. It is high priority because we can't move further with improving the testing pipelines without test code. See #4

This first API will also establish the directory and test structure of all upcoming APIs, so it's important to get this right.
The CloudFormation templates are already written and ready to go (see sam/api.template.yml); only the code needs improvement.

  • Figure out and improve the folder structure of that folder. Do we put the tests in their own folder at the same level as the Lambda code? If yes, how can we import the Lambda code into the test.py? (it is notoriously complicated, it's a real question)
  • Where exactly is __init__.py needed? Let's not have it where it's not necessary.
  • AWS_DEFAULT_REGION does not need to be specified in the Lambda code itself: by default the SDK uses the region the Lambda function runs in. It is, however, required for the tests, with that exact environment variable name. It doesn't hurt, but it's one more environment variable we need to pass in just to match the default.
  • How does the URL path https://api.com/users/{userID} work, and where is that {userID} stored in the request? In the QueryStringParameters? Or will the github_username be passed in the request body, in which case do we even need the {userID} URL path?
  • this should go into the get_user_details_by_Id function so we don't need to pass it in extra.
    table = dynamodb.Table(os.environ['databaseName'])
  • More allowed headers are required. "Access-Control-Allow-Headers" : "Content-Type,X-Amz-Date,Authorization,X-Api-Key,x-requested-with", as the APIs will likely use a Cognito authorizer.
    'Access-Control-Allow-Headers': 'Content-Type',

These are just a couple of things off the top of my head. Let's get this sorted together, with others' input as well, and then we can move further.
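A minimal Python sketch of a handler that addresses the points above, under several assumptions: Lambda proxy integration (so the {userID} path segment arrives in `event["pathParameters"]`, not in `queryStringParameters`), a path parameter named `userID`, and a table whose key attribute is also `userID` (hypothetical; use the real key schema). The `Access-Control-Allow-Origin: *` header is an assumption not stated in this issue. The table name is read from `databaseName` inside `get_user_details_by_id`, and no region is set in code. The optional `table` argument exists only so tests can pass a fake table; the stdlib `json` module with a Decimal converter stands in for simplejson.

```python
import json
import os
from decimal import Decimal
from typing import Any, Dict, Optional

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",  # assumption: restrict to the site's domain
    "Access-Control-Allow-Headers": "Content-Type,X-Amz-Date,Authorization,X-Api-Key,x-requested-with",
}


def _to_json(value: Any) -> Any:
    # DynamoDB returns numbers as Decimal, which the stdlib json module can't serialize
    if isinstance(value, Decimal):
        return int(value) if value == value.to_integral_value() else float(value)
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


def get_user_details_by_id(user_id: str, table: Optional[Any] = None) -> Optional[Dict[str, Any]]:
    if table is None:
        import boto3  # region comes from the Lambda runtime, so none is set here

        table = boto3.resource("dynamodb").Table(os.environ["databaseName"])
    # hypothetical key attribute name
    return table.get_item(Key={"userID": user_id}).get("Item")


def _response(status: int, body: Any) -> Dict[str, Any]:
    return {"statusCode": status, "headers": CORS_HEADERS, "body": json.dumps(body, default=_to_json)}


def lambda_handler(event: Dict[str, Any], context: Any, table: Optional[Any] = None) -> Dict[str, Any]:
    # With proxy integration, /users/{userID} is exposed as event["pathParameters"]["userID"]
    user_id = (event.get("pathParameters") or {}).get("userID")
    if not user_id:
        return _response(400, {"message": "userID is required"})
    item = get_user_details_by_id(user_id, table)
    if item is None:
        return _response(404, {"message": "user not found"})
    return _response(200, item)
```

With this layout, a test only needs a fake object with a `get_item(Key=...)` method and no AWS_DEFAULT_REGION at all.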

Current PyTest error with the CI GitHub Action:

[Screenshot: PyTest error output from the CI GitHub Action, 2020-08-06]

FYI, the GitHub Actions results (output) are only visible to users who have collaborator access on this repo. If you can't see them, that's why.

[API] GET /tags?limit=<int>

Expected Behavior

Return list of article tags ordered by the number of articles, with both tag name and article count. If no ?limit is specified, return all tags.

Current Next.js implementation

import _ from 'lodash';

export default function returnFilteredArticles(req, res) {
    const { limit } = req.query;

    const number = limit ? limit : undefined;

    const tags = _(articles)
        .countBy('tags')
        .map(function (count, tag) {
            return {
                tag: tag,
                count: count,
            };
        })
        .orderBy('count', 'desc')
        .slice(0, number);

    res.status(200).json(tags);
}
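A possible Python port of this logic for the backend Lambda, as a sketch. It assumes each article's `tags` field is a single string (as the /articles implementation's `article.tags.toLowerCase()` suggests) and that `?limit` arrives in `queryStringParameters` under Lambda proxy integration. Where the articles come from (e.g. DynamoDB) is left open. `Counter.most_common` orders by count, keeping first-seen order for ties, and returns everything when the limit is None.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def tag_counts(articles: List[Dict[str, Any]], limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Tags with their article counts, most-used first; all tags when limit is None."""
    counts = Counter(article["tags"] for article in articles)
    return [{"tag": tag, "count": count} for tag, count in counts.most_common(limit)]


def limit_from_event(event: Dict[str, Any]) -> Optional[int]:
    """Read ?limit from an API Gateway proxy event; None means no limit."""
    params = event.get("queryStringParameters") or {}  # None when no query string is sent
    return int(params["limit"]) if "limit" in params else None
```

A non-numeric `?limit` raises ValueError here; the handler could map that to a 400 response.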

[API] GET /articles/{?tag}?limit=<int>

Expected Behavior

Return a list of articles. If a tag is provided, return only the articles tagged with that tag; otherwise return all articles. If no ?limit is provided, return all matching articles, whether or not a tag was given.

Current Next.js implementation

export default function articlesHandler(req, res) {
    const { tag, limit } = req.query;

    const number = limit ? limit : articles.length;

    console.log(tag);

    if (tag) {
        res.status(200).json(
            articles
                .filter(
                    (article) =>
                        article.tags.toLowerCase() == tag[0].toLowerCase()
                )
                .slice(0, number)
        );
    } else {
        res.status(200).json(articles.slice(0, number));
    }
}
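The same filtering in Python, as a sketch for the backend. The path parameter name `tag` (for a route like /articles/{tag}) is a hypothetical choice, and the case-insensitive match mirrors the Next.js code's `toLowerCase()` comparison. Slicing with `[:None]` returns the whole list, which covers the "no ?limit" case.

```python
from typing import Any, Dict, List, Optional, Tuple


def filter_articles(
    articles: List[Dict[str, Any]], tag: Optional[str] = None, limit: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Articles whose tag matches case-insensitively (all if tag is None), up to limit."""
    if tag is not None:
        articles = [a for a in articles if a["tags"].lower() == tag.lower()]
    return articles[:limit]


def articles_query(event: Dict[str, Any]) -> Tuple[Optional[str], Optional[int]]:
    """Extract (tag, limit) from an API Gateway proxy event."""
    path = event.get("pathParameters") or {}
    query = event.get("queryStringParameters") or {}
    limit = int(query["limit"]) if "limit" in query else None
    return path.get("tag"), limit
```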

[API] GET /users?limit=<int>/<id>

Expected Behavior

Return a JSON object for a specific ID. If there's no ID, return a list of users. If no ?limit is specified, return the full list of users.

Current Next.js implementation

Note: this implementation covers both GET /users?limit= and GET /users/{id}

export default function returnUser(req, res) {
    const { id, limit } = req.query;

    const number = limit ? limit : users.length;

    if (id) {
        res.status(200).json(
            users.find((entry) => {
                return entry.githubProfile == id;
            })
        );
    } else {
        res.status(200).json(users.slice(0, number));
    }
}
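A Python sketch of both routes in one function, assuming proxy integration with a path parameter named `id` (hypothetical) and users matched on `githubProfile`, as in the Next.js code. One deliberate difference: the Next.js version responds 200 with an empty body when no user matches, whereas this sketch suggests a 404.

```python
import json
from typing import Any, Dict, List, Optional


def find_user(users: List[Dict[str, Any]], github_profile: str) -> Optional[Dict[str, Any]]:
    """The first user whose githubProfile equals the given id, or None."""
    return next((u for u in users if u["githubProfile"] == github_profile), None)


def users_response(users: List[Dict[str, Any]], event: Dict[str, Any]) -> Dict[str, Any]:
    path = event.get("pathParameters") or {}
    query = event.get("queryStringParameters") or {}
    user_id = path.get("id")
    if user_id is not None:  # GET /users/{id}
        user = find_user(users, user_id)
        if user is None:
            return {"statusCode": 404, "body": json.dumps({"message": "user not found"})}
        return {"statusCode": 200, "body": json.dumps(user)}
    # GET /users?limit=<int>; without a limit, return every user
    limit = int(query["limit"]) if "limit" in query else None
    return {"statusCode": 200, "body": json.dumps(users[:limit])}
```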

Create deployment role for SAM

Currently the sam deploy command in the buildspec.yml file runs with the CodeBuild project's permissions. We need to add a role to the hdoc-backend-{stage} template (template.yml, or rather a separate nested stack) that sam deploy will assume and that allows access to all the resources that are, and will be, contained in the stack.

See the current CodeBuild project's extra permissions for an example:

- PolicyName: TemplateResourcesAccess

[API] GET /leaderboard?limit=<int>

Expected Behavior

Return a JSON list of the top journeyers by cumulative score. I can choose how many users to return with a ?limit query string; otherwise it defaults to 10. API Gateway can handle this with mapping templates.

Current Next.js implementation

export default function returnSortedUser(req, res) {
    // Catch all possible routes
    const { limit } = req.query;

    users.sort(function (a, b) {
        return (
            b.githubScore + b.twitterScore - (a.githubScore + a.twitterScore)
        );
    });

    // If there's no limit query string, default to 10
    const number = limit ? limit : 10;

    res.status(200).json(users.slice(0, number));
}
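A Python sketch of the leaderboard for the backend, ranking by `githubScore + twitterScore` as the Next.js code does. Unlike `users.sort(...)`, which reorders the shared array in place, `sorted` returns a new list; with `reverse=True` it stays stable, so tied users keep their original order. Reading `?limit` from `queryStringParameters` assumes proxy integration rather than a mapping template.

```python
from typing import Any, Dict, List, Optional

DEFAULT_LIMIT = 10


def leaderboard(users: List[Dict[str, Any]], limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Users ordered by cumulative score, highest first; top 10 unless limit is given."""
    ranked = sorted(users, key=lambda u: u["githubScore"] + u["twitterScore"], reverse=True)
    return ranked[: DEFAULT_LIMIT if limit is None else limit]


def limit_from_event(event: Dict[str, Any]) -> Optional[int]:
    """Read ?limit from an API Gateway proxy event; None falls back to the default."""
    params = event.get("queryStringParameters") or {}
    return int(params["limit"]) if "limit" in params else None
```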

Modify the README.md and add a COD.md

So I already created 2 pull requests. I wasn't aware I should open an issue before doing that; @antoniolofiego pointed that out after I already had. Basically, I modified the README.md, and since I noticed there was no code of conduct, I jotted one down quickly. It can be modified at any point.

Branches for different environments

Info

The currently active branch is dev, where active work is done. Once there is a functional MVP of the backend and frontend, we will create a staging branch and deploy it into a staging environment on AWS as well. Once we have a production-ready MVP, we will merge all the changes from staging into main, and a separate CodePipeline will deploy them into a prod environment on AWS, which will then be made public.

Steps to achieve staging

  • Create a solid CI/CD system on the AWS side. Currently there is only one CodePipeline, which monitors the dev branch and deploys changes to the dev environment on AWS. This environment is completely hidden from people who do not have access to the account (account access must be limited to least privilege for security reasons). Therefore, setting up a system that makes all the dev branches (frontend & backend) visible to "outsiders" (volunteers and contributors) is a high priority. Once that's done, we can see all the changes deployed into a live environment and iterate on them.

  • Implement PR testing as described in #4 - This is very important so that, as far as possible, broken code is not deployed even into the dev environment. It is essential to write fully functional tests for each Lambda's code in a way that they can be executed without modifying the PR testing workflow. The PR testing workflow therefore has to be built so that it can execute new tests without modification. See the corresponding issue.

  • Have the dev environment deployed and visible to everybody as described earlier. Once that's done, the staging branch can be "built" and deployed. This is important because all the changes made on the dev branch have to be available (public and deployed) to contributors for iterating and fixing issues.

  • Acceptable code / error / failure rates for different branches: (these are proposed numbers)

    • dev - 10-20%
    • staging - 5-10%
    • prod - 0-2%
  • CI/CD pipeline for the Frontend - this can be done with small modifications to the current cicd/codepipeline-dev.template.yaml file. A separate issue can be found at #7

CI/CD pipeline for the Frontend

Info

The frontend repository can be found here.
There is currently no pipeline whatsoever or even a deployment of the new frontend. To be able to work on both the front and backend, there needs to be a complete dev environment deployed incorporating both.

The pipeline is very similar to the current cicd/codepipeline-dev.template.yaml template and could even be incorporated into it. I'm not sure whether we should have two separate, isolated pipelines for deploying changes from the frontend and backend repositories, or whether a single pipeline should deploy everything together when changes are made in either repository.

  1. Decoupled and separate deployments
    vs.
  2. Coupled and coordinated deployments
