
public-cloud-info-service's Introduction


Introduction

Public Cloud Information Service (Pint Server) enables users to look up public cloud image and service information via a REST API. Image and server information is tracked in a PostgreSQL database.

Prerequisites

Prior to running Pint Server, you must prepare a PostgreSQL database instance with the up-to-date Pint Server schema and data.

  1. install a PostgreSQL instance, following the instructions from your favorite vendor

  2. clone the pint-data repo

  3. OPTIONAL: create the Python 3.6 development virtual environment. Skip this step if you are using an existing environment.

    ./bin/create_dev_venv.sh
    
  4. activate the development virtual environment

    source dev_venv/bin/activate
    
  5. OPTIONAL: skip this step if you are using a brand new virtual environment. Otherwise, keep the existing virtual environment up-to-date by running:

    pip install -r requirements.txt
    
  6. run the ./bin/schema_upgrade.sh CLI to perform the schema migration. The script itself is idempotent, so it won't fail if the schema is already up-to-date.

    ./bin/schema_upgrade.sh -h db_host -U db_user -W db_password -n db_name --ssl-mode require --root-cert /etc/ssl/postgresql_ca_cert.pem upgrade
    

    NOTE: in a development environment where TLS is not enabled for the PostgreSQL instance, the --ssl-mode and --root-cert arguments are not needed.

  7. run the ./bin/data_update.sh CLI to perform the data update. The script itself is idempotent, so it won't fail if the data is already up-to-date.

    ./bin/data_update.sh -h db_host -U db_user -W db_password -n db_name --ssl-mode require --root-cert /etc/ssl/postgresql_ca_cert.pem update --pint-data /home/foo/pint-data
    

    NOTE: in the above example, /home/foo/pint-data is where you cloned the pint-data repo. In other words, the XML data files are expected to be located in the /home/foo/pint-data/data directory.

    NOTE: in a development environment where TLS is not enabled for the PostgreSQL instance, the --ssl-mode and --root-cert arguments are not needed.

Quick Start

There are two ways you can run the Pint Server service locally:

  1. as a standalone Flask application.
  2. as a serverless application via AWS Serverless Application Model (SAM) CLI with the embedded Lambda runtime emulator.

The former is recommended for testing the application logic without the AWS layer baggage, while the latter is useful for testing Lambda function deployment readiness. In most cases, you'll only need to test your changes by running the standalone Flask application.

Running the Standalone Flask Application

To run the standalone Flask application:

  1. create the Python 3.6 development virtual environment

    ./bin/create_dev_venv.sh
    
  2. activate the development virtual environment

    source dev_venv/bin/activate
    
  3. update ./bin/run_standalone.sh with the correct PostgreSQL host, user, password, and database name.

  4. run the standalone Flask application. By default, it listens for HTTP requests on port 5000.

    ./bin/run_standalone.sh
    
  5. open a separate terminal and test it with the curl command

    curl http://127.0.0.1:5000/v1/providers
    

Running Serverless Application Locally via SAM CLI

To run the serverless application via SAM CLI:

  1. make sure the aws-sam-cli Python package is installed. If not, install it with pip.

    sudo pip install aws-sam-cli
    
  2. build the Pint Server Lambda function container image with make. By default, the container image is based on the SLES 15.2 base image.

    make aws
    
  3. update ./local_test_env.json with the correct PostgreSQL host, user, password, and database name (an example sketch of this file follows these steps).

  4. run serverless application

    ./bin/run_sam_local.sh
    
  5. open a separate terminal and test it with the curl command

    curl http://127.0.0.1:5000/v1/providers
    

NOTE: to run the serverless application in debug mode, you can use the --debug flag. For example:

./bin/run_sam_local.sh --debug
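For reference, sam local reads environment overrides from a JSON file keyed either by a function's logical ID or by a global Parameters section. A minimal sketch of what ./local_test_env.json could look like — the variable names below are placeholders, so keep the names already present in the file:

    {
      "Parameters": {
        "DATABASE_HOST": "127.0.0.1",
        "DATABASE_USER": "pint",
        "DATABASE_PASSWORD": "change-me",
        "DATABASE_NAME": "pint"
      }
    }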

Developing Unit Tests

Overview

For the purpose of unit testing, we use MagicMock to mock out the DB layer and control the return values.

For example, when we mock app.get_provider_images in this stack: ` Flask app API handler -> app.list_provider_resource -> app.get_provider_images -> AlibabaImagesModel -> sqlalchemy -> DB driver ` we intercept the call and return our own fixtures instead of fetching the data from the DB.
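A minimal sketch of this approach (the fixture data is made up, and the patch target assumes get_provider_images lives in pint_server.app; the real tests are in pint_server/tests/unit):

    from unittest.mock import MagicMock, patch

    # Hypothetical fixture data; the real unit tests ship their own fixtures.
    FAKE_IMAGES = [{"name": "sles-15-sp2-v20210101", "state": "active"}]

    # The patch target assumes get_provider_images lives in pint_server.app;
    # adjust the dotted path if the module layout differs.
    @patch("pint_server.app.get_provider_images",
           new=MagicMock(return_value=FAKE_IMAGES))
    def test_get_provider_images_returns_fixture():
        from pint_server import app

        # The call never reaches SQLAlchemy or the DB driver; it returns the fixture.
        assert app.get_provider_images("amazon") == FAKE_IMAGES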

Running the Unit Tests

Follow these steps to run the unit tests:

  1. Setup a python virtual environment

    ./bin/create_test_venv.sh
    
  2. Activate the python virtual environment created in Step 1

    source test_venv/bin/activate
    
  3. Run the unittests

    python -m pytest pint_server/tests/unit
    

Running the Functional Tests

Follow the steps below to run the functional tests:

Prerequisite: these functional tests expect the environment under test to be set up correctly.

  1. Setup a python virtual environment

    ./bin/create_test_venv.sh
    
  2. Activate the python virtual environment created in Step 1

    source test_venv/bin/activate
    
  3. Run the functional tests

    python -m pytest pint_server/tests/functional
    

By default, these tests run against https://susepubliccloudinfo.suse.com.

You can pass the --base-url option to point at your own Pint API service.

For example:

python -m pytest --base-url http://localhost:5000 pint_server/tests/functional

To run the functional tests in a loop for a specified amount of time:

You can pass options such as --minutes, --hours, or --seconds to pytest:

python -m pytest --minutes 15 --base-url http://localhost:5000 pint_server/tests/functional

Running the Load Tests Using Locust

Follow the steps below to run the locust load tests:

Prerequisite: these load tests expect the environment under test to be set up correctly.

  1. Setup a python virtual environment

    ./bin/create_test_venv.sh
    
  2. Activate the python virtual environment created in Step 1

    source test_venv/bin/activate
    
  3. Run the Locust load tests (a minimal locustfile sketch follows this section). For example:

    locust -f pint_server/tests/loadtest/locustfile.py  --host http://localhost:5000 --headless -u 100 -r 10
    
    --host is where the pint service is running
    -u specifies the number of users to spawn
    -r specifies the number of users to start per second
    

If you want to specify the runtime for the load tests, you can do so with the -t option. For example:

locust -f pint_server/tests/loadtest/locustfile.py  --host http://localhost:5000 --headless -u 100 -r 10 -t10m
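For orientation, a locustfile along these lines drives the load test; this is only a sketch, as the actual tasks are defined in pint_server/tests/loadtest/locustfile.py:

    from locust import HttpUser, between, task

    class PintUser(HttpUser):
        # Simulated users pause 1-3 seconds between requests.
        wait_time = between(1, 3)

        @task
        def list_providers(self):
            # Illustrative endpoint; the real locustfile may exercise more routes.
            self.client.get("/v1/providers")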

How To Upgrade Schema

We are using the Alembic framework to facilitate schema migration. For more details, see https://alembic.sqlalchemy.org/en/latest/tutorial.html.

Here's an example of a normal workflow for performing a schema update.

  1. create the Python 3.6 development virtual environment

    ./bin/create_dev_venv.sh
    
  2. activate the development virtual environment

    source dev_venv/bin/activate
    
  3. update pint_server/models.py to reflect the latest changes

  4. copy pint_server/alembic.ini.sample to pint_server/alembic.ini

    cp pint_server/alembic.ini.sample pint_server/alembic.ini
    
  5. uncomment and set the sqlalchemy.url property in pint_server/alembic.ini to point to the database against which to generate the next revision of the schema. Make sure the database schema is up-to-date prior to generating the next revision.

    NOTE: if your database password contains a percent character (%), make sure to escape it by replacing it with two percent characters (%%).

  6. auto-generate the next revision. Note that Alembic uses the existing database as the baseline for generating the next revision, so make sure the existing database is up-to-date. To auto-generate the next revision:

    cd public-cloud-info-service/pint_server
    alembic revision --autogenerate -m 'add some table'

    If the above command is successful, you'll see the auto-generated revision file in ./pint_db_migrate/versions/. The file is named <revision>_add_some_table.py.

  7. IMPORTANT: the auto-generated migration script may not have everything you need. Make sure to read the generated code carefully and make the necessary changes to complete it (a sketch of a typical revision file follows these steps).

  8. run ./bin/schema_upgrade.sh and ./bin/data_update.sh to perform the schema migration and data update respectively. The scripts themselves are idempotent, so they won't fail if the schema and data are already up-to-date.

    ./bin/schema_upgrade.sh -h db_host -U db_user -W db_password -n db_name --ssl-mode require --root-cert /etc/ssl/postgresql_ca_cert.pem upgrade
    ./bin/data_update.sh -h db_host -U db_user -W db_password -n db_name --ssl-mode require --root-cert /etc/ssl/postgresql_ca_cert.pem update --pint-data /home/foo/pint-data
    

    NOTE: in the above example, /home/foo/pint-data is where you cloned the pint-data repo. In other words, the XML data files are expected to be located in the /home/foo/pint-data/data directory.

    NOTE: --root-cert is the path to the file containing the RDS CA bundle, which can be obtained from https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem

    NOTE: in a development environment where TLS is not enabled for the PostgreSQL instance, the --ssl-mode and --root-cert arguments are not needed.
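For orientation: the sqlalchemy.url property in step 5 takes a standard SQLAlchemy connection URL, e.g. postgresql://user:password@host/dbname. The revision file generated in step 6 is a small Python module with upgrade() and downgrade() functions built on Alembic's op interface, and the manual edits from step 7 typically happen there. A minimal, hypothetical sketch — the table, column, and revision identifiers below are illustrative, not taken from this repo's migrations:

    """add some table

    Revision ID: 1a2b3c4d5e6f
    Revises: 0f9e8d7c6b5a
    """
    from alembic import op
    import sqlalchemy as sa

    # revision identifiers, used by Alembic (auto-generated)
    revision = '1a2b3c4d5e6f'
    down_revision = '0f9e8d7c6b5a'
    branch_labels = None
    depends_on = None


    def upgrade():
        # Hypothetical change: add a nullable column to an existing table.
        op.add_column('amazonimages',
                      sa.Column('sampleattr', sa.String(100), nullable=True))


    def downgrade():
        op.drop_column('amazonimages', 'sampleattr')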

Testing Schema Upgrades

Once you have developed a schema upgrade, to verify that it works correctly you will need to perform the following validation steps:

  1. Create a DB instance using the old schema, populated with representative data, either real or synthesised.
  2. Pick a set of representative entries in any tables that are affected by the schema migration and stash their contents for later comparison. Similarly run some representative queries against the pint-server REST API, and stash the results for later comparison.
  3. Perform the schema migration on the DB and validate that the migration worked correctly, e.g.:
     • any new columns that were added have the expected values (if not null)
     • deleted columns have been removed
     • additional tables and associated resources (e.g. sequences or primary keys) have been added
     • removed tables and associated resources (e.g. sequences or primary keys) are no longer present
     • renamed tables and any associated resources (e.g. sequences or primary keys) have been renamed correctly
     • primary key definitions have been updated/removed
  4. Check that the contents of the representative rows in the relevant tables have the equivalent contents, allowing for schema migration, to what was there before the migration. Similarly verify that the pint-server REST API returns equivalent results for those queries whose results were saved.
  5. Test that adding new rows to the affected tables works as expected, thus verifying that any validators are working correctly after the schema migration.

public-cloud-info-service's People

Contributors

bear454, cjainsuse, dependabot[bot], guangyee, jeremy-moffitt, jesusbv, keithmnemonic, mbelur, rjschwei, rtamalin


public-cloud-info-service's Issues

Split DB schema migration and data import

The automation pipeline for image updates will not do a data import each time the schema is updated. Currently an import is done with every migration.

Acceptance Criteria:

  • it's possible to do a schema migration without doing a data import

Allow users to search for a bug # in the package changelogs

We frequently get asked, "Is my bug fixed in the latest release?" and currently there's no easy way to determine that. Now that all package changelogs are included, with bug numbers listed in them, it would be nice to be able to search for a bug number in the package changelogs and see which product version has that fix, ideally both through the UI and the CLI.

Handle google regions for image queries

Now that the client and server can provide region information it is more obvious that the data for Google is inconsistent.

Because Google treats images by name, and the name of an image is the same in all regions, the Google data has no region-specific entries for images. However, this results in empty query results when, on the client side,

pint google images --active --region europe-north1

is used. Now that querying region names is possible, it is obvious that this inconsistency exists. Server side, we should handle this and return the proper image, i.e. for Google ignore the region setting and treat it as "global".

Expose active DB Alembic schema version through the REST API

Similarly to how we expose the package version, provide a REST API query (/schema-version) which returns the active Alembic schema version, which can be used to confirm/validate that schema migrations have been applied.

Potentially implement a means, via the internal DB API, by which the expected schema version for the DB can be determined programmatically to allow automated checking that the schema is up-to-date.
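A rough sketch of what such an endpoint could look like (purely illustrative, not the current implementation; Alembic records the active revision in the alembic_version table, which is all the endpoint needs to read back):

    from flask import Flask, jsonify
    from sqlalchemy import create_engine, text

    app = Flask(__name__)
    # Hypothetical connection URL; the real server builds its engine from its
    # own configuration.
    engine = create_engine('postgresql://pint:secret@localhost/pint')


    @app.route('/v1/schema-version')
    def schema_version():
        # Alembic stores the active revision in the alembic_version table.
        with engine.connect() as conn:
            version = conn.execute(
                text('SELECT version_num FROM alembic_version')).scalar()
        return jsonify({'schema-version': version})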

Region query

When the user issues a query for the regions in a given framework the server can only return the regions that have active images.

There exists a case where a provider may close/remove a region. This action would not cause us to purge the historical data from the DB for images that were released in that region in the past. However, once the region is gone there will be no active images in that region, and the removed region should no longer be returned in the region list query.

Keep n-2 versions of the server container on registry.suse.de

As a sysadmin deploying the pint service in a container, I want to easily be able to switch back to either of the 2 most recent versions other than latest in the event that there is a problem with the latest version.

Currently registry.suse.de only keeps the latest version of the container

Acceptance Criteria:

  • at least 3 total versions are available on registry.suse.de
  • other than 'latest' the n-1 and n-2 versions should also be available.

Redirect requests for regions that have no infrastructure data via a lookup table

To avoid needing duplicate entries in the servers table for regions that do not have update infrastructure, there should be an automatic lookup of the information via lookup table when calls are made for those regions.

Update pint-ng to add a redirector entry for regions that have no update infrastructure: populate the redirection with a region name instead of an IP address and the server information itself. In pint-ng we can then read that field in the DB and return that information.

no region information for frameworks where images are global

When we publish image information for a framework that has global images and no infrastructure, we cannot provide users with data for the regions API. For example:

https://susepubliccloudinfo.suse.com/v1/oracle/regions.json

returns an empty result. We either need to provide generic information, such as "No region information available. Images have the same identifier in all regions", or we need some way of adding the region information outside of the image and server descriptions.

Can add same IP address as server more than once

The following sequence does not return any errors:

pint-data-manager amazon servers --mode add --type update --ip-address 107.22.231.220 --region us-east-2
pint-data-manager amazon servers --mode add --type update --ip-address 107.22.231.220 --region us-east-1
pint-data-manager amazon servers --mode add --type update --ip-address 107.22.231.220 --region us-east-2
pint-data-manager amazon servers --mode add --type region --ip-address 107.22.231.220 --region us-east-2
pint-data-manager amazon servers --mode add --type region --ip-address 107.22.231.220 --region us-east-1
pint-data-manager amazon servers --mode add --type region --ip-address 107.22.231.220 --region us-east-2
pint-data-manager amazon servers --mode add --type region --ip-address 107.22.231.220 --region us-east-1

data_update.py incremental update broken for microsoftimages table

This was a side effect of the recent primary key schema changes; the microsoftimages table has been switched to using a unique sequence column, rather than using a primary key composed of existing columns, so the existence checking fell back on whole-row checking, which caused duplicate entries to be added.

This can be worked around by using an override set of columns to use when checking for existing entries.

Add validation for date fields

It would be very helpful to have DB checks/constraints to ensure the following date fields are always in chronological order:

publishedon
deprecatedon
deletedon

I.e. deletedon must be greater than or the same as deprecatedon, and deprecatedon must be greater than or the same as publishedon.
This also implies deletedon is greater than or the same as publishedon.
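One possible way to express this, sketched as SQLAlchemy table-level check constraints (the model, column sizes, and constraint names are illustrative; the real models live in pint_server/models.py):

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    # Hypothetical model used only to illustrate the constraints.
    class ExampleImage(Base):
        __tablename__ = 'exampleimages'

        name = sa.Column(sa.String(255), primary_key=True)
        publishedon = sa.Column(sa.Date)
        deprecatedon = sa.Column(sa.Date)
        deletedon = sa.Column(sa.Date)

        __table_args__ = (
            # A NULL operand makes a CHECK pass, so unset dates remain allowed.
            sa.CheckConstraint('deprecatedon >= publishedon',
                               name='ck_deprecated_after_published'),
            sa.CheckConstraint('deletedon >= deprecatedon',
                               name='ck_deleted_after_deprecated'),
        )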

Schema migration missed some data

Oracleimage table data is missing from the migration data after updating the schema.

This may be an opportune time to revisit the model for how the schema migration is done.
Consider adding validation rules and auto-generating migrations via Alembic.

Automate container build when the version is incremented

When the version is incremented (see #67 for the related issue), the container that runs the production version of pint-ng should be automatically built.

Acceptance Criteria:

  • when a PR lands on master that increments the version, the IBS container for pint-server should be rebuilt automatically
  • changes that don't increment the version should not trigger the container build

changing the image state resets the entries filter and the search terms

Currently, if you change the image state filters (active, inactive, deleted, etc.), it resets the number-of-entries selection as well as any search terms. It would be good to preserve these selections when you change the image state filter.

This doesn't seem to occur when you change the location/environment.

Enhancement: Utilize bumpcfg for version updates

For the new version API (see #65 ), enable automated version updates using bumpcfg or a similar mechanism.

Acceptance Criteria:

  1. the minor version in the version file is updated automatically as part of the PR merge process
  2. labels or a manual invocation of the bumpversion tool can update major parts of the version number
  3. the automated process can handle the scenario where someone manually updated the version number

Add a summary and sanity check to the migration/import mechanism to compare the DB to the imported xml

After performing a migration or import of data from XML, the import process should sanity check the imported data by counting rows and ideally performing a comparison against the original data that was imported.

Acceptance Criteria:

  1. import tool outputs a count of records in each table after the import
  2. import tool includes a count of records in the source xml for comparison
  3. (aspire/stretch) import tool does a direct compare of each record imported and flags mismatches between the xml and database

amazon, google and microsoft servers, and microsoftimages queries return new surrogate id fields

Just noticed this issue when helping Brett debug why pint microsoft images failed but pint microsoft images --json worked, using a hacked version of pint that points at a local test pint-ng server instance.

After some debugging I determined that pint was hitting an error when rendering the XML of the id field, because it was an int, not a string or unicode.

So this represents an incompatibility between the old and new pint server implementations as a result of the primary key updates.

% curl -s http://pdm-dev:5000/v1/google/servers | head
{
  "servers": [
    {
      "id": 1, 
      "ip": "130.211.242.136", 
      "ipv6": "", 
      "name": "", 
      "region": "asia-east1", 
      "type": "regionserver-sles"
    }, 

Validate schema migration can be done with data in-place

As a future maintainer of pint, I want to be confident that if I must make a schema update, I will not be required to re-create the database. Once the database is the sole source of truth for pint data, it must be possible to update the schema with zero loss of data.

Acceptance Criteria:

  • in a test environment, perform an arbitrary but reasonable schema update simulating something like adding a new CSP
  • validate that the data output from the impacted APIs is identical before/after the schema update

Add a dbdump to the migration process to allow rollbacks of schema changes

In order to support the ability to rollback schema changes, a copy of the database should be created prior to performing a migration. This should be automatic.

Acceptance Criteria:

  1. Migrations automatically create a dbdump prior to migration
  2. Create a set of instructions for rolling back the database using the dbdump

Enforce RO only mode

The pint server code must enforce and ensure that the DB is accessed in read-only (RO) mode. The server has to ensure on its own that the connection to the DB is not misconfigured to allow write access to the data.

Query for image deletion

Our image deletion date is defined as deprecation-date + 6 months. We have a user of the API that is interested in being able to query the expected deletion date. For this we have 2 use cases

1.) Receive a list of all images to be deleted on or before a given date

  • The user must be able to filter this result set by framework and region

2.) Get the expected deletion date for a given image name

The primary keys for images need to be corrected

The images table is using incorrect primary keys in some cases. The keys should be as follows:

AlibabaImages

  • name, region

AmazonImages

  • name, region

GoogleImages

  • name, project

MicrosoftImages

  • name, environment

MicrosoftRegionMap

  • environment, region

OracleImages

  • id

Enforce Changelog consistency in data model

The datamodel should be updated to require the following of image entries:

  1. all changelog entries should end in a forward-slash (/) character. If a slash is not present, the validator should add one
  2. If an image has a non-empty changelog entry for one region, it must have a non-empty changelog entry for all regions
  3. It must be possible to apply this update in-place without re-creating the database, so this change will require a migration.

This should be done as a model validation.
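A rough sketch of how the trailing-slash rule could be expressed as a SQLAlchemy model validator (the model and column name are illustrative; the actual models are defined in pint_server/models.py):

    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import validates

    Base = declarative_base()


    # Hypothetical model used only to illustrate requirement 1.
    class ExampleImage(Base):
        __tablename__ = 'exampleimages'

        name = sa.Column(sa.String(255), primary_key=True)
        changeinfo = sa.Column(sa.String(255))

        @validates('changeinfo')
        def append_trailing_slash(self, key, value):
            # Non-empty changelog entries must end with a '/' character.
            if value and not value.endswith('/'):
                value += '/'
            return value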

Data update script should be more verbose when not using --debug

The update_data.sh script does not provide sufficient indication of ongoing progress as the data is being updated; it should report progress, while not being as noisy as --debug.

Acceptance Criteria:

update_data.sh should output signs of progress on a reasonable cadence.

Depending on the timeframes involved, this may be a status update once per batch of updates to a table, or a time-based update where every minute or so the script reports "## records have been updated". There is some flexibility in the specifics, but it should be less noisy than --debug and more verbose than the current version, which may go silent for minutes at a time.

Update README with api definition and links to the pint client

The README file should include information useful to developers, such as the basic API definition and a link to the PINT client. The previous README (for the ruby implementation) had that information.

Acceptance Criteria:

  1. README includes a basic definition of the PINT rest API
  2. README includes a reference to the PINT client repo

Add Unique Keys to image tables

The database key discussion that led to #89 included a plan for unique keys on several of the image tables. Based on testing, the unique key is still required for MicrosoftImages (the other image tables enforce uniqueness via the Primary Key). Need to add the following UK to the schema:

MicrosoftImages - UK: image_name,environment

pint-ng supports split schema for servers

The pint database schema is being updated to split the update and region servers into separate tables. PINT-NG needs to be able to continue to serve up the full set of servers after the database is updated. This may be done via a join or some other mechanism and should appear seamless to the end user. The script that imports the data needs to sort the servers into the appropriate tables based on the server type.

Acceptance Criteria:

  • the server queries in pint continue to consistently return data for all server types after the server types are split into separate tables in the schema
  • pint_server/data_update.py correctly sorts servers into the appropriate tables based on the server type
