

mlmd-operator's Issues

Update `mlmd` manifests

Context

Each charm ships a set of manifest files that have to be upgraded to their target version. Upgrading manifest files usually means going to the component's upstream repository, comparing the charm's manifest against the one in the repository, and adding the missing bits to the charm's manifest.

What needs to get done

https://docs.google.com/document/d/1a4obWw98U_Ndx-ZKRoojLf4Cym8tFb_2S7dq5dtRQqs/edit?pli=1#heading=h.jt5e3qx0jypg

Definition of Done

  1. Manifests are updated
  2. Upstream image is used
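The comparison step described above can be sketched as a fetch-and-diff. The upstream URL and local path below are placeholders, not the real locations for mlmd:

```shell
# Sketch of the manifest comparison step. Both paths below are
# placeholders -- substitute the real upstream manifest URL for the
# target version and the charm's actual manifest location.
UPSTREAM_MANIFEST_URL="https://raw.githubusercontent.com/<upstream-org>/<repo>/<target-version>/manifests/mlmd.yaml"
CHARM_MANIFEST="src/manifests/mlmd.yaml"

wget -O /tmp/upstream-mlmd.yaml "$UPSTREAM_MANIFEST_URL"

# Show what the charm's manifest is missing relative to upstream.
diff -u "$CHARM_MANIFEST" /tmp/upstream-mlmd.yaml
```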

Fix integration test

The test_using_charm integration test sends an empty request and accepts the empty response it gets back, giving a false pass: the test succeeds without verifying that any actual metadata round-trips through the service. This can be observed in CI runs and locally:

$ tox -e integration -- --model=mlmd-test --keep-models
integration installed: asttokens==2.0.8,attrs==22.1.0,backcall==0.2.0,bcrypt==4.0.0,cachetools==5.2.0,certifi==2022.6.15,cffi==1.15.1,chardet==3.0.4,cryptography==37.0.4,decorator==5.1.1,executing==1.0.0,google-auth==2.11.0,idna==2.10,iniconfig==1.1.1,ipdb==0.13.9,ipython==8.4.0,jedi==0.18.1,Jinja2==3.1.2,jsonschema==3.2.0,juju==3.0.1,jujubundlelib==0.5.7,kubernetes==24.2.0,macaroonbakery==1.3.1,MarkupSafe==2.1.1,matplotlib-inline==0.1.6,mypy-extensions==0.4.3,oauthlib==3.2.0,oci-image==1.0.0,ops==1.2.0,packaging==21.3,paramiko==2.11.0,parso==0.8.3,pexpect==4.8.0,pickleshare==0.7.5,pluggy==1.0.0,prompt-toolkit==3.0.30,protobuf==3.20.1,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pycparser==2.21,Pygments==2.13.0,pymacaroons==0.13.0,PyNaCl==1.5.0,pyparsing==3.0.9,pyRFC3339==1.1,pyrsistent==0.18.1,pytest==7.1.2,pytest-asyncio==0.19.0,pytest-operator==0.22.0,python-dateutil==2.8.2,pytz==2022.2.1,PyYAML==6.0,requests==2.25.0,requests-oauthlib==1.3.1,rsa==4.9,serialized-data-interface==0.3.5,six==1.16.0,stack-data==0.5.0,theblues==0.5.2,toml==0.10.2,tomli==2.0.1,toposort==1.7,traitlets==5.3.0,typing-extensions==4.3.0,typing-inspect==0.8.0,urllib3==1.26.12,wcwidth==0.2.5,websocket-client==1.4.0,websockets==7.0
integration run-test-pre: PYTHONHASHSEED='3683306377'
integration run-test: commands[0] | pytest -vv --tb native --show-capture=no --log-cli-level=INFO -s --model=mlmd-test --keep-models /home/ubuntu/mlmd-operator/tests/integration
============================================================================ test session starts ============================================================================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 -- /home/ubuntu/mlmd-operator/.tox/integration/bin/python
cachedir: .tox/integration/.pytest_cache
rootdir: /home/ubuntu/mlmd-operator
plugins: asyncio-0.19.0, operator-0.22.0
asyncio: mode=strict
collected 2 items                                                                                                                                                           

tests/integration/test_charm.py::test_build_and_deploy SKIPPED (unconditional skip)
tests/integration/test_charm.py::test_using_charm 
------------------------------------------------------------------------------ live log setup -------------------------------------------------------------------------------
INFO     pytest_operator.plugin:plugin.py:653 Connecting to existing model my-controller:mlmd-test on unspecified cloud
------------------------------------------------------------------------------- live log call -------------------------------------------------------------------------------
INFO     root:test_charm.py:39 Using temporary directory /home/ubuntu/mlmd-operator/.tox/integration/tmp/pytest/test_using_charm0
INFO     root:test_charm.py:40 cwd = /home/ubuntu/mlmd-operator
INFO     root:test_charm.py:41 script_abs_path = /home/ubuntu/mlmd-operator/tests/integration/data/interact_with_mlmd.sh
+ MODEL=mlmd-test
+ echo MODEL=mlmd-test
MODEL=mlmd-test
+ wget https://raw.githubusercontent.com/google/ml-metadata/master/ml_metadata/proto/metadata_store.proto
--2022-08-31 09:33:53--  https://raw.githubusercontent.com/google/ml-metadata/master/ml_metadata/proto/metadata_store.proto
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36972 (36K) [text/plain]
Saving to: ‘metadata_store.proto’

metadata_store.proto                        100%[========================================================================================>]  36.11K  --.-KB/s    in 0s      

2022-08-31 09:33:53 (110 MB/s) - ‘metadata_store.proto’ saved [36972/36972]

+ wget https://raw.githubusercontent.com/google/ml-metadata/master/ml_metadata/proto/metadata_store_service.proto
--2022-08-31 09:33:53--  https://raw.githubusercontent.com/google/ml-metadata/master/ml_metadata/proto/metadata_store_service.proto
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 39192 (38K) [text/plain]
Saving to: ‘metadata_store_service.proto’

metadata_store_service.proto                100%[========================================================================================>]  38.27K  --.-KB/s    in 0s      

2022-08-31 09:33:53 (89.3 MB/s) - ‘metadata_store_service.proto’ saved [39192/39192]

+ wget -O- https://github.com/fullstorydev/grpcurl/releases/download/v1.8.0/grpcurl_1.8.0_linux_x86_64.tar.gz
+ tar -xzv
--2022-08-31 09:33:53--  https://github.com/fullstorydev/grpcurl/releases/download/v1.8.0/grpcurl_1.8.0_linux_x86_64.tar.gz
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/111431261/88be8800-4f53-11eb-94fd-6a7ef1143069?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220831%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220831T093354Z&X-Amz-Expires=300&X-Amz-Signature=34b32b5c3850e69f87bbdeb08d6ef42aa04ac13e68afcb2337fe13692efddfc6&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=111431261&response-content-disposition=attachment%3B%20filename%3Dgrpcurl_1.8.0_linux_x86_64.tar.gz&response-content-type=application%2Foctet-stream [following]
--2022-08-31 09:33:54--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/111431261/88be8800-4f53-11eb-94fd-6a7ef1143069?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20220831%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220831T093354Z&X-Amz-Expires=300&X-Amz-Signature=34b32b5c3850e69f87bbdeb08d6ef42aa04ac13e68afcb2337fe13692efddfc6&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=111431261&response-content-disposition=attachment%3B%20filename%3Dgrpcurl_1.8.0_linux_x86_64.tar.gz&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4986341 (4.8M) [application/octet-stream]
Saving to: ‘STDOUT’

-                                             0%[                                                                                         ]       0  --.-KB/s               LICENSE
grpcurl
-                                           100%[========================================================================================>]   4.75M  --.-KB/s    in 0.1s    

2022-08-31 09:33:54 (35.2 MB/s) - written to stdout [4986341/4986341]

+ mkdir -p ml_metadata/proto/
+ mv metadata_store.proto ml_metadata/proto/
++ kubectl get services/mlmd -n mlmd-test -oyaml
++ yq e .spec.clusterIP -
+ SERVICE=10.152.183.202
+ ./grpcurl -v --proto=metadata_store_service.proto --plaintext 10.152.183.202:8080 ml_metadata.MetadataStoreService/GetArtifacts

Resolved method descriptor:
// Gets all the artifacts.
rpc GetArtifacts ( .ml_metadata.GetArtifactsRequest ) returns ( .ml_metadata.GetArtifactsResponse );

Request metadata to send:
(empty)

Response headers received:
accept-encoding: identity,gzip
content-type: application/grpc
grpc-accept-encoding: identity,deflate,gzip

Response contents:
{
  
}

Response trailers received:
(empty)
Sent 0 requests and received 1 response
PASSED
----------------------------------------------------------------------------- live log teardown -----------------------------------------------------------------------------
INFO     pytest_operator.plugin:plugin.py:768 Model status:

Model      Controller     Cloud/Region     Version  SLA          Timestamp
mlmd-test  my-controller  myk8s/localhost  2.9.33   unsupported  09:33:54Z

App   Version                Status  Scale  Charm  Channel  Rev  Address         Exposed  Message
mlmd  res:oci-image@e2cb9ce  active      1  mlmd   stable    10  10.152.183.114  no       

Unit     Workload  Agent  Address     Ports     Message
mlmd/0*  active    idle   10.1.17.61  8080/TCP  
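One way to avoid the false pass, sketched with the same grpcurl tooling the test already downloads (service name, namespace, and port are taken from the log above; the "test-type" name is illustrative): write a record first, then require that the read-back is non-empty.

```shell
# Stricter check: create an artifact type, then assert the read-back
# mentions it instead of accepting an empty {} response.
SERVICE=$(kubectl get service/mlmd -n mlmd-test -o jsonpath='{.spec.clusterIP}')

# Create an artifact type so a later read has something to return.
./grpcurl --proto=metadata_store_service.proto --plaintext \
    -d '{"artifact_type": {"name": "test-type"}}' \
    "$SERVICE:8080" ml_metadata.MetadataStoreService/PutArtifactType

# grep exits non-zero (failing the test) if the response body is empty
# or does not contain the type we just created.
./grpcurl --proto=metadata_store_service.proto --plaintext \
    "$SERVICE:8080" ml_metadata.MetadataStoreService/GetArtifactTypes \
    | grep '"test-type"'
```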

Relate mlmd to mysql database

In upstream CKF, the MLMD component (metadata-grpc-server) uses the same MySQL server as the KFP api-server, with each component storing its data in a separate database. To avoid diverging from upstream, MLMD needs to be configurable so that it can integrate with the mysql-k8s charm that we deploy for kfp-db.
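For reference, upstream KFP wires metadata-grpc-server to MySQL through command-line flags on the server binary; the invocation below mirrors the upstream deployment's args to the best of my knowledge (verify against the target manifests), with the values coming from the mysql-k8s relation data in the charmed case:

```shell
# Illustrative only: flag names mirror the upstream KFP
# metadata-grpc-deployment; the charm would render equivalent
# configuration from the mysql-k8s relation.
/bin/metadata_store_server \
    --grpc_port=8080 \
    --mysql_config_host="$MYSQL_HOST" \
    --mysql_config_port="$MYSQL_PORT" \
    --mysql_config_database="$MYSQL_DATABASE" \
    --mysql_config_user="$MYSQL_USER" \
    --mysql_config_password="$MYSQL_PASSWORD"
```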

Add logging relation to mlmd charm

Context

Add a logging relation to the mlmd charm using the loki_push_api interface and LogForwarder. Alternatively, LogProxyConsumer could be used if the service writes to a log file instead of STDOUT; however, this needs to be discussed because LogProxyConsumer is marked as deprecated.

This task is part of the COS integration initiative for all Kubeflow charms.

What needs to get done

  1. Add logging relation using loki_push_api interface
  2. Use LogForwarder or LogProxyConsumer
  3. Use chisme abstraction for testing

Definition of Done

  1. The charm can be related to grafana-agent-k8s via the logging relation
  2. The relation is covered by integration tests
  3. The relation has been tested manually with COS deployed
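The manual verification could look like the following sketch. It assumes the mlmd charm has gained a `logging` endpoint (the subject of this task) and uses grafana-agent-k8s's standard logging-provider endpoint:

```shell
# Hedged sketch of the manual check: endpoint names are assumptions
# until the logging relation lands in the mlmd charm.
juju deploy grafana-agent-k8s
juju relate mlmd:logging grafana-agent-k8s:logging-provider

# The logging relation should appear as joined, and mlmd's logs should
# then be visible in Loki/Grafana on the COS side.
juju status --relations
```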

Sidecar rewrite

Context

We are rewriting all of our charms to use the sidecar pattern instead of the old podspec approach.

What needs to get done

Rewrite the charm using the sidecar approach with the base charm pattern.

Definition of Done

The charm is rewritten using the sidecar approach with the base charm pattern.
All of the tests are rewritten and passing.

Remove unused `mysql` relation

Context

The charm has a mysql relation that is no longer used: its interface is provided by the mariadb charm, which is no longer supported. The alternative would be to use the relational-db relation to integrate with the mysql-k8s charm, but due to #64 that is not possible.

In #71 it was decided that, instead of an external DB provider, the mlmd charm will keep using the SQLite implementation it has relied on for a while. In PR#72 the charm code was modified to use SQLite exclusively.

What needs to get done

Remove the mysql relation from the metadata.yaml file

Definition of Done

The metadata.yaml file no longer contains the mysql relation.

Move backup/restore logic to the Charm

Context

Right now the backup and restore guide tells users to run the following commands to create a dump of the SQLite content of MLMD's PVC:

# CKF 1.8
MLMD_POD="mlmd-0"
MLMD_CONTAINER="mlmd"

kubectl exec -n kubeflow $MLMD_POD -c $MLMD_CONTAINER -- \
    /bin/bash -c "apt update && apt install sqlite3 -y"

This approach has two drawbacks:

  1. It will not work in airgap environments
  2. Users need to manually use commands to create the backup and push it to S3

What needs to get done

  1. Include any binaries needed for the backup into the Charm (could be separate task)
  2. Have an action that can make the backup and push it to S3
    1. Will most probably need to have a relation with s3-interface for this

This will also need to take into account whether MLMD should stop receiving traffic while the backup is made (e.g. by scaling down the pebble service).
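The core of the backup step can be sketched with sqlite3's online backup command, which produces a consistent copy even while the database is open. The paths below are hypothetical; in the real charm the database would live on MLMD's PVC, and the S3 push would use credentials from the s3-interface relation:

```shell
# Hypothetical paths; in the real charm the DB lives on MLMD's PVC.
DB_PATH="/tmp/mlmd.db"
BACKUP_PATH="/tmp/mlmd-backup.db"

# Stand-in for MLMD's existing database (illustration only).
sqlite3 "$DB_PATH" "CREATE TABLE IF NOT EXISTS demo(x INTEGER); INSERT INTO demo VALUES (1);"

# sqlite3's .backup uses the online backup API, so the copy is
# consistent even if the server still holds the database open.
sqlite3 "$DB_PATH" ".backup '$BACKUP_PATH'"

# Pushing to S3 would use credentials from the s3-interface relation;
# the aws CLI call is illustrative only:
# aws s3 cp "$BACKUP_PATH" "s3://<bucket>/mlmd/backup.db"
```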

Definition of Done

  1. Have a spike to confirm we know if MLMD should be up/down when doing the backup
  2. The action can be executed in an airgapped environment
  3. Users don't need to run any manual commands from their machine
  4. The data will go directly from the Charm to S3

Write migration documentation from podspec to sidecar

Context

#72 introduces changes in the charm that affect the way users refresh it. Since deleting the charm will be necessary, we must ensure the right documentation is in place.

In preparation for the 1.9 release, we have to at least document in this issue and/or in the README all the migration steps.

What needs to get done

Document the migration steps for:

  1. Refreshing the charm
  2. Backing up and migrating data
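A minimal sketch of the sequence the document could walk through, given that a podspec charm cannot be refreshed in place to a sidecar revision. The `backup`/`restore` actions and the channel name are assumptions, not existing features:

```shell
# Hypothetical migration flow: back up, redeploy as sidecar, restore.
juju run-action mlmd/0 backup --wait     # assumes a backup action exists
juju remove-application mlmd
juju deploy mlmd                         # sidecar revision, channel TBD
juju run-action mlmd/0 restore --wait    # assumes a restore action exists
```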

Definition of Done

A document that states all the necessary steps to correctly upgrade this charm.
