
mlops-basics's Introduction

Hi there

About Me

I am currently working at Enterpret as a Founding Engineer - NLP.

My interests are in Unsupervised Algorithms, Semantic Similarity, and productionizing NLP models. I also like to follow the latest research happening in the NLP domain.

Check out my latest blogs here: Deep Learning Blogs

Besides work, I like cooking 🥘, cycling 🚴‍♀️, and kdramas 🎥.

Languages & Tools:

Python, PyTorch, Docker

Contact

Twitter LinkedIn Gmail


mlops-basics's People

Contributors

graviraja, ravirajag


mlops-basics's Issues

Lambda Environment Support for SQLite3 Older Versions

dvc pull fails with the error shown in the screenshot. I tried downloading the latest SQLite3 and compiling it from source, but it turns out the Lambda environment doesn't give much control.

# Configuring the remote storage in DVC
RUN dvc init --no-scm -f
RUN dvc remote add -d storage s3://basicmlops/dvcstore

# Pulling the trained model
RUN dvc pull dvcfiles/trained_model.dvc

[Screenshot of the dvc pull error]

Metric not matched in `early_stopping_callbacks` (Week 1)

Hi @graviraja ,

I found that train.py in Week 1 throws a RuntimeError while running.

RuntimeError: Early stopping conditioned on metric `val_loss` which is not available. 
Pass in or modify your `EarlyStopping` callback to use any of the following: 
`valid/loss_epoch`, `valid/acc`, `valid/precision_macro`, `valid/recall_macro`, 
`valid/precision_micro`, `valid/recall_micro`, `valid/f1`, `valid/loss`, `train/loss_step`, 
`train/acc_step`, `train/loss_epoch`, `train/acc_epoch`, `train/loss`, `train/acc`

"val_loss" on line 52 seems to be the reason since it does not match.

monitor="val_loss", patience=3, verbose=True, mode="min"

I believe this should be changed to "valid/loss"?
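For reference, a minimal sketch of the corrected callback, assuming PyTorch Lightning's standard EarlyStopping API:

from pytorch_lightning.callbacks import EarlyStopping

# Monitor a metric name that the model actually logs ("valid/loss")
early_stopping_callback = EarlyStopping(
    monitor="valid/loss", patience=3, verbose=True, mode="min"
)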

Thanks for the amazing resource.

AWS Lambda Function: Test error

I am following the Week 8 blog post. When I deploy the container using Lambda and try to test it from the Test section, the execution fails and I get the following log. Can you please help with this? Does this function already have internet access to download that model? (Sorry if the question is naive.)

es/transformers/file_utils.py", line 1518, in get_from_cache
os.makedirs(cache_dir, exist_ok=True)
File "/usr/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/usr/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/usr/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/usr/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/home/sbx_user1051'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/uvicorn", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/uvicorn/main.py", line 425, in main
run(app, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/uvicorn/main.py", line 447, in run
server.run()
File "/usr/local/lib/python3.6/dist-packages/uvicorn/server.py", line 69, in run
return asyncio.get_event_loop().run_until_complete(self.serve(sockets=sockets))
File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in run_until_complete
return future.result()
File "/usr/local/lib/python3.6/dist-packages/uvicorn/server.py", line 76, in serve
config.load()
File "/usr/local/lib/python3.6/dist-packages/uvicorn/config.py", line 448, in load
self.loaded_app = import_from_string(self.app)
File "/usr/local/lib/python3.6/dist-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "./app.py", line 5, in <module>
predictor = ColaONNXPredictor("./models/model.onnx")
File "./inference_onnx.py", line 12, in __init__
self.processor = DataModule()
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/datamodule.py", line 49, in __call__
obj = type.__call__(cls, *args, **kwargs)
File "./data.py", line 20, in __init__
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
File "/usr/local/lib/python3.6/dist-packages/transformers/models/auto/tokenization_auto.py", line 534, in from_pretrained
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/transformers/models/auto/configuration_auto.py", line 450, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py", line 532, in get_config_dict
raise EnvironmentError(msg)
OSError: Can't load config for 'google/bert_uncased_L-2_H-128_A-2'. Make sure that:
- 'google/bert_uncased_L-2_H-128_A-2' is a correct model identifier listed on 'https://huggingface.co/models'
- or 'google/bert_uncased_L-2_H-128_A-2' is the correct path to a directory containing a config.json file
END RequestId: 95ab620c-bf63-46ab-8c02-27fb4099485b
REPORT RequestId: 95ab620c-bf63-46ab-8c02-27fb4099485b	Duration: 65041.97 ms	Billed Duration: 65042 ms	Memory Size: 1024 MB	Max Memory Used: 446 MB	
RequestId: 95ab620c-bf63-46ab-8c02-27fb4099485b Error: Runtime exited with error: exit status 1
Runtime.ExitError
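This isn't an authoritative fix, but a common workaround I'm aware of: the Lambda filesystem is read-only except for /tmp, so the Hugging Face cache has to be redirected there (or the model files baked into the image at build time) before transformers tries to download anything. A minimal sketch, with a placeholder cache path:

import os

# Lambda only allows writes under /tmp, so point the Hugging Face cache there
# before any download is triggered ("/tmp/hf_cache" is a placeholder path).
os.environ["TRANSFORMERS_CACHE"] = "/tmp/hf_cache"

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "google/bert_uncased_L-2_H-128_A-2",
    cache_dir="/tmp/hf_cache",  # cache_dir can also be passed per call
)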

Different module metrics for train/val

Module metrics store internal state computed over each call on different batches, so using the same instance for both train and val might not give correct results when computed over an epoch (with on_epoch=True) in the step hooks. I'd suggest creating separate instances for each stage (train & val).

ref: https://torchmetrics.readthedocs.io/en/latest/pages/quickstart.html#module-metrics

self.accuracy_metric = torchmetrics.Accuracy()
self.f1_metric = torchmetrics.F1(num_classes=self.num_classes)
self.precision_macro_metric = torchmetrics.Precision(
    average="macro", num_classes=self.num_classes
)
self.recall_macro_metric = torchmetrics.Recall(
    average="macro", num_classes=self.num_classes
)
self.precision_micro_metric = torchmetrics.Precision(average="micro")
self.recall_micro_metric = torchmetrics.Recall(average="micro")
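A minimal sketch of the suggested change, mirroring the __init__ fragment above (the train_/val_ attribute names are my own, not from the repo):

# Separate instances so train and validation accumulate their
# epoch-level state independently.
self.train_accuracy_metric = torchmetrics.Accuracy()
self.val_accuracy_metric = torchmetrics.Accuracy()
self.train_f1_metric = torchmetrics.F1(num_classes=self.num_classes)
self.val_f1_metric = torchmetrics.F1(num_classes=self.num_classes)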

KeyError in Week 1

Hi @graviraja,
I was following your tutorial on wandb logging and found a potential error in the training code when visualizing poorly performing samples with a wandb Table.

def on_validation_end(self, trainer, pl_module):
    val_batch = next(iter(self.datamodule.val_dataloader()))
    sentences = val_batch["sentence"]

When running, this results in a KeyError: "sentence", referring to line 21 (sentences = val_batch["sentence"]).

I think this is because "sentence" is not among the columns set up for val_data in data.py. Please correct me if I'm wrong. Thanks :)

def setup(self, stage=None):
    # we set up only relevant datasets when stage is specified
    if stage == "fit" or stage is None:
        self.train_data = self.train_data.map(self.tokenize_data, batched=True)
        self.train_data.set_format(
            type="torch", columns=["input_ids", "attention_mask", "label"]
        )
        self.val_data = self.val_data.map(self.tokenize_data, batched=True)
        self.val_data.set_format(
            type="torch",
            columns=["input_ids", "attention_mask", "label"],
            output_all_columns=True,
        )
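As a quick sanity check (just a sketch, assuming the DataModule class from data.py and that output_all_columns=True is set as above), the raw "sentence" strings should then survive the torch formatting and show up in the batch:

from data import DataModule

# Sketch: verify that the raw "sentence" column is still accessible
dm = DataModule()
dm.prepare_data()
dm.setup(stage="fit")
val_batch = next(iter(dm.val_dataloader()))
print(val_batch["sentence"][:3])  # expected: a list of raw sentence strings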

DVCFiles alternative not working

dvc add ../models/best-checkpoint.ckpt --file trained_model.dvc
This command is not working for me. I mean, if I run this command, the trained_model.dvc file is not created in the dvcfiles folder.

Does it work on Windows?

From my understanding, the Hugging Face Transformers Docker image can only work on Linux. Is that right?

Advice on how to deploy and run my Docker image on my own local machine

I have my own local machine, and I'd like to substitute it for what the AWS S3 bucket does.

As I understand it, these are the steps:

  1. Open a VPN to my local network
  2. Open SFTP/SSH on my local machine
  3. In GitHub Actions, add the VPN and SSH keys
  4. Send my own commands.

Would you give me any advice or articles to read?

What is Postman? How to set it up?

In the Week 8 blog post, you mentioned doing this:

Now that the API Gateway is integrated, let's call it. Go to Postman and create a POST method with the Invoke URL and a body containing the sentence parameter.

Can you please elaborate on this in the blog post? I don't know what Postman is. Is it a tab in the Lambda function or API Gateway, or is it a separate AWS service?

Thanks.
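For context, Postman is a third-party API client, not an AWS service. The same request can also be sent without it; here is a minimal sketch in Python, assuming the endpoint expects a JSON body with a sentence field (the invoke URL below is a placeholder):

import requests

# Placeholder: replace with the Invoke URL shown in the API Gateway console
INVOKE_URL = "https://<api-id>.execute-api.us-west-2.amazonaws.com/default/<lambda-name>"

response = requests.post(INVOKE_URL, json={"sentence": "The movie was great!"})
print(response.status_code, response.json())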

How to push a container to a specific repository in GitHub Actions?

To push an image xyz to an ECR repository abc using the CLI, we would do the following:

docker tag xyz 246113150184.dkr.ecr.us-west-2.amazonaws.com/abc
docker push 246113150184.dkr.ecr.us-west-2.amazonaws.com/abc

How do we do the same using GitHub Actions? In the example given in the Week 7 blog (shown below), the image name mlops-basics and the repository name mlops-basics are the same, so it works. How do we do it if they are different?

name: Create Docker Container

on: [push]

jobs:
  mlops-container:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./week_7_ecr
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          ref: ${{ github.ref }}
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2
      - name: Build container
        run: |
          docker build --build-arg AWS_ACCOUNT_ID=${{ secrets.AWS_ACCOUNT_ID }} \
                       --build-arg AWS_ACCESS_KEY_ID=${{ secrets.AWS_ACCESS_KEY_ID }} \
                       --build-arg AWS_SECRET_ACCESS_KEY=${{ secrets.AWS_SECRET_ACCESS_KEY }} \
                       --tag mlops-basics .
      - name: Push2ECR
        id: ecr
        uses: jwalton/gh-ecr-push@v1
        with:
          access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          region: us-west-2
          image: mlops-basics:latest

Please correct me if I have misunderstood something.

A question on week_0

Hello Raviraj, your great work is really helping me.

I installed all the packages using requirements.txt and trained a model without any issue.
But I have an issue when I run inference on sentences.

In my case, every sentence gets the same result (almost the same score).
Could you check it out? Thanks!


Potential Error in Blog of Week 0

Hey Raviraj, great work. I am learning a lot.

In the Week 0 blog post, you mentioned the following:
As an example, I will be implementing the EarlyStopping callback. This helps the model not to overfit by monitoring a certain parameter (val_loss in this case). The best model will be saved in the dirpath.

But in the code, you have used the ModelCheckpoint callback and did not use EarlyStopping. I believe EarlyStopping and ModelCheckpoint are two completely different callbacks. Please correct me if I am wrong. Thanks.
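For reference, a minimal sketch showing the two callbacks configured side by side and passed to the Trainer together, assuming PyTorch Lightning's standard API (the paths and metric names are illustrative):

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# EarlyStopping halts training when the monitored metric stops improving;
# ModelCheckpoint saves the best weights to dirpath. They are independent callbacks.
early_stopping = EarlyStopping(monitor="val_loss", patience=3, mode="min")
checkpoint = ModelCheckpoint(
    dirpath="./models", monitor="val_loss", mode="min", save_top_k=1
)

trainer = pl.Trainer(callbacks=[early_stopping, checkpoint])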
