azureml-debug-training

This repo shows how to debug an AzureML remote run using VSCode remote debugging with the Python Tools for Visual Studio debugger (ptvsd).

While you want to do as much of your debugging as possible on the machine where your development tools are running, sometimes you need to debug on a cluster or other remote compute. For instance, an issue might only occur when running on the cluster, or a production job might have failed and you want to find out what happened by inspecting variables in memory.

This repository shows how to set up the networking required to debug a job that is running on an AzureML Compute cluster from an AzureML Notebook VM. The debugger runs in a VSCode Remote instance on the Notebook VM and attaches to the debug agent running inside the job on the AML Compute cluster. While all examples use a Notebook VM, everything should work the same way from your local dev box (though possibly slower).
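Since the setup depends on the Notebook VM being able to open a TCP connection to the compute node, a quick reachability check can help separate networking problems from debugger problems. The following is a minimal sketch using only the Python standard library; the helper name `can_reach` and the address in the usage comment are illustrative assumptions, not part of this repository.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical usage from the Notebook VM (address and port are placeholders):
# print(can_reach("10.0.0.4", 5678))
```

A bare TCP connection is not a debugger handshake, and probing a port on which a debug agent is already waiting may disturb that session, so this check is best tried against a throwaway listener on the compute node.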

Installation

First, follow Setting up VSCode Remote on an AzureML Notebook VM to set up VSCode Remote on your Notebook VM.

Next, log in to your Notebook VM and open a terminal window (in the top right corner of the Jupyter view, click 'New', then 'Terminal').

  • Activate the conda environment you wish to use
    conda activate py36
  • Install the Python Tools for Visual Studio debugger, ptvsd (not strictly required on the Notebook VM, but a good idea when testing the setup locally)
    pip install ptvsd
  • Go to the folder where you keep your git projects and clone this repository
    git clone https://github.com/danielsc/azureml-debug-training

Then see Debug AzureML Remote Run for instructions on the debugging process.
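For orientation, a script running on the cluster could open a debug port and pause until the debugger attaches, roughly as sketched below. This is an illustration, not the code from this repository: the helper `start_debugger`, its defaults, and the port number are assumptions, and the calls assume the ptvsd 4.x API (`ptvsd.enable_attach` and `ptvsd.wait_for_attach`).

```python
from typing import Optional


def start_debugger(host: str = "0.0.0.0", port: int = 5678,
                   timeout: Optional[float] = None) -> None:
    """Open a ptvsd debug port and block until a debugger attaches.

    `host` is the interface to listen on and `timeout` is in seconds
    (None waits indefinitely). Assumes ptvsd 4.x is installed in the
    environment that runs the job.
    """
    # Imported lazily so the script still loads where ptvsd is absent.
    import ptvsd

    # Start listening for a debugger connection on (host, port).
    ptvsd.enable_attach(address=(host, port), redirect_output=True)
    print("Waiting for debugger to attach on {}:{}".format(host, port))

    # Pause the script here until VSCode attaches or the timeout expires.
    ptvsd.wait_for_attach(timeout=timeout)


# Hypothetical usage near the top of a training script:
# start_debugger(port=5678, timeout=300)
```

Calling the helper only when a command-line flag is set keeps regular runs from pausing.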

Contributors

abeomor, danielsc, ronglums

azureml-debug-training's Issues

Debugging a Pipeline Run with PTVSD from a Notebook VM

I'm running into a specific issue when trying to attach to the running (waiting) process on an AML Compute target. @danielsc

Here's the output I'm seeing from the log (specifically the azureml-logs/70_driver_log.txt).

bash: /azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/libtinfo.so.5: no version information available (required by bash)
bash: /azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/libtinfo.so.5: no version information available (required by bash)
Starting the daemon thread to refresh tokens in background for process with pid = 145
Entering Run History Context Manager.
Launching: python -m ptvsd --host 10.0.0.4 --port 5678 --wait register_best_model.py --hd-step-name hd_train_step --best-model-name pipe_model_testing
E00400.209: Traceback (most recent call last):
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 269, in process_one_message
msg = self.__message.pop(0)
IndexError: pop from empty list

        During handling of the above exception, another exception occurred:
        
        Traceback (most recent call last):
          File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 170, in _wait_for_message
            length_text = headers['Content-Length']
        KeyError: 'Content-Length'
        
        During handling of the above exception, another exception occurred:
        
        Traceback (most recent call last):
          File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 258, in process_messages
            self.process_one_message()
          File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 272, in process_one_message
            self._wait_for_message()
          File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 179, in _wait_for_message
            raise InvalidHeaderError('Content-Length not specified in headers')
        ptvsd.ipcjson.InvalidHeaderError: Content-Length not specified in headers

Exception in thread ptvsd.Server:
Traceback (most recent call last):
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 269, in process_one_message
msg = self.__message.pop(0)
IndexError: pop from empty list

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 170, in _wait_for_message
length_text = headers['Content-Length']
KeyError: 'Content-Length'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/wrapper.py", line 521, in process_messages
self.process_messages()
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 258, in process_messages
self.process_one_message()
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 272, in process_one_message
self._wait_for_message()
File "/azureml-envs/azureml_545725e1cb9b35bc422730908f3231c4/lib/python3.6/site-packages/ptvsd/ipcjson.py", line 179, in _wait_for_message
raise InvalidHeaderError('Content-Length not specified in headers')
ptvsd.ipcjson.InvalidHeaderError: Content-Length not specified in headers

Any clue what the issue might be? I'm running the latest version of ptvsd (it was installed from an estimator step without a pinned version, so the AML Compute side most likely has the latest release) and VS Code Insiders, connected remotely to the Notebook VM that holds the code submitted to AML Compute via an AML pipeline. The other steps aren't debugged, only this one, which pauses until I connect.
