Comments (4)
Hi @paloma-rebuelta @romeokienzler @harshad16, a quick update: I tested my code change so that we can see the actual logs of Python and R nodes processed in KFP or Airflow. I decided to let the called process's error exception (on a non-zero return code) trickle up from subprocess via "check=True", as was already the case in that part of the Elyra code.
A PR will follow. I am open to any hints on good Python code style and practice, avoiding log-injection vulnerabilities, etc.
t0 = time.time()
run_args = ["python3", python_script]
with open(python_script_output, "w") as log_file:
    print("Processing file:", python_script)
    try:
        result = subprocess.run(run_args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True)
        output = result.stdout.decode("utf-8")
        logger.info(f"Output: {output}")
        logger.info(f"Return code: {result.returncode}")
    except subprocess.CalledProcessError as e:
        logger.error("Output: %s", e.output.decode("utf-8"))
        logger.error("Return code: %s", e.returncode)
In terms of the new behavior, here are some samples (in my case Airflow) using different Python content:
@romeokienzler @kevin-bates @lresende @harshad16 I wonder why the Elyra developers originally decided to write the script's stdout to a file and then upload that file to the Elyra S3 bucket. In my PR, I will send the script output to stdout and log it in whatever system it runs on, leaving log collection to that system. Do we want to keep writing Python, R, or notebook stdout to a file in the S3 bucket, as done here? https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L492 I am looking for architectural input here. I also posted this question in the Elyra Slack general channel.
Since I am now piping all script output, including stderr, to stdout, I get the following (understandable) warnings after running make lint:
elyra/airflow/bootstrapper.py:334:53: F841 local variable 'log_file' is assigned to but never used
elyra/airflow/bootstrapper.py:374:48: F841 local variable 'log_file' is assigned to but never used
elyra/kfp/bootstrapper.py:486:53: F841 local variable 'log_file' is assigned to but never used
elyra/kfp/bootstrapper.py:528:48: F841 local variable 'log_file' is assigned to but never used
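Since the file sink is going away anyway, one way to silence those F841 warnings is to drop the `with open(...)` context entirely. A minimal sketch (the helper name and signature are mine, not Elyra's):

```python
import subprocess


def run_script(script_path: str) -> str:
    """Run a script and return its combined stdout/stderr.

    No log file is opened anymore, so there is no unused ``log_file``
    variable left for flake8 (F841) to flag.
    """
    result = subprocess.run(
        ["python3", script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        check=True,
    )
    return result.stdout.decode("utf-8")
```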
from elyra.
This is how we do it:
    result = subprocess.run(command_array, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = result.stdout.decode("utf-8")
    print("Output:", output)
    print("Return code:", result.returncode)
A coworker of mine approached me just today with the same problem. I will verify; I am having him submit both Python and Jupyter generic pipeline nodes/tasks as well. He mentioned the same thing: why don't I see the log info in the Airflow web GUI task logs? Very similar to the problem you described here. The issue is independent of the runtime, be it Kubeflow Pipelines or Airflow.
Verified with us, too:
Any Python output, be it print(1), logger.info("sometext"), or raise "exception text", does not show up in the runtime logs in the case of Python nodes. In contrast, in Jupyter notebook nodes, this output does show up.
PythonFileOp and RFileOp use similar syntax (subprocess.run) in both the KFP and Airflow bootstrapper.py:
https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L521
https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L332
i.e.
with open(r_script_output, "w") as log_file:
    subprocess.run(run_args, stdout=log_file, stderr=subprocess.STDOUT, check=True)
We only see the name of the processed file in stdout, and no logs, in the case of Python or R nodes, whereas the ipynb NotebookFileOp
https://github.com/elyra-ai/elyra/blob/main/elyra/kfp/bootstrapper.py#L357
probably handles cell logging differently, which is why we do see its Python output.
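The missing logs can be reproduced in a tiny standalone experiment: when stdout points at a file handle, the child's output lands in that file and never reaches the parent's stdout, and therefore never reaches the container log. A minimal sketch:

```python
import subprocess
import tempfile

# Redirect the child's stdout to a file, as the current bootstrapper does.
with tempfile.NamedTemporaryFile("w+", suffix=".log") as log_file:
    subprocess.run(
        ["python3", "-c", "print('only in the log file')"],
        stdout=log_file,
        stderr=subprocess.STDOUT,
        check=True,
    )
    # The output is in the file -- nothing was printed to our own stdout.
    log_file.seek(0)
    contents = log_file.read()
```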
I found the following info on subprocess: https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module
subprocess.run has the argument capture_output=False by default:
If capture_output is true, stdout and stderr will be captured. When used, the internal [Popen](https://docs.python.org/3/library/subprocess.html#subprocess.Popen) object is automatically created with stdout and stdin both set to [PIPE](https://docs.python.org/3/library/subprocess.html#subprocess.PIPE). The stdout and stderr arguments may not be supplied at the same time as capture_output. If you wish to capture and combine both streams into one, set stdout to [PIPE](https://docs.python.org/3/library/subprocess.html#subprocess.PIPE) and stderr to [STDOUT](https://docs.python.org/3/library/subprocess.html#subprocess.STDOUT), instead of using capture_output.
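A minimal, self-contained sketch of that recipe (using `sys.executable` to stand in for the script invocation):

```python
import subprocess
import sys

# Capture and combine both streams: stdout=PIPE plus stderr=STDOUT,
# instead of capture_output=True (which would keep them separate).
result = subprocess.run(
    [sys.executable, "-c",
     "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    check=True,
)
combined = result.stdout.decode("utf-8")
# `combined` now contains both the stdout line and the stderr line.
```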
@harshad16 @romeokienzler Regarding notebook logs and the cell output that @paloma-rebuelta sees when executing Jupyter notebook code (see her first ipynb code example, which I also ran today): when the Airflow or KFP log level is only set to INFO (not DEBUG), it seems the log info / cell output is not written:
[2024-05-13, 17:31:49 UTC] {pod_manager.py:466} INFO - [base] [I 17:31:48.057] Output Notebook: notebook_log_test-output.ipynb
[2024-05-13, 17:31:49 UTC] {pod_manager.py:466} INFO - [base]
Executing: 0%| | 0/1 [00:00<?, ?cell/s][I 17:31:49.020] Executing notebook with kernel: python3
[2024-05-13, 17:31:49 UTC] {pod_manager.py:466} INFO - [base]
Executing: 100%|██████████| 1/1 [00:00<00:00, 1.03cell/s]
Executing: 100%|██████████| 1/1 [00:01<00:00, 1.19s/cell]
[2024-05-13, 17:31:49 UTC] {pod_manager.py:466} INFO - [base] [I 17:31:49.259] 'NotebookLog':'notebook_log_test' - notebook execution completed (2.110 secs)
so it seems we have to do some more work in bootstrapper.py for NotebookFileOp, too, with regard to logger, stdout, and stderr. I'll test this out, cf.:
nteract/papermill#433 (comment)
https://github.com/nteract/papermill/blob/main/papermill/engines.py#L221
Update:
OK, so for papermill.execute_notebook, if I set log_output to True, I get what Paloma saw, but now also with the overall Airflow log level at INFO:
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] None
[I 19:01:56.440] Executing notebook with kernel: python3
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] [I 19:01:56.441] Executing Cell 1---------------------------------------
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] [W 19:01:56.463] INFO:test:this is a jupyter notebook
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] INFO:test:this is a jupyter notebook
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] [W 19:01:56.464] ERROR:test:this is a contrived error message in a jupyter notebook script
2024-05-13T19:01:56.466436267Z
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] ERROR:test:this is a contrived error message in a jupyter notebook script
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] [I 19:01:56.466] this is a print statement in a jupyter notebook
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] this is a print statement in a jupyter notebook
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base] [I 19:01:56.467] Ending Cell 1------------------------------------------
[2024-05-13, 19:01:56 UTC] {pod_manager.py:466} INFO - [base]
Executing: 100%|██████████| 1/1 [00:01<00:00, 1.00s/cell]
Executing: 100%|██████████| 1/1 [00:01<00:00, 1.22s/cell]
[2024-05-13, 19:01:58 UTC] {pod_manager.py:511} WARNING - Pod notebook-log-test-mp6spa42 log read interrupted but container base still running
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] None
[I 19:01:56.440] Executing notebook with kernel: python3
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] [I 19:01:56.441] Executing Cell 1---------------------------------------
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] [W 19:01:56.463] INFO:test:this is a jupyter notebook
[2024-05-13, 19:01:59 UTC] {pod_manager.py:466} INFO - [base] INFO:test:this is a jupyter notebook
..
I need to find out why the cell output got logged twice ("log read interrupted but container base still running"). But it is good to see that the three additional arguments to papermill.execute_notebook
log_output=True, stdout_file=sys.stdout, stderr_file=sys.stderr
make sure that the cell content is logged correctly in any case, and that a raise or unknown error is logged, too.
When I add
papermill.execute_notebook(notebook, notebook_output, kernel_name=kernel_name, log_output=True, stdout_file=sys.stdout, stderr_file=sys.stderr)
at https://github.com/elyra-ai/elyra/blob/main/elyra/airflow/bootstrapper.py#L229C13-L229C91
(and of course in the KFP bootstrapper code, too), it works as expected. I will add that to the PR as well.