Application to generate your training scripts with PyTorch-Ignite.
Please see Contribution Guide.
Development of this project is supported by a NumFOCUS Small Development Grant. We are very grateful to them for this support!
Web Application to generate your training scripts with PyTorch Ignite
Home Page: https://code-generator.pytorch-ignite.ai/
License: BSD 3-Clause "New" or "Revised" License
The following warning is shown in the CI (step-504):
/opt/hostedtoolcache/Python/3.6.13/x64/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:134:
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`.
In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.
Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.
See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
python main.py \
--data_path <path_to_dataset> \
--train_batch_size 4 \
--eval_batch_size 4 \
--num_workers 2 \
--max_epochs 2 \
--train_epoch_length 4 \
--eval_epoch_length 4
No warning to show.
Output of python -m torch.utils.collect_env:
OS: Linux
torch: 1.9.0
torchvision: 0.10.0
ignite: 0.4.5
If you would like to tackle this issue, please comment that you want to work on it and see the contributing guide.
Provide sensible custom events like
BACKWARD_COMPLETED/STARTED
OPTIMIZER_STEP_COMPLETED/STARTED
OPTIMIZER_ZERO_GRAD_COMPLETED/STARTED
Both links are the same, but the second one is expected to be different.
AttributeError: 'VisdomLogger' object has no attribute 'writer'
No error.
Please see line 137 of src/templates/template-vision-segmentation/vis.py:
logger.writer.add_image
This is for TensorBoard, no?
Choose the vision-segmentation template.
Choose Visdom as the experiment tracking system.
Currently tested on master: 78b0def
Add -r requirements.txt to install streamlit with the appropriate version:
-r requirements.txt
# dev
pytorch-ignite
torch
torchvision
jinja2
requests
# test
pytest
hypothesis
Change text: "Those in the parenthesis are used in the generated code." -> "Names in the parenthesis are variable names in the generated code." or something similar.
Let's explicitly create the trainer in the CIFAR10 example to show how to write a training_step.
Let's add an AMP option.
Let's add an error metric (to show how metric arithmetic works):
from ignite.metrics import Accuracy, Loss

accuracy_metric = Accuracy(device=device)
metrics = {
    'eval_accuracy': accuracy_metric,
    'eval_loss': Loss(loss_fn=loss_fn, device=device),
    'eval_error': (1.0 - accuracy_metric) * 100,
}
initialize should also set up an LR scheduler:
- device, model, optimizer, loss_fn = initialize(config)
+ device = idist.device()
+ model, optimizer, loss_fn, lr_scheduler = initialize(config)
Distributed option when used with the multiprocessing (spawn) scheme, i.e. python main.py:
-> multiple child processes have (or had) an issue with dataloaders: the first iteration of each epoch is very slow. To avoid that, let's recommend that users launch things with torch.distributed.launch.
I think there is no need to add this code to main.py if exp_logger is None:
# --------------------------------
# setup common experiment loggers
# --------------------------------
exp_logger = setup_exp_logging(
    config=config,
    eval_engine=eval_engine,
    train_engine=train_engine,
    optimizer=optimizer,
    name=name,
)
I'm a bit confused about the eval_max_epochs option and its value of 2. It is something I've never seen before. I think we should follow standard practice and, by default, run once over the validation dataloader. Thoughts?
If possible, make the sidebar resizable between a minimum and a maximum width.
We are currently able to visit the site even on mobile, but it doesn't look good.
Make the app look good with the left sidebar and the generated code in the middle.
N/A
Let's either create a tutorial guide showing how to use the app, or simply add a button with a message explaining how to use the app, where to start, etc.
Let's simplify this code if no distributed option is selected:
with idist.Parallel(
    backend=config.backend,
    nproc_per_node=config.nproc_per_node,
    nnodes=config.nnodes,
    node_rank=config.node_rank,
    master_addr=config.master_addr,
    master_port=config.master_port,
) as parallel:
    parallel.run(run, config=config)
to
# (no dist)
with idist.Parallel(
    backend=config.backend,
) as parallel:
    parallel.run(run, config=config)
and
# single node
with idist.Parallel(
    backend=config.backend,
    nproc_per_node=config.nproc_per_node,
) as parallel:
    parallel.run(run, config=config)
We should be very careful with the distributed button and this suggestion:
python -m torch.distributed.launch \
--nproc_per_node=2 \
--use_env main.py \
--backend="nccl"
because the distributed button adds code that spawns processes inside the main process, while torch.distributed.launch spawns more processes on its own.
Let's do the following:
- python -m torch.distributed.launch --nproc_per_node=2 ... and in the code we define config.nproc_per_node=None. The same goes for multi-node: config.master_addr=None, etc., with python -m torch.distributed.launch --nproc_per_node=2 --master_addr=master --master_port=1234 --nnodes=2 --node_rank=0 ...
- python main.py ... and in the code we define config.nproc_per_node=2.
We can also imagine folks doing other things, like here: https://github.com/sdesrozis/why-ignite
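The selection logic above can be sketched as a small helper, assuming a hypothetical launcher setting ("launch" when started with torch.distributed.launch or torchrun, "spawn" when started with python main.py, None for no distribution) that the current templates do not have. It only decides which keyword arguments would be forwarded to idist.Parallel; arguments the external launcher already provides are simply not passed, which amounts to leaving them as None.

```python
from types import SimpleNamespace


def parallel_kwargs(config):
    """Return keyword arguments for idist.Parallel (hypothetical helper).

    launcher == "spawn": Parallel spawns nproc_per_node processes itself, and the
    multi-node settings are added only when nnodes > 1.
    launcher == "launch" or None: the external launcher (or nothing) manages the
    processes, so only the backend is forwarded.
    """
    kwargs = {"backend": config.backend}
    if config.launcher == "spawn":
        kwargs["nproc_per_node"] = config.nproc_per_node
        if config.nnodes is not None and config.nnodes > 1:
            kwargs.update(
                nnodes=config.nnodes,
                node_rank=config.node_rank,
                master_addr=config.master_addr,
                master_port=config.master_port,
            )
    return kwargs


# Example: single-node spawn with 2 processes
config = SimpleNamespace(
    backend="nccl", launcher="spawn", nproc_per_node=2,
    nnodes=1, node_rank=0, master_addr=None, master_port=None,
)
print(parallel_kwargs(config))  # {'backend': 'nccl', 'nproc_per_node': 2}
```

In a generated main.py this would be used as with idist.Parallel(**parallel_kwargs(config)) as parallel: parallel.run(run, config=config), so the same code covers all three cases.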
If the user picks the "spawn" option, we have to update the code like this:
train_dataloader = idist.auto_dataloader(
    train_dataset,
    batch_size=config.train_batch_size,
    num_workers=config.num_workers,
    shuffle=True,
    persistent_workers=True,
)
eval_dataloader = idist.auto_dataloader(
    eval_dataset,
    batch_size=config.eval_batch_size,
    num_workers=config.num_workers,
    shuffle=False,
    persistent_workers=True,
)
It would be better to avoid such messages:
Please make sure to pass argument to metric_name parameter of get_handlers in main.py. Otherwise it can result KeyError.
Let's control what we are doing and configure everything such that we do not need to warn the user like that.
It would be nice to add an AMP option, at least for image classification.
Users would like to choose the optimizer type: Adam, RMSprop, etc.
This is a tracking issue for future releases.
config.yaml files require two parameters: train_epoch_length and eval_epoch_length.
Let's make them optional, so that the epoch length is defined by the input dataloaders.
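As far as I know, ignite's Engine.run(data, max_epochs=None, epoch_length=None) already falls back to len(data) when epoch_length is None, so one option is to read both keys with a default of None. The helper below is a hypothetical sketch that makes this fallback explicit; plain lists stand in for the real dataloaders and an argparse Namespace stands in for the loaded config.

```python
from argparse import Namespace


def epoch_length_or_default(config, key, dataloader):
    """Hypothetical helper: return config.<key> if set, otherwise len(dataloader)."""
    value = getattr(config, key, None)
    return len(dataloader) if value is None else int(value)


# eval_epoch_length is omitted, as if it had been removed from config.yaml
config = Namespace(max_epochs=2, train_epoch_length=4)
dataloader_train = list(range(10))  # stand-ins for torch DataLoaders
dataloader_eval = list(range(6))

print(epoch_length_or_default(config, "train_epoch_length", dataloader_train))  # 4
print(epoch_length_or_default(config, "eval_epoch_length", dataloader_eval))  # 6
```

With ignite itself, passing epoch_length=getattr(config, "train_epoch_length", None) to trainer.run(...) should be enough, since None triggers the same fallback inside the engine.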
Maybe it would make sense to add requirements.txt to the downloaded Python files so that the user does not need to handle the requirements?
Add an image classification template, which can be based on the Ignite CIFAR10 example.
Currently, there is no file extension beside the Copy button for requirements.txt. Show txt beside the Copy button.
requirements.txt in the right pane
Currently, we are using the markup language from Prismjs for txt files, which is not correct. So change language-markup to language-txt or language-text, which don't exist in Prismjs languages, but this could allow us to add the file extension like here.
Another part is that we are highlighting txt with markup. Since text files normally don't need syntax highlighting, we could just pass the language grammar as an empty object (i.e. languages['markup'] -> {}).
It would be nice to be able to download the link using wget-like tools and not only the browser.
cc @trsvchn
Currently CI is failing on the segmentation example:
[ignite]: Configuration:
{'accumulation_steps': 4,
'backend': None,
'data_path': '/home/runner/data',
'debug': False,
'eval_batch_size': 2,
'eval_epoch_length': 4,
'filename_prefix': 'training',
'limit_sec': 60,
'log_every_iters': 2,
'lr': 0.007,
'max_epochs': 2,
'n_saved': 2,
'num_classes': 21,
'num_workers': 2,
'output_dir': PosixPath('logs/20230320-090125-backend-None-lr-0.007'),
'patience': 2,
'save_every_iters': 2,
'seed': 666,
'train_batch_size': 2,
'train_epoch_length': 4,
'use_amp': False}
Traceback (most recent call last):
File "/home/runner/work/code-generator/code-generator/dist-tests/vision-segmentation-all/main.py", line 181, in <module>
main()
File "/home/runner/work/code-generator/code-generator/dist-tests/vision-segmentation-all/main.py", line 177, in main
p.run(run, config=config)
File "/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/ignite/distributed/launcher.py", line 316, in run
func(local_rank, *args, **kwargs)
File "/home/runner/work/code-generator/code-generator/dist-tests/vision-segmentation-all/main.py", line 83, in run
trainer.add_event_handler(Events.ITERATION_STARTED, lr_scheduler)
File "/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/ignite/engine/engine.py", line 319, in add_event_handler
_check_signature(handler, "handler", self, *(event_args + args), **kwargs)
File "/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/ignite/engine/utils.py", line 10, in _check_signature
signature = inspect.signature(fn)
File "/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/inspect.py", line 3113, in signature
return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
File "/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/inspect.py", line 2862, in from_callable
return _signature_from_callable(obj, sigcls=cls,
File "/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/inspect.py", line 2261, in _signature_from_callable
raise TypeError('{!r} is not a callable object'.format(obj))
TypeError: <torch.optim.lr_scheduler.LambdaLR object at 0x7fdc0baab0a0> is not a callable object
Error: Process completed with exit code 1.
We got feedback that this code
local_rank = idist.get_local_rank()
...
if local_rank > 0:
    # Ensure that only rank 0 downloads the dataset
    idist.barrier()
...
)
if local_rank == 0:
    # Ensure that only rank 0 downloads the dataset
    idist.barrier()
looks a bit strange if no distributed configuration is selected.
Let's put template conditions here as well.
Currently the templates are structured as Python modules, so to edit and run them we need to install them in editable mode.
The goal is to provide them as simple Python scripts instead of Python modules, and to verify that everything still works with this structure.
Boolean config items always evaluate to False, even when set to 'True' or 'true' in config.yaml.
Please see the setup_parser() function in utils.py, line 29. I think it's missing a default value, like on line 31; or maybe we could just skip the special check for the boolean type and treat it like the other types?
code-generator/src/templates/template-vision-segmentation/utils.py
Lines 21 to 33 in f137219
Boolean config items take the value set in config.yaml.
Set any boolean config item in config.yaml to 'true'.
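The code at those lines isn't reproduced here, so the following is only a sketch of one possible fix, assuming the parser defaults come from config.yaml values that PyYAML's safe_load has already turned into Python booleans. A common pitfall is argparse's type=bool: bool("False") is True because any non-empty string is truthy. The str2bool and build_parser names are hypothetical; build_parser is a simplified stand-in for the template's setup_parser().

```python
import argparse


def str2bool(value):
    """Convert 'true'/'false'-like strings (any case) into a bool."""
    if isinstance(value, bool):
        return value
    lowered = str(value).strip().lower()
    if lowered in ("true", "yes", "y", "1", "on"):
        return True
    if lowered in ("false", "no", "n", "0", "off"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser(config):
    """Hypothetical stand-in for setup_parser(): one --option per config item.

    Booleans use str2bool and keep the value from config.yaml as the default,
    so `use_amp: true` stays True unless overridden on the command line.
    """
    parser = argparse.ArgumentParser()
    for key, value in config.items():
        if isinstance(value, bool):  # checked before int, since bool is an int subclass
            parser.add_argument(f"--{key}", type=str2bool, default=value)
        elif value is None:
            parser.add_argument(f"--{key}", default=None)
        else:
            parser.add_argument(f"--{key}", type=type(value), default=value)
    return parser


parser = build_parser({"use_amp": True, "debug": False, "lr": 0.0001})
print(parser.parse_args([]))
print(parser.parse_args(["--debug", "True"]))
```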
Thanks to @ydcjeff we have a working base app, and I think it's the right point to split the app code into the "classic" components of a GUI app.
First of all, this will allow us to start writing tests, for example to verify that we generate the appropriate Python files and that the generated code runs.
The model will be responsible for generating code from templates, preparing the .py files, and archiving. It is going to be Streamlit-agnostic, so we can test it using pytest.
The controller will glue together the model and the Streamlit view.
Add a template based on transformers.
A list of a few new features that could make the Code Generator super awesome.
We can probably go one feature at a time and iterate over a couple of releases.
Maybe we should discuss this further on GitHub / Discord.
Browserslist: caniuse-lite is outdated. Please run:
npx browserslist@latest --update-db
Why you should do it regularly: https://github.com/browserslist/browserslist#browsers-data-updating
This warning is visible in the output of the pnpm run test command and needs to be resolved. For more information, see https://github.com/pytorch-ignite/code-generator/actions/runs/5938810397/job/16103995089#step:10:36
No warnings in CI and the CI works as expected.
Ensure that the app prepares a unique download link for the archive.
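One way to get a unique link is to put a random UUID in the archive path, which is also the shape of the Colab link used for the vision-segmentation template (nbs/<uuid>/ignite-template-vision-segmentation.zip). The helper name and the exact path layout below are assumptions for illustration.

```python
import uuid


def archive_path(template_name):
    """Hypothetical: build '<uuid4>/ignite-<template>.zip' for one download."""
    return f"{uuid.uuid4()}/ignite-{template_name}.zip"


first = archive_path("template-vision-segmentation")
second = archive_path("template-vision-segmentation")
print(first)
print(first != second)
```

Random (version 4) UUIDs make collisions between two generated links practically impossible, so no server-side bookkeeping is needed to keep them unique.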
The idea is to link (if possible) left pane events to the updated code on the right, to increase the interactivity of the app.
For example, inform the user which files or parts of files were changed (like in a regular IDE, with stars or by highlighting the updated code lines).
I like google-fire and think it could simplify the templates further. It would be great if we could provide it as an option.
@trsvchn please add GHA CI autocancel here as well
Right now we can see the following in the config.yaml:
seed: 777
data_path: ./
train_batch_size: 32
eval_batch_size: 32
num_workers: 4
max_epochs: 20
use_amp: false
debug: false
filename_prefix: training
n_saved: 2
save_every_iters: 1000
patience: 3
output_dir: ./logs
log_every_iters: 10
lr: 0.0001
model: resnet18
and it is unclear what each parameter is responsible for.
The idea is to add comments for each parameter like below
seed: 777 # random seed
data_path: ./ # input data path
train_batch_size: 32
eval_batch_size: 32
num_workers: 4
max_epochs: 20
use_amp: false
debug: false
filename_prefix: training # training checkpoint filename prefix
n_saved: 2 # number of saved checkpoints
save_every_iters: 1000 # training checkpoint frequency
patience: 3 # early stopping patience parameter
output_dir: ./logs # output folder
log_every_iters: 10
lr: 0.0001
model: resnet18
dataloader_train.sampler.set_epoch(trainer.state.epoch - 1)
into trainer = setup_trainer(config, model, optimizer, loss_fn, device, dataloader_train)
- [ ] We can remove the Net model in model.py
# run evaluation at every training epoch end
# with shortcut `on` decorator API and
# print metrics to the stderr
# again with `add_event_handler` API
# for evaluation stats
@trainer.on(Events.EPOCH_COMPLETED(every=1))
def _():
    # show timer
    if timer is not None:
        logger.info("Time per batch: %.4f seconds", timer.value())
        timer.reset()
What is the purpose of the timers and the reset here?
It would be good to provide an option to select TPU as the accelerator instead of GPU.
We could also auto-select the TPU accelerator when the app is opened in Colab, and add the torch_xla installation steps.
What to do:
0) Try a template with TPUs. Choose the distributed training option with 8 processes and the spawning option. "Open in Colab" one template, for example the vision classification template, install torch_xla manually (see https://colab.research.google.com/drive/1E9zJrptnLJ_PKhmaP5Vhb6DTVRvyrKHx) and run the code with the xla-tpu backend: python main.py --nproc_per_node 8 --backend xla-tpu. If everything is done correctly, training should probably run.
The idea is to provide a new script that makes it easy for us to contribute templates to the code generator app. It would be a Python script that takes the information in the form of a main.py file and makes the necessary changes in the code generator app based on it.
I propose something like this:
main.py (to be submitted as a template)
### DataLoaders
train_dataloader = ...
test_dataloader = ...
## model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
## Training Step
def step(engine, batch):
    ...
## Evaluation function
def evaluate():
    ...
Then, using comments such as "DataLoaders", a script can separate the code into sections and insert each one at specific places in the template, which is then used to create the new template. This approach seems less tedious than making all the changes individually and hoping everything works.
We can also add a template-contributing.md guide to help people who want to contribute new templates to the existing app.
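A rough sketch of the splitting step, assuming sections are marked with comments that start with two or more # characters, as in the proposed main.py; the heading convention and the split_sections name are hypothetical, and inserting each section into the template would be a separate step.

```python
import re

# A section heading is a comment made of two or more '#', e.g. "## model".
HEADING = re.compile(r"^#{2,}\s*(?P<title>.+?)\s*$")


def split_sections(source):
    """Split a submitted main.py into {section title: code} using '##' comments."""
    sections = {}
    current = None
    for line in source.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group("title").lower()
            sections[current] = []
        elif current is not None:  # code before the first heading is ignored
            sections[current].append(line)
    return {title: "\n".join(lines).strip() for title, lines in sections.items()}


submitted = """### DataLoaders
train_dataloader = ...
test_dataloader = ...
## model
class MyModel(nn.Module):
    pass
## Training Step
def step(engine, batch):
    ...
"""

for title, code in split_sections(submitted).items():
    print(f"[{title}]\n{code}\n")
```

Single-# comments inside the code are left alone, so regular comments in a contributed template do not start new sections.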
We are currently thinking about a visual representation of the building blocks composing an application: PyTorch objects, handlers, metrics, engines, etc. The idea is to provide a graphical helper to organise events and dataflow. To my knowledge, it is an original approach that could be complementary to our code generator (from templates).
To do this, the visual representation (from a graphical tool, in the spirit of PyFlow https://github.com/wonderworks-software/PyFlow) should be described in a specific representation used by a code generator. This representation could be similar to the intermediate representation (IR) used in compilers (e.g. LLVM, GCC), which helps with optimisation and code generation.
I wonder about merging our effort to have a unique representation. I mean
What do you think about that ?
Some insights
Currently we store the default configuration for the project in utils.py.
As discussed, it would be great to add an option for the user to dump this configuration (e.g. for reproducibility purposes).
We can make it optional with another flag like:
and then, for example, we can ask for the format:
"What kind of config format would you like?"
The nav bar became too large on the 404 page and covered the h1 title.
The nav bar does not cover the h1 title.
https://code-generator.pytorch-ignite.ai/foobar
View the reproduction link on mobile
There are currently two output paths: config.output_path for saving checkpoints and config.filepath for Python logging and experiment tracking systems.
The goal is to combine them into one and have an easily traceable folder name for experiments.
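A sketch of a single, traceable run folder, following the naming the segmentation template already prints in CI (logs/20230320-090125-backend-None-lr-0.007); checkpoints, Python logs and experiment tracking output would then all go under the returned path. The function name and its arguments are hypothetical.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def make_output_dir(root, backend, lr, now=None):
    """Create and return <root>/<YYYYmmdd-HHMMSS>-backend-<backend>-lr-<lr>."""
    if now is None:
        now = datetime.now()
    path = Path(root) / f"{now:%Y%m%d-%H%M%S}-backend-{backend}-lr-{lr}"
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    run_dir = make_output_dir(tmp, backend=None, lr=0.007, now=datetime(2023, 3, 20, 9, 1, 25))
    print(run_dir.name)  # 20230320-090125-backend-None-lr-0.007
```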
If we go to any unexpected URL, e.g. https://code-generator.pytorch-ignite.ai/test, we see a blank page.
Let's show a 404 page?
torch.distributed.launch is being deprecated; let's use the recommended launcher, torchrun. Related ignite issue: pytorch/ignite#2415
Right now, the metadata.json works like this:
{
  "training": {
    ...options
  }
}
And we just import all these options and assume that all templates are going to have them.
This causes two specific issues, as given below
We propose two approaches to solve this problem, namely
a templates sub-option for each training option, which lists all templates supporting that option and is checked when rendering the templates. This works as follows:
[Changes in metadata.json]
"deterministic": {
  "name": "deterministic",
  "type": "checkbox",
  "description": "Should the training be deterministic?",
+ "templates": ["vision-classification", "text-classification", ...]
}
[Changes in TabTraining.vue (and other Vue components)]
<div v-if="deterministic.templates.includes(store.config.template)">
  ...
</div>
Then we can selectively choose which options are available for each template. This approach makes it easier to track each template and option. The only problem with this approach is that it seems hard to specify an option like a specific argparser or backend.
a data_options.json file in the templates. This file will contain all options related to each template, and may also contain sub-option checks. This approach makes more sense when contributing new templates, as you directly add all the options you want to configure.
[New file templateOptions.json]
{
  "templates": {
    "template-vision-classification": {
      "training": [
        "argparser",
        "deterministic",
        "torchrun",
        "spawn",
        "nproc_per_node",
        "nnodes",
        "master_addr",
        "master_port",
        "backend"
      ]
    }
  }
}
[Changes in TabTraining.vue (and other Vue components)]
<FormCheckbox
  :label="deterministic.description"
  :saveKey="deterministic.name"
  v-show="trainingOptions.includes('deterministic')"
/>
const trainingOptions = ref(templates[store.config.template]["training"])
This approach seems a bit complex but can provide more control over the options and templates. Also it can help introduce template specific features like having specific evaluation functions and other metadata options.
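The lookup behind the second approach can be sketched independently of Vue. The Python below mirrors the proposed templateOptions.json and answers "should this option be shown for this template and tab?"; in the app this check would live in the Vue components (as trainingOptions.includes(...) does), so options_for and is_visible are hypothetical names used only for illustration.

```python
import json

# Mirrors the proposed templateOptions.json structure
TEMPLATE_OPTIONS = json.loads("""
{
  "templates": {
    "template-vision-classification": {
      "training": ["argparser", "deterministic", "torchrun", "spawn",
                   "nproc_per_node", "nnodes", "master_addr", "master_port",
                   "backend"]
    }
  }
}
""")


def options_for(template, tab, options=TEMPLATE_OPTIONS):
    """Return the option names enabled for a template tab (empty if unknown)."""
    return options["templates"].get(template, {}).get(tab, [])


def is_visible(option, template, tab="training"):
    """Equivalent of v-show="trainingOptions.includes(option)"."""
    return option in options_for(template, tab)


print(is_visible("deterministic", "template-vision-classification"))
print(is_visible("deterministic", "template-text-classification"))
```

Unknown templates and tabs fall back to an empty list, so a template that forgets to declare an option simply hides it instead of crashing the UI.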
This issue was discussed in our weekly meeting this week.
Here is an idea to think about for later versions, like v0.3 etc
Let's imagine I use the generator to quick-start with my specific problem: own dataflow, model etc.
I generate the code and start to bootstrap things between the training code and my custom things. Without running the training, it is almost impossible to ensure correctness; however, I can think of some basic additional tests with a verbose option to ensure that my own dataloaders and model provide the expected info.
Let's say generated files are:
- main.py
- model.py
- dataflow.py
- utils.py
The idea is to provide an additional tests folder:
- tests/test_dataflow.py
- tests/test_model.py
where we can provide a skeleton code for
Anyway, this is something to discuss and brainstorm...
It's about a trivial error.
We can run any template through the "Open in Colab" button of the PyTorch-Ignite code generator. For template-vision-segmentation, the button links to the following code, which runs directly in Colab:
!wget https://raw.githubusercontent.com/pytorch-ignite/nbs/main/nbs/0a809e9f-82c6-42cc-a7de-378f7f87cc7b/ignite-template-vision-segmentation.zip
!unzip ignite-template-vision-segmentation.zip
!pip install -r requirements.txt
!python main.py config.yaml
However, when this is executed in Colab, the following error arises because it runs without the data required for training:
Traceback (most recent call last):
File "/content/data.py", line 80, in setup_data
dataset_train = VOCSegmentationPIL(
File "/content/data.py", line 56, in __init__
super().__init__(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torchvision/datasets/voc.py", line 101, in __init__
raise RuntimeError("Dataset not found or corrupted. You can use download=True to download it")
RuntimeError: Dataset not found or corrupted. You can use download=True to download it
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/content/main.py", line 128, in <module>
main()
File "/content/main.py", line 124, in main
p.run(run, config=config)
File "/usr/local/lib/python3.10/dist-packages/ignite/distributed/launcher.py", line 316, in run
func(local_rank, *args, **kwargs)
File "/content/main.py", line 36, in run
dataloader_train, dataloader_eval = setup_data(config)
File "/content/data.py", line 87, in setup_data
raise RuntimeError(
RuntimeError: Dataset not found. You can use `download_datasets` from data.py function to download it.
So, I suggest changing download=False to download=True in code-generator/src/templates/template-vision-segmentation/data.py so that it runs immediately.
I suggest replacing the front image shown when opening https://code-generator.pytorch-ignite.ai/ with the GIF we have in the README. Any thoughts?
Right now the logging message is the following (training-info.log):
...
[ignite]: train [1/90]: {'epoch': 1, 'train_loss': 2.521751880645752}
[ignite]: train [1/100]: {'epoch': 1, 'train_loss': 2.4120213985443115}
...
Let's show the current datetime instead of "ignite", as follows:
...
[20230819-12:34:56]: train [1/90]: {'epoch': 1, 'train_loss': 2.521751880645752}
[20230819-12:35:12]: train [1/100]: {'epoch': 1, 'train_loss': 2.4120213985443115}
...
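With the standard logging module this mostly comes down to the formatter: %(asctime)s rendered with a datefmt of %Y%m%d-%H:%M:%S gives the proposed 20230819-12:34:56 form. The templates obtain their logger through ignite's setup_logger, whose exact arguments aren't assumed here; the sketch below configures a plain logging.Logger instead, and the make_logger name and logger name are hypothetical.

```python
import io
import logging

DATEFMT = "%Y%m%d-%H:%M:%S"


def make_logger(stream, name="training"):
    """Hypothetical: a logger whose prefix is the current datetime, not '[ignite]'."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(asctime)s]: %(message)s", datefmt=DATEFMT))
    logger.handlers.clear()  # repeated calls should not stack handlers
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
logger = make_logger(buffer)
metrics = {"epoch": 1, "train_loss": 2.521751880645752}
logger.info("train [1/90]: %s", metrics)
print(buffer.getvalue(), end="")
```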
Currently, the Code Generator generates the files with the options given in the left sidebar.
The idea is to make a tar file of the generated files and provide an option to download it.
Right now we are generating Python files on each interaction with the sidebar. The idea is to keep rendering the code on the screen, but write the generated strings into files only when the user presses the "Download" button.
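A sketch of the "write on download" idea with the standard tarfile module: the rendered strings stay in memory as a {filename: content} dict and are only packed into a .tar.gz when the user asks for it, without touching the disk. The make_archive name and the dict input are assumptions about how the app would hold the generated code.

```python
import io
import tarfile


def make_archive(files):
    """Pack {filename: source string} into an in-memory .tar.gz and return its bytes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, content in files.items():
            data = content.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before addfile reads the data
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


generated = {"main.py": "print('hello')\n", "requirements.txt": "pytorch-ignite\n"}
archive = make_archive(generated)
print(len(archive) > 0)
```

The returned bytes can be handed straight to whatever download widget the UI uses, so rendering on every sidebar interaction stays cheap.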
code-generator/templates/image_classification/main.py
Lines 184 to 187 in 07d6401
@ydcjeff can you explain this code? It looks very weird!
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
Warning
These dependencies are deprecated:
| Datasource | Name | Replacement PR? |
|---|---|---|
| npm | @iconify/iconify | |
These updates have been manually edited so Renovate will no longer make changes. To discard all commits and start over, click on a checkbox.
@iconify/iconify, @octokit/core, @types/ejs, @types/file-saver, @types/jest, @types/prismjs, @vitejs/plugin-vue, @vue/compiler-sfc, albumentations, continuumio/miniconda3, ejs, jest, playwright-chromium, prettier, prismjs, pytorch-ignite, semver, start-server-and-test, torch, torchvision, uuid, vue, vue-router
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
These are blocked by an existing closed PR and will not be recreated unless you click a checkbox below.
@types/jest, jest
docker/Dockerfile
continuumio/miniconda3 24.1.2-0
.github/workflows/ci.yml
actions/checkout v4
actions/setup-python v5
actions/setup-node v4
actions/cache v4
actions/cache v4
actions/checkout v4
actions/setup-python v5
actions/setup-node v4
actions/cache v4
package.json
@iconify/iconify ^3.1.0
@octokit/core ^5.0.0
@types/ejs ^3.1.0
@types/file-saver ^2.0.5
@types/jest ^27.4.0
@types/prismjs ^1.26.0
@vitejs/plugin-vue ^2.1.0
@vue/compiler-sfc ^3.2.30
ejs ^3.1.6
execa ^8.0.1
file-saver ^2.0.5
jest ^27.5.0
jszip ^3.10.1
playwright-chromium ^1.33.0
prettier ^2.5.1
prismjs ^1.26.0
prompts ^2.4.2
semver ^7.3.5
start-server-and-test ^2.0.0
uuid ^9.0.0
vite ^2.7.13
vue ^3.2.30
vue-router ^4.0.12
scripts/requirements.txt
torch >=1.10.2
torchvision >=0.11.3
pytorch-ignite >=0.4.8
src/templates/template-common/requirements.txt
torch >=1.10.2
torchvision >=0.11.3
pytorch-ignite >=0.4.8
src/templates/template-text-classification/requirements.txt
src/templates/template-vision-segmentation/requirements.txt
albumentations >=1.3.0
Add a resume_from option to load a checkpoint (https://, str, Path) using
Ref: DeiT, Ignite CIFAR10 example
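A sketch of the first step of such an option: deciding whether resume_from points to a URL or to a local file, using only the standard library. Loading could then go through torch.hub.load_state_dict_from_url for URLs or torch.load for files, followed by ignite's Checkpoint.load_objects, but those calls are not exercised here; resolve_checkpoint is a hypothetical name and example.com is a placeholder URL.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_checkpoint(resume_from):
    """Hypothetical: classify resume_from as ('url', str) or ('path', Path)."""
    if isinstance(resume_from, Path):
        return "path", resume_from
    if urlparse(resume_from).scheme in ("http", "https"):
        return "url", resume_from
    return "path", Path(resume_from)


print(resolve_checkpoint("https://example.com/checkpoint.pt"))
print(resolve_checkpoint("logs/training_checkpoint_100.pt"))
```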
We now have 2 versions of the app:
- main branch
- master branch
The new version is made with Vue 3, while the old one is made with Streamlit.
UI
Set default configuration (#154)
Misc
code-generator/templates/image_classification/main.py
Lines 153 to 166 in 07d6401