delve-team / delve

PyTorch model training and layer saturation monitor

Home Page: https://delve-docs.readthedocs.io

License: MIT License

Python 91.09% TeX 8.91%

convolutional-neural-networks deep-learning layer-saturation model-training neural-dynamics pruning pytorch training-monitor visualization

delve's People

Contributors

Stargazers

Watchers

Forkers

morristech mschuwalow radovankavicky gapdata rivol phecy saran-nns mingkin dionhaefner aung2phyowai liujuncn

delve's Issues

Why float(str(float)) instead of just float?

https://github.com/justinshenk/delve/blob/cae43ba0018a61a1042fb1339fb24b48b524e1ab/delve/metrics.py#L27

Is there a deeper reason for this?
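For reference, a quick check of what the round trip does to a built-in Python float. This is only a sketch and does not reproduce the linked line in metrics.py; `roundtrip` and `samples` are names made up for illustration. The issue does not say whether the value at that line is a built-in float or some other numeric type, and for other types str() may produce a shorter representation, in which case the round trip would not be a no-op.

```python
def roundtrip(x: float) -> float:
    """Convert a float to its string form and parse it back."""
    return float(str(x))


samples = [0.1, 1 / 3, 1e-17, 2.5e300, 123456789.123456789]

# On Python 3, str() of a built-in float is the shortest string that
# parses back to the same value, so the round trip changes nothing.
unchanged = all(roundtrip(x) == x for x in samples)
print(unchanged)
```

If the input is always a built-in float, plain float(x) would therefore give the same result.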

ConvTranspose2d layers not being tracked

import torch.nn as nn
from delve import CheckLayerSat

class simple(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.deconv1 = nn.ConvTranspose2d(32, 3, 3)

simple_model = simple()
# `image` is defined elsewhere in the reporter's script (not shown)
tracker2 = CheckLayerSat("my_experiment", save_to="plotcsv", modules=simple_model, device=image.device)

output:

added layer conv1
Skipping deconv1

This is an awesome tool, but I'd love to see how well the decoder part of my autoencoder works.

Test delve on Keras linear regression problem

Please provide results for testing on a Keras implementation of a linear regression task.

Improvement: Can SaturationTracker be made to track only the layers in a specified list?

Fully convolutional AutoEncoder

Hello,

I have developed a fully convolutional autoencoder (no dense layers) and wanted to check the utilization of its convolutional layers, but I am not able to do so with this module, even though it is stated that conv layers are supported.

Add tests

Disable save in example

Setting save = False in

delve/examples/example.py

Line 42 in 6a2b594

    
           stats = CheckLayerSat('regression/h{}'.format(h), save_to="plotcsv", modules=layers, device=device, stats=["lsat", "lsat_eval"])

does not seem to have an effect.

TypeError with pytest

Running py.test,

delve/delve/writers.py

Line 469 in 6a2b594

if np.all(np.isnan(df.values[0])):

returns a TypeError:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''.

Contents of df.values[0]:

array([list([(tensor(-1.5482e-08, dtype=torch.float64), tensor(-1.3955e-09, dtype=torch.float64)), (tensor(-2.5362e-08, dtype=torch.float64), tensor(-2.5179e-09, dtype=torch.float64)), (tensor(-3.1511e-08, dtype=torch.float64), tensor(-3.6894e-09, dtype=torch.float64)), (tensor(-3.5553e-08, dtype=torch.float64), tensor(-4.1750e-09, dtype=torch.float64)), (tensor(-3.8271e-08, dtype=torch.float64), tensor(-4.4061e-09, dtype=torch.float64)), (tensor(-3.7972e-08, dtype=torch.float64), tensor(-2.7664e-09, dtype=torch.float64)), (tensor(-3.7489e-08, dtype=torch.float64), tensor(-1.7852e-09, dtype=torch.float64)), (tensor(-3.7178e-08, dtype=torch.float64), tensor(-1.3027e-09, dtype=torch.float64))])],
      dtype=object)
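The failure above is consistent with np.isnan being applied to an object-dtype array whose single element is a list of tuples rather than numbers. One possible workaround, sketched below with the standard library only, is to walk the nested structure and test each leaf individually. This is not delve's code: `iter_scalars` and `all_nan` are hypothetical helpers, and the sketch assumes each leaf can be converted with float(). A NumPy object array would first need to be turned into nested lists, for example with its tolist() method.

```python
import math
from typing import Any, Iterator


def iter_scalars(value: Any) -> Iterator[Any]:
    """Yield the leaves of arbitrarily nested lists and tuples."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_scalars(item)
    else:
        yield value


def all_nan(row: Any) -> bool:
    """Return True only if `row` has leaves and every leaf is NaN."""
    leaves = list(iter_scalars(row))
    # float() accepts any object that defines __float__ (assumed here to
    # cover the single-element values shown in the issue).
    return bool(leaves) and all(math.isnan(float(leaf)) for leaf in leaves)


nan = float("nan")
empty_row = [[(nan, nan), (nan, nan)]]
filled_row = [[(-1.5482e-08, -1.3955e-09), (-2.5362e-08, -2.5179e-09)]]
print(all_nan(empty_row), all_nan(filled_row))
```

Whether a row of (value, value) tuples should reach the NaN check at all is a separate question that the issue leaves open.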

[JOSS review] Doc nitpicks

Things I noticed while reading the docs:

Spurious indices and tables link on the saturation page.
If I understand correctly then CheckLayerSat is the only way your users should interact with the library. In this case there's no need to include anything else in the API reference. Just focus on the essential API and exclude internal objects.
Broken link on top of Reference page.
I think the home page of your documentation fills a similar role as the GitHub README, in being the first point of interaction for new users where you should put your best foot forward. Right now your README is a lot more polished, so why not just include the README in the documentation home page and save yourself the hassle of maintaining both separately? (E.g. by converting the README to rst, see mpi4jax where we use this pattern.)
I would link more prominently to the integration with the tensorflow playground, which really does a great job of introducing the library! Love the gif.
Links under "dependencies" are broken (and the whole section is unnecessary IMO).
Emphasize more clearly what I should read to understand the theory behind Delve. You mention several papers but I think highlighting a specific one could be helpful.

(This is a part of the ongoing review at openjournals/joss-reviews#3992)

delve outdated examples

Traceback (most recent call last):
  File "example.py", line 39, in <module>
    "regression/h{}".format(h), "csv", model, device=device, reset_covariance=True,
  File "Z:\delve\delve\torchcallback.py", line 193, in __init__
    self.timeseries_method = timeseries_method
NameError: name 'timeseries_method' is not defined

Provide examples of usage

Examples of usage will make the configuration options easier to understand. The proposed documentation index is at https://github.com/delve-team/delve/blob/feature/sphinx/docs/source/index.rst. This divides examples, gallery, and usage into separate RST files.

Request: please provide some logging and plotting examples in

https://github.com/delve-team/delve/blob/feature/sphinx/docs/source/examples.rst

and/or

https://github.com/delve-team/delve/blob/feature/sphinx/docs/source/gallery.rst

[JOSS review] API

I wonder if CheckLayerSat is really the best name for your main tracker object. The imperative sounds more like a function name to me, and Sat is so overloaded that it's not obvious what it stands for. I would probably use something like SaturationTracker or so.

But I understand that changing names in the public API can be a pain, so if you insist on keeping it, that's fine with me.

(This is a part of the ongoing review at openjournals/joss-reviews#3992)

Example console output unintelligible

The console output in examples/example.py in branch develop is unintelligible:

_axes 4457|                                                                                                                          | 0/2 [00:00<?, ?it/s]
_base 2514: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 23301.69it/s]
2522: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 76260.07it/s

It appears that the originally intended behavior of the tqdm output, showing saturation for each layer in the console (https://github.com/delve-team/delve/blob/master/images/output_screenshot.png), has diverged from the example.py at least as far back as 96b4b61 (August 2019).

Tqdm was originally introduced for convenience, before the additional writers were added. I presume a tensorboard writer will be more interesting for most ML researchers, since it is more flexible and a standard in ML for experiment tracking.

A decision should be made whether to return the original layer-wise saturation console output to its former place, or to restrict console logging in the examples minimally to epoch progression and warnings. This could take a few days to debug.

[JOSS review] Pinning Pytorch

Is it really necessary to pin Pytorch to ==1.9.0? Seems quite restrictive to me, and makes the package harder to install (because if you e.g. do pip install delve and then pip install torchvision it gets overwritten again).

(This is a part of the ongoing review at openjournals/joss-reviews#3992)

Current Version of Delve has broken plots

When the plotting writer is used in any training run to produce either saturation or intrinsic-dimensionality plots, the system crashes.

Plots from example.py are empty

Running python examples/example.py on the develop branch produces the following mostly empty plots in ./regression:

Proposed fix: understand why plots are empty, and preferably disable saving images by default.

[JOSS review] Test coverage

I suggest adding a service like codecov to see how much is actually covered by tests, and adding a badge to the README. There's no shame in not reaching 100% coverage, but if you don't measure it you won't know whether your tests work as intended.

(This is a part of the ongoing review at openjournals/joss-reviews#3992)

[JOSS review] Incorrect qualifiers

Qualifiers in setup.py:

        'Programming Language :: Python :: 3.4',
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',

But since you have python_requires='>=3.6', this should probably be something like 3.6 through 3.10.
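A minimal sketch of how the classifier list could be generated from a version range so that it stays in line with python_requires. The bounds come from the reviewer's suggestion ("3.6 through 3.10"); `MIN_MINOR`, `MAX_MINOR` and `classifiers` are names chosen here for illustration, not taken from delve's setup.py.

```python
MIN_MINOR = 6   # lower bound, matching python_requires='>=3.6'
MAX_MINOR = 10  # upper bound suggested in the review

# One trove classifier per supported minor version, 3.6 through 3.10.
classifiers = [
    f"Programming Language :: Python :: 3.{minor}"
    for minor in range(MIN_MINOR, MAX_MINOR + 1)
]

for line in classifiers:
    print(line)
```

The resulting list could then be passed to setup() alongside any other classifiers the package declares.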

(This is a part of the ongoing review at openjournals/joss-reviews#3992)

Installation issue with readme file

Hi,

I am trying to install delve with pip install delve (pip v. 18.1 on Python 3.6.6), and it fails with an error related to the README.md file:

Collecting delve
  Downloading https://files.pythonhosted.org/packages/13/06/70419c15e345c869fea16f9a730009c220501d2ab93891fb7157d56008fb/delve-0.1.5.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-2f5mmvyc/delve/setup.py", line 24, in <module>
        long_description=open("README.md").read(),
      File "/opt/conda/lib/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1587: ordinal not in range(128)
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-2f5mmvyc/delve/

I believe it is related to the animations you put into the README, which are treated as binary records that the read() function cannot accept.
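Another reading of the traceback: the failure occurs in encodings/ascii.py on byte 0xe2, which is the lead byte of many three-byte UTF-8 characters (a typographic apostrophe, for example). That would point to open("README.md") falling back to an ASCII default encoding in this environment. A sketch of one possible fix follows, assuming the README is UTF-8 encoded (the issue does not confirm this); `read_long_description` is a hypothetical helper, and the sample text is invented for the demonstration.

```python
import os
import tempfile


def read_long_description(path: str) -> str:
    """Read a README as UTF-8 regardless of the system locale."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


# A right single quotation mark (U+2019) encodes to the bytes e2 80 99.
sample = "delve\u2019s README".encode("utf-8")

# The ASCII codec rejects the 0xe2 byte, as in the reported traceback.
try:
    sample.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Reading the same bytes with an explicit UTF-8 encoding succeeds.
with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    with open(readme, "wb") as handle:
        handle.write(sample)
    text = read_long_description(readme)
```

In setup.py, this would amount to passing encoding="utf-8" to the open() call that feeds long_description.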

Does it work with submodules?

Typically I use modules within nn.Sequential or custom-defined modules.

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(D_in, H),
            torch.nn.Linear(H, D_out)
        )

    def forward(self, x):
        return self.fc(x)

and then layers = model.parameters(). However, I get an error:

Traceback (most recent call last):
  File "example_submodule.py", line 43, in <module>
    stats = CheckLayerSat('regression/h{}'.format(h), layers)
  File "/Users/pmigdal/not_my_repos/delve/delve/main.py", line 50, in __init__
    self.layers = self._get_layers(modules)
  File "/Users/pmigdal/not_my_repos/delve/delve/main.py", line 167, in _get_layers
    for name in modules.state_dict().keys():
AttributeError: 'generator' object has no attribute 'state_dict'

(for full code example, see: https://gist.github.com/stared/b598c03ade397baf3fa03c52bd79e90d)

Idiomatic 1.0 code

I see that there are quite a few constructions which are obsolete in PyTorch 0.4+.

For example: Variable. Plus, it seems that .to(device) is the preferred way to handle transfer to the GPU (or not, if none is available).
