
econ_layers's Introduction

PyTorch Layers for Economics Applications


Features

  • Exponential layer
  • Flexible multi-layer neural network with optional nonlinear last layer
  • Affine rescaling of output by an input

Development

To publish a new release to PyPI:

  1. Ensure that the CI is passing.
  2. Modify setup.py to increment the minor version number (or the major version number once API stability is enforced).
  3. Choose "Releases" on the GitHub page, then "Draft a new release".
  4. Click "Choose a tag" and type a new release tag: the letter v followed by the version number you set in setup.py, so the two stay consistent.
  5. After you choose "Publish Release", the package is automatically pushed to PyPI, and you can change compatibility bounds in downstream packages as required.

Credits

This package was created with Cookiecutter and the giswqs/pypackage project template.

econ_layers's People

Contributors

jbrightuniverse, janrosa1, jlperla

Forkers

shizelong1985

econ_layers's Issues

cli.config will not be a dict after PL>=1.6

in test_jsonargparse.py, cli.config["model"]["ml_model"] will need to be converted to cli.config.as_dict()["model"]["ml_model"], or perhaps separated into two lines, once the release is out
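
A minimal sketch of the change, assuming the config keys stay the same:

# Before PL 1.6, cli.config behaves like a nested dict
ml_model = cli.config["model"]["ml_model"]

# From PL 1.6 on, convert the Namespace to a dict first (or split it into two lines)
config_dict = cli.config.as_dict()
ml_model = config_dict["model"]["ml_model"]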

Do we need Optional in the FlexibleSequential?

Is the Optional on https://github.com/HighDimensionalEconLab/econ_layers/blob/main/econ_layers/layers.py#L64 and https://github.com/HighDimensionalEconLab/econ_layers/blob/main/econ_layers/layers.py#L66 necessary for it to work? Typically we would only want Optional if None also makes sense (for example, the rescaling_layer needs it in https://github.com/HighDimensionalEconLab/econ_layers/blob/main/econ_layers/layers.py#L68). Not an important point, but something to check the next time you update the repo.

Implement trainable diagonal exponential rescaling functions

Might see something here: https://towardsdatascience.com/how-to-build-your-own-pytorch-neural-network-layer-from-scratch-842144d623f6

The key is that it needs to have its internal weights as trainable parameters for pytorch, rather than as fixed values.

If the input is x in R^N, then for all of these we want to map to an N x N matrix (diagonal in this case). We can then multiply that matrix by the input to rescale it. For the diagonal case, the multiplication is just a pointwise multiplication.

To start with, implement the pointwise f(x) = exp(D x) for a diagonal D and input x, i.e.

f(x_1) = exp(D_1 x_1)
f(x_2) = exp(D_2 x_2)
...
f(x_N) = exp(D_N x_N)

which is an N-parameter learnable function (i.e. self.D = torch.nn.Parameter(torch.zeros(N)), etc.)

Code that might not be that far off is

# Given inputs x and y, this calculates exp(D x) * y with a pointwise
# exponential and multiplication, where D is a learnable diagonal.
import torch
import torch.nn as nn

class DiagonalExponentialRescaling(nn.Module):
    def __init__(self, n_in):
        super().__init__()
        self.n_in = n_in
        self.weights = torch.nn.Parameter(torch.Tensor(n_in))
        self.reset_parameters()

    def reset_parameters(self):
        # Let's start at zero, but later this could be an option
        torch.nn.init.zeros_(self.weights)  # maybe this? Not entirely sure.

    def forward(self, x, y):
        exp_x = torch.exp(torch.mul(x, self.weights))  # exponential of the scaled input
        return torch.mul(exp_x, y)

Because this will be relatively low dimensional in parameters, we will need to make sure we start at the right place. Maybe D_n = 0 is even a better initial condition than something totally random.

Note that we would call this with two inputs

model = DiagonalExponentialRescaling(5)
x = torch.tensor([...])
y = torch.tensor([...]) # maybe coming out of another network
out = model(x, y)

For the unit tests:
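
A minimal sketch of one such test, assuming pytest; with the default zero initialization, exp(D x) = 1, so the layer should initially act as the identity on y:

import torch

def test_diagonal_exponential_rescaling_identity_at_init():
    # uses the DiagonalExponentialRescaling class sketched above
    model = DiagonalExponentialRescaling(5)
    x = torch.rand(5)
    y = torch.rand(5)
    assert torch.allclose(model(x, y), y)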

Add rescaling by dividing by a variable

This is a rescaling layer which takes in an index for the input and then divides by that value.

There are no trainable parameters in this one, which makes it a lot easier to implement. Something like

class InputRescaling(nn.Module):
    def __init__(self, rescale_index):
        super().__init__()
        self.rescale_index = rescale_index

    def forward(self, x, y):
        return y / x[self.rescale_index]

Or something like that. @Mekahou would that do it?

Then after this, we should hook it up into the flexible layers as implemented in #2

Then if you wanted a flexible neural network that rescales by the 0th argument (e.g. by k for a c(k, z) function), you could do something like

mod = FlexibleSequential(2, 1, layers = 3, hidden_dim = 128,
                         RescalingLayer = InputRescaling,
                         rescaling_layer_kwargs = {"rescale_index": 0})

@kahou this would do it for the recursive case, right?

Move pretraining callback into the package

The current code is

import pytorch_lightning as pl
import pytorch_lightning.callbacks

class ResetOptimizers(pl.Callback):
    def __init__(self, verbose):
        super().__init__()
        self.verbose = verbose

    def on_train_epoch_end(self, trainer, pl_module):
        if trainer.current_epoch == pl_module.hparams.pretrain_epochs - 1:
            if self.verbose:
                print("\nPretraining complete, resetting optimizers and schedulers")
            trainer.accelerator.setup_optimizers(trainer)
  • Add a new file called callbacks.py so it can be included with import econ_layers.callbacks etc. (a usage sketch is at the end of this issue).
  • I think we should change it to something where the field for the pretrain epochs is not hardcoded:
class ResetOptimizers(pl.Callback):
    def __init__(self, verbose: bool,
                 epoch_reset_field: str = "pretrain_epochs"):
        super().__init__()
        self.verbose = verbose
        self.epoch_reset_field = epoch_reset_field

    def on_train_epoch_end(self, trainer, pl_module):
        reset_epoch = getattr(pl_module.hparams, self.epoch_reset_field) - 1
        if trainer.current_epoch == reset_epoch:
            if self.verbose:
                print("\nPretraining complete, resetting optimizers and schedulers")
            trainer.accelerator.setup_optimizers(trainer)
  • Need to add pytorch-lightning as a dependency of this package.
  • For jsonargparse to work well, you should also add type information to the fields. I tried to add it above but may have made a mistake.
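
A minimal usage sketch once callbacks.py exists (the Trainer arguments here are just placeholders):

import pytorch_lightning as pl
from econ_layers.callbacks import ResetOptimizers  # assumed import location per the first bullet

trainer = pl.Trainer(
    max_epochs=200,  # placeholder
    callbacks=[ResetOptimizers(verbose=True, epoch_reset_field="pretrain_epochs")],
)
# the LightningModule is expected to have hparams.pretrain_epochs set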

More advanced rescaling by inputs

If we find that we want more control over which inputs are rescaled, then something like

import torch
import torch.nn as nn

class RescaleInputsbyInput(nn.Module):
    def __init__(self, rescale_index, inputs_to_rescale=None):
        super().__init__()
        self.rescale_index = rescale_index

        # if inputs_to_rescale is None, assume all inputs except the rescaling one;
        # the concrete indices are generated in forward, since the input dimension is not known here
        self.inputs_to_rescale = inputs_to_rescale

    def forward(self, x):
        rescale_scalar = 1 / x[self.rescale_index]
        if self.inputs_to_rescale is None:
            indices = [i for i in range(x.shape[-1]) if i != self.rescale_index]
        else:
            indices = self.inputs_to_rescale
        new_x = x.clone()
        new_x[..., indices] = new_x[..., indices] * rescale_scalar  # rescale only the selected indices
        return new_x

Add support for an input rescaling to complement the output rescaling

After #15, the current rescaling layer doesn't do anything to the network on the inside. So it can only represent the following, for a given index i:

x_i NN(x_1, ..., x_N)

But we also want to be able to do things like NN(x_1/x_i, x_2/x_i, ..., x_i, ..., x_N/x_i), or x_i NN(x_1/x_i, x_2/x_i, ..., x_i, ..., x_N/x_i), to represent functions that are "close to" homogeneous of degree 1. That is, it divides everything by x_i except the x_i term itself, and then rescales back by x_i. This is used in an "almost guess-and-verify" approach for homothetic problems. Almost, because it isn't forcing the NN to be independent of x_i; it just makes that easy.

Since there may be more elaborate types of rescaling, let's leave it relatively general and follow the pattern of the newly renamed output rescaling in #15.

To do this, the constructor would have something like

        self.input_rescaling_layer_args = input_rescaling_layer_kwargs
        self.output_rescaling_layer_args = output_rescaling_layer_kwargs
        self.InputRescalingLayer = InputRescalingLayer
        self.OutputRescalingLayer = OutputRescalingLayer

        if self.InputRescalingLayer is not None:
            self.rescale_input = InputRescalingLayer(**input_rescaling_layer_kwargs)
        else:
            self.rescale_input = None

        if self.OutputRescalingLayer is not None:
            self.rescale_output = OutputRescalingLayer(**output_rescaling_layer_kwargs)
        else:
            self.rescale_output = None
  • Then the forward is something like the following
    def forward(self, input):
        rescaled_input = input if self.rescale_input is None else self.rescale_input(input)
        out = self.model(rescaled_input)  # pass through to the stored net
        if self.rescale_output is not None:
            return self.rescale_output(input, out)  # note that the output rescaling uses the original inputs, not the rescaled ones
        else:
            return out
  • Then, we can come up with a simple network to do this in the "guess almost homothetic" case. Something like the following is a good place to start. It would assume there is only a single value that is not rescaled. Then you could do
class RescaleAllInputsbyInput(nn.Module):
    def __init__(self, rescale_index):
        super().__init__()
        self.rescale_index = rescale_index

    def forward(self, x):
        rescale_scalar = 1 / x[self.rescale_index]
        return torch.cat([x[0:self.rescale_index] * rescale_scalar,
                          x[self.rescale_index:self.rescale_index + 1],
                          x[self.rescale_index + 1:] * rescale_scalar])  # or whatever is correct for the batch layout
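
A hypothetical construction once both hooks are in place, using e.g. RescaleOutputsByInput on the output side (the keyword names below just mirror the attributes above and may end up different):

mod = FlexibleSequential(3, 1, layers = 3, hidden_dim = 128,
                         InputRescalingLayer = RescaleAllInputsbyInput,
                         input_rescaling_layer_kwargs = {"rescale_index": 0},
                         OutputRescalingLayer = RescaleOutputsByInput,
                         output_rescaling_layer_kwargs = {"rescale_index": 0})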

Add affine term in RescaleOutputByInput

https://github.com/HighDimensionalEconLab/econ_layers/blob/main/econ_layers/layers.py#L21-L31

Maybe something like

import torch
import torch.nn as nn

class RescaleOutputsByInput(nn.Module):
    def __init__(self, rescale_index: int = 0, bias: bool = False):
        super().__init__()
        self.rescale_index = rescale_index
        if bias:
            self.bias = torch.nn.Parameter(torch.Tensor(1))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        if self.bias is not None:
            torch.nn.init.zeros_(self.bias)

    def forward(self, x, y):
        bias = 0.0 if self.bias is None else self.bias
        if x.dim() == 1:
            return x[self.rescale_index] * y + bias
        else:
            return x[:, [self.rescale_index]] * y + bias
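
Hypothetical usage, just to illustrate the affine behavior (with bias=True the layer computes x[rescale_index] * y + b, where b starts at zero and is learned):

layer = RescaleOutputsByInput(rescale_index = 0, bias = True)
x = torch.tensor([[2.0, 3.0]])
y = torch.tensor([[1.5]])
out = layer(x, y)  # 2.0 * 1.5 + b, with b initialized to 0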

Check if the patch for the ReduceLRonPlateau works

i.e. Lightning-AI/pytorch-lightning#10850

If so, then I think what we do is put the class FutureLightningCLI(LightningCLI) into econ_layers, and then downstream packages can use that instead of LightningCLI directly. If that works, it is an easy patch for setups where we want to try the plateau scheduler, and we can swap back when the new release occurs.

@jbrightuniverse It might be worth trying this sooner rather than later if the symmetry paper ends up sensitive to the LR, but let's get it working with the step/exponential schedulers first.

Code to ignore certain lightning warnings

Right now we have

warnings.filterwarnings(
    "ignore",
    category=UserWarning,
    module="pytorch_lightning.trainer.data_loading",
    lineno=102,
)


warnings.filterwarnings(
    "ignore",
    category=UserWarning,
    module="pytorch_lightning.trainer.callback_hook",
    lineno=100,
)


warnings.filterwarnings(
    "ignore",
    category=UserWarning,
    module="torch.optim.lr_scheduler",
    lineno=129,
)

After a version bump we can revisit these and see what is still relevant.

The problem is that the lineno changes with each PL version (or torch version, for the lr_scheduler warning), so we could see whether it is possible to do an "if" and change the lineno depending on the installed version. Then, when new PL versions come out, we can just add a case to the if.
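
A rough sketch of that version-conditional approach; the cutoff version and the alternative lineno below are placeholders rather than real values:

import warnings

import pytorch_lightning as pl
from packaging import version

# placeholder line numbers for the data_loading warning under different PL versions
if version.parse(pl.__version__) >= version.parse("1.6.0"):
    data_loading_lineno = 110  # placeholder, not the real value
else:
    data_loading_lineno = 102

warnings.filterwarnings(
    "ignore",
    category=UserWarning,
    module="pytorch_lightning.trainer.data_loading",
    lineno=data_loading_lineno,
)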

Add utilities functions

  • Add a utilities.py
    For now, can just put in
import torch

def squeeze_cpu(x):
    return x.detach().squeeze().cpu().numpy() if torch.is_tensor(x) else x

def dict_to_cpu(d):
    return {name: squeeze_cpu(val).tolist() for (name, val) in d.items()}
  • Add in a simple unit test (a sketch follows below). Might need to add numpy and torch to the requirements for the package.
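
A minimal sketch of such a test, assuming pytest and that the functions live in econ_layers.utilities:

import numpy as np
import torch

from econ_layers.utilities import squeeze_cpu, dict_to_cpu  # assumed import location

def test_squeeze_cpu_and_dict_to_cpu():
    x = torch.tensor([[1.0], [2.0]], requires_grad=True)
    assert np.allclose(squeeze_cpu(x), np.array([1.0, 2.0]))
    assert squeeze_cpu(3.5) == 3.5  # non-tensors pass through unchanged
    assert dict_to_cpu({"x": x}) == {"x": [1.0, 2.0]}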

Add in a utility to run and test CLI model without the CLI

Maybe something like

def solve_cli_model(Model, args, config_file, default_seed = 123):
    sys.argv = ["dummy.py"] + [f"--{key}={val}" for key, val in args.items()]  # hack: overwrite argv

    cli = LightningCLI(
        Model,
        run=False,
        seed_everything_default=default_seed,
        save_config_overwrite=True,        
        parser_kwargs={"default_config_files": [config_file]},
    )
    # Solves the model
    trainer = cli.instantiate_trainer(
        logger=None,
        checkpoint_callback=None,
        callbacks=[],  # not using the early stopping/etc.
    )    
    trainer.fit(cli.model)

    # Calculates the "test" values for it
    trainer.test(cli.model)
    cli.model.eval()  # Turn off training mode, where it calculates gradients for every call.

    return cli.model, cli

Except maybe give a few options with defaults like the following (a sketch folding these in appears below):

  • use_logger = False: turns off the logger by default.
  • checkpoint_callback = False
  • callbacks = False: if False, zero them out; otherwise leave them be.
  • test = True: whether to run the test step or not.

Etc.
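
A sketch that folds those options in; the argument names follow the list above, and the exact semantics are an assumption:

import sys

from pytorch_lightning.utilities.cli import LightningCLI  # import path varies by PL version

def solve_cli_model(Model, args, config_file, default_seed=123,
                    use_logger=False, checkpoint_callback=False,
                    callbacks=False, test=True):
    sys.argv = ["dummy.py"] + [f"--{key}={val}" for key, val in args.items()]  # hack: overwrite argv

    cli = LightningCLI(
        Model,
        run=False,
        seed_everything_default=default_seed,
        save_config_overwrite=True,
        parser_kwargs={"default_config_files": [config_file]},
    )

    # Only override what the caller asked to zero out; otherwise leave the configured values alone
    trainer_kwargs = {}
    if not use_logger:
        trainer_kwargs["logger"] = None
    if not checkpoint_callback:
        trainer_kwargs["checkpoint_callback"] = None
    if callbacks is False:
        trainer_kwargs["callbacks"] = []  # drop early stopping etc.
    trainer = cli.instantiate_trainer(**trainer_kwargs)

    trainer.fit(cli.model)
    if test:
        trainer.test(cli.model)
    cli.model.eval()  # turn off training mode
    return cli.model, cli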

Add moments layer

A layer that takes R^1 -> R^M for the first M moments. We can get fancy later by making it work for inputs that are not one dimensional and for picking specific moments.

import torch
import torch.nn as nn

class Moments(nn.Module):
    def __init__(self, n_moments: int):
        super().__init__()
        self.n_moments = n_moments

    def forward(self, input):
        # concatenate input^1, ..., input^M along the feature dimension
        return torch.cat([input.pow(m) for m in range(1, self.n_moments + 1)], 1)  # or something like that

A few things:

  • Note that this has no learnable parameters.
  • Make sure to test it with dispatching over a batch, in the typical way required for the symmetry paper (a quick check is sketched below).
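
A quick batch check, assuming inputs of shape (batch, 1) and the Moments class sketched above:

import torch

layer = Moments(3)
x = torch.tensor([[2.0], [3.0]])
out = layer(x)  # each row gets its own moments
assert torch.allclose(out, torch.tensor([[2.0, 4.0, 8.0], [3.0, 9.0, 27.0]]))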

After that you can use this in a configuration file like

model:
  phi:
    class_path: econ_layers.layers.Moments
    init_args:
      n_moments: 3

@jbrightuniverse @arnavs

Hook up rescaling into the main flexible layers for the scalar exponential case

After #1 is complete, we can add it as an option into the FlexibleSequential

I already added in the https://github.com/HighDimensionalEconLab/econ_layers/blob/main/econ_layers/layers.py#L30-L31 function

What is needed is then to use the rescaling in the forward. I think basically https://github.com/HighDimensionalEconLab/econ_layers/blob/main/econ_layers/layers.py#L69-L74

Becomes something like

    def forward(self, input):
        out = self.model(input)  # pass through to the stored net
        if self.RescalingLayer is not None:
            return self.rescale(input, out)  # assuming the constructed rescaling layer is stored as self.rescale
        else:
            return out

And construction of the FlexibleSequential could be

mod = FlexibleSequential(2, 2, layers = 3, hidden_dim = 128,
                         RescalingLayer = ScalarExponentialRescaling,
                         rescaling_layer_kwargs = {})  # add kwargs if required, or whatever it is....

Or something like that.

For the naming of arguments, kwargs, etc., see the other classes and strive for consistency.

Implement trainable scalar rescaling

The scalar version is something like

# Scalar rescaling. Only one parameter.
import torch
import torch.nn as nn

class ScalarExponentialRescaling(nn.Module):
    def __init__(self, n_in):
        super().__init__()
        self.n_in = n_in
        self.weight = torch.nn.Parameter(torch.Tensor(1))  # only one parameter to "learn"
        self.reset_parameters()

    def reset_parameters(self):
        # Let's start at zero, but later this could be an option
        torch.nn.init.zeros_(self.weight)  # maybe this? Not entirely sure.

    def forward(self, x, y):
        exp_x = torch.exp(self.weight * x)  # exponential of the scaled input
        return torch.mul(exp_x, y)

Input rescaling: problem with the data/batch dimension

In InputRescaling, for def forward(self, x, y): return y * x[self.rescale_index], the rescaling layer multiplies the NN output by the first batch element (in this case the first tensor [z_0, k_0]) instead of by the chosen data coordinate (in this case z_i for the i-th batch element).
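
A minimal sketch of the indexing fix, assuming x has shape (batch, n_in) and keeping the multiply/divide as the layer currently has it:

    def forward(self, x, y):
        # select the data coordinate (column), not the batch element (row),
        # and keep the dimension so it broadcasts against y
        return y * x[:, [self.rescale_index]]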

See if a specialized trainer supporting pre_fit for cli/etc. is feasible

From Mauricio on Slack:

Just in case you find it useful I give a different idea which I would probably take if doing something like this. Note that LightningCLI can receive as input the trainer_class. The reason for this is for users to be able to extend the lightning trainer class when needed. A possibility could be to extend the fit method such that internally it would do some pretraining (disabling callbacks and loggers) and then call super().fit(). Another possibility would be to have a new method e.g. pre_fit which would implement this. Then the user would call first pre_fit and then fit. The reason why I would tend to go this way is that this fits more as training logic than cli logic
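
A bare skeleton of the trainer_class route, just to make the idea concrete; the pretraining body itself is application specific and left as a placeholder:

import pytorch_lightning as pl

class PreFitTrainer(pl.Trainer):
    def pre_fit(self, model):
        # application-specific pretraining would go here, e.g. a short fit on a
        # simpler objective with callbacks and loggers disabled, before fit() is called
        pass

# Downstream, the CLI would then be constructed with trainer_class=PreFitTrainer, and the
# user would call pre_fit on the trainer before calling fit.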

Rename rescaling to prepare for both input and output rescaling

The current rescalebyinputs layer doesn't do anything to the network on the inside. So it can only represent the following, for a given index i:

x_i NN(x_1, ..., x_N)

Let's rename things to prepare for allowing things like x_i NN(x_1/x_i, x_2/x_i, ..., x_i, ..., x_N/x_i).
