
cvxpylayers's Introduction


cvxpylayers

cvxpylayers is a Python library for constructing differentiable convex optimization layers in PyTorch, JAX, and TensorFlow using CVXPY. A convex optimization layer solves a parametrized convex optimization problem in the forward pass to produce a solution. It computes the derivative of the solution with respect to the parameters in the backward pass.

This library accompanies our NeurIPS 2019 paper on differentiable convex optimization layers. For an informal introduction to convex optimization layers, see our blog post.

Our package uses CVXPY for specifying parametrized convex optimization problems.

Installation

Use the package manager pip to install cvxpylayers.

pip install cvxpylayers

Our package includes convex optimization layers for PyTorch, JAX, and TensorFlow 2.0; the layers are functionally equivalent. You will need to install PyTorch, JAX, or TensorFlow separately, which can be done by following the instructions on their websites.

cvxpylayers has the following dependencies:

NumPy (>= 1.15)
SciPy (>= 1.1.0)
CVXPY (>= 1.1.0a1)
diffcp (>= 1.0.13)

Usage

Below are usage examples of our PyTorch, JAX, and TensorFlow layers. Note that the parametrized convex optimization problems must be constructed in CVXPY, using DPP.

PyTorch

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
A_tch = torch.randn(m, n, requires_grad=True)
b_tch = torch.randn(m, requires_grad=True)

# solve the problem
solution, = cvxpylayer(A_tch, b_tch)

# compute the gradient of the sum of the solution with respect to A, b
solution.sum().backward()

Note: CvxpyLayer cannot be traced with torch.jit.

JAX

import cvxpy as cp
import jax
from cvxpylayers.jax import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
key = jax.random.PRNGKey(0)
key, k1, k2 = jax.random.split(key, 3)
A_jax = jax.random.normal(k1, shape=(m, n))
b_jax = jax.random.normal(k2, shape=(m,))

solution, = cvxpylayer(A_jax, b_jax)

# compute the gradient of the summed solution with respect to A, b
dcvxpylayer = jax.grad(lambda A, b: sum(cvxpylayer(A, b)[0]), argnums=[0, 1])
gradA, gradb = dcvxpylayer(A_jax, b_jax)

Note: CvxpyLayer cannot be traced with the JAX jit or vmap operations.

TensorFlow 2

import cvxpy as cp
import tensorflow as tf
from cvxpylayers.tensorflow import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
A_tf = tf.Variable(tf.random.normal((m, n)))
b_tf = tf.Variable(tf.random.normal((m,)))

with tf.GradientTape() as tape:
  # solve the problem, setting the values of A, b to A_tf, b_tf
  solution, = cvxpylayer(A_tf, b_tf)
  summed_solution = tf.math.reduce_sum(solution)
# compute the gradient of the summed solution with respect to A, b
gradA, gradb = tape.gradient(summed_solution, [A_tf, b_tf])

Note: CvxpyLayer cannot be traced with tf.function.

Log-log convex programs

Starting with version 0.1.3, cvxpylayers can also differentiate through log-log convex programs (LLCPs), which generalize geometric programs. Use the keyword argument gp=True when constructing a CvxpyLayer for an LLCP. Below is a simple usage example:

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

x = cp.Variable(pos=True)
y = cp.Variable(pos=True)
z = cp.Variable(pos=True)

a = cp.Parameter(pos=True, value=2.)
b = cp.Parameter(pos=True, value=1.)
c = cp.Parameter(value=0.5)

objective_fn = 1/(x*y*z)
objective = cp.Minimize(objective_fn)
constraints = [a*(x*y + x*z + y*z) <= b, x >= y**c]
problem = cp.Problem(objective, constraints)
assert problem.is_dgp(dpp=True)

layer = CvxpyLayer(problem, parameters=[a, b, c],
                   variables=[x, y, z], gp=True)
a_tch = torch.tensor(a.value, requires_grad=True)
b_tch = torch.tensor(b.value, requires_grad=True)
c_tch = torch.tensor(c.value, requires_grad=True)

x_star, y_star, z_star = layer(a_tch, b_tch, c_tch)
sum_of_solution = x_star + y_star + z_star
sum_of_solution.backward()

Solvers

At this time, we support two open-source solvers: SCS and ECOS. SCS can be used to solve any problem expressible in CVXPY; ECOS can be used to solve problems that don't use the positive semidefinite or exponential cone (this roughly means that if you have positive semidefinite matrices or use atoms like cp.log, ECOS cannot be used to solve your problem via cvxpylayers). By default, cvxpylayers uses SCS to solve the problem.

Using ECOS

First make sure that your cvxpylayers and diffcp installations are up to date by running

pip install --upgrade cvxpylayers diffcp

Then, to use ECOS, you would pass the solver_args argument to the layer:

solution = layer(*params, solver_args={"solve_method": "ECOS"})

Passing arguments to the solvers

One can pass arguments to both SCS and ECOS by adding them as key-value pairs in the solver_args argument. For example, to set the SCS tolerance eps to 1e-8, one would write:

layer(*parameters, solver_args={"eps": 1e-8})

If SCS is not converging, we highly recommend switching to ECOS (if possible), and if not, using the following arguments to SCS:

solver_args={"eps": 1e-8, "max_iters": 10000, "acceleration_lookback": 0}

Examples

Our examples subdirectory contains simple applications of convex optimization layers in IPython notebooks.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

Please lint the code with flake8.

pip install flake8  # if not already installed
flake8

Running tests

cvxpylayers uses the pytest framework for running tests. To install pytest, run:

pip install pytest

Execute the tests from the main directory of this repository with:

pytest cvxpylayers/{torch,jax,tensorflow}

Projects using cvxpylayers

Below is a list of projects using cvxpylayers. If you have used cvxpylayers in a project, you're welcome to make a PR to add it to this list.

License

cvxpylayers carries an Apache 2.0 license.

Citing

If you use cvxpylayers for research, please cite our accompanying NeurIPS paper:

@inproceedings{cvxpylayers2019,
  author={Agrawal, A. and Amos, B. and Barratt, S. and Boyd, S. and Diamond, S. and Kolter, Z.},
  title={Differentiable Convex Optimization Layers},
  booktitle={Advances in Neural Information Processing Systems},
  year={2019},
}

If you use cvxpylayers to differentiate through a log-log convex program, please cite the accompanying paper:

@article{agrawal2020differentiating,
  title={Differentiating through log-log convex programs},
  author={Agrawal, Akshay and Boyd, Stephen},
  journal={arXiv},
  archivePrefix={arXiv},
  eprint={2004.12553},
  primaryClass={math.OC},
  year={2020},
}

cvxpylayers's People

Contributors

akshayka, bamos, sbarratt


cvxpylayers's Issues

Could objective function be a pytorch model?

Hi. I have a question: could the objective function be a PyTorch model? For example, I define a PyTorch model and want to find the optimal input that minimizes the model's output.

import torch
import torch.nn as nn
import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.lin = nn.Sequential(nn.Linear(20, 100), nn.ReLU(), nn.Linear(100, 1))

    def forward(self, x):
        return self.lin(x)

x = cp.Parameter(20)
model = net()
objective = cp.Minimize(model(x))
constraints = [x > 0]
problem = cp.Problem(objective, constraints)
cvxpylayer = CvxpyLayer(problem, parameters=[x], variables=[model.parameters()])
...

The above code is not right, but that's what I mean. I have no idea about how to correct it. Could you give me some advice? Thank you.

Solver scs returned status Unbounded/Inaccurate

I occasionally face the following error when using the package with different matrices A and b. It does not happen every time, and I cannot tell in advance when it will occur. It especially happens when I set sigma_value in the following code to less than 1e-4:

import numpy as np
import cvxpy as cp
import tensorflow as tf
from cvxpylayers.tensorflow import CvxpyLayer

def tf_solver(A: np.ndarray, b: np.ndarray, sigma_value: float):

    A = tf.constant(A)
    b = tf.constant(b)
    x_tf = cp.Variable(A.shape[1])
    A_tf = cp.Parameter(A.shape)
    b_tf = cp.Parameter(b.shape[0])

    constraints = [cp.norm(A_tf @ x_tf - b_tf, p=2) <= sigma_value]
    objective = cp.Minimize(cp.norm(x_tf, p=1))
    problem = cp.Problem(objective, constraints)

    assert problem.is_dpp()

    cvxpylayer = CvxpyLayer(problem, parameters=[A_tf, b_tf], variables=[x_tf])

    with tf.GradientTape() as tape:
        x, = cvxpylayer(A, b)
        summed_solution = tf.math.reduce_sum(x)

    gradA, gradb = tape.gradient(summed_solution, [A, b])
    grad = [gradA, gradb]

    residue = tf.tensordot(A, x, 1) - b

    return x.numpy(), grad, residue.numpy()



69 cs_procedure
x, grad, residue = tf_solver(A=precond_bi_orthonormal_matrix, b=precond_func_sampling_vector, sigma_value=sigma_value)
csUtils.py 30 tf_solver
x, = cvxpylayer(A, b)
cvxpylayer.py 109 call
return compute(*parameters)
custom_gradient.py 258 call
return self._d(self._f, a, k)
custom_gradient.py 212 decorated
return _eager_mode_decorator(wrapped, args, kwargs)
custom_gradient.py 409 _eager_mode_decorator
result, grad_fn = f(*args, **kwargs)
cvxpylayer.py 108
lambda *parameters: self._compute(parameters, solver_args))
cvxpylayer.py 160 _compute
A=A, b=b, c=c, cone_dict=self.cones, **solver_args)
cone_program.py 212 solve_and_derivative
A, b, c, cone_dict, warm_start=warm_start, mode=mode, **kwargs)
cone_program.py 262 solve_and_derivative_internal
raise SolverError("Solver scs returned status %s" % status)
diffcp.cone_program.SolverError:
Solver scs returned status Unbounded/Inaccurate

Warm start

Both layers should support warm-starting a solve at the solution to the previous call.

ValueError: cannot reshape array of size -1390450388 into shape (1004, 'newshape')

The following error occurs after constructing a cvxpylayer (on Windows). The same problem does not occur on macOS or Linux.

ValueError: cannot reshape array of size -1390450388 into shape (1004, 'newshape')

This may be an issue internal to cvxpy or one of the backend solvers. One of the lines in the stack trace is:

File "C:\Users\Uro\Anaconda3\lib\site-packages\cvxpy\reductions\solvers\conic_solvers\conic_solver.py", line 271, in format_constraints
    reshaped_A = problem.A.reshape(restruct_mat.shape[1], -1, order='F').tocsr()

There is a comment mentioning a possible int overflow error when reshaping with scipy. Maybe this is related to the issue?

Here is an example of code that produces this problem (minus a large piece of data).

central_S = ...
n_hos = 2
n_structures = central_S.shape[1]
n_h_t_combos = central_S.shape[0]
control_strength = 3.0


x1 = cp.Variable(n_structures)
s = cp.Parameter( (n_h_t_combos, n_structures) )  # valid structures
w = cp.Parameter(n_structures)  # structure weight
z = cp.Parameter(n_structures)  # control parameter
b = cp.Parameter(n_h_t_combos)  # max bid


constraints = [x1 >= 0, s @ x1 <= b]  # constraint for positive allocation and less than true bid
objective = cp.Maximize( (w.T @ x1) - control_strength * cp.norm(x1 - z, 2) )
problem = cp.Problem(objective, constraints)

l_prog_layer = CvxpyLayer(problem, parameters=[s, w, b, z], variables=[x1])

A running version of this code including the large central_S matrix is below.

https://gist.github.com/urolyi1/fe94153f7a872046f1f73274391f6c51

bad_alloc error

Hi, I am getting a bad_alloc error when executing the following piece of code. If I run with m=300, p=400 it runs ok. Is there any solution or hack to avoid running into memory issues like this?

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, m, p = 64, 3000, 4000
x = cp.Variable(p)
w = cp.Parameter(p)
b = cp.Parameter(m)
A = cp.Parameter((m, p))
constraints = [A @ x == b]
objective = cp.Minimize(w @ x)
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()
cvxpylayer = CvxpyLayer(problem, parameters=[A, w, b], variables=[x])

Slow initialization of CVXPYlayer when dimensionality is large

I have the example code below, which just constructs an unconstrained quadratic program that is DPP. I time both the CVXPY problem and the CVXPYlayer initializations, while manually changing the problem dimensionality n.

On my dual-core MacBook, when I run the code with n = 10, it is done quickly (about 26 ms). With n = 100, constructing the CVXPY problem takes less than 1 ms, but the CVXPYlayer initialization takes around 42 seconds. For n = 1000, it is even longer (although I haven't yet let it finish running).

Any idea why this takes so long? I am not sure if this is a bug per se or expected behavior.

Versions: CVXPY 1.1.0a1, CVXPYlayers 0.1.2, macOS Catalina (10.15.1)

import cvxpy as cvx
from cvxpylayers.torch import CvxpyLayer
import time

n = 10
Q_sqrt = cvx.Parameter([n, n])
x = cvx.Variable(n)
objective = 0.5 * cvx.sum_squares(Q_sqrt @ x)

print('Constructing CVXPY problem ... ', end='')
start = time.time()
prob = cvx.Problem(cvx.Minimize(objective), [])
assert prob.is_dcp()
assert prob.is_dpp()
end = time.time()
print('done! (%.3f s)' % (end - start))

print('Constructing differentiable CVXPY layer ... ', end='')
start = time.time()
layer = CvxpyLayer(prob, parameters=[Q_sqrt], variables=[x])
end = time.time()
print('done! (%.3f s)' % (end - start))

Long time required in solving

Hi,
I was trying to solve an optimisation problem and noticed that it takes approximately ~0.4 seconds using cvxpy, while using cvxpylayers the time varies, with the minimum being 1.5+ seconds. Can you please let me know if I am making a mistake, or why this is so? The code is attached underneath:

import numpy as np
import cvxpy as cp
import time
import torch
from cvxpylayers.torch import CvxpyLayer

device = 'cuda' if torch.cuda.is_available() else 'cpu'
m = 100
n = 60
m1 = m - n
fraction = 0.2  # fraction of original points to be selected
print("Existing and Incoming set count " + str(m1) + "," + str(n))

def subsetLayer(D_in, D_exin):
    finc = cp.Parameter((D_in, D_in))
    fex = cp.Parameter((D_in, D_exin))

    Zn = cp.Variable((D_in, D_in))
    Zo = cp.Variable((D_in, D_exin))

    obj = cp.sum(cp.multiply(fex, Zo)) + cp.sum(cp.multiply(finc, Zn))

    M = cp.sum(Zn, axis=1) + cp.sum(Zo, axis=1)
    N_p = cp.norm(Zn, p=1, axis=0)
    est_size = cp.norm(N_p, p=1)
    constraints = [0 <= Zo, Zo <= 1, 0 <= Zn, Zn <= 1, M == 1, est_size <= fraction * D_in]
    objective = cp.Minimize(obj)
    prob = cp.Problem(objective, constraints)
    print("Solving using cvxpylayers")
    layer = CvxpyLayer(prob, parameters=[finc, fex], variables=[Zn])

    return layer

D_in = 60
D_exin = 40

layer = subsetLayer(D_in, D_exin)

Do = np.random.randn(n, m1)
Dn = np.random.randn(n, n)

Doc = torch.from_numpy(Do).float().to(device)
Dnc = torch.from_numpy(Dn).float().to(device)

cvstart = time.time()
zv = layer(Dnc, Doc)[0]
print("Using CVXPYLayers: Existing " + str(D_exin) + " Incoming " + str(D_in))
print(time.time() - cvstart)

############## CVXPY ####################

cvstart = time.time()
Zn = cp.Variable((n, n))
Zo = cp.Variable((n, m1))

obj = cp.sum(cp.multiply(Do, Zo)) + cp.sum(cp.multiply(Dn, Zn))
M = cp.sum(Zn, axis=1) + cp.sum(Zo, axis=1)
N_p = cp.norm(Zn, p=1, axis=0)
est_size = cp.norm(N_p, p=1)
constraints = [0 <= Zo, Zo <= 1, 0 <= Zn, Zn <= 1, M == 1, est_size <= fraction * n]

objective = cp.Minimize(obj)
prob = cp.Problem(objective, constraints)
print("Solving using cvxpy")
result = prob.solve()
print("Using CVXPY: Existing " + str(m1) + " Incoming " + str(n))
print(time.time() - cvstart)

Thanks!

Jax support

This would broadly involve:

  1. Adding a jax directory that follows the same structure as the PyTorch/Tensorflow ones to our main module and appropriately add that layer: https://github.com/cvxgrp/cvxpylayers/tree/master/cvxpylayers
  2. Filling in the details of the module and tests
  3. Updating the README/image to include Jax
  4. Adding some Jax examples: https://github.com/cvxgrp/cvxpylayers/tree/master/examples

\cc @gsp-27 who has expressed interest in helping with this -- please let us know if you make any progress here!

gradient inconsistency problem

Sorry, I cannot stop digging into the gradient problem and want to figure out whether it is a bug or a computation-precision problem. I wonder if there is a quick fix. Thanks!

This may be related to the previous issues #69 and #67.

Suppose the loss is composed of two parts, $\ell = \ell_1 + \ell_2$; then the gradient should follow the same addition rule: $\nabla_w \ell = \nabla_w \ell_1 + \nabla_w \ell_2$. However, the following example shows a significant gap between them:

import numpy as np
import torch
# torch.set_default_dtype(torch.double)

import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.parameter import Parameter

import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

N = 200
M = 100
BOUND = 0.1
NORM_BOUND = 10
NUM_STEPS = 3

class Opt(nn.Module):
  def __init__(self, weight=5.0):
    super(Opt, self).__init__()
    self.weight = Parameter(
      torch.tensor([[weight]], dtype=torch.double, requires_grad=True))
    self.L = torch.randn(N, M)
    L_diag = torch.rand(N)
    L_diag[:N//2] *= 0.1
    L_diag[N//2:] *= 10
    self.L_diag = torch.diag(L_diag)

    self.L = self.L.double()
    self.L_diag = self.L_diag.double()

  def forward(self, x):
    num_steps = x.shape[0]
    xw = torch.matmul(x, self.weight).squeeze()
    init_v = torch.zeros(x.shape[1])
    init_v = init_v.double()
    v_list = []
    for i in range(num_steps):
      _p = cp.Parameter((N))
      _prev = cp.Parameter((N))
      _v = cp.Variable((N))
      obj = cp.Minimize(0.5*cp.sum_squares(self.L.T@_v)
                        + 0.5*cp.sum_squares(self.L_diag@_v)
                        + 50*cp.sum_squares(_v - _prev)
                        - _v@_p)
     
      cons = [_v <= 0.1,
              -_v <= 0.1,
              cp.norm(_v, 1) <= 10
      ]
      prob = cp.Problem(obj, cons)

      _prev.value = init_v.detach().numpy()
      _p.value = xw[i].detach().numpy()

      prob.solve(solver=cp.ECOS)
      v_ecos = _v.value
      if prob.status != "optimal":
        print(prob.status)

      self.cvxopt = CvxpyLayer(prob, [_p, _prev], [_v])

      v, = self.cvxopt(xw[i], init_v, solver_args={"eps": 1e-6, "max_iters": 200000})

      # print("ecos gap abs {:.6f} ".format(
      #   np.max(np.abs(v_ecos - v.detach().numpy())),
      #   ))

      init_v = v
      v_list.append(v)
    v = torch.stack(v_list)

    return v

def main():
  torch.manual_seed(0)
  opt = Opt(weight=5.0)

  x = torch.randn(N, 1)
  x = x.unsqueeze(0).repeat(NUM_STEPS, 1, 1)
  y = 0.1*x[-1] + torch.randn(N, 1)

  x, y = x.double(), y.double()

  optimizer = optim.Adam(opt.parameters(), 1e-1)
  for i in range(1):
    v = opt(x)
    loss1 = -torch.sum(y*v[-1])
    loss2 = torch.sum(v**2)*5

    loss = loss1
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    # optimizer.step()
    l = loss.detach().numpy().tolist()
    w = opt.weight.detach().numpy()
    g = opt.weight.grad.detach().numpy()
    g1 = g.copy()
    print("loss1 {:.3f} weight {:.3f} "
          "grad {:.3f} ".format(
            l, w[0][0], g[0][0],
          ))

    loss = loss2
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    # optimizer.step()
    l = loss.detach().numpy().tolist()
    w = opt.weight.detach().numpy()
    g = opt.weight.grad.detach().numpy()
    print("loss2 {:.3f} weight {:.3f} "
          "grad {:.3f} ".format(
            l, w[0][0], g[0][0],
          ))
    g2 = g.copy()

    loss = loss1 + loss2
    optimizer.zero_grad()
    loss.backward()
    # optimizer.step()
    l = loss.detach().numpy().tolist()
    w = opt.weight.detach().numpy()
    g = opt.weight.grad.detach().numpy()
    print("loss {:.3f} weight {:.3f} "
          "grad {:.3f} ".format(
            l, w[0][0], g[0][0],
          ))
    print("gap between grad {:.3f}".format(g[0][0]-(g1[0][0]+g2[0][0])))

Output:

loss1 -8.689 weight 5.000 grad 0.563
loss2 8.213 weight 5.000 grad 1.798
loss -0.476 weight 5.000 grad 1.999
gap between grad -0.362

Error in AA type 1

CvxpyLayer Pytorch Layers

I am calling cvxpylayers to solve a simple LP problem. A Pytorch model is being trained based on the output of cvxpylayers. Following is my simple code.

x = cp.Variable(n)
A = cp.Parameter((1, n))
b = cp.Parameter(1)
c = cp.Parameter(n)
constraints = [x >= 0, x <= 1, A @ x <= b]
objective = cp.Maximize(c @ x)
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()
cvxpylayer = CvxpyLayer(problem, parameters=[A, b, c], variables=[x])

sol, = cvxpylayer(A_trch, b_trch, c_trch)
sol.sum().backward()

But I am observing the following error message, and I find that my model parameters are not updated.

Error in AA type 1, iter: 2582, info: 0, norm 2.25e+09

Any help regarding this would be highly appreciated. Thanks.

Sharing parameters among multiple problems?

Hello guys,

Is it possible to share parameters among multiple problems and then compute gradients of shared parameters?

E.g.

layer1 = CvxpyLayer(prob1, parameters=[shared_param], variables=[h1])
layer2 = CvxpyLayer(prob2, parameters=[shared_param], variables=[h2])
layer3 = CvxpyLayer(prob3, parameters=[shared_param], variables=[h3])

sol1, = layer1(shared_param_tf)
sol2, = layer2(shared_param_tf)
sol3, = layer3(shared_param_tf)

obj = func(sol1, sol2, sol3)

grad_shared_param = tape.gradient(obj, [shared_param_tf])

FYI, I do get some results when I run something like the above.
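For concreteness, here is a minimal runnable sketch of the kind of setup I mean (the toy projection problems and the offsets below are placeholders I made up; the point is only that all three layers consume the same parameter):

import cvxpy as cp
import tensorflow as tf
from cvxpylayers.tensorflow import CvxpyLayer

n = 5
shared_param = cp.Parameter(n)

# Three toy problems that all use the same CVXPY parameter.
layers = []
for offset in (0.0, 1.0, 2.0):
    h = cp.Variable(n)
    prob = cp.Problem(
        cp.Minimize(cp.sum_squares(h - shared_param) + offset * cp.sum(h)),
        [h >= 0])
    layers.append(CvxpyLayer(prob, parameters=[shared_param], variables=[h]))

shared_param_tf = tf.Variable(tf.random.normal((n,), dtype=tf.float64))

with tf.GradientTape() as tape:
    sol1, = layers[0](shared_param_tf)
    sol2, = layers[1](shared_param_tf)
    sol3, = layers[2](shared_param_tf)
    obj = tf.reduce_sum(sol1 + sol2 + sol3)

# Gradients from all three solves accumulate into the shared parameter.
grad_shared_param = tape.gradient(obj, [shared_param_tf])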

How to obtain differentiable DPP layer ?

Hi,

your work is really great, thanks for making it available!

I am having problems running one instance, though. It would be great if you can help.

Let $X = A A^T$ be a matrix. I am trying to minimize $\| A - Y \|$ subject to $X = \big(\tfrac{1}{p}\operatorname{tr}(X)\big) I$, where $I$ denotes the identity matrix.

My approach:

A = cp.Variable(n)
Y = cp.Parameter(n)
X = A * A.T
tr = cp.trace(X)
identity = np.eye(n)
obj = cp.Minimize(cp.norm(A-Y,'fro'))
diag_value =  (1/p)*tr
constr = [X == diag_value * identity]
problem = cp.Problem(obj,constr)
layer = CvxpyLayer(problem,parameters = [Y],variables = [A])

However, it says it's not DPP.
Is it possible to make it DPP?
Can you explain how to do that?

Thanks a lot

Returning and differentiating through the dual

With diffcp, we can use adjoint_derivative to differentiate through the primal and dual solutions with respect to the coefficients. It would be convenient to expose the dual here too; the only part that's not immediately clear to me is whether it will be easy to map from the cone program's solution back to the dual variables of the CVXPY constraints.

@sbarratt @akshayka -- I remember you mentioning this at some point, have you given any more thought to it?

MIP support in cvxpylayers

Hi, I wanted to know if there is any way to support MIPs in cvxpylayers. Mainly I want to constrain the variable to take values 0 or 1. My idea is to use a sigmoid activation on the real variable, but then adding integer-like constraints will be quite difficult. Is there any other way?

SoftmaxLayer example

Dear cvxpylayers team,

Thanks for sharing the great work.

I want to use the softmax layers in pytorch just like the ReLU layer in example folder.
https://github.com/cvxgrp/cvxpylayers/blob/master/examples/torch/ReLU%20Layers.ipynb

The expected input is a tensor with shape=[batch, channel, x, y, z] and output is the softmax results along the channel dim.
Just like torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)
https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.softmax.

However, I have trouble to implement it.

class SoftmaxOptiLayer(torch.nn.Module):
    """
    Softmax as an optimization layer
    An alternative of torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)

    Parameters:
    input: logits, shape=[batch, channel, x, y, z]
    dim: A dimension along which softmax will be computed.

    """
    def __init__(self, dim=None):
        super(SoftmaxOptiLayer, self).__init__()
        self.dim = dim

    def forward(self, input):
        # when x is batched, repeat W and b 
        _x = cp.Parameter(input)
        _y = cp.Variable(input)
        obj = cp.Minimize(-_x*_y - cp.sum(cp.entr(_y)))
        cons = ? #  how to compute the inner product along the specified dim?
        prob = cp.Problem(obj, cons)
        layer = CvxpyLayer(prob, parameters=[_x], variables=[_y])

        return layer(input)

In addition to ReLU, softmax is also a commonly used layer in CNNs.

Would it be possible for you to add a SoftmaxLayer in the examples/torch folder?

Looking forward to your reply.

Best regards,
Jun

Better handle when inputs are on different devices

import cvxpy as cp
import torch 
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
A_tch = torch.randn(m, n, requires_grad=True).cuda()
b_tch = torch.randn(m, requires_grad=True)

# solve the problem
solution, = cvxpylayer(A_tch, b_tch)

# compute the gradient of the sum of the solution with respect to A, b
solution.sum().backward()

Output

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-f0de266db3ac> in <module>
     16 
     17 # compute the gradient of the sum of the solution with respect to A, b
---> 18 solution.sum().backward()

~/anaconda3/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    118                 products. Defaults to ``False``.
    119         """
--> 120         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    121 
    122     def register_hook(self, hook):

~/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100 
    101 

RuntimeError: Function _CvxpyLayerFnFnBackward returned an invalid gradient at index 1 - expected type torch.FloatTensor but got torch.cuda.FloatTensor
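A workaround in the meantime (a sketch, not a fix in the library): keep all parameter tensors on the same device before calling the layer, either all on the CPU or all on the GPU, e.g.

# Keep both parameters on the same device (here CUDA); creating the tensors
# directly on the device keeps them as leaf tensors with requires_grad=True.
device = torch.device("cuda")
A_tch = torch.randn(m, n, device=device, requires_grad=True)
b_tch = torch.randn(m, device=device, requires_grad=True)

solution, = cvxpylayer(A_tch, b_tch)
solution.sum().backward()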

the compatibility issue with model.fit, rather than the for loop way to train the model

When I try to modify the TensorFlow ReLU layer example to train with model.fit (the test code is given below) instead of the for-loop approach, I get the following error.

The test code:

import cvxpy as cp
import tensorflow as tf
from tensorflow.keras import layers
from cvxpylayers.tensorflow import CvxpyLayer

class ReluLayer(layers.Layer):
    def __init__(self, input_dim, output_dim):
        super(ReluLayer, self).__init__()
        self.W = tf.Variable(1e-3 * tf.random.normal((output_dim, input_dim), dtype=tf.float64))
        self.b = tf.Variable(1e-3 * tf.random.normal((output_dim,),  dtype=tf.float64))
        z = cp.Variable(output_dim)
        Wtilde = cp.Variable((output_dim, input_dim))
        W = cp.Parameter((output_dim, input_dim))
        b = cp.Parameter(output_dim)
        x = cp.Parameter(input_dim)
        problem = cp.Problem(cp.Minimize(
            cp.sum_squares(z - Wtilde @ x - b)), [z >= 0, Wtilde == W])
        self.cvxpy_layer = CvxpyLayer(problem, [W, b, x], [z])

    def call(self, x):
        if tf.rank(x) == 2:
            # when x is batched, repeat W and b 
            batch_size = x.shape[0]
            return self.cvxpy_layer(
                tf.stack([self.W for _ in tf.range(batch_size)]),
                tf.stack([self.b for _ in tf.range(batch_size)]), x)[0]
        else:
            return self.layer(self.W, self.b, x)[0]

tf.random.set_seed(0)
model = tf.keras.Sequential([
    ReluLayer(20, 20),
    ReluLayer(20, 20),
    tf.keras.layers.Dense(1, input_shape=(20,), dtype=tf.float64)
])
X = tf.random.normal((300, 20), dtype=tf.float64)
Y = tf.random.normal((300, 1), dtype=tf.float64)

model.compile(loss='mse',
              optimizer='adam')

model.fit(X, Y, batch_size=8, epochs=1)

The output error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-11-a36e0b848ff0> in <module>
      2               optimizer='adam')
      3 
----> 4 model.fit(X, Y, batch_size=1, epochs=1)

C:\WWW\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    817         max_queue_size=max_queue_size,
    818         workers=workers,
--> 819         use_multiprocessing=use_multiprocessing)
    820 
    821   def evaluate(self,

C:\WWW\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    233           max_queue_size=max_queue_size,
    234           workers=workers,
--> 235           use_multiprocessing=use_multiprocessing)
    236 
    237       total_samples = _get_total_number_of_samples(training_data_adapter)

C:\WWW\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_training_inputs(model, x, y, batch_size, epochs, sample_weights, class_weights, steps_per_epoch, validation_split, validation_data, validation_steps, shuffle, distribution_strategy, max_queue_size, workers, use_multiprocessing)
    591         max_queue_size=max_queue_size,
    592         workers=workers,
--> 593         use_multiprocessing=use_multiprocessing)
    594     val_adapter = None
    595     if validation_data:

C:\WWW\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py in _process_inputs(model, mode, x, y, batch_size, epochs, sample_weights, class_weights, shuffle, steps, distribution_strategy, max_queue_size, workers, use_multiprocessing)
    644     standardize_function = None
    645     x, y, sample_weights = standardize(
--> 646         x, y, sample_weight=sample_weights)
    647   elif adapter_cls is data_adapter.ListsOfScalarsDataAdapter:
    648     standardize_function = standardize

C:\WWW\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, batch_size, check_steps, steps_name, steps, validation_split, shuffle, extract_tensors_from_dataset)
   2358     is_compile_called = False
   2359     if not self._is_compiled and self.optimizer:
-> 2360       self._compile_from_inputs(all_inputs, y_input, x, y)
   2361       is_compile_called = True
   2362 

C:\WWW\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training.py in _compile_from_inputs(self, all_inputs, target, orig_inputs, orig_target)
   2578       if training_utils.has_tensors(target):
   2579         target = training_utils.cast_if_floating_dtype_and_mismatch(
-> 2580             target, self.outputs)
   2581       training_utils.validate_input_types(target, orig_target,
   2582                                           allow_dict=False, field_name='target')

C:\WWW\anaconda3\envs\tf2\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py in cast_if_floating_dtype_and_mismatch(targets, outputs)
   1334   if tensor_util.is_tensor(targets):
   1335     # There is one target, so output[0] should be the only output.
-> 1336     return cast_single_tensor(targets, dtype=outputs[0].dtype)
   1337   new_targets = []
   1338   for target, out in zip(targets, outputs):

IndexError: list index out of range

Dynamic size

I am wondering whether dynamically sized problem definitions could be made possible. In natural language processing and other applications, we often have inputs with the same structure but different sizes. Right now, we need to create a problem on the fly. Maybe it is possible to accelerate the computation in this regard. Do you think this could be a future feature, or is it infeasible?

Chunchuan

Incorrect gradients for Sparse QP?

Hi all, thanks for making this nice package available!

I was testing the PyTorch sparse QP in examples/prof.py and wanted to check that the gradients from qpth are the same as cvxpylayers on the sparse QP case.
I added print(b_tch.grad.data.numpy()) after the backward pass for both methods, and found the gradient to be wildly different. Note that both solvers return almost the same optimal solution, and this is not an issue for the dense QP, which seems to point to a problem with the backward pass specifically for sparse inputs in cvxpylayers.

Any idea as to what's going on here?

Thanks!

build batchnormalization and nonnegative constraint layer using Cvxpylayer

There is no general way to apply constraints to the output of a neural network layer. I have an idea to construct complex constraint layers using CvxpyLayer: build a neural network layer that makes the output satisfy both batch-normalization and non-negativity constraints. The example code is given below (still with an error):
The self-defined layer is:

import tensorflow as tf
from cvxpylayers.tensorflow import CvxpyLayer
import cvxpy as cp

def mean(x):
    return cp.sum(x) / x.size

def variance(x, mode='unbiased'):
    if mode == 'unbiased':
        scale = x.size - 1
    elif mode == 'mle':
        scale = x.size
    else:
        raise ValueError('unknown mode: ' + str(mode))
    return cp.sum_squares(x - mean(x)) / scale

class nonnegA_aveP(tf.keras.layers.Layer):
    def __init__(self, input_dim, output_dim, ave_power=1.):       
        super(nonnegA_aveP, self).__init__()
        self.ave_power = ave_power
        self.W = tf.Variable(1e-3 * tf.random.normal((output_dim, input_dim), 
                                                     dtype=tf.float64))
        self.b = tf.Variable(1e-3 * tf.random.normal((output_dim,),  
                                                     dtype=tf.float64))
        y = cp.Variable(output_dim)
        Wtilde = cp.Variable((output_dim, input_dim))
        W = cp.Parameter((output_dim, input_dim))
        b = cp.Parameter(output_dim)
        x = cp.Parameter(input_dim)
        objective = cp.Minimize(cp.sum_squares(y - Wtilde @ x - b))
        constraints = [y >= 0, variance(x) == self.ave_power, Wtilde == W]
        problem = cp.Problem(objective, constraints)
        self.cvxpy_layer = CvxpyLayer(problem, [W, b, x], [y])   
    
    def call(self, x):
        x = tf.reshape(x, [-1, x.shape[0]])
        if tf.rank(x) == 2:
            # when x is batched, repeat W and b 
            batch_size = x.shape[0]
            output = self.cvxpy_layer(
                tf.stack([self.W for _ in tf.range(batch_size)]),
                tf.stack([self.b for _ in tf.range(batch_size)]), x)[0]
            return tf.reshape(output, [-1, x.shape[0]]) 
        else:   
            return self.layer(self.W, self.b, x)[0]

The test code is:

x = tf.random.normal([10, 4]) # The first axis is about batchsize which should be normalized
y = nonnegA_aveP(input_dim=x.shape[0], output_dim=x.shape[0], ave_power=1.)(x)
print(x, '')
print(y)

The output error is:

---------------------------------------------------------------------------
SolverError                               Traceback (most recent call last)
<ipython-input-40-e20563b1ffa6> in <module>
      1 x = tf.random.normal([10, 4])
----> 2 y = nonnegA_aveP(input_dim=x.shape[0], output_dim=x.shape[0], ave_power=1.)(x)
      3 print(x, '')
      4 print(y)

~/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py in __call__(self, inputs, *args, **kwargs)
    820           with base_layer_utils.autocast_context_manager(
    821               self._compute_dtype):
--> 822             outputs = self.call(cast_inputs, *args, **kwargs)
    823           self._handle_activity_regularization(inputs, outputs)
    824           self._set_mask_metadata(inputs, outputs, input_masks)

<ipython-input-39-c28f685dc060> in call(self, x)
     49             output = self.cvxpy_layer(
     50                 tf.stack([self.W for _ in tf.range(batch_size)]),
---> 51                 tf.stack([self.b for _ in tf.range(batch_size)]), x)[0]
     52             return tf.reshape(output, [-1, x.shape[0]])
     53 

~/anaconda3/envs/tf2/lib/python3.7/site-packages/cvxpylayers/tensorflow/cvxpylayer.py in __call__(self, solver_args, *parameters)
    107         compute = tf.custom_gradient(
    108             lambda *parameters: self._compute(parameters, solver_args))
--> 109         return compute(*parameters)
    110 
    111     def _dx_from_dsoln(self, dsoln):

~/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow_core/python/ops/custom_gradient.py in __call__(self, *a, **k)
    254 
    255   def __call__(self, *a, **k):
--> 256     return self._d(self._f, a, k)
    257 
    258 

~/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow_core/python/ops/custom_gradient.py in decorated(wrapped, args, kwargs)
    208     """Decorated function with custom gradient."""
    209     if context.executing_eagerly():
--> 210       return _eager_mode_decorator(wrapped, args, kwargs)
    211     else:
    212       return _graph_mode_decorator(wrapped, args, kwargs)

~/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow_core/python/ops/custom_gradient.py in _eager_mode_decorator(f, args, kwargs)
    404   """Implement custom gradient decorator for eager mode."""
    405   with backprop.GradientTape() as tape:
--> 406     result, grad_fn = f(*args, **kwargs)
    407   all_inputs = list(args) + list(kwargs.values())
    408   # The variables that grad_fn needs to return gradients for are the set of

~/anaconda3/envs/tf2/lib/python3.7/site-packages/cvxpylayers/tensorflow/cvxpylayer.py in <lambda>(*parameters)
    106                                  len(parameters), len(self.params)))
    107         compute = tf.custom_gradient(
--> 108             lambda *parameters: self._compute(parameters, solver_args))
    109         return compute(*parameters)
    110 

~/anaconda3/envs/tf2/lib/python3.7/site-packages/cvxpylayers/tensorflow/cvxpylayer.py in _compute(self, params, solver_args)
    148             xs, _, ss, _, DT = diffcp.solve_and_derivative_batch(
    149                 As=As, bs=bs, cs=cs, cone_dicts=[self.cones] * nbatch,
--> 150                 **solver_args)
    151             DT = self._restrict_DT_to_dx(DT, nbatch, ss[0].shape)
    152             solns = [self._split_solution(x) for x in xs]

~/anaconda3/envs/tf2/lib/python3.7/site-packages/diffcp/cone_program.py in solve_and_derivative_batch(As, bs, cs, cone_dicts, n_jobs_forward, n_jobs_backward, mode, warm_starts, **kwargs)
     91                     zip(As, bs, cs, cone_dicts, warm_starts)]
     92         with threadpool_limits(limits=1):
---> 93             results = pool.starmap(solve_and_derivative_wrapper, args)
     94         pool.close()
     95         xs = [r[0] for r in results]

~/anaconda3/envs/tf2/lib/python3.7/multiprocessing/pool.py in starmap(self, func, iterable, chunksize)
    274         `func` and (a, b) becomes func(a, b).
    275         '''
--> 276         return self._map_async(func, iterable, starmapstar, chunksize).get()
    277 
    278     def starmap_async(self, func, iterable, chunksize=None, callback=None,

~/anaconda3/envs/tf2/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

~/anaconda3/envs/tf2/lib/python3.7/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    119         job, i, func, args, kwds = task
    120         try:
--> 121             result = (True, func(*args, **kwds))
    122         except Exception as e:
    123             if wrap_exception and func is not _helper_reraises_exception:

~/anaconda3/envs/tf2/lib/python3.7/multiprocessing/pool.py in starmapstar(args)
     45 
     46 def starmapstar(args):
---> 47     return list(itertools.starmap(args[0], args[1]))
     48 
     49 #

~/anaconda3/envs/tf2/lib/python3.7/site-packages/diffcp/cone_program.py in solve_and_derivative_wrapper(A, b, c, cone_dict, warm_start, mode, kwargs)
     27     """A wrapper around solve_and_derivative for the batch function."""
     28     return solve_and_derivative(
---> 29         A, b, c, cone_dict, warm_start=warm_start, mode=mode, **kwargs)
     30 
     31 

~/anaconda3/envs/tf2/lib/python3.7/site-packages/diffcp/cone_program.py in solve_and_derivative(A, b, c, cone_dict, warm_start, mode, **kwargs)
    210     """
    211     result = solve_and_derivative_internal(
--> 212         A, b, c, cone_dict, warm_start=warm_start, mode=mode, **kwargs)
    213     x = result["x"]
    214     y = result["y"]

~/anaconda3/envs/tf2/lib/python3.7/site-packages/diffcp/cone_program.py in solve_and_derivative_internal(A, b, c, cone_dict, warm_start, mode, raise_on_error, **kwargs)
    260     elif status != "Solved":
    261         if raise_on_error:
--> 262             raise SolverError("Solver scs returned status %s" % status)
    263         else:
    264             result["D"] = None

SolverError: Solver scs returned status Infeasible

I am not sure whether such a problem cannot be solved by CvxpyLayer, or whether there are tricks to make the problem satisfy the DPP ruleset.

Example not working

I ran the example on the home page for PyTorch:

import cvxpy as cp
import torch 
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
A_tch = torch.randn(m, n, requires_grad=True)
b_tch = torch.randn(m, requires_grad=True)

# solve the problem
solution, = cvxpylayer(A_tch, b_tch)

# compute the gradient of the sum of the solution with respect to A, b
solution.sum().backward()

I get gradient None. I am running the master branch of cvxpy and cvxpylayers. I assume this is a bug.

is_dpp() missing?

When instantiating a CvxpyLayer in the example notebook, the call fails with the following internal error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-217a0beea246> in <module>
----> 1 layer = CvxpyLayer(prob, parameters=[_x], variables=[_y])

/anaconda3/envs/torch1.x/lib/python3.6/site-packages/cvxpylayers/torch/cvxpylayer.py in __init__(self, problem, parameters, variables)
     68                      values returned from the forward pass.
     69         """
---> 70         if not problem.is_dpp():
     71             raise ValueError('Problem must be DPP.')
     72 

AttributeError: 'Problem' object has no attribute 'is_dpp'

This happens with a fresh install of cvxpylayers via pip:

$ pip install cvxpylayers
Processing ./Library/Caches/pip/wheels/a0/41/c1/ed7e505cbe573483e953f442039065d0b9a12f4fe11c14df88/cvxpylayers-0.1.2-cp36-none-any.whl
Requirement already satisfied: scipy>=1.1.0 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from cvxpylayers) (1.1.0)
Requirement already satisfied: diffcp>=1.0.13 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from cvxpylayers) (1.0.13)
Requirement already satisfied: numpy>=1.15 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from cvxpylayers) (1.15.4)
Requirement already satisfied: cvxpy>=1.1.0a1 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from cvxpylayers) (1.1.0a2)
Requirement already satisfied: threadpoolctl>=1.1 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from diffcp>=1.0.13->cvxpylayers) (2.0.0)
Requirement already satisfied: scs>=2.0.2 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from diffcp>=1.0.13->cvxpylayers) (2.0.2)
Requirement already satisfied: pybind11>=2.4 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from diffcp>=1.0.13->cvxpylayers) (2.4.3)
Requirement already satisfied: ecos>=2 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from cvxpy>=1.1.0a1->cvxpylayers) (2.0.7.post1)
Requirement already satisfied: osqp>=0.4.1 in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from cvxpy>=1.1.0a1->cvxpylayers) (0.6.1)
Requirement already satisfied: future in /anaconda3/envs/torch1.x/lib/python3.6/site-packages (from osqp>=0.4.1->cvxpy>=1.1.0a1->cvxpylayers) (0.18.2)
Installing collected packages: cvxpylayers
Successfully installed cvxpylayers-0.1.2

This happens on the following platform:

Darwin WE35261 18.7.0 Darwin Kernel Version 18.7.0: Sun Dec  1 18:59:03 PST 2019; root:xnu-4903.278.19~1/RELEASE_X86_64 x86_64

The code (copied directly and unaltered from the supplied jupyter notebook) is:

import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

n = 201

_x = cp.Parameter(n)
_y = cp.Variable(n)
obj = cp.Minimize(cp.sum_squares(_y-_x))
cons = [_y >= 0]
prob = cp.Problem(obj, cons)

layer = CvxpyLayer(prob, parameters=[_x], variables=[_y])

I would be happy to be wrong about this problem, but I cannot see how my installation is different from other working installations locally.

Parameter affine or Parameter free

Hi,
I would like to clarify one particular thing: if I have the expression cp.multiply(cp.exp(X), Y), where X is a parameter-affine expression and Y is a parameter-free expression (with cvxpy imported as cp), is this particular problem not DPP? Does cp.exp(X) return a parameter-free value, leading to a violation of the conditions?

It would be really helpful if my doubt were cleared up.
Thanks!

softmax example

Dear @bamos ,

Thanks for your quick reply.

I can't re-open the previous issue, so I create a new issue.

I have read the variational softmax implementation in the blog. In the blog, the input is a 1-dim vector.

I recommend formulating this so that the inputs to the cvxpylayer are always shaped as [batch, n]

I'm sorry that I don't get this point.

My input is x.shape=[batch, class_channel, x, y, z]. If I reshape it into [batch, class_channel*x*y*z], the spatial information is broken. I still do not know how to deal with it in cvxpy.

Specifically, I do not know how should I formulate the obj and cons with the 5D (shape=[batch, class_channel, x, y, z]) input tensor.

obj = cp.Minimize(-_x.T*_y - cp.sum(cp.entr(_y)))
cons = [np.ones(d, dtype=np.float32).T*_y == 1.]
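For reference, here is a minimal sketch of that 1-D formulation wrapped as a batched cvxpylayer (the [batch, n] shapes below are my own toy assumption; the 5-D case is what I cannot work out):

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 4
_x = cp.Parameter(n)
_y = cp.Variable(n)
# Variational form of softmax: minimize -x'y - H(y) subject to sum(y) = 1.
obj = cp.Minimize(-_x @ _y - cp.sum(cp.entr(_y)))
cons = [cp.sum(_y) == 1.]
prob = cp.Problem(obj, cons)
softmax_layer = CvxpyLayer(prob, parameters=[_x], variables=[_y])

x = torch.randn(8, n, requires_grad=True)  # [batch, n] input
y, = softmax_layer(x)                      # batched along the leading dimension
print(torch.allclose(y, torch.softmax(x, dim=1), atol=1e-3))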

Could you give me more guidance at your convenience?

Looking forward to your reply.

Kindest regards,
Jun

Originally posted by @JunMa11 in #29 (comment)

possible to ignore Solved/Inaccurate ?

I am currently using a neural network followed by a cvxpylayer and getting:

UserWarning: Solved/Inaccurate.
warnings.warn("Solved/Inaccurate.")

followed by this message which interrupts my program:

Please consider re-formulating your problem so that it is always solvable or increasing the number of solver iterations.
Training interrupted

I assume Solved/Inaccurate is triggering the training interruption.
I was thinking of ignoring the Solved/Inaccurate warning (as long as it's not Infeasible); as the neural network learns, it would provide better data and slowly let cvxpylayer solve to optimality.

Is it possible to set something somewhere to ignore the Solved/Inaccurate warning message?
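One thing I am considering (an assumption on my side, not a documented option) is to silence the warning with Python's warnings module and give SCS more iterations via solver_args; layer and params below stand in for my own network's layer and inputs:

import warnings

with warnings.catch_warnings():
    # ignore the "Solved/Inaccurate." UserWarning raised by the layer
    warnings.filterwarnings("ignore", message="Solved/Inaccurate")
    solution, = layer(*params,
                      solver_args={"max_iters": 10000,
                                   "acceleration_lookback": 0})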

Not able to solve large scale linear programming?

So I used this code to solve a simple linear program with linear constraints.
For example, my problem has a 6000-dimensional variable and roughly 300 equality and inequality constraints.
But when constructing the layer, e.g.
layer = CvxpyLayer(prob, parameters=[c, A, b, G, h], variables=[x])
the process was killed.
I was wondering how much memory it needs to solve such an LP in general.
What could be a possible solution to this problem?
Thanks.

Batched cone program canonicalization

Serializing the canonicalization takes a non-negligible amount of time for larger batches (~500ms for a batch size of 128) and is relatively easy to batch up for cone programs. I've sped this up to ~100ms by hacking in a manual batched canonicalization by pulling the sparse affine maps for the CP and applying them to the batch of parameters by replacing the call to apply_parameters here with the snippet below (which at the moment doesn't handle some edge cases as carefully as apply_parameters).

Where would be the best place for code like this to live? Does it make sense to add something like apply_parameters_batch to the cvxpy compiler that defaults to a serialized version but can optionally be overridden?

for i in range(batch_info.max_size):
    params_np_i = [
        p if sz == 0 else p[i]
        for p, sz in zip(params_np, batch_info.sizes)]
    d = dict(zip(self.param_ids, params_np_i))

    def param_value(idx):
        return d[idx]

    p = canonInterface.get_parameter_vector(
        self.compiler.total_param_size,
        self.compiler.param_id_to_col,
        self.compiler.param_id_to_size,
        param_value,
        zero_offset=False)
    params.append(p)

params = np.vstack(params).T
cs = self.compiler.c.dot(params).T
cs = cs[:,:-1]

params_sp = sp.csr_matrix(params)
Fs = self.compiler.A.dot(params_sp).T
As = -Fs[:,:-self.n]
As = [Ai.reshape(self.m, self.n).T for Ai in As]
bs = Fs[:,-self.n:].toarray()

\cc @SteveDiamond @priyald17

How long "should" a forward pass take to run?

We have a particular problem with 24 variables and 14 constraints. Running just the forward pass with a batch size of about 128 takes about 1/3 of a second -- this is on a laptop, but we see similar results on a node from our cluster.

Does this seem like a reasonable speed for running the forward pass? Any parameters we could try tweaking to speed it up?

During training, we've also run into "Unbounded/Inaccurate" errors, although these are hard to reproduce consistently. But it has us thinking that there might be something ill-conditioned about our problem, and that possibly preconditioning or some other trick could result in an equivalent problem that is easier for SCS to deal with.

Here is a gist that just defines the cvxpylayer and times 100 forward passes.

gradient check failed

I am testing the gradient of cvxpylayer using the finite-difference method. Below is an example script:

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.parameter import Parameter

import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

N = 100
M = 100
NORM_BOUND = 1

class Opt(nn.Module):
  def __init__(self):
    super(Opt, self).__init__()
    self.L = torch.randn(N, M)

  def forward(self, x, weight):
    xw = x.mm(weight).squeeze(-1)

    _p = cp.Parameter((N))
    _v = cp.Variable((N))
    obj = cp.Minimize(0.5*cp.sum_squares(self.L.T@_v)
                      + 0.5*cp.sum_squares(_v)
                      - _v@_p)
    cons = [_v <= 1, -_v <= 1, cp.norm(_v, 1) <= NORM_BOUND]
    prob = cp.Problem(obj, cons)
    self.cvxopt = CvxpyLayer(prob, [_p], [_v])

    v, = self.cvxopt(xw)
    return v

def main():
  opt = Opt()

  x = torch.randn(N, 2)
  weight = torch.tensor([[0.5], [0.5]], requires_grad=True)
  weight1 = torch.tensor([[0.5+1e-5], [0.5]], requires_grad=False)
  weight2 = torch.tensor([[0.5], [0.5+1e-5]], requires_grad=False)
  y = torch.randn(N)

  v = opt(x, weight)
  v1 = opt(x, weight1)
  v2 = opt(x, weight2)
  loss = torch.sum(y*v)
  loss1 = torch.sum(y*v1)
  loss2 = torch.sum(y*v2)

  loss.backward()
  print(weight.grad)
  print((loss1-loss)/1e-5)
  print((loss2-loss)/1e-5)

The gradient returned through backward is

tensor([[ 0.1767],
   [-0.1480]])

while the gradients from finite differences are:

tensor(-7.9798, grad_fn=<DivBackward0>)
tensor(-5.3742, grad_fn=<DivBackward0>)

The gap is very large...

The gap is still large if I remove the norm constraint

    cons = [_v <= 0.1, -_v <= 0.1]

If all the constraints are removed

cons = []

the gap seems to be much smaller, but still large sometimes.
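For what it's worth, here is the kind of check I am also trying with torch.autograd.gradcheck on a smaller instance (double precision and a tight SCS tolerance are my own choices, and the small quadratic problem below is a stand-in, not the problem above):

import cvxpy as cp
import torch
from torch.autograd import gradcheck
from cvxpylayers.torch import CvxpyLayer

torch.set_default_dtype(torch.double)

n = 5
_p = cp.Parameter(n)
_v = cp.Variable(n)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(_v) - _v @ _p),
                  [_v <= 1, -_v <= 1])
layer = CvxpyLayer(prob, [_p], [_v])

p = torch.randn(n, requires_grad=True)

# gradcheck compares the analytical backward pass against finite differences;
# a tight solver tolerance keeps the finite-difference estimates meaningful.
print(gradcheck(lambda p: layer(p, solver_args={"eps": 1e-10})[0],
                (p,), eps=1e-6, atol=1e-3))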

cvxpylayers with gpu

Hi,
I was wondering if we can use cvxpylayers with a GPU to minimize a convex loss function in high dimension (a large number of parameters to estimate) but with only one layer (not a deep-learning problem)?
Thank you.

Best regards,
Naz

Failed to build diffcp

Hi! I've been having some trouble installing cvxpylayers using pip in Windows.

I have already run pip install --upgrade pip and it is at its latest version (20.1.1).
I also installed cvxpy with conda install -c conda-forge cvxpy, and the error I am getting is ERROR: Failed building wheel for diffcp

Thank you very much.

Broadcasting

import cvxpy as cp
import torch 
from cvxpylayers.torch import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
n_batch = 128
A_tch = torch.randn(m, n, requires_grad=True)
b_tch = torch.randn(n_batch, m, requires_grad=True)

# solve the problem
solution, = cvxpylayer(A_tch, b_tch)

# compute the gradient of the sum of the solution with respect to A, b
solution.sum().backward()

Current output

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-a44521ae7231> in <module>
     14 
     15 # solve the problem
---> 16 solution, = cvxpylayer(A_tch, b_tch)
     17 
     18 # compute the gradient of the sum of the solution with respect to A, b

~/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    543             result = self._slow_forward(*input, **kwargs)
    544         else:
--> 545             result = self.forward(*input, **kwargs)
    546         for hook in self._forward_hooks.values():
    547             hook_result = hook(self, input, result)

~/anaconda3/lib/python3.7/site-packages/cvxpylayers-0.1.0-py3.7.egg/cvxpylayers/torch/cvxpylayer.py in forward(self, solver_args, *params)
    116             info=info,
    117         )
--> 118         sol = f(*params)
    119         self.info = info
    120         return sol

~/anaconda3/lib/python3.7/site-packages/cvxpylayers-0.1.0-py3.7.egg/cvxpylayers/torch/cvxpylayer.py in forward(ctx, *params)
    189                                 i, len(
    190                                     param_order[i].shape), len(
--> 191                                     p.shape)))
    192 
    193                     if not np.all(p.shape == param_order[i].shape):

RuntimeError: Inconsistent batch size passed in. Expected parameter 1 to have batch shape of 1 dims but got 2 dims.
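
One workaround here (a sketch, assuming the intent is to reuse the same A for every batch element) is to give A an explicit batch dimension so that both parameters carry consistent batching:

# Workaround sketch: repeat A along a batch axis of size n_batch; gradients
# still accumulate back into A_tch through the repeat.
A_batched = A_tch.unsqueeze(0).repeat(n_batch, 1, 1)
solution, = cvxpylayer(A_batched, b_tch)
solution.sum().backward()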

[request] Conda package?

Hi there,
First of all thank you for the great work!
Would it be possible to put cvxpylayers (and diffcp) on conda?

AttributeError: 'ConeDims' object has no attribute 'nonpos'

Hi,
I encountered an AttributeError while testing the below sample code in tensorflow 2.2.0.

import cvxpy as cp
import tensorflow as tf
from cvxpylayers.tensorflow import CvxpyLayer

n, m = 2, 3
x = cp.Variable(n)
A = cp.Parameter((m, n))
b = cp.Parameter(m)
constraints = [x >= 0]
objective = cp.Minimize(0.5 * cp.pnorm(A @ x - b, p=1))
problem = cp.Problem(objective, constraints)
assert problem.is_dpp()

cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
A_tf = tf.Variable(tf.random.normal((m, n)))
b_tf = tf.Variable(tf.random.normal((m,)))

with tf.GradientTape() as tape:
  # solve the problem, setting the values of A, b to A_tf, b_tf
  solution, = cvxpylayer(A_tf, b_tf)
  summed_solution = tf.math.reduce_sum(solution)
# compute the gradient of the summed solution with respect to A, b
gradA, gradb = tape.gradient(summed_solution, [A_tf, b_tf])

More details on the AttributeError:

Traceback (most recent call last):
  File ".\model.py", line 15, in <module>
    cvxpylayer = CvxpyLayer(problem, parameters=[A, b], variables=[x])
  File "C:\Users\cryu854\Anaconda3\lib\site-packages\cvxpylayers\tensorflow\cvxpylayer.py", line 102, in __init__
    self.cones = utils.dims_to_solver_dict(data['dims'])
  File "C:\Users\cryu854\Anaconda3\lib\site-packages\cvxpylayers\utils.py", line 4, in dims_to_solver_dict
    "l": int(cone_dims.nonpos),
AttributeError: 'ConeDims' object has no attribute 'nonpos'

So I checked the versions of my packages; I think they meet the requirements.

python = 3.7.4
tensorflow = 2.2.0
cvxpy = 1.1.0
diffcp = 1.0.13
Numpy = 1.15

After that, I traced the error and temporarily changed nonpos to nonneg; the error disappeared and the result seems to work fine for now, but I expect this will fail somewhere else.

def dims_to_solver_dict(cone_dims):
      cones = {
          "f": int(cone_dims.zero),
          "l": int(cone_dims.nonneg),
          "q": [int(v) for v in cone_dims.soc],
          "ep": int(cone_dims.exp),
          "s": [int(v) for v in cone_dims.psd]
      }
      return cones

Hope someone has a solution to this.
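
A more version-tolerant variant of that patch (a sketch, assuming the attribute was simply renamed from nonpos to nonneg between CVXPY releases) reads whichever name exists:

def dims_to_solver_dict(cone_dims):
    # Sketch: fall back between the old (nonpos) and new (nonneg) attribute names.
    nonneg = getattr(cone_dims, "nonneg", None)
    if nonneg is None:
        nonneg = cone_dims.nonpos
    return {
        "f": int(cone_dims.zero),
        "l": int(nonneg),
        "q": [int(v) for v in cone_dims.soc],
        "ep": int(cone_dims.exp),
        "s": [int(v) for v in cone_dims.psd],
    }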

Simpler examples

The promise of mixing convex optimization and Pytorch/Tensorflow seems great, but most of the current examples in the repo are quite specific and elaborate.

I'd like to see more complete simple examples that highlight the simplest things you can do with mixing convex optimization and backprop.

The best example in the tutorial seems to be the LML layer, but there's no complete example of it in use, and the polytope/ellipsoid examples aren't embedded in complete, practical examples.

I'd love to know how I can use these layers in real practical applications!
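
As an illustration of the kind of minimal example being asked for, here is a sketch (not taken from the repo's examples; the dimensions and names are made up) of a layer that projects a vector onto the probability simplex inside a tiny PyTorch model trained with backprop:

import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 5
y = cp.Variable(n)
z = cp.Parameter(n)
problem = cp.Problem(
    cp.Minimize(cp.sum_squares(y - z)),   # Euclidean projection of z
    [y >= 0, cp.sum(y) == 1],             # onto the probability simplex
)
proj = CvxpyLayer(problem, parameters=[z], variables=[y])

lin = torch.nn.Linear(3, n)
opt = torch.optim.SGD(lin.parameters(), lr=1e-1)
x = torch.randn(8, 3)
target = torch.full((8, n), 1.0 / n)      # aim for the uniform distribution
for _ in range(20):
    probs, = proj(lin(x))                 # batched forward solve
    loss = (probs - target).pow(2).sum()
    opt.zero_grad()
    loss.backward()                       # differentiates through the projection
    opt.step()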

Infeasible soft constrained MPC problem

Hi,

I'm trying use cvxpylayers to train a model predictive control(MPC) layer. I use the slack variables so that it becomes a soft constrained MPC problem and is always feasible.
But by cvxpylayers sometimes it is infeasible.
Here is the code:

import torch
import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

def soft_MPC_layer(nu, nx, N, nineq_u, nineq_x):
    """Builds the MPC layer with soft constraints.

    The optimization problem is of the form
        \hat u,\hat x,\hat e  
            = argmin sum(i:0->N-1){ u(i)^T*R*u(i) +  x(i)^T*Q*x(i)
            + e(i)^T*E*e(i) } + x(N)^T*P*x(N) + e(N)^T*E*e(N)
                subject to x(i+1) = A*x(i) + B*u(i), i=0,1,...,N-1
                           Hu*u(i) <= ku
                           Hx*x(i) <= kx+e(i)
                           e(i) >= 0
                           x(0) = x0

    where N is the number of MPC steps,
        R in S^{nu, nu},
        S^{k, k} denotes the set of k x k positive semi-definite matrices,
        Q, P in S^{nx, nx},
        E in S^{ne, ne}, where ne = nineq_x,
        A in R^{nx, nx},
        B in R^{nx, nu},
        Hu in R^{nineq_u, nu},
        ku in R^{nineq_u},
        Hx in R^{nineq_x, nx},
        kx in R^{nineq_x}.

    The matrix square-roots of R, Q, E are taken as parameters, as mentioned
    on p. 19 of the paper (Differentiable Convex Optimization Layers).
    """
    R_sqrt = cp.Parameter((nu, nu))
    Q_sqrt = cp.Parameter((nx, nx))
    P_sqrt = cp.Parameter((nx, nx))
    E_sqrt = cp.Parameter((nineq_x, nineq_x))
    A = cp.Parameter((nx, nx))
    B = cp.Parameter((nx, nu))
    Hu = cp.Parameter((nineq_u, nu))
    ku = cp.Parameter(nineq_u)
    Hx = cp.Parameter((nineq_x, nx))
    kx= cp.Parameter(nineq_x)
    x0 = cp.Parameter(nx)

    u = cp.Variable((nu, N))
    x = cp.Variable((nx, N+1))
    e = cp.Variable((nineq_x, N+1))


    obj = 0
    cons = []
    cons += [ x[:,0] == x0 ]
    for k in range(N):
        obj += cp.sum_squares(R_sqrt @ u[:, k]) \
                + cp.sum_squares(Q_sqrt @ x[:, k]) \
                + cp.sum_squares(E_sqrt @ e[:, k])
        cons += [x[:, k+1] == A @ x[:, k] + B @ u[:, k],
                 Hu @ u[:, k] <= ku, Hx @ x[:, k] <= kx + e[:, k],
                 e[:, k] >= 0]
    obj += cp.sum_squares(P_sqrt @ x[:, N]) \
            + cp.sum_squares(E_sqrt @ e[:, N])
    cons += [Hx @ x[:, N] <= kx + e[:, N], e[:, N] >= 0]
    objs = cp.Minimize(obj)

    prob = cp.Problem(objs, cons)
    assert prob.is_dpp()

    layer = CvxpyLayer(prob,
                       parameters=[R_sqrt, Q_sqrt, P_sqrt, E_sqrt,
                                   A, B, Hu, ku, Hx, kx, x0],
                       variables=[u, x, e])
    return layer

# Here is an infeasible example
R_sqrt = torch.tensor([[0.1]])
Q_sqrt = torch.tensor([[1., 0.],[0., 1.]])
P_sqrt = torch.tensor([[24.9936,  1.8688],[ 1.8688,  1.4416]])
E_sqrt = torch.tensor([[1., 0.],[0., 1.]])
A = torch.tensor([[1.0000, 0.0500],[0.7500, 1.0000]])
B = torch.tensor([[0.0000],[0.1500]])
Hu = torch.tensor([[ 1.],[-1.]]) 
ku = torch.tensor([2., 2.])
Hx = torch.tensor([[ 0.,  1.],[ 0., -1.]]) 
kx = torch.tensor([8., 8.])
x0 = torch.tensor([-8.7417, -8.0000])

layer = soft_MPC_layer(1 , 2, 5, 2, 2)
u_opt, x_opt, e_opt, = layer(R_sqrt, Q_sqrt,
                          P_sqrt, E_sqrt,A, B,
                          Hu, ku, Hx, kx, x0) 

# Change the x0 a little
# It's solved but constraints are violated
x0 = torch.tensor([-8.74, -8.0000])
u_opt, x_opt, e_opt, = layer(R_sqrt, Q_sqrt,
                          P_sqrt, E_sqrt,A, B,
                          Hu, ku, Hx, kx, x0) 
print(e_opt >= 0)
print(Hx @ x_opt[:, 5] <= kx + e_opt[:, 5])

Many thanks,
Jicheng Shi
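
A note for readers (not a confirmed fix): SCS returns solutions of moderate accuracy by default, so constraint checks like the ones above should allow a small numerical tolerance, and the solver tolerance can be tightened through solver_args. A sketch, with the tolerance and settings chosen purely for illustration:

# Sketch: re-solve with tighter SCS settings (forwarded via solver_args) and
# check the constraints up to a small numerical tolerance.
tol = 1e-6
u_opt, x_opt, e_opt = layer(R_sqrt, Q_sqrt, P_sqrt, E_sqrt, A, B,
                            Hu, ku, Hx, kx, x0,
                            solver_args={"eps": 1e-9, "max_iters": 100000})
print(torch.all(e_opt >= -tol))
print(torch.all(Hx @ x_opt[:, 5] <= kx + e_opt[:, 5] + tol))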

signal denoising example - new data

Hi @akshayka @sbarratt @SteveDiamond !!

Thank you for the amazing library. I am trying the signal denoising example with new data, but I am getting an error.

import math

import cvxpy as cp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import yfinance as yf  # needed for the yf.download call below
from cvxpylayers.torch import CvxpyLayer

import latexify
latexify.latexify()

torch.set_default_tensor_type(torch.DoubleTensor)
%matplotlib inline


N_train = 1000

test = yf.download("SPY", start="2012-01-01", end="2017-04-30")['Close'].to_numpy()
outputs=torch.from_numpy(test)
inputs=np.linspace(1,100,len(outputs))
inputs=torch.from_numpy(inputs)

X_train = inputs[:N_train]
Y_train = outputs[:N_train]

X_val = inputs[N_train:]
Y_val = outputs[N_train:]


len(X_val)
len(Y_val)

n = 100  # signal length expected by the layer (inferred from the error below)

def create_layer():
   y_cp = cp.Variable(n)
   x_minus_y = cp.Variable(n)
   
   x_param = cp.Parameter(n)
   theta_param = cp.Parameter((n, n))
   lambda_param = cp.Parameter(pos=True)
   objective = (
       cp.sum_squares(theta_param @ x_minus_y) +
       lambda_param*cp.sum_squares(cp.diff(y_cp))
   )
   constraints = [
       x_minus_y == x_param - y_cp
   ]
   problem = cp.Problem(cp.Minimize(objective), constraints)
   layer = CvxpyLayer(
       problem,
       parameters=[x_param, theta_param, lambda_param],
       variables=[y_cp])
   return layer
   

layer = create_layer()

import torch
from torch.utils.data import TensorDataset, DataLoader
import numpy as np
from cvxpylayers.torch import CvxpyLayer

torch.set_default_dtype(torch.double)

from tqdm.notebook import tqdm


def fit(loss, params, X, Y, Xval, Yval, batch_size=128, lr=1e-3, epochs=100, verbose=False, print_every=1, callback=None):
   """

   Arguments:
       loss: given x and y in batched form, evaluates loss.
       params: list of parameters to optimize.
       X: input data, torch tensor.
       Y: output data, torch tensor.
       Xval: input validation data, torch tensor.
       Yval: output validation data, torch tensor.
   """

   train_dset = TensorDataset(X, Y)
   train_loader = DataLoader(train_dset, batch_size=batch_size, shuffle=True)
   opt = torch.optim.Adam(params, lr=lr)

   train_losses = []
   val_losses = []
   for epoch in tqdm(range(epochs)):
       if callback is not None:
           callback()
           
       with torch.no_grad():
           val_losses.append(loss(Xval, Yval).item())
       if verbose and epoch % print_every == 0:
           print("val loss %03d | %3.5f" % (epoch + 1, val_losses[-1]))

       batch = 1
       train_losses.append([])
       for Xbatch, Ybatch in train_loader:
           opt.zero_grad()
           l = loss(Xbatch, Ybatch)
           l.backward()
           opt.step()
           train_losses[-1].append(l.item())
           if verbose and epoch % print_every == 0:
               print("batch %03d / %03d | %3.5f" %
                     (batch, len(train_loader), np.mean(train_losses[-1])))
           batch += 1
   return val_losses, train_losses

theta_tch = torch.eye(n, requires_grad=True)
lambda_tch = torch.tensor(0.5, requires_grad=True)
params = [theta_tch, lambda_tch]

def loss_fn(X, actual):
   preds = layer(X, theta_tch, lambda_tch)[0]
   mse_per_example = (preds - actual).pow(2).mean(axis=1)
   return mse_per_example.mean()


val_losses, train_losses =  fit(
   loss_fn, params, X_train, Y_train, X_val, Y_val, lr=1e-2, batch_size=8,
   epochs=15, verbose=True, print_every=1)

With the above I am getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-58-0b0fb50d4406> in <module>
----> 1 val_losses, train_losses =  fit(
     2     loss_fn, params, X_train, Y_train, X_val, Y_val, lr=1e-2, batch_size=8,
     3     epochs=15, verbose=True, print_every=1)

<ipython-input-56-f19c59cb9b44> in fit(loss, params, X, Y, Xval, Yval, batch_size, lr, epochs, verbose, print_every, callback)
    32 
    33         with torch.no_grad():
---> 34             val_losses.append(loss(Xval, Yval).item())
    35         if verbose and epoch % print_every == 0:
    36             print("val loss %03d | %3.5f" % (epoch + 1, val_losses[-1]))

<ipython-input-57-0aead751c22d> in loss_fn(X, actual)
     4 
     5 def loss_fn(X, actual):
----> 6     preds = layer(X, theta_tch, lambda_tch)[0]
     7     mse_per_example = (preds - actual).pow(2).mean(axis=1)
     8     return mse_per_example.mean()

~/miniconda3/envs/myenv1/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   720             result = self._slow_forward(*input, **kwargs)
   721         else:
--> 722             result = self.forward(*input, **kwargs)
   723         for hook in itertools.chain(
   724                 _global_forward_hooks.values(),

~/cvxpylayers/cvxpylayers/torch/cvxpylayer.py in forward(self, solver_args, *params)
   150             info=info,
   151         )
--> 152         sol = f(*params)
   153         self.info = info
   154         return sol

~/cvxpylayers/cvxpylayers/torch/cvxpylayer.py in forward(ctx, *params)
   224                 p_shape = p.shape if batch_size == 0 else p.shape[1:]
   225                 if not np.all(p_shape == param_order[i].shape):
--> 226                     raise ValueError(
   227                         "Inconsistent parameter shapes passed in. "
   228                         "Expected parameter {} to have non-batched shape of "

ValueError: Inconsistent parameter shapes passed in. Expected parameter 0 to have non-batched shape of (100,) but got torch.Size([339]).
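
The traceback says the layer's first parameter expects a non-batched shape of (100,), i.e. every example fed to the layer must be a length-n vector, whereas X_val here is a single 1-D tensor of length 339. One way to make the shapes match (a sketch; the window length, split, and the use of the price series as both input and target are assumptions, not a recommendation) is to cut the series into fixed-length windows:

# Sketch: reshape the 1-D series into non-overlapping windows of length n so
# that each row has the shape (n,) that x_param was declared with.
n = 100
series = torch.from_numpy(test)
num_windows = len(series) // n
windows = series[: num_windows * n].reshape(num_windows, n)

X_all = windows            # inputs to denoise (replace with your noisy signal)
Y_all = windows            # targets           (replace with your clean signal)
split = int(0.8 * num_windows)
X_train, Y_train = X_all[:split], Y_all[:split]
X_val, Y_val = X_all[split:], Y_all[split:]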

How to choose solver for CvxpyLayer

Is it possible to choose a solver for CvxpyLayer? I suspect solver_args has it, but it is hard to find all the valid keys for it.

I tried cvxpylayer(..., solver_args={'solver': 'SCS'}) and it returns:

'solver' is an invalid keyword argument for this function
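
For what it's worth, the keys in solver_args are forwarded to diffcp rather than to CVXPY's solve(). A sketch of what that typically looks like (the solve_method key and the parameter names A_tch, b_tch are assumptions based on other examples on this page, not verified against every version):

# Sketch (assumption: the installed diffcp accepts solve_method and SCS options).
solution, = cvxpylayer(A_tch, b_tch, solver_args={"solve_method": "ECOS"})

# SCS settings such as eps / max_iters can be passed the same way:
solution, = cvxpylayer(A_tch, b_tch,
                       solver_args={"eps": 1e-8, "max_iters": 10000})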

Quadratic form

Hi,

in the examples provided in the blog,

the objective for a quadratic form is defined as

obj = cp.Minimize(0.5*cp.sum_squares(Q_sqrt*x) + q.T @ x)

for example in the OptNet QP section; the Ellipsoid Projections section also uses the cp.sum_squares function.

Can someone explain why quad_form is not used? For example:

obj = cp.Minimize(0.5*cp.quad_form(x, Q) + q.T @ x)

?

To my understanding, using sum_squares would be correct if the Q matrix has all entries set to zero except for the diagonal entries; only in this case do I see that cp.sum_squares(Q_sqrt*x) = cp.quad_form(x, Q).

Is this implicitly assumed in the examples?
Otherwise, how do you get a cross term like q_{1,2} x_{1} x_{2}, where x_{i} denotes the i-th component of x and q_{1,2} is the entry of Q at position (1, 2)?
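
As far as I can tell, the square-root form is used because quad_form(x, Q) with Q a cp.Parameter is not DPP, whereas sum_squares(Q_sqrt @ x) is. Mathematically the two agree whenever Q = Q_sqrt^T Q_sqrt, not only for diagonal Q, so the cross terms are captured. A quick numerical check (illustrative only):

# Sketch: for any Q_sqrt, sum_squares(Q_sqrt @ x) equals x^T Q x with
# Q = Q_sqrt.T @ Q_sqrt, including the off-diagonal cross terms.
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q_sqrt = rng.standard_normal((n, n))
Q = Q_sqrt.T @ Q_sqrt              # a generic (non-diagonal) PSD matrix
x = rng.standard_normal(n)

lhs = np.sum((Q_sqrt @ x) ** 2)    # what cp.sum_squares(Q_sqrt @ x) evaluates to
rhs = x @ Q @ x                    # what cp.quad_form(x, Q) evaluates to
assert np.isclose(lhs, rhs)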

Incompatible with tf.function (Tensor object has no attribute 'numpy')

Hi,

When I try to build a sequential model I get a compilation error. While I understand why it does not work I do not know how to fix it as it involves TF/keras. Is there a way to modify CvxpyLayer to make it work?

Reproducible example in tf 2.2:

from cvxpylayers.tensorflow import CvxpyLayer
import cvxpy as cp
import tensorflow as tf
from tensorflow.keras import layers

tf.keras.backend.set_floatx('float64')

class CustomLayer(layers.Layer):
   def __init__(self, input_dim, output_dim):
       super(CustomLayer, self).__init__()
       z = cp.Variable(output_dim)
       x = cp.Parameter(input_dim)
       problem = cp.Problem(cp.Minimize(
           cp.sum_squares(z -   x  )), [z >= 0 ])
       self.cvxpy_layer = CvxpyLayer(problem, [  x], [z])
   @tf.function
   def call(self, x):
      return self.cvxpy_layer( [x])[0]
 
tf.random.set_seed(0)
model = tf.keras.Sequential([
   CustomLayer(20, 20)])
model.build((300, 20))

Error raised:

AttributeError: in user code:

    <ipython-input-13-1b01fd49531e>:18 call  *
        return self.cvxpy_layer( [x])[0]
    /usr/local/lib/python3.6/dist-packages/cvxpylayers/tensorflow/cvxpylayer.py:124 __call__  *
        compute = tf.custom_gradient(
    /usr/local/lib/python3.6/dist-packages/cvxpylayers/tensorflow/cvxpylayer.py:154 _compute  *
        params = [p.numpy() for p in params]

    AttributeError: 'Tensor' object has no attribute 'numpy'
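
A possible workaround (a sketch only; whether eager execution is acceptable depends on the model): CvxpyLayer calls .numpy() on its inputs, so it needs eager tensors. Dropping @tf.function from call (and, if the layer is trained with model.fit, compiling with run_eagerly=True, which is an assumption about your setup) avoids the graph-mode trace:

class CustomLayer(layers.Layer):
    def __init__(self, input_dim, output_dim):
        super(CustomLayer, self).__init__()
        z = cp.Variable(output_dim)
        x = cp.Parameter(input_dim)
        problem = cp.Problem(cp.Minimize(cp.sum_squares(z - x)), [z >= 0])
        self.cvxpy_layer = CvxpyLayer(problem, [x], [z])

    def call(self, x):              # no @tf.function: keep this layer eager
        return self.cvxpy_layer(x)[0]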

training diverges/loss increasing

Hi,

I was testing a simple problem and found that training diverges and the loss keeps increasing.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.parameter import Parameter

import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

N = 200
M = 100
BOUND = 0.1
NORM_BOUND = 10
NUM_STEPS = 3

class Opt(nn.Module):
  def __init__(self, weight=5.0):
    super(Opt, self).__init__()
    self.weight = Parameter(
      torch.tensor([[weight]], requires_grad=True))

    self.L = torch.randn(N, M)
    L_diag = torch.rand(N)
    L_diag[:N//2] *= 0.1
    L_diag[N//2:] *= 10
    self.L_diag = torch.diag(L_diag)

    _p = cp.Parameter((N))
    _prev = cp.Parameter((N))
    _v = cp.Variable((N))
    obj = cp.Minimize(0.5*cp.sum_squares(self.L.T@_v)
                      + 0.5*cp.sum_squares(self.L_diag@_v)
                      + 50*cp.sum_squares(_v - _prev)
                      - _v@_p)
    cons = [_v <= BOUND,
            -_v <= BOUND,
            cp.norm(_v, 1) <= NORM_BOUND
    ]
    prob = cp.Problem(obj, cons)
    self.cvxopt = CvxpyLayer(prob, [_p, _prev], [_v])

  def forward(self, x):
    num_steps = x.shape[0]
    xw = torch.matmul(x, self.weight).squeeze()
    init_v = torch.zeros(x.shape[1])
    v_list = []
    for i in range(num_steps):
      v, = self.cvxopt(xw[i], init_v, solver_args={"eps": 1e-7, "max_iters": 200000})
      init_v = v
      v_list.append(v)
    v = torch.stack(v_list)

    return v

def main():
  torch.manual_seed(0)
  opt = Opt(weight=5.0)

  x = torch.randn(N, 1)
  x = x.unsqueeze(0).repeat(NUM_STEPS, 1, 1)
  y = 0.1*x[-1] + torch.randn(N, 1)

  optimizer = optim.Adam(opt.parameters(), 1e-1)
  for i in range(200000):
    v = opt(x)
    loss = -torch.sum(y*v[-1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    l = loss.detach().numpy().tolist()
    w = opt.weight.detach().numpy()
    g = opt.weight.grad.detach().numpy()
    print("loss {:.3f} weight {:.3f} "
          "grad {:.3f} ".format(
            l, w[0][0], g[0][0],
          ))


if __name__ == "__main__":
  main()

The training log looks like this:

loss -8.689 weight 4.900 grad 0.564
loss -8.668 weight 4.800 grad 0.508
loss -8.638 weight 4.701 grad 0.475
loss -8.603 weight 4.603 grad 0.405
loss -8.563 weight 4.506 grad 0.398
loss -8.520 weight 4.411 grad 0.292
loss -8.470 weight 4.321 grad 0.196
loss -8.415 weight 4.232 grad 0.283
loss -8.344 weight 4.163 grad -0.213
loss -8.281 weight 4.104 grad -0.079
loss -8.233 weight 4.056 grad -0.078
loss -8.192 weight 4.017 grad -0.120
loss -8.162 weight 3.985 grad -0.081
loss -8.137 weight 3.959 grad -0.056
...

I also used the finite-difference method to test the gradient at 5.0, following #67.
I got the following output (by setting solver_args={"eps": 1e-8, "max_iters": 200000}):

grad with backward:

[[0.5678954]]

grad with finite difference:

tensor(-0.0734, grad_fn=<DivBackward0>)

The two have opposite signs.

I also tried setting everything to torch.double and eps=1e-9; neither solves the problem.

Large errors on hypergradient of logistic regression

We want to use cvxpylayers to compute the gradient of a test loss w.r.t. the regularization parameter used on the training data for sparse models (Lasso and L1-regularized logistic regression). This is also called a hypergradient (a gradient w.r.t. hyperparameters).

Basically we aim to do with cvxpylayers what we did in our recent ICML paper: http://proceedings.mlr.press/v119/bertrand20a.html

While it runs fine for both models, we get surprisingly high errors for the logistic model using scipy.optimize.check_grad. We get:

4.49142934577651e-06 for Lasso
1.0356275707793758 for Logistic Regression

It can be even worse on other data.

Could it be a bug? Or just numerical issues due to log and exp?

cc @QB3 @Klopfe @Vaiter @josephsalmon @mblondel

See code below.

import numpy as np
from scipy.optimize import check_grad
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

torch.set_default_dtype(torch.double)


def lasso_cvxpy(X, y, lam, idx_train, idx_val):
    Xtrain, Xtest, ytrain, ytest = map(
        torch.from_numpy,
        [X[idx_train, :], X[idx_val], y[idx_train], y[idx_val]],
    )

    n_samples_train, n_features = Xtrain.shape

    # set up variables and parameters
    beta_cp = cp.Variable(n_features)
    lambda_cp = cp.Parameter(nonneg=True)

    # set up objective
    loss = (1 / (2 * n_samples_train)) * cp.sum(
        cp.square(Xtrain @ beta_cp - ytrain)
    )
    reg = lambda_cp * cp.norm1(beta_cp)
    objective = loss + reg

    # define problem
    problem = cp.Problem(cp.Minimize(objective))
    assert problem.is_dpp()

    # solve problem
    layer = CvxpyLayer(problem, [lambda_cp], [beta_cp])
    lambda_th = torch.tensor(np.squeeze(lam), requires_grad=True)
    (beta_,) = layer(lambda_th)

    # get test loss and its gradient
    test_loss = (Xtest @ beta_ - ytest).pow(2).mean()
    test_loss.backward()

    val = test_loss.detach().numpy()
    grad = np.array(lambda_th.grad)
    return val, grad


def logreg_cvxpy(X, y, lam, idx_train, idx_val):
    assert np.all(np.unique(y) == np.array([-1, 1]))
    Xtrain, Xtest, ytrain, ytest = map(
        torch.from_numpy,
        [X[idx_train, :], X[idx_val], y[idx_train], y[idx_val]],
    )

    n_samples_train, n_features = Xtrain.shape

    # set up variables and parameters
    beta_cp = cp.Variable(n_features)
    lambda_cp = cp.Parameter(nonneg=True)

    # set up objective
    loss = (
        cp.sum(cp.logistic(cp.multiply(-ytrain, Xtrain @ beta_cp)))
        / n_samples_train
    )
    reg = lambda_cp * cp.norm(beta_cp, 1)
    objective = loss + reg

    # define problem
    problem = cp.Problem(cp.Minimize(objective))
    assert problem.is_dpp()

    # solve problem
    layer = CvxpyLayer(problem, parameters=[lambda_cp], variables=[beta_cp])
    alpha_th = torch.tensor(np.squeeze(lam), requires_grad=True)
    (beta_,) = layer(alpha_th)

    # get test loss and its gradient
    # test_loss = torch.mean(torch.nn.LogSigmoid()(ytest * (Xtest @ beta_)))
    test_loss = torch.mean(torch.log(1 + torch.exp(-ytest * (Xtest @ beta_))))
    test_loss.backward()

    val = test_loss.detach().numpy()
    grad = np.array(alpha_th.grad)
    return val, grad


n, m = 3, 20
rng = np.random.RandomState(42)
coef = rng.randn(n)
X = rng.randn(m, n)
y = X @ coef + 0.1 * rng.randn(m)
y_sign = np.sign(y)
rng.shuffle(y_sign)  # add some label noise

idx_train = np.arange(0, m // 2)
idx_val = np.arange(m // 2, m)

lam = 0.1

val, grad = lasso_cvxpy(X, y, lam, idx_train, idx_val)

dict_cvxpy_func = {
    "lasso": lasso_cvxpy,
    "logreg": logreg_cvxpy,
}


def test_hypergradient(model_name):
    cvxpy_func = dict_cvxpy_func[model_name]
    this_y = y
    if model_name == "logreg":
        this_y = y_sign

    def get_val(lam):
        val_cvxpy, _ = cvxpy_func(X, this_y, lam, idx_train, idx_val)
        return val_cvxpy

    def get_grad(lam):
        _, grad_cvxpy = cvxpy_func(X, this_y, lam, idx_train, idx_val)
        return grad_cvxpy

    grad_error = check_grad(get_val, get_grad, lam)
    print(grad_error)
    assert grad_error < 1


for model_name in dict_cvxpy_func.keys():
    test_hypergradient(model_name)

parallelize operation

Hi, first of all thanks for the amazing library!

I have a question about whether the operation is parallelized when I feed batch data to the layer.

For example, I have a QP problem of the form min ||Ax-b||^2, with A of size (n, m) and b of size (m), and I create a CvxpyLayer L that takes A, b as input.

If I generate A, b of size (num_batch, n, m) and (num_batch, m), then L(A, b) gives me an output of size (num_batch, n); however, the execution time increases linearly with num_batch. I wonder if this is the expected behavior? If so, is there any way to make the operation parallelizable?
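
For context (a sketch rather than a definitive answer): the batch is handled as one cone program per example, so any parallelism comes from threading those solves. If the installed diffcp exposes n_jobs_forward / n_jobs_backward (an assumption; check your version), they can be forwarded through solver_args. The names A_batch, b_batch below are illustrative:

# Sketch (assumption: these diffcp options exist in the installed version).
solution, = L(A_batch, b_batch,
              solver_args={"n_jobs_forward": 8, "n_jobs_backward": 8})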

Add support for other solvers and OSQP

There are a few technical details to be discussed and worked out here, and we should decide on the best interface, but I'm quickly filing this now to get things started and to have something to point to. It would be really powerful to offer an option to plug in a custom solver for the forward pass, which we then map to a cone program and differentiate through in conic form for the backward pass, so the derivative doesn't need to be provided explicitly. One extremely compelling use case is that this would also let us easily add an OSQP solver for the QP forward pass.

I can post some proposals for an interface here later. For now, the one main technical issue that I'm not immediately sure how to work out is how to map from the optimal primal solution of the original problem form to the optimal primal/dual solutions of the conic form for the backward pass.

\cc @bstellato

Required CVXPY version

Hello,
thank you for providing this very useful library. I have some problems using it in my codebase, as it depends on an alpha version of CVXPY (1.1.a1).

Do you know if the functionality introduced in CVXPY 1.1 is required for cvxpylayers? If not, would it be possible to lower the required version to the last stable release? Thank you!

Minor bug in examples/torch/convex_approximate_dynamic_programming.ipynb

I had to change the 3rd line in the "train" function from
q = torch.zeros((m, 1), requires_grad=True)
to
q = torch.zeros(m, 1).double().requires_grad_(True)
to avoid the error
"RuntimeError: Two or more parameters have different dtypes. Expected parameter 3 to have dtype torch.float64 but got dtype torch.float32."
