linjianma / autohoot

Automatic High-Order Optimization for Tensors

License: Apache License 2.0

autohoot's Introduction

AutoHOOT: Automatic High-Order Optimization for Tensors

AutoHOOT is a Python-based automatic differentiation framework targeting high-order optimization for large-scale tensor computations.

AutoHOOT contains a new explicit Jacobian / Hessian expression generation kernel whose outputs keep the input tensors’ granularity and are easy to optimize. It also contains a new computational graph optimizer that combines traditional compiler optimization techniques with techniques specific to tensor algebra. For the numerical algorithms of tensor computations, the optimization module generates expressions as good as the manually written code in other frameworks.

The library is compatible with other AD libraries, including TensorFlow and Jax, and numerical libraries, including NumPy and Cyclops Tensor Framework.

Example usage of the library is shown in the examples and tests folders.

Installation

Consider installing in editable mode, as this package is still in development:

pip install -e path/to/the/project/directory

Test cases

Run all tests with

# sudo pip install pytest
python -m pytest tests/*.py
# You can specify the python file as well.

# Run with specific backends
pytest tests/autodiff_test.py --backendopt numpy jax

Notations

This repo involves a lot of tensor contraction operations. We use the following notations in our documentation:

For the matrix multiplication C = A @ B, the following expressions are equal:

C = einsum("ij,jk->ik", A, B)

C["ik"] = A["ij"] * B["jk"]

C[0,2] = A[0,1] * B[1,2]
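
To make the notation concrete, the following minimal sketch checks in plain NumPy (not AutoHOOT) that the einsum form and an explicit index loop both reproduce A @ B. The array shapes are arbitrary illustrative choices.

```python
import numpy as np

# Small illustrative operands: A is 2x3, B is 3x4.
A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

# Matrix multiplication and its einsum spelling.
C_matmul = A @ B
C_einsum = np.einsum("ij,jk->ik", A, B)

# The same contraction written out index by index:
# C["ik"] = A["ij"] * B["jk"], with the repeated index j summed over.
C_loops = np.zeros((2, 4))
for i in range(2):
    for k in range(4):
        for j in range(3):
            C_loops[i, k] += A[i, j] * B[j, k]
```

The index that appears in both inputs but not in the output (j here) is the one that is summed.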

Overview of Module API and Data Structures

Suppose our expression is y = x1 @ x2 + x1. We first define the variables x1 and x2 symbolically:

from autohoot import autodiff as ad

x1 = ad.Variable(name = "x1", shape=[3, 3])
x2 = ad.Variable(name = "x2", shape=[3, 3])

Then we define the symbolic expression for y:

y = ad.einsum("ab,bc->ac", x1, x2) + x1

With this computation graph, we can evaluate the value of y given any values of x1 and x2: simply walk the graph in topological order and, for each node, use its associated operator to compute an output value from its input values. The evaluation is done in the Executor.run method.

executor = ad.Executor([y])
y_val = executor.run(feed_dict = {x1 : x1_val, x2 : x2_val})
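
The topological walk described above can be sketched independently of AutoHOOT. The Node class, topo_sort, and run below are hypothetical stand-ins written for illustration; they are not the library's Executor and make no claim about its internals.

```python
import numpy as np

class Node:
    """Hypothetical graph node: a name, its input nodes, and an operator."""
    def __init__(self, name, inputs=(), op=None):
        self.name = name
        self.inputs = list(inputs)
        self.op = op  # callable applied to the input values; None for variables

def topo_sort(outputs):
    """Depth-first post-order: every node appears after all of its inputs."""
    order, visited = [], set()

    def visit(node):
        if id(node) in visited:
            return
        visited.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)

    for out in outputs:
        visit(out)
    return order

def run(outputs, feed_dict):
    """Evaluate the outputs given values for the variable nodes."""
    values = dict(feed_dict)
    for node in topo_sort(outputs):
        if node not in values:
            values[node] = node.op(*(values[p] for p in node.inputs))
    return [values[out] for out in outputs]

# y = x1 @ x2 + x1, built from two variables and two operator nodes.
x1, x2 = Node("x1"), Node("x2")
prod = Node("x1@x2", [x1, x2], lambda a, b: np.einsum("ab,bc->ac", a, b))
y = Node("y", [prod, x1], lambda a, b: a + b)

x1_val = np.arange(9.0).reshape(3, 3)
x2_val = np.eye(3)
(y_val,) = run([y], {x1: x1_val, x2: x2_val})
order_names = [node.name for node in topo_sort([y])]
```

With x2 set to the identity, y_val equals 2 * x1_val, which gives a quick sanity check of the walk.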

If we want to evaluate the gradients of y with respect to x1 and x2, as is commonly done for a loss function with respect to its parameters in machine learning training steps, we need to construct the gradient nodes grad_x1 and grad_x2.

grad_x1, grad_x2 = ad.gradients(y, [x1, x2])

Once we have constructed the gradient nodes and hold references to them, we can evaluate the gradients using Executor as before:

executor = ad.Executor([y, grad_x1, grad_x2])
y_val, grad_x1_val, grad_x2_val = executor.run(feed_dict = {x1 : x1_val, x2 : x2_val})

grad_x1_val, grad_x2_val now contain the values of dy/dx1 and dy/dx2.
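
As a sanity check on what these gradients should look like, here is a minimal NumPy sketch. It assumes the matrix-valued y is reduced to a scalar by summing its entries (equivalent to seeding reverse mode with a matrix of ones); that reduction is an assumption of this sketch, not something the text above states. The helper names are hypothetical.

```python
import numpy as np

def loss(x1, x2):
    # Scalar reduction of y = x1 @ x2 + x1 (assumption: sum over all entries).
    return np.sum(x1 @ x2 + x1)

def analytic_grads(x1, x2):
    # Reverse mode with an all-ones seed for y.
    seed = np.ones((x1.shape[0], x2.shape[1]))
    grad_x1 = seed @ x2.T + np.ones_like(x1)  # matmul term plus the "+ x1" term
    grad_x2 = x1.T @ seed
    return grad_x1, grad_x2

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, one entry at a time.
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x1_val = rng.standard_normal((3, 3))
x2_val = rng.standard_normal((3, 3))

grad_x1_val, grad_x2_val = analytic_grads(x1_val, x2_val)
num_x1 = numeric_grad(lambda m: loss(m, x2_val), x1_val)
num_x2 = numeric_grad(lambda m: loss(x1_val, m), x2_val)
```

Comparing the values returned by the executor against a finite-difference estimate like this is a cheap way to test a new operator's gradient.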

Second-order information

This repo also supports Hessian-vector products through reverse-mode autodiff. For the expression y = x^T @ H @ x, we first define the expression:

x = ad.Variable(name="x", shape=[3, 1])
H = ad.Variable(name="H", shape=[3, 3])
v = ad.Variable(name="v", shape=[3, 1])
y = ad.sum(ad.einsum("ab,bc->ac", ad.einsum("ab,bc->ac", ad.transpose(x), H), x))

Then define the expression for the gradient and Hessian-vector product:

grad_x, = ad.gradients(y, [x])
Hv, = ad.hvp(output_node=y, node_list=[x], vector_list=[v])

Then we can evaluate y, grad_x and Hv all at once:

executor = ad.Executor([y, grad_x, Hv])
y_val, grad_x_val, Hv_val = executor.run(feed_dict={
    x: x_val, H: H_val, v: v_val
    })
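
For this quadratic form the expected values have a closed form: the gradient is (H + H^T) x and the Hessian-vector product is (H + H^T) v. The NumPy sketch below (plain NumPy, with hypothetical helper names) checks the Hessian-vector product against a finite difference of the gradient along v.

```python
import numpy as np

def quad(x, H):
    # y = x^T H x as a Python float.
    return (x.T @ H @ x).item()

def grad(x, H):
    # Gradient of x^T H x with respect to x.
    return (H + H.T) @ x

def hvp(H, v):
    # Hessian (H + H^T) applied to v; independent of x for a quadratic.
    return (H + H.T) @ v

rng = np.random.default_rng(0)
x_val = rng.standard_normal((3, 1))
H_val = rng.standard_normal((3, 3))
v_val = rng.standard_normal((3, 1))

# Directional finite difference of the gradient along v.
eps = 1e-5
hvp_fd = (grad(x_val + eps * v_val, H_val) - grad(x_val - eps * v_val, H_val)) / (2 * eps)
hvp_exact = hvp(H_val, v_val)
```

Note that H is not assumed symmetric, which is why H + H^T appears rather than 2H.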

Source code generation

This repo also supports source code generation for the constructed computational graphs. See the test file for example usage.

Acknowledging Usage

The library is available to everyone. If you would like to acknowledge the usage of the library, please cite the following paper:

Linjian Ma, Jiayu Ye, and Edgar Solomonik. AutoHOOT: Automatic High-Order Optimization for Tensors. International Conference on Parallel Architectures and Compilation Techniques (PACT), October 2020.

autohoot's Issues

Error with older version of attrs

I get the following error:

(python_env) ➜  software python -c "import autohoot"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/home/mfishman/software/AutoHOOT/autohoot/__init__.py", line 15, in <module>
    from . import autodiff
  File "/mnt/home/mfishman/software/AutoHOOT/autohoot/autodiff.py", line 20, in <module>
    from autohoot.utils import find_topo_sort, sum_node_list, inner_product, find_topo_sort_p
  File "/mnt/home/mfishman/software/AutoHOOT/autohoot/utils.py", line 236, in <module>
    @attr.s(eq=False)
TypeError: attrs() got an unexpected keyword argument 'eq'

with attrs-19.1.0. Upgrading to attrs-21.2.0 fixes this. Maybe the version should be specified in requirements.txt?
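
One possible way to surface this earlier is a version guard, sketched below. It assumes that attrs 19.2.0 is the first release accepting the eq argument (consistent with 19.1.0 failing and 21.2.0 working here, but the exact cutoff is my assumption); the helper names are hypothetical. A requirements.txt line such as attrs>=19.2.0 would express the same constraint.

```python
from importlib import metadata
from itertools import takewhile

# Assumption: first attrs release whose attr.s() accepts eq=.
MIN_ATTRS = (19, 2, 0)

def parse_version(text):
    """Turn '21.2.0' (or '21.2.0rc1') into a tuple of up to three ints."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits or 0))
    return tuple(numbers)

def attrs_supports_eq(installed_version):
    """True if the given attrs version string is new enough."""
    return parse_version(installed_version) >= MIN_ATTRS

def check_installed_attrs():
    """Raise a readable error instead of the TypeError shown above."""
    try:
        installed = metadata.version("attrs")
    except metadata.PackageNotFoundError:
        raise ImportError("attrs is not installed")
    if not attrs_supports_eq(installed):
        raise ImportError(f"attrs {installed} is too old; need >= 19.2.0")
```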

Einsum node doesn't support sum-like operations.

The gradients of sum-like einsum operations, such as b = einsum("ij->", a), are not calculated correctly.
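
For reference, the expected gradient of a one-operand, sum-like einsum can be written as another einsum that broadcasts the upstream gradient back over the summed indices. The sketch below is plain NumPy with a hypothetical helper name; it assumes the input subscript has no repeated indices and that every output index also appears in the input.

```python
import numpy as np

def grad_single_operand_einsum(subscripts, a, upstream):
    """Vector-Jacobian product for b = einsum(subscripts, a).

    Assumes no repeated index within the input subscript.
    """
    in_sub, out_sub = subscripts.split("->")
    ones = np.ones_like(a)
    # Multiply by ones to broadcast `upstream` back to the shape of `a`.
    return np.einsum(f"{out_sub},{in_sub}->{in_sub}", upstream, ones)

a = np.arange(6.0).reshape(2, 3)

# b = einsum("ij->", a): every entry of a receives the scalar upstream gradient.
g_full = grad_single_operand_einsum("ij->", a, np.array(2.0))

# b = einsum("ij->i", a): each row receives that row's upstream gradient.
g_rows = grad_single_operand_einsum("ij->i", a, np.array([1.0, 10.0]))
```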

dimensionality restriction for gradients function

Our gradients function only supports the case where the output node represents a scalar. However, it is not easy to enforce this restriction in the current code to prevent wrong calculations, because our current computation graph doesn't contain dimensionality information.

Need to assign node.name as a property variable

Transpose issue in dedup of sequential_optiaml_tree

For the input graphs:

new_A = ad.einsum('ba,ca,da,ebc->ed',B,C,ad.tensorinv(ad.einsum('ab,ac,db,dc->bc',B,B,C,C), ind=1),input_tensor)
new_B = ad.einsum('ba,ca,da,bec->ed',A,C,ad.tensorinv(ad.einsum('ab,ac,db,dc->bc',A,A,C,C), ind=1),input_tensor)
new_C = ad.einsum('ba,ca,da,bce->ed',A,B,ad.tensorinv(ad.einsum('ab,ac,db,dc->bc',A,A,B,B), ind=1),input_tensor)

after we run

    new_A, new_B, new_C = generate_sequential_optiaml_tree({
        new_A: A,
        new_B: B,
        new_C: C
    })

the resulting graphs will be

new_A = ad.einsum('abe,da,ba->ed',ad.einsum('ebc,ca->abe',input_tensor,C),ad.tensorinv(ad.einsum('ab,ac,db,dc->bc',B,B,C,C), ind=1),B)
new_B = ad.einsum('abe,da,ba->ed',ad.einsum('bec,ca->abe',input_tensor,C),ad.tensorinv(ad.einsum('ab,ac,db,dc->bc',A,A,C,C), ind=1),A)
new_C = ad.einsum('ba,ca,da,bce->ed',A,B,ad.tensorinv(ad.einsum('ab,ac,db,dc->bc',A,A,B,B), ind=1),input_tensor)

Note that in the new new_A and new_B expressions, we have ad.einsum('ebc,ca->abe',input_tensor,C) and ad.einsum('bec,ca->abe',input_tensor,C) which are just transposes and cannot be further deduped.

source generation module can output string to simplify test functions.

Implement TensorFlow as a backend.

Optimizing the Hessian of `xᵀ A x`

Hi,

I am trying to compute the Hessian of xᵀ A x w.r.t. x, then optimize the computation graph, expecting the result to be A + Aᵀ. However, the call to optimize crashes with a ValueError (please see the MWE below).

Interestingly, if I replace xᵀ A x by xᵀ A B x, the optimization works and yields the desired result A B + (A B)ᵀ.

Best,
Felix

from autohoot import autodiff as ad
from autohoot.graph_ops import graph_transformer

dim = 3
x = ad.Variable(name="x", shape=[dim])
A = ad.Variable(name="A", shape=[dim, dim])
B = ad.Variable(name="B", shape=[dim, dim])

# ✔ Compute the Hessian of `y = xᵀ A B x` w.r.t. `x`
y = ad.einsum("i,ij,jk,k->", x, A, B, x)
Hx_y = ad.hessian(y, [x])[0][0]
print(Hx_y)
# >>> (T.einsum('ac,cb->ab',T.identity(3),T.einsum('ab,bc->ca',A,B))+T.einsum('ac,cb->ab',T.identity(3),T.einsum('ab,bc->ac',A,B)))

# ✔ Optimize the graph to get `A B + (A B)ᵀ`
Hx_y_opt = graph_transformer.optimize(Hx_y)
print(Hx_y_opt)
# >>> (T.einsum('ab,bc->ca',A,B)+T.einsum('ab,bc->ac',A,B))

# ✔ Compute the Hessian of `z = xᵀ A x` w.r.t. `x`
z = ad.einsum("i,ij,j->", x, A, x)
Hx_z = ad.hessian(z, [x])[0][0]
print(Hx_z)
# >>> (T.einsum('ac,cb->ab',T.identity(3),T.einsum('ab,bc->ca',A,B))+T.einsum('ac,cb->ab',T.identity(3),T.einsum('ab,bc->ac',A,B)))

# ❎ Optimize the graph to get `A + Aᵀ`
Hx_z_opt = graph_transformer.optimize(Hx_z)
print(Hx_z_opt)
# >>> ValueError: Output character 'd' did not appear in the input
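
As a numerical cross-check of the expected results (plain NumPy, not AutoHOOT), the sketch below evaluates the unoptimized and optimized expressions printed for y and compares both with A B + (A B)ᵀ. It also estimates the Hessian of xᵀ A x by finite differences and compares it with A + Aᵀ. The helper numeric_hessian is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 3
A = rng.standard_normal((dim, dim))
B = rng.standard_normal((dim, dim))
I = np.identity(dim)

# Printed Hessian of y before optimization (identity contractions included).
unoptimized = (np.einsum("ac,cb->ab", I, np.einsum("ab,bc->ca", A, B))
               + np.einsum("ac,cb->ab", I, np.einsum("ab,bc->ac", A, B)))
# Printed Hessian of y after optimization.
optimized = np.einsum("ab,bc->ca", A, B) + np.einsum("ab,bc->ac", A, B)
expected_y = A @ B + (A @ B).T

def numeric_hessian(f, x, eps=1e-3):
    """Forward finite-difference Hessian; accurate for quadratics up to rounding."""
    n = x.size
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            hess[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / eps**2
    return hess

x0 = rng.standard_normal(dim)
hess_z = numeric_hessian(lambda x: x @ A @ x, x0)
expected_z = A + A.T
```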

Jax frontend doesn't work with jaxlib>0.1.59

As title

Numpy einsum path bugs

NumPy's einsum path has multiple bugs. I plan to make everything depend on opt_einsum rather than NumPy.

Enable larger tensor networks in einsum expressions

Currently, the capacity of an einsum expression is bottlenecked by the number of characters allowed. One easy fix is similar to what's done in opt_einsum: https://optimized-einsum.readthedocs.io/en/stable/_modules/opt_einsum/parser.html#get_symbol, where we allow Unicode characters and take care when calling functions like numpy.einsum, since that einsum only allows simple letters. This will increase the number of indices allowed to at least 10^6.

I suggest we go this way, since I believe only minor refactoring is needed for this modification.
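
A minimal sketch of this idea follows. It is modeled loosely on the opt_einsum helper linked above but is written from scratch: the function names, the offset of 140, and the surrogate-skipping rule are choices of this sketch, not a copy of any library's code.

```python
import string

_BASE = string.ascii_lowercase + string.ascii_uppercase  # 52 plain letters

def get_symbol(i):
    """Map a non-negative integer to a unique single-character index label."""
    if i < len(_BASE):
        return _BASE[i]
    code = i + 140
    if code >= 0xD800:
        code += 0x800  # skip the UTF-16 surrogate block, which is not valid text
    return chr(code)

def to_ascii_subscripts(subscripts):
    """Relabel an expression with plain letters before calling numpy.einsum.

    Only possible when the expression uses at most 52 distinct indices.
    """
    symbols = sorted(set(subscripts) - set(",->"))
    if len(symbols) > len(_BASE):
        raise ValueError("too many distinct indices for a letters-only einsum")
    table = {s: _BASE[k] for k, s in enumerate(symbols)}
    return "".join(table.get(ch, ch) for ch in subscripts)

# An expression using a non-ASCII label for the contracted index.
i, j, k = get_symbol(0), get_symbol(60), get_symbol(1)
wide_expr = f"{i}{j},{j}{k}->{i}{k}"
ascii_expr = to_ascii_subscripts(wide_expr)
```

The graph can then carry arbitrarily many labels internally, with the letters-only relabeling applied per einsum call at the backend boundary.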

linearize function crashes when expression is too long.

When the expression is too long, the linearize function raises RecursionError: maximum recursion depth exceeded while calling a Python object.

The following example can reproduce the error:

import autodiff as ad
from graph_ops.graph_transformer import linearize
from examples.cpd import cpd_graph

A, B, C, input_tensor, loss, residual = cpd_graph(100, 100)
hessian = ad.hessian(loss, [A, B, C])
linearize(hessian[0][1])
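
A common way to avoid this class of error is to replace recursive graph traversal with an explicit stack. The sketch below is a generic illustration with a hypothetical Node class; it is not the linearize implementation, and whether the recursion in AutoHOOT sits in the traversal itself is an assumption.

```python
class Node:
    """Hypothetical graph node holding only its inputs."""
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

def topo_sort_iterative(outputs):
    """Post-order traversal of a DAG without recursion."""
    order, visited = [], set()
    # Each stack entry is (node, expanded); expanded means its inputs are done.
    stack = [(out, False) for out in reversed(outputs)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
            continue
        if id(node) in visited:
            continue
        visited.add(id(node))
        stack.append((node, True))
        for parent in reversed(node.inputs):
            stack.append((parent, False))
    return order

# A chain far deeper than Python's default recursion limit (about 1000).
depth = 50000
nodes = [Node("n0")]
for n in range(1, depth):
    nodes.append(Node(f"n{n}", [nodes[-1]]))

order = topo_sort_iterative([nodes[-1]])
```

Raising the limit with sys.setrecursionlimit is a quicker workaround, but it only moves the threshold and can crash the interpreter on very deep graphs.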

Dedup corner case

Currently, the dedup function regards T.einsum('cb,ca->ab',B,B) and T.einsum('cb,ca->ba',B,B) as different expressions. However, their results are the same and should be viewed as duplicated.
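
One way to catch this case is to compare expressions in a canonical form that is invariant under renaming indices and reordering identical operands. The sketch below is an illustration only: canonical_einsum is a hypothetical helper, it tries every operand permutation (factorial cost, so suitable only for small einsums), and it does not exploit tensor symmetries.

```python
from itertools import permutations

def canonical_einsum(subscripts, operand_names):
    """Return a (subscripts, operand names) pair usable as a dedup key."""
    inputs, output = subscripts.split("->")
    terms = inputs.split(",")
    target = tuple(sorted(operand_names))
    best = None
    for perm in permutations(range(len(terms))):
        # Only keep orderings that list the operands in sorted-name order.
        if tuple(operand_names[p] for p in perm) != target:
            continue
        # Rename indices by order of first appearance: inputs, then output.
        relabel = {}
        for ch in "".join(terms[p] for p in perm) + output:
            relabel.setdefault(ch, chr(ord("a") + len(relabel)))
        new_terms = ["".join(relabel[c] for c in terms[p]) for p in perm]
        new_output = "".join(relabel[c] for c in output)
        candidate = ",".join(new_terms) + "->" + new_output
        if best is None or candidate < best:
            best = candidate
    return best, target

key_1 = canonical_einsum("cb,ca->ab", ["B", "B"])
key_2 = canonical_einsum("cb,ca->ba", ["B", "B"])

# With two different operands the expressions are transposes of each other,
# so they must not be merged.
key_3 = canonical_einsum("cb,ca->ab", ["B", "C"])
key_4 = canonical_einsum("cb,ca->ba", ["B", "C"])
```

Equal keys imply equal results, but unequal keys do not prove the expressions differ, so this check errs on the side of keeping both nodes.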

Possible Symbolic Execution Idea

We can use https://docs.sympy.org/latest/tutorial/intro.html#the-power-of-symbolic-computation.

This can do something like
simplify('x + x') = 2x

simplify('x - (x-xxxxx)') = xxxxx

Though, we would need to regenerate the node w.r.t the name.

WDYT?

OOM when running large-scale CTF experiments

When generating the optimized contraction path, the current implementation generates random tensors and passes them as inputs to NumPy's einsum path. This becomes an issue in large-scale experiments on parallel systems, where NumPy creates the big tensors on each process, resulting in OOM.
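
Since path search only needs operand shapes, one possible workaround is to pass zero-stride views instead of real tensors. The sketch below uses np.broadcast_to to build such views; the helper names are hypothetical, and this is an illustration of the idea rather than a proposed patch.

```python
import numpy as np

def shape_only_operand(shape, dtype=np.float64):
    """Read-only view with the requested shape that stores a single scalar."""
    return np.broadcast_to(np.zeros((), dtype=dtype), shape)

def contraction_path(subscripts, shapes):
    """Compute an einsum contraction path from shapes alone."""
    operands = [shape_only_operand(shape) for shape in shapes]
    path, _report = np.einsum_path(subscripts, *operands, optimize="greedy")
    return path

# A 10^5 x 10^5 float64 operand would need about 80 GB if materialized.
big = shape_only_operand((10**5, 10**5))
path = contraction_path(
    "ij,jk,kl->il",
    [(10, 10**5), (10**5, 10**5), (10**5, 10)],
)
```

The returned path can then be reused for the actual distributed contraction, so no process ever allocates the dense placeholders.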