xdiff.jl's Introduction

XDiff.jl - eXpression Differentiation package

This package is unique in that it can differentiate vector-valued expressions in Einstein notation. However, if you only need gradients of scalar-valued functions (which is typical in machine learning), please use XGrad.jl instead. XGrad.jl is a rethought and stabilized version of this package, adding many useful features in place of the (rarely used) derivatives of vector-valued functions. If you nevertheless want to continue using XDiff.jl, please pin Espresso.jl to version v3.0.0, which is the last version supporting Einstein notation.

XDiff.jl is an expression differentiation package, supporting a fully symbolic approach to finding tensor derivatives. Unlike automatic differentiation packages, XDiff.jl can output not only ready-to-use derivative functions but also their symbolic expressions, suitable for further optimization and code generation.

Expression differentiation

xdiff takes an expression and a set of "example values" and returns another expression that calculates the value together with the derivatives of the output variable w.r.t. each argument. Example values are anything similar to the expected data, i.e. with the same data type and size. In the example below we want w and x to be vectors of size (3,) and b to be a scalar.

# keyword arguments after the semicolon are "example values" - something that looks like the expected data
xdiff(:(y = sum(w .* x) + b); w=rand(3), x=rand(3), b=rand())
# quote 
#     dy!dx = @get_or_create(mem, :dy!dx, zeros(Float64, (3,)))
#     dy!dw = @get_or_create(mem, :dy!dw, zeros(Float64, (3,)))
#     y = @get_or_create(mem, :y, zero(Float64))
#     tmp658 = @get_or_create(mem, :tmp658, zero(Float64))
#     dy!dtmp658 = @get_or_create(mem, :dy!dtmp658, zero(Float64))
#     tmp658 = sum(w .* x)
#     y = tmp658 .+ b
#     dy!dtmp658 = 1.0
#     dy!dw .= x
#     dy!dx .= w
#     tmp677 = (y, dy!dw, dy!dx, dy!dtmp658)
# end

By default, xdiff generates highly optimized code that uses a set of buffers stored in a dictionary mem. You may also generate a slower, but more readable, expression using VectorCodeGen:

ctx = Dict(:codegen => VectorCodeGen())
xdiff(:(y = sum(w .* x) + b); ctx=ctx, w=rand(3), x=rand(3), b=rand())
# quote 
#     tmp691 = w' * x
#     y = tmp691 + b
#     dy!dtmp691 = 1.0
#     dy!db = 1.0
#     dy!dw = x
#     dy!dx = w
#     tmp698 = (y, dy!dw, dy!dx, dy!db)
# end

or in Einstein indexing notation using EinCodeGen:

ctx = Dict(:codegen => EinCodeGen())
xdiff(:(y = sum(w .* x) + b); ctx=ctx, w=rand(3), x=rand(3), b=rand())
# quote
#     tmp700 = w[i] .* x[i]
#     y = tmp700 + b
#     dy!dtmp700 = 1.0
#     dy!db = 1.0
#     dy!dw[j] = dy!dtmp700 .* x[j]
#     dy!dx[j] = dy!dtmp700 .* w[j]
#     tmp707 = (y, dy!dw, dy!dx, dy!db)
# end

Function differentiation

xdiff also provides a convenient interface for generating function derivatives:

# evaluate using `include("file_with_function.jl")` 
f(w, x, b) = sum(w .* x) .+ b

df = xdiff(f; w=rand(3), x=rand(3), b=rand())
df(rand(3), rand(3), rand())
# (0.8922305671741435, [0.936149, 0.80665, 0.189789], [0.735201, 0.000282879, 0.605989], 1.0)

Note that xdiff will try to extract the function body as it was written, but this doesn't always work smoothly. One common case where the function body isn't available is when the function is defined in the REPL, so for a better experience load functions using include(<filename>) or using <module>.

Limitations

  • loops are not supported
  • conditional branching is not supported

Loops and conditional operators may introduce discontinuity points, potentially resulting in very complex and heavy piecewise expressions, and thus are not supported. However, many such expressions may be rewritten into analytical form. For example, many loops may be rewritten into some aggregation function like sum() (already supported), and many conditions may be expressed as multiplication like f(x) * (x > 0) + g(x) * (x <= 0) (planned). Please submit an issue if you are interested in supporting some specific feature.
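
For illustration, here is what such analytical rewrites look like in plain Julia; these are hand-written equivalents, not transformations that XDiff performs automatically:

# a loop accumulating an elementwise function over xs ...
function loop_version(xs)
    y = 0.0
    for x in xs
        y += sin(x)
    end
    return y
end

# ... is equivalent to an aggregation call that xdiff already understands
agg_version(xs) = sum(sin.(xs))

# a branch ...
piecewise(x) = x > 0 ? x^2 : -x

# ... can be expressed as a sum of terms gated by boolean factors
gated(x) = x^2 * (x > 0) + (-x) * (x <= 0)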

How it works

On the high level, scalar expressions are differentiated as follows:

  1. Expression is parsed into an ExGraph - a set of primitive expressions, mostly assignments and single function calls.
  2. Resulting ExGraph is evaluated using example values to determine types and shape of all variables (forward pass).
  3. Similar to reverse-mode automatic differentiation, derivatives are propagated backward from output to input variables. Unlike AD, however, derivatives aren't represented as values, but as symbolic expressions (a rough sketch follows below).
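
As a rough, hand-written illustration of these steps (not actual XDiff internals), the example y = sum(w .* x) + b from above decomposes into primitive assignments, and the derivatives propagated backward are themselves expressions rather than numbers:

# step 1: parse into primitive calls (roughly what an ExGraph holds)
primitives = [
    :(tmp1 = w .* x),
    :(tmp2 = sum(tmp1)),
    :(y = tmp2 + b),
]

# step 3: propagate symbolic derivatives backward; each adjoint is an
# expression, not a numeric value
adjoints = [
    :(dy!dtmp2 = 1.0),
    :(dy!db = 1.0),
    :(dy!dtmp1 = dy!dtmp2),      # sum() passes the upstream adjoint to each element
    :(dy!dw = dy!dtmp1 .* x),    # since tmp1 = w .* x
    :(dy!dx = dy!dtmp1 .* w),
]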

Tensor expressions go through a very similar pipeline, but operate in Einstein notation:

  1. Tensor expression is transformed into Einstein notation.
  2. Expression in Einstein notation is parsed into an EinGraph (an indexed variant of ExGraph).
  3. Resulting EinGraph is evaluated.
  4. Partial derivatives are computed using tensor or element-wise rules for each element of each tensor, then propagated from output to input variables.
  5. Optionally, derivative expressions are converted back to vectorized notation.

xdiff.jl's People

Contributors

dfdx, femtocleaner[bot]

xdiff.jl's Issues

Test ALL to_buffered rules

with something like:

X = rand(...)
Y = rand(...)
Ze = zeros(...)
Zb = zeros(...)
eval(ein_expr)
eval(buf_expr)
@assert Ze == Zb

Sizes of variables should be determined automatically from the indices in Einstein notation.
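
One possible way to derive the sizes (a sketch; the index maps and dimensions below are made up for illustration):

# hypothetical: which index letters each tensor carries, e.g. for Z[i,j] = X[i,k] * Y[k,j]
index_letters = Dict(:X => [:i, :k], :Y => [:k, :j], :Z => [:i, :j])
dims = Dict(:i => 3, :k => 4, :j => 5)    # pick one concrete size per index letter

sizes = Dict(v => Tuple(dims[ix] for ix in idxs) for (v, idxs) in index_letters)
# sizes[:X] == (3, 4), sizes[:Y] == (4, 5), sizes[:Z] == (3, 5)

X, Y = rand(sizes[:X]...), rand(sizes[:Y]...)
Ze, Zb = zeros(sizes[:Z]...), zeros(sizes[:Z]...)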

Cache derivative function for different input sizes

Currently we generate derivative functions that are bound to input sizes. If the input size changes, the user needs to repeat the differentiation process, which may be costly. Rather than putting this on the user, we can maintain a cache of functions, one per set of input sizes, and provide the user with a "meta" function that adaptively selects the needed derivative function.
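
A minimal sketch of such a "meta" function; build_derivative is a hypothetical name standing in for the (expensive) call into XDiff's codegen:

# cache one compiled derivative function per tuple of input sizes
const DERIV_CACHE = Dict{Tuple, Function}()

function cached_deriv(build_derivative, args...)
    key = map(size, args)                 # e.g. ((3,), (3,), ()) for w, x, b
    df = get!(DERIV_CACHE, key) do
        build_derivative(args...)         # expensive: symbolic diff + codegen
    end
    return df(args...)
end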

We also need to keep track of buffer sizes in the memory dict. The easiest way would be to replace @get_or_create with something like @get_or_alloc that also checks the size of the buffer.

This issue indirectly depends on GPU codegen.

Improve correctness

  • temporarily remove automatic parsing of nested functions - it's not stable enough, and an explicit exception is better than a silent error
  • remove "assumed" broadcasting: if there's no rule in TDIFF_RULES, arrays shouldn't be automatically broadcasted

Info about upcoming removal of packages in the General registry

As described in https://discourse.julialang.org/t/ann-plans-for-removing-packages-that-do-not-yet-support-1-0-from-the-general-registry/ we are planning on removing packages that do not support 1.0 from the General registry. This package has been detected to not support 1.0 and is thus slated to be removed. The removal of packages from the registry will happen approximately a month after this issue is open.

To transition to the new Pkg system using Project.toml, see https://github.com/JuliaRegistries/Registrator.jl#transitioning-from-require-to-projecttoml.
To then tag a new version of the package, see https://github.com/JuliaRegistries/Registrator.jl#via-the-github-app.

If you believe this package has erroneously been detected as not supporting 1.0 or have any other questions, don't hesitate to discuss it here or in the thread linked at the top of this post.

Unpack struct fields

It would be much more pleasant to write:

struct MyModel
    W1::Matrix
    b1::Vector
    W2::Matrix
    b2::Vector
    W3::Matrix
    b3::Vector
end

predict_cost(m, x, y)
predict_cost_grad(m, x, y)

rather than

predict_cost(W1, b1, W2, b2, W3, b3, x, y)
predict_cost_grad(W1, b1, W2, b2, W3, b3, x, y)

I believe we can automatically generate intermediate functions like:

predict_cost(m_W1, m_b1, m_W2, m_b2, m_W3, m_b3, x, y)

We can then find derivatives for them and generate corresponding derivative functions for both flat arguments and structs (see the sketch below).
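
A sketch of the unpacking step; predict_cost_flat stands for the hypothetical generated function that takes flat arguments:

# turn a call with a struct argument into a call with flat arguments
function predict_cost(m::MyModel, x, y)
    flat_args = (getfield(m, f) for f in fieldnames(MyModel))
    return predict_cost_flat(flat_args..., x, y)
end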

Reduce common subexpressions

Example from autoencoder derivatives:

julia> dexs[:We1]
quote  # /home/slipslop/.julia/v0.6/Espresso/src/from_einstein.jl, line 101:
    ##707 = squeeze(sum(Wd, 1), 1)
    ##708 = ##707 .* We2
    ##709 = squeeze(sum(##708, 1), 1)
    ##710 = ##709 .* x
    dcost_dWe1 = repmat((squeeze(sum(##710, 2), 2))', size(We1, 1))
end

julia> dexs[:x]
quote  # /home/slipslop/.julia/v0.6/Espresso/src/from_einstein.jl, line 101:
    ##711 = -1.0
    ##712 = squeeze(sum(Wd, 1), 1)
    ##713 = ##712 .* We2
    ##714 = squeeze(sum(##713, 1), 1)
    ##715 = ##714 .* We1
    ##716 = ##711 .+ ##715
    dcost_dx = transpose(squeeze(sum(##716, 1), 1))
end

Here variables ##707 to ##709 compute the same thing as variables ##712 to ##714. It should be easy to rewrite all references to the latter into references to the former and remove the unused variables.
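
A sketch of one way to do this on a list of primitive assignments (plain Julia over Expr objects, not actual Espresso internals):

# replace symbols according to `remap`, recursing into sub-expressions
subst(x, remap) = x
subst(s::Symbol, remap) = get(remap, s, s)
subst(ex::Expr, remap) = Expr(ex.head, (subst(a, remap) for a in ex.args)...)

# if two assignments end up with identical right-hand sides, remap the later
# variable to the earlier one and drop the duplicate assignment
function eliminate_duplicates(assignments::Vector{Expr})
    seen = Dict{Any, Symbol}()            # rhs => first variable computing it
    remap = Dict{Symbol, Symbol}()
    kept = Expr[]
    for ex in assignments
        var, rhs = ex.args[1], subst(ex.args[2], remap)
        if haskey(seen, rhs)
            remap[var] = seen[rhs]
        else
            seen[rhs] = var
            push!(kept, Expr(:(=), var, rhs))
        end
    end
    return kept, remap
end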

`*` transformed to `.*` for scalars

I am playing around with Espresso and XDiff, because I think it would be cool for some future version of LossFunctions.jl to generate all the losses, their derivatives, and their properties just based on defining the function (like @makeloss L(x) = 1/2 * x^2).

Taking the example of the simple, popular least squares loss L(x) = 1/2 * x^2, I found that rdiff changes the operation * to the broadcasted version .*, which seems to prevent constant folding:

julia> using Espresso, XDiff

julia> rdiff(:(1/2 * x^2), x=1.)
Dict{Symbol,Expr} with 1 entry:
  :x => :(0.5 .* (2x))

julia> simplify(:(0.5 .* (2x)))
:(0.5 .* (2x))

julia> simplify(:(0.5 * (2x)))
:x

Is this expected behaviour? How would you suggest dealing with such cases?

Multiple methods for a derivative function

We need methods like foo_deriv_123(W, x, b, y) and foo_deriv_123(W, x, b, y, mem) so that we can call them using Base.invokelatest. This may also be a good starting point for #19.

Can be achieved using something like:

func_expr1 = ...
func_expr2 = ...
f = eval(mod, func_expr1)
eval(mod, func_expr2)

Special functions

Currently we support:

  1. Matrix multiplication
  2. Summation
  3. Element-wise operations and broadcasting
  4. Simple convolutions

These operations work well in Einstein notation and make it possible to find things like Jacobians automatically. However, it's also possible to add gradient-only functions (e.g. segmented_sum), which I will call "special functions".

Support simple loops

Although the preferred way to express iteration is through aggregation functions, we may still encounter loops in third-party code, and it would be nice to support at least simple ones. One particular example is:

y = 0.0
for x in xs
    y += f(x)
end

We have 2 options here:

  1. Preprocess loops, transforming them into aggregation functions. E.g. the loop above may be transformed into sum(f.(xs)) (a rough sketch of this rewrite follows below).
  2. Translate a derivative of a loop into a loop of derivatives. This is more general-purpose, but also looks much harder to implement.
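
A rough sketch of option 1 for exactly this shape of loop (plain Expr manipulation; only the narrow y += f(x) pattern is handled):

function loop_to_sum(loop_ex::Expr)
    @assert loop_ex.head == :for
    itervar, coll = loop_ex.args[1].args                   # `x in xs` => (:x, :xs)
    body = filter(a -> !(a isa LineNumberNode), loop_ex.args[2].args)
    stmt = body[1]                                         # expect a single `y += f(x)`
    @assert stmt.head == Symbol("+=")
    acc, call = stmt.args                                  # (:y, :(f(x)))
    @assert call.args[2] == itervar
    bcast = Expr(:., call.args[1], Expr(:tuple, coll))     # :(f.(xs))
    return :($acc = sum($bcast))
end

loop_to_sum(:(for x in xs; y += f(x); end))
# => :(y = sum(f.(xs)))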

Deprecate from_einstein?

I feel like I'm constantly doing double work to translate expressions from Einstein notation to vectorized and to buffered notations. It's not only annoying, but also error-prone. The main question is: can we remove from_einstein and use to_buffered everywhere?

So far the only place where from_einstein is actively used is in evaluate!.

Documentation

Just to track the most important parts that need to be documented:

  • derivative of an expression
  • derivative of a function
  • gradient vs. Jacobian, vectorized vs. Einstein notation
  • special functions
  • diff rules (@scalardiff, @tensordiff, @specialdiff)
  • code generation
  • GPU
  • benchmarks

Support broadcasting

  1. Parse broadcasting expressions (e.g. log.(x)) into a new ExNode{:broadcast}.
  2. Remove support for automatic broadcasting for elementwise functions.
  3. Adjust differentiation rules.

Parse nested function calls

E.g.:

predict(W, b, x) = W * x + b
predict_cost(W, b, x, y) = sumabs2(predict(W, b, x) - y)

Most of this functionality is already in place; IIRC the only issue is conflicting names of intermediate variables in the nested and main functions, which we may simply rename.
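
A sketch of the renaming step using MacroTools.postwalk for the tree walk (the prefixing scheme and the decomposed body below are just for illustration):

using MacroTools: postwalk

# both the caller and the inlined body may introduce a `tmp1`; prefixing the
# inlined names avoids the clash
inner = quote              # decomposed body of predict(W, b, x)
    tmp1 = W * x
    tmp2 = tmp1 + b
end
locals = (:tmp1, :tmp2)
renamed = postwalk(s -> s isa Symbol && s in locals ? Symbol(:predict_, s) : s, inner)
# tmp1/tmp2 become predict_tmp1/predict_tmp2; W, b, x are left untouched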

Self-consistency checks?

Just fixed a bug in eliminate_common - one of many graph transformation subroutines that are extremely hard to debug. To discover issues with such utils we can add self-consistency checks. The simplest one is to evaluate the graph before and after the transformation and test whether some invariants hold. The price is longer derivative compilation, so this should only be done when a debug option/global variable is set.
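
A sketch of what such a check could look like; run_graph is a hypothetical helper standing in for whatever API evaluates a graph to its output value:

const DEBUG_CHECKS = Ref(false)   # global debug switch

function checked_transform(transform!, g)
    DEBUG_CHECKS[] || return transform!(g)
    before = run_graph(deepcopy(g))     # hypothetical: evaluate graph to its output
    result = transform!(g)
    after  = run_graph(deepcopy(g))
    @assert isapprox(before, after) "graph transformation changed the result"
    return result
end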

`meta` field and keyword arguments

I find more and more use cases for a meta field in ExNode. One of them is to store keyword parameters of a function call. For example, we could parse this:

Y = conv(X, W; stride=2)

to an ExNode like this:

ExNode
  var = Y
  ex = conv(X, W)
  meta = {kw = (stride => 2)}
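
For reference, this is how Julia itself parses such a call, together with a small sketch of pulling the keywords out into a separate structure of the kind meta could hold:

ex = :(conv(X, W; stride=2))
# ex.args == [:conv, Expr(:parameters, Expr(:kw, :stride, 2)), :X, :W]

params = ex.args[2]                                            # the :parameters node
kw = Dict(p.args[1] => p.args[2] for p in params.args)         # Dict(:stride => 2)
call_without_kw = Expr(:call, ex.args[1], ex.args[3:end]...)   # :(conv(X, W))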
