juliadiff / ReverseDiff.jl
Reverse Mode Automatic Differentiation for Julia
License: Other
Right now, if I have a function f(a, b, c) and I only want to create a function that returns the gradient w.r.t. a and b, I have two options:
∇f(a, b, c) = ReverseDiff.gradient((x, y) -> f(x, y, c), (a, b))
∇f! = ReverseDiff.compile_gradient(f, (a, b, c)), and just ignore the c gradient that will pop out.
The former has to re-record the function for every call, while the latter wastes some computation differentiating w.r.t. c.
We should support something akin to TensorFlow's placeholders for the pre-recorded API, allowing you to drop in updatable parameters that aren't differentiated against. This can be accomplished by recording the tape as normal, and then "turning off" differentiation on the selected parameters (the idiom for that currently is to set the tape to NULL_TAPE, but I'm going to play around with it). Some refactoring should probably be done to get the most out of this change performance-wise (e.g., allow the instantiation of a TrackedArray with deriv == nothing).
As for the API, I can think of two different paths we could take:
- a wrt function, e.g. ReverseDiff.compile_gradient(f, (wrt(a), wrt(b), c))
- a param function, e.g. ReverseDiff.compile_gradient(f, (a, b, param(c)))
I tried to compute the gradient of a quadratic form. However, it failed with an ambiguity error as follows:
import ReverseDiff
const A = [1.0 2.0; 2.0 5.0]
quadratic(x) = x' * A * x
ReverseDiff.gradient(quadratic, ones(2))
MethodError: *(::RowVector{ReverseDiff.TrackedReal{Float64,Float64,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}}, ::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}) is ambiguous. Candidates:
*(x::AbstractArray{T,2} where T, y::ReverseDiff.TrackedArray{V,D,N,VA,DA} where DA where VA where N) where {V, D} in ReverseDiff at /Users/kenta/.julia/v0.6/ReverseDiff/src/derivatives/linalg/arithmetic.jl:193
*(x::AbstractArray, y::ReverseDiff.TrackedArray{V,D,N,VA,DA} where DA where VA where N) where {V, D} in ReverseDiff at /Users/kenta/.julia/v0.6/ReverseDiff/src/derivatives/linalg/arithmetic.jl:193
*(rowvec::RowVector{T,V} where V<:(AbstractArray{T,1} where T), vec::AbstractArray{T,1}) where T<:Real in Base.LinAlg at linalg/rowvector.jl:170
Possible fix, define
*(::RowVector{ReverseDiff.TrackedReal{V,D,ReverseDiff.TrackedArray{V,D,1,VA,DA}},V} where V<:(AbstractArray{T,1} where T), ::ReverseDiff.TrackedArray{V,D,1,VA,DA})
Stacktrace:
[1] * at ./operators.jl:424 [inlined]
[2] quadratic(::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}) at ./In[32]:2
[3] Type at /Users/kenta/.julia/v0.6/ReverseDiff/src/api/tape.jl:199 [inlined]
[4] gradient(::Function, ::Array{Float64,1}, ::ReverseDiff.GradientConfig{ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}}) at /Users/kenta/.julia/v0.6/ReverseDiff/src/api/gradients.jl:22 (repeats 2 times)
I think matrix multiplication is already supported. I'm not sure whether it is an unsupported feature or a kind of bug, so let me file an issue here.
I'm using ReverseDiff.jl v0.1.4 on Julia 0.6.
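A possible workaround while the ambiguity stands (a sketch, not a fix for the dispatch clash itself): reformulate the quadratic form so the RowVector product never appears.
quadratic2(x) = sum(x .* (A * x))   # same value as x' * A * x for this symmetric A
ReverseDiff.gradient(quadratic2, ones(2))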
I did not include a scalar derivative function in the ReverseDiff API on purpose, since ForwardDiff is a much better tool for such a method. However, we do allow users to pass in tuples of arguments to API methods. In this case, we should allow elements of the tuples to be scalars. For example, see here; it's unintuitive (and poor style for the objective code) that b should need to be an array and not a scalar.
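For illustration, the kind of call this would enable (hypothetical; today the scalar element raises an error, so it has to be wrapped in an array first):
g(a, b) = sum(a .* b)                              # illustrative function, not the one linked above
ReverseDiff.gradient(g, ([1.0, 2.0, 3.0], 2.0))    # scalar b passed directly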
We should explore the use of techniques such as edge-pushing, graph coloring, etc. to discover and exploit sparsity patterns in ReverseDiff's second-order derivative computations.
Maybe sparsity optimization could be employed automagically whenever nested tapes are encountered?
Removing ReverseDiff allows the latest ForwardDiff again:
julia> Pkg.rm("ReverseDiff")
INFO: Installing CommonSubexpressions v0.0.1
INFO: Upgrading ForwardDiff: v0.4.2 => v0.5.0
INFO: Installing RealInterface v0.0.3
INFO: Removing FunctionWrappers v0.1.0
INFO: Removing ReverseDiff v0.1.5
INFO: Package database updated
julia> x_seed = [1.0]
1-element Array{Float64,1}:
1.0
julia> f(x) = x
f (generic function with 1 method)
julia> gcfg = ReverseDiff.GradientConfig(x_seed)
ReverseDiff.GradientConfig
julia> g! = (x, out) -> ReverseDiff.gradient!(out, f, x, gcfg)
(::#17) (generic function with 1 method)
julia> g!([1.0], [0.0])
1-element Array{Float64,1}:
0.0
julia> f(x) = 1.0*x
f (generic function with 1 method)
julia> g!([1.0], [0.0])
1-element Array{Float64,1}:
1.0
julia> f(x) = 3.0*x
f (generic function with 1 method)
julia> g!([1.0], [0.0])
1-element Array{Float64,1}:
3.0
Granted, not the most interesting function in the world, but there must be something that you're missing here, since it doesn't think the function changes with x.
p = randn(2,3)
ReverseDiff.@forward f(p) = exp.(p) ./ sum(exp.(p), 2) # softmax
f! = ReverseDiff.compile_gradient(x -> sum(f(x)), similar(p))
f!(similar(p), p)
Gives the following error
ERROR: MethodError: no method matching broadcast_deriv_increment!(::Array{ReverseDiff.TrackedReal{Float64,Float64,Void},2}, ::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}, ::Void)
Closest candidates are:
broadcast_deriv_increment!(::AbstractArray{T,N}, ::Any) at /home/dom/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:632
broadcast_deriv_increment!(::Any, ::Any, ::Ref{T}) at /home/dom/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:569
broadcast_deriv_increment!(::AbstractArray{T,N}, ::Any, ::AbstractArray{T,N}) at /home/dom/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:673
...
in special_reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#./,Tuple{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},Array{ReverseDiff.TrackedReal{Float64,Float64,Void},2}},ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},Tuple{Array{Float64,2},Void}}) at /home/dom/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:465
in reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#./,Tuple{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},Array{ReverseDiff.TrackedReal{Float64,Float64,Void},2}},ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},Tuple{Array{Float64,2},Void}}) at /home/dom/.julia/v0.5/ReverseDiff/src/tape.jl:74
in (::##33#34)() at /home/dom/.julia/v0.5/ReverseDiff/src/api/tape.jl:80
in seeded_reverse_pass!(::Array{Float64,2}, ::ReverseDiff.TrackedReal{Float64,Float64,Void}, ::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.Compiled{ReverseDiff.GradientTape{##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void}},##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void},##31#32,##33#34}) at /home/dom/.julia/v0.5/ReverseDiff/src/api/utils.jl:30
in seeded_reverse_pass! at /home/dom/.julia/v0.5/ReverseDiff/src/api/tape.jl:41 [inlined]
in gradient!(::Array{Float64,2}, ::ReverseDiff.Compiled{ReverseDiff.GradientTape{##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void}},##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void},##31#32,##33#34}, ::Array{Float64,2}) at /home/dom/.julia/v0.5/ReverseDiff/src/api/gradients.jl:80
in (::ReverseDiff.##301#302{ReverseDiff.Compiled{ReverseDiff.GradientTape{##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void}},##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void},##31#32,##33#34}})(::Array{Float64,2}, ::Array{Float64,2}) at /home/dom/.julia/v0.5/ReverseDiff/src/api/tape.jl:100
If I do ReverseDiff.@forward f(p) = exp.(p) ./ sum(exp.(p)) there's no error. So, I'm guessing it's something to do with adding the dimension to sum over.
I'm confused by how .^ works.
using ReverseDiff
x = [-1. -2; -1 -2]
y = [1.; 2]
w = [2.; 3]
b = [-3.]
ReverseDiff.@forward sigmoid(x) = 1. ./ (1. + exp(-x))
#= mse(a, y) = mean((a - y) .* (a - y)) =#
mse(a, y) = mean((a - y) .^ 2)
f1(x, y, w, b) = mse(sigmoid(x*w .+ b), y)
out = f1(x, y, w, b)
println(out)
inputs = (x, y, w, b)
result = map(similar, inputs)
f1! = ReverseDiff.compile_gradient(f1, result)
@time f1!(result, inputs)
@time f1!(result, inputs)
for r in result
    println(r)
end
#= Expected output =#
#= =#
#= [array([[ -3.34017280e-05, -5.01025919e-05], =#
#= [ -6.68040138e-05, -1.00206021e-04]]), array([[ 0.9999833], =#
#= [ 1.9999833]]), array([[ 5.01028709e-05], =#
#= [ 1.00205742e-04]]), array([ -5.01028709e-05])] =#
(a - y) .^ 2 is calling https://github.com/JuliaDiff/ReverseDiff.jl/tree/master/src/derivatives#L512, which has a call to log, so if some values are negative it'll throw a DomainError. I'm wondering what's going on here because on the surface we're just squaring values. mse(a, y) = mean((a - y) .* (a - y)) works as expected.
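For reference, two reformulations that avoid the generic x^y pullback (sketches; abs2 is assumed to follow the plain multiplication path rather than the pow rule):
mse_mul(a, y)  = mean((a - y) .* (a - y))
mse_abs2(a, y) = mean(abs2.(a - y))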
Originally, I thought we'd need a special macro directive for this, but I now believe this can just work by allowing the user to pass in GPU-backed arrays as input to the API methods. In other words, ReverseDiff won't provide a GPU-backed array type; it will compose with the GPU-backed array types provided by other well-established libraries.
For now, Julia's ArrayFire wrapper should be sufficient to start playing around with this functionality for a subset of ReverseDiff's methods - at least enough to build some proof-of-concept examples. Eventually, I'd like to support GPUArrays.jl, though it's probably not productive to work on that until GPUArrays has solid linear algebra coverage out-of-the-box.
In some cases, existing derivative definitions might "just work" for GPU-backed arrays (though they might have poor performance). In many cases, we'll likely have to dispatch on the array type to write special derivative methods for GPU-backed arrays.
cc the GPU-related folks: @SimonDanisch @vchuravy @maleadt @ranjanan @MikeInnes
Hi Jared,
Awesome project here! I do think it's lacking a bit in the example realm. The gradient.jl example is nice and shows the API, but I'm still not sure how it fits into a more complicated example.
Maybe a neural net training on MNIST would be a good starting point. I've tried to do this myself but ran into some issues.
https://github.com/HIPS/autograd/tree/master/examples has a bunch of good ones.
I'm testing out some neural net stuff and I found an error when implementing a softmax function.
https://gist.github.com/domluna/bf1d3061244e7d7d9a3da467a6614006
I commented out quite a bit to try to get a minimal example. The softmax function isn't correct, since sum over dimensions isn't yet implemented, so I changed it to just sum(exped) to try to make it work.
ERROR: LoadError: MethodError: no method matching eachindex(::ReverseDiffPrototype.TraceReal{Float64,Float64})
Closest candidates are:
eachindex(::Tuple) at tuple.jl:19
eachindex(::Tuple, ::Tuple...) at tuple.jl:22
eachindex(::AbstractArray{T,1}) at abstractarray.jl:679
...
in propagate_adjoint!(::ReverseDiffPrototype.TraceReal{Float64,Float64}, ::ReverseDiffPrototype.TraceReal{Float64,Float64}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/backprop.jl:82
in special_backprop_step!(::Base.Broadcast.#broadcast, ::Tuple{Array{ReverseDiffPrototype.TraceReal{Float64,Float64},2},ReverseDiffPrototype.TraceReal{Float64,Float64}}, ::ReverseDiffPrototype.TraceReal{Float64,Float64}, ::ReverseDiffPrototype.TraceReal{Float64,Float64}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/backprop.jl:74
in backprop_step!(::ReverseDiffPrototype.TraceNode{Base.Broadcast.#broadcast,Tuple{Array{ReverseDiffPrototype.TraceReal{Float64,Float64},2},ReverseDiffPrototype.TraceReal{Float64,Float64}},ReverseDiffPrototype.TraceReal{Float64,Float64},ReverseDiffPrototype.TraceReal{Float64,Float64}}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/backprop.jl:20
in backprop!(::Array{ReverseDiffPrototype.TraceNode,1}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/backprop.jl:14
in gradient(::##243#245{Array{Float64,2},Array{Float64,2},Array{Int64,1}}, ::Array{Float64,2}, ::Array{ReverseDiffPrototype.TraceReal{Float64,Float64},2}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/api.jl:8
in nn_backward(::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,2}, ::Array{Int64,1}) at /Users/dluna/.julia/v0.5/Butterfly/src/rdp.jl:34
in include_from_node1(::String) at ./loading.jl:426
in include_from_node1(::String) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
while loading /Users/dluna/.julia/v0.5/Butterfly/src/rdp.jl, in expression starting on line 39
The following snippet:
f(w, x, b) = sum(w .* x .+ b)
ReverseDiff.gradient(f, ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], 3.0))
gives this error:
ERROR: MethodError: objects of type Float64 are not callable
Stacktrace:
[1] map(::ReverseDiff.##297#298{Float64,Array{ReverseDiff.AbstractInstruction,1}}, ::Tuple{Array{Float64,1},Array{Float64,1},Float64}) at ./tuple.jl:160
[2] Type at /home/slipslop/.julia/v0.6/ReverseDiff/src/api/Config.jl:46 [inlined]
[3] Type at /home/slipslop/.julia/v0.6/ReverseDiff/src/api/Config.jl:37 [inlined] (repeats 2 times)
[4] gradient(::Function, ::Tuple{Array{Float64,1},Array{Float64,1},Float64}) at /home/slipslop/.julia/v0.6/ReverseDiff/src/api/gradients.jl:22
Note that this error disappears if I remove b, which is the only scalar in the expression.
Julia 0.6-rc2
ReverseDiff.jl 820690c (latest master)
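The workaround at the time of writing (a sketch) is to wrap the scalar so that every tuple element is an array:
f(w, x, b) = sum(w .* x .+ b)
ReverseDiff.gradient(f, ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0]))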
Compiled tapes work fine from the REPL, but trying to construct one inside a function results in everyone's favorite 0.6 error:
julia> using ReverseDiff
julia> function foo(f, x)
           tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x))
           result = similar(x)
           ReverseDiff.gradient!(result, tape, x)
       end
foo (generic function with 1 method)
julia> g = y -> sum(x -> x^3 + x^2 - x - sin(x), y)
(::#3) (generic function with 1 method)
julia> foo(g, rand(10))
ERROR: MethodError: no method matching forward_pass!(::ReverseDiff.CompiledTape{Symbol("##657"),ReverseDiff.GradientTape{##3#5,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}})
The applicable method may be too new: running in world age 21811, while current world is 21814.
Closest candidates are:
forward_pass!(::ReverseDiff.CompiledTape{Symbol("##657"),ReverseDiff.GradientTape{##3#5,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}}) at /Users/rdeits/.julia/v0.6/ReverseDiff/src/api/tape.jl:84 (method too new to be called from this world context.)
forward_pass!(::ReverseDiff.CompiledTape{Symbol("##656"),ReverseDiff.GradientTape{#f,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}}) at /Users/rdeits/.julia/v0.6/ReverseDiff/src/api/tape.jl:84
forward_pass!(::Array{ReverseDiff.AbstractInstruction,1}) at /Users/rdeits/.julia/v0.6/ReverseDiff/src/tape.jl:76
...
Stacktrace:
[1] seeded_forward_pass! at /Users/rdeits/.julia/v0.6/ReverseDiff/src/api/tape.jl:42 [inlined]
[2] gradient!(::Array{Float64,1}, ::ReverseDiff.CompiledTape{Symbol("##657"),ReverseDiff.GradientTape{##3#5,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}}, ::Array{Float64,1}) at /Users/rdeits/.julia/v0.6/ReverseDiff/src/api/gradients.jl:79
[3] foo(::Function, ::Array{Float64,1}) at ./REPL[2]:4
julia> versioninfo()
Julia Version 0.6.0-rc2.0
Commit 68e911be53 (2017-05-18 02:31 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)
(this is with ReverseDiff master)
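A possible workaround for the world-age problem (a sketch; foo2 is just an illustrative name): route the call through Base.invokelatest, available on 0.6, so the freshly compiled tape's generated methods are visible.
julia> function foo2(f, x)
           tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x))
           result = similar(x)
           Base.invokelatest(ReverseDiff.gradient!, result, tape, x)
           result
       end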
I'm probably doing something dumb here:
using ReverseDiff
using ReverseDiff: GradientConfig, gradient
f(W, b, x) = W*x + b
softmax(xs) = exp.(xs)/sum(exp(xs))
mse(x, y) = mean((x .- y).^2)
function net(W, b, x, y)
    ŷ = softmax(f(W, b, x))
    mse(ŷ, y)
end
W = randn(5, 10)
b = randn(5)
x = rand(10)
y = rand(5)
inputs = (W, b, x, y)
net(inputs...)
gradient(f, inputs)
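Two things that look like the likely slips (guesses, not a confirmed diagnosis): the gradient call targets f rather than net, and exp is applied to the whole vector instead of being broadcast. A sketch of the probable intent:
softmax(xs) = exp.(xs) ./ sum(exp.(xs))
gradient(net, inputs)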
Just realized I misinterpreted the point of #5, so I'm superseding that issue with this one. We should probably have a mechanism for user-defined derivatives that works for n-ary, multivariate functions.
When trying
using ReverseDiff
f(a, b) = sum(a' * b + a * b')
a, b = rand(100, 100), rand(100, 100)
inputs = (a, b)
gradient(f, (a,b))
I receive an error:
ERROR: MethodError: no method matching gradient(::#f, ::Tuple{Array{Float64,2},Array{Float64,2}})
Closest candidates are:
gradient(::Function) at /home/paulo/.julia/v0.6/Calculus/src/derivative.jl:22
gradient(::Function, ::Union{Array{T<:Number,1}, T<:Number}) where T<:Number at /home/paulo/.julia/v0.6/Calculus/src/derivative.jl:17
gradient(::Function, ::Union{Array{T<:Number,1}, T<:Number}, ::Symbol) where T<:Number at /home/paulo/.julia/v0.6/Calculus/src/derivative.jl:17
...
Stacktrace:
[1] macro expansion at ./REPL.jl:97 [inlined]
[2] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
Here are my packages:
julia> Pkg.status()
4 required packages:
- Distances 0.4.1
- Distributions 0.14.0
- MLBase 0.6.1
- ReverseDiff 0.1.4
35 additional packages:
- AbstractFFTs 0.1.0
- BinDeps 0.6.0
- Calculus 0.2.2
- CatViews 0.1.0
- Compat 0.26.0
- DataArrays 0.5.3
- DataFrames 0.10.0
- DataStructures 0.5.3
- DiffBase 0.2.0
- FFTW 0.0.2
- FileIO 0.4.2
- ForwardDiff 0.4.2
- FunctionWrappers 0.1.0
- GZip 0.3.0
- Iterators 0.3.1
- Learn 0.0.0- master (unregistered)
- LearnBase 0.1.6
- LossFunctions 0.1.0
- MLDataPattern 0.1.2
- MLDataUtils 0.2.0
- MLLabelUtils 0.1.4
- MappedArrays 0.0.7
- NaNMath 0.2.5
- PDMats 0.7.0
- QuadGK 0.1.2
- RecipesBase 0.2.0
- Reexport 0.0.3
- Rmath 0.1.7
- SHA 0.3.3
- ShowItLikeYouBuildIt 0.0.1
- SortingAlgorithms 0.1.1
- SpecialFunctions 0.1.1
- StatsBase 0.17.0
- StatsFuns 0.5.0
- URIParser 0.1.8
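Judging from the candidate list, the unqualified gradient here resolves to Calculus.gradient rather than ReverseDiff's (a guess from the error message; ReverseDiff does not appear to export gradient). Qualifying the call should dispatch to the tuple method:
ReverseDiff.gradient(f, (a, b))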
Our implementation will break if you try to perform concurrent differentiations over the same function, since the TraceReal tag will always point to the same Trace.
Many of ReverseDiff's linear algebra derivative implementations are based on a wonderful reference paper by Mike B. Giles. It turns out that, in his infinite generosity, Giles released an extended version of the paper shortly after the original paper, which contains even more derivative definitions (including some factorizations).
We should implement the rest of these in ReverseDiff for the sake of completeness.
(Thanks to @tpapp for exposing me to the extended version of the Giles paper in his ForwardDiff issue.)
This is actually a bug in Julia, not ReverseDiff, but I wanted to open this issue to track the ReverseDiff fix here. The corresponding issue in Base is JuliaLang/julia#20200.
Here's an example of how this bug affects ReverseDiff:
julia> ReverseDiff.JacobianConfig(rand(4), rand(4))
ERROR: MethodError: no method matching ReverseDiff.GradientConfig(::Array{Float64,1}, ::Array{Float64,1})
Closest candidates are:
ReverseDiff.GradientConfig{T}(::AbstractArray{T,N} where N) at /Users/jarrettrevels/.julia/v0.6/ReverseDiff/src/api/Config.jl:35
ReverseDiff.GradientConfig{T}(::Any) at sysimg.jl:24
ReverseDiff.GradientConfig{T}(::AbstractArray{T,N} where N, ::Array{ReverseDiff.AbstractInstruction,1}) at /Users/jarrettrevels/.julia/v0.6/ReverseDiff/src/api/Config.jl:35
...
Stacktrace:
[1] ReverseDiff.JacobianConfig(::Array{Float64,1}, ::Array{Float64,1}) at /Users/jarrettrevels/.julia/v0.6/ReverseDiff/src/api/Config.jl:86
Cross-posted from https://discourse.julialang.org/t/nesting-forwarddiff-inside-reversediff/3684/3
The following (possibly ill-advised) nesting of ForwardDiff inside ReverseDiff fails:
using ForwardDiff, ReverseDiff
x, w = rand(3), rand(3)   # example inputs (not part of the original snippet)
f = (x, w) -> sum(x .* w)
g = (x, w) -> ForwardDiff.gradient(x -> f(x, w), x)
ReverseDiff.jacobian(w -> g(x, w), w)
MethodError: Cannot `convert` an object of type ForwardDiff.Dual{2,Float64} to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
in increment_deriv! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/propagation.jl:34 [inlined]
in broadcast_increment_deriv!(::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Array{ForwardDiff.Dual{2,Float64},1}, ::CartesianIndex{1}, ::CartesianIndex{1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/propagation.jl:142
in special_reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#.*,Tuple{Array{ForwardDiff.Dual{2,Float64},1},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{ForwardDiff.Dual{2,Float64},Float64,1,Array{ForwardDiff.Dual{2,Float64},1},Array{Float64,1}},Tuple{CartesianIndex{1},CartesianIndex{1}}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:473
in reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#.*,Tuple{Array{ForwardDiff.Dual{2,Float64},1},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{ForwardDiff.Dual{2,Float64},Float64,1,Array{ForwardDiff.Dual{2,Float64},1},Array{Float64,1}},Tuple{CartesianIndex{1},CartesianIndex{1}}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/tape.jl:93
in reverse_pass!(::Array{ReverseDiff.AbstractInstruction,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/tape.jl:87
in seeded_reverse_pass!(::Array{Float64,2}, ::Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}, ::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}, ::ReverseDiff.JacobianTape{##137#138,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/utils.jl:51
in seeded_reverse_pass! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/tape.jl:47 [inlined]
in jacobian!(::Array{Float64,2}, ::ReverseDiff.JacobianTape{##137#138,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}}, ::Array{Float64,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:122
in jacobian! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:106 [inlined]
in jacobian(::Function, ::Array{Float64,1}, ::ReverseDiff.JacobianConfig{ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Void}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:25
in jacobian(::Function, ::Array{Float64,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:23
We want this so that custom Array types that don't subtype AbstractArray can work. Also, arrays of distributions with TapeReal parameters.
Possible API: wrap and unwrap to convert Arrays of TapeReals to TapeArrays and vice-versa.
I think this is a mistake (and maybe related to trying to do ReLU in #77?).
When I tried to find the gradient for a function that sometimes doesn't use one of its inputs, instead of just that one input's gradient turning to 0, the gradients for all the elements of the input vector turned to 0:
julia> f(X) = sum([if x > 0; x else 0 end for x in X])
>> f (generic function with 1 method)
julia> ReverseDiff.gradient(f, [2,3,-1])
>> 3-element Array{Int64,1}:
0
0
0
julia> ReverseDiff.gradient(f, [2,3,1])
>> 3-element Array{Int64,1}:
1
1
1
I think this is a mistake? Only the third element in the returned array should be 0, since the other two variables do have a linear effect on the output of f.
Also, though, it currently does get this right for max(0,x), which is another way to implement the same thing (I guess it considers max(0,x) inclusive for x==0):
julia> f(X) = sum([max(0, x) for x in X])
>> f (generic function with 1 method)
julia> ReverseDiff.gradient(f, [2,3,0])
>> 3-element Array{Int64,1}:
1
1
1
julia> ReverseDiff.gradient(f, [2,3,-1])
>> 3-element Array{Int64,1}:
1
1
0
The following works for gradient!():
using DiffBase, ReverseDiff
f(x) = sum(sin, x)+prod(tan, x)*sum(sqrt, x);
x = rand(4);
result = DiffBase.GradientResult(x);
rcfg = ReverseDiff.GradientConfig(x);
ReverseDiff.gradient!(result, f, x, rcfg);
DiffBase.value(result)
DiffBase.gradient(result)
However, the Hessian analogue of the above fails:
using DiffBase, ReverseDiff
f(x) = sum(sin, x)+prod(tan, x)*sum(sqrt, x);
x = rand(4);
result = DiffBase.HessianResult(x);
rcfg = ReverseDiff.HessianConfig(x);
ReverseDiff.hessian!(result, f, x, rcfg);
DiffBase.value(result)
DiffBase.gradient(result)
DiffBase.hessian(result)
MWE:
using ReverseDiff
f(x) = fma(x...)
ReverseDiff.gradient(f, zeros(3))
julia> f(x)=sum((x-3).^3+2(x+1).^2)
julia> g=ReverseDiff.gradient(f,[.3])
ERROR: DomainError:
in nan_dom_err at ./math.jl:196 [inlined]
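A reformulation that sidesteps the generic x^y rule (which takes a log of the base) is sketched below, assuming the DomainError comes from the negative base x - 3:
julia> f2(x) = sum((x-3).*(x-3).*(x-3) + 2(x+1).*(x+1))
julia> ReverseDiff.gradient(f2, [.3])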
From the Discourse announcement I made a while back:
Internally, ReverseDiff contains facilities for recording execution traces of native Julia code to reusable, compilable instruction tapes, as well as mechanisms for propagating values "forwards" and "backwards" through these tapes. Since these tapes can be analyzed as computation graphs, my hope is that this infrastructure can eventually be rendered useful for non-AD purposes, such as performance optimization, scheduled parallel execution, and constraint programming.
I've opened this issue to track this effort. Right now, TrackedArrays/TrackedReals and the instruction tape are written specifically for differentiation purposes. It will require a lot of thought and experimentation to figure out the right way to restructure this code to be generally useful for non-AD purposes. Eventually, this code might be factored out of ReverseDiff and into a different package.
cc @dpsanders (who might someday find this stuff useful for interval constraint programming).
Matrix derivatives have complex indexing, but for defining univariate derivatives, it's nice to have a macro. Currently, most cases can be handled with ForwardDiff internally, but a macro that takes a derivative expression (or function?) and defines record! and backprop! would reduce some code repetition.
Taking a page from ReverseDiffSparse, we can get away with graph reuse and faster traversal if we can assume the target function is data-independent and representable as a tree. It would be cool if the user could either annotate the function (with a macro or function wrapper type), or pass an option to API methods that would allow us to make these assumptions so that we can manipulate the computation graph more efficiently.
See, for example, #34.
I think I'm hitting a limit on the number of variables I can have, somewhere near 100,000. For example:
$ julia   # Julia Version 0.5.0 (2016-09-19 18:14 UTC), official release, x86_64-pc-linux-gnu
julia> using ReverseDiff: compile_gradient
julia> compile_gradient(x->norm(x), rand(10000))
(::#301) (generic function with 1 method)
julia> compile_gradient(x->norm(x), rand(100000))
ERROR: syntax: expression too large
in compile(::ReverseDiff.GradientTape{##7#8,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}) at /home/marius/.julia/v0.5/ReverseDiff/src/api/tape.jl:83
in compile_gradient(::Function, ::Array{Float64,1}, ::Vararg{Array{Float64,1},N}) at /home/marius/.julia/v0.5/ReverseDiff/src/api/tape.jl:104
Is there any way to go beyond this or is it just impractical?
Actually, what I'd really be interested in getting to is about O(10,000,000)... is that just too crazy to even consider?
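One thing that may help (a sketch, not a measured recommendation): the "expression too large" error comes from the compile step, so an un-compiled GradientTape can still be recorded once and reused, trading some speed for scalability.
using ReverseDiff: GradientTape, gradient!
tape = GradientTape(x -> norm(x), rand(100000))
out = zeros(100000)
gradient!(out, tape, rand(100000))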
Hi,
I ported Knet's MNIST example. You can see the changes I made by comparing the current reversediff branch with the revision two commits before.
Although I am taking advantage of ReverseDiff's tape compilation feature, ReverseDiff is currently too slow compared to AutoGrad on this MLP example. Here are the results:
60.575495 seconds (1.21 M allocations: 2.609 GB, 0.44% gc time) (ReverseDiff.jl - compiled)
32.504943 seconds (5.49 M allocations: 6.813 GB, 1.42% gc time) (AutoGrad.jl)
julia> using AutoGrad
julia> using ReverseDiff
julia> using ReverseDiff: gradient
julia> f(x,y,i) = sumabs2(x[i]-y)
f (generic function with 1 method)
julia> gradient(f, (rand(3,4),1,1))
ERROR: MethodError: objects of type Int64 are not callable
in Type at /mnt/kufs/scratch/ikesen16/.julia/somon/v0.5/ReverseDiff/src/api/Config.jl:46 [inlined]
in Type at /mnt/kufs/scratch/ikesen16/.julia/somon/v0.5/ReverseDiff/src/api/Config.jl:37 [inlined] (repeats 2 times)
in gradient(::Function, ::Tuple{Array{Float64,2},Int64,Int64}) at /mnt/kufs/scratch/ikesen16/.julia/somon/v0.5/ReverseDiff/src/api/gradients.jl:22
julia> gf = grad(f)
(::gradfun) (generic function with 1 method)
julia> gf(rand(3,4),1,1)
3×4 Array{Float64,2}:
-0.916041 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
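The ReverseDiff MethodError looks like the scalars-in-tuples limitation reported elsewhere in this tracker; one possible workaround (a sketch, untested) is to close over the non-differentiated integer arguments so only the array reaches the API:
julia> g2(x) = f(x, 1, 1)
julia> gradient(g2, rand(3,4))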
I was wondering whether there exists a way to avoid having to recreate the pre-recorded tape when extra arguments (hidden within a closure) are changed. In order to explain what I have in mind, here is a simple example script:
using DiffBase, ForwardDiff, ReverseDiff
type Z
    a::Float64
end
g(x, z::Z) = sum(sin, z.a)+prod(tan, x)*sum(sqrt, x)*z.a;
z = Z(3.);
f(x) = g(x, z);
x = rand(2);
fresult = DiffBase.GradientResult(x);
rresult = DiffBase.GradientResult(x);
ForwardDiff.gradient!(fresult, f, x);
const ftape = ReverseDiff.GradientTape(f, x);
const cftape = ReverseDiff.compile(ftape);
ReverseDiff.gradient!(rresult, cftape, x);
DiffBase.value(fresult)
DiffBase.value(rresult)
DiffBase.gradient(fresult)
DiffBase.gradient(rresult)
# Change value of z.a, which is part of the extra input argument z of g(x, z), enclosed by the f(x) closure
# ForwardDiff succeeds, ReverseDiff based on cftape fails
z.a = 3.5;
ForwardDiff.gradient!(fresult, f, x);
ReverseDiff.gradient!(rresult, cftape, x);
DiffBase.value(fresult)
DiffBase.value(rresult)
DiffBase.gradient(fresult)
DiffBase.gradient(rresult)
# Use original closure rather than its associated tape
# Both ForwardDiff and ReverseDiff succeed
ForwardDiff.gradient!(fresult, f, x);
ReverseDiff.gradient!(rresult, f, x);
DiffBase.value(fresult)
DiffBase.value(rresult)
DiffBase.gradient(fresult)
DiffBase.gradient(rresult)
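As far as I understand (not an authoritative answer), the compiled tape bakes in the value of anything that wasn't part of the differentiated input, so z.a is recorded as a constant. The usual remedy is to re-record and recompile after mutating it; a sketch:
z.a = 3.5;
ftape2 = ReverseDiff.GradientTape(f, x);
cftape2 = ReverseDiff.compile(ftape2);
ReverseDiff.gradient!(rresult, cftape2, x);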
Currently, TrackedArrays only support being overwritten in a few "blessed" situations (e.g. as inputs to specially-cased functions like A_mul_B!). I believe it may be possible to define record/propagation rules for setindex! on TrackedArray that will allow destructive indexing by preserving the pre-overwrite state in the tape.
If using an in-place assignment operator .= from a tracked vector into a typed vector, no error is reported and an incorrect answer is given. Consider the following:
using ReverseDiff
function test1(x)
    out = Vector{Float64}(length(x))
    out .= 3.0.*x
    out .^= 2
    return sqrt(dot(out,out))
end
function test2(x)
    out = Vector{Any}(length(x))
    out .= 3.0.*x
    out .^= 2
    return sqrt(dot(out,out))
end
input = rand(3)
test1(input) == test2(input)
g1 = ReverseDiff.gradient(test1, input)
g2 = ReverseDiff.gradient(test2, input)
Output in Julia 0.5:
julia> test1(input) == test2(input)
true
julia> g1 = ReverseDiff.gradient(test1, input)
3-element Array{Float64,1}:
0.0
0.0
0.0
julia> g2 = ReverseDiff.gradient(test2, input)
3-element Array{Float64,1}:
0.0936007
6.53833
13.5873
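A variant that never writes tracked values into a preallocated Vector{Float64} should avoid the problem (a sketch; the silent conversion to Float64 in test1 is presumably what drops the derivative information):
function test3(x)
    out = 3.0 .* x        # stays a tracked array, no Float64 buffer
    out = out .^ 2
    return sqrt(dot(out, out))
end
ReverseDiff.gradient(test3, input)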
I was trying out using ReverseDiff with my package, which relies heavily on StaticArrays. Unfortunately, I ran into an issue with broadcast:
using ReverseDiff: GradientTape
using StaticArrays
function f(x)
    T = eltype(x)
    v = SVector(one(T), zero(T))
    sum(sum(x) * v)
end
f_tape = GradientTape(f, rand(1))
results in the following ambiguity error on 0.6 with StaticArrays 0.0.4 and ReverseDiff 0.1.2:
ERROR: MethodError: broadcast(::Base.#*, ::ReverseDiff.TrackedReal{Float64,Float64,Void}, ::StaticArrays.SVector{2,ReverseDiff.TrackedReal{Float64,Float64,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}}}) is ambiguous. Candidates:
broadcast(f, a::Union{Number, StaticArrays.StaticArray}...) in StaticArrays at /home/twan/code/RigidBodyDynamics/v0.6/StaticArrays/src/broadcast.jl:8
broadcast(::Base.#*, x::ReverseDiff.TrackedReal{X,D,O} where O, y::AbstractArray{T,1} where T) where {X, D} in ReverseDiff at /home/twan/code/RigidBodyDynamics/v0.6/ReverseDiff/src/derivatives/elementwise.jl:344
Stacktrace:
[1] f(::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}) at ./REPL[7]:4
[2] Type at /home/twan/code/RigidBodyDynamics/v0.6/ReverseDiff/src/api/tape.jl:176 [inlined]
[3] ReverseDiff.GradientTape(::Function, ::Array{Float64,1}) at /home/twan/code/RigidBodyDynamics/v0.6/ReverseDiff/src/api/tape.jl:175
The relevant methods are:
How do you think this should be resolved?
Support for treating matrix multiplication, determinants, etc. as array-level operations would be much more efficient than recording tapes for the individual elements.
Hello all,
First, thanks for your ForwardDiff and ReverseDiff packages!
Maybe this is not the right place to ask, but would you know whether a dynamic computational graph package exists for Julia, such as Chainer, DyNet or PyTorch for Python?
Have you thought of implementing something like this? Do you know if that's possible with Julia?
Best,
Emile
Is there a reason for the 0.5.0 upper limit and could it be lifted? Julia is going to go to v0.6 soon, and currently, the upper limit can cause conflicts between packages such as Optim, NLsolve, ForwardDiff and ReverseDiff.
Hi,
I've been poking around this package and I'm very excited.
However, a missing feature I really want is support for functions with branches. For example, the following functions do not work at the moment:
relu(x) = max(0, x)
elu(x) = x > 0 ? x : expm1(x)
log1pexp(x) = x < 18.0 ? log1p(exp(x)) : x < 33.3 ? x + exp(-x) : x
These are important for performance and numerical stability, but I'm ignorant of the technologies used in this library. So, I'd like to know whether branching will be supported in the future, or whether it is at least technically feasible.
Currently, only array types whose linearindexing trait is LinearFast are supported. With a bit of generalization, we should be able to support LinearSlow arrays as well (like sparse arrays). We'd have to:
- overload getindex(t, args...) on TrackedArray (this is tricky, since we still want to intercept non-scalar/slicing operations, so we'll have to think a bit about Union dispatching)
- generalize the index field on TrackedReal
- define linearindexing on TrackedArray as linearindexing(value(t))
...and, of course, add a bunch of tests.
The docstring of ReverseDiff.@forward says: "Currently, only length(args) <= 2 is supported."
However, in a simple function, this macro seems to work well even for more than 3 arguments:
julia> import ReverseDiff
julia> ReverseDiff.@forward f(x, y, z, w) = x + 2y + 3z + 4w
ReverseDiff.ForwardOptimize{##hidden_f}(#hidden_f)
julia> f(1.0, 2.0, 3.0, 4.0)
30.0
julia> ∇f = ReverseDiff.compile_gradient(x -> f(x[1], x[2], x[3], x[4]), zeros(4))
(::#301) (generic function with 1 method)
julia> ∇f(zeros(4), ones(4))
4-element Array{Float64,1}:
1.0
2.0
3.0
4.0
Can I expect this to always work, or is there some pitfall in this case?
I'm finding the MNIST example on the MNIST branch very insightful.
In the example, the structure of the model is hardcoded. What is the best way, in the ReverseDiff framework, to programmatically define the structure? For example, if I wanted the user to be able to specify at runtime the number of hidden layers and the size and activation function of each.
Thanks!
I already mentioned this in ReverseDiff's documentation, but I'm opening this issue to increase visibility and track progress.
ReverseDiff has the same perturbation confusion issue as ForwardDiff. My plan to solve this in ReverseDiff is the same as my plan to solve it in ForwardDiff: the tag will be added to the TrackedArray/TrackedReal types (and possibly the tape types; I'm not sure yet).
I have run into what I believe is a bug in ReverseDiff.
I have made a gist with a program that should generate the weird behavior, along with the output of the program. I'm simply trying to fit a one-layer neural network to some data. The function I differentiate is lossfun. When running the program, the output of all_results[1].value, which I assume should be equal to lossfun, in fact diverges from lossfun after a while, having started out similar.
Program that reproduce bug, along with output: https://gist.github.com/baggepinnen/5c413f9ca12d5853672fd4e2d8dbdaea
Julia Version 0.6.0
julia> f(x) = convert(Float64, 2x[1])
f (generic function with 1 method)
julia> gradient(f, [1])
1-element Array{Int64,1}:
0
julia> f(x) = 2x[1]
f (generic function with 1 method)
julia> gradient(f, [1])
1-element Array{Int64,1}:
2
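A guess at what's happening: the convert to Float64 strips the tracked type, so the intermediate value is recorded as a constant and its gradient is zero. Keeping the conversion generic avoids that (a sketch):
julia> f(x) = convert(eltype(x), 2x[1])
julia> ReverseDiff.gradient(f, [1.0])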
(apologies if this is not the right place for such discussions)
What exactly can we expect of the performance of ReverseDiff? In particular, under what conditions exactly will it produce the gradient of a R^N -> R function in a time comparable to the evaluation of the function (say within a factor of 3-4)?
Below is a function (a simple arithmetic loop over an array), courtesy of @cortner, for which ReverseDiff is consistently ~30 times slower than the function itself on my machine. Moreover, the recording and compilation step takes a large amount of time. I thought this was because the FLOPS/byte ratio is too low, as the manual suggests, but increasing it does not change the factor of ~30.
using ReverseDiff: GradientTape, GradientConfig, gradient, gradient!, compile
φ1(r) = r^2 + r^4 + exp(-r)
φ2(r) = -0.1 * r^3
function F(x)
    N = length(x)
    f = φ1(x[2]-x[1])
    for i = 3:N
        f += φ1(x[i]-x[i-1]) + φ2(x[i] - x[i-2])
    end
    return f
end
function benchmark()
    for N in (10000, 20000, 40000)
        # pre-record a GradientTape for `F` using inputs with Float64 elements
        @time f_tape = GradientTape(F, rand(N))
        # compile `f_tape` into a more optimized representation
        @time compiled_f_tape = compile(f_tape)
        # some inputs and work buffers to play around with
        gresult = zeros(N)
        x = rand(N)
        println("N = $N")
        println(" > Time for F")
        for n = 1:3
            @time F(x)
        end
        println(" > Time for DF")
        for n = 1:3
            @time gradient!(gresult, compiled_f_tape, x)
        end
    end
end
benchmark()
Would it be possible to add the feature of letting GradientTape record a tape with an "active" rand() call in it? Currently, if I run the same tape twice, as in
f = (a,b) -> sum((a+rand()).*(b+rand()))
input = (rand(4),rand(4))
f_tape = ReverseDiff.GradientTape(f,input)
output = similar.(input)
print("Out #1, tape = ")
print(ReverseDiff.gradient!(output,f_tape,input),"\n")
print("Out #2, tape = ")
print(ReverseDiff.gradient!(output,f_tape,input),"\n")
print("Out #1, no tape = ")
print(ReverseDiff.gradient!(output,f,input),"\n")
print("Out #2, no tape = ")
print(ReverseDiff.gradient!(output,f,input),"\n")
returns
Out #1, tape = ([1.15379, 0.991579, 0.564506, 0.968936], [1.16635, 1.27959, 0.955812, 0.927267])
Out #2, tape = ([1.15379, 0.991579, 0.564506, 0.968936], [1.16635, 1.27959, 0.955812, 0.927267])
Out #1, no tape = ([0.909424, 0.747213, 0.32014, 0.72457], [1.50359, 1.61683, 1.29305, 1.26451])
Out #2, no tape = ([1.76153, 1.59931, 1.17224, 1.57667], [0.904445, 1.01769, 0.693907, 0.665362])
One solution is to provide the rand() values as an external input to my function, but is that a good way to go, given that I don't need the derivative w.r.t. this input?
I have the following code
using ReverseDiff, ForwardDiff   # ForwardDiff is needed by the functions below
m,g = 1, 9.8
t = 1
p = [5.,6]
q = [1.,2]
L(t,q,q̇) = m/2 * dot(q̇,q̇) - m*g*q[2]
function Legendre_transformation(F, w)
    wv = a->ForwardDiff.gradient(F, a)
    z = zeros(w)
    M = ForwardDiff.jacobian(wv, z)
    b = wv(z)
    v = M\(w-b)
    w'v - F(v)
end
function Lagrangian2Hamiltonian(Lagrangian, t, q, p)
    L = q̇ -> Lagrangian(t, q, q̇)
    Legendre_transformation(L, p)
end
H = (q, p)->Lagrangian2Hamiltonian(L, t, q, p)
ṗ(p, q) = ReverseDiff.gradient(a->-H(a, p), q)
ṗ(p, q)
that will produce the following depwarn
WARNING: `invoke(f, (types...), ...)` is deprecated, use `invoke(f, Tuple{types...}, ...)` instead
Stacktrace:
[1] depwarn(::String, ::Symbol) at ./deprecated.jl:70
[2] -(::ForwardDiff.Dual{2,ForwardDiff.Dual{2,Float64}}, ::ReverseDiff.TrackedReal{Float64,Float64,Void}) at /home/arch/.julia/v0.6/ReverseDiff/src/derivative
s/scalars.jl:20
...
If I am not mistaken, the function gradient!(result, f, input) accepts a result argument of type Tuple{Array{Float64,2},Array{Float64,2}}. Is it also possible to accept a result argument of type DiffBase.GradientResult, in the way that ForwardDiff does?
I checked the documentation, where it seems that result can be of type DiffBase.DiffResult, and I wasn't sure what the relation between DiffBase.DiffResult and DiffBase.GradientResult is.
Moreover, would it be possible to provide a simple example of calling gradient!() using a result of type DiffBase.GradientResult or DiffBase.DiffResult?
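Not an authoritative answer, but a sketch of the pattern being asked about, assuming GradientResult is just a convenience constructor returning a suitably sized DiffResult that ReverseDiff.gradient! can fill in place (this mirrors the gradient!() example from the earlier issue above):
using DiffBase, ReverseDiff
f(x) = sum(sin, x);
x = rand(4);
result = DiffBase.GradientResult(x);
ReverseDiff.gradient!(result, f, x);
DiffBase.value(result)
DiffBase.gradient(result)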
julia> using ReverseDiffPrototype; const RDP = ReverseDiffPrototype;
julia> function f(x)
           a = one(eltype(x))
           return sum(map(i -> a * i, x))
       end
f (generic function with 1 method)
julia> RDP.gradient!(zeros(3), f, rand(3)) # wrong, should be all ones
3-element Array{Float64,1}:
0.823614
0.459833
0.581877
Note that we could replace a = one(eltype(x)) with something like a = x[1], and the bug would persist.
The closed-over TraceReal is causing each application of the closure to be written to the trace, laying down incorrect nodes that get accumulated in the reverse pass. It's a perturbation-confusion-esque bug, but one that we're inducing on ourselves instead of it being induced by competing differentiation operators.
Not only does it give the wrong answer, but it also kind of defeats the point of the optimization (to elide the tracing of the closure application).
I'm unsure how we can resolve this short of scrubbing the closure for TraceReals, just as we'd have to do to fix JuliaDiff/ForwardDiff.jl#83.
@mlubin any ideas?