juliadiff / ReverseDiff.jl
Reverse Mode Automatic Differentiation for Julia
License: Other
Right now, if I have a function f(a, b, c) and I only want to create a function that returns the gradient w.r.t. a and b, I have two options:
∇f(a, b, c) = ReverseDiff.gradient((x, y) -> f(x, y, c), (a, b))
∇f! = ReverseDiff.compile_gradient(f, (a, b, c)), and just ignore the c gradient that will pop out.
The former has to re-record the function for every call, while the latter wastes some computation differentiating w.r.t. c.
We should support something akin to TensorFlow's placeholders for the pre-recorded API, allowing you to drop in updatable parameters that aren't differentiated against. This can be accomplished by recording the tape as normal, and then "turning off" differentiation on the selected parameters (the idiom for that currently is to set the tape to NULL_TAPE, but I'm going to play around with it). Some refactoring should probably be done to get the most out of this change performance-wise (e.g., allow the instantiation of a TrackedArray with deriv == nothing).
As for the API, I can think of two different paths we could take:
- a wrt function, e.g. ReverseDiff.compile_gradient(f, (wrt(a), wrt(b), c))
- a param function, e.g. ReverseDiff.compile_gradient(f, (a, b, param(c)))
I tried to compute the gradient of a quadratic form. However, it failed with an ambiguity error as follows:
import ReverseDiff
const A = [1.0 2.0; 2.0 5.0]
quadratic(x) = x' * A * x
ReverseDiff.gradient(quadratic, ones(2))
MethodError: *(::RowVector{ReverseDiff.TrackedReal{Float64,Float64,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}}, ::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}) is ambiguous. Candidates:
*(x::AbstractArray{T,2} where T, y::ReverseDiff.TrackedArray{V,D,N,VA,DA} where DA where VA where N) where {V, D} in ReverseDiff at /Users/kenta/.julia/v0.6/ReverseDiff/src/derivatives/linalg/arithmetic.jl:193
*(x::AbstractArray, y::ReverseDiff.TrackedArray{V,D,N,VA,DA} where DA where VA where N) where {V, D} in ReverseDiff at /Users/kenta/.julia/v0.6/ReverseDiff/src/derivatives/linalg/arithmetic.jl:193
*(rowvec::RowVector{T,V} where V<:(AbstractArray{T,1} where T), vec::AbstractArray{T,1}) where T<:Real in Base.LinAlg at linalg/rowvector.jl:170
Possible fix, define
*(::RowVector{ReverseDiff.TrackedReal{V,D,ReverseDiff.TrackedArray{V,D,1,VA,DA}},V} where V<:(AbstractArray{T,1} where T), ::ReverseDiff.TrackedArray{V,D,1,VA,DA})
Stacktrace:
[1] * at ./operators.jl:424 [inlined]
[2] quadratic(::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}) at ./In[32]:2
[3] Type at /Users/kenta/.julia/v0.6/ReverseDiff/src/api/tape.jl:199 [inlined]
[4] gradient(::Function, ::Array{Float64,1}, ::ReverseDiff.GradientConfig{ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}}) at /Users/kenta/.julia/v0.6/ReverseDiff/src/api/gradients.jl:22 (repeats 2 times)
I think matrix multiplication is already supported. I'm not sure whether it is an unsupported feature or a kind of bug, so let me file an issue here.
I'm using ReverseDiff.jl v0.1.4 on Julia 0.6.
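A possible workaround while the ambiguity stands (a sketch, not a fix for the dispatch clash itself): reformulate the quadratic form so the RowVector product never appears.
quadratic2(x) = sum(x .* (A * x))   # same value as x' * A * x for this symmetric A
ReverseDiff.gradient(quadratic2, ones(2))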
I did not include a scalar derivative function in the ReverseDiff API on purpose, since ForwardDiff is a much better tool for such a method. However, we do allow users to pass in tuples of arguments to API methods. In this case, we should allow elements of the tuples to be scalars. For example, see here; it's unintuitive (and poor style for the objective code) that b should need to be an array and not a scalar.
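For illustration, the kind of call this would enable (hypothetical; today the scalar element raises an error, so it has to be wrapped in an array first):
g(a, b) = sum(a .* b)                              # illustrative function, not the one linked above
ReverseDiff.gradient(g, ([1.0, 2.0, 3.0], 2.0))    # scalar b passed directly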
We should explore the use of techniques such as edge-pushing, graph coloring, etc. to discover and exploit sparsity patterns in ReverseDiff's second-order derivative computations.
Maybe sparsity optimization could be employed automagically whenever nested tapes are encountered?
Removing ReverseDiff allows the latest ForwardDiff again:
julia> Pkg.rm("ReverseDiff")
INFO: Installing CommonSubexpressions v0.0.1
INFO: Upgrading ForwardDiff: v0.4.2 => v0.5.0
INFO: Installing RealInterface v0.0.3
INFO: Removing FunctionWrappers v0.1.0
INFO: Removing ReverseDiff v0.1.5
INFO: Package database updated
julia> x_seed = [1.0]
1-element Array{Float64,1}:
1.0
julia> f(x) = x
f (generic function with 1 method)
julia> gcfg = ReverseDiff.GradientConfig(x_seed)
ReverseDiff.GradientConfig
julia> g! = (x, out) -> ReverseDiff.gradient!(out, f, x, gcfg)
(::#17) (generic function with 1 method)
julia> g!([1.0], [0.0])
1-element Array{Float64,1}:
0.0
julia> f(x) = 1.0*x
f (generic function with 1 method)
julia> g!([1.0], [0.0])
1-element Array{Float64,1}:
1.0
julia> f(x) = 3.0*x
f (generic function with 1 method)
julia> g!([1.0], [0.0])
1-element Array{Float64,1}:
3.0
Granted, not the most interesting function in the world, but there must be something that you're missing here, since it doesn't think the function changes with x.
p = randn(2,3)
ReverseDiff.@forward f(p) = exp.(p) ./ sum(exp.(p), 2) # softmax
f! = ReverseDiff.compile_gradient(x -> sum(f(x)), similar(p))
f!(similar(p), p)
Gives the following error
ERROR: MethodError: no method matching broadcast_deriv_increment!(::Array{ReverseDiff.TrackedReal{Float64,Float64,Void},2}, ::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}, ::Void)
Closest candidates are:
broadcast_deriv_increment!(::AbstractArray{T,N}, ::Any) at /home/dom/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:632
broadcast_deriv_increment!(::Any, ::Any, ::Ref{T}) at /home/dom/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:569
broadcast_deriv_increment!(::AbstractArray{T,N}, ::Any, ::AbstractArray{T,N}) at /home/dom/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:673
...
in special_reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#./,Tuple{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},Array{ReverseDiff.TrackedReal{Float64,Float64,Void},2}},ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},Tuple{Array{Float64,2},Void}}) at /home/dom/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:465
in reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#./,Tuple{ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},Array{ReverseDiff.TrackedReal{Float64,Float64,Void},2}},ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},Tuple{Array{Float64,2},Void}}) at /home/dom/.julia/v0.5/ReverseDiff/src/tape.jl:74
in (::##33#34)() at /home/dom/.julia/v0.5/ReverseDiff/src/api/tape.jl:80
in seeded_reverse_pass!(::Array{Float64,2}, ::ReverseDiff.TrackedReal{Float64,Float64,Void}, ::ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}, ::ReverseDiff.Compiled{ReverseDiff.GradientTape{##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void}},##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void},##31#32,##33#34}) at /home/dom/.julia/v0.5/ReverseDiff/src/api/utils.jl:30
in seeded_reverse_pass! at /home/dom/.julia/v0.5/ReverseDiff/src/api/tape.jl:41 [inlined]
in gradient!(::Array{Float64,2}, ::ReverseDiff.Compiled{ReverseDiff.GradientTape{##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void}},##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void},##31#32,##33#34}, ::Array{Float64,2}) at /home/dom/.julia/v0.5/ReverseDiff/src/api/gradients.jl:80
in (::ReverseDiff.##301#302{ReverseDiff.Compiled{ReverseDiff.GradientTape{##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void}},##29#30,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}},ReverseDiff.TrackedReal{Float64,Float64,Void},##31#32,##33#34}})(::Array{Float64,2}, ::Array{Float64,2}) at /home/dom/.julia/v0.5/ReverseDiff/src/api/tape.jl:100
If I do ReverseDiff.@forward f(p) = exp.(p) ./ sum(exp.(p)) there's no error. So, I'm guessing it's something to do with adding the dimension to sum over.
I'm confused by how .^ works.
using ReverseDiff
x = [-1. -2; -1 -2]
y = [1.; 2]
w = [2.; 3]
b = [-3.]
ReverseDiff.@forward sigmoid(x) = 1. ./ (1. + exp(-x))
#= mse(a, y) = mean((a - y) .* (a - y)) =#
mse(a, y) = mean((a - y) .^ 2)
f1(x, y, w, b) = mse(sigmoid(x*w .+ b), y)
out = f1(x, y, w, b)
println(out)
inputs = (x, y, w, b)
result = map(similar, inputs)
f1! = ReverseDiff.compile_gradient(f1, result)
@time f1!(result, inputs)
@time f1!(result, inputs)
for r in result
    println(r)
end
#= Expected output =#
#= =#
#= [array([[ -3.34017280e-05, -5.01025919e-05], =#
#= [ -6.68040138e-05, -1.00206021e-04]]), array([[ 0.9999833], =#
#= [ 1.9999833]]), array([[ 5.01028709e-05], =#
#= [ 1.00205742e-04]]), array([ -5.01028709e-05])] =#
(a - y) .^ 2 is calling https://github.com/JuliaDiff/ReverseDiff.jl/tree/master/src/derivatives#L512, which has a call to log, so if some values are negative it'll throw a DomainError. I'm wondering what's going on here because on the surface we're just squaring values. mse(a, y) = mean((a - y) .* (a - y)) works as expected.
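For reference, two reformulations that avoid the generic x^y pullback (sketches; abs2 is assumed to follow the plain multiplication path rather than the pow rule):
mse_mul(a, y)  = mean((a - y) .* (a - y))
mse_abs2(a, y) = mean(abs2.(a - y))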
Originally, I thought we'd need a special macro directive for this, but I now believe this can just work by allowing the user to pass in GPU-backed arrays as input to the API methods. In other words, ReverseDiff won't provide a GPU-backed array type; it will compose with the GPU-backed array types provided by other well-established libraries.
For now, Julia's ArrayFire wrapper should be sufficient to start playing around with this functionality for a subset of ReverseDiff's methods - at least enough to build some proof-of-concept examples. Eventually, I'd like to support GPUArrays.jl, though it's probably not productive to work on that until GPUArrays has solid linear algebra coverage out-of-the-box.
In some cases, existing derivative definitions might "just work" for GPU-backed arrays (though they might have poor performance). In many cases, we'll likely have to dispatch on the array type to write special derivative methods for GPU-backed arrays.
cc the GPU-related folks: @SimonDanisch @vchuravy @maleadt @ranjanan @MikeInnes
Hi Jared,
Awesome project here! I do think it's lacking a bit in the example realm. The gradient.jl example is nice and shows the API, but I'm still not sure how it fits into a more complicated example.
Maybe a neural net training on MNIST would be a good starting point. I've tried to do this myself but ran into some issues.
https://github.com/HIPS/autograd/tree/master/examples has a bunch of good ones.
I'm testing out some neural net stuff and I found an error when implementing a softmax function.
https://gist.github.com/domluna/bf1d3061244e7d7d9a3da467a6614006
I commented out quite a bit to try to get a minimal example. The softmax function isn't correct, since sum over dimensions isn't yet implemented, so I changed it to just sum(exped) to try to make it work.
ERROR: LoadError: MethodError: no method matching eachindex(::ReverseDiffPrototype.TraceReal{Float64,Float64})
Closest candidates are:
eachindex(::Tuple) at tuple.jl:19
eachindex(::Tuple, ::Tuple...) at tuple.jl:22
eachindex(::AbstractArray{T,1}) at abstractarray.jl:679
...
in propagate_adjoint!(::ReverseDiffPrototype.TraceReal{Float64,Float64}, ::ReverseDiffPrototype.TraceReal{Float64,Float64}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/backprop.jl:82
in special_backprop_step!(::Base.Broadcast.#broadcast, ::Tuple{Array{ReverseDiffPrototype.TraceReal{Float64,Float64},2},ReverseDiffPrototype.TraceReal{Float64,Float64}}, ::ReverseDiffPrototype.TraceReal{Float64,Float64}, ::ReverseDiffPrototype.TraceReal{Float64,Float64}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/backprop.jl:74
in backprop_step!(::ReverseDiffPrototype.TraceNode{Base.Broadcast.#broadcast,Tuple{Array{ReverseDiffPrototype.TraceReal{Float64,Float64},2},ReverseDiffPrototype.TraceReal{Float64,Float64}},ReverseDiffPrototype.TraceReal{Float64,Float64},ReverseDiffPrototype.TraceReal{Float64,Float64}}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/backprop.jl:20
in backprop!(::Array{ReverseDiffPrototype.TraceNode,1}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/backprop.jl:14
in gradient(::##243#245{Array{Float64,2},Array{Float64,2},Array{Int64,1}}, ::Array{Float64,2}, ::Array{ReverseDiffPrototype.TraceReal{Float64,Float64},2}) at /Users/dluna/.julia/v0.5/ReverseDiffPrototype/src/api.jl:8
in nn_backward(::Array{Float64,2}, ::Array{Float64,2}, ::Array{Float64,2}, ::Array{Int64,1}) at /Users/dluna/.julia/v0.5/Butterfly/src/rdp.jl:34
in include_from_node1(::String) at ./loading.jl:426
in include_from_node1(::String) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
while loading /Users/dluna/.julia/v0.5/Butterfly/src/rdp.jl, in expression starting on line 39
The following snippet:
f(w, x, b) = sum(w .* x .+ b)
ReverseDiff.gradient(f, ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], 3.0))
gives this error:
ERROR: MethodError: objects of type Float64 are not callable
Stacktrace:
[1] map(::ReverseDiff.##297#298{Float64,Array{ReverseDiff.AbstractInstruction,1}}, ::Tuple{Array{Float64,1},Array{Float64,1},Float64}) at ./tuple.jl:160
[2] Type at /home/slipslop/.julia/v0.6/ReverseDiff/src/api/Config.jl:46 [inlined]
[3] Type at /home/slipslop/.julia/v0.6/ReverseDiff/src/api/Config.jl:37 [inlined] (repeats 2 times)
[4] gradient(::Function, ::Tuple{Array{Float64,1},Array{Float64,1},Float64}) at /home/slipslop/.julia/v0.6/ReverseDiff/src/api/gradients.jl:22
Note that this error disappears if I remove b, which is the only scalar in the expression.
Julia 0.6-rc2
ReverseDiff.jl 820690c (latest master)
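The workaround at the time of writing (a sketch) is to wrap the scalar so that every tuple element is an array:
f(w, x, b) = sum(w .* x .+ b)
ReverseDiff.gradient(f, ([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0]))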
Compiled tapes work fine from the REPL, but trying to construct one inside a function results in everyone's favorite 0.6 error:
julia> using ReverseDiff
julia> function foo(f, x)
           tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x))
           result = similar(x)
           ReverseDiff.gradient!(result, tape, x)
       end
foo (generic function with 1 method)
julia> g = y -> sum(x -> x^3 + x^2 - x - sin(x), y)
(::#3) (generic function with 1 method)
julia> foo(g, rand(10))
ERROR: MethodError: no method matching forward_pass!(::ReverseDiff.CompiledTape{Symbol("##657"),ReverseDiff.GradientTape{##3#5,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}})
The applicable method may be too new: running in world age 21811, while current world is 21814.
Closest candidates are:
forward_pass!(::ReverseDiff.CompiledTape{Symbol("##657"),ReverseDiff.GradientTape{##3#5,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}}) at /Users/rdeits/.julia/v0.6/ReverseDiff/src/api/tape.jl:84 (method too new to be called from this world context.)
forward_pass!(::ReverseDiff.CompiledTape{Symbol("##656"),ReverseDiff.GradientTape{#f,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}}) at /Users/rdeits/.julia/v0.6/ReverseDiff/src/api/tape.jl:84
forward_pass!(::Array{ReverseDiff.AbstractInstruction,1}) at /Users/rdeits/.julia/v0.6/ReverseDiff/src/tape.jl:76
...
Stacktrace:
[1] seeded_forward_pass! at /Users/rdeits/.julia/v0.6/ReverseDiff/src/api/tape.jl:42 [inlined]
[2] gradient!(::Array{Float64,1}, ::ReverseDiff.CompiledTape{Symbol("##657"),ReverseDiff.GradientTape{##3#5,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}}, ::Array{Float64,1}) at /Users/rdeits/.julia/v0.6/ReverseDiff/src/api/gradients.jl:79
[3] foo(::Function, ::Array{Float64,1}) at ./REPL[2]:4
julia> versioninfo()
Julia Version 0.6.0-rc2.0
Commit 68e911be53 (2017-05-18 02:31 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin13.4.0)
CPU: Intel(R) Core(TM) i7-2860QM CPU @ 2.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)
(this is with ReverseDiff master)
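A possible workaround for the world-age problem (a sketch; foo2 is just an illustrative name): route the call through Base.invokelatest, available on 0.6, so the freshly compiled tape's generated methods are visible.
julia> function foo2(f, x)
           tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x))
           result = similar(x)
           Base.invokelatest(ReverseDiff.gradient!, result, tape, x)
           result
       end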
I'm probably doing something dumb here:
using ReverseDiff
using ReverseDiff: GradientConfig, gradient
f(W, b, x) = W*x + b
softmax(xs) = exp.(xs)/sum(exp(xs))
mse(x, y) = mean((x .- y).^2)
function net(W, b, x, y)
    ŷ = softmax(f(W, b, x))
    mse(ŷ, y)
end
W = randn(5, 10)
b = randn(5)
x = rand(10)
y = rand(5)
inputs = (W, b, x, y)
net(inputs...)
gradient(f, inputs)
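Two things that look like the likely slips (guesses, not a confirmed diagnosis): the gradient call targets f rather than net, and exp is applied to the whole vector instead of being broadcast. A sketch of the probable intent:
softmax(xs) = exp.(xs) ./ sum(exp.(xs))
gradient(net, inputs)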
Just realized I misinterpreted the point of #5, so I'm superseding that issue with this one. We should probably have a mechanism for user-defined derivatives that works for n-ary, multivariate functions.
When trying
using ReverseDiff
f(a, b) = sum(a' * b + a * b')
a, b = rand(100, 100), rand(100, 100)
inputs = (a, b)
gradient(f, (a,b))
I receive an error:
ERROR: MethodError: no method matching gradient(::#f, ::Tuple{Array{Float64,2},Array{Float64,2}})
Closest candidates are:
gradient(::Function) at /home/paulo/.julia/v0.6/Calculus/src/derivative.jl:22
gradient(::Function, ::Union{Array{T<:Number,1}, T<:Number}) where T<:Number at /home/paulo/.julia/v0.6/Calculus/src/derivative.jl:17
gradient(::Function, ::Union{Array{T<:Number,1}, T<:Number}, ::Symbol) where T<:Number at /home/paulo/.julia/v0.6/Calculus/src/derivative.jl:17
...
Stacktrace:
[1] macro expansion at ./REPL.jl:97 [inlined]
[2] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
Here are my packages:
julia> Pkg.status()
4 required packages:
- Distances 0.4.1
- Distributions 0.14.0
- MLBase 0.6.1
- ReverseDiff 0.1.4
35 additional packages:
- AbstractFFTs 0.1.0
- BinDeps 0.6.0
- Calculus 0.2.2
- CatViews 0.1.0
- Compat 0.26.0
- DataArrays 0.5.3
- DataFrames 0.10.0
- DataStructures 0.5.3
- DiffBase 0.2.0
- FFTW 0.0.2
- FileIO 0.4.2
- ForwardDiff 0.4.2
- FunctionWrappers 0.1.0
- GZip 0.3.0
- Iterators 0.3.1
- Learn 0.0.0- master (unregistered)
- LearnBase 0.1.6
- LossFunctions 0.1.0
- MLDataPattern 0.1.2
- MLDataUtils 0.2.0
- MLLabelUtils 0.1.4
- MappedArrays 0.0.7
- NaNMath 0.2.5
- PDMats 0.7.0
- QuadGK 0.1.2
- RecipesBase 0.2.0
- Reexport 0.0.3
- Rmath 0.1.7
- SHA 0.3.3
- ShowItLikeYouBuildIt 0.0.1
- SortingAlgorithms 0.1.1
- SpecialFunctions 0.1.1
- StatsBase 0.17.0
- StatsFuns 0.5.0
- URIParser 0.1.8
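Judging from the candidate list, the unqualified gradient here resolves to Calculus.gradient rather than ReverseDiff's (a guess from the error message; ReverseDiff does not appear to export gradient). Qualifying the call should dispatch to the tuple method:
ReverseDiff.gradient(f, (a, b))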
Our implementation will break if you try to perform concurrent differentiations over the same function, since the TraceReal tag will always point to the same Trace.
Many of ReverseDiff's linear algebra derivative implementations are based on a wonderful reference paper by Mike B. Giles. It turns out that, in his infinite generosity, Giles released an extended version of the paper shortly after the original paper, which contains even more derivative definitions (including some factorizations).
We should implement the rest of these in ReverseDiff for the sake of completeness.
(Thanks to @tpapp for exposing me to the extended version of the Giles paper in his ForwardDiff issue.)
This is actually a bug in Julia, not ReverseDiff, but I wanted to open this issue to track the ReverseDiff fix here. The corresponding issue in Base is JuliaLang/julia#20200.
Here's an example of how this bug affects ReverseDiff:
julia> ReverseDiff.JacobianConfig(rand(4), rand(4))
ERROR: MethodError: no method matching ReverseDiff.GradientConfig(::Array{Float64,1}, ::Array{Float64,1})
Closest candidates are:
ReverseDiff.GradientConfig{T}(::AbstractArray{T,N} where N) at /Users/jarrettrevels/.julia/v0.6/ReverseDiff/src/api/Config.jl:35
ReverseDiff.GradientConfig{T}(::Any) at sysimg.jl:24
ReverseDiff.GradientConfig{T}(::AbstractArray{T,N} where N, ::Array{ReverseDiff.AbstractInstruction,1}) at /Users/jarrettrevels/.julia/v0.6/ReverseDiff/src/api/Config.jl:35
...
Stacktrace:
[1] ReverseDiff.JacobianConfig(::Array{Float64,1}, ::Array{Float64,1}) at /Users/jarrettrevels/.julia/v0.6/ReverseDiff/src/api/Config.jl:86
Cross-posted from https://discourse.julialang.org/t/nesting-forwarddiff-inside-reversediff/3684/3
The following (possibly ill-advised) nesting of ForwardDiff inside ReverseDiff fails:
using ForwardDiff, ReverseDiff
x, w = rand(3), rand(3)   # example inputs (not part of the original snippet)
f = (x, w) -> sum(x .* w)
g = (x, w) -> ForwardDiff.gradient(x -> f(x, w), x)
ReverseDiff.jacobian(w -> g(x, w), w)
MethodError: Cannot `convert` an object of type ForwardDiff.Dual{2,Float64} to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
in increment_deriv! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/propagation.jl:34 [inlined]
in broadcast_increment_deriv!(::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Array{ForwardDiff.Dual{2,Float64},1}, ::CartesianIndex{1}, ::CartesianIndex{1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/propagation.jl:142
in special_reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#.*,Tuple{Array{ForwardDiff.Dual{2,Float64},1},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{ForwardDiff.Dual{2,Float64},Float64,1,Array{ForwardDiff.Dual{2,Float64},1},Array{Float64,1}},Tuple{CartesianIndex{1},CartesianIndex{1}}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/derivatives/elementwise.jl:473
in reverse_exec!(::ReverseDiff.SpecialInstruction{Base.#.*,Tuple{Array{ForwardDiff.Dual{2,Float64},1},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{ForwardDiff.Dual{2,Float64},Float64,1,Array{ForwardDiff.Dual{2,Float64},1},Array{Float64,1}},Tuple{CartesianIndex{1},CartesianIndex{1}}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/tape.jl:93
in reverse_pass!(::Array{ReverseDiff.AbstractInstruction,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/tape.jl:87
in seeded_reverse_pass!(::Array{Float64,2}, ::Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}, ::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}, ::ReverseDiff.JacobianTape{##137#138,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/utils.jl:51
in seeded_reverse_pass! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/tape.jl:47 [inlined]
in jacobian!(::Array{Float64,2}, ::ReverseDiff.JacobianTape{##137#138,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Array{ReverseDiff.TrackedReal{ForwardDiff.Dual{2,Float64},Float64,Void},1}}, ::Array{Float64,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:122
in jacobian! at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:106 [inlined]
in jacobian(::Function, ::Array{Float64,1}, ::ReverseDiff.JacobianConfig{ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},Void}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:25
in jacobian(::Function, ::Array{Float64,1}) at /Users/rdeits/.julia/v0.5/ReverseDiff/src/api/jacobians.jl:23
We want this so that custom Array types that don't subtype AbstractArray can work. Also, arrays of distributions with TapeReal parameters.
Possible API: wrap and unwrap to convert Arrays of TapeReals to TapeArrays and vice-versa.
I think this is a mistake (and maybe related to trying to do ReLU in #77?).
When I tried to find the gradient for a function that sometimes doesn't use one of its inputs, instead of just that one input's gradient turning to 0, the gradients for all the elements of the input vector turned to 0:
julia> f(X) = sum([if x > 0; x else 0 end for x in X])
>> f (generic function with 1 method)
julia> ReverseDiff.gradient(f, [2,3,-1])
>> 3-element Array{Int64,1}:
0
0
0
julia> ReverseDiff.gradient(f, [2,3,1])
>> 3-element Array{Int64,1}:
1
1
1
I think this is a mistake? Only the third element in the returned array should be 0, since the other two variables do have a linear effect on the output of f.
Also, though, it currently does get this right for max(0,x), which is another way to implement the same thing (I guess it considers max(0,x) inclusive for x==0):
julia> f(X) = sum([max(0, x) for x in X])
>> f (generic function with 1 method)
julia> ReverseDiff.gradient(f, [2,3,0])
>> 3-element Array{Int64,1}:
1
1
1
julia> ReverseDiff.gradient(f, [2,3,-1])
>> 3-element Array{Int64,1}:
1
1
0
The following works for gradient!():
using DiffBase, ReverseDiff
f(x) = sum(sin, x)+prod(tan, x)*sum(sqrt, x);
x = rand(4);
result = DiffBase.GradientResult(x);
rcfg = ReverseDiff.GradientConfig(x);
ReverseDiff.gradient!(result, f, x, rcfg);
DiffBase.value(result)
DiffBase.gradient(result)
However, the Hessian analogue of the above fails:
using DiffBase, ReverseDiff
f(x) = sum(sin, x)+prod(tan, x)*sum(sqrt, x);
x = rand(4);
result = DiffBase.HessianResult(x);
rcfg = ReverseDiff.HessianConfig(x);
ReverseDiff.hessian!(result, f, x, rcfg);
DiffBase.value(result)
DiffBase.gradient(result)
DiffBase.hessian(result)
MWE:
using ReverseDiff
f(x) = fma(x...)
ReverseDiff.gradient(f, zeros(3))
julia> f(x)=sum((x-3).^3+2(x+1).^2)
julia> g=ReverseDiff.gradient(f,[.3])
ERROR: DomainError:
in nan_dom_err at ./math.jl:196 [inlined]
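A reformulation that sidesteps the generic x^y rule (which takes a log of the base) is sketched below, assuming the DomainError comes from the negative base x - 3:
julia> f2(x) = sum((x-3).*(x-3).*(x-3) + 2(x+1).*(x+1))
julia> ReverseDiff.gradient(f2, [.3])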
From the Discourse announcement I made a while back:
Internally, ReverseDiff contains facilities for recording execution traces of native Julia code to reusable, compilable instruction tapes, as well as mechanisms for propagating values "forwards" and "backwards" through these tapes. Since these tapes can be analyzed as computation graphs, my hope is that this infrastructure can eventually be rendered useful for non-AD purposes, such as performance optimization, scheduled parallel execution, and constraint programming.
I've opened this issue to track this effort. Right now, TrackedArrays/TrackedReals and the instruction tape are written specifically for differentiation purposes. It will require a lot of thought and experimentation to figure out the right way to restructure this code to be generally useful for non-AD purposes. Eventually, this code might be factored out of ReverseDiff and into a different package.
cc @dpsanders (who might someday find this stuff useful for interval constraint programming).
Matrix derivatives have complex indexing, but for defining univariate derivatives, it's nice to have a macro. Currently, most cases can be handled with ForwardDiff internally, but a macro that takes a derivative expression (or function?) and defines record! and backprop! would reduce some code repetition.
Taking a page from ReverseDiffSparse, we can get away with graph reuse and faster traversal if we can assume the target function is data-independent and representable as a tree. It would be cool if the user could either annotate the function (with a macro or function wrapper type), or pass an option to API methods that would allow us to make these assumptions so that we can manipulate the computation graph more efficiently.
See, for example, #34.
I think I'm hitting a limit on the number of variables I can have, somewhere near 100,000. For example:
$ julia   # Julia Version 0.5.0 (2016-09-19 18:14 UTC), official release, x86_64-pc-linux-gnu
julia> using ReverseDiff: compile_gradient
julia> compile_gradient(x->norm(x), rand(10000))
(::#301) (generic function with 1 method)
julia> compile_gradient(x->norm(x), rand(100000))
ERROR: syntax: expression too large
in compile(::ReverseDiff.GradientTape{##7#8,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}},ReverseDiff.TrackedReal{Float64,Float64,Void}}) at /home/marius/.julia/v0.5/ReverseDiff/src/api/tape.jl:83
in compile_gradient(::Function, ::Array{Float64,1}, ::Vararg{Array{Float64,1},N}) at /home/marius/.julia/v0.5/ReverseDiff/src/api/tape.jl:104
Is there any way to go beyond this or is it just impractical?
Actually, what I'd really be interested in getting to is about O(10,000,000)... is that just too crazy to even consider?
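One thing that may help (a sketch, not a measured recommendation): the "expression too large" error comes from the compile step, so an un-compiled GradientTape can still be recorded once and reused, trading some speed for scalability.
using ReverseDiff: GradientTape, gradient!
tape = GradientTape(x -> norm(x), rand(100000))
out = zeros(100000)
gradient!(out, tape, rand(100000))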
Hi,
I ported Knet's MNIST example. You can see the changes I made by comparing the current reversediff branch with the revision two commits before.
Although I am taking advantage of ReverseDiff's tape compilation feature, ReverseDiff is currently too slow compared to AutoGrad on this MLP example. Here are the results:
60.575495 seconds (1.21 M allocations: 2.609 GB, 0.44% gc time) (ReverseDiff.jl - compiled)
32.504943 seconds (5.49 M allocations: 6.813 GB, 1.42% gc time) (AutoGrad.jl)
julia> using AutoGrad
julia> using ReverseDiff
julia> using ReverseDiff: gradient
julia> f(x,y,i) = sumabs2(x[i]-y)
f (generic function with 1 method)
julia> gradient(f, (rand(3,4),1,1))
ERROR: MethodError: objects of type Int64 are not callable
in Type at /mnt/kufs/scratch/ikesen16/.julia/somon/v0.5/ReverseDiff/src/api/Config.jl:46 [inlined]
in Type at /mnt/kufs/scratch/ikesen16/.julia/somon/v0.5/ReverseDiff/src/api/Config.jl:37 [inlined] (repeats 2 times)
in gradient(::Function, ::Tuple{Array{Float64,2},Int64,Int64}) at /mnt/kufs/scratch/ikesen16/.julia/somon/v0.5/ReverseDiff/src/api/gradients.jl:22
julia> gf = grad(f)
(::gradfun) (generic function with 1 method)
julia> gf(rand(3,4),1,1)
3×4 Array{Float64,2}:
-0.916041 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
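The ReverseDiff MethodError looks like the scalars-in-tuples limitation reported elsewhere in this tracker; one possible workaround (a sketch, untested) is to close over the non-differentiated integer arguments so only the array reaches the API:
julia> g2(x) = f(x, 1, 1)
julia> gradient(g2, rand(3,4))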
I was wondering whether there exists a way to avoid having to recreate the pre-recorded tape when extra arguments (hidden within a closure) are changed. In order to explain what I have in mind, here is a simple example script:
using DiffBase, ForwardDiff, ReverseDiff
type Z
    a::Float64
end
g(x, z::Z) = sum(sin, z.a)+prod(tan, x)*sum(sqrt, x)*z.a;
z = Z(3.);
f(x) = g(x, z);
x = rand(2);
fresult = DiffBase.GradientResult(x);
rresult = DiffBase.GradientResult(x);
ForwardDiff.gradient!(fresult, f, x);
const ftape = ReverseDiff.GradientTape(f, x);
const cftape = ReverseDiff.compile(ftape);
ReverseDiff.gradient!(rresult, cftape, x);
DiffBase.value(fresult)
DiffBase.value(rresult)
DiffBase.gradient(fresult)
DiffBase.gradient(rresult)
# Change value of z.a, which is part of the extra input argument z of g(x, z), enclosed by the f(x) closure
# ForwardDiff succeeds, ReverseDiff based on cftape fails
z.a = 3.5;
ForwardDiff.gradient!(fresult, f, x);
ReverseDiff.gradient!(rresult, cftape, x);
DiffBase.value(fresult)
DiffBase.value(rresult)
DiffBase.gradient(fresult)
DiffBase.gradient(rresult)
# Use original closure rather than its associated tape
# Both ForwardDiff and ReverseDiff succeed
ForwardDiff.gradient!(fresult, f, x);
ReverseDiff.gradient!(rresult, f, x);
DiffBase.value(fresult)
DiffBase.value(rresult)
DiffBase.gradient(fresult)
DiffBase.gradient(rresult)
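As far as I understand (not an authoritative answer), the compiled tape bakes in the value of anything that wasn't part of the differentiated input, so z.a is recorded as a constant. The usual remedy is to re-record and recompile after mutating it; a sketch:
z.a = 3.5;
ftape2 = ReverseDiff.GradientTape(f, x);
cftape2 = ReverseDiff.compile(ftape2);
ReverseDiff.gradient!(rresult, cftape2, x);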
Currently, TrackedArrays only support being overwritten in a few "blessed" situations (e.g. as inputs to specially-cased functions like A_mul_B!). I believe it may be possible to define record/propagation rules for setindex! on TrackedArray that will allow destructive indexing by preserving the pre-overwrite state in the tape.
If using an in-place assignment operator .= from a tracked vector into a typed vector, no error is reported and an incorrect answer is given. Consider the following:
using ReverseDiff
function test1(x)
    out = Vector{Float64}(length(x))
    out .= 3.0.*x
    out .^= 2
    return sqrt(dot(out,out))
end
function test2(x)
    out = Vector{Any}(length(x))
    out .= 3.0.*x
    out .^= 2
    return sqrt(dot(out,out))
end
input = rand(3)
test1(input) == test2(input)
g1 = ReverseDiff.gradient(test1, input)
g2 = ReverseDiff.gradient(test2, input)
Output in Julia 0.5:
julia> test1(input) == test2(input)
true
julia> g1 = ReverseDiff.gradient(test1, input)
3-element Array{Float64,1}:
0.0
0.0
0.0
julia> g2 = ReverseDiff.gradient(test2, input)
3-element Array{Float64,1}:
0.0936007
6.53833
13.5873
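A variant that never writes tracked values into a preallocated Vector{Float64} should avoid the problem (a sketch; the silent conversion to Float64 in test1 is presumably what drops the derivative information):
function test3(x)
    out = 3.0 .* x        # stays a tracked array, no Float64 buffer
    out = out .^ 2
    return sqrt(dot(out, out))
end
ReverseDiff.gradient(test3, input)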
I was trying out using ReverseDiff with my package, which relies heavily on StaticArrays. Unfortunately, I ran into an issue with broadcast:
using ReverseDiff: GradientTape
using StaticArrays
function f(x)
    T = eltype(x)
    v = SVector(one(T), zero(T))
    sum(sum(x) * v)
end
f_tape = GradientTape(f, rand(1))
results in the following ambiguity error on 0.6 with StaticArrays 0.0.4 and ReverseDiff 0.1.2:
ERROR: MethodError: broadcast(::Base.#*, ::ReverseDiff.TrackedReal{Float64,Float64,Void}, ::StaticArrays.SVector{2,ReverseDiff.TrackedReal{Float64,Float64,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}}}) is ambiguous. Candidates:
broadcast(f, a::Union{Number, StaticArrays.StaticArray}...) in StaticArrays at /home/twan/code/RigidBodyDynamics/v0.6/StaticArrays/src/broadcast.jl:8
broadcast(::Base.#*, x::ReverseDiff.TrackedReal{X,D,O} where O, y::AbstractArray{T,1} where T) where {X, D} in ReverseDiff at /home/twan/code/RigidBodyDynamics/v0.6/ReverseDiff/src/derivatives/elementwise.jl:344
Stacktrace:
[1] f(::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}) at ./REPL[7]:4
[2] Type at /home/twan/code/RigidBodyDynamics/v0.6/ReverseDiff/src/api/tape.jl:176 [inlined]
[3] ReverseDiff.GradientTape(::Function, ::Array{Float64,1}) at /home/twan/code/RigidBodyDynamics/v0.6/ReverseDiff/src/api/tape.jl:175
The relevant methods are:
How do you think this should be resolved?
Support for treating matrix multiplication, determinants, etc. as array-level operations would be much more efficient than recording tapes for the individual elements.
Hello all,
First, thanks for your ForwardDiff and ReverseDiff packages!
Maybe this is not the right place to ask, but would you know whether a dynamic computational graph package exists for Julia, such as Chainer, DyNet or PyTorch for Python?
Have you thought of implementing something like this? Do you know if that's possible with Julia?
Best,
Emile
Is there a reason for the 0.5.0 upper limit and could it be lifted? Julia is going to go to v0.6 soon, and currently, the upper limit can cause conflicts between packages such as Optim, NLsolve, ForwardDiff and ReverseDiff.
Hi,
I've been poking around this package and I'm very excited.
However, a missing feature I really want is support for functions with branches. For example, the following functions do not work at the moment:
relu(x) = max(0, x)
elu(x) = x > 0 ? x : expm1(x)
log1pexp(x) = x < 18.0 ? log1p(exp(x)) : x < 33.3 ? x + exp(-x) : x
These are important for performance and numerical stability, but I'm ignorant of the technologies used in this library. So, I'd like to know whether branching will be supported in the future, or whether it is at least technically feasible.
Currently, only array types whose linearindexing trait is LinearFast are supported. With a bit of generalization, we should be able to support LinearSlow arrays as well (like sparse arrays). We'd have to:
- overload getindex(t, args...) on TrackedArray (this is tricky, since we still want to intercept non-scalar/slicing operations, so we'll have to think a bit about Union dispatching)
- generalize the index field on TrackedReal
- define linearindexing on TrackedArray as linearindexing(value(t))
...and, of course, add a bunch of tests.
The docstring of ReverseDiff.@forward says: "Currently, only length(args) <= 2 is supported."
However, in a simple function, this macro seems to work well even for more than 3 arguments:
julia> import ReverseDiff
julia> ReverseDiff.@forward f(x, y, z, w) = x + 2y + 3z + 4w
ReverseDiff.ForwardOptimize{##hidden_f}(#hidden_f)
julia> f(1.0, 2.0, 3.0, 4.0)
30.0
julia> ∇f = ReverseDiff.compile_gradient(x -> f(x[1], x[2], x[3], x[4]), zeros(4))
(::#301) (generic function with 1 method)
julia> ∇f(zeros(4), ones(4))
4-element Array{Float64,1}:
1.0
2.0
3.0
4.0
Can I expect this to always work, or is there some pitfall in this case?
I'm finding the MNIST example on the MNIST branch very insightful.
In the example, the structure of the model is hardcoded. What is the best way, in the ReverseDiff framework, to programmatically define the structure? For example, if I wanted the user to be able to specify at runtime the number of hidden layers and the size and activation function of each.
Thanks!
I already mentioned this in ReverseDiff's documentation, but I'm opening this issue to increase visibility and track progress.
ReverseDiff has the same perturbation confusion issue as ForwardDiff. My plan to solve this in ReverseDiff is the same as my plan to solve it in ForwardDiff: the tag will be added to the TrackedArray/TrackedReal types (and possibly the tape types; I'm not sure yet).
I have run into what I believe is a bug in ReverseDiff.
I have made a gist with a program that should generate the weird behavior, along with the output of the program. I'm simply trying to fit a one-layer neural network to some data. The function I differentiate is lossfun. When running the program, the output of all_results[1].value, which I assume should be equal to lossfun, in fact diverges from lossfun after a while, having started out similar.
Program that reproduce bug, along with output: https://gist.github.com/baggepinnen/5c413f9ca12d5853672fd4e2d8dbdaea
Julia Version 0.6.0
julia> f(x) = convert(Float64, 2x[1])
f (generic function with 1 method)
julia> gradient(f, [1])
1-element Array{Int64,1}:
0
julia> f(x) = 2x[1]
f (generic function with 1 method)
julia> gradient(f, [1])
1-element Array{Int64,1}:
2
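A guess at what's happening: the convert to Float64 strips the tracked type, so the intermediate value is recorded as a constant and its gradient is zero. Keeping the conversion generic avoids that (a sketch):
julia> f(x) = convert(eltype(x), 2x[1])
julia> ReverseDiff.gradient(f, [1.0])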
(apologies if this is not the right place for such discussions)
What exactly can we expect of the performance of ReverseDiff? In particular, under what conditions exactly will it produce the gradient of a R^N -> R function in a time comparable to the evaluation of the function (say within a factor of 3-4)?
Below is a function (a simple arithmetic loop over an array), courtesy of @cortner, for which ReverseDiff is consistently ~30 times slower than the function itself on my machine. Moreover, the recording and compilation step takes a large amount of time. I thought this was because the FLOPS/byte ratio is too low, as the manual suggests, but increasing it does not change the factor of ~30.
using ReverseDiff: GradientTape, GradientConfig, gradient, gradient!, compile
φ1(r) = r^2 + r^4 + exp(-r)
φ2(r) = -0.1 * r^3
function F(x)
    N = length(x)
    f = φ1(x[2]-x[1])
    for i = 3:N
        f += φ1(x[i]-x[i-1]) + φ2(x[i] - x[i-2])
    end
    return f
end
function benchmark()
    for N in (10000, 20000, 40000)
        # pre-record a GradientTape for `F` using inputs with Float64 elements
        @time f_tape = GradientTape(F, rand(N))
        # compile `f_tape` into a more optimized representation
        @time compiled_f_tape = compile(f_tape)
        # some inputs and work buffers to play around with
        gresult = zeros(N)
        x = rand(N)
        println("N = $N")
        println(" > Time for F")
        for n = 1:3
            @time F(x)
        end
        println(" > Time for DF")
        for n = 1:3
            @time gradient!(gresult, compiled_f_tape, x)
        end
    end
end
benchmark()
Would it be possible to add the feature of letting GradientTape record a tape with an "active" rand() call in it? Currently, if I run the same tape twice, as in
f = (a,b) -> sum((a+rand()).*(b+rand()))
input = (rand(4),rand(4))
f_tape = ReverseDiff.GradientTape(f,input)
output = similar.(input)
print("Out #1, tape = ")
print(ReverseDiff.gradient!(output,f_tape,input),"\n")
print("Out #2, tape = ")
print(ReverseDiff.gradient!(output,f_tape,input),"\n")
print("Out #1, no tape = ")
print(ReverseDiff.gradient!(output,f,input),"\n")
print("Out #2, no tape = ")
print(ReverseDiff.gradient!(output,f,input),"\n")
returns
Out #1, tape = ([1.15379, 0.991579, 0.564506, 0.968936], [1.16635, 1.27959, 0.955812, 0.927267])
Out #2, tape = ([1.15379, 0.991579, 0.564506, 0.968936], [1.16635, 1.27959, 0.955812, 0.927267])
Out #1, no tape = ([0.909424, 0.747213, 0.32014, 0.72457], [1.50359, 1.61683, 1.29305, 1.26451])
Out #2, no tape = ([1.76153, 1.59931, 1.17224, 1.57667], [0.904445, 1.01769, 0.693907, 0.665362])
One solution is to provide the rand() values as an external input to my function, but is that a good way to go, given that I don't need the derivative w.r.t. this input?
I have the following code
using ReverseDiff, ForwardDiff   # ForwardDiff is needed by the functions below
m,g = 1, 9.8
t = 1
p = [5.,6]
q = [1.,2]
L(t,q,q̇) = m/2 * dot(q̇,q̇) - m*g*q[2]
function Legendre_transformation(F, w)
    wv = a->ForwardDiff.gradient(F, a)
    z = zeros(w)
    M = ForwardDiff.jacobian(wv, z)
    b = wv(z)
    v = M\(w-b)
    w'v - F(v)
end
function Lagrangian2Hamiltonian(Lagrangian, t, q, p)
    L = q̇ -> Lagrangian(t, q, q̇)
    Legendre_transformation(L, p)
end
H = (q, p)->Lagrangian2Hamiltonian(L, t, q, p)
ṗ(p, q) = ReverseDiff.gradient(a->-H(a, p), q)
ṗ(p, q)
that will produce the following depwarn
WARNING: `invoke(f, (types...), ...)` is deprecated, use `invoke(f, Tuple{types...}, ...)` instead
Stacktrace:
[1] depwarn(::String, ::Symbol) at ./deprecated.jl:70
[2] -(::ForwardDiff.Dual{2,ForwardDiff.Dual{2,Float64}}, ::ReverseDiff.TrackedReal{Float64,Float64,Void}) at /home/arch/.julia/v0.6/ReverseDiff/src/derivative
s/scalars.jl:20
...
If I am not mistaken, the function gradient!(result, f, input) accepts a result argument of type Tuple{Array{Float64,2},Array{Float64,2}}. Is it also possible to accept a result argument of type DiffBase.GradientResult, in the way that ForwardDiff does?
I checked the documentation, where it seems that result can be of type DiffBase.DiffResult, and I wasn't sure what the relation between DiffBase.DiffResult and DiffBase.GradientResult is.
Moreover, would it be possible to provide a simple example of calling gradient!() using a result of type DiffBase.GradientResult or DiffBase.DiffResult?
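Not an authoritative answer, but a sketch of the pattern being asked about, assuming GradientResult is just a convenience constructor returning a suitably sized DiffResult that ReverseDiff.gradient! can fill in place (this mirrors the gradient!() example from the earlier issue above):
using DiffBase, ReverseDiff
f(x) = sum(sin, x);
x = rand(4);
result = DiffBase.GradientResult(x);
ReverseDiff.gradient!(result, f, x);
DiffBase.value(result)
DiffBase.gradient(result)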
julia> using ReverseDiffPrototype; const RDP = ReverseDiffPrototype;
julia> function f(x)
           a = one(eltype(x))
           return sum(map(i -> a * i, x))
       end
f (generic function with 1 method)
julia> RDP.gradient!(zeros(3), f, rand(3)) # wrong, should be all ones
3-element Array{Float64,1}:
0.823614
0.459833
0.581877
Note that we could replace a = one(eltype(x)) with something like a = x[1], and the bug would persist.
The closed-over TraceReal is causing each application of the closure to be written to the trace, laying down incorrect nodes that get accumulated in the reverse pass. It's a perturbation-confusion-esque bug, but one that we're inducing on ourselves instead of it being induced by competing differentiation operators.
Not only does it give the wrong answer, but it also kind of defeats the point of the optimization (to elide the tracing of the closure application).
I'm unsure how we can resolve this short of scrubbing the closure for TraceReals, just as we'd have to do to fix JuliaDiff/ForwardDiff.jl#83.
@mlubin any ideas?