Comments (6)
Finally had time to look at this, sorry for the late response. There's a large amount of extra, un-Julian (e.g. unnecessarily type-unstable) code in your example. If we're only comparing ReverseDiff and AutoGrad, we don't need any of the code besides the loss function (since that's the only place these packages are involved). Here's a much more palatable benchmark:
```julia
using BenchmarkTools
import AutoGrad
import ReverseDiff

#########
# Setup #
#########

function loss(w, b, x, ygold)
    ypred = tanh.(w*x .+ b)
    ynorm = ypred .- log.(sum(exp.(ypred), 1))
    -(sum(ygold .* ynorm)) / size(ygold, 2)
end

const w, b, x, y = 0.1 * rand(10, 28^2), zeros(10), rand(28^2), zeros(10);
const input = (w, b, x, y);
const output = map(copy, input);

################
# benchmarking #
################

const agrad_loss∇ = AutoGrad.grad((ws, x, ygold) -> loss(ws[1], ws[2], x, ygold))
@btime agrad_loss∇($((w, b)), $x, $y)

const rdiff_loss∇ = ReverseDiff.compile(ReverseDiff.GradientTape(loss, input))
@btime ReverseDiff.gradient!($output, $rdiff_loss∇, $input)
```
A list of the changes I've made from your original benchmark:
- used an actual BenchmarkTools harness
- used dummy initial values and removed all the unnecessary data-mangling code (since it has nothing to do with gradient performance)
- minimized it to a single layer evaluation. Note that ReverseDiff handles additional layers the same way as AutoGrad, but benchmarking more than one layer for the sake of comparing gradient evaluation is pointless when a) gradient performance scales linearly with the number of layers and b) this benchmark used a fixed number of layers anyway.
- doing the above has the nice side effect of fixing your type-unstable layer evaluation loop
- fixed deprecation warnings in the `loss` function
The output on my machine for ReverseDiff is:
```julia
julia> @btime ReverseDiff.gradient!($output, $rdiff_loss∇, $input)
  46.903 μs (4 allocations: 352 bytes)
```
I initially could not get AutoGrad to work on Julia v0.6 (maybe I messed something up?), but the latest AutoGrad master works now:
```julia
julia> @btime agrad_loss∇($((w, b)), $x, $y)
  452.553 μs (816 allocations: 98.89 KiB)
```
Onto addressing your points:
> ReverseDiff does not have the ability to work with KnetArrays.
ReverseDiff attempts to support arbitrary `A<:AbstractArray`, and the `KnetArray` type is not an `AbstractArray`. Knet gets away with this by not caring about AD-unaware, non-generic Julia code (which is reasonable). In contrast, ReverseDiff tries to be able to differentiate as much code as possible, even if it's not perfectly type-generic, as long as the code works with a reasonable set of standard Julia types.
> In Knet, we use ReLU activation (actually max(0,x)), but ReverseDiff is currently not able to take the derivative of this operation.
This doesn't make sense. ReverseDiff should easily be able to do this via forward-mode AD. Maybe you found a bug - can you show me an example?
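To double-check the non-Knet path, here's a minimal sketch of what I'd expect to work (the `relu_sum` helper and the input values are made up for illustration):

```julia
import ReverseDiff

# ReLU written the way Knet does it: max(0, x), broadcast elementwise.
relu_sum(x) = sum(max.(0, x))

# The max derivative is handled via forward-mode through the broadcast;
# the gradient should be 1 where x > 0 and 0 where x < 0.
g = ReverseDiff.gradient(relu_sum, [-1.0, 2.0, 3.0])
# g == [0.0, 1.0, 1.0]
```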
> In AutoGrad, we have a loss function whose first input parameter is a weights bundle. It can be an array, a tuple, a dictionary, or a combination of these structures. Actually, this is good, because we can use the same loss function for different networks (e.g. a 1-hidden-layer MLP and a 2-hidden-layer MLP both use the same loss function). Unlike AutoGrad, in ReverseDiff we need to pass all parameters to the loss function.
ReverseDiff tries to expose a simple, general API rather than a "magical", use-case specific one. The idea is that it's easier to build the latter kind of API on top of the former than it is to do the reverse. For example, AutoGrad is focused on ML, so it's just munging whatever container type it sees for differentiable state in a way ML folks are used to. ReverseDiff doesn't assume an ML use case, but somebody could easily make a container munging layer on top of ReverseDiff for ML purposes.
Additionally, it's already generally accepted in the Julia AD world that nondifferentiated parameters get passed via closures. This is actually far cleaner than AD APIs that support additional parameters. The only pitfall here for reverse mode is that you can do better static optimization if you have placeholder parameters; however, this isn't an arena in which AutoGrad can compete, since it doesn't do any static optimization anyway.
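To make the closure pattern concrete, here's a sketch with made-up data where only `w` is differentiated and the fixed inputs ride along in the closure:

```julia
import ReverseDiff

xdata = rand(4)      # nondifferentiated data...
ytarget = rand(4)    # ...captured by the closure, not passed through the AD API

# Simple squared-error loss; `w` is the only differentiated argument.
sqloss(w) = sum(abs2, w * xdata .- ytarget)

g = ReverseDiff.gradient(sqloss, rand(4, 4))  # gradient w.r.t. w only
```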
> I think indexing does not work for ReverseDiff.
This is false, you're just not using ReverseDiff's API correctly. Here's an example that mirrors what you're doing with AutoGrad:
```julia
julia> gradient(x -> f(x, 1, 1), rand(3, 4))
3×4 Array{Float64,2}:
 -1.09042  0.0  0.0  0.0
  0.0      0.0  0.0  0.0
  0.0      0.0  0.0  0.0
```
Note that for functions containing performance-intensive scalar indexing, ReverseDiff will generally outperform AutoGrad, since ReverseDiff uses some clever persistence tricks rather than naively recording `getindex` operations to the tape.
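Since the definition of `f` was elided above, here's a self-contained version of the same pattern with a hypothetical `f` that does scalar indexing:

```julia
import ReverseDiff

# Hypothetical stand-in for `f`: reads a single scalar entry of x.
f(x, i, j) = sin(x[i, j]) * x[i, j]

g = ReverseDiff.gradient(x -> f(x, 1, 1), rand(3, 4))
# Only g[1, 1] is nonzero, matching the shape of the output shown above.
```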
> Optional arguments are not supported by ReverseDiff.
This is also false: you even use ReverseDiff with optional arguments in your original example! Maybe you meant something other than "optional arguments"?
> In the softmax operation, we have a safer version which prevents float overflow and takes advantage of the maximum operation. However, ReverseDiff does not support maximum/minimum functions.
Once again, this doesn't make sense - AFAICT from your description, ReverseDiff should be able to handle this. It'd be great to see an example so I can debug it.
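For reference, here's the kind of overflow-safe log-softmax I understand you to mean, written for a single column for brevity (plain Julia, no AD involved, just to pin down the code in question):

```julia
# Overflow-safe: subtract the maximum before exponentiating, so exp
# never sees huge arguments.
safe_lognorm(y)  = (s = y .- maximum(y); s .- log(sum(exp.(s))))
naive_lognorm(y) = y .- log(sum(exp.(y)))

big = fill(1000.0, 3)
isfinite(sum(safe_lognorm(big)))   # true: all intermediate values stay finite
isfinite(sum(naive_lognorm(big)))  # false: exp(1000) overflows to Inf
```

If ReverseDiff chokes on the `maximum` call in code like this, that's exactly the example I'd like to see filed.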
Finally, I should note that I'm not working much on the current ReverseDiff version. All my efforts are going towards Cassette, which is a new prototype of a native Julia execution tracer. Once it's done, both ReverseDiff and ForwardDiff will be totally rebuilt on top of it. If ReverseDiff doesn't meet your needs right now, we might be better off waiting until Cassette is released than spending time enhancing ReverseDiff as it is.
from reversediff.jl.
@ilkerkesen @denizyuret Any response to my above comment? I'm specifically interested in seeing code for the softmax/ReLU problems that were reported. I want to make sure some new work I'm doing for the Julia 1.0 timeframe will be usable w.r.t. Knet (keeping denizyuret/Knet.jl#144 in mind).
Maybe it's best to split these problems into separate issues, along with expected/actual behavior.
I am a bit late here but I've run the benchmark. It works for me on AutoGrad's master branch and julia 0.6.
This is what I get:
```julia
@btime agrad_loss∇($(w, b), $x, $y)
  432.291 μs (748 allocations: 97.39 KiB)
@btime ReverseDiff.gradient!($output, $rdiff_loss∇, $input)
  41.052 μs (4 allocations: 352 bytes)
```
I also tested increasing the array sizes from 10 to 1000:
```julia
@btime agrad_loss∇($(w, b), $x, $y)
  2.811 ms (748 allocations: 6.15 MiB)
@btime ReverseDiff.gradient!($output, $rdiff_loss∇, $input)
  9.314 ms (4 allocations: 15.91 KiB)
```
AutoGrad errored for me, but switching to the master branch fixed the problems. I have about the same results:
```julia
julia> @btime agrad_loss∇($((w, b)), $x, $y)
  466.866 μs (816 allocations: 98.89 KiB)
([0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])

julia> @btime ReverseDiff.gradient!($output, $rdiff_loss∇, $input)
  52.445 μs (4 allocations: 352 bytes)
([0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 … 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [2.30259, 2.30259, 2.30259, 2.30259, 2.30259, 2.30259, 2.30259, 2.30259, 2.30259, 2.30259])
```
Note that AutoGrad only computed the gradient with respect to (w, b), while ReverseDiff computed it with respect to w, b, x, and y. Using AutoGrad for all four slowed it down to a minimum/median of 525/539 microseconds, while using ReverseDiff for only w and b took a minimum/median of 47.5/47.7 microseconds.
In the example on ReverseDiff's readme, ReverseDiff was 20x faster than AutoGrad when I tried it a few days ago on another computer.
I am excited for the Cassette-based overhaul (jrevels' YouTube video from JuliaCon 2017 is great), especially because of ambiguity errors of this sort:
```julia
julia> const rdiff∇f = ReverseDiff.compile(ReverseDiff.GradientTape(f, randn(80)))
ERROR: MethodError: *(::RowVector{ReverseDiff.TrackedReal{Float64,Float64,ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}},ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}}, ::ReverseDiff.TrackedArray{Float64,Float64,1,Array{Float64,1},Array{Float64,1}}) is ambiguous. Candidates:
  *(x::AbstractArray{T,2} where T, y::ReverseDiff.TrackedArray{V,D,N,VA,DA} where DA where VA where N) where {V, D} in ReverseDiff at /home/chris/.julia/v0.6/ReverseDiff/src/derivatives/linalg/arithmetic.jl:193
  *(x::AbstractArray, y::ReverseDiff.TrackedArray{V,D,N,VA,DA} where DA where VA where N) where {V, D} in ReverseDiff at /home/chris/.julia/v0.6/ReverseDiff/src/derivatives/linalg/arithmetic.jl:193
  *(rowvec::RowVector{T,V} where V<:(AbstractArray{T,1} where T), vec::AbstractArray{T,1}) where T<:Real in Base.LinAlg at linalg/rowvector.jl:170
Possible fix, define
  *(::RowVector{ReverseDiff.TrackedReal{V,D,ReverseDiff.TrackedArray{V,D,1,VA,DA}},V} where V<:(AbstractArray{T,1} where T), ::ReverseDiff.TrackedArray{V,D,1,VA,DA})
```
But I'll try to work around this in the meantime.
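In case it helps anyone hitting the same ambiguity, one workaround is to avoid the `RowVector * Vector` method entirely and use `dot` for the inner product (a sketch; `f_rowvec`/`f_dot` are made-up names):

```julia
using LinearAlgebra  # `dot` is in Base on Julia 0.6, in LinearAlgebra on ≥ 0.7

# x' * y on tracked arrays hits the RowVector ambiguity above;
# dot(x, y) computes the same inner product through an unambiguous path.
f_rowvec(x) = x[1:40]' * x[41:80]
f_dot(x)    = dot(x[1:40], x[41:80])

xs = collect(1.0:80.0)
f_rowvec(xs) == f_dot(xs)   # true on plain arrays; prefer f_dot when taping
```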
Bump: any news or decisions regarding the future of this?