coreylowman / dfdx
Deep learning in Rust, with shape checked tensors and neural networks
License: Other
Currently using the matrixmultiply crate, but performance could likely be much improved by using an actual BLAS library. It's unclear how compiling/linking that would work, since BLAS has to be compiled per machine.
PyTorch's SGD page has pseudocode for this: https://pytorch.org/docs/stable/generated/torch.optim.SGD.html
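As a reference point, the update in that pseudocode (with dampening 0 and no nesterov) can be sketched on plain slices. The names here (sgd_step, velocity) are illustrative, not dfdx's actual API:

```rust
/// Minimal sketch of an SGD step with classic momentum, following the
/// PyTorch pseudocode: v = momentum * v + g; p -= lr * v.
fn sgd_step(params: &mut [f32], grads: &[f32], velocity: &mut [f32], lr: f32, momentum: f32) {
    for ((p, &g), v) in params.iter_mut().zip(grads).zip(velocity.iter_mut()) {
        *v = momentum * *v + g; // accumulate velocity
        *p -= lr * *v;          // apply the update
    }
}

fn main() {
    let mut params = [1.0f32, -2.0];
    let mut velocity = [0.0f32; 2];
    sgd_step(&mut params, &[0.5, -0.5], &mut velocity, 0.1, 0.9);
    // On the first step velocity == grad, so p moves by -lr * g.
    assert!((params[0] - 0.95).abs() < 1e-6);
    assert!((params[1] + 1.95).abs() < 1e-6);
}
```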
This function would stop using alloc_zeroed() and Box::from_raw(). See https://github.com/coreylowman/dfdx/blob/main/src/devices/allocate.rs#L11:
let layout = Layout::new::<T>();
debug_assert_eq!(layout.size(), T::NUM_BYTES);
unsafe {
let ptr = alloc_zeroed(layout) as *mut T;
Box::from_raw(ptr)
}
E.g. since everywhere it's used, the callee function owns the data for both. This will also allow it to reduce allocations for derivatives.
Something that takes a usize (the length of the dataset), and you can:

Each of these would return a [usize; M], where M is a const M: usize.
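One way this could look, sketched with a hypothetical batches helper (not dfdx's API): an iterator over fixed-size index arrays [usize; M], driven only by the dataset length. A real version would shuffle the indices with the device rng; this sketch just walks sequentially.

```rust
/// Yield sequential index batches of const size M over a dataset of `len`
/// items; a trailing partial batch is dropped in this sketch.
fn batches<const M: usize>(len: usize) -> impl Iterator<Item = [usize; M]> {
    (0..len / M).map(move |b| {
        let mut inds = [0usize; M];
        for (i, ind) in inds.iter_mut().enumerate() {
            *ind = b * M + i;
        }
        inds
    })
}

fn main() {
    let all: Vec<[usize; 3]> = batches::<3>(7).collect();
    // 7 items with M = 3 gives two full batches; the remainder is dropped.
    assert_eq!(all, vec![[0, 1, 2], [3, 4, 5]]);
}
```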
PyTorch's Adam page has pseudocode for this: https://pytorch.org/docs/stable/generated/torch.optim.Adam.html?highlight=adam#torch.optim.Adam
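For reference, that pseudocode per-parameter, sketched on plain floats. The names (adam_step, m, v) are illustrative and not dfdx's optimizer API:

```rust
/// One Adam update for a single parameter, following the PyTorch pseudocode:
/// biased first/second moment estimates plus bias correction at step t.
fn adam_step(
    p: &mut f32, g: f32, m: &mut f32, v: &mut f32,
    t: i32, lr: f32, beta1: f32, beta2: f32, eps: f32,
) {
    *m = beta1 * *m + (1.0 - beta1) * g;     // first moment (mean of grads)
    *v = beta2 * *v + (1.0 - beta2) * g * g; // second moment (mean of squared grads)
    let m_hat = *m / (1.0 - beta1.powi(t));  // bias correction
    let v_hat = *v / (1.0 - beta2.powi(t));
    *p -= lr * m_hat / (v_hat.sqrt() + eps);
}

fn main() {
    let (mut p, mut m, mut v) = (1.0f32, 0.0, 0.0);
    adam_step(&mut p, 0.5, &mut m, &mut v, 1, 0.001, 0.9, 0.999, 1e-8);
    // On the first step m_hat / sqrt(v_hat) is ~1, so p moves by roughly -lr.
    assert!((p - 0.999).abs() < 1e-4);
}
```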
Something like:
fn select<const S: usize>(self, inds: &[usize; S]) -> Self<S, ...>;
I imagine the gradients for this would just be 1 if i is in inds, otherwise 0
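On plain arrays, that forward/backward pair could look like the sketch below (select_fwd/select_bwd are illustrative names, not dfdx's API): the forward pass gathers by index, and the backward pass scatters the incoming gradient back, which is exactly the "1 if i is in inds, otherwise 0" jacobian.

```rust
/// Gather S elements of `data` by index.
fn select_fwd<const S: usize>(data: &[f32], inds: &[usize; S]) -> [f32; S] {
    let mut out = [0.0; S];
    for (o, &i) in out.iter_mut().zip(inds) {
        *o = data[i];
    }
    out
}

/// Scatter the output gradient back onto the input gradient buffer;
/// indices not in `inds` receive 0 (repeated indices accumulate).
fn select_bwd<const S: usize>(grad_out: &[f32; S], inds: &[usize; S], grad_in: &mut [f32]) {
    for (&g, &i) in grad_out.iter().zip(inds) {
        grad_in[i] += g;
    }
}

fn main() {
    let data = [10.0, 20.0, 30.0];
    let inds = [2usize, 0];
    assert_eq!(select_fwd(&data, &inds), [30.0, 10.0]);

    let mut grad = [0.0; 3];
    select_bwd(&[1.0, 1.0], &inds, &mut grad);
    assert_eq!(grad, [1.0, 0.0, 1.0]); // 1 where selected, 0 elsewhere
}
```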
There's a lot of work to be done here. Very rough list of todos:

Preparation
- Devices
  - Cuda device that wraps cudarc::CudaDevice and an rng
  - Cpu: DeviceArc and DeviceRng (Arc<T> and Arc<Cpu>)
- Tensors
  - Add Device to all tensor structs
  - Add &Device as parameter, and remove Rng since that will be accessed through device
- nn
  - trait ModuleCreator
  - ModuleCreator::zeros(Device)
  - ModuleCreator::default(Device), which calls zeros & reset params
- Kernels
  - trait LaunchKernel<K, Args>
  - impl LaunchKernel<...> for Cpu, and trait <Kernel>CpuImpl / impl <Kernel>CpuImpl for <Kernel>. See cudarc/examples/kernels.rs
  - kernel!(|a, b, c| { *a = b + c }) (#185)
- Testing
  - #[cfg(feature = "test-cuda")] that when specified uses cuda instead of cpu?
  - build_test_device!() macro that uses testing features to create the device

Done:
Examples where this would remove an allocation:
- target_probs is duplicated
- the b calculation where max_value is duplicated

Would like to add a small example of using a transformer architecture. This will likely involve new features such as batch mat mul and maybe some others.
This would be a variable-sized head where the input to the module is duplicated and the same input is passed to all sub modules.
Unclear how this would work since we are already using tuples. Perhaps something like:
impl Module<I> for MultiHead<(A, B)> {}
impl Module<I> for MultiHead<(A, B, C)> {}
impl Module<I> for MultiHead<(A, B, C, D)> {}
...
?
Related to #6
It currently:
It'd be nice to reuse one, but because of the order of operations it may be impossible; to compute lhs derivative you need rhs & result, and to compute rhs derivative you need lhs & result.
This would accept an array of T::Reduced::ArrayType, where the Dtype is usize, and select the items from the last dimension that match up. It would return a Tensor::Reduced.
Example:
let t: Tensor2D<2, 3> = Tensor2D::new([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]);
let r: Tensor1D<2> = gather_last_dim(t, [0, 1]);
assert_eq!(r.data(), &[1.0, -2.0]);
This would reduce the last dim to the maximum value in that dimension. It can use T::Device::reduce_last_dim(..., &mut f32::max) (see logsumexp for an example using that).
Example:
let t: Tensor2D<2, 3> = Tensor2D::new([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]);
let r: Tensor1D<2> = max_last_dim(t);
assert_eq!(r.data(), &[3.0, -1.0]);
This will be another generic parameter of all tensors. Most existing operations will likely require float generic.
Related to #9 since it involves an additional generic parameter
For safety & clarity reasons. If you clone a tensor for backprop, more often than not you want that to be a different tensor and for it to be treated separately during backprop.
For cases where you do want to keep the id the same, .duplicate()
should be used.
The only place this really occurs is in kl_div_with_logits_loss, where target_probs is cloned since it's used twice.
Needed for transformers #34
Currently this only works for actual probability distributions. Hard cross entropy only has 1 non-zero entry in the inner dimension, so sum across that before taking the mean.
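On plain arrays, that reduction could look like this sketch (log_softmax and hard_cross_entropy are illustrative helpers, not dfdx's API): with a one-hot target only one term of the inner sum survives, so the loss collapses to -log_softmax[class] per row, averaged over the batch.

```rust
/// Numerically stable log-softmax over one row of logits.
fn log_softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let lse = logits.iter().map(|l| (l - max).exp()).sum::<f32>().ln() + max;
    logits.iter().map(|l| l - lse).collect()
}

/// Cross entropy with hard (class index) targets: the inner sum has a single
/// non-zero entry, so just pick it out, then take the mean over the batch.
fn hard_cross_entropy(logits: &[Vec<f32>], classes: &[usize]) -> f32 {
    let total: f32 = logits
        .iter()
        .zip(classes)
        .map(|(row, &c)| -log_softmax(row)[c])
        .sum();
    total / logits.len() as f32
}

fn main() {
    let logits = vec![vec![2.0, 0.0], vec![0.0, 2.0]];
    let loss = hard_cross_entropy(&logits, &[0, 1]);
    // Both rows strongly predict the true class, so the loss is small.
    assert!(loss > 0.0 && loss < 0.2);
}
```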
add/sub and mul/div are all slightly different, and they are kinda hard to read.
E.g. for xavier uniform initialization you need to know the in size & out size.
This will likely require a different trait than Randomize, and I'm still inclined to keep randomize. It'll also be slightly easier to use since the user won't have to pass in a distribution.
Options:
model.reset_params(&mut rng);
model.init_params(&mut rng);
model.randomize_params(&mut rng);
This should use Tensor::randomize()
under the hood.
This will need:
This will also slightly reduce the required movement of tape when using these functions.
One of the arguments can be reused as the storage for the gradient.
While this would force functions to allocate space for derivatives inside, it would be cleaner from an api perspective.
E.g.
fn matmul_ref(a: &..., b: &..., tape: H) {}
Needed for transformers #34
Ideally we'd have p be a const parameter; unfortunately f32 cannot be a const generic in stable Rust. Many use cases set p to 1 / N, where N is just an integer.
Dropout1In<N>
would set p to be 1.0 / N as f32
for now.
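A minimal sketch of that workaround, assuming the struct is standalone (this is not dfdx's actual Dropout1In definition): carry the denominator as a usize const generic and derive p at runtime.

```rust
/// Dropout whose probability is 1/N, with N as a const generic,
/// working around f32 not being usable as a const generic on stable.
struct Dropout1In<const N: usize>;

impl<const N: usize> Dropout1In<N> {
    /// Derive the drop probability from the const denominator.
    fn p(&self) -> f32 {
        1.0 / N as f32
    }
}

fn main() {
    assert_eq!(Dropout1In::<2>.p(), 0.5);
    assert_eq!(Dropout1In::<4>.p(), 0.25);
}
```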