dense_mats's People

Contributors

vbarrielle


dense_mats's Issues

API thoughts

I'm mixing multiple issues here for discussion; we can break out the things that need real change and close this later. We can also move the discussion elsewhere, like the subreddit, if you wish.

zeros, ones, etc. ask for a StorageOrder and default to C if given Unordered. Having to always specify the order is a pretty serious inconvenience; I would much prefer that the library choose a default so that I don't have to pass a StorageOrder at all.

In fn strides, why not return &DimArray? Anyone who needs an owned copy can clone it, so I don't see the need for strides_ref. Same for fn shape. I'm probably missing a subtlety.
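A minimal sketch of the borrowed-accessor approach, assuming a hypothetical `StridedArray` type that keeps its shape and strides in `Vec<usize>` (the crate's `DimArray` type is not shown in this discussion, so a slice stands in for it). Callers that need ownership call `.to_vec()`, which is why a separate `strides_ref` would be redundant.

```rust
/// Hypothetical strided array; `DimArray` is replaced by `Vec<usize>` here.
pub struct StridedArray {
    data: Vec<f64>,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl StridedArray {
    /// Borrow the shape: no allocation, no clone.
    pub fn shape(&self) -> &[usize] {
        &self.shape
    }

    /// Borrow the strides; a caller who needs ownership can call `.to_vec()`.
    pub fn strides(&self) -> &[usize] {
        &self.strides
    }
}
```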

ordering returns F or C even when the inner stride is not 1, i.e. when the data is not contiguous; that is pretty surprising to me.

outer_block_iter is an interesting choice of iterator to have. I think the block_size=1 version, whose items have an order smaller by one, is the important one, since it lets us write nested loops that eventually reach individual elements. The general case is iter_subarrays(along: Axis), I think.
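One way the general `iter_subarrays(along: Axis)` could work on a generally strided array is sketched below. `ArrayView`, `Axis`, `subarray` and `iter_subarrays` are hypothetical names, not the crate's API. Taking the subarray at position `i` along an axis just drops that axis from the shape and strides and advances the offset by `i * stride`, so no data is copied.

```rust
/// Hypothetical axis newtype, as suggested in the discussion.
#[derive(Clone, Copy, Debug)]
pub struct Axis(pub usize);

/// Borrowed, generally strided view: the element at multi-index `idx`
/// lives at `data[offset + sum(idx[k] * strides[k])]`.
#[derive(Debug)]
pub struct ArrayView<'a> {
    data: &'a [f64],
    offset: usize,
    shape: Vec<usize>,
    strides: Vec<usize>,
}

impl<'a> ArrayView<'a> {
    pub fn new(data: &'a [f64], shape: Vec<usize>, strides: Vec<usize>) -> Self {
        assert_eq!(shape.len(), strides.len(), "shape and strides must have the same length");
        ArrayView { data, offset: 0, shape, strides }
    }

    pub fn get(&self, idx: &[usize]) -> f64 {
        assert_eq!(idx.len(), self.shape.len(), "wrong number of indices");
        assert!(idx.iter().zip(&self.shape).all(|(i, n)| i < n), "index out of bounds");
        let pos: usize = idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum();
        self.data[self.offset + pos]
    }

    /// The subarray at position `index` along `axis`; its order is one less.
    pub fn subarray(&self, Axis(axis): Axis, index: usize) -> ArrayView<'a> {
        assert!(index < self.shape[axis], "index out of bounds along axis");
        let mut shape = self.shape.clone();
        let mut strides = self.strides.clone();
        shape.remove(axis);
        let stride = strides.remove(axis);
        ArrayView { data: self.data, offset: self.offset + index * stride, shape, strides }
    }

    /// All subarrays along `axis`: the block_size = 1 case of outer_block_iter
    /// when `axis` is the outermost one.
    pub fn iter_subarrays(&self, axis: Axis) -> impl Iterator<Item = ArrayView<'a>> + '_ {
        (0..self.shape[axis.0]).map(move |i| self.subarray(axis, i))
    }
}
```

Nesting `iter_subarrays` calls until the views have order 0 gives the nested loops over elements described above.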

data_index does not do bounds checking. I'm not saying that is suboptimal, but I wonder.
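One possible compromise is to offer both variants, sketched here on a hypothetical `Layout` type: an unchecked index computation for hot inner loops and a checked `data_index` that returns `None` for an index of the wrong order or out of range. The names and the `Option` return type are assumptions, not the crate's current signature.

```rust
/// Hypothetical strided layout: shape and strides only, no data.
pub struct Layout {
    pub shape: Vec<usize>,
    pub strides: Vec<usize>,
}

impl Layout {
    /// No checks: an out-of-range index may silently alias another element
    /// or panic later when the data slice is indexed.
    pub fn data_index_unchecked(&self, idx: &[usize]) -> usize {
        idx.iter().zip(&self.strides).map(|(i, s)| i * s).sum()
    }

    /// Checked: `None` if the index has the wrong order or any component
    /// is outside its dimension.
    pub fn data_index(&self, idx: &[usize]) -> Option<usize> {
        if idx.len() != self.shape.len() {
            return None;
        }
        if idx.iter().zip(&self.shape).any(|(i, n)| i >= n) {
            return None;
        }
        Some(self.data_index_unchecked(idx))
    }
}
```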

rows stands for the number of rows, but could also stand for an iterator over rows or the Range of legal row indices (with the count named n_rows, etc.). I think the API should use higher-level objects than usize as often as possible.
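A small sketch of that naming split, on a hypothetical contiguous C-ordered `Matrix`: counts get an `n_` prefix, while `rows()` returns the `Range` of legal row indices so it can drive a loop directly. None of these names come from the crate.

```rust
use std::ops::Range;

/// Hypothetical contiguous C-ordered matrix, used only to illustrate naming.
pub struct Matrix {
    data: Vec<f64>,
    n_rows: usize,
    n_cols: usize,
}

impl Matrix {
    pub fn n_rows(&self) -> usize {
        self.n_rows
    }

    pub fn n_cols(&self) -> usize {
        self.n_cols
    }

    /// The range of legal row indices, usable as `for i in m.rows()`.
    pub fn rows(&self) -> Range<usize> {
        0..self.n_rows
    }

    /// Row `i` as a slice (valid because the layout is C-ordered and contiguous).
    pub fn row(&self, i: usize) -> &[f64] {
        &self.data[i * self.n_cols..(i + 1) * self.n_cols]
    }
}
```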

Anyway, this is clearly a sketch; most of the above will get fixed while fleshing it out. Some larger issues:

  • Using arrays as indices is a bit of a pain in the butt:

    let first = myarr[vec!(0,0,0)];

is kind of unwieldy. I wish Rust had nicer array syntax, but I have to admit the bar of being about as lightweight as [] is high. Of course, the matrix and vector special cases can be covered with Index instances for 2-tuples and plain numbers, mostly solving the problem.

  • I think that supporting generally strided tensors is very important for interoperability with the external ecosystem, including NumPy. That said, in my usage I expect the contiguous dimension to be part of the type, so that I know my innermost loops are cache friendly. If we have such types, we can simply convert to them (testing the strides and returning None if they are incompatible, or creating a contiguous copy; both are reasonable options).

  • This returns us to the question of writing algorithms abstractly over (most aspects of) tensor representation. I wonder if we need something like IntoTensor and FromTensor (as in IntoIterator and FromIterator), which allow a thing to be accessed with an appropriately sized array. Then binops can be implemented once in terms of IntoTensor, with the organization of the return value determined externally. I am guessing the following:

    let c: RowMajorMatrix = a + b;

can be made to work even when a, b, c have different organizations, and efficiently when they do.

  • For general strided arrays, it seems natural to provide also Index<(usize,Axis)> and iter_subarrays(along: Axis).
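The matrix special case of `Index` and the IntoTensor/FromTensor idea can be sketched together. `Matrix2` and `FromMatrix2` are hypothetical stand-ins for IntoTensor and FromTensor, and `RowMajorMatrix`/`ColMajorMatrix` are illustrative types. The sum is written once over the traits, and the caller's type annotation picks the output layout. A free function `add` stands in for `+` because `Add::Output` is fixed per impl (and a blanket `impl<A, B> Add<B> for A` is rejected by Rust's orphan rules), so `let c: RowMajorMatrix = a + b;` would need extra machinery such as a lazy expression type.

```rust
use std::ops::Index;

/// Layout-independent read access (hypothetical IntoTensor analogue).
pub trait Matrix2 {
    fn shape(&self) -> (usize, usize);
    fn at(&self, i: usize, j: usize) -> f64;
}

/// Construction from an element function (hypothetical FromTensor analogue).
pub trait FromMatrix2: Sized {
    fn from_fn(shape: (usize, usize), f: impl Fn(usize, usize) -> f64) -> Self;
}

#[derive(Debug, PartialEq)]
pub struct RowMajorMatrix { rows: usize, cols: usize, data: Vec<f64> }

#[derive(Debug, PartialEq)]
pub struct ColMajorMatrix { rows: usize, cols: usize, data: Vec<f64> }

// `m[(i, j)]` instead of `m[vec!(i, j)]`.
impl Index<(usize, usize)> for RowMajorMatrix {
    type Output = f64;
    fn index(&self, (i, j): (usize, usize)) -> &f64 {
        assert!(i < self.rows && j < self.cols, "index out of bounds");
        &self.data[i * self.cols + j]
    }
}

impl Index<(usize, usize)> for ColMajorMatrix {
    type Output = f64;
    fn index(&self, (i, j): (usize, usize)) -> &f64 {
        assert!(i < self.rows && j < self.cols, "index out of bounds");
        &self.data[j * self.rows + i]
    }
}

impl Matrix2 for RowMajorMatrix {
    fn shape(&self) -> (usize, usize) { (self.rows, self.cols) }
    fn at(&self, i: usize, j: usize) -> f64 { self[(i, j)] }
}

impl Matrix2 for ColMajorMatrix {
    fn shape(&self) -> (usize, usize) { (self.rows, self.cols) }
    fn at(&self, i: usize, j: usize) -> f64 { self[(i, j)] }
}

impl FromMatrix2 for RowMajorMatrix {
    fn from_fn((rows, cols): (usize, usize), f: impl Fn(usize, usize) -> f64) -> Self {
        // Fill in memory order: j varies fastest.
        let data = (0..rows)
            .flat_map(|i| (0..cols).map(move |j| (i, j)))
            .map(|(i, j)| f(i, j))
            .collect();
        RowMajorMatrix { rows, cols, data }
    }
}

impl FromMatrix2 for ColMajorMatrix {
    fn from_fn((rows, cols): (usize, usize), f: impl Fn(usize, usize) -> f64) -> Self {
        // Fill in memory order: i varies fastest.
        let data = (0..cols)
            .flat_map(|j| (0..rows).map(move |i| (i, j)))
            .map(|(i, j)| f(i, j))
            .collect();
        ColMajorMatrix { rows, cols, data }
    }
}

/// Element-wise sum written once; the output layout is chosen by the caller.
pub fn add<A: Matrix2, B: Matrix2, C: FromMatrix2>(a: &A, b: &B) -> C {
    assert_eq!(a.shape(), b.shape(), "shape mismatch");
    C::from_fn(a.shape(), |i, j| a.at(i, j) + b.at(i, j))
}
```

Usage then reads `let c: RowMajorMatrix = add(&a, &b);` with `a` and `b` in any mix of layouts. A real implementation would also specialize the same-layout case to a flat loop over the data, which this sketch does not do.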

Type checked contiguous data

Have some marker in the Tensor type to be able to assert at compile time whether an array is contiguous or not.

If done using a phantom data marker, this should be easily propagated through methods:

  • constructors create contiguous arrays
  • outer slicing creates contiguous views
  • other slicing creates non-contiguous views

New type aliases would be needed then.
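A minimal sketch of the phantom-marker idea for 2-D views, assuming hypothetical marker types `Contiguous` and `Strided` and the aliases `ContiguousView`/`StridedView`. The constructor and outer (row) slicing return `ContiguousView`; inner (column) slicing returns `StridedView`; only contiguous views expose `as_slice`, so the check happens at compile time.

```rust
use std::marker::PhantomData;

/// Hypothetical marker types recording contiguity in the type.
pub struct Contiguous;
pub struct Strided;

/// 2-D view whose contiguity is tracked by the type parameter `C`.
pub struct MatView<'a, C> {
    data: &'a [f64],
    offset: usize,
    rows: usize,
    cols: usize,
    row_stride: usize,
    col_stride: usize,
    _contiguity: PhantomData<C>,
}

pub type ContiguousView<'a> = MatView<'a, Contiguous>;
pub type StridedView<'a> = MatView<'a, Strided>;

impl<'a> ContiguousView<'a> {
    /// Constructors create contiguous (C-ordered) views.
    pub fn new(data: &'a [f64], rows: usize, cols: usize) -> Self {
        assert_eq!(data.len(), rows * cols, "data length must equal rows * cols");
        MatView { data, offset: 0, rows, cols, row_stride: cols, col_stride: 1, _contiguity: PhantomData }
    }

    /// Outer slicing (a range of rows) stays contiguous.
    pub fn slice_rows(&self, start: usize, end: usize) -> ContiguousView<'a> {
        assert!(start <= end && end <= self.rows, "row range out of bounds");
        MatView {
            data: self.data,
            offset: self.offset + start * self.row_stride,
            rows: end - start,
            cols: self.cols,
            row_stride: self.row_stride,
            col_stride: self.col_stride,
            _contiguity: PhantomData,
        }
    }

    /// Only contiguous views can hand out one flat slice.
    pub fn as_slice(&self) -> &'a [f64] {
        let data: &'a [f64] = self.data;
        &data[self.offset..self.offset + self.rows * self.cols]
    }
}

impl<'a, C> MatView<'a, C> {
    /// Inner slicing (a range of columns) is strided in general.
    pub fn slice_cols(&self, start: usize, end: usize) -> StridedView<'a> {
        assert!(start <= end && end <= self.cols, "column range out of bounds");
        MatView {
            data: self.data,
            offset: self.offset + start * self.col_stride,
            rows: self.rows,
            cols: end - start,
            row_stride: self.row_stride,
            col_stride: self.col_stride,
            _contiguity: PhantomData,
        }
    }

    pub fn get(&self, i: usize, j: usize) -> f64 {
        assert!(i < self.rows && j < self.cols, "index out of bounds");
        self.data[self.offset + i * self.row_stride + j * self.col_stride]
    }
}
```

Calling `as_slice()` on the result of `slice_cols` is a compile error rather than a runtime check, which is the point of the marker.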

Design for inter-language operation

I think the important point to think about now is what data can be used, and how. For example, since Python can garbage-collect its NumPy arrays (I don't know whether it does any compacting), we should not allow views into passed data to persist beyond the call (unless we learn to increment the reference count). I dealt with this by having the code that needs a matrix accept a closure that receives a matrix view. It is a little awkward when receiving more than one matrix, but not too bad.
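The closure approach can be sketched as follows, assuming a hypothetical `with_matrix_view` helper that wraps a raw pointer received over FFI (column-major here for brevity). Because the closure must accept a view of any lifetime (`for<'v>`), its result type cannot borrow from the view, so the borrow checker rejects any attempt to keep the view after the call returns.

```rust
/// Borrowed, column-major matrix view (hypothetical type).
pub struct MatView<'a> {
    pub data: &'a [f64],
    pub rows: usize,
    pub cols: usize,
}

impl MatView<'_> {
    pub fn get(&self, i: usize, j: usize) -> f64 {
        assert!(i < self.rows && j < self.cols, "index out of bounds");
        self.data[j * self.rows + i]
    }
}

/// Wrap foreign memory in a view that only lives for the closure call.
///
/// # Safety
/// `ptr` must point to `rows * cols` initialized `f64`s that stay valid and
/// unmodified until this function returns.
pub unsafe fn with_matrix_view<R>(
    ptr: *const f64,
    rows: usize,
    cols: usize,
    f: impl for<'v> FnOnce(MatView<'v>) -> R,
) -> R {
    // SAFETY: guaranteed by the caller, see above.
    let data = unsafe { std::slice::from_raw_parts(ptr, rows * cols) };
    f(MatView { data, rows, cols })
}
```

With several input matrices the calls nest, one closure per matrix, which is the mild awkwardness mentioned above.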

Just for information, my experience calling Rust from both Python (using NumPy's ctypes interface) and R:

  • NumPy's ctypes approach is flexible, powerful and a bit complicated. Python itself is slow enough that doing conversions and checks in Python is much slower than doing them on the other side of the FFI. Specifically, we need to interpret at least the dimensions, the strides, and the datatype. Hence one day we should probably wrap their interface header [1].
  • R's .C calling approach is primitive and simple: you only get to pass in pointers. IIUC, R's matrices are F-ordered and contiguous, so I passed in the data pointer and a pointer to the i32s holding the dims. I didn't get into newer R interfaces such as .Call and .External. The primary documentation is [2]; the chapters [3, 4] in Advanced R are also relevant.

[1] dist-packages/numpy/core/include/numpy/ndarraytypes.h
[2] https://cran.r-project.org/doc/manuals/R-exts.html
[3] http://adv-r.had.co.nz/Rcpp.html
[4] http://adv-r.had.co.nz/C-interface.html
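The .C setup described above can be sketched as a Rust entry point that takes only pointers: the F-ordered data, the two `i32` dimensions, and an output buffer, since .C functions return nothing and write their results into an argument. `col_sums` is a hypothetical example function. To be callable from R it would also have to be exported unmangled (`#[no_mangle]`, spelled `#[unsafe(no_mangle)]` in the 2024 edition) from a `cdylib`; the attribute is left out here so the block compiles the same under every edition.

```rust
/// Hypothetical .C entry point: column sums of an F-ordered matrix.
/// `dims` points to two i32s (rows, cols); `out` has room for `cols` doubles.
///
/// # Safety
/// All pointers must be valid for the sizes described above, and both
/// dimensions must be non-negative.
pub unsafe extern "C" fn col_sums(x: *const f64, dims: *const i32, out: *mut f64) {
    let (rows, cols) = unsafe { (*dims as usize, *dims.add(1) as usize) };
    let data = unsafe { std::slice::from_raw_parts(x, rows * cols) };
    let out = unsafe { std::slice::from_raw_parts_mut(out, cols) };
    for (j, o) in out.iter_mut().enumerate() {
        // In F order, column j is the contiguous block data[j*rows .. (j+1)*rows].
        *o = data[j * rows..(j + 1) * rows].iter().sum();
    }
}
```

After `dyn.load` on the compiled library, the R side would be something along the lines of `.C("col_sums", as.double(m), as.integer(dim(m)), out = double(ncol(m)))$out`.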

Relicense under dual MIT/Apache-2.0

This issue was automatically generated. Feel free to close without ceremony if
you do not agree with re-licensing or if it is not possible for other reasons.
Respond to @cmr with any questions or concerns, or pop over to
#rust-offtopic on IRC to discuss.

You're receiving this because someone (perhaps the project maintainer)
published a crates.io package with the license as "MIT" xor "Apache-2.0" and
the repository field pointing here.

TL;DR the Rust ecosystem is largely Apache-2.0. Being available under that
license is good for interoperation. The MIT license as an add-on can be nice
for GPLv2 projects to use your code.

Why?

The MIT license requires reproducing countless copies of the same copyright
header with different names in the copyright field, for every MIT library in
use. The Apache license does not have this drawback. However, this is not the
primary motivation for me creating these issues. The Apache license also has
protections from patent trolls and an explicit contribution licensing clause.
However, the Apache license is incompatible with GPLv2. This is why Rust is
dual-licensed as MIT/Apache (the "primary" license being Apache, MIT only for
GPLv2 compat), and doing so would be wise for this project. This also makes
this crate suitable for inclusion and unrestricted sharing in the Rust
standard distribution and other projects using dual MIT/Apache, such as my
personal ulterior motive, the Robigalia project.

Some ask, "Does this really apply to binary redistributions? Does MIT really
require reproducing the whole thing?" I'm not a lawyer, and I can't give legal
advice, but some Google Android apps include open source attributions using
this interpretation. Others also agree with it.
But, again, the copyright notice redistribution is not the primary motivation
for the dual-licensing. It's stronger protections to licensees and better
interoperation with the wider Rust ecosystem.

How?

To do this, get explicit approval from each contributor of copyrightable work
(as not all contributions qualify for copyright, due to not being a "creative
work", e.g. a typo fix) and then add the following to your README:

## License

Licensed under either of

 * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any
additional terms or conditions.

and in your license headers, if you have them, use the following boilerplate
(based on that used in Rust):

// Copyright 2016 dense_mats developers
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.

It's commonly asked whether license headers are required. I'm not comfortable
making an official recommendation either way, but the Apache license
recommends it in their appendix on how to use the license.

Be sure to add the relevant LICENSE-{MIT,APACHE} files. You can copy these
from the Rust repo for a plain-text version.

And don't forget to update the license metadata in your Cargo.toml to:

license = "MIT/Apache-2.0"

I'll be going through projects which agree to be relicensed and have approval
from the necessary contributors and making these changes, so feel free to leave
the heavy lifting to me!

Contributor checkoff

To agree to relicensing, comment with:

I license past and future contributions under the dual MIT/Apache-2.0 license, allowing licensees to chose either at their option.

Or, if you're a contributor, you can check the box in this repo next to your
name. My scripts will pick this exact phrase up and check your checkbox, but
I'll come through and manually review this issue later as well.

default storage order

Constructors should have a default storage order (probably C order, if we want to be non-surprising to NumPy users).

Some specific constructors for F order should also be added.
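A minimal sketch of both requests, assuming a simplified two-variant `StorageOrder` (the crate's `Unordered` variant is left out) and a hypothetical 2-D `DMat` type. `zeros` uses the default C order without taking an order argument, `zeros_f` is a dedicated F-order constructor, and `zeros_with_order` remains available for callers who want to choose explicitly.

```rust
/// Simplified storage order; `#[default]` makes C the `Default` value.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum StorageOrder {
    /// Row-major, NumPy's default.
    #[default]
    C,
    /// Column-major, as in Fortran and R.
    F,
}

/// Hypothetical dense 2-D matrix; strides are in elements.
#[derive(Debug)]
pub struct DMat {
    pub data: Vec<f64>,
    pub shape: [usize; 2],
    pub strides: [usize; 2],
}

impl DMat {
    /// Zeros with an explicit storage order.
    pub fn zeros_with_order(shape: [usize; 2], order: StorageOrder) -> Self {
        let strides = match order {
            StorageOrder::C => [shape[1], 1],
            StorageOrder::F => [1, shape[0]],
        };
        DMat { data: vec![0.0; shape[0] * shape[1]], shape, strides }
    }

    /// Zeros in the default (C) order: no StorageOrder argument needed.
    pub fn zeros(shape: [usize; 2]) -> Self {
        Self::zeros_with_order(shape, StorageOrder::default())
    }

    /// Dedicated F-order constructor.
    pub fn zeros_f(shape: [usize; 2]) -> Self {
        Self::zeros_with_order(shape, StorageOrder::F)
    }
}
```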

Ordering and contiguity

Currently fn ordering() only cares about the relative ordering of strides, not about data contiguity (i.e. having an inner stride of 1).

We need to expose contiguity, either by providing an is_contiguous() method, or by providing a more detailed ordering() enum.
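For the 2-D case, the "more detailed ordering() enum" option could look like the sketch below; `Layout`, `layout`, `is_contiguous` and `to_c_contiguous` are hypothetical names, and strides are counted in elements. A stride pair such as `[6, 2]` is C-like by relative order but skips every other element, so it is reported as `Strided`. The copy helper covers the "create a contiguous copy" fallback mentioned earlier in the discussion.

```rust
/// Hypothetical, more detailed answer than a bare C/F ordering.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Layout {
    /// Last-axis stride is 1 and rows are packed back to back.
    CContiguous,
    /// First-axis stride is 1 and columns are packed back to back.
    FContiguous,
    /// Valid, but not one contiguous block.
    Strided,
}

/// Classify a 2-D layout. Degenerate shapes (e.g. 1 x n) that are both
/// C- and F-contiguous are reported as CContiguous.
pub fn layout(shape: [usize; 2], strides: [usize; 2]) -> Layout {
    let [rows, cols] = shape;
    if strides[1] == 1 && strides[0] == cols {
        Layout::CContiguous
    } else if strides[0] == 1 && strides[1] == rows {
        Layout::FContiguous
    } else {
        Layout::Strided
    }
}

pub fn is_contiguous(shape: [usize; 2], strides: [usize; 2]) -> bool {
    layout(shape, strides) != Layout::Strided
}

/// Copy a strided 2-D view starting at `data[0]` into C order.
pub fn to_c_contiguous(data: &[f64], shape: [usize; 2], strides: [usize; 2]) -> Vec<f64> {
    let mut out = Vec::with_capacity(shape[0] * shape[1]);
    for i in 0..shape[0] {
        for j in 0..shape[1] {
            out.push(data[i * strides[0] + j * strides[1]]);
        }
    }
    out
}
```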
