tskit-dev / tskit-rust Goto Github PK
View Code? Open in Web Editor NEWRust bindings for the tskit library
Home Page: https://docs.rs/tskit
License: MIT License
Rust bindings for the tskit library
Home Page: https://docs.rs/tskit
License: MIT License
Make new function, deprecate the old. This name makes it more clear what is being iterated over.
This should include fundamental type definitions, constants, and public traits.
To save manual iteration over the node table and checking flags, we should return a vector of sample nodes.
Also, a function taking a filter would be handy, too, to allow only a subsample of nodes to pass some criterion
The functions to return "row" structs have an option called decode_metadata
. This name is wrong. It should be include_metadata
. It may be preferable to simply remove the option entirely if we can figure out a way to safely obtain a slice based on the metadata.
Note: renaming the argument type does not break API. Removing it does.
Should be simple with iterator adapters and row proxy classes. The metadata may be a challenge?
Audit the macros for this -- the latter is correct in case of rename-after-import.
I guess it is time to let clippy be part of CI....
This type wraps the raw C arrays and is used by NodeIterator objects derived from Tree. However, the type is also public, resulting in a potential safety issue. If the parent Tree is dropped, the array content is invalidated.
Potential solutions:
Currently, we require that metadata decoding return an object. However, metadata are not strictly required for all rows of a table. An optional return type make sense, then, instead of forcing some kind of default return value.
It's currently unclear how to do this. No effort has been made to test it. It seems quite likely that any metadata encoded using JSON would work, but we should actually show that. It is anyone's guess for the other encoding methods, as it's unclear that we can predict the layout.
Right now, these are private. Make them public, document them.
The "new" function, number of rows, and metadata are repeated for all the table types. What does it take to encapsulate these concepts into a trait?
These are currently re-exported from lib.rs
:
pub use bindings::TSK_NO_BUILD_INDEXES;
pub use bindings::TSK_SAMPLE_LISTS;
They can be safely removed.
Custom iteration algorithms could make good use of this feature. The slices will be non-owning due to from_raw_parts
.
This would be more idiomatic than the current deepcopy fn, which should probably be deprecated.
Adding to the prelude lets client code quickly import symbol names that are hard to live without. I think these ID types would count?
The current TableCollection relies on a heap-allocated tsk_table_collection_t stored in a Box. Thus, we have to do all this "maybe uninitialized" stuff. It would be an improvement to take a cue from rust-GSL and offload the necessary boiler plate to a macro:
The nice thing is that this can all be done w/o breaking user-facing API and we can re-use the macro for other types.
This will clean up the main files and should also lead to better coverage calculations?
There's a good bit of code duplication implementing Iterator
for all the Table/TableRef row types.
This implementation code should just be a macro: the types backing the iterators are only accessed by their Iterator
behavior.
Several interfaces return objects implementing Iterator
. Currently, the functions return a type for which Iterator
is implemented, giving the mistaken impression that there's a struct
whose implementation details need to be considered in client code. For most of these, however, all client code needs to worry about is that they impl Iterator
. (In fact, I think this is true for all of them...)
We can make these types pub(crate)
, preventing the names from leaking from the library. Change the function declarations to reflect returning impl Iterator<type=X>
(or Box<dyn ..>
when the function is declared in a trait).
These changes won't break API as the interface remains unchanged.
The current docs are wrong. The iterator currently will not error (because iteration cannot). Thus, and error during iteration will panic!
instead.
Right now, we don't get full back trace support, but it should be possible. See here.
In order to release 0.1.1 w/the new tskit crate name, the docs needs to be gone over for how they discuss metadata.
We probably need more examples, too?
This issue is a placeholder/reminder.
Currently, a single trait defines metadata. This leads to a potential problem for external APIs writing "export to tskit" functions. One could send the wrong metadata to the wrong table.
Since all metadata need the same API, then a simple set of marker traits will suffice to distinguish Node from Individual metadata, etc..
In #64, I discovered that you can share code b/w test modules. Should expand on that idea and collect stuff into lib/test_fixtures.rs
. One thing to put in there is a simple forward sim, but there's a question about how to do that but not have rand
be a dependency all the time.
The macro argument $T
is never used and should be removed.
tskit uses char *
for mutation ancestral/derived states and metadata.
We are currently using CString to convert from Vec<u8>
to char *
. This rust type cannot contain any internal \0
, which are 0 as u8
.
An undesirable corollary of this design is that some types of metadata become excluded--any integer that is exactly 0, for example. (A u32
equal to 0 encodes as a sequence of 4 0 in u8
.)
AFAIK, the tskit spec says nothing about null-termination of strings, as the intent is to use char *
to store raw bytes if desired.
At the very least, some power user scenarios need:
a..b
works...The trees API uses tsk_id_t
throughout. This should change to the new NodeId
type.
TableCollection and TreeSequence dump
both need flag support?
Also see #69.
Since nightly
can be "quirky", we should separate testing the two tool chains and only require that stable
be required to pass.
Currently, adding provenance with an empty record raises an error. This behavior differs from tskit-python. Should we change it? IMO, it is unclear what the client intended re: provenance here, signalling an error happening somewhere in the work flow?
Related: #114
The Item
type of all these iterators is tsk_id_tt
, but it should be Result
.
Getting started block refers to submodules, which are not currently used.
Named arguments are handy. Rust only really allows them for struct construction. So, we can "fake it" with an API like:
tables.add_node_table_row(NodeTableRow{ flags: 0, time: 0.1243, population: 2, individual: TSK_NULL, metadata: None }).unwrap()
Just have to check what the visibility of the row objects is. They are public, but maybe not currently from the crate root?
This is certainly undesirable to do "everywhere", but it is handy for the row adding fxns that are a bit long. The other functions that are long on the Python side are often shorter on the C side to to bit flags.
Constant source of error--should document the correct flags, etc..
@MomoLangenstein suggested that it might be a good idea to enforce strong typing on integer IDs in the Rust API, so that you don't, for example accidentally pass an individual ID to a function that expects a node ID. There was some discussion in the Slack channel about this: apparently some similar type constraints were previously implemented in the tskit
C API but then removed because it required typing too much boilerplate, for not much discernible benefit. But @molpopgen has just said:
Interestingly (because you can't do this in C), you seem to be able to do this in rust in a way that may not break API and would still allow the plain tsk_id_t to be passed to functions.
Perhaps @MomoLangenstein could weigh in here with what they were thinking about?
ID types needed for this to be closed:
The docs say to use this trait, but I don't think actually is necessary. Should confirm with doctests, etc..
The std and bincode file names in examples contain the wrong code--the file names need to be swapped.
With no crossing over, the code is always passing on the first parental node rather than doing segregation.
It is quite annoying that they do not!
We should provide a means for mutable and immutable access to these via slices.
The various add_foo_with_metadata
are a verbose API due to Option
. For readability purposes, _with_metadata
and Option
don't quite align.
A fairly simple solution is to have a third (non-pub) fn that does the real work to handle the Option
.
The preorder traversal is processing children in the "backwards" order (left to right).
We should do right-to-left so that they come out from the stack in the expected order.
It should be:
) -> Box<dyn Iterator<Item = tsk_id_t> + '_> {
Else, there's a problem for future enum values.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.