The tskit-rust's discuss from tskit-dev

Tree::path_to_root should be Tree::parents

Make new function, deprecate the old. This name makes it more clear what is being iterated over.

Add a prelude.

This should include fundamental type definitions, constants, and public traits.

TableCollection should be able to return a list of samples.

To save manual iteration over the node table and checking flags, we should return a vector of sample nodes.
Also, a function taking a filter would be handy, too, to allow only a subsample of nodes to pass some criterion

Table "row" object interface

The functions to return "row" structs have an option called decode_metadata. This name is wrong. It should be include_metadata. It may be preferable to simply remove the option entirely if we can figure out a way to safely obtain a slice based on the metadata.

Note: renaming the argument type does not break API. Removing it does.

Add table iterators

Should be simple with iterator adapters and row proxy classes. The metadata may be a challenge?

Macros crate vs $crate

Audit the macros for this -- the latter is correct in case of rename-after-import.

Clippy

I guess it is time to let clippy be part of CI....

TskWrappedArray life times

This type wraps the raw C arrays and is used by NodeIterator objects derived from Tree. However, the type is also public, resulting in a potential safety issue. If the parent Tree is dropped, the array content is invalidated.

Potential solutions:

Make the arrays private
Try to figure out lifetime annotations.

Consider Option<Self> for return type of MetadataRoundtrip::decode

Currently, we require that metadata decoding return an object. However, metadata are not strictly required for all rows of a table. An optional return type make sense, then, instead of forcing some kind of default return value.

Reading metadata using Python?

It's currently unclear how to do this. No effort has been made to test it. It seems quite likely that any metadata encoded using JSON would work, but we should actually show that. It is anyone's guess for the other encoding methods, as it's unclear that we can predict the layout.

Make "raw" tree arrays pub

Right now, these are private. Make them public, document them.

"Table"-ness should be a trait?

The "new" function, number of rows, and metadata are repeated for all the table types. What does it take to encapsulate these concepts into a trait?

Some flags are also re-exports

These are currently re-exported from lib.rs:

pub use bindings::TSK_NO_BUILD_INDEXES;	
pub use bindings::TSK_SAMPLE_LISTS;

They can be safely removed.

Add access to table indexes via slices

Custom iteration algorithms could make good use of this feature. The slices will be non-owning due to from_raw_parts.

TableCollection should impl Clone

This would be more idiomatic than the current deepcopy fn, which should probably be deprecated.

The newtypes for table row IDs should be in the prelude.

Adding to the prelude lets client code quickly import symbol names that are hard to live without. I think these ID types would count?

Abstract type wrapping to a macro?

The current TableCollection relies on a heap-allocated tsk_table_collection_t stored in a Box. Thus, we have to do all this "maybe uninitialized" stuff. It would be an improvement to take a cue from rust-GSL and offload the necessary boiler plate to a macro:

Pass in the type name, "tskit init" function name, and "tskit free" function name.
Heap allocate the type.
Init.
Implement Drop using the free function.
Provide boiler plate for owning and non-owning wrappers, etc..

The nice thing is that this can all be done w/o breaking user-facing API and we can re-use the macro for other types.

Move tests to tests/

This will clean up the main files and should also lead to better coverage calculations?

Table row Iterator trait pattern should be a macro

There's a good bit of code duplication implementing Iterator for all the Table/TableRef row types.
This implementation code should just be a macro: the types backing the iterators are only accessed by their Iterator behavior.

Collect all public traits in traits.rs

Type erasure needed for TableAccess trait and all table "iter" methods

Several interfaces return objects implementing Iterator. Currently, the functions return a type for which Iterator is implemented, giving the mistaken impression that there's a struct whose implementation details need to be considered in client code. For most of these, however, all client code needs to worry about is that they impl Iterator. (In fact, I think this is true for all of them...)

We can make these types pub(crate), preventing the names from leaking from the library. Change the function declarations to reflect returning impl Iterator<type=X> (or Box<dyn ..> when the function is declared in a trait).

These changes won't break API as the interface remains unchanged.

Docs error: total_branch_length will not return an error.

The current docs are wrong. The iterator currently will not error (because iteration cannot). Thus, and error during iteration will panic! instead.

TskitError should back trace

Right now, we don't get full back trace support, but it should be possible. See here.

Update docs re: metadata

In order to release 0.1.1 w/the new tskit crate name, the docs needs to be gone over for how they discuss metadata.
We probably need more examples, too?

This issue is a placeholder/reminder.

Need marker traits for metadata.

Currently, a single trait defines metadata. This leads to a potential problem for external APIs writing "export to tskit" functions. One could send the wrong metadata to the wrong table.

Since all metadata need the same API, then a simple set of marker traits will suffice to distinguish Node from Individual metadata, etc..

Create test fixtures module

In #64, I discovered that you can share code b/w test modules. Should expand on that idea and collect stuff into lib/test_fixtures.rs. One thing to put in there is a simple forward sim, but there's a question about how to do that but not have rand be a dependency all the time.

macro metadata_to_vector has unused argument

The macro argument $T is never used and should be removed.

Vec<u8> to c_str round trips and '\0'

tskit uses char * for mutation ancestral/derived states and metadata.
We are currently using CString to convert from Vec<u8> to char *. This rust type cannot contain any internal \0, which are 0 as u8.
An undesirable corollary of this design is that some types of metadata become excluded--any integer that is exactly 0, for example. (A u32 equal to 0 encodes as a sequence of 4 0 in u8.)

AFAIK, the tskit spec says nothing about null-termination of strings, as the intent is to use char * to store raw bytes if desired.

Extra traits needed for NodeId

At the very least, some power user scenarios need:

Add
Subtract
Whatever is needed so that a..b works...

Convert trees.rs to use NodeId

The trees API uses tsk_id_t throughout. This should change to the new NodeId type.

Need flags for file output

TableCollection and TreeSequence dump both need flag support?

Also see #69.

Separate nightly tests from stable

Since nightly can be "quirky", we should separate testing the two tool chains and only require that stable be required to pass.

Provenance table behavior with empty records

Currently, adding provenance with an empty record raises an error. This behavior differs from tskit-python. Should we change it? IMO, it is unclear what the client intended re: provenance here, signalling an error happening somewhere in the work flow?

Add forward simulation example

Iterators over nodes should be allowed to error?

Related: #114

The Item type of all these iterators is tsk_id_tt, but it should be Result.

Error in README

Getting started block refers to submodules, which are not currently used.

Add support for Provenance table

Add id to table row types

Add TableCollection functions to consume "row" objects

Named arguments are handy. Rust only really allows them for struct construction. So, we can "fake it" with an API like:

tables.add_node_table_row(NodeTableRow{ flags: 0, time: 0.1243, population: 2, individual: TSK_NULL, metadata: None }).unwrap()

Just have to check what the visibility of the row objects is. They are public, but maybe not currently from the crate root?

This is certainly undesirable to do "everywhere", but it is handy for the row adding fxns that are a bit long. The other functions that are long on the Python side are often shorter on the C side to to bit flags.

Tree sequence initialization from unindexed tables

Constant source of error--should document the correct flags, etc..

Strong typing for IDs

@MomoLangenstein suggested that it might be a good idea to enforce strong typing on integer IDs in the Rust API, so that you don't, for example accidentally pass an individual ID to a function that expects a node ID. There was some discussion in the Slack channel about this: apparently some similar type constraints were previously implemented in the tskit C API but then removed because it required typing too much boilerplate, for not much discernible benefit. But @molpopgen has just said:

Interestingly (because you can't do this in C), you seem to be able to do this in rust in a way that may not break API and would still allow the plain tsk_id_t to be passed to functions.

Perhaps @MomoLangenstein could weigh in here with what they were thinking about?

ID types needed for this to be closed:

    ) -> Box<dyn Iterator<Item = tsk_id_t> + '_> {

Else, there's a problem for future enum values.

Fix LICENSE bits

Main LICENSE fail has wrong info.
tskit/kastore should have LICENSE files
Note all of this in the README.

tskit-dev / tskit-rust Goto Github PK

tskit-rust's Issues

Recommend Projects

Recommend Topics

Recommend Org