Code Monkey home page Code Monkey logo

tskit-rust's People

Contributors

benjeffery avatar dependabot-preview[bot] avatar dependabot[bot] avatar juntyr avatar molpopgen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

tskit-rust's Issues

Add TableCollection functions to consume "row" objects

Named arguments are handy. Rust only really allows them for struct construction. So, we can "fake it" with an API like:

tables.add_node_table_row(NodeTableRow{ flags: 0, time: 0.1243, population: 2, individual: TSK_NULL, metadata: None }).unwrap()

Just have to check what the visibility of the row objects is. They are public, but maybe not currently from the crate root?

This is certainly undesirable to do "everywhere", but it is handy for the row adding fxns that are a bit long. The other functions that are long on the Python side are often shorter on the C side to to bit flags.

Need marker traits for metadata.

Currently, a single trait defines metadata. This leads to a potential problem for external APIs writing "export to tskit" functions. One could send the wrong metadata to the wrong table.

Since all metadata need the same API, then a simple set of marker traits will suffice to distinguish Node from Individual metadata, etc..

Abstract type wrapping to a macro?

The current TableCollection relies on a heap-allocated tsk_table_collection_t stored in a Box. Thus, we have to do all this "maybe uninitialized" stuff. It would be an improvement to take a cue from rust-GSL and offload the necessary boiler plate to a macro:

  1. Pass in the type name, "tskit init" function name, and "tskit free" function name.
  2. Heap allocate the type.
  3. Init.
  4. Implement Drop using the free function.
  5. Provide boiler plate for owning and non-owning wrappers, etc..

The nice thing is that this can all be done w/o breaking user-facing API and we can re-use the macro for other types.

Some flags are also re-exports

These are currently re-exported from lib.rs:

pub use bindings::TSK_NO_BUILD_INDEXES;	
pub use bindings::TSK_SAMPLE_LISTS;

They can be safely removed.

Provenance table behavior with empty records

Currently, adding provenance with an empty record raises an error. This behavior differs from tskit-python. Should we change it? IMO, it is unclear what the client intended re: provenance here, signalling an error happening somewhere in the work flow?

"Table"-ness should be a trait?

The "new" function, number of rows, and metadata are repeated for all the table types. What does it take to encapsulate these concepts into a trait?

Add a prelude.

This should include fundamental type definitions, constants, and public traits.

Adding tables rows with metadata

The various add_foo_with_metadata are a verbose API due to Option. For readability purposes, _with_metadata and Option don't quite align.

A fairly simple solution is to have a third (non-pub) fn that does the real work to handle the Option.

Type erasure needed for TableAccess trait and all table "iter" methods

Several interfaces return objects implementing Iterator. Currently, the functions return a type for which Iterator is implemented, giving the mistaken impression that there's a struct whose implementation details need to be considered in client code. For most of these, however, all client code needs to worry about is that they impl Iterator. (In fact, I think this is true for all of them...)

We can make these types pub(crate), preventing the names from leaking from the library. Change the function declarations to reflect returning impl Iterator<type=X> (or Box<dyn ..> when the function is declared in a trait).

These changes won't break API as the interface remains unchanged.

Table "row" object interface

The functions to return "row" structs have an option called decode_metadata. This name is wrong. It should be include_metadata. It may be preferable to simply remove the option entirely if we can figure out a way to safely obtain a slice based on the metadata.

Note: renaming the argument type does not break API. Removing it does.

Fix LICENSE bits

  1. Main LICENSE fail has wrong info.
  2. tskit/kastore should have LICENSE files
  3. Note all of this in the README.

Table row Iterator trait pattern should be a macro

There's a good bit of code duplication implementing Iterator for all the Table/TableRef row types.
This implementation code should just be a macro: the types backing the iterators are only accessed by their Iterator behavior.

Vec<u8> to c_str round trips and '\0'

tskit uses char * for mutation ancestral/derived states and metadata.
We are currently using CString to convert from Vec<u8> to char *. This rust type cannot contain any internal \0, which are 0 as u8.
An undesirable corollary of this design is that some types of metadata become excluded--any integer that is exactly 0, for example. (A u32 equal to 0 encodes as a sequence of 4 0 in u8.)

AFAIK, the tskit spec says nothing about null-termination of strings, as the intent is to use char * to store raw bytes if desired.

Add table iterators

Should be simple with iterator adapters and row proxy classes. The metadata may be a challenge?

Error in README

Getting started block refers to submodules, which are not currently used.

Create test fixtures module

In #64, I discovered that you can share code b/w test modules. Should expand on that idea and collect stuff into lib/test_fixtures.rs. One thing to put in there is a simple forward sim, but there's a question about how to do that but not have rand be a dependency all the time.

Reading metadata using Python?

It's currently unclear how to do this. No effort has been made to test it. It seems quite likely that any metadata encoded using JSON would work, but we should actually show that. It is anyone's guess for the other encoding methods, as it's unclear that we can predict the layout.

TskWrappedArray life times

This type wraps the raw C arrays and is used by NodeIterator objects derived from Tree. However, the type is also public, resulting in a potential safety issue. If the parent Tree is dropped, the array content is invalidated.

Potential solutions:

  1. Make the arrays private
  2. Try to figure out lifetime annotations.

Move tests to tests/

This will clean up the main files and should also lead to better coverage calculations?

Update docs re: metadata

In order to release 0.1.1 w/the new tskit crate name, the docs needs to be gone over for how they discuss metadata.
We probably need more examples, too?

This issue is a placeholder/reminder.

Strong typing for IDs

@MomoLangenstein suggested that it might be a good idea to enforce strong typing on integer IDs in the Rust API, so that you don't, for example accidentally pass an individual ID to a function that expects a node ID. There was some discussion in the Slack channel about this: apparently some similar type constraints were previously implemented in the tskit C API but then removed because it required typing too much boilerplate, for not much discernible benefit. But @molpopgen has just said:

Interestingly (because you can't do this in C), you seem to be able to do this in rust in a way that may not break API and would still allow the plain tsk_id_t to be passed to functions.

Perhaps @MomoLangenstein could weigh in here with what they were thinking about?

ID types needed for this to be closed:

  • Node, done via #129
  • Individual, done via #133
  • Population, done via #135
  • Edge, via #138
  • Site, via #136
  • Mutation, via #136
  • Provenance, via #137
  • Migration, via #137

Clippy

I guess it is time to let clippy be part of CI....

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.