Pass a numpy array of booleans to python simplify? about tskit HOT 7 CLOSED

hyanwong commented on September 16, 2024

Pass a numpy array of booleans to python simplify?

from tskit.

Comments (7)

petrelharp commented on September 16, 2024

FWIW, I can't think of a single function in R that overloads arguments like that - i.e., takes either a vector of indices or a vector of booleans. Indexing works like that, but only indexing. Overloading can lead to nasty corner cases.

jeromekelleher commented on September 16, 2024

It's a bunch of work to do, and not that much different? Seems to me that you'd have to explain more by adding the boolean selector for nodes as well as the list of nodes option.

hyanwong commented on September 16, 2024

It's only a 5 line addition at the top of tables.simplify(), isn't it?

try:
    if len(samples) == ts.num_nodes and samples.dtype == bool:
        samples = np.where(samples)[0]
except AttributeError:
    pass

You're right that it's not that different, and it's not really a priority, but I'm finding that every extra barrier to new tskit users puts some of them off (and takes an extra few minutes to explain). There's a reason why numpy allows both boolean and numerical indexing. I'm not sure I would actually explain in a practical that you can use both: it's obvious from the context, right?
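To make the trade-off concrete, here is a minimal standalone sketch of the same idea, written as a helper outside tskit so that it runs with numpy alone. The name `as_node_ids` and its `num_nodes` parameter are hypothetical and not part of the tskit API. It also shows the kind of corner case raised earlier in the thread: an integer array of 0s and 1s is treated as node IDs, not as a mask.

```python
import numpy as np


def as_node_ids(samples, num_nodes):
    """Hypothetical helper (not tskit API): accept node IDs or a boolean mask."""
    samples = np.asarray(samples)
    if samples.dtype == bool:
        # A mask only makes sense if it has exactly one entry per node.
        if samples.shape != (num_nodes,):
            raise ValueError("boolean mask must have one entry per node")
        return np.flatnonzero(samples).astype(np.int32)
    # Anything else is assumed to already be a list of node IDs.
    return samples.astype(np.int32)


# Stand-in for ts.nodes_time, so the example does not need a tree sequence.
nodes_time = np.array([0.0, 0.0, 1.0, 0.0, 2.0])

ids_from_mask = as_node_ids(nodes_time == 0, len(nodes_time))
ids_from_ids = as_node_ids([0, 1, 3], len(nodes_time))

# Corner case: integer 0/1 values are read as node IDs, not as a mask.
ids_from_ints = as_node_ids(np.array([1, 0, 1, 0, 0]), len(nodes_time))
```

Under these assumptions, both of the first two calls select nodes 0, 1 and 3, while the integer array is passed through unchanged as IDs.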

jeromekelleher commented on September 16, 2024

As usual with these things, implementation is by far the easiest part and something like 5% of the actual effort. Testing, documenting and making sure there are no regressions against existing code make up the majority of the work.

petrelharp commented on September 16, 2024

I agree. Higher on my list would be adding an individuals argument.

hyanwong commented on September 16, 2024

I guess part of the problem has nothing to do with tskit: it is the very unintuitive np.where syntax, which requires the [0] at the end (which I always forget). The other option is

ts.simplify(np.flatnonzero(ts.nodes_time == 0))

but that's hardly more intuitive, IMO. Or alternatively

ts.simplify((ts.nodes_time == 0).nonzero()[0])

Which is equally cryptic, but at least doesn't need you to use the np prefix, which again requires explanation when it's in the first few lines of a beginner's tutorial.

Anyway, if no-one thinks it's a good idea, I'll close this. However, it's worth pointing out that I'm finding it hard to introduce tskit to new users (especially from R, and if they don't know numpy).

petrelharp commented on September 16, 2024

Another option is

today_nodes = np.arange(ts.num_nodes)[ts.nodes_time == 0]
ts.simplify(today_nodes)
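For comparison, the four idioms mentioned in this thread can be checked against each other with numpy alone. This is a small sketch in which a plain array stands in for `ts.nodes_time`; the values are made up for illustration.

```python
import numpy as np

# Made-up stand-in for ts.nodes_time: three nodes at time 0, two older nodes.
nodes_time = np.array([0.0, 0.0, 1.0, 0.0, 2.0])
mask = nodes_time == 0

via_where = np.where(mask)[0]                     # needs the trailing [0]
via_flatnonzero = np.flatnonzero(mask)            # no [0], but needs the np prefix
via_nonzero = mask.nonzero()[0]                   # method form, still needs [0]
via_arange = np.arange(len(nodes_time))[mask]     # boolean indexing into the IDs

# All four give the same node IDs.
same = (
    np.array_equal(via_where, via_flatnonzero)
    and np.array_equal(via_where, via_nonzero)
    and np.array_equal(via_where, via_arange)
)
```

Any of the resulting arrays could then be passed as the samples argument to `ts.simplify`, as in the snippets above.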
