cytomining / cytominer-eval
Common Evaluation Metrics for DataFrames
License: BSD 3-Clause "New" or "Revised" License
@gwaygenomics reported this in https://github.com/broadinstitute/lincs-profiling-complementarity
Additionally, because we had 5 replicates of Cell Painting and 3 replicates of L1000 per treatment, we performed a subsampling experiment with Cell Painting to match the number of L1000 replicates. We actually observed slightly increased percent replicating scores in the subsampled Cell Painting data with fewer replicates, indicating that the two additional Cell Painting replicates did not artificially inflate percent replicating scores (Supplementary Figure 3).
When reading the source code for cytominer_eval.operations.precision_recall(), I noticed that the similarity_melted_df variable counts each replicate pair twice, e.g. A1 --> A2 and A2 --> A1.
This becomes a problem because only the first replicate_group_col in lines 49-52 is subsequently used for grouping:
49 replicate_group_cols = [
50 "{x}{suf}".format(x=x, suf=pair_ids[list(pair_ids)[0]]["suffix"]) # [0] keeps only the first of two grouping columns
51 for x in replicate_groups
52 ]
In the next step, each group is passed to calculate_precision_recall():
59 precision_recall_df_at_k = similarity_melted_df.groupby(
60 replicate_group_cols
61 ).apply(lambda x: calculate_precision_recall(x, k=k_))
62 precision_recall_df = precision_recall_df.append(precision_recall_df_at_k)
With the effect that all samples from within a group are counted twice. However, samples from outside the group are only counted once, because the groupby filters out one direction.
Let me clarify this with an example. Consider 5 samples, the first 3 from group 'A' and the last 2 from group 'B', both with greater within-group than between-group correlations:
Then what calculate_precision_recall() will see is this:
For example, one can see that the sample_pair_a column has a row for A1-->A2 and one for A2-->A1, but only one for A1-->B1. B1-->A1 is missing because of the way the melted data frame is generated and the grouping is performed. One can also see that the similarity metrics for within-group connections appear in duplicate.
Accordingly, the outcome for precision and recall at k=4 is the following:
Precision: all 4 closest connections are from within the group for A, but only 2 for group B.
Recall: 4/6 connections found for A, but all 2 found for B.
In summary, the computations are not entirely correct, especially for smaller groups. Also note that with odd values of k, only one of the two connections of a symmetric pair is used.
Admittedly, this is a bit mind-boggling. I recommend using a debugger if you want to trace all the steps in detail by yourself.
Proposed solution: I would suggest counting each pair only once when creating the melted data frame.
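A minimal pandas sketch of that deduplication, using hypothetical column names (pair_a_index, pair_b_index) rather than the package's actual melted-frame schema:

```python
import pandas as pd

# Toy melted similarity frame in which every pair appears twice
# (e.g. A1 --> A2 and A2 --> A1).
melted = pd.DataFrame({
    "pair_a_index": [0, 1, 0, 2, 1, 2],
    "pair_b_index": [1, 0, 2, 0, 2, 1],
    "similarity_metric": [0.9, 0.9, 0.8, 0.8, 0.7, 0.7],
})

# Keep each unordered pair exactly once by retaining only the direction
# in which the first index is smaller than the second.
deduped = melted.query("pair_a_index < pair_b_index").reset_index(drop=True)
```

With this filter, precision and recall see every within-group and between-group connection exactly once.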
Also, I'm wondering if you can optimize this (probably doesn't need to change in this PR, likely beyond scope), but if you split out the two queries, is it faster?
e.g.
nongroup_replicate_query = replicate_truth_df.query("not group_replicate")
group_replicate_query = replicate_truth_df.query("group_replicate")
v11 = group_replicate_query.query("similarity_metric > @Threshold")
and so on...
Optimize enrichment
@shntnu notes in #21 (comment) that there are two ways to report the calculation of percent_strong.
We should add a boolean argument named median_coherence and set the default to False. This argument will perform the second version as described in #21.
This is an optimization issue.
When running metrics such as precision-recall or enrichment, as in the demo notebook https://github.com/cytomining/cytominer-eval/blob/master/demos/CellPainting_Demo.ipynb,
you call evaluate several times to calculate the metric at different values of p, e.g.:

results = []
for p in np.arange(0.99, 0.96, -0.01):
    r = evaluate(
        profiles=df,
        features=features,
        meta_features=meta_features,
        replicate_groups=["Metadata_gene_name"],
        operation="enrichment",
        similarity_metric="pearson",
        enrichment_percentile=p,
    )
    results.append(r)
However, this calls the function cytominer_eval.transform.get_pairwise_metric several times (once per call of evaluate). This is not necessary, since the metrics can all be retrieved from the same similarity_melted_df.
We need to adapt the evaluate function so that it either calculates the pairs only once when called several times, or we change the demos to show that, when sweeping a parameter over several values, evaluate should not be called repeatedly.
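A rough sketch of the reuse pattern, in plain pandas/NumPy rather than the package's internal API; the enrichment score here is simplified to the fraction of top-percentile pairs that are replicate pairs, purely for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Toy profiles: 6 samples, 4 features; the first three share one group label.
profiles = pd.DataFrame(rng.normal(size=(6, 4)))
groups = ["A", "A", "A", "B", "B", "B"]

# Compute the pairwise similarity matrix ONCE and melt it to long form.
corr = profiles.T.corr().values
rows, cols = np.triu_indices(len(corr), k=1)
melted = pd.DataFrame({
    "similarity_metric": corr[rows, cols],
    "group_replicate": [groups[i] == groups[j] for i, j in zip(rows, cols)],
})

# Sweep the percentile cheaply against the same melted frame.
scores = {}
for p in (0.99, 0.98, 0.97):
    threshold = melted["similarity_metric"].quantile(p)
    top = melted.query("similarity_metric > @threshold")
    scores[p] = top["group_replicate"].mean() if len(top) else float("nan")
```

The expensive step (the pairwise matrix and the melt) runs once; each percentile is then a cheap filter.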
What are your thoughts, @gwaygenomics?
(Stubs for now, so we can add this documentation to code later)
Precision@k can be computed in many different contexts:
Recall@k is defined similarly e.g. the fraction of all the replicates that are among the k most-similar wells.
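A tiny worked example of both definitions, with a hypothetical ranked list:

```python
# Toy example: wells ranked by similarity to a query well, with a flag for
# whether each ranked well is a replicate of the query.
ranked_is_replicate = [True, True, False, True, False, False, False]
k = 3
n_replicates = sum(ranked_is_replicate)  # 3 replicates exist in total

# Precision@k: fraction of the k most-similar wells that are replicates.
precision_at_k = sum(ranked_is_replicate[:k]) / k
# Recall@k: fraction of all replicates found among the k most-similar wells.
recall_at_k = sum(ranked_is_replicate[:k]) / n_replicates
```

Here both come out to 2/3: two of the top-3 wells are replicates, and two of the three existing replicates are in the top 3.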
Here is a list of ideas on what tests to add to the precision-recall function. This is needed because #63 is only focused on results and not on good tests.
Test which is about replicate groups, not about groupby_cols
We need to implement an enrichment analysis
in this repo.
The goal is to be able to apply this metric to other datasets in the future, but a good example of the functionality we want can be found here: https://github.com/broadinstitute/DeepProfilerExperiments/blob/0dd69555d5dc4244e9c8b66bec509fa4399cef35/profiling/quality.py
@gwaygenomics just a heads up that the code I've been using to test things out here (private) and here is evolving towards getting a consistent API. I'm wary of creating a package – it will be yet another bifurcation in our efforts – so I am actively avoiding doing so (as you can see, a single R file!) while still following R package principles (docstrings, explicit namespaces).
Let's make a decision on what to do once we've written up the benchmark paper. Definitely no R package for now, and we can see where we are post-submission and figure out plans then. Ideally, I'd just switch over to Python and life will be simpler :D But that will really slow me down!
The mp_value function operates on the input profiles, not on the similarity melted dataframe.
cytominer-eval/cytominer_eval/evaluate.py
Lines 59 to 66 in fdda99e
We should skip this calculation when operation == "mp_value":
cytominer-eval/cytominer_eval/evaluate.py
Lines 31 to 37 in fdda99e
Currently, the only option is to define this based on the mean. I need to add an option to define it based on the median.
Hi there!
@gwaygenomics as we discussed, I am looking into adding the multidimensional perturbation value (mp-value, DOI: 10.1177/1087057112469257) as another metric against which you could compare the grit score. This actually sounds like it solves #23 (the mp-value is basically the Mahalanobis distance to the negative controls in the PCA-reduced space).
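For orientation, a minimal NumPy sketch of that idea (PCA via SVD, then the Mahalanobis distance of the treatment centroid to the negative controls); the published mp-value additionally derives a significance value from this distance, which is omitted here, and all data below are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
controls = rng.normal(size=(30, 5))           # negative-control profiles
treatment = rng.normal(loc=0.5, size=(8, 5))  # treated profiles

# PCA on the pooled data, via SVD of the centered matrix.
pooled = np.vstack([controls, treatment])
center = pooled.mean(axis=0)
_, _, vt = np.linalg.svd(pooled - center, full_matrices=False)
k = 3  # number of principal components to keep (arbitrary here)
ctrl_pc = (controls - center) @ vt[:k].T
trt_pc = (treatment - center) @ vt[:k].T

# Mahalanobis distance of the treatment centroid to the control distribution.
cov_inv = np.linalg.inv(np.cov(ctrl_pc, rowvar=False))
diff = trt_pc.mean(axis=0) - ctrl_pc.mean(axis=0)
mahalanobis = float(np.sqrt(diff @ cov_inv @ diff))
```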
To clarify and make sure this would fit in this project, could you please confirm that these are the steps it would require?
- In cytominer_eval.operations, define a new operation mp_value.
- In test-evaluate.py, add a method test_evaluate_mp_value.
- In evaluate.py, add a case for operation == "mp_value".
Cheers!
(Stubs for now, so we can add this documentation to code later)
Percent strong is reported in two ways. We should distinguish between these ways of reporting (they are similar but not the same)
The second version can be a bit confusing so here is an example:
(Stubs for now, so we can add this documentation to code later)
Grit can be computed in many different contexts:
A replicate's grit is its average similarity to other replicates, z-scored against its similarity to negative control replicates.
A guide's grit is its average similarity to other guides targeting the same gene, z-scored against its similarity to negative control guides.
A compound's grit (in the context of MOA) is its average similarity to other compounds in the same MOA class, z-scored against its similarity to DMSO replicates.
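A toy sketch of the replicate case, under one reading of the definition above (z-score each replicate similarity against the control distribution, then average); all numbers are made up:

```python
import numpy as np

# Hypothetical similarities of one replicate to its sibling replicates,
# and to a panel of negative-control replicates.
sim_to_replicates = np.array([0.85, 0.80, 0.90])
sim_to_controls = np.array([0.05, 0.10, -0.02, 0.20, 0.15, 0.00, 0.08, 0.12])

# Z-score each replicate similarity against the control distribution, then average.
mu, sigma = sim_to_controls.mean(), sim_to_controls.std()
grit = float(((sim_to_replicates - mu) / sigma).mean())
```

A replicate far more similar to its siblings than to negative controls gets a large positive grit; one indistinguishable from controls gets a grit near zero.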
Currently, the only option in evaluate() is to provide a pandas dataframe with metadata and profile features. The function then calculates and melts the similarity matrix.
We should provide an option to evaluate precomputed melted similarity matrices to speed up computation.
So,
For prec_recall and hitk we now have the case that the input groupby columns determine by which columns the similarity df is sorted. This has an important impact on your solution. If you, for example, group by something that is not unique in the input df, then you will get internal connections in the sub-dataframe that you are grouping.
Let's say you have, for example, a df with samples at different dosages. If you then set groupby_columns = Metadata_broad_sample, you will sort into subgroups that have several connections within each other (all the different doses). Your precision will then show the weird effects that @FloHu described in #62, for example. Similarly, hitk will have weird results because you are now looking at internal connections and not only the nearest neighbors of one sample.
Either we keep all this as it is and make users aware of it, or we find some workaround. Maybe the solution is to not allow anything other than unique groupby_cols?
At some point in the future, we should consider making the minimum pandas version >=1.2 in the requirements.txt file.
I debated performing this in #55 because of one silly deprecation in pandas.testing.assert_frame_equal (the rtol argument, see 03c09ee). However, I decided that this version upgrade and deprecation should probably be thought through in more detail.
One idea is to mirror the version requirements of pycytominer since both of them are likely going to be used together often. We should also think about certain projects that might no longer be able to use cytominer-eval after this decision, although I think that should be a non-issue since those other projects are probably using a specific cytominer-eval hash.
In #55 @michaelbornholdt noticed that black was making unanticipated changes, and I thought that it could be due to some version difference.
To make it consistent (and to avoid these confusions in the future), we should add an auto-black github action: https://github.com/psf/black/actions/runs/17913292/workflow
Nyffeler J, Haggard DE, Willis C, Setzer RW, Judson R, Paul-Friedman K, Everett LJ, & Harrill JA. (2020). Comparison of Approaches for Determining Bioactivity Hits from High-Dimensional Profiling Data. SLAS Discov, 2472555220950245. doi: 10.1177/2472555220950245
https://paperpile.com/shared/SaNURy
h/t @AnneCarpenter
In operations.py calculate_precision_recall() is called in line 67. When the replicate_groups has just one member, this fails. Before I investigate this in more detail: should we return NA in such cases with a warning or simply throw a more informative exception? At the moment a division by zero error is raised without additional information.
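A hedged sketch of the warn-and-return-NA option, with a hypothetical wrapper name and a placeholder computation standing in for the real precision/recall logic:

```python
import math
import warnings

def safe_precision_recall(group_size: int) -> float:
    """Hypothetical guard: return NaN with a warning instead of dividing by
    zero when a replicate group has only one member."""
    n_pairs = group_size - 1  # pairs available to the single query sample
    if n_pairs == 0:
        warnings.warn("Replicate group has a single member; returning NaN.")
        return float("nan")
    return 1.0 / n_pairs  # placeholder for the real precision/recall computation

result = safe_precision_recall(1)  # single-member group: NaN plus a warning
```

The alternative is raising an informative exception at the same check, which fails loudly instead of silently propagating NaNs.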
One functionality we want in this repo is the ability to split profiles into specific feature subgroups. @niranjchandrasekaran performed a split on channels in broadinstitute/lincs-cell-painting#51
It would be great to replicate similar functionality in this package
(Stub) Mahalanobis distance of each data point to the negative controls
Enrichment Analysis is a method currently used by Juan. His code can be found here:
https://github.com/broadinstitute/DeepProfilerExperiments/blob/master/ta-orf/04-downstream-analysis.ipynb
I would like to add the enrichment operation to the list of operations while using the given tools and adapting the code from Juan.
Something weird is going on; we need to see why more than one audit group doesn't appear to be working...
This update should be done soon after #63, since it is fast to implement and important for my thesis.
I will simply add the option to do precision at R, which sets k to the number of correct neighbors, so that the precision can always span 0 to 1 no matter how many MOA replicates exist for a given class.
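A tiny illustration of the idea with a made-up ranking:

```python
# Toy ranking for one query compound: True marks a same-MOA neighbor.
ranked_is_same_moa = [True, False, True, True, False, False]

# Precision at R: set k to the number of correct neighbors (R) for this class,
# so a perfect ranking always scores exactly 1.0 regardless of class size.
r = sum(ranked_is_same_moa)  # R = 3 correct neighbors exist
precision_at_r = sum(ranked_is_same_moa[:r]) / r
```

Here two of the top R=3 neighbors share the MOA, so precision at R is 2/3; a fixed k would instead cap small classes below 1.0 or pad large ones.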
In the readme front page the link for the most recent cytominer-eval installation is given as
pip install git+git://github.com/cytomining/cytominer-eval@f7f5b293da54d870e8ba86bacf7dbc874bb79565
However, my company server only allows https access. For such cases the link should be
pip install git+https://github.com/cytomining/cytominer-eval@f7f5b293da54d870e8ba86bacf7dbc874bb79565
Please add this to the readme file and include it also in other similar installation instructions. :)
Currently, enrichment.py is lacking a step in which we assert that the melt was performed correctly.
We need to add this assertion in a future pull request, which will also add an "enrichment" option to the assert_melt function.
We currently calculate a pairwise correlation matrix, and we often want to also calculate median pairwise correlations per replicate.
This should be fairly straightforward to implement. I already calculate median pairwise correlation per replicate using cytominer-eval in one of the grit-benchmark analyses.
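A plain-pandas sketch of the idea (per-group median of within-group pairwise correlations; a per-replicate variant would instead take the median over each row of the within-group correlation matrix). Data and column names below are invented:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Toy profiles: 6 wells, 4 features, with a replicate group label per well.
profiles = pd.DataFrame(rng.normal(size=(6, 4)))
profiles["Metadata_group"] = ["A", "A", "A", "B", "B", "B"]

medians = {}
for group, block in profiles.groupby("Metadata_group"):
    # Pairwise correlation between the wells (rows) of this group.
    corr = block.drop(columns="Metadata_group").T.corr().values
    # Median of the off-diagonal (pairwise) correlations within the group.
    off_diag = corr[np.triu_indices_from(corr, k=1)]
    medians[group] = float(np.median(off_diag))
```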
In grit() and mp_value() specifically, we can add support for a list of columns indicating replicates, rather than just a single string (i.e., one column).
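One common way to support both input shapes, sketched with a hypothetical helper name:

```python
from typing import List, Union

def normalize_replicate_groups(replicate_groups: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept a single column name or a list of names,
    and always return a list so downstream code handles one case only."""
    if isinstance(replicate_groups, str):
        return [replicate_groups]
    return list(replicate_groups)
```

Normalizing at the function boundary keeps the grouping logic itself unchanged.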
When no replicates are present, denom = 0 and we receive the following error:
/home/ubuntu/efs/Periscope_Calico/workspace/software/Periscope_Calico/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/cytominer_eval/operations/percent_strong.py:50: RuntimeWarning: invalid value encountered in long_scalars
).sum() / denom
We should add an AssertionError when denom = 0, with an error message stating that no replicate groups were identified.
The line is
cytominer-eval/cytominer_eval/operations/percent_strong.py
Lines 45 to 50 in 56bd9e5
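The proposed guard could look roughly like this (names mirror the issue text, not the actual file):

```python
# Stand-in for the within-replicate similarities; empty means no replicates found.
replicate_correlations = []
denom = len(replicate_correlations)

try:
    # Fail loudly with a clear message instead of a RuntimeWarning and NaN.
    assert denom != 0, "No replicate groups identified"
    percent_strong = sum(replicate_correlations) / denom
except AssertionError as err:
    percent_strong = None
    message = str(err)
```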