sjspielman commented on July 18, 2024

I've spent some time familiarizing myself with the workflow, which was good to do in and of itself! I'm thinking...

  • Let's just grab label.ont from R. We can also add an argument to the R script indicating which label to return with a default of "label.ont"
    • (Edit) - The R script could also be updated such that it only checks for the given label to return. Doesn't need to check for labels that won't be used.
  • In terms of providing a single model, there are 2 main ways I can see to do this:
    • We could directly update the scpca-project-celltype-metadata.tsv file itself, keeping just one line per scpca_project_id. This seems like the right move, imo? Or at least better than the option below.
    • We could keep scpca-project-celltype-metadata.tsv as-is, but also somehow provide information to the workflow about which model in that file we'd want to use. That would probably mean modifying the library metadata file, which I'm rather hesitant to do...

@allyhawkins, any feedback here? Is there another approach you are thinking of that I'm not immediately seeing? Thanks!

from scpca-nf.

allyhawkins commented on July 18, 2024

> Let's just grab label.ont from R. We can also add an argument to the R script indicating which label to return with a default of "label.ont"
>
>   • (Edit) - The R script could also be updated such that it only checks for the given label to return. Doesn't need to check for labels that won't be used.

Yes, the main change I would like to see is that we output only label.ont here. That means either the models we read in should contain only label.ont, or we read them all in and grab only the label.ont model.
For some additional context, when we create the models they are stored as a list of models within a single rds file. This means that when we read in the reference model for HumanPrimaryCellAtlasData, we are actually reading in three models. This is why there is so much purrr in the classify_SingleR.R script. I think we could do one of the following:

  • Update the original reference building to include a label_type option and train a model using only one label, not all three. By default, this should be label.ont. Then the only change needed in classify_SingleR.R would be to accommodate reading in rds files that contain a single model object rather than a list of three.
  • Keep the references the same, and grab only the label.ont model to use for classifying cells.
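A minimal sketch of the second option, written in Python for illustration rather than in the R used by classify_SingleR.R. It assumes the bundle read from one rds file can be treated as a mapping keyed by label type. The keys used here, `label.main`, `label.fine`, and `label.ont`, are presumably the three label columns of the celldex references. `select_model` and `DEFAULT_LABEL_TYPE` are hypothetical names.

```python
from typing import Any, Mapping

# Hypothetical default, mirroring the label_type default discussed in this thread.
DEFAULT_LABEL_TYPE = "label.ont"


def select_model(models: Mapping[str, Any], label_type: str = DEFAULT_LABEL_TYPE) -> Any:
    """Return only the model trained on `label_type` from a multi-model bundle.

    `models` stands in for the list of models stored in one reference rds file,
    assumed here to be keyed by label type.
    """
    if label_type not in models:
        available = ", ".join(sorted(models))
        raise KeyError(f"no model for {label_type!r}; available: {available}")
    return models[label_type]
```

With this approach the reference files stay unchanged, and the other two models are read in but ignored.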

I think I favor changing it at the stage where we build the reference models. I think this would simplify everything so that we only work with the label.ont moving forward, which is what we want.
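The following sketches the build-time alternative favored above. The reference-building step takes a `label_type` argument (defaulting to `label.ont`) and produces a single model object instead of one per label type. The Python function and its input layout are hypothetical. The "model" is a per-label mean profile, a simple nearest-centroid stand-in, not a trained SingleR model.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def build_reference(
    profiles: Sequence[Sequence[float]],
    labels_by_type: Dict[str, Sequence[str]],
    label_type: str = "label.ont",
) -> Dict[str, object]:
    """Train ONE model on the requested label type instead of one per label type."""
    if label_type not in labels_by_type:
        raise ValueError(f"unknown label_type {label_type!r}")
    labels = labels_by_type[label_type]
    if len(labels) != len(profiles):
        raise ValueError("need exactly one label per reference profile")

    # Accumulate per-label sums and counts to compute mean profiles (centroids).
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        sums[label] = [s + x for s, x in zip(sums[label], profile)]
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}

    # A single model object, tagged with the label type it was trained on,
    # so downstream code no longer has to pick one model out of a list.
    return {"label_type": label_type, "centroids": centroids}
```

Because the label type is recorded on the model itself, the classification step can check it rather than assume it.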

A related suggestion is to change our output to have two columns: one with the ontology ID and one with the full cell ontology label, which we can grab using ontoProc.
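Here is a sketch of the proposed two-column output. In the workflow, the readable names would come from ontoProc in R. In this sketch, a plain dictionary stands in for that lookup. The function name `write_annotations` and the column names `cell_id`, `ontology_id`, and `ontology_label` are hypothetical.

```python
import csv
import io
from typing import Mapping, Sequence


def write_annotations(
    cell_ids: Sequence[str],
    ontology_ids: Sequence[str],
    ontology_names: Mapping[str, str],
) -> str:
    """Return a TSV with one row per cell: ontology ID plus human-readable label."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["cell_id", "ontology_id", "ontology_label"])
    for cell, ont_id in zip(cell_ids, ontology_ids):
        # IDs missing from the lookup get an empty label instead of raising.
        writer.writerow([cell, ont_id, ontology_names.get(ont_id, "")])
    return out.getvalue()
```

In the Cell Ontology, for example, `CL:0000084` is "T cell" and `CL:0000236` is "B cell", so those rows would carry both the ID and the name.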

> We could directly update the scpca-project-celltype-metadata.tsv file itself, keeping just one line per scpca_project_id. This seems like the right move, imo? Or at least better than the option below.

I agree that we can keep the overall Nextflow part the same and just change the number of references per project. This can be done as a last step. I think the main focus here should be getting only label.ont and removing the other labels, since we are less interested in those.
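Once the metadata file carries one reference per project, a simple check can confirm it. This Python sketch assumes only that the file is tab-separated with a `scpca_project_id` column, the column named above. The function name is hypothetical, and it takes the file contents as a string to stay self-contained.

```python
import csv
import io
from collections import Counter
from typing import List


def projects_with_multiple_refs(metadata_tsv: str) -> List[str]:
    """Return scpca_project_id values that appear on more than one line.

    After switching to one reference per project, this list should be empty.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    counts = Counter(row["scpca_project_id"] for row in reader)
    return sorted(pid for pid, n in counts.items() if n > 1)
```

A check like this could run before the workflow starts, so a project with two references fails early instead of producing ambiguous output.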

sjspielman commented on July 18, 2024

> I think this would simplify everything so that we only work with the label.ont moving forward, which is what we want.

👍

Thanks for the feedback here! I'm going to split this issue into two and assign myself to both:

  1. Update the workflow to take only one label.
    • This involves updating the reference building itself.
    • We probably still want (at least it can't hurt) an option in the R script for this, defaulting to label.ont, given that the reference-building workflow will use a label_type param (and I suppose the annotation workflow should use a label_type param too!).
      • I wonder if it would be helpful to have a separate celltyping config file where label_type and celltype_refs_metafile could live?
  2. Blocked by the above: update the workflow to take only one model.

allyhawkins commented on July 18, 2024

> I wonder if it would be helpful to have a separate celltyping config file where label_type and celltype_refs_metafile could live?

Right now we have the celltype_refs_metafile defined in the ccdl profile. I think we could leave it like that for now and add in label_type there?

celltype_refs_metafile = "s3://ccdl-scpca-data/sample_info/celltype_annotation/scpca-project-celltype-metadata.tsv"

sjspielman commented on July 18, 2024

This issue was broken out into #380 and #381, which were respectively closed by #382 and #383. All set!
