Comments (5)
I've spent some time familiarizing myself with the workflow, which was good to do in and of itself! I'm thinking...
- Let's just grab label.ont from R. We can also add an argument to the R script indicating which label to return, with a default of "label.ont".
  - (Edit) The R script could also be updated so that it only checks for the requested label. It doesn't need to check for labels that won't be used.
- In terms of providing a single model, there are 2 main ways I can see to do this:
  - We could directly update the scpca-project-celltype-metadata.tsv file itself, but with just one line per scpca_project_id. This seems like the right move imo? Or at least better than the option below.
  - We could keep scpca-project-celltype-metadata.tsv as-is, but also somehow tell the workflow which model in that file we'd want to use. That would probably mean modifying the library metadata file, which I'm rather hesitant to do...
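The classification script itself is R, but the "which label to return" argument can be sketched in Python with argparse. Two things here are assumptions for illustration. The flag name `--label_type` is hypothetical. The three label names are the columns a celldex reference such as HumanPrimaryCellAtlasData provides. The idea is that label.ont is the default and the rest of the script only ever reads the one requested label.

```python
import argparse

# Assumption: the label columns available in a celldex reference such as
# HumanPrimaryCellAtlasData; the real script may support a different set.
SUPPORTED_LABELS = ("label.main", "label.fine", "label.ont")


def parse_args(argv=None):
    """Parse command-line options; --label_type is a hypothetical flag name."""
    parser = argparse.ArgumentParser(description="Sketch of a SingleR classification step")
    parser.add_argument(
        "--label_type",
        default="label.ont",
        choices=SUPPORTED_LABELS,  # reject labels the script does not know about
        help="which reference label to return (default: label.ont)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    # Downstream code uses only args.label_type instead of looping over every label
    print(f"classifying with {args.label_type}")
```

Running it with no arguments selects label.ont. Passing `--label_type label.fine` would select the fine labels instead.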
@allyhawkins, any feedback here? Is there another approach you're thinking of that I'm not immediately seeing? Thanks!
from scpca-nf.
> Let's just grab label.ont from R. We can also add an argument to the R script indicating which label to return, with a default of "label.ont".
> - (Edit) The R script could also be updated so that it only checks for the requested label. It doesn't need to check for labels that won't be used.
Yes, the main change I would like to see is that we only output label.ont here. That means the models that get read in should either contain only label.ont, or we have to read them in and grab only the label.ont model.
For some additional context, when we create the models they are stored as a list of models within a single rds file. This means that when we read in the reference model for HumanPrimaryCellAtlasData, we are reading in three models. This is why there is a lot of purrr happening in the classify_SingleR.R script. I think we could do one of the following:
- Update the original reference building to include a label_type option and only train a model using one label, not all three. By default, this should be label.ont. Then the only change needed in classify_SingleR.R will be to accommodate reading in rds files that contain a single model object rather than 3 of them.
- Keep the references the same and specifically grab only the label.ont model, then use that for classifying cells.
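Either option mostly changes how the reference file is read. Here is a rough Python analog that rests on two assumptions. First, pickle stands in for R's rds serialization. Second, a dict keyed by label name stands in for the list of three models. Under those assumptions, a reader can accept either layout and hand back just one model:

```python
import pickle
from typing import Any


def select_model(reference: Any, label_type: str = "label.ont") -> Any:
    """Return one model from a deserialized reference.

    Assumption: a legacy bundle is a dict keyed by label name (one model per
    label); anything else is treated as a single trained model, which is what
    a reference built for only one label_type would contain.
    """
    if isinstance(reference, dict):
        if label_type not in reference:
            raise KeyError(f"no {label_type!r} model; bundle has {sorted(reference)}")
        # Second option: keep the bundle, but use only the requested model
        return reference[label_type]
    # First option: the file already holds exactly one model
    return reference


def read_reference_model(path: str, label_type: str = "label.ont") -> Any:
    """Load a pickled reference file and return a single model."""
    with open(path, "rb") as fh:
        return select_model(pickle.load(fh), label_type)
```

Handling both layouts would let the classification step keep working while the references are rebuilt.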
I think I favor changing it at the stage where we build the reference models. I think this would simplify everything so that we only work with label.ont moving forward, which is what we want.
An additional, related change: our output should have two columns, one with the ontology ID and one with the full cell ontology label, which we can grab using ontoProc.
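ontoProc is an R/Bioconductor package, so the following only sketches the output shape in Python. Each cell keeps its predicted ontology ID and gains a human-readable name from a lookup table. The column names are hypothetical, and so is the two-entry lookup used in the example. In practice the names would come from the Cell Ontology (e.g., via ontoProc).

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical column names for the two annotation columns
ID_COL = "singler_celltype_ontology"
NAME_COL = "singler_celltype_annotation"


def add_ontology_names(
    predictions: Iterable[Tuple[str, str]], ontology_names: Dict[str, str]
) -> List[Dict[str, str]]:
    """Turn (barcode, ontology ID) pairs into rows carrying both ID and name.

    IDs missing from the lookup get "NA" so the gap stays visible downstream.
    """
    return [
        {"barcode": barcode, ID_COL: ont_id, NAME_COL: ontology_names.get(ont_id, "NA")}
        for barcode, ont_id in predictions
    ]


def to_tsv(rows: List[Dict[str, str]]) -> str:
    """Serialize the rows as a tab-separated table with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["barcode", ID_COL, NAME_COL], delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    names = {"CL:0000084": "T cell", "CL:0000236": "B cell"}  # illustrative lookup
    rows = add_ontology_names([("AAACCTGA", "CL:0000084"), ("AAACGGGT", "CL:0000236")], names)
    print(to_tsv(rows))
```

Keeping the ID column alongside the name means downstream code can still join on the stable ontology ID.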
> We could directly update the scpca-project-celltype-metadata.tsv file itself, but with just one line per scpca_project_id. This seems like the right move imo? Or at least better than the option below.
I agree that we can keep the overall Nextflow part the same and just change the number of references per project. This can be done as a last step. I think the main focus here should be getting only label.ont and removing the other labels, since we are less interested in those.
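If each project ends up with a single reference, a cheap guard is to confirm that scpca-project-celltype-metadata.tsv has at most one row per scpca_project_id before the workflow reads it. Below is a Python sketch. Only the scpca_project_id column name comes from the discussion; the singler_ref_name column in the test data is hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict


def load_one_ref_per_project(tsv_text: str) -> Dict[str, Dict[str, str]]:
    """Parse the project cell type metadata and index it by project ID.

    Raises ValueError if any scpca_project_id appears on more than one row,
    since the workflow would then have more than one model for that project.
    """
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    counts = Counter(row["scpca_project_id"] for row in rows)
    duplicated = sorted(pid for pid, n in counts.items() if n > 1)
    if duplicated:
        raise ValueError("multiple reference rows for: " + ", ".join(duplicated))
    return {row["scpca_project_id"]: row for row in rows}
```

Failing early on a duplicated project ID is cheaper than discovering mid-run that two models were picked up for the same project.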
> I think this would simplify everything so that we only work with label.ont moving forward, which is what we want.
👍
Thanks for the feedback here! I'm going to split this issue into two and assign myself to both:
- Update the workflow to take only 1 label
  - This involves updating the reference building itself.
  - We probably still want (at least it can't hurt) an opt in the R script for this with a default of label.ont, given the use of a label_type param in the reference building workflow (and I suppose the annotation workflow should use a label_type param too!)
    - I wonder if it would be helpful to have a separate celltyping config file where label_type and celltype_refs_metafile could live?
- Blocked by ^: update the workflow to take only 1 model
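The reference-building change in the first item can be sketched the same way. Instead of training one model per label column and bundling all of them, we train only the model for label_type. This is a Python sketch with two assumptions. `train` stands in for the real training call, for example SingleR's trainSingleR() in R. The dict-of-label-columns layout is also assumed.

```python
from typing import Callable, Dict, Sequence, TypeVar

Model = TypeVar("Model")


def build_reference(
    labels_by_type: Dict[str, Sequence[str]],
    train: Callable[[Sequence[str]], Model],
    label_type: str = "label.ont",
) -> Model:
    """Train a single model using only the label_type column.

    labels_by_type maps each label column (e.g. "label.main", "label.ont")
    to the per-sample labels of the reference; only one column is used, so
    the saved reference holds one model instead of a bundle of three.
    """
    if label_type not in labels_by_type:
        raise KeyError(f"reference has no {label_type!r} column; has {sorted(labels_by_type)}")
    return train(labels_by_type[label_type])
```

Because `train` is called exactly once, the downstream reader no longer needs to pick a model out of a list.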
> I wonder if it would be helpful to have a separate celltyping config file where label_type and celltype_refs_metafile could live?
Right now we have celltype_refs_metafile defined in the ccdl profile. I think we could leave it like that for now and add label_type there?
(See scpca-nf/config/profile_ccdl.config, line 16 at commit 4b4de3c.)
This issue was broken out into #380 and #381, which were respectively closed by #382 and #383. All set!