Comments (13)
@sjspielman, you probably want to use the release checklist template to track this: https://github.com/AlexsLemonade/scpca-nf/issues/new?assignees=&labels=release&projects=&template=release-checklist.md&title=Prepare+for+scpca-nf+release+vX.X.X
from scpca-nf.
> Noting also we have updated approaches for references which would get included in a new release containing everything in development. I would think this feature would need to be more fully documented vs celltyping/integration. I see changes towards internal docs in #310 and changes for external docs in #306; just confirming here whether there are any more docs needed for that feature (it seems set to me?).

These updates have been accounted for in the docs, so we are good to release those.

> I'm also good either way, so this is helpful since my original plan was to not include cell-typing or integration! Since we'd call these features experimental for now, I imagine they would get loosely documented in release notes (my note to self..), but I wonder if there's anything else we'd want to add to internal or external instructions just reminding folks that they are experimental. That said, I don't think this is critical; any other opinions?

I think including them as experimental still in the release notes is good.
from scpca-nf.
I'm not sure we should try to pick out changes at this point. There is enough interleaved that I think it may simply be better to include all updates to this point, unless there are specific changes we think should not go in.
from scpca-nf.
> I'm not sure we should try to pick out changes at this point. There is enough interleaved that I think it may simply be better to include all updates to this point, unless there are specific changes we think should not go in.

If we do that it includes the cell type workflow and integration workflows. Right now they are separate so don't get run by the main workflow, but just noting that they do exist so maybe we don't want to include those?
from scpca-nf.
> > I'm not sure we should try to pick out changes at this point. There is enough interleaved that I think it may simply be better to include all updates to this point, unless there are specific changes we think should not go in.
>
> If we do that it includes the cell type workflow and integration workflows. Right now they are separate so don't get run by the main workflow, but just noting that they do exist so maybe we don't want to include those?

Maybe not ideal, but I think I would rather have them in `main` as "experimental" than worry about the morass of manually picking out commits. The longer we keep `development` and `main` on separate paths, the more trouble we are setting ourselves up for.
Another option (which I don't know how I feel about) is branching off "development" with cell typing and integration as their own branches, then removing those workflows from development. I think this is likely to end up with some pretty error-prone merges too, though (unless we remove them first, then branch and revert the removal commits in the new branches). No great options here, I guess.
from scpca-nf.
Argh, the template of course...
Let's continue to chat about plans for release here re-"which commits/features" and I'll open a better issue from template for the release itself.
from scpca-nf.
> Maybe not ideal, but I think I would rather have them in main as "experimental" than worry about the morass of manually picking out commits. The longer we keep development and main on separate paths, the more trouble we are setting ourselves up for.

I'm okay with including them and avoiding errors and issues with keeping things in sync. I just wanted to make sure everyone knew that those will also be present if we merge in all changes.
from scpca-nf.
> I'm okay with including them and avoiding errors and issues with keeping things in sync. I just wanted to make sure everyone knew that those will also be present if we merge in all changes.

I'm also good either way, so this is helpful since my original plan was to not include cell-typing or integration! Since we'd call these features experimental for now, I imagine they would get loosely documented in release notes (my note to self..), but I wonder if there's anything else we'd want to add to internal or external instructions just reminding folks that they are experimental. That said, I don't think this is critical; any other opinions?
from scpca-nf.
Noting also we have updated approaches for references which would get included in a new release containing everything in `development`. I would think this feature would need to be more fully documented vs celltyping/integration. I see changes towards internal docs in #310 and changes for external docs in #306; just confirming here whether there are any more docs needed for that feature (it seems set to me?).
from scpca-nf.
To get a sense of whether there are any sneaky changes that releasing all `development` commits will incur, I went ahead and did a quick-and-dirty comparison of a non-CITE-seq project (`SCPCL000001`) when run fresh (no `-resume`) on `development` code vs what's in the portal right now. There are some differences (often resulting from inherent stochasticity) that are worth being aware of:
- `salmon` version differences (is this really minimal?)
- different numbers of filtered/processed SCE cells! (blame empty drops filtering for this; portal has 7 more than `development`)
- as a consequence, we have some minor differences in QC stats
- most interesting to me was the HVG vector, which differs for 14 of the genes. this seems important!
So, the question is, do these changes rise to the level of needing to re-process all projects with this new release? If so, would it make more sense to cherry-pick such that only CITE-seq projects would be meaningfully affected by the re-release? Would a whole ScPCA refresh make sense at this stage anyways?
I'm attaching here the two QC reports for SCE version as well as a super quick notebook I compared SCEs in.
upload.zip
from scpca-nf.
Thanks for doing this comparison. It seems quite likely that most of the changes are to do with updated containers. Notably the `salmon` change happened a little while ago and is already in the newer releases, and the EmptyDrops changes are probably due to updates in `scpcaTools`, which also are already in `main`.
As we do record the pipeline version in the outputs (though apparently not in the QC report), and these changes appear quite minor (I expect the changes in HVG are for genes at the lower end of the vector, which you can confirm), I am not concerned that we need to reprocess everything at this stage. As we add functionality (cell typing), everything will get rerun through at least the later part of the pipeline (not including `salmon`, as the design is not to require that).
from scpca-nf.
> I expect the changes in HVG are for genes at the lower end of the vector, which you can confirm

Yep -
which(!(metadata(dev)$highly_variable_genes %in% metadata(portal)$highly_variable_genes))
> [1] 1911 1921 1938 1944 1953 1955 1958 1961 1968 1972 1975 1976 1977 1980 1981 1985 1986 1988 1990 1991 1992 1994
[23] 1995 1997
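
For anyone wanting to repeat this kind of check outside of R, here is a minimal Python sketch of the same idea: find the 1-based positions in one ranked gene list whose entries are missing from another, then see how far down the ranking the first mismatch sits. The function name `positions_not_shared` and the toy gene names are hypothetical, made up for illustration; this is not code from scpca-nf or scpcaTools.

```python
def positions_not_shared(ranked_a, ranked_b):
    """Return 1-based positions in ranked_a whose entries are absent from ranked_b."""
    b_set = set(ranked_b)
    return [i for i, gene in enumerate(ranked_a, start=1) if gene not in b_set]


# Toy ranked lists (hypothetical names): the two runs agree on the top of
# the ranking and differ only in the last two entries.
shared = [f"gene{i}" for i in range(1, 9)]  # gene1 .. gene8
dev_hvg = shared + ["devA", "devB"]
portal_hvg = shared + ["portalA", "portalB"]

mismatch = positions_not_shared(dev_hvg, portal_hvg)
print(mismatch)  # [9, 10]

# Fraction of the way down the ranking at which the first mismatch occurs;
# a value close to 1 means only the lowest-ranked entries differ.
first_mismatch_fraction = min(mismatch) / len(dev_hvg)
print(first_mismatch_fraction)  # 0.9
```

Applied to the positions printed above, the smallest index is 1911, so every mismatch falls after the first 1910 entries of the vector.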
from scpca-nf.
Going to close out this discussion since we have a plan now moving forward. See #354 for the actual release issue.
from scpca-nf.