Comments (6)
Tagging @allyhawkins, @sjspielman and @jaclyn-taroni for comments
from scpca-nf.
> This can be implemented with a change in write_rds() from compress='gz' to compress='bz2', and will be completely transparent to users as far as code goes, as the result is still an rds file.
To me, this seems like a no-brainer - definitely do it.
> One other idea I had was to remove the miQC_model object from the "processed" files. At that point, the data in the object can't really be used as the rejected cells have already been removed. This object is also fairly large (100MB or more) and does not appear to compress well, so removing it would save a significant amount of space, and having it in both the "filtered" and "processed" data seems redundant. This change would require a docs update as well.
I wasn't sure how I felt about this at first, but you've convinced me with "At that point, the data in the object can't really be used as the rejected cells have already been removed."
As long as we keep it around somewhere (in filtered), I'm ok removing it from processed.
> In my testing, this takes about 50% longer to write (e.g. 35 seconds vs 21 seconds, so not much really), but results in files ~50% smaller. However, the read time increases substantially, from ~1.4 to 9.4 seconds in the example I tested. I tend to think the tradeoff is likely worth it for our use case, but others may disagree, so we should discuss this!
My concern would be read time for the larger merged objects, not necessarily the smaller individual objects. I think at least for the individual RDS files then this proposal makes sense.
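For anyone who wants to check the tradeoff on their own objects, here is a minimal sketch of how the write time, read time, and file size could be compared. It uses base R `saveRDS()` with `compress = "gzip"` and `compress = "bzip2"` rather than `write_rds()`, so it runs without extra packages. The `benchmark_rds()` helper and the small stand-in object are hypothetical and are not part of scpca-nf, and timings on real merged objects will likely look quite different.

```r
# Hypothetical stand-in object; real SCE objects will be much larger
# and may compress very differently.
set.seed(2024)
obj <- list(
  counts = matrix(rpois(2e5, lambda = 2), nrow = 1000),
  model = rnorm(5e4)
)

# Hypothetical helper: write `object` with the given compression,
# read it back, and report timings and file size.
benchmark_rds <- function(object, compress) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))

  write_sec <- system.time(
    saveRDS(object, path, compress = compress)
  )[["elapsed"]]
  size_mb <- file.size(path) / 1024^2
  read_sec <- system.time(restored <- readRDS(path))[["elapsed"]]

  # The compression method should not change the restored object.
  stopifnot(identical(restored, object))

  data.frame(
    compress = compress,
    write_sec = write_sec,
    read_sec = read_sec,
    size_mb = size_mb
  )
}

results <- do.call(
  rbind,
  lapply(c("gzip", "bzip2"), benchmark_rds, object = obj)
)
print(results)
```

Running the same helper on one individual object and one merged object would show whether the read-time penalty grows enough to justify treating the two cases differently.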
> One other idea I had was to remove the miQC_model object from the "processed" files. At that point, the data in the object can't really be used as the rejected cells have already been removed. This object is also fairly large (100MB or more) and does not appear to compress well, so removing it would save a significant amount of space, and having it in both the "filtered" and "processed" data seems redundant. This change would require a docs update as well.
I'm totally onboard with removing it from the processed objects.
> My concern would be read time for the larger merged objects, not necessarily the smaller individual objects. I think at least for the individual RDS files then this proposal makes sense.
Good middle ground!
closed by #712
Though docs updates are still pending in AlexsLemonade/scpca-docs#273