Comments (6)

jashapiro commented on June 22, 2024

Tagging @allyhawkins, @sjspielman and @jaclyn-taroni for comments

from scpca-nf.

sjspielman commented on June 22, 2024

> This can be implemented with a change in write_rds() from compress='gz' to compress='bz2', and will be completely transparent to users as far as code goes, as the result is still an rds file.

To me, this seems like a no-brainer - definitely do it.
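For reference, a minimal sketch of the change being discussed, assuming the readr package is installed. The toy object and temporary file paths are hypothetical stand-ins, not anything taken from scpca-nf:

```r
library(readr)

# Hypothetical stand-in for a processed object; not real ScPCA data.
set.seed(1)
obj <- list(counts = matrix(rpois(1e5, lambda = 2), ncol = 100))

gz_file <- tempfile(fileext = ".rds")
bz2_file <- tempfile(fileext = ".rds")

# Same call, only the `compress` value differs.
write_rds(obj, gz_file, compress = "gz")
write_rds(obj, bz2_file, compress = "bz2")

# On-disk sizes in bytes; the actual ratio depends on the data.
sizes <- c(gz = file.size(gz_file), bz2 = file.size(bz2_file))
print(sizes)

# Reading code is unchanged: the compression type is detected on read.
roundtrip_ok <- identical(read_rds(gz_file), read_rds(bz2_file))
print(roundtrip_ok)
```

Because the compression type is detected when the file is read, downstream code that calls `read_rds()` or `readRDS()` should not need any changes.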

> One other idea I had was to remove the miQC_model object from the "processed" files. At that point, the data in the object can't really be used as the rejected cells have already been removed. This object is also fairly large (100MB or more) and does not appear to compress well, so removing it would save a significant amount of space, and having it in both the "filtered" and "processed" data seems redundant. This change would require a docs update as well.

I wasn't sure how I felt about this at first, but you've convinced me with "At that point, the data in the object can't really be used as the rejected cells have already been removed." As long as we keep it around somewhere (in filtered), I'm ok removing it from processed.

allyhawkins commented on June 22, 2024

> In my testing, this takes about 50% longer to write (e.g. 35 seconds vs 21 seconds, so not much really), but results in files ~50% smaller. However, the read time increases substantially, from ~1.4 to 9.4 seconds in the example I tested. I tend to think the tradeoff is likely worth it for our use case, but others may disagree, so we should discuss this!

My concern would be read time for the larger merged objects, not necessarily the smaller individual objects. I think at least for the individual RDS files then this proposal makes sense.
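One way to check this tradeoff on a given object is to time the write and read steps for each compression type. The sketch below is only an illustration under stated assumptions: it uses base R's `saveRDS()` (which `write_rds()` wraps, and which spells the options "gzip" and "bzip2"), a hypothetical toy object, and a hypothetical helper named `time_rds()`. Timings on real merged objects will differ.

```r
# Hypothetical toy object; real merged objects are far larger.
set.seed(1)
obj <- list(counts = matrix(rpois(1e6, lambda = 2), ncol = 100))

# Hypothetical helper: write, measure size, then read back, timing each step.
time_rds <- function(obj, compress) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  write_s <- system.time(saveRDS(obj, path, compress = compress))[["elapsed"]]
  size_mb <- file.size(path) / 1e6
  read_s <- system.time(readRDS(path))[["elapsed"]]
  c(write_s = write_s, size_mb = size_mb, read_s = read_s)
}

# One row per compression type; columns are seconds and megabytes.
timings <- rbind(gz = time_rds(obj, "gzip"), bz2 = time_rds(obj, "bzip2"))
print(timings)
```

Running this on one individual object and one merged object would show whether the read-time penalty grows enough to justify treating the two cases differently.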

> One other idea I had was to remove the miQC_model object from the "processed" files. At that point, the data in the object can't really be used as the rejected cells have already been removed. This object is also fairly large (100MB or more) and does not appear to compress well, so removing it would save a significant amount of space, and having it in both the "filtered" and "processed" data seems redundant. This change would require a docs update as well.

I'm totally on board with removing it from the processed objects.

sjspielman commented on June 22, 2024

> My concern would be read time for the larger merged objects, not necessarily the smaller individual objects. I think at least for the individual RDS files then this proposal makes sense.

Good middle ground!

jashapiro commented on June 22, 2024

Closed by #712

jashapiro commented on June 22, 2024

Though docs updates are still pending in AlexsLemonade/scpca-docs#273
