Comments (15)

bwlang avatar bwlang commented on June 25, 2024

great idea - could be a big speed improvement for slow IO environments

from galaxy-hackathon.

timothom avatar timothom commented on June 25, 2024

So are you talking about uploading and storing/managing the newly uploaded data as compressed data? Or just uploading a compressed fastq file and then extracting it server-side once the upload is complete?

Or do you want the data to remain compressed in storage on the server?

apetkau avatar apetkau commented on June 25, 2024

This is a good idea. @timothom I think Galaxy already decompresses fastq files that are uploaded, so I assume this would involve storing as compressed data? Would this involve disabling the decompression of uploaded fastq files?

It would be nice for us in our lab to be able to operate directly on compressed fastq files though.

dpryan79 avatar dpryan79 commented on June 25, 2024

I'd love it if even linking in compressed files worked seamlessly. That wouldn't require mucking with the upload tool to stop it from auto-decompressing stuff.

abretaud avatar abretaud commented on June 25, 2024

This would make several French Galaxy admins very happy!
We have a kind of patch/hack that works on our instances, but I'm sure there would be a more elegant way to do it: https://www.e-biogenouest.org/wiki/ManArchiveGalaxy

mvdbeek avatar mvdbeek commented on June 25, 2024

There has also been some work by @yhoogstrate in galaxyproject#2535 with a different approach.

pvanheus avatar pvanheus commented on June 25, 2024

@mvdbeek: @ashvark and I reviewed @yhoogstrate's work and it requires this squashfs thing to be installed all over the cluster, no?

pvanheus avatar pvanheus commented on June 25, 2024

@ashvark has started some work on a compressed Fastq type, see #38. @bgruening: how would this work with tools that do not support compressed fastq? And how would compressing existing datasets work - would set_meta() compress / decompress if that key changed?

Finally, see the issue @frederikcoppens mentioned on #38 - something to look out for.

frederikcoppens avatar frederikcoppens commented on June 25, 2024

@pvanheus With a new compressed fastq datatype, this would require updating the wrappers to also allow this datatype, I assume? Tools that do not support it would then need a conversion step to use it as input.
Would adding a "convert" tool to uncompress (and compress) be an option?

ashvark avatar ashvark commented on June 25, 2024

Yes. I am planning to try adding converters, but I am afraid that would not be a good idea for larger fastq files.

bgruening avatar bgruening commented on June 25, 2024

Why a new format? Just annotate the old format and convert tools that do not support compressed fastq to react to the metadata. This should be compatible and doable without much effort. I'm assuming here that most of the tools already have native support for gzipped fastq.

pvanheus avatar pvanheus commented on June 25, 2024

@bgruening because metadata is per-user not per-dataset. However, how about we make a new type: uncompressed fastq. So Fastq is compressed fastq. I'm just thinking of a way to convert existing datasets... @natefoo also pointed out to me that the correct way to handle tools that depend on .gz extension is that at job run time the dataset is linked in with the extension as per datatypes_conf.xml.
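The run-time linking idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions: `link_with_extension`, its parameters, and the example file names are hypothetical and are not Galaxy's actual job-preparation code; the extension is simply passed in rather than looked up in datatypes_conf.xml.

```python
import os


def link_with_extension(dataset_path, job_dir, name, extension):
    """Symlink a stored dataset into a job directory under a name that
    carries the extension a tool expects (hypothetical helper).

    The stored file keeps its original name (for example dataset_1.dat);
    only the link inside job_dir gets the new extension.
    """
    link_path = os.path.join(job_dir, f"{name}.{extension}")
    # Link to the absolute path so the link stays valid regardless of
    # the working directory the tool is started from.
    os.symlink(os.path.abspath(dataset_path), link_path)
    return link_path
```

A tool that inspects the file name would then be handed the link path (for example `input.fastq.gz`) instead of the stored `.dat` path, while the data itself is not copied.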

ashvark avatar ashvark commented on June 25, 2024

@bgruening and @pvanheus: I have created a separate branch (https://github.com/ashvark/galaxy/tree/fastq_enhancements) in my repository for enhancing the fastq datatype to handle gzipped fastq files as such. I have tested this only with simple test cases. Below is an explanation of the changes:

  • added a metadata element 'is_gzipped' for the Fastq datatype in the file datatypes/Sequence.py
  • modified the get_headers() method in datatypes/sniff.py to handle gzipped files
  • added a condition in upload.py to avoid decompressing gzipped fastq files during upload
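To illustrate the kind of check the first two changes imply, here is a minimal standalone sketch. It is not the code from the branch: `is_gzipped` and `read_first_lines` are hypothetical helper names, and the detection relies only on the standard gzip magic bytes (`0x1f 0x8b`) rather than on anything Galaxy-specific.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def read_first_lines(path, count=4):
    """Return up to `count` text lines, decompressing on the fly if needed.

    Only the beginning of the file is read, so a large compressed fastq
    is never fully decompressed just to inspect its first record.
    """
    opener = gzip.open if is_gzipped(path) else open
    lines = []
    with opener(path, "rt") as handle:
        for line in handle:
            lines.append(line.rstrip("\n"))
            if len(lines) >= count:
                break
    return lines
```

Checking the content rather than the file name matters here because stored datasets do not keep their original extension, so a `.gz` suffix cannot be relied on.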

TO DO

  • test with various scenarios to make sure it does not break any other functionality

I would like to know your suggestions and improvements.

yhoogstrate avatar yhoogstrate commented on June 25, 2024

+ref: galaxyproject/tools-iuc#354

zipho avatar zipho commented on June 25, 2024

@yhoogstrate that pull request remains open, and it seems no further development has been done on it.

Another discussion is here: #38

Perhaps we should combine efforts around this.

@ashvark I briefly tested your changes locally and they worked OK.

The other issue is the file/dataset extension, which tools sometimes use to determine the format of a file. Is there any reason why Galaxy forces the .dat extension? I know it would be a big change, but could files be stored and tracked with their original extension in Galaxy?
