Comments (8)
I have never built a Conda package before but I've always wanted to find out how.
I just found https://anaconda.org/hcc/aspera-cli and will check it out first.
from fetchngs.
Hi @atongsa,
I haven't yet come across Aspera in a Docker image, which is a hard requirement for nf-core pipelines. If you find a solution for that, it would be possible to integrate it. Do you have benchmarks comparing the speed, though, or is it just an idea that using Aspera would be faster? I think Aspera is also no longer officially supported by NCBI, although it still works, of course.
Sorry, it's just an idea; I haven't tested it.
Aspera is not officially supported by NCBI because prefetch and fasterq-dump additionally download the index files needed to decompress the *.sra into *.fastq. These index files are saved in {sra-toolkit folder}/refseq.
Using the NCBI API, by contrast, we only get a link to an *.sra file, which we then have to decompress via fasterq-dump anyway. There is no difference in speed either: with ascp, curl, wget, etc. we download the *.sra from AWS, and prefetch also downloads *.sra files from AWS. See https://github.com/ncbi/sra-tools/wiki/Avoid-using-ascp-directly-for-downloads
When we download the *.fastq.gz files from ENA, however, the FTP download speed is very limited, whereas Aspera gives a much higher speed.
For example, I was looking at the execution_report of a test run on AWS: https://nf-co.re/fetchngs/results#fetchngs/results-2d593fb504caf65301c78b8076272f895e364cd7/pipeline_info/execution_report_2021-09-15_16-37-52.html
The NFCORE_FETCHNGS:FETCHNGS:SRA_FASTQ_FTP process took 5m 39s to download 46 MB via FTP, which means that the FTP download speed on the AWS server was roughly 140 KB/s.
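The arithmetic behind that estimate can be sketched as below. This is only an illustration: the helper name `throughput_kb_per_s` is hypothetical, decimal units (1 MB = 1000 KB) are assumed, and the process duration from the execution report includes startup overhead, so the result is a lower bound on the actual transfer rate.

```python
def throughput_kb_per_s(size_mb: float, minutes: int, seconds: int) -> float:
    """Average transfer rate in KB/s, assuming decimal units (1 MB = 1000 KB)."""
    elapsed = minutes * 60 + seconds  # total wall-clock time in seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return size_mb * 1000 / elapsed


# Figures quoted from the execution report: 46 MB in 5m 39s.
rate = throughput_kb_per_s(46, 5, 39)
print(f"{rate:.0f} KB/s")  # prints "136 KB/s"
```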
As for Aspera in Docker, Aspera has its own image on Docker Hub: https://hub.docker.com/r/ibmcom/aspera-cli/
Sounds promising.
Hmm, would either of you be interested in contributing this feature? I'm afraid I won't have time to work on this myself.
I'm interested in contributing but will need some guidance. The Docker Hub link above for Aspera no longer exists, so I wrote my own Dockerfile and pushed it to Docker Hub.
You can use it by passing the download link and the directory to write the file to:
docker run --rm -u $(id -u):$(id -g) -v $(pwd):$(pwd) -w $(pwd) davetang/aspera_connect:4.2.6.393 [email protected]:vol1/fastq/SRR390/SRR390728/SRR390728_1.fastq.gz .
Using Aspera Connect is substantially faster than any other method on my home Internet connection. For example:
SRR292241_1.fastq.gz 100% 369MB 238Mb/s 00:13
Completed: 378151K bytes transferred in 13 seconds
(227709K bits/sec), in 1 file.
SRR292241_2.fastq.gz 100% 376MB 272Mb/s 00:13
Completed: 385855K bytes transferred in 14 seconds
(218815K bits/sec), in 1 file.
SRR390728_1.fastq.gz 100% 96MB 265Mb/s 00:05
Completed: 99006K bytes transferred in 5 seconds
(141136K bits/sec), in 1 file.
SRR390728_2.fastq.gz 100% 97MB 261Mb/s 00:07
Completed: 99628K bytes transferred in 7 seconds
(105342K bits/sec), in 1 file.
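As a rough cross-check of the transfer summaries above, the aggregate rate over all four files can be computed from the "Completed:" lines. This is a minimal sketch: the function name `mean_rate_mbit` is hypothetical, and it assumes that "K" in the log means 1024 bytes (which matches 378151K being about 369 MB) and that the reported whole seconds are accurate enough for an estimate.

```python
import re

# Matches summary lines such as:
# "Completed: 378151K bytes transferred in 13 seconds"
SUMMARY = re.compile(r"Completed: (\d+)K bytes transferred in (\d+) seconds?")

LOG = """\
Completed: 378151K bytes transferred in 13 seconds
Completed: 385855K bytes transferred in 14 seconds
Completed: 99006K bytes transferred in 5 seconds
Completed: 99628K bytes transferred in 7 seconds
"""


def mean_rate_mbit(log: str) -> float:
    """Aggregate rate in Mbit/s over all summary lines (assumes K = 1024 bytes)."""
    total_kbytes = 0
    total_seconds = 0
    for kbytes, seconds in SUMMARY.findall(log):
        total_kbytes += int(kbytes)
        total_seconds += int(seconds)
    if total_seconds == 0:
        raise ValueError("no transfer summaries found")
    return total_kbytes * 1024 * 8 / total_seconds / 1e6


print(f"{mean_rate_mbit(LOG):.0f} Mbit/s")
```

For the four files listed this comes out at roughly 200 Mbit/s, slightly below the per-file peak rates because the elapsed times are rounded to whole seconds.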
There's a tool called ffq, which you may already be aware of, that can be used to generate download links. I made a request for ffq to support Aspera Connect but it was ignored. I wrote a script (that needs a bit more work) to generate Aspera Connect download links from ffq.
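The idea of turning an FTP link into an Aspera source can be sketched as follows. This is not the script mentioned above, only an illustration under stated assumptions: the function name `ftp_to_aspera` is hypothetical, the input is assumed to be an ENA FTP URL of the form `ftp.sra.ebi.ac.uk/vol1/fastq/...`, and the Aspera user and host defaults are assumptions based on ENA's public conventions that should be checked against current ENA documentation before use.

```python
from urllib.parse import urlparse

# Assumed defaults for ENA's Aspera endpoint; verify before relying on them.
ASPERA_USER = "era-fasp"
ASPERA_HOST = "fasp.sra.ebi.ac.uk"


def ftp_to_aspera(ftp_url: str, user: str = ASPERA_USER, host: str = ASPERA_HOST) -> str:
    """Convert an ENA FTP URL into a 'user@host:path' source string for ascp."""
    # Links are sometimes reported without a scheme; add one so urlparse
    # separates the host from the path.
    if "://" not in ftp_url:
        ftp_url = "ftp://" + ftp_url
    path = urlparse(ftp_url).path.lstrip("/")
    if not path:
        raise ValueError(f"no file path found in {ftp_url!r}")
    return f"{user}@{host}:{path}"


print(ftp_to_aspera("ftp.sra.ebi.ac.uk/vol1/fastq/SRR390/SRR390728/SRR390728_1.fastq.gz"))
```

Keeping the user and host as parameters means the same helper still works if the endpoint changes or a mirror is used.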
Anyway, please let me know how I can contribute! I've just recently been learning about Nextflow and just joined the nf-core Slack workspace.
Hi @davetang,
Thank you for speaking up. That seems really cool.
For Aspera, I wonder if it would be possible to create a bioconda recipe? That would provide a conda package and ensure that BioContainers images are built, which in turn gives us Docker and Singularity support.
We are aware of ffq, but I haven't been 100% happy with it so far. I made a prototype that reworks it to be much faster and to output S3 links: https://github.com/Midnighter/ffqf. So maybe I should work on emitting Aspera links, too, and finish up this tool?
Anyway, I think creating a bioconda recipe would be a good starting point, if the license of aspera allows this.