Description of the bug I am currently trying to retrieve 15k sampl

Wrong use of --progress in prefetch command leads to failure of pipeline (fetchngs, CLOSED)

nf-core commented on August 11, 2024

Wrong use of --progress in prefetch command leads to failure of pipeline

from fetchngs.

Comments (17)

Midnighter commented on August 11, 2024

Thanks for the report @dmalzl, will add a fix immediately. I will just remove the flag since most of the time nobody will follow the log output anyway.

dmalzl commented on August 11, 2024

On a related note, there seems to be a problem with the configuration of the sra-toolkit: after resolving the command-line parameter issue, I get this message:

This sra toolkit installation has not been configured.
Before continuing, please run: vdb-config --interactive
For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/

After that, it retries and fails.

Midnighter commented on August 11, 2024

I had a look and I don't see the same thing when running prefetch from the container. What version of sra-tools are you running?

Output that I see:

prefetch --help

  -p|--progress                    Show progress

"prefetch" version 2.11.0

I'm happy to remove the hardcoded option, though, since it doesn't really serve a purpose.

On the configuration: there is code in the module that generates a configuration for you. Why is that not working for you? Please open a separate issue on that or join us on Slack to discuss.

dmalzl commented on August 11, 2024

Ah, sorry, this is my fault. I had the configuration problem before and therefore used the preinstalled module on our cluster, which is v2.9.6.1. I didn't anticipate the version difference, so I think this was just a problem on my side. However, the configuration problem is unchanged. It now runs with my version, but I am really curious why the configuration is not working for me. Would you suggest opening another issue here or going directly to Slack? Which would be better for discussion?
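
A version mismatch like this one can be caught before a flag is passed by looking at the help text of the installed tool. The following is a minimal Python sketch, not part of the pipeline: `supports_flag`, `installed_prefetch_help`, and `HELP_2_11_0` are hypothetical names, and the only help text used is the line quoted above for version 2.11.0. Finding a flag in the help text does not guarantee that it behaves the same way across versions.

```python
import re
import subprocess


def supports_flag(help_text, flag):
    """Return True if `flag` appears as a standalone option name in help text."""
    # The lookarounds keep "--prog" from matching inside "--progress".
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def installed_prefetch_help():
    """Return the help text of whichever prefetch is first on PATH."""
    result = subprocess.run(["prefetch", "--help"], capture_output=True, text=True)
    return result.stdout + result.stderr


# Help line quoted in this thread for prefetch 2.11.0.
HELP_2_11_0 = "  -p|--progress                    Show progress"

print(supports_flag(HELP_2_11_0, "--progress"))  # True
```

On a machine with sra-tools installed, `supports_flag(installed_prefetch_help(), "--progress")` would run the same check against the local version.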

Midnighter commented on August 11, 2024

I think Slack will be easier, if you don't mind. There's a fetchngs channel.

dmalzl commented on August 11, 2024

Sorry for being annoying, but I joined and can't find the fetchngs channel. I only see pipelines. Is this the one you were referring to?

dmalzl commented on August 11, 2024

No worries, found it.

dmalzl commented on August 11, 2024

Last issue connected to this, I promise, because I fixed it this way. After fixing my NCBI config by adding the two missing lines

/LIBS/GUID = "bd94a196-a984-494f-909e-14e1ffb250e4"
/libs/cloud/report_instance_identity = "true"

I retried and it worked. To reset the pipeline to its original state, I reverted all my changes and tried again, just to be sure the issue didn't only appear fixed because I had worked my voodoo on the code. Unfortunately, that was the case: the stupid prefetch does not write the file to the cwd but to some other directory specified in the ncbi_config. Simply adding -o ./$id to the command fixed this. So the complete change to the module should be:

    retry_with_backoff.sh prefetch \\
        $args \\
        $id \\
        -o ./$id

This will also prevent others from running into the same issues I experienced today.
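
Whether a given NCBI configuration contains the two keys mentioned above can be checked with a few lines of Python. This is a minimal sketch under assumptions: `REQUIRED_KEYS` and `missing_config_keys` are hypothetical names, the `key = value` line format is taken from the two lines quoted above, keys are compared exactly (case-sensitively), and that these two keys are sufficient is only what was observed in this thread. The location of the configuration file is deliberately not assumed.

```python
# Keys reported as missing in this thread; treating them as the full
# requirement is an assumption.
REQUIRED_KEYS = ("/LIBS/GUID", "/libs/cloud/report_instance_identity")


def missing_config_keys(config_text, required=REQUIRED_KEYS):
    """Return the required keys that have no `key = value` line in the text."""
    present = set()
    for line in config_text.splitlines():
        key, sep, _value = line.partition("=")
        if sep:  # only lines that actually contain "=" define a key
            present.add(key.strip())
    return [key for key in required if key not in present]


example = '/LIBS/GUID = "bd94a196-a984-494f-909e-14e1ffb250e4"\n'
print(missing_config_keys(example))
```

For the example text, only the second key is reported as missing. Reading a real file would be a matter of passing its contents to the same function.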

dmalzl commented on August 11, 2024

That said, I don't know whether prefixing $id with ./ is necessary: it is something I had to do with v2.9.6.1, but v2.11 does not complain about it.

Midnighter commented on August 11, 2024

Since this seems specific to your case, i.e., the content of your NCBI configuration, I suggest you make use of the args with a local configuration.

process {
    withName: SRATOOLS_PREFETCH {
        ext.args = { "-o ./$id" }
    }
}

dmalzl commented on August 11, 2024

Ah, that's clever. Thanks for the suggestion.

dmalzl commented on August 11, 2024

I have identified yet another possible problem. Using -o ./$id forces the pipeline, at least in my case, to save the data into the cwd of the current process. However, the file is then named $id, which is subsequently passed on to fasterq-dump. But passing just a plain SRA ID to fasterq-dump results in fasterq-dump downloading the SRA file again instead of processing the already existing one. I tried suffixing the additional arg with .sra, but then the pipeline tells me it could not find the expected output, since the output it expects from prefetch is a file named with just the SRA ID.
I fixed this by changing the output to path("${id}.sra"), which seems to work fine until the process completes, at which point it throws an error, but I haven't had the time to look into this yet. In any case, I think the module should be changed to incorporate the -o argument and force the output to have the .sra suffix, in order to avoid futile downloads and data accumulation.

Midnighter commented on August 11, 2024

The default behavior is that prefetch creates a directory that contains the SRA file, so $id/$id.sra if you will. This directory is then supplied to fasterq-dump, which will look for such a directory. I hadn't used the -o flag before and didn't look at it specifically.

I suggest you use either -O . (--output-directory .) to force output into the current directory, or -o $id/$id.sra (--output-file $id/$id.sra). Then everything should work as expected.
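
The two suggested options and the layout they are meant to produce can be written down as a small Python sketch. The helper names `expected_sra_path` and `prefetch_output_args` and the accession `SRR000001` are hypothetical and only serve as an illustration; the flags and the `$id/$id.sra` layout are the ones described above.

```python
from pathlib import Path


def expected_sra_path(workdir, accession):
    """Path where the layout described above places the SRA file."""
    return Path(workdir) / accession / f"{accession}.sra"


def prefetch_output_args(accession, use_output_file=False):
    """Build the extra prefetch arguments for one of the two suggested options."""
    if use_output_file:
        # Name the output file explicitly, keeping the $id/$id.sra layout.
        return ["--output-file", f"{accession}/{accession}.sra"]
    # Force output into the current directory.
    return ["--output-directory", "."]


accession = "SRR000001"  # hypothetical accession, for illustration only
print(prefetch_output_args(accession))
print(prefetch_output_args(accession, use_output_file=True))
print(expected_sra_path("work", accession).as_posix())
```

Checking `expected_sra_path(...).is_file()` after prefetch has run would show whether the file ended up where the next step looks for it.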

dmalzl commented on August 11, 2024

Okay, again something I didn't know. I have now simply deleted my NCBI config file to let the pipeline take care of it. I hope that settles it. Thanks.

Midnighter commented on August 11, 2024

I'm working on some improvements for the pipeline based on your troubles, but it will still take a while until they are released.

dmalzl commented on August 11, 2024

No worries. Since I did not really depend on the settings in my config file, simply deleting it seems to have done the trick for now. But I guess at least some future users will profit from my experiences. Thanks for taking care of this.

drpatelh commented on August 11, 2024

Looks like this has been resolved. Will close for now but feel free to re-open if the problem persists.
