
sra-tools's Introduction

The NCBI SRA (Sequence Read Archive)

Contact:

email: [email protected]

Download

Visit our download page for pre-built binaries.

Change Log

Please check the CHANGES.md file for change history.


The SRA Toolkit

The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.


May 21, 2024: SRA Toolkit Release 3.1.1

Improved prefetch error and information messages for users.

Fixed errors and warnings when building on Windows.


March 5, 2024: SRA Toolkit Release 3.1.0

Using prefetch --eliminate-quals will now download SRA Lite data or report that a Lite version is not available.

Reduced frequency of global timeouts for cloud users.

vdb-validate will report an error if data (blob) checksums are missing.

Added support for AlmaLinux.

Fixed hanging on macOS and BSD.


December 19, 2023: SRA Toolkit Release 3.0.10

Fixed a bug in using JWT with some cloud storage.

Added build support for arm64 processors.


August 29, 2023: SRA Toolkit 3.0.7

Updated vdb-config to improve AWS credential interface and usage by SRA Toolkit.

Fixed a bug in AWS credentials with prefetch.

Fixed a bug resulting in 'reference not found' messages for reference sequences stored inside a run.


July 10, 2023: SRA Toolkit 3.0.6

Prefetch now supports the latest GCP access tokens.

Fixed a bug in vdb-config for Windows users.

To ensure output of technical reads, fasterq-dump will now automatically switch to --split-files mode if the --include-technical option is used.
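
A minimal sketch of the new behavior (the accession is just a placeholder public run; the switch to --split-files happens automatically):

fasterq-dump SRR000001 --include-technical
# technical reads are written to their own output file(s), as if --split-files had been passed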


May 9, 2023: SRA Toolkit 3.0.5

Added support for PacBio to fasterq-dump.

Added features to output reference sequences to fasterq-dump.

Fixed a bug in dbGaP data access when using ngc files.


January 3, 2023 : SRA Toolkit 3.0.3

Fixed a regression in sra-stat.


December 12, 2022 : SRA Toolkit 3.0.2

Fixed 'buffer insufficient while converting string within text module' failure for prefetch on Mac.


November 15, 2022 : SRA Toolkit 3.0.1

Removed interactive requirement to configure SRA Toolkit.

Changes to the repository structure:

To better serve disparate groups of users, the tools/ directory of the sra-tools repository is divided into several subdirectories:

  • external/ - the tools that comprise the end user facing sra-toolkit. These are the tools that are installed on a toolkit user's machine. This is the default make target
  • internal/ - the tools oriented towards the toolkit's developers and NCBI-internal users
  • loaders/ - the tools used in archive loading pipelines, such as the NCBI SRA
  • test-tools/ - the tools used in the NCBI-internal testing of the toolkit.

The default 'make' command will now only build the external tools. To build other categories of tools, use these targets/flags:

  • 'make all' - to build everything, including the test projects (located in sra-tools/test/)
  • 'make BUILD_TOOLS_INTERNAL=ON' - to build the external and the internal tools
  • 'make BUILD_TOOLS_LOADERS=ON' - to build the external tools and the loaders
  • 'make BUILD_TOOLS_TEST_TOOLS=ON' - to build the external tools and the test tools
  • 'make TOOLS_ONLY=ON' - to skip building the test projects

The build flags shown above can be combined on the same command line, for instance 'make BUILD_TOOLS_LOADERS=ON BUILD_TOOLS_INTERNAL=ON TOOLS_ONLY=ON' will build everything except the test tools and the test projects.


August 4, 2022 : Security Update

Due to updated security at NCBI, versions of the SRA Toolkit 2.9.6 and older will no longer be able to connect to the NCBI data location service. We advise impacted users to update to the latest version of the SRA Toolkit.


February 10, 2022 : SRA Toolkit 3.0.0

NCBI's SRA changed the source build system to use CMake in toolkit release 3.0.0. This change is an important step to improve developers' productivity, as it provides unified, cross-platform access to multiple build systems. This change affects developers building NCBI SRA tools from source. Old makefiles and build systems are no longer supported.

This change also affects the structure of our GitHub repositories, which have been consolidated to provide an easier environment for building the tools and libraries (the NGS libraries and dependencies are consolidated). Consolidation of the NGS libraries and dependencies provides better usage-scope isolation and makes building more straightforward.

Affected repositories

  1. ncbi/ngs

    This repository is frozen. All future development will take place in GitHub repository ncbi/sra-tools (this repository), under subdirectory ngs/.

  2. ncbi/ncbi-vdb

    This project's build system is based on CMake. The libraries providing access to SRA data in VDB format via the NGS API have moved to GitHub repository ncbi/sra-tools.

    Old (base URL: https://github.com/ncbi/ncbi-vdb)    New (base URL: https://github.com/ncbi/sra-tools)
    libs/ngs                                             ngs/ncbi/ngs
    libs/ngs-c++                                         ngs/ncbi/ngs-c++
    libs/ngs-jni                                         ngs/ncbi/ngs-jni
    libs/ngs-py                                          ngs/ncbi/ngs-py
    libs/vdb-sqlite                                      libs/vdb-sqlite
    test/ngs-java                                        test/ngs-java
    test/ngs-python                                      test/ngs-python
  3. ncbi/sra-tools (This repository)

    This project's build system is based on CMake. The project acquired some new components, as listed in the table above.


October 25, 2021. SRA Toolkit 2.11.3:

Fixed a bug in fasterq-dump: the --fasta and --fasta-unsorted parameters now work correctly.


October 7, 2021. SRA Toolkit 2.11.2:

SRA data are now available either with full base quality scores (SRA Normalized Format) or with simplified quality scores (SRA Lite), depending on user preference. Both formats can be streamed on demand to the same filetypes (fastq, sam, etc.), so they are both compatible with existing workflows and applications that expect quality scores. However, the SRA Lite format is much smaller, enabling a reduction in storage footprint and data transfer times and allowing dumps to complete more rapidly.

The SRA toolkit defaults to the SRA Normalized Format, which includes full, per-base quality scores, but users who do not require full base quality scores for their analysis can request the SRA Lite version to save time on their data transfers. To request SRA Lite data when using the SRA toolkit, set the "Prefer SRA Lite files with simplified base quality scores" option on the main page of the toolkit configuration; this will instruct the tools to preferentially use the SRA Lite format when available (please be sure to use toolkit version 2.11.2 or later to access this feature).

The quality scores generated from SRA Lite files will be the same for each base within a given read (quality = 30 or 3, depending on whether the Read Filter flag is set to 'pass' or 'reject'). Data in the SRA Normalized Format with full base quality scores will continue to have a .sra file extension, while SRA Lite files have a .sralite file extension. For more information please see our data format page.
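
As a hedged sketch of how to enable this (the interactive configuration tool vdb-config -i is referenced elsewhere on this page; the exact menu layout may vary between releases):

vdb-config -i
# On the MAIN page, enable "Prefer SRA Lite files with simplified base quality scores",
# then save and exit. Subsequent tool runs will prefer the .sralite variant of a run
# when one is available.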


August 17, 2021: SRA Toolkit 2.11.1.


March 15, 2021: SRA Toolkit 2.11.0.


December 16, 2020: SRA Toolkit 2.10.9.


June 29, 2020: SRA Toolkit 2.10.8.


May 20, 2020: SRA Toolkit 2.10.7.


May 18, 2020: SRA Toolkit 2.10.6.


April 1, 2020: SRA Toolkit 2.10.5.


February 26, 2020: SRA Toolkit 2.10.4.


February 18, 2020: SRA Toolkit 2.10.3.


Release 2.10.2 of sra-tools provides access to all public and controlled-access dbGaP data of SRA in the AWS and GCP environments (Linux only for this release). This vast archive's original submission format and SRA-formatted data can both be accessed and computed on in these clouds, eliminating the need to download from NCBI FTP as well as improving performance.

The prefetch tool also retrieves original submission files in addition to ETL data for public and controlled-access dbGaP data.


With release 2.10.0 of sra-tools we have added cloud-native operation for AWS and GCP environments (Linux only for this release), for use with the public SRA. prefetch is capable of retrieving original submission files in addition to ETL data.


With release 2.9.1 of sra-tools we have finally made available the tool fasterq-dump, a replacement for the much older fastq-dump tool. As its name implies, it runs faster, and is better suited for large-scale conversion of SRA objects into FASTQ files that are common on sites with enough disk space for temporary files. fasterq-dump is multi-threaded and performs bulk joins in a way that improves performance as compared to fastq-dump, which performs joins on a per-record basis (and is single-threaded).

fastq-dump is still supported as it handles more corner cases than fasterq-dump, but it is likely to be deprecated in the future.

You can get more information about fasterq-dump in our Wiki at https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump.
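
A typical invocation, sketched here with commonly used (but assumed) options; the accession, thread count, and output directory are placeholders:

prefetch SRR000001                      # download the run locally first
fasterq-dump SRR000001 -e 8 -O fastq/   # multi-threaded conversion to FASTQ
# a paired-end run is written as SRR000001_1.fastq and SRR000001_2.fastq by default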


For additional information on using, configuring, and building the toolkit, please visit our wiki or our web site at NCBI

SRA Toolkit Development Team

sra-tools's People

Contributors

a-nikitiuk, aboshkin, aiskhakov, alexukr, boshkins, durbrow, kdurbrow, klymenko, ksrodarmer, kwrodarmer, mdshw5, miserlou, mystcat, outpaddling, prodromus, shk656461, skripche, stineaj, vartanianmh, wraetz, yaschenk


sra-tools's Issues

support proxy authentication

Currently, only proxy host and port settings can be configured but no authentication.

At least BASIC authentication should be implemented.

node not found while accessing manager within configuration module - project 'XXXX': cannot find protected repository

Version 2.5.7 on linux.

I have a krt file that looks like this:

$ prefetch -l cart_DAR17624_201605221006.krt
version 1.0
4284||SRR617301||
4284||SRR617305||
$end

And when I try to do the download, I get this:

$ prefetch cart_DAR17624_201605221006.krt
Maximum file size download limit is 20,971,520KB
Downloading kart file 'cart_DAR17624_201605221006.krt'
Checking sizes of kart files...

2016-05-22T14:13:47 prefetch.2.5.7 err: node not found while accessing manager within configuration module - project '4284': cannot find protected repository
2016-05-22T14:13:47 prefetch.2.5.7 err: node not found while accessing manager within configuration module - project '4284': cannot find protected repository

These are protected data.

Different fastq output from sam-dump and fastq-dump

sam-dump and fastq-dump can both be used to output a FASTQ formatted file. In trying to figure out the best way to process .sra files, I have been using both tools. I noticed an inconsistency in the quality strings printed out by the 2 tools for the same .sra file.

Example .sra is publicly accessible run SRR3332402

The following 2 commands should produce nearly identical output, except for the sequence identifiers:

SRR=SRR3332402
sam-dump -p ${SRR} -u -g --fastq ${SRR} > ${SRR}.sam-dump.p.u.g.fastq.fq
fastq-dump --split-3 --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' -Z ${SRR} > ${SRR}.fastq-dump.split.defline.z.fq

The corresponding .sam file was created as well

sam-dump -p ${SRR} -u -g ${SRR} > ${SRR}.sam-dump.p.u.g.sam

Both output .fq files are 7805680 lines in length, representing 1951420 reads. fastq-dump reports that 975710 spots were read/written (975710 * 2 = 1951420). The .sam file has 1951420 alignments. So far so good.

However, the quality strings for reverse reads in the ${SRR}.sam-dump.p.u.g.fastq.fq file appear reversed when compared to the quality strings in the ${SRR}.fastq-dump.split.defline.z.fq file. The base sequences are not reversed.

head -8 ${SRR}.sam-dump.p.u.g.fastq.fq ${SRR}.fastq-dump.split.defline.z.fq
@SRR3332402.1.134261/1 primary ref=1 pos=107512 mapq=0
AGGGAAATTTGGACATAGAGAGAGGCACACAGGGAGGATGCCATATGAGAATTGACACTGTGCTGTCACAAGCCAAGGAACTACTGGAAGGAGAGAAAGA
+
CCCFFFFFHHHHHJJJJIJICFHHIJJIJJIIJIIIJJIHIJJIIIIEI@GGHIGEIIIE=DHIEGFFCHEB=?##########################
@SRR3332402.1.134261/2 primary ref=1 pos=107567 mapq=0
AGCTGGCAGGGCTAGGCTGCCTGAAAAGGTGCTAAGGAAGGAACTGTTCCAGTCCTCTTTCTCTCCTTCCAGTAGTACCTTGGCTTGAGACAGCACAGTG
+
#######################################@;=@>@=>FABHGADCCHEDHFIGGGBJIIHGIIJJIIJGIHJJFJJJHGHHHFFFFD?:+

==> SRR3332402.fastq-dump.split.defline.z.fq <==
@SRR3332402.1.134261/1
AGGGAAATTTGGACATAGAGAGAGGCACACAGGGAGGATGCCATATGAGAATTGACACTGTGCTGTCACAAGCCAAGGAACTACTGGAAGGAGAGAAAGA
+
CCCFFFFFHHHHHJJJJIJICFHHIJJIJJIIJIIIJJIHIJJIIIIEI@GGHIGEIIIE=DHIEGFFCHEB=?##########################
@SRR3332402.1.134261/2
AGCTGGCAGGGCTAGGCTGCCTGAAAAGGTGCTAAGGAAGGAACTGTTCCAGTCCTCTTTCTCTCCTTCCAGTAGTACCTTGGCTTGAGACAGCACAGTG
+
+:?DFFFFHHHGHJJJFJJHIGJIIJJIIGHIIJBGGGIFHDEHCCDAGHBAF>=@>@=;@#######################################

The relevant lines are the last ones for each file -- the qualities for read SRR3332402.1.134261/2. All other lines are the same, except for the naming difference.

The output from fastq-dump is more believable -- the quality should fall off at the end of the read. Or these might be masked Illumina adapter sequences, since the insert size for this read pair is small (155) relative to the read length (100).

Comparing this with the same read pair in the .sam file:

samtools view ${SRR}.sam-dump.p.u.g.sam | head -2 | cut -f10,11 | tr "\t" "\n"
AGGGAAATTTGGACATAGAGAGAGGCACACAGGGAGGATGCCATATGAGAATTGACACTGTGCTGTCACAAGCCAAGGAACTACTGGAAGGAGAGAAAGA
CCCFFFFFHHHHHJJJJIJICFHHIJJIJJIIJIIIJJIHIJJIIIIEI@GGHIGEIIIE=DHIEGFFCHEB=?##########################
CACTGTGCTGTCTCAAGCCAAGGTACTACTGGAAGGAGAGAAAGAGGACTGGAACAGTTCCTTCCTTAGCACCTTTTCAGGCAGCCTAGCCCTGCCAGCT
#######################################@;=@>@=>FABHGADCCHEDHFIGGGBJIIHGIIJJIIJGIHJJFJJJHGHHHFFFFD?:+

Here the sequence of the reverse read is reverse-complemented relative to the .fq files above, as it should be. The quality string is also reversed, so the base-call and quality pairing stays intact.

It seems like the --fastq option in sam-dump reverses the base sequence of the reverse reads, but fails to also reverse the quality sequence, resulting in an incorrect output.

I don't know how many people use sam-dump to create .fq output, but this would be a serious bug for those that do. Am I missing something in how these commands should be run?
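
One quick way to test the reversal hypothesis (a sketch, assuming the two .fq files above exist under the names used in the commands):

sed -n '8p' ${SRR}.sam-dump.p.u.g.fastq.fq
sed -n '8p' ${SRR}.fastq-dump.split.defline.z.fq | rev
# line 8 is the quality string of read /2 in both files; if the two printed lines match,
# sam-dump --fastq is emitting the reverse-read qualities without reversing them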

kff lib?

I'm having an issue building sra-tools on Ubuntu 14.04. When building /tools/copycat, I get an ld -lkff not found error. I'm not quite sure where the kff lib is generated. Could you point me to the source for that library? I don't see anything in the ngs or ncbi-vdb repos. Thanks

Failing the test download

I downloaded the sra-toolkit for 64-bit Windows from the SRA website.
After extracting, I tested it from the bin folder:
sra-toolkit\bin> fastq-dump --stdout -X 2 SRR390728
but it throws this error:
2015-12-15T09:54:40 fastq-dump.2.5.5 err: item not found while constructing within virtual database module - the path 'SRR390728' cannot be opened as database or table
I also ran:
vdb-config --restore-defaults
with no effect.
What may be the cause, and how can I get it working?

prefetch issues

Hi, I am using Ubuntu 14.04. I downloaded the newest release of sra-toolkit from the NCBI website. I unzipped it and moved all the files from sratoolkit.2.6.3-ubuntu64/bin to /usr/bin. When I use prefetch to fetch an SRR file I get the following error.
2016-07-12T16:58:54 prefetch.2.6.3: Using '/home/hart/.aspera/connect/bin/ascp'
2016-07-12T16:58:54 prefetch.2.6.3 err: path not found while resolving tree within virtual file system module - 'SRR925811' cannot be found.
I have tried this with multiple SRR files, some of which I have downloaded before using wget from the NCBI FTP site. Could you tell me what is going wrong with prefetch? Thanks!

issue with sam-dump

Hello,
I downloaded the binary for sra-tools for Ubuntu 64-bit. I tried to test sam-dump, but it's giving me an error:

sam-dump -h
/home/software/sratoolkit.2.6.3-ubuntu64/bin/sam-dump: /lib64/libc.so.6: version `GLIBC_2.7' not found (required by /home/software/sratoolkit.2.6.3-ubuntu64/bin/sam-dump)

Strange behavior prefetch/dbGAP

After adding a dbGaP project using a prj_XXX.ngc file (downloaded from the dbGaP web page) via vdb-config -i, I was not able to run the prefetch command due to an access-denied error. Upon manual inspection, ~/.ncbi/dbGaP_XXX.enc_key appears to contain a dbGaP password that was previously used for my account. After changing it to the new password, everything works fine. It seems quite strange to me that the file generated by vdb-config contains the old password that was since changed.

impossible to build against existing ncbi-vdb

setup/package.prl declares a dependency on at least one file in the ilib directory of ncbi-vdb. When ncbi-vdb is built and installed, the ilib directory is not installed, so building sra-tools against an existing installation of ncbi-vdb is not possible.

(I'm attempting to package sra-tools and its dependencies for GNU Guix.)

Support Compiling on FreeBSD

SRA-tools compiles on macOS and Linux, but does not compile on FreeBSD. Due to the complex nature of the build system, it is difficult to port without knowing the intricacies of the software.

How to run GATK variant calling directly on SRA file ?

Hello, I tried to run GATK variant calling directly on an SRA file.

I downloaded GenomeAnalysisTK-3.5 jar file to my computer. My current directory is the folder where my SRA file and key file are located. I tried both these commands:

java -jar /path/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R SRRFileName -I SRRFileName -stand_call_conf 30 -stand_emit_conf 10 -o SRRFileName.vcf 

java -jar /path/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T UnifiedGenotyper -R SRRFileName -I SRRFileName -stand_call_conf 30 -stand_emit_conf 10 -o SRRFileName.vcf 

For both these commands, I got this error:
ERROR MESSAGE: Invalid command line: The GATK reads argument (-I, --input_file) supports only BAM/CRAM files with the .bam/.cram extension and lists of BAM/CRAM files with the .list extension, but the file SRR1718738 has neither extension. Please ensure that your BAM/CRAM file or list of BAM/CRAM files is in the correct format, update the extension, and try again.

Should I be using some wrapper or modified version of GATK to get it to work on the SRA file? I did see the SRA/dbGaP webinar, and this is not clear to me.

Please advise.
K

item not found while constructing within virtual database module

It seems that the prefetch routine is not working properly, as none of the toolkit tools are properly fetching runs by accession:

$ fastq-dump -X 5 -Z SRR390728
2015-02-03T17:07:21 fastq-dump.2.4.3 err: item not found while constructing within virtual database module - the path 'SRR390728' cannot be opened as database or table

This seems to apply to all of the examples in the documentation. I've reproduced this on my Mac (10.10.2) and several Linux machines in different parts of the country, using toolkit version 2.4.3.

In fact there seems to be a biostars thread about this.

Discrepancy between SRA Toolkit website and Wiki?

In terms of downloading SRA files...

NCBI's handbook documents:

For multiple simultaneous downloads of SRA data, or for high-volume downloads, we recommend using command line utilities such as wget, FTP, or Aspera’s ‘ascp’ utility.

It also gives an example of downloading SRA data with the SRA Toolkit using fastq-dump, not prefetch.

Wiki:

At the risk of starting this page off on a negative note, please do not download data using generic tools such as ftp, wget, etc. Doing so can create incomplete images and complicate problem diagnosis.

The supported means of downloading SRA data is to use the tool prefetch included in the SRA Toolkit.

So which is it, command-line utilities or the SRA Toolkit?

What's the difference in downloading using fastq-dump vs prefetch?

I thought fastq-dump was to be used once the SRA file was downloaded.

sam-dump does not add .SPOT_GROUP to QNAME for unaligned reads

When running sam-dump on an .sra file, using the options -u (--unaligned) or -g (--spot-group) in isolation is not problematic. The former creates a .sam with all aligned and unaligned reads, and the latter appends the spot-group (ReadGroup) to the QNAME for each read.

However, when both options are specified, the unaligned reads are printed without the SPOT_GROUP appended. This creates the potential for a QNAME conflict when combining .sam files from multiple SRR runs of the same sample. I do not believe this is the expected behavior from combining these options.

If the .sra file maintained the original QNAME of the read, this would not be a problem as these are generally universally unique. If there is a way to print this with either sam-dump or fastq-dump, I'm not aware of it.

Interestingly, adding the --fastq option to sam-dump fixes this issue -- the SPOT_GROUP is appended. However, this is an inconvenient format for additional downstream steps (such as splitting by SPOT_GROUP).

ngs-sdk package not found during configure

Hello!

I am trying to install the latest sra-tools, but the configure command fails, saying it can't find the ngs-sdk package.

I have built ngs from source using the configure command:
./configure --prefix=/mnt/data1/bin/ngs --build-prefix=/mnt/data1/bin/ncbi-outdir

I then followed the rest of the instructions accordingly, and ngs-sdk seemed to install fine. (Except that it told me to 'Use $NGS_LIBDIR in your link commands', and I'm not sure what that means.)

I then tried to run the configure command for sra-tools as such:
./configure --prefix=/mnt/data1/bin/sra-tools --build-prefix=/mnt/data1/bin/ncbi-outdir

But then I get: configure: error: required ngs-sdk package not found

Sorry, I'm very new to this. Appreciate the help!

fastq-dump cache

I noticed that if I specify an accession number with fastq-dump, it will download the intermediate SRA files into ~/ncbi/public/sra. Is there a way to change that directory? Also, is there a way to automatically delete those files upon successful FASTQ generation?
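
For reference, the per-user cache root is governed by the configuration file shown in a later issue on this page; a sketch of redirecting it by editing ~/.ncbi/user-settings.mkfg (the path is only an example):

/repository/user/main/public/root = "/scratch/sra-cache"
# a cache-disabled key also exists, though another issue below reports it may not be respected:
# /repository/user/cache-disabled = "true"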

Some tests inside test suite are failing

Hi,
I've updated the Debian packaging of sra-toolkit to version 2.7.0. The Debian build runs the test suite, but unfortunately I needed to ignore some tests since they are failing. In this patch you can find all the tests I needed to exclude to pass the test suite. I would feel much better if this could be sorted out properly rather than just turning a blind eye by commenting out the failing bits.
Can you verify these failures?
Just let me know if you need more detailed information.
Kind regards, Andreas.

clarify configure options

The configure script takes the argument --with-ncbi-vdb-sources, but I'm not sure what it should be set to. I've tried pointing it to the ncbi-vdb root directory, every sub-directory of it, the install directory of ncbi-vdb, but none work. The configuration always ends with an error like:

checking for ncbi-vdb package source files and build results...
    includes... /home/aa1/tmp/ncbi-vdb-2.5.0
    libraries... /home/aa1/tmp/ncbi-vdb-2.5.0/_install/lib64
no
configure: error: required ncbi-vdb package not found.

I think I'm using --with-ngs-sdk-prefix and --with-ncbi-vdb-build correctly, but more clarification on those would be helpful too. Thank you.

Configure step does not work, can't find NCBI VDB

Not really sure what the right combination of options is, here's the closest I can get:

sudo ./configure --prefix=/software/apps/sratoolkit/gcc/64/2.5.8 --with-ngs-sdk-prefix=/software/apps/ngs-sdk/gcc/64/1.2.3 --with-ncbi-vdb-sources=/software/builds/ncbi-vdb/gcc/64/2.5.8/ncbi-vdb-2.5.8/ --with-ncbi-vdb-build=/software/apps/ncbi-vdb/gcc/64/2.5.8/
Configuring SRA-TOOLS package
checking system type... Linux
checking machine architecture... x86_64
checking SRA-TOOLS version... 2.5.8
checking for supported architecture... x86_64 (64 bits) is supported
checking for supported OS... Linux (linux) is supported
checking for supported tool chain... gcc tool chain is supported
checking for g++... yes
checking whether gcc accepts -Wno-array-bounds... yes
checking for fuse library... no
checking for hdf5 library... no
checking for magic library... no
checking for xml2 library... yes
checking for ngs-sdk package...
includes... /software/apps/ngs-sdk/gcc/64/1.2.3
libraries... /software/apps/ngs-sdk/gcc/64/1.2.3/lib64
includes: /software/apps/ngs-sdk/gcc/64/1.2.3/include
libraries: /software/apps/ngs-sdk/gcc/64/1.2.3/lib64
checking for ncbi-vdb package source files and build results...
includes... /software/builds/ncbi-vdb/gcc/64/2.5.8/ncbi-vdb-2.5.8
libraries... /software/apps/ncbi-vdb/gcc/64/2.5.8/lib64
no
configure: error: required ncbi-vdb package not found.

NCBI VDB is installed like this:

2518 > ls /software/apps/ncbi-vdb/gcc/64/2.5.8/lib64/
total 18340
0 libncbi-ngs-c++.a@ 0 libncbi-vdb.a.2@ 0 libncbi-vdb-static.a@ 0 libncbi-wvdb.so.2@
0 libncbi-ngs-c++.a.2@ 5752 libncbi-vdb.a.2.5.8 0 libncbi-wvdb.a@ 3512 libncbi-wvdb.so.2.5.8*
8 libncbi-ngs-c++.a.2.5.8 0 libncbi-vdb.so@ 0 libncbi-wvdb.a.2@ 0 libncbi-wvdb-static.a@
0 libncbi-ngs-c++-static.a@ 0 libncbi-vdb.so.2@ 5216 libncbi-wvdb.a.2.5.8
0 libncbi-vdb.a@ 3852 libncbi-vdb.so.2.5.8* 0 libncbi-wvdb.so@

2519 > ls /software/apps/ncbi-vdb/gcc/64/2.5.8/include/ncbi-vdb/
total 4
4 NGS.hpp*

No problem finding NGS SDK, as you can see.

fastq-dump fails on local SRA file

I downloaded an SRA file manually via wget:

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/ERR/ERR161/ERR161917/ERR161917.sra

Then I perform fastq-dump on the local file:

fastq-dump ./ERR161917.sra

And it fails with error message:

2016-10-19T14:52:50 fastq-dump.2.8.0 err: binary large object corrupt while reading binary large object within virtual database module - failed /tmp/740660.1.standard.q/485/data/ERR161917.sra

An error occurred during processing.
A report was generated into the file '/home/map2085/ncbi_error_report.xml'.
If the problem persists, you may consider sending the file

to '[email protected]' for assistance.

This happens for dozens of Run accessions. However, for hundreds of other run accessions, the exact same above commands complete successfully.

Disk space is not an issue (4 TB free, while these .sra files are < 5 GB each). Memory is also very large ( > 8 GB)

Do not respond with the following:

  • "just use fastq-dump <accession> "
  • "just use prefetch <accession> "

fastq-dump is supposed to work on local SRA files, as it says in the fastq-dump help page. Do not tell me to do it another way. It should work as it claims to.

vdb-dump for complete genomics reads is extremely slow

I've been trying for two weeks now to dump a single SRA file from Complete Genomics WGS data: SRR833540. The .sra file is 180 GB, and it contains about 3 billion reads. I'm using a Linux machine with 16 GB of RAM and 7 cores, and the vdb-dump program.

There are a couple of main issues here, plus a question:

  1. The process is extremely slow: if run in one thread, it will take 75 weeks to dump this single run.
  2. Dumping the file simultaneously from different regions is also an issue, since vdb-dump is memory-intensive and consumes more than 2-4 GB of RAM per dump. Also, memory consumption is not constant over time.
  3. And lastly, a question: vdb-dump performs slowly at the beginning of the file but is much faster on the second part of the file. What can be the reason?

Is there any solution to this issue?
Thanks!
Lilit
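
One workaround worth trying (a sketch only, not verified on a run of this size, and it swaps in fastq-dump rather than vdb-dump): split the dump into spot-ID ranges with the -N/-X options already used elsewhere on this page, and run the ranges in parallel:

fastq-dump -N 1          -X 1500000000 --split-files -O part1/ SRR833540 &
fastq-dump -N 1500000001 -X 3000000000 --split-files -O part2/ SRR833540 &
wait
# each process keeps its own memory footprint; concatenate the per-part files afterwards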

what does "clip" mean?

fastq-dump has the following option:

-W|--clip Apply left and right clips

This is not elaborated anywhere on the sratoolkit page on GitHub.

Does clip mean "clip low quality bases" or does it mean "clip adapter sequence" ?

Trying to decrypt data using Windows

Hello,

I am using Windows 7 and have downloaded SRA Toolkit version 2.5.4-win64. I am trying to decrypt recently downloaded data. I have also saved the decryption key to my desktop. According to the instructions on the NCBI web page I need to access vdb-config, which I do, but when I double-click, nothing appears. How can I go about decrypting using the Windows version of the toolkit?

Thank you

100bp size limitation to blastn_vdb

Using blastn_vdb to search for short sequences, I only get reads of >100 bp as hits, whereas if I blast the sequences locally against the same downloaded SRA read file, I get hits to the small fragments (which I want). Is blastn_vdb filtering the reads going into its db? Is there a way to remove this size filtering?

recipe for target 'adapter' failed

My script for installing sra-tools from the GitHub repository used to work (it worked last month on the same computer). Now it fails as follows (<username> is for anonymization):

make[1]: *** No rule to make target '/home/<username>/src/ngs/ngs-sdk/adapter/unix/x86_64/atomic32.h', needed by '/home/<username>/ncbi-outdir/ngs-sdk/linux/gcc/x86_64/rel/obj/adapter/Refcount.pic.o'.  Stop.
make[1]: Leaving directory '/home/<username>/src/ngs/ngs-sdk/adapter'
/home/<username>/src/ngs/ngs-sdk/Makefile.rules:31: recipe for target 'adapter' failed
make: *** [adapter] Error 2
get_sra-tools.sh: ngs-sdk build failed

(I was trying to update after noticing strange error messages from fastq-dump (reporting itself as version 2.4.3), which did not seem to prevent the fastq data from being dumped from a pre-downloaded .sra archive.)

ARM Support

I'm doing some testing on a large ARM box. Am I correct that SRA Toolkit doesn't currently support ARM? If so, how involved do you believe adding ARM support would be? We could possibly do it and issue a pull request if it doesn't appear to be too involved.

dbGaP access-controlled fastq-dump not working

Using version 2.5.4 binaries on macOS, 64-bit

$ for f in prefetch fastq-dump; do $f --version; done  | grep 2
prefetch : 2.5.4
fastq-dump : 2.5.4

I can download public SRA fastq files:

$ fastq-dump -X 5 -Z SRR390728 2> /dev/null | head -1
@SRR390728.1 1 length=72

But somehow access controlled dbGaP data is not working:

$ [~/ncbi/dbGaP-8463] fastq-dump -X 5 -Z SRR988442
2015-09-30T21:12:14 fastq-dump.2.5.4 err: item not found while constructing within virtual database module - the path 'SRR988442' cannot be opened as database or table

The prefetching is working:

$ [~/ncbi/dbGaP-8463] prefetch -v SRR988442
2015-09-30T20:26:41 prefetch.2.5.4: 1) Downloading 'SRR988442'...
2015-09-30T20:26:41 prefetch.2.5.4:  Downloading via http...
[..]
2015-09-30T20:27:39 prefetch.2.5.4: 1) 'SRR988442' was downloaded successfully

But I can't fastq-dump the resulting file:

$ [~/ncbi/dbGaP-8463] fastq-dump -Z -X 5 ./sra/SRR988442.sra
2015-09-30T21:30:05 fastq-dump.2.5.4 err: item not found while constructing within virtual database module - the path './sra/SRR988442.sra' cannot be opened as database or table

Any clue?

Prefetch "Redirected!!!" error (acsp)

Hi,
prefetch is failing and I can't understand why.
I'm using prefetch with the following command line:
prefetch -t ascp -a "~/.aspera/connect/bin/ascp|~/.aspera/connect/etc/asperaweb_id_dsa.openssh" SRR765378

I'm getting this error:

Redirected!!!

2016-11-07T09:14:41 prefetch.2.3.5 err: name incorrect while evaluating path within network system module - Scheme is 'https'
2016-11-07T09:14:41 prefetch.2.3.5 err: path not found while resolving tree within virtual file system module - 'SRR765378' cannot be found.

Rather mysteriously, the same line worked for the same accession one day before. Could you help me with this issue?

Error when running fastq-dump with --split-3

Trying to use fastq-dump to separate paired-end read files from SRA files.

fastq-dump --split-3 xxx.sra

It worked for a few files, but for some files it throws this error:

locate_module.c(591):ERROR:106: Magic cookie '#%Module' missing in '/fastq-dump' ModuleCmd_Load.c(208):ERROR:105: Unable to locate a modulefile for '/sratoolkit/bin'

There is no problem if I don't use the --split-3 parameter.

Thanks

compile of pacbio-load fails due to missing libkdf5

I'm stumped on this one. I've compiled and installed ncbi-vdb and ngs from here, and make under sra-tools goes pretty far, but fails here:

make[3]: Entering directory `/root/ncbi-outdir/sra-tools/linux/gcc/x86_64/rel/obj/tools/pacbio-load'
gcc -static-libstdc++ -static-libgcc -o /root/ncbi-outdir/sra-tools/linux/gcc/x86_64/rel/bin/pacbio-load.2.4.3 -DNDEBUG -m64 pl-context.o pl-tools.o pl-zmw.o pl-basecalls_cmn.o pl-sequence.o pl-consensus.o pl-passes.o pl-metrics.o pl-regions.o pl-progress.o pacbio-load.o -L/root/ncbi-outdir/sra-tools/linux/gcc/x86_64/rel/lib -L/root/ncbi-outdir/sra-tools/linux/gcc/x86_64/rel/ilib -L/usr/local/ngs/ngs-sdk/lib64 -L/usr/local/ncbi/ncbi-vdb/lib64 -L/root/ncbi-outdir/ncbi-vdb/linux/gcc/x86_64/rel/ilib -Wl,-Bstatic -lkapp -lkdf5 -lload -lhdf5 -lncbi-wvdb -Wl,-Bdynamic -ldl -lpthread -lxml2 -Wl,-Bdynamic -lm
/usr/bin/ld: cannot find -lkdf5
collect2: error: ld returned 1 exit status
make[3]: *** [/root/ncbi-outdir/sra-tools/linux/gcc/x86_64/rel/bin/pacbio-load] Error 1
make[3]: Leaving directory `/root/ncbi-outdir/sra-tools/linux/gcc/x86_64/rel/obj/tools/pacbio-load'
make[2]: *** [std] Error 2
make[2]: Leaving directory `/usr/local/src/sra-tools/tools/pacbio-load'
make[1]: *** [pacbio-load_std] Error 2
make[1]: Leaving directory `/usr/local/src/sra-tools/tools'
make: *** [tools_std] Error 2

I've sourced /etc/profile.d/ncbi-vdb.sh, ngs-java.sh, ngs-sdk.sh to no avail. Solution?

New instance of SRA

Hi all,
We are currently working with the SRA tools. Our client needs a separate instance to view the progress of the project, more or less like a sandbox. Could you please suggest a way to create a new instance?

Please do the needful. Kindly let me know if you need any further details.

Regards,
Sudheer

Different read/write cache locations: docker containers

I am experimenting with running prefetch within Docker containers that are executed across multiple HPC environments.

Due to permissions issues across the shared file system and the Docker user setup, ideally I would like to be able to check from within the container whether an SRR and its requirements are present in the cache location (which lives outside the container and is mounted) and, if not, write them to a cache location inside the container.

I currently have the first part set up, but cannot work out how to configure a different write location using vdb-config. Any insight would be much appreciated.
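
A rough sketch of the setup in question (image name and paths are placeholders; the config key is the one shown in the caching issue further down this page), with the open question being how to make the toolkit read from the shared location but write to the local one:

docker run -v /shared/sra-cache:/shared-cache:ro -v /tmp/sra-cache:/local-cache my-sra-image
# inside the container, a writable cache could be pointed at the local mount, e.g. in ~/.ncbi/user-settings.mkfg:
# /repository/user/main/public/root = "/local-cache"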

sam-dump and fastq-dump are unbearably slow

Even though I'm working with pre-fetched files on our Lustre filesystem with plenty of I/O capacity, the *-dump commands take forever to process a file. After 12 hours, fastq-dump is not even a quarter of the way through one .sra file. sam-dump is even slower: 10 hours and it has done 2 GB out of approx. 70 GB.

I understand that compression is important for SRA, given the large data sets, but with these run-times, the data become borderline unusable. Isn't there any way to extract the data faster?

Homebrew recipe for sra-tools

I opened a PR to make sra-tools available as a homebrew-science recipe: https://github.com/Homebrew/homebrew-science/pull/1691

It builds fine on Linux. Unfortunately, it looks like it does not get through on macOS. For example, see: http://bot.brew.sh/job/Homebrew%20Science%20Pull%20Requests/759/version=mavericks/testReport/junit/brew-test-bot/mavericks/install_sratoolkit/

I do not have access to a Mac to look into this further. Would someone from the core team be interested in looking into this?

openssh key for ascp

When I use sra-tools on Linux, it looks like it is not updated with the ascp settings; i.e., when I prefetch something, it uses asperaweb_id_dsa.putty as the private key. But I remember the new Aspera Connect program has changed to use asperaweb_id_dsa.openssh on Linux. So unless I am misunderstanding something, this should be an issue.

:)

fastq-dump.2.4.1 err: binary large object corrupt

Hello,

When I try to download/convert an SRA to fastq (http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR847533) using:

fastq-dump --split-files SRR847533

I get:

2014-12-14T11:18:36 fastq-dump.2.4.1 err: binary large object corrupt while reading binary large object within virtual database module - failed SRR847533

An error occurred during processing.
A report was generated into the file '/home/sbarberakis/ncbi_error_report.xml'.
If the problem persists, you may consider sending the file

to sra @ ncbi.nlm.nih.gov for assistance.

Xml file is here: http://cineserver.3p.tuc.gr/d/ncbi_error_report.xml

dbGaP Access Denied with (supposedly) valid permissions

The dbGaP key is fresh from today. I swear this same process worked a week ago, so maybe it's a problem on the dbGaP web site?

>>rm -rf .ncbi/ ncbi/
>>vdb-config --import ~/Downloads/prj_3904.ngc
>>prefetch SRR363541

Maximum file size download limit is 20,971,520KB

2015-03-18T17:45:59 prefetch.2.4.3 err: path not found while resolving tree within virtual file system module - 'SRR363541' cannot be found.
>>vdb-config --restore-defaults
ok
Fixed default configuration
>>prefetch SRR363541
Maximum file size download limit is 20,971,520KB

2015-03-18T17:46:18 prefetch.2.4.3 err: query unauthorized while resolving tree within virtual file system module - failed to resolve accession 'SRR363541' - Access denied - please request permission to access phs000281/HMB-IRB in dbGaP ( 403 )
2015-03-18T17:46:19 prefetch.2.4.3 err: path not found while resolving tree within virtual file system module - 'SRR363541' cannot be found.
>>vdb-config --import ~/Downloads/prj_3904.ngc
2015-03-18T17:46:48 vdb-config.2.4.3 int: constraint violated while creating manager within configuration module - cannot import ngc file: /gscuser/rng/Downloads/prj_3904.ngc
>>prefetch SRR363541
Maximum file size download limit is 20,971,520KB

2015-03-18T17:47:02 prefetch.2.4.3 err: query unauthorized while resolving tree within virtual file system module - failed to resolve accession 'SRR363541' - Access denied - please request permission to access phs000281/HMB-IRB in dbGaP ( 403 )
2015-03-18T17:47:03 prefetch.2.4.3 err: path not found while resolving tree within virtual file system module - 'SRR363541' cannot be found.

Missing blastn_vdb binary after successful build

Hi,

I am trying to build this package because I want to use blastn_vdb. I first successfully installed ngs-sdk and ncbi-vdb.

I am then able to configure and build the master branch without getting any error; however, the blastn_vdb binary mentioned in the documentation is nowhere to be found! See below for the list of binaries I get in build-prefix/mac/clang/x86_64/rel/bin: everything seems to be there except blastn_vdb (and tblastn_vdb).

Moreover, I do not find any reference to blastn_vdb in any of the Makefiles. Actually, the only references I find in the source code are in test/tarballs/test-tarballs.sh and in the README.

Am I missing something obvious? Any help would be much appreciated!

List of built binaries:

abi-dump
abi-load
align-cache
align-info
bam-load
cache-mgr
ccextract
cg-load
check-corrupt
copycat
fastdump
fastq-dump
fastq-load
general-loader
helicos-load
illumina-dump
illumina-load
kar
kdbmeta
kget
latf-load
md5cp
pacbio-load
pacbio-loadxml
pileup-stats
prefetch
rcexplain
read-filter-redact
sam-dump
sff-dump
sff-load
sra-pileup
sra-sort
sra-stat
srapath
srf-load
test-sra
vdb-config
vdb-copy
vdb-decrypt
vdb-diff
vdb-dump
vdb-encrypt
vdb-lock
vdb-passwd
vdb-unlock
vdb-validate

fastq-dump does not respect caching configuration

I'm working on a cluster with a very small $HOME partition, so I recently learned the hard way about the caching behavior of fastq-dump. I'm sure this behavior is very useful in some cases, but I think the vast majority of users would be better served if this were an opt-in feature rather than the default. But I digress...

While I was searching the documentation to figure out how to change the default cache directory, I noticed that caching can be disabled. At least, that's the claim. However, even after disabling caching with vdb-config, fastq-dump still creates large cache files.

I'm using version 2.6.2. This is the ~/.ncbi/user-settings.mkfg generated by vdb-config.

## auto-generated configuration file - DO NOT EDIT ##

/config/default = "false"
/krypto/pwfile = "$(NCBI_HOME)/vdb-passwd"
/repository/user/cache-disabled = "true"
/repository/user/default-path = "/home/standage/ncbi"
/repository/user/main/public/root = "/scratch/standage/sra-cache"

Unable to convert fastq to sra

I am trying to convert FASTQ to SRA using these XML files and this command:

$ fastq-load  -r run.xml -e experiment.xml -o out

But it is throwing this error

2015-12-24T06:03:46 fastq-load.2.5.5 err: tag not found while constructing formatter - EXPERIMENT/.../READ_SPEC

I tried adding a random READ_SPEC tag, but it still throws the same error.

item not found on CentOS6 with configured ./fastq-dump

I downloaded CentOS (64 bit) binaries and ran the configuration file to set the output directory.

I ran

  ./bin/fastq-dump -X 5 -Z SRR390728
  2015-12-12T22:11:26 fastq-dump.2.5.5 err: item not found while constructing within virtual database module - the path 'SRR390728' cannot be opened as database or table

I was not sure whether this indicates a connection error. I tried setting the proxy to the one used on that machine (echo $ftp_proxy is the same as echo $http_proxy), with no effect (the same error message).

Is it a connectivity issue?

make error

I'm trying to install the sra toolkit on a clean Debian (Jessie) system and I'm getting an error when I run make:

make[1]: Entering directory '/usr/local/ncbi/sra-tools-2.8.0/shared'
make[1]: *** No rule to make target '/libs/kfg/certs.kfg', needed by '/root/ncbi-outdir/sra-tools/linux/gcc/x86_64/rel/bin/ncbi/certs.kfg'.  Stop.
make: *** [shared] Error 2
make[1]: Leaving directory '/usr/local/ncbi/sra-tools-2.8.0/shared'
Makefile:48: recipe for target 'shared' failed

Below is other information on how I am attempting to run configure and make. I would note that building sra-tools using the steps listed below completes successfully on 2.7.0. I only seem to get this error on 2.8.0.

The following are the steps I took to build and install sra-tools, vdb and the ngs-sdk:

NGS_VERSION=1.3.0
VDB_VERSION=2.8.0
SRA_VERSION=2.8.0

mkdir /usr/local/ncbi
cd /usr/local/ncbi

wget -O ngs.tar.gz https://github.com/ncbi/ngs/archive/${NGS_VERSION}.tar.gz
tar -xvzf ngs.tar.gz
cd ngs-${NGS_VERSION}
./configure --prefix=/usr/local
make
make install
cd ..
rm ngs.tar.gz

wget -O vdb.tar.gz https://github.com/ncbi/ncbi-vdb/archive/${VDB_VERSION}.tar.gz
tar -xvzf vdb.tar.gz
cd ncbi-vdb-${VDB_VERSION}
./configure --prefix=/usr/local
make
make install
cd ..
rm vdb.tar.gz

wget -O sra-tools.tar.gz https://github.com/ncbi/sra-tools/archive/${SRA_VERSION}.tar.gz
tar -xvzf sra-tools.tar.gz
cd sra-tools-${SRA_VERSION}
./configure --prefix=/usr/local --with-ngs-sdk-prefix=/usr/local/ncbi/ngs-${NGS_VERSION}/ngs-sdk --with-ncbi-vdb-build=/root/ncbi-outdir --with-ncbi-vdb-sources=/usr/local/ncbi/ncbi-vdb-${VDB_VERSION}
make
make install
cd ..
rm sra-tools.tar.gz

The following is the output from my ./configure step of sra-tools:

Configuring SRA-TOOLS package
checking system type... Linux
checking OS distributor... 
checking machine architecture... x86_64
checking for supported architecture... x86_64 (64 bits) is supported
checking for supported OS... Linux (linux) is supported
checking for supported tool chain... gcc tool chain is supported
checking for g++... yes
checking whether gcc accepts -Wno-array-bounds... yes
checking whether g++ accepts -static-libstdc++... yes
checking for fuse library... no
checking for hdf5 library... no
checking for magic library... yes
checking for xml2 library... yes
checking for ngs-sdk package...
    includes... /usr/local/ncbi/ngs-1.3.0/ngs-sdk
    libraries... no
    libraries... /root/ncbi-outdir/ngs-sdk/linux/gcc/x86_64/rel/lib
includes: /usr/local/ncbi/ngs-1.3.0/ngs-sdk
libraries: /root/ncbi-outdir/ngs-sdk/linux/gcc/x86_64/rel/lib
checking for ncbi-vdb package source files and build results...
    includes... /usr/local/ncbi/ncbi-vdb-2.8.0
    src... /usr/local/ncbi/ncbi-vdb-2.8.0
    libraries... no
    src...  libraries... /root/ncbi-outdir/ncbi-vdb/linux/gcc/x86_64/rel/ilib
includes: /usr/local/ncbi/ncbi-vdb-2.8.0/interfaces
libraries: /root/ncbi-outdir/ncbi-vdb/linux/gcc/x86_64/rel/lib
ilibraries: /root/ncbi-outdir/ncbi-vdb/linux/gcc/x86_64/rel/ilib

configure: creating 'build/ld.linux.exe_cmd.sh'
configure: creating 'build/Makefile.config.linux.x86_64'
configure: creating 'Makefile.config'
configure: creating 'reconfigure'
build type: release
build prefix: /root/ncbi-outdir/sra-tools
build output path: /root/ncbi-outdir/sra-tools/linux/gcc/x86_64/rel
includedir: /usr/local/include
bindir: /usr/local/bin
libdir: /usr/local/lib
CC = gcc -c
CPP = g++
configured with: "'--prefix=/usr/local' '--with-ngs-sdk-prefix=/usr/local/ncbi/ngs-1.3.0/ngs-sdk' '--with-ncbi-vdb-build=/root/ncbi-outdir' '--with-ncbi-vdb-sources=/usr/local/ncbi/ncbi-vdb-2.8.0'"

fastq-dump requires network access, even with local files

Newer versions of fastq-dump apparently attempt to access NCBI resources via some network port (not standard http), even when all of the required resources are local (retrieved with prefetch). This differs from previous behavior (v2.3.3), and makes it difficult to spread out the work of converting SRA files on a large cluster where the compute nodes don't have access to outside networks. Perhaps an option could be added to skip the attempted network access (e.g. --no-network)? If it is absolutely essential that this access take place, please let us know which port needs to be forwarded.

One reason that submitting to a cluster is necessary is that fastq-dump is slow (see issue #24) and grows to use a huge amount of memory (either a memory leak or caching of unnecessary data).

sam-dump produces invalid SAM

I'm not able to ascertain whether this is a problem of the raw data or of sam-dump, but when I do e.g. this:

sam-dump -r SRR2141560 | samtools view -b - > SRR2141560.bam

I get errors of the type:

[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 187
[main_samview] truncated file.

The offending line looks like this:

35      163     1       17649   0       41S110M =       17898   400     GAGCACAAGCACTACTTACTGGCCTAGGTTGTGAGAGAAGTTGATGCTGCTGGGAAGACCCCCAAGTCCCTCTTCTGCATCGTCCTCGGGCTCCGGCTTGGTGCTCACGCACACAGGAAAGTCCTTCAGCTTCTCCTGAGAGGGCCAGGAT   ::;=@@B@CBCACBACABBDC@CAB7DAAADBD@DCDCACCBDCDDCD        RG:Z:4638-MDA-03_CGGCTATG_H0BLEALXX_L004        NH:i:2  NM:i:1

I've tried with sra tools versions 2.3.5 and 2.5.7, with the same result. Any idea what's going on there?
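
A quick way to locate the offending records before they reach samtools (a sketch; fields 10 and 11 of a SAM record are SEQ and QUAL):

sam-dump -r SRR2141560 | awk '!/^@/ && length($10) != length($11) {print NR": SEQ="length($10)" QUAL="length($11)}'
# prints the line number and lengths for every alignment whose SEQ and QUAL differ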

does not load data from fastq files to directory

I'm forwarding a bug from Debian. You can find the full bug log here.
I have built sra-toolkit 2.7.0 and got:
$ latf-load --quality PHRED_33 -o test /usr/share/doc/python-biopython-doc/Tests/Quality/sanger_93.fastq
2016-09-01T13:27:09 latf-load.2.7 warn: file="/usr/share/doc/python-biopython-doc/Tests/Quality/sanger_93.fastq"
2016-09-01T13:27:09 latf-load.2.7 warn: /usr/share/doc/python-biopython-doc/Tests/Quality/sanger_93.fastq:1:12:syntax error, unexpected fqWS
2016-09-01T13:27:09 latf-load.2.7 warn: The file contained no records that were processed.
2016-09-01T13:27:09 latf-load.2.7 err: libs/kproc/queue.c:356:KQueuePop: data empty while reading file within alignment module - accession="test" errors="1" status="failure"
2016-09-01T13:27:09 latf-load.2.7 err: libs/kproc/queue.c:356:KQueuePop: data empty while reading file within alignment module - load failed

So the issue seems to remain that there is some problem with loading data from fastq files.

Kind regards, Andreas.

Invalid conversion prevents build completion

Hi,

I'm trying to build sra-tools from source following the instructions on the wiki. ngs and ncbi-vdb compile fine, but then I run into trouble with sra-tools (paths have been somewhat abbreviated):

/opt/ncbi/sra-tools/tools/align-cache/helper.cpp: In member function ‘const char* KApp::CArgs::GetParamValue(uint32_t) const’:
/opt/ncbi/sra-tools/tools/align-cache/helper.cpp:735:96: error: invalid conversion from ‘const void**’ to ‘const char**’ [-fpermissive]
         rc_t rc = ::ArgsParamValue ( m_pSelf, iteration, reinterpret_cast<const void **>(&ret) );
                                                                                                ^
In file included from /opt/ncbi/sra-tools/tools/align-cache/helper.h:40:0,
                 from /opt/ncbi/sra-tools/tools/align-cache/helper.cpp:29:
/opt/ncbi/ncbi-vdb/interfaces/kapp/args.h:202:9: error:   initializing argument 3 of ‘rc_t ArgsParamValue(const Args*, uint32_t, const char**)’ [-fpermissive]
 rc_t CC ArgsParamValue (const Args * self, uint32_t iteration, const char ** value_string);
         ^

and similarly for ArgsOptionValue. Has this been observed before?

My system is ubuntu 14.04, x86_64, gcc version 4.8.4.

Thanks.

Per
