diltheylab / hla-la Goto Github PK

View Code? Open in Web Editor NEW

120.0 11.0 40.0 1.14 MB

Fast HLA type inference from whole-genome data

License: GNU General Public License v3.0

C++ 79.07% Perl 20.06% Shell 0.01% Makefile 0.27% Prolog 0.60%

hla-la's People

Contributors

Stargazers

Watchers

Forkers

kpalin biocodings hanpingchen xtmgah asazonov xchromosome219 bgistone aakrosh moutsian jlrgraham23 hmyh1202 santy-8128 evanbiederstedt 0kd jafors flywind2 miko-798 jlanej shulp2211 lifebit-ai cgpu toastyjt awilcz xiaobo199405 jingquanlim rnaimehaom shicheng-guo p15 thekingofall bsonjia irunonayran libingnan11 yinxx jonaszierer tzhang-nmdp atvicenti suraj-adewale suhrig ningshuang-yao eneritz

hla-la's Issues

Can't start the test

I have indexed the graph and now trying to run the test. But I can't:

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA/bin# ./HLA-PRG-LA --bwa_bin $BWA --samtools_bin $SAMTOOLS --BAM /outputs
/NA12878.cram --graph ../graphs/PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge --maxThreads 8 --workingDir ./
HLA-PRG-LA: HLA-PRG-LA.cpp:1350: int main(int, char**): Assertion `Utilities::directoryExists(allGraphsOutputDir)' failed.
Aborted (core dumped)

About memory used

Hi,

my paired-end fastq file:
R1.fastq (250 Million reads, 150bp, ~1.2 GB)
R2.fastq (250 Million reads, 150bp, ~1.2 GB)

run HLA-LA will used about 300~400 GB RAM and ~90GB swap
Have any way to reduce RAM used?

my command is:

HLA-LA.pl --BAM Bwa-mem2/HLA_PM9-0232_S7.sorted.bam --graph PRG_MHC_GRCh38_withIMGT --workingDir . --sampleID HLA_LA --maxThreads 50

total time spend about 24HR~36HR

And another question is how can only typing some gene?
because I just need class I and II gene.

thanks!

yubau

Graph path for Conda install

Hi, I have tried HLA:LA recently, I installed it through bioconda, however, I haven't seen any instruction at the end of the installation.

I downloaded the PRG_MHC_GRCh38_withIMGT graph manually, unzip them into the path ~/anaconda3/opt/hla-la/src/graphs/ (the full path is ~/anaconda3/opt/hla-la/src/graphs/PRG_MHC_GRCh38_withIMGT). However, when I run the command, it still gives error:


HLA-LA.pl --BAM $file --sampleID $sample_name --graph PRG_MHC_GRCh38_withIMGT/ --maxThreads 10 --workingDir ~/test_hlala
HLA-LA.pl

Identified paths:
        samtools_bin: ~/anaconda3/bin/samtools
        bwa_bin: ~/anaconda3/bin/bwa
        java_bin: ~/anaconda3/bin/java
        picard_sam2fastq_bin: ~/anaconda3/bin/picard
        General working directory: ~/test_hlala
        Sample-specific working directory: ~test_hlala/0411_01_01

Graph directory ~/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT/ not found - valid graph names are subdirectories of the graphs directory in the HLA-LA root at ~/anaconda3/bin/HLA-LA.pl line 198.

I checked the directory:


file "/home/nguyen/anaconda3/opt/hla-la/src/Graph/PRG_MHC_GRCh38_withIMGT/"
/home/nguyen/anaconda3/opt/hla-la/src/Graph/PRG_MHC_GRCh38_withIMGT/: directory

and the files inside

ls
extendedReferenceGenome  knownReferences  mapping  mapping_PRGonly  PRG  referenceGenomeSimulations  sampledReferenceGenomes  sequences.txt  translation

What did go wrong in my case?

indexing - error

Hi
I did the indexing with the following command
../bin/HLA-PRG-LA --action prepareGraph --PRG_graph_dir ../graphs/PRG_MHC_GRCh38_withIMGT

It began, but then was killed: 31800000Killed
Please, help, what shall I do?

bamtools installation conflicts with HLA-LA makefile settings

The makefile settings doesn't seem to be fully compatible with bamtools (version tested 2.5.1). I installed my bamtools not in the build directory but in a different directory (defined in DCMAKE_INSTALL_PREFIX). If the a user installs it this way (see example below), then where is no src in the installed location and also it seems with version 2.5.1, the lib location is not named lib64 as suggested in the HLA-LA makefile. Note I'm aware there is also the commented out codes in makefile for the older bamtools version as well, but still either way it's not fully compatible, and needs tweaking...

Methods used for bamtools installation, which doesn't quite work with the makefile defined by HLA-LA

git clone --branch v2.5.1 git://github.com/pezmaster31/bamtools.git
cd bamtools
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=$BAMTOOLS_PATH ..
make
make install

Indexing memory usage and other things

Hello!
I am running the indexing process inside the Docker-container on a server with 30 Gb RAM + 8 Gb swap and 8 CPUs.
In first 15 minutes of indexing RAM usage grew up to all RAM (29.3 of 29.5 Gb) and to 4 Gb of swap.

I think it would be better to mention it somewhere in prerequests.
Also the process took only 1 CPU of 8. Is it possible to use more?
As I understand, the indexing process generates 2 files (9.2 Gb):

-rw-r--r--  1 root  502 5491768320 Jun 19 17:41 serializedGRAPH
-rw-r--r--  1 root  502 4197127944 Jun 19 17:41 serializedGRAPH_preGapPathIndex

Does the program need all files from source PRG_MHC_GRCh38_withIMGT.tar.gz or only new 2 files?

All process took 2 hours and 20 minutes.
4. I made a second run and got a bit different files:

-rw-r--r--  1 root  502 5491768268 Jun 20 07:17 serializedGRAPH
-rw-r--r--  1 root  502 4197127912 Jun 20 07:17 serializedGRAPH_preGapPathIndex

How do I have to understand, that indexing is done correctly?

execution not successful

Hi! I installed via conda, added a genome file via idxstats (for hg19 with "chr" in the chromosome names, pulling reads from the MHC region on chr6, the chr6 haplotype contigs, and unmapped reads), and started jobs for a couple of whole exome BAM files. One's still running, but two have stopped with a mysterious error.

STDOUT:

HLA-LA.pl

Identified paths:
        samtools_bin: /home/ubuntu/HLA.LA/miniconda3/bin/samtools
        bwa_bin: /home/ubuntu/HLA.LA/miniconda3/bin/bwa
        java_bin: /home/ubuntu/HLA.LA/miniconda3/bin/java
        picard_sam2fastq_bin: /home/ubuntu/HLA.LA/miniconda3/bin/picard
        General working directory: /home/ubuntu/HLA.LA
        Sample-specific working directory: /home/ubuntu/HLA.LA/primary

Extract reads from 9 regions...
Extract unmapped reads...
Merging...
Indexing...
Extract FASTQ...
        /home/ubuntu/HLA.LA/miniconda3/bin/picard SamToFastq VALIDATION_STRINGENCY=LENIENT I=/home/ubuntu/HLA.LA/primary/extraction.bam F=/home/ubuntu/HLA.LA/primary/R_1.fastq F2=/home/ubuntu/HLA.LA/primary/R_2.fastq FU=/home/ubuntu/HLA.LA/primary/R_U.fastq 2>&1

Now executing:
../bin/HLA-LA --action HLA --maxThreads 5 --sampleID primary --outputDirectory /home/ubuntu/HLA.LA/primary --PRG_graph_dir /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQU /home/ubuntu/HLA.LA/primary/R_U.fastq.splitLongReads --FASTQ1 /home/ubuntu/HLA.LA/primary/R_1.fastq --FASTQ2 /home/ubuntu/HLA.LA/primary/R_2.fastq --bwa_bin /home/ubuntu/HLA.LA/miniconda3/bin/bwa --samtools_bin /home/ubuntu/HLA.LA/miniconda3/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0
Set maxThreads to 5
 [ Fri May 17 21:34:02 2019 ] Graph serialization existing and newer than graph file; read from /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/serializedGRAPH
 [ Fri May 17 21:39:27 2019 ]   done.

STDERR:

[bam_translate] RG tag "20171025_1728510947" on read "K00277:41:HMCGGBBXX:1:1101:10318:13042" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "K00277:41:HMCGGBBXX:1:1101:10318:13042" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
 [ Fri May 17 21:39:58 2019 ] processBAM::processBAM(..): Start graph gap analysis.
 [ Fri May 17 21:40:01 2019 ] processBAM::processBAM(..) graph gap analysis: have 3494 graph gap stretches; criterion length >= 3
/home/ubuntu/HLA.LA/miniconda3/bin/bwa mem -t5 -M -a /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/extendedReferenceGenome/extendedReferenceGenome.fa /home/ubuntu/HLA.LA/primary/R_1.fastq /home/ubuntu/HLA.LA/primary/R_2.fastq | /home/ubuntu/HLA.LA/miniconda3/bin/samtools view -Sb - > /home/ubuntu/HLA.LA/primary/remapped_with_a.bam.unsorted
terminate called after throwing an instance of 'std::runtime_error'
  what():  Command /home/ubuntu/HLA.LA/miniconda3/bin/bwa mem -t5 -M -a /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT/extendedReferenceGenome/extendedReferenceGenome.fa /home/ubuntu/HLA.LA/primary/R_1.fastq /home/ubuntu/HLA.LA/primary/R_2.fastq | /home/ubuntu/HLA.LA/miniconda3/bin/samtools view -Sb - > /home/ubuntu/HLA.LA/primary/remapped_with_a.bam.unsorted returned code -1
HLA-LA execution not successful. Command was ../bin/HLA-LA --action HLA --maxThreads 5 --sampleID primary --outputDirectory /home/ubuntu/HLA.LA/primary --PRG_graph_dir /home/ubuntu/HLA.LA/miniconda3/opt/hla-la/src/../graphs/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQU /home/ubuntu/HLA.LA/primary/R_U.fastq.splitLongReads --FASTQ1 /home/ubuntu/HLA.LA/primary/R_1.fastq --FASTQ2 /home/ubuntu/HLA.LA/primary/R_2.fastq --bwa_bin /home/ubuntu/HLA.LA/miniconda3/bin/bwa --samtools_bin /home/ubuntu/HLA.LA/miniconda3/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0

Any idea what this is about?

Thanks,
~Joe

Typing of HLA-DP Locus

Hi there,

Thanks for this tool. Is it possible to use HLA*LA to type the HLA-DP locus? When I was reading the paper (https://academic.oup.com/bioinformatics/article/35/21/4394/5426702), I didn't see any mention of the HLA-DP locus in any of the results/benchmarking.

Kind regards,

Fong

A problem when test pacbio data

When I use pacbio data to test the typing effect of HLA-LA, I can't get the correct answer.
I first searched on https://www.ebi.ac.uk using the registration number PRJNA323611, then selected the Experiment accession which is SRX1837275 to download the data, and the five data files obtained are integrated using the cat command:

I receive the file "pacbio_test.fastq.gz".
Then I use the command "bwa mem -t 7 -M -x pacbio /mnt/data/data/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa pacbio_test.fastq.gz |samtools view -bS | samtools sort -o pacbio_test_sort.bam -" to process the data and get the sorted bam file. And I index the bam file.
Next, I use the command "./HLA-LA.pl --BAM /mnt/data/data/pacbio/pacbio_test_sort.bam --graph PRG_MHC_GRCh38_withIMGT --sampleID pacbio_test --maxThreads 7 --longReads pacbio --samtools_T /mnt/data/data/NA12878_final/GRCh38_full_analysis_set_plus_decoy_hla.fa" to run the HLA-LA, I can only get the result like this:

The file "hla" is empty.
I can't get the right typing result. Would you be so kind and help me to solve this problem? The log of command execution is attached to the attached file.

installation prereq

Hi I am trying to install HLA_LA prereq boost. I downloaded boost here: https://www.boost.org/users/download/. But it doesnt seem to contain the folders "include" and "lib" that seem required from the makefile of HLA-LA.
Any help would be appreciated!
Thanks!

if I have fastq paired read data, what should i do?

I noticed that your code extract bam to fastq first, and mapping with your reference.
if I have fastq already, Should i skip extract bam file and use my fastq as input to your program? How?

Wrong version of samtools

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA# samtools --version
samtools 1.3.1
Using htslib 1.3.1
Copyright (C) 2016 Genome Research Ltd.

./bin/HLA-PRG-LA --bwa_bin $BWA --samtools_bin /soft/samtools/bin/samtools --BAM /outputs/NA12878.cram --graph ../graphs/PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge --maxThreads 8 --workingDir ./
....
/home/dilthey/bowtie2-2.2.8/bowtie2 -a -L 25 --omit-sec-seq -x tmp/simulatedGraphs/G1/bowtie2IDX -U tmp/simulatedGraphs/G1/FASTQ/R_1.fq -S tmp/simulatedGraphs/G1/BAM/output_1.bam.sam
1158 reads; of these:
  1158 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    29 (2.50%) aligned exactly 1 time
    1129 (97.50%) aligned >1 times
100.00% overall alignment rate
/soft/samtools/bin/samtools view -bS tmp/simulatedGraphs/G1/BAM/output_1.bam.sam > tmp/simulatedGraphs/G1/BAM/output_1.bam.unsorted
/soft/samtools/bin/samtools sort tmp/simulatedGraphs/G1/BAM/output_1.bam.unsorted tmp/simulatedGraphs/G1/BAM/output_1
[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files
Usage: samtools sort [options...] [in.bam]
Options:
  -l INT     Set compression level, from 0 (uncompressed) to 9 (best)
  -m INT     Set maximum memory per thread; suffix K/M/G recognized [768M]
  -n         Sort by read name
  -o FILE    Write final output to FILE rather than standard output
  -T PREFIX  Write temporary files to PREFIX.nnnn.bam
  -@, --threads INT
             Set number of sorting and compression threads [1]
      --input-fmt-option OPT[=VAL]
               Specify a single input file format option in the form
               of OPTION or OPTION=VALUE
  -O, --output-fmt FORMAT[,OPT[=VAL]]...
               Specify output format (SAM, BAM, CRAM)
      --output-fmt-option OPT[=VAL]
               Specify a single output file format option in the form
               of OPTION or OPTION=VALUE
      --reference FILE
               Reference sequence FASTA FILE [null]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Command /soft/samtools/bin/samtools sort tmp/simulatedGraphs/G1/BAM/output_1.bam.unsorted tmp/simulatedGraphs/G1/BAM/output_1 returned code 256
Aborted (core dumped)

bamtools installation problem

Hi I got an error in the mapper/processBAM.cpp:4341 while I was compiling. This is an older version of bamtools, so I did follow the instructions to comment in the include statement that was previously commented out. But the error still persists... Any help is appreciated! Thank you!

~/software/HLA-LA/src/mapper/processBAM.cpp:4341: undefined reference to BamTools::BamReader::GetReferenceID(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const' ../obj/processBAM.o: In function mapper::processBAM::updateBAMSelectors()':

Version of picard

There is one more problem:
https://github.com/broadinstitute/picard/releases/tag/2.9.4
Picard now has one single jar-file! And I can't provide a path to picard_sam2fastq_bin:

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA# ./src/HLA-PRG-LA.pl --picard_sam2fastq_bin "/soft/picard/picard-2.9.0-all.jar SamToFastq"
Command-line supplied value/file for parameter picard_sam2fastq_bin not existing at ./src/HLA-PRG-LA.pl line 368.

The last version of picard with separate JAR-files is picard-tools is 1.123:
https://github.com/broadinstitute/picard/releases/tag/1.123

prepate graph

I downloaded the docker file and incorporated the commands for downloading the tar file, untar and prepare the index
I got the following error
prepareGraph
[ Wed Oct 14 16:16:25 2020 ] Read graph from /usr/local/bin/HLA-LA/graphs/PRG_MHC_GRCh38_withIMGT
30400000Killed
The command '/bin/sh -c /usr/local/bin/HLA-LA/bin/HLA-LA --action prepareGraph --PRG_graph_dir /usr/local/bin/HLA-LA/graphs/PRG_MHC_GRCh38_withIMGT' returned a non-zero code: 137

add additional reference for Ensembl human assembly 38

Hi, I would like to add this reference. idxstats shows the following:

1 248956422 5335060 0
10 133797422 2096960 0
11 135086622 3007088 0
12 133275309 2220636 0
13 114364328 1559146 0
14 107043718 3120738 0
15 101991189 1346306 0
16 90338345 1713268 0
17 83257441 2357610 0
18 80373285 618100 0
19 58617616 1441954 0
2 242193529 3693058 0
20 64444167 1508904 0
21 46709983 22438086 0
22 50818468 703842 0
3 198295559 3085302 0
4 190214555 1500350 0
5 181538259 2348104 0
6 170805979 3106762 0
7 159345973 2917186 0
8 145138636 2210556 0
9 138394717 2446060 0
MT 16569 147374 0
X 156040895 1218960 0
Y 57227415 37718 0
KI270728.1 1872759 1868 0
KI270727.1 448248 514 0
KI270442.1 392061 232 0
KI270729.1 280839 170 0
GL000225.1 211173 228 0
KI270743.1 210658 322 0
GL000008.2 209709 656 0
GL000009.2 201709 1044 0
KI270747.1 198735 254 0
KI270722.1 194050 82 0
GL000194.1 191469 1550 0
KI270742.1 186739 1588 0
GL000205.2 185591 2370 0
GL000195.1 182896 9600 0
KI270736.1 181920 44 0
KI270733.1 179772 9788058 0
GL000224.1 179693 1092 0
GL000219.1 179198 1092 0
KI270719.1 176845 2440 0
GL000216.2 176608 206 0
KI270712.1 176043 1692 0
KI270706.1 175055 1674 0
KI270725.1 172810 316 0
KI270744.1 168472 794 0
KI270734.1 165050 434 0
GL000213.1 164239 260 0
GL000220.1 161802 9727154 0
KI270715.1 161471 22 0
GL000218.1 161147 3756 0
KI270749.1 158759 206 0
KI270741.1 157432 48 0
GL000221.1 155397 1372 0
KI270716.1 153799 32 0
KI270731.1 150754 414 0
KI270751.1 150742 244 0
KI270750.1 148850 30 0
KI270519.1 138126 18 0
GL000214.1 137718 254 0
KI270708.1 127682 76 0
KI270730.1 112551 12 0
KI270438.1 112505 78 0
KI270737.1 103838 42 0
KI270721.1 100316 1604 0
KI270738.1 99375 70 0
KI270748.1 93321 624 0
KI270435.1 92983 34 0
GL000208.1 92689 30 0
KI270538.1 91309 4 0
KI270756.1 79590 6 0
KI270739.1 73985 4 0
KI270757.1 71251 0 0
KI270709.1 66860 34 0
KI270746.1 66486 154 0
KI270753.1 62944 22 0
KI270589.1 44474 2 0
KI270726.1 43739 24 0
KI270735.1 42811 32 0
KI270711.1 42210 5314 0
KI270745.1 41891 2244 0
KI270714.1 41717 516 0
KI270732.1 41543 20 0
KI270713.1 40745 185894 0
KI270754.1 40191 4 0
KI270710.1 40176 0 0
KI270717.1 40062 1294 0
KI270724.1 39555 6 0
KI270720.1 39050 116 0
KI270723.1 38115 14 0
KI270718.1 38054 966 0
KI270317.1 37690 0 0
KI270740.1 37240 0 0
KI270755.1 36723 54 0
KI270707.1 32032 104 0
KI270579.1 31033 0 0
KI270752.1 27745 2 0
KI270512.1 22689 0 0
KI270322.1 21476 0 0
GL000226.1 15008 0 0
KI270311.1 12399 0 0
KI270366.1 8320 0 0
KI270511.1 8127 0 0
KI270448.1 7992 0 0
KI270521.1 7642 0 0
KI270581.1 7046 0 0
KI270582.1 6504 4 0
KI270515.1 6361 0 0
KI270588.1 6158 0 0
KI270591.1 5796 0 0
KI270522.1 5674 0 0
KI270507.1 5353 0 0
KI270590.1 4685 0 0
KI270584.1 4513 0 0
KI270320.1 4416 0 0
KI270382.1 4215 2 0
KI270468.1 4055 0 0
KI270467.1 3920 16 0
KI270362.1 3530 0 0
KI270517.1 3253 0 0
KI270593.1 3041 0 0
KI270528.1 2983 0 0
KI270587.1 2969 0 0
KI270364.1 2855 0 0
KI270371.1 2805 0 0
KI270333.1 2699 0 0
KI270374.1 2656 0 0
KI270411.1 2646 0 0
KI270414.1 2489 0 0
KI270510.1 2415 0 0
KI270390.1 2387 0 0
KI270375.1 2378 0 0
KI270420.1 2321 0 0
KI270509.1 2318 0 0
KI270315.1 2276 0 0
KI270302.1 2274 4 0
KI270518.1 2186 0 0
KI270530.1 2168 0 0
KI270304.1 2165 0 0
KI270418.1 2145 0 0
KI270424.1 2140 0 0
KI270417.1 2043 0 0
KI270508.1 1951 0 0
KI270303.1 1942 0 0
KI270381.1 1930 0 0
KI270529.1 1899 0 0
KI270425.1 1884 0 0
KI270396.1 1880 0 0
KI270363.1 1803 6 0
KI270386.1 1788 0 0
KI270465.1 1774 0 0
KI270383.1 1750 0 0
KI270384.1 1658 0 0
KI270330.1 1652 20 0
KI270372.1 1650 0 0
KI270548.1 1599 0 0
KI270580.1 1553 6 0
KI270387.1 1537 0 0
KI270391.1 1484 0 0
KI270305.1 1472 0 0
KI270373.1 1451 0 0
KI270422.1 1445 0 0
KI270316.1 1444 0 0
KI270340.1 1428 0 0
KI270338.1 1428 0 0
KI270583.1 1400 0 0
KI270334.1 1368 0 0
KI270429.1 1361 0 0
KI270393.1 1308 0 0
KI270516.1 1300 0 0
KI270389.1 1298 0 0
KI270466.1 1233 12 0
KI270388.1 1216 0 0
KI270544.1 1202 0 0
KI270310.1 1201 0 0
KI270412.1 1179 0 0
KI270395.1 1143 0 0
KI270376.1 1136 0 0
KI270337.1 1121 0 0
KI270335.1 1048 0 0
KI270378.1 1048 0 0
KI270379.1 1045 2 0
KI270329.1 1040 0 0
KI270419.1 1029 0 0
KI270336.1 1026 0 0
KI270312.1 998 0 0
KI270539.1 993 4 0
KI270385.1 990 0 0
KI270423.1 981 0 0
KI270392.1 971 0 0
KI270394.1 970 0 0

however I'm not entirely sure how to format the knownReferences file.

Thanks for the help,
Martin

Slow tests

Hello, thanks for great tool.
We're trying to implement it in our studies.

For now tests, described here, working well and our test results are very similar to your GitHub result. But it takes approximately 40 minutes to pass (on NA12878.mini.cram using server with 32 threads and 128 GB RAM) - near 10 minutes for remapping, near 4 minutes for HLA-typing and 25 minutes for intermediate step (I don't fully understand it).

Is it normal? Other HLA-typers we are using (for example xHLA) take a few (2-3) minutes on full WGS sample for typing - that's why I'm asking here. I understand remapping time, but not intermediate step.

Here is our result Test.R1_bestguess_G.txt if it can help to debug.

For some reason there are threads: 1 in all 136 blocks - looks like here is a code line controlling it.

And one more question: according to this line there are some functionality to use only ClassI and 3 ClassII HLA. How can we use this option from HLA-LA.pl run command?

User-defined path to graph directory

Hi @AlexanderDilthey,
first of all, thanks for your work, HLA-LA works like a charm, but still there is one thing I'd like to suggest:
I use HLA-LA in various projects and have integrated it in a pipeline mostly based on snakemake and conda. So, whenever I use the pipeline on a new analysis, the conda environments are re-built in the specific project folder. Since HLA-LA fixes the directory of the PRGs to the specific location of installation, I would need to download and index the graph for every project even if I wanted to use the same reference in all of them.

I think that the ability of a user-defined graph directory which allows to use already downloaded and indexed graphs would increase the reusability in this scenario and might be a helpful feature for some users.

I already worked a bit on a possible solution, so if you think this could be a helpful addition to HLA-LA, I would be happy to discuss it more deeply and make a PR.

Thanks in advance!
Best,
Jan

Is this tool able to type MICA

Hi, can this software be used to type MICA alleles?

samtools idxstats bam header not right

Hi,
i want to use HLA_LA on nanopore longreads bam file.and i use samtools idxstats on my bam and change the header as existing files. but i still get this bug:
Incorrect header for /bio/HLA-LA/HLA-LA/../graphs/PRG_MHC_GRCh38_withIMGT/knownReferences/XH_nanopore_longreads.txt - expect PartialExtraction_Stop, got PartialExtraction_Stop

and i check for many times ,the header shoule be right.
so can you help me. maybe it is a silly problem.
XH_nanopore_longreads.txt
XH.idxstats.txt

Thanks and best,
masen

I cannot index the graph

So I am trying to set up HLA-LA to work on a ubuntu 18.04 docker machine. I have followed you guide and managed to receive a message "HLA*LA binary functional!".
Now I've downloaded the graph and tried to index it but it always fails at the same point. I though it could be memory related problem so I've pushed memory limit of my image to 64G but it still fails at the same spot. Now please note that I have ignored the "Modifying paths.ini" step until now, but I don't see any connection between these two steps.
Can you think of the reason why my indexing fails?
Thank you!

Installation via conda fails

Hello,

I have attempted to install HLA-LA via conda by running "conda install -c bioconda hla-la". The installation fails with the following output:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \
Found conflicts! Looking for incompatible packages.                                                                                  failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

As you can see, there is frustratingly no error message describing which packages are incompatible. Regardless, this is in a fresh conda environment just for this package, so I'm not sure what would be triggering the incompatibility. Any idea what might be going on here? This is using conda v4.8.3.

Only predict class I MHC alleles

Hi,

I am wondering if adding an option of specifying only class I or II alleles to predict at a time will reduce time/resource usage. If it does, I can perform HLA-PRG-LA on a resource-tensive machine with only 32GB memory.

Thanks.

Assertion `sequencesStream.is_open()' failed.

Hi,

I am trying to run HLA-PRG-LA with the test.cram and got Assertion `sequencesStream.is_open()' failed.
I was wondering if any one else have encountered such an issue and knows how it can be fixed?

Thanks,
Wendy

[ Fri Sep 22 15:17:17 2017 ] Remapping done.
HLA-PRG-LA: mapper/processBAM.cpp:1173: void mapper::processBAM::initBAM(std::string, bool, bool): Assertion `sequencesStream.is_open()' failed.
HLA-PRG-LA execution not successful. Command was ../bin/HLA-PRG-LA --action HLA --maxThreads 4 --sampleID test --outputDirectory /HLA.Typing/HLA_PRG_LA/test/test --PRG_graph_dir /hla-prg-la/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQ1 /HLA.Typing/HLA_PRG_LA/test/test/R_1.fastq --FASTQ2 /HLA.Typing/HLA_PRG_LA/test/test/R_2.fastq --bwa_bin /bin/bwa --samtools_bin /SAMtools/1.3/bin/samtools --mapAgainstCompleteGenome 0

Custom path to indexed graph

Hello!
I think it would be more comfortable if user could specify full path to graph dir. It looks like the main place to be changed is:

my $full_graph_dir = $FindBin::RealBin . '/../graphs/' . $graph;

Performance issue

Hello, I am part of Bioinformatics team in Seven Bridges Genomics and we are interested in porting HLA-LA algorithm to our platform. We would appreciate your assistance with some of the issues we encountered.

So far, we have created proper environment via Docker and I successfully ran the algorithm on the provided test .cram file. Nevertheless we are concerned about execution time.
The algorithm was ran on a AWS machine: r4.4xlarge with 16 CPU and 122 GB RAM
Execution lasted for 2 hours, and we've set the maxThreads parameter to 16.
While looking into execution details we found out that our AWS instance used only 1 CPU.

Would you be so kind and help us understand what could be the reason of this multithreading failure? Is there some module or package we are missing that could help us achieve the full performance?

We are providing you a log file that came out of our AWS instance.
Thank you in advance.

job.err.log

errors running test

Hi,

I am running the smaller test file provided with this command: ../src/HLA-PRG-LA.pl --BAM NA12878.mini.cram --graph ../graphs/PRG_MHC_GRCh38_withIMGT --sampleID NA12878 --maxThreads 7

It returns this error message:
HLA-PRG-LA: mapper/bwa/BWAmapper.cpp:126: void mapper::bwa::BWAmapper::mapLong(std::string, std::string, std::string, bool, std::string): Assertion `(longMode == "pacbio") || (longMode == "ont2d")' failed.

I don't have any longMode input and it should not go to mapLong function. Is there any issues with my command or installation ?

Thanks,
Sarah

Improvements to Documentation

In the section titled Basics is a link to a blog post which no longer exists. IMGT is regularly updated. The overview should specify the version of the database used to create the data package (e.g. 3.34.0) to enhance research results reproducibility and identifability. The reference is to the pre-print, but it should instead be to the journal article in Bioinformatics.

Test CRAM files now missing

Hi
Just to flag up the NA12878 CRAM files are gone from the link in README.MD:

https://gembox.cbcb.umd.edu/shared/NA12878.cram

It says the server has been retired??

Andrew

HLA-PRG-LA output on the Exome samples

Hello,

I was just wondering, if there is a way to make the HLA-PRG-LA output generated on exomes for the pre-print https://www.biorxiv.org/content/early/2018/10/26/453555 available.

I am currently testing it out, and got a lower accuracy% for HLA-A and HLA-C from the same 29 samples. So wanted to compare with the results from the pre-print and check which samples are discordant.

Thank you.

A problem about reference?

I downloaded the reference GRCh38 from Ensemble and build the index with bowtie2, but the problem happened like below, anyone can give me some advice?
Have found no compatible reference specifications in /public/home/fanlj/mlf/HLA/HLA-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/knownReferences - create a file for this BAM file and try again. at ../../HLA-LA/src/HLA-LA.pl line 315

only for GRCh38? not Hg19?

I found the graph is named PRG_MHC_GRCh38_withIMGT , but my REF is hg19 ? can i use this soft to anlysis HLA ?

update makefile for current bamtools, boost

Hello,

Thank you for providing this excellent software. Would you consider updating your makefile to work with the current bamtools repo and boost libraries in $HOME, to ease installation for new users?

See bamtools.

BOOST_INCLUDE = {$HOME}/boost/boost_1_64_0/install/include
BOOST_LIB = {$HOME}/boost/boost_1_64_0/install/lib
BAMTOOLS_INCLUDE = {$HOME}/bamtools/include/api

BAMTOOLS_SRC = {$HOME}/bamtools/src
BAMTOOLS_LIB = {$HOME}/bamtools/lib

INCS = -I$(BOOST_INCLUDE) -I$(BAMTOOLS_INCLUDE) -I$(BAMTOOLS_SRC)
LIBS = -L$(BOOST_LIB) -L$(BAMTOOLS_LIB) -lboost_random -lboost_filesystem -lboost_system  -lbamtools -lbamtools-utils -lz -lboost_serialization

and

export LD_LIBRARY_PATH="{$HOME}/bamtools/lib:$LD_LIBRARY_PATH"

Let me know if you would like a PR to update the makefile and README. Thanks!

Andrew

Sample names with strange characters

I am using HLA-LA within the 1,000 genomes project environment and unfortunately sample names within the BAMs are for example LP000000-DNA_A01 and so your program gives an error. I can reheader using samtools to LP00000DNAA01 and then the program works fine (but in order to do this I have to copy the bam into a writeable file by me first...). As I am trying to run this on thousands of bams, this is becoming very time consuming and memory gobbling! Is there anything I can do to modify the code to allow - and/or _ into sample names?

Any help would be very appreciated!

Helen

Allowing for non-alphanumeric characters in sample IDs

Hi there,

Is it possible to allow for the --sampleID argument to accept non-alphanumeric characters? I often work with identifiers that contain a - in it and when I try to run for example something like this:

HLA-LA.pl \
        --BAM tumour-sample.bam \
        --graph PRG_MHC_GRCh38_withIMGT \
        --sampleID tumour-sample \
        --maxThreads 1 \
        --workingDir output/hla_la

I get this error:

Please use only alphanumeric characters - \w+ - for --sampleID at /projects/ace_benchmarking/miniconda3/envs/hla-la/bin/HLA-LA.pl line 147.

I could specify a --sampleID devoid of any non-alphanumeric characters, but this adds post-processing step to marry the results with the original sample identifiers, which I would like to avoid.

Kind regards,

Fong

Need a bit more detail in README.md under Test run section

I'm trying to install HLA-PRG-LA on the NIH Biowulf cluster. I'd like to complete the steps in the 'Test run' section. Can you please include the actual command(s) needed to run HLA-PRG-LA on the NA12878.cram dataset?

I'm also very happy that you have provided a dataset for testing and validation. I wish more developers would do so. But the dataset is very large and takes a long time to download. Is it possible to create a smaller dataset for testing? If it's small enough, perhaps it could be packaged as part of the installation so that a separate download isn't necessary.

Thank you.

support for non-paired short reads?

When I run with unpaired reads, I get "You didn't activate --longReads, but the two files ... (which store paired-end reads) are empty - this is weird, and I will abort".

Is there support for non-paired short reads? If not, is this a fundamental issue, or could it be added as an enhancement? I assume just adding the --longReads option isn't a good idea.

Question about perfectG value and running on HPC cluster

Hello,

Thank you very much for providing this tool. I was able to run the program successfully on my samples, but I am confused about the perfectG value in the output. In the README doc, it is written:
perfectG: Perfect translation from HLA*PRG:LA call to G grop resolution? (1 = Yes)
This would imply that perfectG = 1 indicates no error, but then it is also written:
If perfectG != 0, you might want to check ../working/$mySampleID/hla/R1_bestguess.txt, which contains the un-translated output from the internal model of the algorithm.
Is this maybe a typo? And how does the R1_bestguess.txt file help in the case where the translation is not perfect?

In addition, I just wanted to let you know that this tool isn't fully compatible with a cluster environment. HLA-LA requires indexing of the graph, but since most users on a cluster can't modify any program files, this can cause some issues. I was able to circumvent this problem by having the HPC cluster administrator grant me write permission to the the graphs directory, but it would be great if the graph directory wasn't hardcoded in the script. Thank you in advance for your help!

Why my NA12878 test result is not same with NA12878_example_output_G.txt?

This is my command.
~/HLA-LA/src/HLA-LA.pl --BAM NA12878.mini.cram --graph PRG_MHC_GRCh38_withIMGT --sampleID NA12878 --maxThreads 40

This is my test result.
R1_bestguess_G.txt

Issue with bowtie2

Hello!

Excuse me for possible English mistakes. I've got an issue trying to make HLA typing with HLA-LA on NGS samples, which were aligned to hg38 reference using bowtie2, and HLA-LA gives me following mistake

Have found no compatible reference specifications in somepathonserver/HLA-PRG-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT/knownReferences - create a file for this BAM file and try again. at somepathonserver/HLA-PRG-LA/src/HLA-PRG-LA.pl line 267.

And if I perform HLA typing with HLA-LA on the same samples, which were aligned to the same reference using BWA and it always works correct. Is there any known solution for this problem? I changed chromosome naming from 1 2 3 etc which is used by bowtie2 to chr1 chr2 chr3 used by BWA with samtools reheader but that didn't work out.

Installing HLA*LA with bug

/usr/bin/ld: BFD version 2.20.51.0.2-5.42.el6 20100205 internal error, aborting at reloc.c line 443 in bfd_get_reloc_size

/usr/bin/ld: Please report this bug.

collect2: error: ld returned 1 exit status
make: *** [makefile:109: HLA-LA] Error 1

segmentation fault

I'm getting a segmentation error when I ran HLA-LA at the --action HLA step. The stdout just says run was unsuccessful. If I run the part ./bin/HLA-LA --action HLA... specifically, I get that it failed, it just outputs Segmentation fault. This is with my own bam file. I cannot get the example NA12878.mini.cram to work, as it couldn't even get past the picard step, there I got the SAM validation error bin field of BAM record does not equal value computed based on alignment start and end, and length of sequence to which read is aligned.

Tool versions
HLA-LA 1.0.1
samtools 1.10
picard 1.123
bwa 0.7.17
bamtools 2.5.1
boost 1.59.0

Log

Identified paths:
	samtools_bin: /opt/samtools/bin/samtools
	bwa_bin: /opt/bwa
	java_bin: /usr/bin/java
	picard_sam2fastq_bin: /opt/picard_tools/SamToFastq.jar
	General working directory: /hla-la
	Sample-specific working directory: /hla-la/test

Extract reads from 2 regions...
Extract unmapped reads...
Merging...
[bam_translate] RG tag "1" on read "A00609:45:HW72GDSXX:3:1101:28800:29105" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] PG tag "MarkDuplicates" on read "A00609:45:HW72GDSXX:3:1101:28800:29105" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
Indexing...
Extract FASTQ...
	/usr/bin/java -Xmx10g -XX:-UseGCOverheadLimit -jar /opt/picard_tools/SamToFastq.jar VALIDATION_STRINGENCY=LENIENT I=/hla-la/test/extraction.bam F=/hla-la/test/R_1.fastq F2=/hla-la/test/R_2.fastq FU=/hla-la/test/R_U.fastq 2>&1

Now executing:
../bin/HLA-LA --action HLA --maxThreads 1 --sampleID test --outputDirectory /hla-la/test --PRG_graph_dir /opt/HLA-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQU /hla-la/test/R_U.fastq.splitLongReads --FASTQ1 /hla-la/test/R_1.fastq --FASTQ2 /hla-la/test/R_2.fastq --bwa_bin /opt/bwa --samtools_bin /opt/samtools/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0
Set maxThreads to 1
HLA-LA execution not successful. Command was ../bin/HLA-LA --action HLA --maxThreads 1 --sampleID test --outputDirectory /hla-la/test --PRG_graph_dir /opt/HLA-LA/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQU /hla-la/test/R_U.fastq.splitLongReads --FASTQ1 /hla-la/test/R_1.fastq --FASTQ2 /hla-la/test/R_2.fastq --bwa_bin /opt/bwa --samtools_bin /opt/samtools/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0

BWA error

Hi,

I have been trying to use HLA-LA for illumina standard WES data. Unfortunately, I'm running into a BWA error after the edge path generation:

[ Wed Jun 24 19:10:29 2020 ] extensionAligner::extensionAligner(..): Indexing of edge paths; have 16804 completed edge paths.

$PATH/HLALA/bin/bwa mem -t15 -M -a /$PATH/HLALA/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT/extendedReferenceGenome/extendedReferenceGenome.fa /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_1.fastq /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_2.fastq | $PATH/HLALA/bin/samtools view -Sb - > /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam.unsorted
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 776266 sequences (110622870 bp)...
[E::bns_fetch_seq] begin=4104638566, mid=4104638567, end=3150488291, len=954150275, seq=0x7ff48c834010, rid=1141, far_beg=3150476765, far_end=3150488291
bwa: bntseq.c:444: bns_fetch_seq: Assertion `seq && *end - *beg == len' failed.

$PATH/HLALA/bin/samtools sort -o /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam.unsorted
$PATH/HLALA/bin/samtools index /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/remapped_with_a.bam
 [ Wed Jun 24 19:10:56 2020 ] Remapping done.

processBAM::initBAM(..): Will examine 914 PRG-mapped intervals.
processBAM::extractSeeds(): examined 0 reads, transformed into 0 seeds.
processBAM::estimateInsertSize(): Deal 0 total proto-seeds (i.e. read pairs), of which 0 are incomplete.
HLA-LA: mapper/processBAM.cpp:1043: std::pair<double, double> mapper::processBAM::calculateInsertSizeFromHistogram(const std::map<int, double>&, bool): Assertion `set_weighted_80 && set_weighted_20 && set_median' failed.

HLA-LA execution not successful. Command was ../bin/HLA-LA --action HLA --maxThreads 15 --sampleID I19D040a04 --outputDirectory /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04 --PRG_graph_dir /$PATH/HLALA/opt/hla-la/src/../graphs/PRG_MHC_GRCh38_withIMGT --FASTQU /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_U.fastq.splitLongReads --FASTQ1 /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_1.fastq --FASTQ2 /scratch/leonb/exomes/mapped_grch38/typing/I19D040a04/R_2.fastq --bwa_bin $PATH/HLALA/bin/bwa --samtools_bin $PATH/HLALA/bin/samtools --mapAgainstCompleteGenome 1 --longReads 0

Thanks and best,
Leon

Data package website not found

Hi Alex,

I was trying to download the data package for HLA-PRG-LA http://www.well.ox.ac.uk/PRG_MHC_GRCh38_withIMGT.tar.gz and unfortunately got this message:

Resolving www.well.ox.ac.uk (www.well.ox.ac.uk)... 129.67.44.50
Connecting to www.well.ox.ac.uk (www.well.ox.ac.uk)|129.67.44.50|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2017-12-07 01:01:03 ERROR 404: Not Found.

It seems that the link to the data package is no longer available. (The same issue happens to the link of HLA-PRG data package http://www.well.ox.ac.uk/HLA-PRG.tar.gz )
Could you please update the data package link?

Discordant results from the same person.

Hi.

I've got discordant results from different tissues of the same person using HLA-LA.
The results are all the same except in DPA1 region: one is DPA102:02:02 and the other is DPA101:11.
Though average coverage is lowest in DPA1 region than other regions, it's above 30X.

I attached the result files.
AA1_P3_R1_bestguess_G.txt
AA1_BM_R1_bestguess_G.txt

I investigated the intermediate files and found that there are summary stats prioritizing likely pairs in "R1_pileup_DPA1.txt" file.

ClusterID P LL Mismatches_avg
DPA101:03:01:01;DPA101:03:01:02;DPA101:03:01:03;DPA101:03:01:04;DPA101:03:01:05/DPA102:02:02 1 -97.8373 143
DPA101:03:01:01;DPA101:03:01:02;DPA101:03:01:03;DPA101:03:01:04;DPA101:03:01:05/DPA102:05 1.88852e-21 -145.556 151
DPA101:03:01:01;DPA101:03:01:02;DPA101:03:01:03;DPA101:03:01:04;DPA101:03:01:05/DPA102:02:06 4.64625e-24 -151.563 151.5
DPA101:03:01:01;DPA101:03:01:02;DPA101:03:01:03;DPA101:03:01:04;DPA101:03:01:05/DPA102:04 4.61435e-25 -153.873 152
DPA101:03:04/DPA102:02:02 1.73353e-32 -170.97 155.5
DPA101:12/DPA102:02:02 7.28223e-38 -183.35 153
DPA101:11/DPA102:02:02 1.9045e-42 -193.902 141
DPA101:10/DPA102:02:02 4.47384e-52 -216.073 159
DPA101:03:04/DPA102:05 3.27271e-53 -218.689 163.5
DPA101:03:04/DPA102:02:06 8.0459e-56 -224.697 164
DPA101:03:04/DPA102:04 7.99056e-57 -227.006 164.5
......

I also attached "R1_pileup_DPA1.txt" files.
AA1_BM_R1_PP_DPA1_pairs.txt
AA1_P3_R1_PP_DPA1_pairs.txt

Can you explain the meaning of the column names (ClusterID, P, LL, and Mismatches_avg) and possible answers for the discordant results?

Thank you!

bowtie2 is needed but not listed

root@5f5e7a4b6c25:/outputs/soft/HLA-PRG-LA# ./bin/HLA-PRG-LA --bwa_bin $BWA --samtools_bin /soft/samtools/bin/samtools --BAM /outputs/NA12878.cram --graph ../graphs/PRG_MHC_GRCh38_withIMGT --sampleID NA12878_serge --maxThreads 8 --workingDir ./
....
Directory: tmp/simulatedGraphs/G1
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.02 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.01 sec
[main] Version: 0.7.13-r1126
[main] CMD: /soft/bwa.kit/bwa index tmp/simulatedGraphs/G1/referenceGenome/ref.fa
[main] Real time: 0.047 sec; CPU: 0.032 sec
terminate called after throwing an instance of 'std::runtime_error'
  what():  Utilities::findFileFromAlternatives(): No alternative present from C:/Users/AlexanderDilthey/bowtie2-2.2.8, /home/dilthey/bowtie2-2.2.8
Aborted (core dumped)

But bowtie2 is not listed in Prerequisites! And I can't set it's path through an argument! And it is not used from PATH!

How to get help?

How to get help from the program? I have tried:

root@5f5e7a4b6c25:/outputs/PRG_MHC_GRCh38_withIMGT# /soft/HLA-PRG-LA/bin/HLA-PRG-LA --help
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check
Aborted (core dumped)
root@5f5e7a4b6c25:/outputs/PRG_MHC_GRCh38_withIMGT# /soft/HLA-PRG-LA/bin/HLA-PRG-LA -h
terminate called after throwing an instance of 'std::runtime_error'
  what():  Please specify arguments --bwa_bin
Aborted (core dumped)
root@5f5e7a4b6c25:/outputs/PRG_MHC_GRCh38_withIMGT# /soft/HLA-PRG-LA/bin/HLA-PRG-LA help
terminate called after throwing an instance of 'std::runtime_error'
  what():  Please specify arguments --bwa_bin
Aborted (core dumped)
root@5f5e7a4b6c25:/outputs/PRG_MHC_GRCh38_withIMGT# /soft/HLA-PRG-LA/bin/HLA-PRG-LA
terminate called after throwing an instance of 'std::runtime_error'
  what():  Please specify arguments --bwa_bin
Aborted (core dumped)

What is wrong?

when I Compilation I get Error

Very slow process (5 hours)

Hello,

I'm using HLA-LA and everything seems to be fine (so far, we processed 50 exomes). We run tests with NA12878, NA12155, NA19128, NA11892, NA19127, NA19700, NA12400 (samples sequenced here with Illumina) and predictions considering 4 digits were good (the HLA-DRB3/4 were the most problematic as described in the README of this repo).

The problem is: there is one sample that does not finish the process... when I inspect its stdout I found that it took ~5 hours for one step, as pasted below:

processBAM::alignReads_postSeedExtraction_andStoreInto(): Deal 10000 total read pairs with seeds, of which 1676 are incomplete.
 [ Sun Mar  8 00:36:27 2020 ] Read pair 0 of 8324
 [ Sun Mar  8 00:37:22 2020 ] .. done. Processed  112339 in total (total read IDs: 516844.
 [ Sun Mar  8 00:37:22 2020 ] Process 14/51
 [ Sun Mar  8 00:37:22 2020 ] 	Start seed extraction
 [ Sun Mar  8 00:37:36 2020 ] 			Done extractSeeds2
processBAM::extractSeeds2(): examined 105480 reads, transformed into 10000 seeds.
 [ Sun Mar  8 00:37:36 2020 ] 	Alignment
 [ Sun Mar  8 00:37:36 2020 ] Proto-seed statistics:
	Incomplete: 1679
	Complete: 8321
		Average chains per read: 5.44021
		Average chain length: 98.9056
		Average primary chain length: 99.1694

processBAM::alignReads_postSeedExtraction_andStoreInto(): Deal 10000 total read pairs with seeds, of which 1679 are incomplete.
 [ Sun Mar  8 00:37:36 2020 ] Read pair 0 of 8321
 [ Sun Mar  8 05:41:04 2020 ] .. done. Processed  120660 in total (total read IDs: 516844.
 [ Sun Mar  8 05:41:04 2020 ] Process 15/51
 [ Sun Mar  8 05:41:04 2020 ] 	Start seed extraction
 [ Sun Mar  8 05:41:18 2020 ] 			Done extractSeeds2
processBAM::extractSeeds2(): examined 102128 reads, transformed into 10000 seeds.
 [ Sun Mar  8 05:41:18 2020 ] 	Alignment
 [ Sun Mar  8 05:41:18 2020 ] Proto-seed statistics:
	Incomplete: 1722
	Complete: 8278
		Average chains per read: 5.25562
		Average chain length: 99.2039
		Average primary chain length: 99.1174

had anybody seen this problem already?

Here is the whole stdout file. I killed the process because it was not going forward.

stdout.txt

Software versions:

hla_la_version 1.0.1
samtools_version 1.9
bwa_version 0.7.17
picard_version 1.123
bamtools_version 2.5.1

Edit: the analysis took 22 hours to successfully end.

No compatible reference

Hello

I'm having issues with "no compatible reference" as I think the reference my CRAM files are aligned to is slightly non standard. I wonder if you could generate an additional known reference file from my samtools index?

Many thanks
Seth
idxstats.txt

diltheylab / hla-la Goto Github PK

hla-la's People

Contributors

Stargazers

Watchers

Forkers

hla-la's Issues

Recommend Projects

Recommend Topics

Recommend Org