roseorenbuch / arcashla-quant Goto Github PK

View Code? Open in Web Editor NEW

5.0 5.0 1.0 7.54 MB

License: GNU General Public License v3.0

Shell 2.43% Python 95.86% Dockerfile 1.70%

arcashla-quant's People

Contributors

Stargazers

Watchers

Forkers

andreyto

arcashla-quant's Issues

Quantification throwing error - missing reference?

I'm trying to get the allele specific expression of arcasHLA-quant to work. So far I've used arcasHLA to extract all reads that map to chr6, and genotyped my sample using genotype (from the original arcasHLA repo).

I'm now trying to run the quantification command of arcasHLA-quant. I've tried this by running

arcasHLA quant -o /my_output_folder/ /extracted_fastqs/sample1.extracted.1.fq.gz /extracted_fastqs/sample1.extracted.2.fq.gz

This results in the following error:

Traceback (most recent call last): File "/home/arcasHLA-quant/scripts/quant.py", line 178, in <module> indv_idx = args.ref + '.idx' TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Looking at the quant.py script, it seems to me that this is because I did not supply a --ref flag, and the algorithm then tries to do None + '.idx'.

My first question: what file do I need to provide for the --ref flag and how can I create this?

I'm guessing this should come from the quant_ref command referenced in the readme, however, it appears as though this has been renamed to customize? I've tried running customize instead of quant_ref as described in the readme:

arcasHLA customize ~/arcas/sample1.genotype.json -o ~/arcas/ref

However, this results in the following error:

usage: arcasHLA customize [options] arcasHLA customize: error: unrecognized arguments: /arcas/sample1.genotype.json

/arcas/sample1.genotype.json is the output from my genotyping call.

Can you advise? In particular, I am confused about whether or not I need a reference and what the purpose of this is? Also, if I have multiple samples, do I need to create one reference per sample or one reference for all samples?

Puzzled results of genotype and quant

Hi，

I firstly ran "arcasHLA genotype" and got a typing result:
"A": ["A*01:210:02", "A*26:130"]

Then I ran "customize" and "qunat", the quant result for the heterozygous alleles of geneA is:

gene	allele1	allele2	allele1_count	allele2_count	allele2_tpm	allele1_tpm	baf
A	A*01:210	A*26:130	0	6014	28402		0

So if the count for allele1 is 0, why it can be genotyped instead of Homozygous ?

Thanks!

Error thrown when running quant

Hello, thanks for this tool. This issue was first described in #6 . I was wondering if there was any update on davetangs error and why it might be occurring? I'm getting the same error thrown when running the quantification step.

Traceback (most recent call last):
  File "scripts/quant.py", line 300, in <module>
    baf = allele_results[gene]['allele1_count'] / (allele_results[gene]['allele1_count'] + allele_results[gene]['allele2_count'])  
ZeroDivisionError: division by zero

Thanks again

Which arcasHLA repo?

Hi there,

I was just wondering what the difference is between this repository and https://github.com/RabadanLab/arcasHLA?

Thanks!
Dave

Issues with arcasHLA customize & quant

Hi all,

I've been trying to run some of my samples through the arcasHLA toolkit and I've been encountering some errors when attempting to run my samples using both the customize and quant steps. For both, I've received the following traceback:

Traceback (most recent call last):
  File "/rnd/users/sdea/XXX/software/arcasHLA/scripts/customize.py", line 299, in <module>
    build_custom_reference(subject, genotype, args.grouping, args.transcriptome, temp)
  File "/rnd/users/sdea/XXX/software/arcasHLA/scripts/customize.py", line 77, in build_custom_reference
    dummy_HLA_dict = SeqIO.to_dict(SeqIO.parse(dummy_HLA_fa, 'fasta')) 
  File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 627, in parse
    i = iterator_generator(handle)
  File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/FastaIO.py", line 181, in __init__
    super().__init__(source, alphabet=alphabet, mode="t", fmt="Fasta")
  File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py", line 42, in __init__
    self.stream = open(source, "r" + mode)
FileNotFoundError: [Errno 2] No such file or directory: '/rnd/users/sdea/XXX/software/arcasHLA/scripts/../dat/ref/GRCh38.chr6.HLA.fasta'

Following the path to where this file is supposed to be in the arcasHLA directory shows that there is no such file present. Am I missing a step prior to generate this file?

Additionally, the "wget https://www.dropbox.com/s/pcmwc4o1dxw7owk/arcasHLA-quant_reference.zip" returns an error stating that the dropbox location has since been deleted.

Thank you for your time!

Different genotypes acquired from arcasHLA and Optitype

I input my original bam file (which have the whole genome mapped reads) to arcasHLA and input chr6-splited bam file to optitype for genotyping. I notice that arcasHLA first splits the chr6 bam and then goes for the rest. So the input of arcasHLA and optitype should be the same.

But when I got genotyping results, I found they are largely different.

I'll paste one of my sample results as an example:

From arcasHLA:
Sample1: HLA-A\*24:361, HLA-A\*24:361,HLA-B\*59:10N, HLA-B\*59:01:01,HLA-C\*01:185Q, HLA-C\*01:185Q

From Optitype:
Sample1: A*02:06 A*24:02 B*51:01 B*59:01 C*01:02 C*14:02

I'm not very familiar with programming and HLA genotyping. I'm not sure what cause these differences. Any suggestions?

BTW, should the first four number extracted from 6-digit genotype equals the 4-digit genotye for the same patient?

Thank you.

Quant command not available

Hello,

The quant command doesn't appear to be available to run from either the conda install or github clone of arcasHLA

Initially I installed using conda install arcas-hla to my current python environment. Then ran extraction and genotyping successfully on 42 BAM files. Finally, I merged all typed results to genotypes.tsv

Now i'd like to quantify the allele reads for each HLA type in each sample but only the following commands are available:
` Usage: arcasHLA [options]

extract        extracts chromosome 6 reads from bam
genotype       types HLA genes from extracted reads
partial        types partial HLA genes from extracted reads
merge          processes results into a tab-separated table
convert        converts HLA nomenclature/resolution
reference      check or update HLA reference

Note: run any command with --help to view required fields, options `

If I clone this GH repo I can see that there is a quant script in the scripts dir but can't seem to make it work manually by doing python scripts/quant.py ../genotypes.tsv

(The error msg is:
Traceback (most recent call last): File "arcasHLA/scripts/quant.py", line 178, in <module> indv_idx = args.ref + '.idx' TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' )

Please could you explain how to run the quant / quant_ref commands of this package? Or perhaps update the readme to be slightly clearer?
@tpereachamblee

Problems running `customize`: cannot create dummy HLA dict?

I have no issue running extract and genotype as part of my Snakemake pipeline. I am now testing adding quant to it.

My testing environment is a simple mamba env with only arcasHLA and its requirements. git-lfs is installed and up to date, accessible everywhere including the environment.

I got this error when trying to run customize according to how it's suggested in #6 .

$arcasHLA customize --transcriptome chr6 --genotype A0508148.genotype.json -o ./ref
None
{<genotyped output removed for privacy>}
Traceback (most recent call last):
  File "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/scripts/customize.py", line 317, in <module>
    build_custom_reference(subject, genotype, args.grouping, args.transcriptome, temp)
  File "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/scripts/customize.py", line 92, in build_custom_reference
    transcriptome.append(dummy_HLA_dict[transcript])

I went ahead and took a look at your code and what's going on at this line.
I ran the offending line from the error manually:

$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) 
[GCC 9.4.0] on linux

$ from Bio import SeqIO
$ dummy_hla_fa = "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/dat/ref/GRCh38.chr6.HLA.fasta"
$ dummy_HLA_dict = SeqIO.to_dict(SeqIO.parse(dummy_hla_fa, 'fasta')) 
$ dummy_HLA_dict
{}

So only an empty dictionary is returned. Maybe it's an issue with git-lfs but as I said I made sure to check that it's installed and working, and I had no issue genotyping.

The fasta file in question looks like this:

$ cat ~/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/dat/ref/GRCh38.chr6.HLA.fasta 
version https://git-lfs.github.com/spec/v1
oid sha256:b0df533500a0b7418214bb36a05368155148415f1a557f0a05dde78afb248b00
size 1693517

Any thoughts on what could be going wrong?

Analyze_Reads() Function No Defined

I was running this tool from the extraction step through the quantification step and I noticed that when trying to run arcasHLA quant, I received the following error:

arcasHLA quant --ref XXX -t 8 XXX.extracted.fq.gz Traceback (most recent call last): File "/rnd/users/sdea/software/arcasHLA/scripts/quant.py", line 162, in <module> num, avg, std = analyze_reads(args.file, paired, reads_file, False) NameError: name 'analyze_reads' is not defined

After looking into the source code to see if perhaps the function declaration was commented out, I noticed that it is seemingly not declared in quant.py nor in arcas_utilities.

roseorenbuch / arcashla-quant Goto Github PK

arcashla-quant's People

Contributors

Stargazers

Watchers

Forkers

arcashla-quant's Issues

Quantification throwing error - missing reference?

Puzzled results of genotype and quant

Error thrown when running quant

Which arcasHLA repo?

Issues with arcasHLA customize & quant

Different genotypes acquired from arcasHLA and Optitype

Quant command not available

Problems running `customize`: cannot create dummy HLA dict?

Analyze_Reads() Function No Defined

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent