Code Monkey home page Code Monkey logo

arcashla-quant's Issues

Quantification throwing error - missing reference?

I'm trying to get the allele specific expression of arcasHLA-quant to work. So far I've used arcasHLA to extract all reads that map to chr6, and genotyped my sample using genotype (from the original arcasHLA repo).

I'm now trying to run the quantification command of arcasHLA-quant. I've tried this by running

arcasHLA quant -o /my_output_folder/ /extracted_fastqs/sample1.extracted.1.fq.gz /extracted_fastqs/sample1.extracted.2.fq.gz

This results in the following error:

Traceback (most recent call last): File "/home/arcasHLA-quant/scripts/quant.py", line 178, in <module> indv_idx = args.ref + '.idx' TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Looking at the quant.py script, it seems to me that this is because I did not supply a --ref flag, and the algorithm then tries to do None + '.idx'.

My first question: what file do I need to provide for the --ref flag and how can I create this?

I'm guessing this should come from the quant_ref command referenced in the readme, however, it appears as though this has been renamed to customize? I've tried running customize instead of quant_ref as described in the readme:

arcasHLA customize ~/arcas/sample1.genotype.json -o ~/arcas/ref

However, this results in the following error:

usage: arcasHLA customize [options] arcasHLA customize: error: unrecognized arguments: /arcas/sample1.genotype.json

/arcas/sample1.genotype.json is the output from my genotyping call.

Can you advise? In particular, I am confused about whether or not I need a reference and what the purpose of this is? Also, if I have multiple samples, do I need to create one reference per sample or one reference for all samples?

Error thrown when running quant

Hello, thanks for this tool. This issue was first described in #6 . I was wondering if there was any update on davetangs error and why it might be occurring? I'm getting the same error thrown when running the quantification step.

Traceback (most recent call last):
  File "scripts/quant.py", line 300, in <module>
    baf = allele_results[gene]['allele1_count'] / (allele_results[gene]['allele1_count'] + allele_results[gene]['allele2_count'])  
ZeroDivisionError: division by zero

Thanks again

Different genotypes acquired from arcasHLA and Optitype

I input my original bam file (which have the whole genome mapped reads) to arcasHLA and input chr6-splited bam file to optitype for genotyping. I notice that arcasHLA first splits the chr6 bam and then goes for the rest. So the input of arcasHLA and optitype should be the same.

But when I got genotyping results, I found they are largely different.

I'll paste one of my sample results as an example:

From arcasHLA:
Sample1: HLA-A\*24:361, HLA-A\*24:361,HLA-B\*59:10N, HLA-B\*59:01:01,HLA-C\*01:185Q, HLA-C\*01:185Q

From Optitype:
Sample1: A*02:06 A*24:02 B*51:01 B*59:01 C*01:02 C*14:02

I'm not very familiar with programming and HLA genotyping. I'm not sure what cause these differences. Any suggestions?

BTW, should the first four number extracted from 6-digit genotype equals the 4-digit genotye for the same patient?

Thank you.

Puzzled results of genotype and quant

Hi,

I firstly ran "arcasHLA genotype" and got a typing result:
"A": ["A*01:210:02", "A*26:130"]

Then I ran "customize" and "qunat", the quant result for the heterozygous alleles of geneA is:

gene allele1 allele2 allele1_count allele2_count allele2_tpm allele1_tpm baf
A A*01:210 A*26:130 0 6014 28402   0

So if the count for allele1 is 0, why it can be genotyped instead of Homozygous ?

Thanks!

Problems running `customize`: cannot create dummy HLA dict?

I have no issue running extract and genotype as part of my Snakemake pipeline. I am now testing adding quant to it.

My testing environment is a simple mamba env with only arcasHLA and its requirements. git-lfs is installed and up to date, accessible everywhere including the environment.

I got this error when trying to run customize according to how it's suggested in #6 .

$arcasHLA customize --transcriptome chr6 --genotype A0508148.genotype.json -o ./ref
None
{<genotyped output removed for privacy>}
Traceback (most recent call last):
  File "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/scripts/customize.py", line 317, in <module>
    build_custom_reference(subject, genotype, args.grouping, args.transcriptome, temp)
  File "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/scripts/customize.py", line 92, in build_custom_reference
    transcriptome.append(dummy_HLA_dict[transcript])

I went ahead and took a look at your code and what's going on at this line.
I ran the offending line from the error manually:

$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21) 
[GCC 9.4.0] on linux

$ from Bio import SeqIO
$ dummy_hla_fa = "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/dat/ref/GRCh38.chr6.HLA.fasta"
$ dummy_HLA_dict = SeqIO.to_dict(SeqIO.parse(dummy_hla_fa, 'fasta')) 
$ dummy_HLA_dict
{}

So only an empty dictionary is returned. Maybe it's an issue with git-lfs but as I said I made sure to check that it's installed and working, and I had no issue genotyping.

The fasta file in question looks like this:

$ cat ~/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/dat/ref/GRCh38.chr6.HLA.fasta 
version https://git-lfs.github.com/spec/v1
oid sha256:b0df533500a0b7418214bb36a05368155148415f1a557f0a05dde78afb248b00
size 1693517

Any thoughts on what could be going wrong?

Quant command not available

Hello,

The quant command doesn't appear to be available to run from either the conda install or github clone of arcasHLA

Initially I installed using conda install arcas-hla to my current python environment. Then ran extraction and genotyping successfully on 42 BAM files. Finally, I merged all typed results to genotypes.tsv

Now i'd like to quantify the allele reads for each HLA type in each sample but only the following commands are available:
` Usage: arcasHLA [options]

extract        extracts chromosome 6 reads from bam
genotype       types HLA genes from extracted reads
partial        types partial HLA genes from extracted reads
merge          processes results into a tab-separated table
convert        converts HLA nomenclature/resolution
reference      check or update HLA reference

Note: run any command with --help to view required fields, options `

If I clone this GH repo I can see that there is a quant script in the scripts dir but can't seem to make it work manually by doing python scripts/quant.py ../genotypes.tsv

(The error msg is:
Traceback (most recent call last): File "arcasHLA/scripts/quant.py", line 178, in <module> indv_idx = args.ref + '.idx' TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' )

Please could you explain how to run the quant / quant_ref commands of this package? Or perhaps update the readme to be slightly clearer?
@tpereachamblee

Issues with arcasHLA customize & quant

Hi all,

I've been trying to run some of my samples through the arcasHLA toolkit and I've been encountering some errors when attempting to run my samples using both the customize and quant steps. For both, I've received the following traceback:

Traceback (most recent call last):
  File "/rnd/users/sdea/XXX/software/arcasHLA/scripts/customize.py", line 299, in <module>
    build_custom_reference(subject, genotype, args.grouping, args.transcriptome, temp)
  File "/rnd/users/sdea/XXX/software/arcasHLA/scripts/customize.py", line 77, in build_custom_reference
    dummy_HLA_dict = SeqIO.to_dict(SeqIO.parse(dummy_HLA_fa, 'fasta')) 
  File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 627, in parse
    i = iterator_generator(handle)
  File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/FastaIO.py", line 181, in __init__
    super().__init__(source, alphabet=alphabet, mode="t", fmt="Fasta")
  File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py", line 42, in __init__
    self.stream = open(source, "r" + mode)
FileNotFoundError: [Errno 2] No such file or directory: '/rnd/users/sdea/XXX/software/arcasHLA/scripts/../dat/ref/GRCh38.chr6.HLA.fasta'

Following the path to where this file is supposed to be in the arcasHLA directory shows that there is no such file present. Am I missing a step prior to generate this file?

Additionally, the "wget https://www.dropbox.com/s/pcmwc4o1dxw7owk/arcasHLA-quant_reference.zip" returns an error stating that the dropbox location has since been deleted.

Thank you for your time!

Analyze_Reads() Function No Defined

I was running this tool from the extraction step through the quantification step and I noticed that when trying to run arcasHLA quant, I received the following error:

arcasHLA quant --ref XXX -t 8 XXX.extracted.fq.gz Traceback (most recent call last): File "/rnd/users/sdea/software/arcasHLA/scripts/quant.py", line 162, in <module> num, avg, std = analyze_reads(args.file, paired, reads_file, False) NameError: name 'analyze_reads' is not defined

After looking into the source code to see if perhaps the function declaration was commented out, I noticed that it is seemingly not declared in quant.py nor in arcas_utilities.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.