roseorenbuch / arcashla-quant Goto Github PK
View Code? Open in Web Editor NEWLicense: GNU General Public License v3.0
License: GNU General Public License v3.0
I'm trying to get the allele specific expression of arcasHLA-quant to work. So far I've used arcasHLA to extract all reads that map to chr6, and genotyped my sample using genotype
(from the original arcasHLA repo).
I'm now trying to run the quantification command of arcasHLA-quant. I've tried this by running
arcasHLA quant -o /my_output_folder/ /extracted_fastqs/sample1.extracted.1.fq.gz /extracted_fastqs/sample1.extracted.2.fq.gz
This results in the following error:
Traceback (most recent call last): File "/home/arcasHLA-quant/scripts/quant.py", line 178, in <module> indv_idx = args.ref + '.idx' TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Looking at the quant.py
script, it seems to me that this is because I did not supply a --ref
flag, and the algorithm then tries to do None + '.idx'
.
My first question: what file do I need to provide for the --ref flag and how can I create this?
I'm guessing this should come from the quant_ref
command referenced in the readme, however, it appears as though this has been renamed to customize
? I've tried running customize
instead of quant_ref
as described in the readme:
arcasHLA customize ~/arcas/sample1.genotype.json -o ~/arcas/ref
However, this results in the following error:
usage: arcasHLA customize [options] arcasHLA customize: error: unrecognized arguments: /arcas/sample1.genotype.json
/arcas/sample1.genotype.json
is the output from my genotyping call.
Can you advise? In particular, I am confused about whether or not I need a reference and what the purpose of this is? Also, if I have multiple samples, do I need to create one reference per sample or one reference for all samples?
Hi,
I firstly ran "arcasHLA genotype" and got a typing result:
"A": ["A*01:210:02", "A*26:130"]
Then I ran "customize" and "qunat", the quant result for the heterozygous alleles of geneA is:
gene | allele1 | allele2 | allele1_count | allele2_count | allele2_tpm | allele1_tpm | baf |
---|---|---|---|---|---|---|---|
A | A*01:210 | A*26:130 | 0 | 6014 | 28402 | 0 |
So if the count for allele1 is 0, why it can be genotyped instead of Homozygous ?
Thanks!
Hello, thanks for this tool. This issue was first described in #6 . I was wondering if there was any update on davetangs error and why it might be occurring? I'm getting the same error thrown when running the quantification step.
Traceback (most recent call last):
File "scripts/quant.py", line 300, in <module>
baf = allele_results[gene]['allele1_count'] / (allele_results[gene]['allele1_count'] + allele_results[gene]['allele2_count'])
ZeroDivisionError: division by zero
Thanks again
Hi there,
I was just wondering what the difference is between this repository and https://github.com/RabadanLab/arcasHLA?
Thanks!
Dave
Hi all,
I've been trying to run some of my samples through the arcasHLA toolkit and I've been encountering some errors when attempting to run my samples using both the customize and quant steps. For both, I've received the following traceback:
Traceback (most recent call last):
File "/rnd/users/sdea/XXX/software/arcasHLA/scripts/customize.py", line 299, in <module>
build_custom_reference(subject, genotype, args.grouping, args.transcriptome, temp)
File "/rnd/users/sdea/XXX/software/arcasHLA/scripts/customize.py", line 77, in build_custom_reference
dummy_HLA_dict = SeqIO.to_dict(SeqIO.parse(dummy_HLA_fa, 'fasta'))
File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/__init__.py", line 627, in parse
i = iterator_generator(handle)
File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/FastaIO.py", line 181, in __init__
super().__init__(source, alphabet=alphabet, mode="t", fmt="Fasta")
File "/home/sdea/anaconda3/envs/arcashla_env/lib/python3.6/site-packages/Bio/SeqIO/Interfaces.py", line 42, in __init__
self.stream = open(source, "r" + mode)
FileNotFoundError: [Errno 2] No such file or directory: '/rnd/users/sdea/XXX/software/arcasHLA/scripts/../dat/ref/GRCh38.chr6.HLA.fasta'
Following the path to where this file is supposed to be in the arcasHLA directory shows that there is no such file present. Am I missing a step prior to generate this file?
Additionally, the "wget https://www.dropbox.com/s/pcmwc4o1dxw7owk/arcasHLA-quant_reference.zip" returns an error stating that the dropbox location has since been deleted.
Thank you for your time!
I input my original bam file (which have the whole genome mapped reads) to arcasHLA and input chr6-splited bam file to optitype for genotyping. I notice that arcasHLA first splits the chr6 bam and then goes for the rest. So the input of arcasHLA and optitype should be the same.
But when I got genotyping results, I found they are largely different.
I'll paste one of my sample results as an example:
From arcasHLA:
Sample1: HLA-A\*24:361, HLA-A\*24:361,HLA-B\*59:10N, HLA-B\*59:01:01,HLA-C\*01:185Q, HLA-C\*01:185Q
From Optitype:
Sample1: A*02:06 A*24:02 B*51:01 B*59:01 C*01:02 C*14:02
I'm not very familiar with programming and HLA genotyping. I'm not sure what cause these differences. Any suggestions?
BTW, should the first four number extracted from 6-digit genotype equals the 4-digit genotye for the same patient?
Thank you.
Hello,
The quant command doesn't appear to be available to run from either the conda install or github clone of arcasHLA
Initially I installed using conda install arcas-hla
to my current python environment. Then ran extraction and genotyping successfully on 42 BAM files. Finally, I merged all typed results to genotypes.tsv
Now i'd like to quantify the allele reads for each HLA type in each sample but only the following commands are available:
` Usage: arcasHLA [options]
extract extracts chromosome 6 reads from bam
genotype types HLA genes from extracted reads
partial types partial HLA genes from extracted reads
merge processes results into a tab-separated table
convert converts HLA nomenclature/resolution
reference check or update HLA reference
Note: run any command with --help to view required fields, options `
If I clone this GH repo I can see that there is a quant script in the scripts dir but can't seem to make it work manually by doing python scripts/quant.py ../genotypes.tsv
(The error msg is:
Traceback (most recent call last): File "arcasHLA/scripts/quant.py", line 178, in <module> indv_idx = args.ref + '.idx' TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
)
Please could you explain how to run the quant / quant_ref commands of this package? Or perhaps update the readme to be slightly clearer?
@tpereachamblee
I have no issue running extract
and genotype
as part of my Snakemake pipeline. I am now testing adding quant
to it.
My testing environment is a simple mamba env with only arcasHLA
and its requirements. git-lfs
is installed and up to date, accessible everywhere including the environment.
I got this error when trying to run customize
according to how it's suggested in #6 .
$arcasHLA customize --transcriptome chr6 --genotype A0508148.genotype.json -o ./ref
None
{<genotyped output removed for privacy>}
Traceback (most recent call last):
File "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/scripts/customize.py", line 317, in <module>
build_custom_reference(subject, genotype, args.grouping, args.transcriptome, temp)
File "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/scripts/customize.py", line 92, in build_custom_reference
transcriptome.append(dummy_HLA_dict[transcript])
I went ahead and took a look at your code and what's going on at this line.
I ran the offending line from the error manually:
$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:21)
[GCC 9.4.0] on linux
$ from Bio import SeqIO
$ dummy_hla_fa = "/home/a.vliet/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/dat/ref/GRCh38.chr6.HLA.fasta"
$ dummy_HLA_dict = SeqIO.to_dict(SeqIO.parse(dummy_hla_fa, 'fasta'))
$ dummy_HLA_dict
{}
So only an empty dictionary is returned. Maybe it's an issue with git-lfs
but as I said I made sure to check that it's installed and working, and I had no issue genotyping.
The fasta file in question looks like this:
$ cat ~/miniconda3/envs/arcas/share/arcas-hla-0.5.0-3/dat/ref/GRCh38.chr6.HLA.fasta
version https://git-lfs.github.com/spec/v1
oid sha256:b0df533500a0b7418214bb36a05368155148415f1a557f0a05dde78afb248b00
size 1693517
Any thoughts on what could be going wrong?
I was running this tool from the extraction step through the quantification step and I noticed that when trying to run arcasHLA quant, I received the following error:
arcasHLA quant --ref XXX -t 8 XXX.extracted.fq.gz Traceback (most recent call last): File "/rnd/users/sdea/software/arcasHLA/scripts/quant.py", line 162, in <module> num, avg, std = analyze_reads(args.file, paired, reads_file, False) NameError: name 'analyze_reads' is not defined
After looking into the source code to see if perhaps the function declaration was commented out, I noticed that it is seemingly not declared in quant.py nor in arcas_utilities.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.