mortazavilab / lapa

Alternative polyadenylation detection from diverse data sources such as 3'-seq, long-read and short-reads.

Home Page: https://www.biorxiv.org/content/10.1101/2022.11.08.515683v1

Python 2.25% Jupyter Notebook 97.72% R 0.03%

bioinformatics long-reads polyadenylation rna-seq transcription

lapa's Issues

Pychopper

Hi @MuhammedHasan

Would you recommend running pychopper upstream for ONT reads as an additional QC step, or do you think it wouldn't make a difference because of the way LAPA looks for the polyA signal?

Mustafa

lapa_correct_talon ValueError: Samples in abundance file do not match with read_annot file.

It seems that lapa_correct_talon looks for a column named samples in the read_annot file generated by TALON. That column does not exist; the sample names are in the dataset column instead.
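
One way to confirm the mismatch before running lapa_correct_talon is to compare the dataset names in the two files directly. The sketch below assumes that read_annot is tab-separated with a dataset column (as described above) and that the abundance file has one column per dataset. The file contents and the helper name datasets_missing_from_abundance are made up for illustration and are not part of LAPA or TALON.

```python
import csv
import io


def datasets_missing_from_abundance(read_annot_tsv, abundance_tsv):
    """Return dataset names found in read_annot that have no abundance column."""
    reads = csv.DictReader(io.StringIO(read_annot_tsv), delimiter="\t")
    annot_datasets = {row["dataset"] for row in reads}

    # Only the header row of the abundance table is needed for the comparison.
    header = next(csv.reader(io.StringIO(abundance_tsv), delimiter="\t"))
    return sorted(annot_datasets - set(header))


# Hypothetical file contents: "treat_1" is spelled "treat1" in the abundance file.
read_annot = "read_name\tdataset\tchrom\nr1\tctrl_1\tchr1\nr2\ttreat_1\tchr1\n"
abundance = "gene_ID\ttranscript_ID\tctrl_1\ttreat1\n1\t1\t5\t3\n"

missing = datasets_missing_from_abundance(read_annot, abundance)
print(missing)
```

An empty list means every dataset in read_annot has a matching abundance column, so the names themselves are not the problem.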

AttributeError: module 'numpy' has no attribute 'int'

Following the Google Colab Jupyter notebook, I ran all the code up to preparing the config, GTF and FASTA files successfully. However, when I ran the following:

! lapa  --alignment sample_config.csv \
        --fasta /home/mustafa/projects/ReferenceGenomes/gencode/v41/GRCh38.primary_assembly.genome.fa \
        --annotation gencode.v41.primary_assembly.annotation.utr_fixed.gtf \
        --chrom_sizes gencode.v41.chrom_sizes \
        --output_dir LAPA_PolyAClusterCalling

After a while I get the error below:

/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:594: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  df_all = df.groupby(cols).agg('sum').reset_index()
/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py:598: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  _df = _df.groupby(cols).agg('sum').reset_index()
Traceback (most recent call last):
  File "/root/miniconda3/envs/LAPA/bin/lapa", line 8, in <module>
    sys.exit(cli_lapa())
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/lapa.py", line 143, in counting
    counter._to_bigwig(df_all_count, sample_counts, self.chrom_sizes,
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py", line 561, in _to_bigwig
    save_count_bw(df_all, output_dir, chrom_sizes, f'all_{prefix}')
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py", line 197, in save_count_bw
    BaseCounter._to_bigwig(df, chrom_sizes, output_dir, prefix)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/count.py", line 153, in _to_bigwig
    bw_from_pyranges(
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/lapa/utils/io.py", line 153, in bw_from_pyranges
    gr['+'].to_bigwig(bw_pos_file, chromosome_sizes=chrom_sizes,
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/pyranges.py", line 5381, in to_bigwig
    result = _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/out.py", line 189, in _to_bigwig
    gr = self.to_rle(rpm=rpm, strand=False, value_col=value_col).to_ranges()
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/pyranges.py", line 5745, in to_rle
    return _to_rle(self, value_col, strand=strand, rpm=rpm, nb_cpu=nb_cpu)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/methods/to_rle.py", line 22, in _to_rle
    result = pyrange_apply_single(coverage, ranges, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/multithreaded.py", line 382, in pyrange_apply_single
    result = call_f_single(function, nparams, df, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyranges/multithreaded.py", line 31, in call_f_single
    return f.remote(df, **kwargs)
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/pyrle/methods.py", line 167, in coverage
    runs, values = _coverage(_df.Position.values, _df.Value.values)
  File "pyrle/src/coverage.pyx", line 67, in pyrle.src.coverage._coverage
  File "/root/miniconda3/envs/LAPA/lib/python3.8/site-packages/numpy/__init__.py", line 284, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'int'

lapa was installed with pip in a new conda environment (python=3.8).

Any help would be very welcome.
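
The last frames of the traceback are in pyrle's compiled coverage.pyx, which still refers to np.int. That alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so a fresh pip install that pulls in a recent NumPy can hit this error. A commonly used workaround is to pin NumPy below 1.24 in the environment (for example pip install "numpy<1.24"); whether that, or a newer pyrle/pyranges build, is the right fix for a given setup is an assumption to verify. The small helper below, whose name numpy_lacks_int_alias is hypothetical, only reports whether the installed NumPy is in the affected range.

```python
def numpy_lacks_int_alias(version):
    """Return True if this NumPy version string is 1.24 or newer (np.int removed)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 24)


try:
    import numpy
    installed = numpy.__version__
except ImportError:
    # NumPy is not installed in this interpreter; nothing to report.
    installed = None

if installed is not None:
    print(installed, "-> np.int missing:", numpy_lacks_int_alias(installed))
```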

ValueError: need at least one array to concatenate

Hi,
I am running LAPA with the command:
lapa --alignment /data/salomonis-archive/FASTQs/Grimes/RNA/scRNASeq/10X-Genomics/LGCHMC53-17GEX/PacbioPBMC/PacbioPBMC/outs/possorted_genome_bam.bam --fasta hg38.fa --annotation hg38.gtf --chrom_sizes hg38.chrom_sizes --output_dir pbmc_pacbio_1

I am getting this error:
a3-2020/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 288, in __call__
df_all_count, sample_counts = self.counting(alignment)
File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/lapa.py", line 142, in counting
df_all_count, sample_counts = counter.to_df()
File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 583, in to_df
df = pd.concat([
File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 584, in <listcomp>
self.build_counter(row['path'])
File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 142, in to_df
return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
File "/users/raw6jg/.local/lib/python3.8/site-packages/lapa/count.py", line 136, in to_gr
return pr.PyRanges(df).count_overlaps(
File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/pyranges.py", line 1322, in count_overlaps
counts = pyrange_apply(_number_overlapping, self, other, **kwargs)
File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/multithreaded.py", line 236, in pyrange_apply
result = call_f(function, nparams, df, odf, kwargs)
File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/multithreaded.py", line 23, in call_f
return f.remote(df, odf, **kwargs)
File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/pyranges/methods/coverage.py", line 27, in _number_overlapping
_self_indexes, _other_indexes = oncls.all_overlaps_both(
File "ncls/src/ncls32.pyx", line 76, in ncls.src.ncls32.NCLS32.all_overlaps_both
File "ncls/src/ncls32.pyx", line 122, in ncls.src.ncls32.NCLS32.all_overlaps_both
File "<__array_function__ internals>", line 5, in resize
File "/usr/local/anaconda3-2020/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 1417, in resize
a = concatenate((a,) * n_copies)
File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

Could you advise what might be causing this error?

Thanks

latest version issue: RuntimeError: The entries you tried to add are out of order, precede already added entries, or otherwise use illegal values. Please correct this and try again.

Hi Muhammed,

Well done with the documentation updates! This is great. I have upgraded to the latest, as suggested. However, I have come across an issue: (full error at the bottom)

lapa command: lapa --alignment samples.csv --fasta GRCh38.primary_assembly.genome.fa --annotation hg39.utr_fixed.gtf --chrom_sizes chrom_sizes --output_dir lapa_c_vs_t

These are the same input files that worked with the previous version, except that I fixed the UTRs as described in the docs:

#gencode_utr_fix --input_gtf mm10.gtf --output_gtf mm10.utr_fixed.gtf

wget -O - https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.annotation.gtf.gz | gunzip -c > hg38.gtf

gencode_utr_fix --input_gtf hg38.gtf --output_gtf hg39.utr_fixed.gtf

gencode_utr_fix --input_gtf gencode.v39.primary_assembly.annotation.gtf --output_gtf hg39.utr_fixed.gtf

Both of these fail with the main lapa command:

.....
.....
[E::idx_find_and_load] Could not retrieve index file for '/home/pthorpe/scratch/mustafa/lapa/reads_bams/R6_Trt_LONG.fastq.gz.temp.mapped.bam'
Traceback (most recent call last):
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/bin/lapa", line 8, in <module>
sys.exit(cli_lapa())
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 288, in __call__
df_all_count, sample_counts = self.counting(alignment)
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 143, in counting
counter._to_bigwig(df_all_count, sample_counts, self.chrom_sizes,
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 561, in _to_bigwig
save_count_bw(df_all, output_dir, chrom_sizes, f'all_{prefix}')
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 197, in save_count_bw
BaseCounter._to_bigwig(df, chrom_sizes, output_dir, prefix)
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/count.py", line 153, in _to_bigwig
bw_from_pyranges(
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/utils/io.py", line 153, in bw_from_pyranges
gr['-'].to_bigwig(bw_neg_file, chromosome_sizes=chrom_sizes,
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/pyranges.py", line 5339, in to_bigwig
result = _to_bigwig(self, path, chromosome_sizes, rpm, divide, value_col, dryrun)
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/out.py", line 203, in _to_bigwig
bw.addEntries(chromosomes, starts, ends=ends, values=values)
RuntimeError: The entries you tried to add are out of order, precede already added entries, or otherwise use illegal values.
Please correct this and try again.

Would you be able to help?

regards,

Pete
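
The message comes from the bigWig writer's addEntries call, which, as the wording says, rejects intervals that are out of order or use illegal values. One thing worth ruling out is a mismatch between the coordinates being written and the chrom_sizes file, for example sizes taken from a different assembly than the BAM files. The sketch below runs that kind of check on made-up intervals; the function name find_illegal_entries and the sample data are hypothetical and not part of LAPA, and the ordering rule (header order, then start position) is an assumption about what the writer expects.

```python
def find_illegal_entries(intervals, chrom_sizes):
    """Return a list of (interval, reason) for entries a bigWig writer may reject.

    intervals: (chrom, start, end) tuples in the order they would be written.
    chrom_sizes: dict of chromosome name -> length; dict order is the header order.
    """
    order = {chrom: index for index, chrom in enumerate(chrom_sizes)}
    problems = []
    previous = None
    for chrom, start, end in intervals:
        if chrom not in chrom_sizes:
            problems.append(((chrom, start, end), "chromosome not in chrom_sizes"))
            continue
        if start < 0 or end <= start or end > chrom_sizes[chrom]:
            problems.append(((chrom, start, end), "coordinates out of bounds"))
        key = (order[chrom], start)
        if previous is not None and key < previous:
            problems.append(((chrom, start, end), "out of order"))
        previous = key
    return problems


# Hypothetical data: one unsorted entry, one past the chromosome end, one unknown contig.
sizes = {"chr1": 1000, "chr2": 500}
entries = [("chr1", 10, 20), ("chr1", 5, 8), ("chr2", 490, 510), ("chrM", 0, 5)]
for interval, reason in find_illegal_entries(entries, sizes):
    print(interval, reason)
```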

Fix sorted-nearest version 0.0.33

Dear Lapa,

FIX: the following resolves everything described below:

mamba install -c bioconda pyranges (it seems the pip version of pyranges does not have all that is required).

I installed lapa via pip install lapa (I also had to run pip install cython). This is in a conda environment running Python 3.8.

If I run:

$ lapa
Traceback (most recent call last):
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/bin/lapa", line 5, in <module>
from lapa.main import cli
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/__init__.py", line 1, in <module>
from lapa.main import lapa
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/main.py", line 2, in <module>
from lapa.lapa import lapa
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/lapa/lapa.py", line 3, in <module>
import pyranges as pr
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/__init__.py", line 137, in <module>
import pyranges.genomicfeatures as gf
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/genomicfeatures.py", line 7, in <module>
from sorted_nearest.src.introns import find_introns
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/sorted_nearest/__init__.py", line 7, in <module>
from sorted_nearest.src.k_nearest_ties import get_all_ties, get_different_ties
ImportError: cannot import name 'get_all_ties' from 'sorted_nearest.src.k_nearest_ties' (/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/sorted_nearest/src/k_nearest_ties.cpython-38-x86_64-linux-gnu.so)

If I then force it to use python3.8 with the lapa.py from GitHub, I get the same error:

$ python3.8 "/mnt/shared/scratch/pthorpe/private/mustafa/lapa/lapa/lapa/lapa.py"
Traceback (most recent call last):
File "/mnt/shared/scratch/pthorpe/private/mustafa/lapa/lapa/lapa/lapa.py", line 3, in <module>
import pyranges as pr
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/__init__.py", line 137, in <module>
import pyranges.genomicfeatures as gf
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/pyranges/genomicfeatures.py", line 7, in <module>
from sorted_nearest.src.introns import find_introns
File "/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/sorted_nearest/__init__.py", line 7, in <module>
from sorted_nearest.src.k_nearest_ties import get_all_ties, get_different_ties
ImportError: cannot import name 'get_all_ties' from 'sorted_nearest.src.k_nearest_ties' (/mnt/shared/scratch/pthorpe/apps/conda/envs/python38/lib/python3.8/site-packages/sorted_nearest/src/k_nearest_ties.cpython-38-x86_64-linux-gnu.so)

If I then run pip install pyranges, it says the requirement is already satisfied.
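
The ImportError suggests that the installed pyranges expects a function (get_all_ties) that the installed sorted_nearest build does not provide, which points to a version mismatch between the two packages rather than a missing install. Listing the installed versions makes that easier to report or compare against a working environment. This is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_versions and the choice of packages to list are illustrative assumptions.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages that appear in the traceback, plus related pyranges dependencies.
report = installed_versions(["lapa", "pyranges", "sorted_nearest", "ncls", "pyrle"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```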

ValueError: Invalid win_type gaussian

/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/count.py:594: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
Traceback (most recent call last):
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/bin/lapa", line 33, in <module>
    sys.exit(load_entry_point('lapa==0.0.5', 'console_scripts', 'lapa')())
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/main.py", line 112, in cli_lapa
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/lapa.py", line 497, in lapa
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/lapa.py", line 293, in __call__
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/lapa.py", line 149, in clustering
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 377, in to_df
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 378, in <listcomp>
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 255, in to_dict
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 135, in to_dict
  File "/home/sutai/.local/lib/python3.10/site-packages/lapa-0.0.5-py3.10.egg/lapa/cluster.py", line 118, in peak
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/pandas/core/generic.py", line 11986, in rolling
    return Window(
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/pandas/core/window/rolling.py", line 165, in __init__
    self._validate()
  File "/home/sutai/biosoft/localcolabfold/localcolabfold/colabfold-conda/lib/python3.10/site-packages/pandas/core/window/rolling.py", line 1168, in _validate
    raise ValueError(f"Invalid win_type {self.win_type}")
ValueError: Invalid win_type gaussian

Refining transcript 3' and 5' ends with LAPA

Hi @MuhammedHasan

I see that you have 'lapa.correction.Transcript'. How would I use it?

To my understanding, LAPA works at the 'gene_id' level by default. How would I change the analysis so that it uses 'transcript_id' instead?

As always thank you
Mustafa

Support for non-stranded paired-end data

Hi there, I encountered this error when running LAPA on a single short-read BAM file. What do you advise to solve this? Thanks!

lapa --alignment ${illumina_bam_dir}/${bamfile} --fasta ${reference_genome_fa} --annotation ${reference_gtf} --chrom_sizes ${chrom_sizes} --output_dir ${outdir}/vb_annot/${samplename}_illumina

Error:

Traceback (most recent call last):
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/bin/lapa", line 8, in <module>
    sys.exit(cli_lapa())
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/main.py", line 112, in cli_lapa
    lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/lapa.py", line 497, in lapa
    _lapa(alignment)
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/lapa.py", line 288, in __call__
    df_all_count, sample_counts = self.counting(alignment)
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/lapa.py", line 142, in counting
    df_all_count, sample_counts = counter.to_df()
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/count.py", line 583, in to_df
    df = pd.concat([
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/count.py", line 584, in <listcomp>
    self.build_counter(row['path'])
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/lapa/count.py", line 142, in to_df
    return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
  File "/hpcdata/bcbb/homc/conda_envs/envs/lapa_env/lib/python3.9/site-packages/pandas/core/generic.py", line 6212, in astype
    raise KeyError(
KeyError: "Only a column name can be used for the key in a dtype mappings argument. 'Chromosome' not found in columns."

My chrom.sizes file for Anopheles gambiae looks like this, in case that helps (it was generated using samtools faidx as instructed)

AgamP4_2L 49364325
AgamP4_2R 61545105
AgamP4_3L 41963435
AgamP4_3R 53200684
AgamP4_UNKN 42389979
AgamP4_X 24393108
AgamP4_Y_unplaced 237045
AAAB01000047 21505
AAAB01000163 28420
AAAB01000448 22809
AAAB01000791 62303
(..more contigs..)
AgamP4_Mt 15363
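
The KeyError says the counting step produced a table with no Chromosome column, which most likely means the table was empty, i.e. no reads were counted for this BAM. There are several possible reasons for that; one that is quick to rule out is a naming mismatch between the references in the BAM header and the chrom_sizes file. The sketch below compares the two from plain text, assuming the header was exported with samtools view -H. The header shown is invented for illustration (only the chrom_sizes names come from the listing above), and the helper names are hypothetical.

```python
def sam_header_chroms(header_text):
    """Collect reference names from the @SQ lines of a SAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def chrom_sizes_names(chrom_sizes_text):
    """Collect chromosome names from a two-column chrom_sizes file."""
    return {line.split()[0] for line in chrom_sizes_text.splitlines() if line.strip()}


# Invented header in which the references lack the "AgamP4_" prefix.
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:2L\tLN:49364325\n@SQ\tSN:2R\tLN:61545105\n"
sizes = "AgamP4_2L 49364325\nAgamP4_2R 61545105\n"

bam_only = sam_header_chroms(header) - chrom_sizes_names(sizes)
print(sorted(bam_only))
```

If the difference is empty, the names agree and the cause lies elsewhere, for example in the read layout (non-stranded, paired-end) named in the issue title.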

error: "Only a column name can be used for the key in a dtype mappings argument"

Hi Muhammed,
I'm trying to test lapa with short-read RNA-seq data. I'm using hisat2 for the mapping (I built the hg38 index with transcripts using the files suggested in the lapa tutorial), and my Python version is 3.9.

After fixing the GTF file and putting all the inputs in the right format, lapa failed while processing the BAM for the first sample with the following error:

$ lapa --alignment samples.csv --fasta genome.fa --annotation genome_utr.gtf --chrom_sizes chrom_sizes --output_dir lapa_test
Traceback (most recent call last):
File "/home/eortiz/.local/bin/lapa", line 8, in <module>
sys.exit(cli_lapa())
File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/main.py", line 112, in cli_lapa
lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 288, in __call__
df_all_count, sample_counts = self.counting(alignment)
File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/lapa.py", line 142, in counting
df_all_count, sample_counts = counter.to_df()
File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 583, in to_df
df = pd.concat([
File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 584, in <listcomp>
self.build_counter(row['path'])
File "/home/eortiz/.local/lib/python3.9/site-packages/lapa/count.py", line 142, in to_df
return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
File "/zfs/gcl/software/gbf/anaconda3/2021.11/lib/python3.9/site-packages/pandas/core/generic.py", line 5791, in astype
raise KeyError(
KeyError: 'Only a column name can be used for the key in a dtype mappings argument.'

I know this error is raised when the column names don't match exactly, but I'm not sure how to fix it.
Any suggestions are welcome.

Thanks.

other PolyA motifs

Dear @MuhammedHasan

Great work on LAPA; I'm planning to start using it very soon. I have a question regarding the polyA motifs. According to the preprint, LAPA looks for the canonical AATAAA. Is this the only motif it searches for before determining polyA site usage? Have you considered adding other motifs, such as:

aataaa
attaaa
agtaaa
tataaa
cataaa
gataaa
aatata
aataca
aataga
aaaaag
actaaa
aagaaa
aatgaa
tttaaa
aaaaca
ggggct

I have done some preliminary analysis with SQANTI3, and the motif distribution changes between cell types: AATAAA usage goes from 70% in one to 46% in another. The second most used motif is ATTAAA. Others, such as AAAAAG and AGTAAA, show the largest percentage changes between cell types.

Kind Regards
Mustafa
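
To make the request concrete, scanning for the extended motif list could look like the sketch below. It is not LAPA's implementation: the function name find_polya_signals, the idea of passing the sequence immediately upstream of a candidate site, and the example sequence are all assumptions for illustration. Only the motif list is taken from the question above.

```python
import re

# Motif list from the question above, upper-cased.
MOTIFS = [
    "AATAAA", "ATTAAA", "AGTAAA", "TATAAA", "CATAAA", "GATAAA", "AATATA", "AATACA",
    "AATAGA", "AAAAAG", "ACTAAA", "AAGAAA", "AATGAA", "TTTAAA", "AAAACA", "GGGGCT",
]


def find_polya_signals(upstream_seq, motifs=MOTIFS):
    """Return (motif, distance) pairs for every motif occurrence.

    upstream_seq is the sequence ending at the candidate polyA site;
    distance is measured from the motif start to the end of that sequence.
    """
    seq = upstream_seq.upper()
    hits = []
    for motif in motifs:
        # A lookahead keeps overlapping occurrences of the same motif.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((motif, len(seq) - match.start()))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up 38 nt upstream sequence containing ATTAAA and AATAAA.
example = "CC" + "ATTAAA" + "CCCC" + "AATAAA" + "C" * 20
signals = find_polya_signals(example)
print(signals)
```

Reporting every hit with its distance, rather than only the first match, keeps it possible to tabulate motif usage per cell type afterwards.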

output files

Dear Lapa,

I have run your tool, currently as a test. Could you add some documentation, or pass on some wisdom, on which output files to expect?

I have bw and bed files for all my conditions. I also have polyA_cluster.bed: what do the columns stand for, and what does "None@None" in the last column mean?

I have no other files. I was wondering whether the program terminated early; however, there is no error or warning in any of the Slurm output files (the RAM limit was not exceeded).

Can you please advise?

Kind regards,

Pete

AssertionError: Can only do stranded operations when both PyRanges contain strand info

I am using LAPA on PacBio data aligned with minimap2.

I got the following error:

File "/usr/nzx-cluster/apps/lapa/python3.8.11/bin/lapa", line 8, in <module>
sys.exit(cli_lapa())
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/main.py", line 112, in cli_lapa
lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/lapa.py", line 297, in __call__
df_cluster = self.annotate_cluster(df_cluster)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/lapa.py", line 155, in annotate_cluster
df = self.create_genomic_regions().annotate(gr)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/lapa/genomic_regions.py", line 66, in annotate
gr_ann = pr.PyRanges(gr.df, int64=True).join(
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/pyranges/pyranges_main.py", line 2433, in join
dfs = pyrange_apply(_write_both, self, other, **kwargs)
File "/usr/nzx-cluster/apps/lapa/python3.8.11/lib/python3.8/site-packages/pyranges/multithreaded.py", line 207, in pyrange_apply
assert (
AssertionError: Can only do stranded operations when both PyRanges contain strand info
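
The assertion is raised inside the join between the polyA clusters and the annotation, and it fires when one of the two sides is not considered stranded. As far as I understand pyranges, an object counts as stranded only if every record has a strand of + or -, so a few records with "." on either side would be enough. A quick way to check the annotation side is to count the values in the strand column of the GTF. The sketch below does that on a two-line invented example; the helper name strand_counts is hypothetical.

```python
from collections import Counter


def strand_counts(gtf_text):
    """Count the values in the strand column (7th field) of GTF records."""
    counts = Counter()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) >= 7:
            counts[fields[6]] += 1
    return counts


# Invented records: the second gene has no strand assigned.
example_gtf = (
    'chr1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n'
    'chr1\tsrc\tgene\t2000\t2900\t.\t.\t.\tgene_id "g2";\n'
)
counts = strand_counts(example_gtf)
unstranded = sum(n for strand, n in counts.items() if strand not in {"+", "-"})
print(dict(counts), "records without a strand:", unstranded)
```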

ValueError: new categories must not include old categories

Hi, I am using lapa on DRS and cDNA ONT data. It runs smoothly on the DRS reads, but on the cDNA reads it throws an error at the clustering stage.

I used the following command:
lapa --alignment alignment.csv --fasta /references/reference/ucsc/rn7.fa --annotation /references/reference/ucsc/lapa_utrs_ncbiRefSeq.gtf --chrom_sizes /references/reference/ucsc/chrom_sizes.txt --output_dir /ANALYSES/rat/cDNA/LAPA

And here is the traceback:
Traceback (most recent call last):
File "/usr/local/software/lapa/eb16fee/bin/lapa", line 11, in <module>
load_entry_point('lapa==0.0.5', 'console_scripts', 'lapa')()
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/main.py", line 122, in cli_lapa
non_replicates_read_threhold=non_replicates_read_threhold)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 297, in __call__
df_cluster = self.annotate_cluster(df_cluster)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/lapa.py", line 155, in annotate_cluster
df = self.create_genomic_regions().annotate(gr)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/lapa/genomic_regions.py", line 67, in annotate
gr_gtf, strandedness='same', how='left')
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/pyranges.py", line 2257, in join
dfs = pyrange_apply(_write_both, self, other, **kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/multithreaded.py", line 236, in pyrange_apply
result = call_f(function, nparams, df, odf, kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/multithreaded.py", line 23, in call_f
return f.remote(df, odf, **kwargs)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 129, in _write_both
scdf, ocdf = _both_dfs(scdf, ocdf, how=how)
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 83, in _both_dfs
oh = null_types(ocdf.head(1))
File "/usr/local/software/lapa/eb16fee/lib/python3.6/site-packages/pyranges/methods/join.py", line 67, in null_types
tmp_cat = tmp_cat.cat.add_categories("-1")
File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/accessor.py", line 89, in f
return self._delegate_method(name, *args, **kwargs)
File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 2403, in _delegate_method
res = method(*args, **kwargs)
File "/usr/local/software/python/3.6.11/lib/python3.6/site-packages/pandas/core/arrays/categorical.py", line 1023, in add_categories
raise ValueError(msg.format(already_included=already_included))
ValueError: new categories must not include old categories: {'-1'}

I would be grateful for any help resolving this issue.
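The final frames of the traceback above can be reproduced with pandas alone. The sketch below shows that `Series.cat.add_categories` raises this `ValueError` when the categorical already contains the category being added. It then shows a guarded variant. The helper name `add_missing_marker` and the sample gene IDs are hypothetical, and the thread does not confirm that a literal "-1" value in the annotation is what triggers the error here; this is only one plausible way to reach the same message.

```python
import pandas as pd


def add_missing_marker(values, marker="-1"):
    """Build a categorical Series and add `marker` as a category only if absent.

    Hypothetical helper for illustration; not part of lapa or pyranges.
    """
    cat = pd.Series(values, dtype="category")
    if marker not in cat.cat.categories:
        cat = cat.cat.add_categories(marker)
    return cat


# Reproduce the error: "-1" is already one of the categories.
gene_ids = pd.Series(["ENSG01", "-1"], dtype="category")
try:
    gene_ids.cat.add_categories("-1")
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)

# The guarded variant does not raise for the same input.
safe = add_missing_marker(["ENSG01", "-1"])
print(list(safe.cat.categories))
```

If a column in the GTF or cluster table does contain the string "-1", checking for it before running lapa may help narrow down the cause.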

pyranges error "ValueError: all elements of `new_shape` must be non-negative"

Hi Muhammed,

I got the following pyranges error:

Traceback (most recent call last):
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/bin/lapa", line 8, in <module>
sys.exit(cli_lapa())
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/main.py", line 112, in cli_lapa
lapa(alignment, fasta, annotation, chrom_sizes, output_dir,
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/lapa.py", line 497, in lapa
_lapa(alignment)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/lapa.py", line 288, in __call__
df_all_count, sample_counts = self.counting(alignment)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/lapa.py", line 142, in counting
df_all_count, sample_counts = counter.to_df()
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/count.py", line 583, in to_df
df = pd.concat([
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/count.py", line 584, in <listcomp>
self.build_counter(row['path'])
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/count.py", line 142, in to_df
return self.to_gr().df.astype({'Chromosome': 'str', 'Strand': 'str'})
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/lapa/count.py", line 136, in to_gr
return pr.PyRanges(df).count_overlaps(
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/pyranges/pyranges_main.py", line 1385, in count_overlaps
counts = pyrange_apply(_number_overlapping, self, other, **kwargs)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/pyranges/multithreaded.py", line 231, in pyrange_apply
result = call_f(function, nparams, df, odf, kwargs)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/pyranges/multithreaded.py", line 21, in call_f
return f.remote(df, odf, **kwargs)
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/pyranges/methods/coverage.py", line 26, in _number_overlapping
_self_indexes, _other_indexes = oncls.all_overlaps_both(starts, ends, indexes)
File "ncls/src/ncls.pyx", line 74, in ncls.src.ncls.NCLS64.all_overlaps_both
File "ncls/src/ncls.pyx", line 115, in ncls.src.ncls.NCLS64.all_overlaps_both
File "<array_function internals>", line 5, in resize
File "/cluster/work/bewi/members/dondia/Anaconda3/envs/snakemake/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 1423, in resize
raise ValueError('all elements of new_shape must be non-negative')
ValueError: all elements of new_shape must be non-negative

In case it helps, here is one read from the BAM file:

molecule/4051_GGCAATACTCGTGACC_B900_Tum_B900_Tum 16 chr1 14424 12 406M140N69M757N108M1I44M659N159M92N198M177N56M GATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAACACTGTGGCGCAGGCTGGGTGGAGCCGTCCCCCCATGGAGCACAGGCAGACAGAAGTCCCCGCCCCAGCTGTGTGGCCTCAAGCCAGCCTTCCGCTCCTTGAAGCTGGTCTCCACACAGTGCTGGTTCCGTCACCCCCTCCCAAGGAAGTAGGTCTGAGCAGCTTGTCCTGGCTGTGTCCATGTCAGAGCAACGGCCCAAGTCTGGGTCTGGGGGGGAAGGTGTCATGGAGCCCCCTACGATTCCCAGTCGTCCTCGTCCTCCTCTGCCTGTGGCTGCTGCGGTGGCGGCAGAGGAGGGATGGAGTCTGACACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATCCGACATCAAGTGCCCACCTTGGCTCGTGGCTCTCACTTGCTCCTGCTCCTTCTGCTGCTTCTTCTCCAGCTTTCGCTCCTTCATGCTGCGCAGCTTGGCCTTGCCGATGCCCCCAGCTTGGCGGATGGACTCTAGCAGAGTGGCCCAGCCACCGGAGGGGTCAACCACTTCCCTGGGAGCTCCCTGGACTGAAGGAGACGCGCTGCTGCTGCTGTCGTCCTGCCTGGCGCCTTGGCCTACAGGGGCCGCGGTTGAGGGTGGGAGTGGGGGTGCACTGGCCAGCACCTCAGGAGCTGGGGGTGGTGGTGGGGGCGGTGGGGGTGGTGTTAGTACCCCATCTTGTAGGTCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATGGTGCCAGGGGCAGAGGGGGCAATGCCGGGGCCCAGGTCGGCAATGTACATGAGGTCGTTGGCAATGCCGGGCAGGTCAGGCAGGTAGGATGGAACATCAATCTCAGGCACCTGGCCCAGGTCTGGCACATAGAAGTAGTTCTCTGGGACCTGCTGTTCCAGCTGCTCTCTCTTGCTGATGGACAAGGGGGCATCAAACAGCTTCT * NM:i:3 ms:i:1031 AS:i:87nn:i:0 ts:A:+ tp:A:P cm:i:307 s1:i:987 s2:i:975 de:f:0.0029 rl:i:0

Let me know if you need any further details.

Thanks for the help.
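The last frame of the traceback above is raised by `numpy.resize`, which rejects any negative entry in the requested shape. The minimal sketch below reproduces only that final message with a recent NumPy version. It does not explain why ncls computed a negative size in the first place, which the thread leaves open.

```python
import numpy as np

# numpy.resize refuses a negative target length with the same message
# that appears at the bottom of the traceback.
try:
    np.resize(np.arange(3), -1)
    resize_error = None
except ValueError as err:
    resize_error = str(err)

print(resize_error)

# A non-negative length works and repeats the input as needed.
resized = np.resize(np.arange(3), 5)
print(resized.tolist())
```

Since the negative value is produced before `resize` is called, the underlying cause is likely upstream, in the interval data passed to `count_overlaps` or in ncls itself.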