drukbam's Introduction

DrukBam

DrukBam is a program for plotting alignment files (.bam), made for command-line aficionados.

DrukBam can be used with or without a reference FASTA file and allows fast plotting of multiple variants or regions of interest. Please send feedback, such as bug reports or options you find missing. I wrote this program because I could not find a convincing tool for fast plotting of alignments without a GUI such as IGV or Tablet.

reference free

including a reference

split reads by strand

bigger span

changing plt style to dark or bmh

highlight soft/hard clipped reads by threshold --> visualize insertion points

Installing

requirements

  • pysam
  • pandas
  • matplotlib
  • tqdm

installation

DrukBam is available via pypi:

pip install drukbam==1.1.4

docker image

docker pull stephanholgerdrukewitz/drukbam:1.1.4

docker usage


docker run -it --rm -v $PWD:/data stephanholgerdrukewitz/drukbam:1.1.4 DrukBam region -s 281367 -e 281468 -c 19 -b /data/test_data/test_small.bam --outfmt png -i example_out_small --maxcoverage 60 --outlineoff


❗ ❗ versions <1.1.4 are deprecated and should not be used anymore ❗ ❗
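
Since releases before 1.1.4 are deprecated, it can be handy to check the installed release from Python before running a batch of plots. The sketch below is only an illustration: it assumes the distribution name is `drukbam` (as in the pip command above), and `parse_version`, `is_supported`, and `installed_drukbam_version` are hypothetical helper names, not part of DrukBam.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First non-deprecated release according to the note above.
MIN_VERSION = (1, 1, 4)


def parse_version(text):
    """Turn a version string such as '1.1.4' into a tuple of up to three ints."""
    return tuple(int(number) for number in re.findall(r"\d+", text)[:3])


def is_supported(installed):
    """Return True if the given version string is 1.1.4 or newer."""
    return parse_version(installed) >= MIN_VERSION


def installed_drukbam_version():
    """Return the installed version string, or None if drukbam is not installed.

    Assumes the distribution is registered under the name 'drukbam'.
    """
    try:
        return version("drukbam")
    except PackageNotFoundError:
        return None
```

A typical use would be to call `installed_drukbam_version()` and pass the result to `is_supported()` when it is not `None`.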


Usage

DrukBam vcf
usage: DrukBam vcf [-h] -b BAM -v VCF [-p PADDING] [--highlight]
                   [--threads THREADS] [--maxcoverage MAXCOVERAGE]
                   [--direction] [--schematic] [--style STYLE] [--fasta FASTA]
                   [--outputdir OUTPUTDIR] [-i ID] [--chunksize CHUNKSIZE]
                   [--outfmt OUTFMT] [--outlineoff]

optional arguments:
  -h, --help            show this help message and exit

required arguments:
  -b BAM, --bam BAM     Pos. sorted and indexed bam file
  -v VCF, --vcf VCF     vcf file with variants of interest
  -p PADDING, --padding PADDING
                        number of nt around the variant
  --highlight           highlight the position of interest

optional arguments:
  --threads THREADS     number of cpu's to run in paralell, ROI <1000 will
                        always use 1 core
  --maxcoverage MAXCOVERAGE
                        max cov to plot
  --direction           split reads by forward and reverse
  --schematic           plot no nucleotide, recommended for ROI>1000
  --style STYLE         different style options for the plot, provide .ini
                        file
  --fasta FASTA         fasta file for reference related plotting
  --outputdir OUTPUTDIR
                        directory for output
  -i ID, --id ID        output filename
  --chunksize CHUNKSIZE
                        max size of visualized area, can be increases but will
                        sow down calculation
  --outfmt OUTFMT       format of plot, choose between pdf,svg,png
  --outlineoff          plotting of read outline

DrukBam region
usage: DrukBam region [-h] -b BAM -c CHROMOSOME -s START -e END
                    [--threads THREADS] [--maxcoverage MAXCOVERAGE]
                    [--direction] [--schematic] [--style STYLE]
                    [--fasta FASTA] [--outputdir OUTPUTDIR] [-i ID]
                    [--chunksize CHUNKSIZE] [--outfmt OUTFMT] [--outlineoff]

optional arguments:
-h, --help            show this help message and exit

required arguments:
-b BAM, --bam BAM     Pos. sorted and indexed bam file
-c CHROMOSOME, --chromosome CHROMOSOME
                      name of chromosome/contig
-s START, --start START
                      start of region of interest
-e END, --end END     end of the region of interest

optional arguments:
--threads THREADS     number of cpu's to run in paralell, ROI <1000 will
                      always use 1 core
--maxcoverage MAXCOVERAGE
                      max cov to plot
--direction           split reads by forward and reverse
--schematic           plot no nucleotide, recommended for ROI>1000
--style STYLE         different style options for the plot, provide .ini
                      file
--fasta FASTA         fasta file for reference related plotting
--outputdir OUTPUTDIR
                      directory for output
-i ID, --id ID        output filename
--chunksize CHUNKSIZE
                      max size of visualized area, can be increases but will
                      sow down calculation
--outfmt OUTFMT       format of plot, choose between pdf,svg,png
--outlineoff          plotting of read outline


Usage Examples:

The following command will create an image of that region:

DrukBam region  -s 281367 -e 281468   -c 19 -b test_data/test_small.bam  --outfmt png  -i example_out --maxcoverage 60 --outlineoff --fasta test_data/chr19_first500k.fasta

The arguments used above are:

-s start of ROI

-e end of ROI

-c chromosome of ROI

-b alignment file, sorted and indexed

--outfmt format of plot

-i ID which is used for naming the plot

--maxcoverage y-axis maximum of the plot

--outlineoff don't draw outlines around every read

--fasta location of ref. fasta

The following command will plot all positions in a vcf file:

DrukBam vcf -b test_data/test_small.bam  -v example.vcf --padding 100  -i example_vcf --maxcoverage 60  --fasta test_data/chr19_first500k.fasta --threads 12

The arguments used above are:

-b alignment file, sorted and indexed

-v vcf file

--padding number of nt around each variant

-i ID which is used for naming the plot

--maxcoverage y-axis maximum of the plot

--fasta location of ref. fasta

--threads number of CPUs to use
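
To make the relationship between the two subcommands more concrete, the following sketch turns VCF positions plus a padding value into equivalent `DrukBam region` command lines. It rests on assumptions: that the padding is applied symmetrically around the variant position, and that clamping the start at 1 is acceptable. How DrukBam computes the window internally is not documented here. The helper names and the position 281400 in the usage note are made up for illustration; only the flags `-c`, `-s`, `-e`, `-b`, and `-i` come from the usage text above.

```python
import shlex


def vcf_positions(lines):
    """Yield (chrom, pos) from VCF text lines, skipping headers and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        yield chrom, int(pos)


def padded_window(pos, padding):
    """Return (start, end) around a 1-based position, with start clamped at 1.

    Symmetric padding is an assumption, not confirmed DrukBam behaviour.
    """
    return max(1, pos - padding), pos + padding


def region_command(chrom, pos, padding, bam, prefix):
    """Build a 'DrukBam region' command line for one variant (hypothetical helper)."""
    start, end = padded_window(pos, padding)
    args = [
        "DrukBam", "region",
        "-c", chrom,
        "-s", str(start),
        "-e", str(end),
        "-b", bam,
        "-i", f"{prefix}_{chrom}_{pos}",
    ]
    return shlex.join(args)
```

For a variant on chromosome 19 at position 281400 with a padding of 100, `region_command("19", 281400, 100, "test_data/test_small.bam", "example")` produces a command covering 281300 to 281500.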

Style changes:

  • color and style can be changed using the --style option and providing a style.ini file
  • official matplotlib colors are allowed (see the matplotlib color list)
  • pltstyle can be changed (see the matplotlib style list)

drukbam's People

Contributors

stephanholgerd, stephanholgerdrukewitz

drukbam's Issues

Conda install not working

Hello,

I am getting an error when trying to install with conda:

$ conda install -c stephanholgerd drukbam
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  - drukbam

Current channels:

  - https://conda.anaconda.org/stephanholgerd/osx-64
  - https://conda.anaconda.org/stephanholgerd/noarch
  - https://repo.anaconda.com/pkgs/main/osx-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/r/osx-64
  - https://repo.anaconda.com/pkgs/r/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page

Could you please help me?

Thanks

Index error: string index out of range when trying to plot alignments using a reference fasta

Hi! :) Great tool to visualize specific regions of alignments! I am trying to use it together with a reference fasta, but I keep getting the following error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/site-packages/DrukBam/vcfParse.py", line 51, in PlotV
    ploter.Plot()
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/site-packages/DrukBam/MapPlot.py", line 77, in Plot
    self.PlotNuc()
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/site-packages/DrukBam/MapPlot.py", line 297, in PlotNuc
    self.CalcPlot.PlotNucChunk(d,self.ax,chunk[1][1],chunk[1][2],flag=self.flag)
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/site-packages/DrukBam/PlotCalc.py", line 287, in PlotNucChunk
    if self.fasta != 'None' and fastaChunk[fastapos]==query_alignment_sequence[p]:
IndexError: string index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/bin/DrukBam", line 11, in <module>
    sys.exit(main())
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/site-packages/DrukBam/__main__.py", line 101, in main
    ploter.MultiPlot()
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/site-packages/DrukBam/vcfParse.py", line 59, in MultiPlot
    results = pool.starmap(self.PlotV, multi)
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: string index out of range
/Users/leti/miniconda3/envs/nanoporeProcessing/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown  
  warnings.warn('resource_tracker: There appear to be %d '

This is the command I am running:
vcf -b ../risk_alleles/sorted_reads_risk.bam -v ../VCF/testparsing.vcf --padding 100 -i test_region_1 --maxcoverage 20 --highlight --fasta ref/chr11_500k.fasta

I do not have any issues when running it without the --fasta flag, so I assume there is some problem with the reference file, but I cannot figure out what it is. I made sure that all the SNPs I want to show fall within the interval of the reference genome that I downloaded. Do you have any idea what could be going wrong?
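
One thing that may be worth ruling out here (a guess, not a confirmed diagnosis): with `--padding 100`, the plotted window extends 100 nt on either side of each SNP, so a variant close to the end of a truncated reference could produce a window that runs past the available sequence even though the SNP itself is covered. A contig name mismatch between the VCF and the FASTA is another possibility. The sketch below checks both using plain text parsing; `fasta_lengths` and `window_problems` are hypothetical helpers and do not reflect how DrukBam reads the reference.

```python
def fasta_lengths(lines):
    """Map contig name -> sequence length from FASTA text lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The contig name is the first word after '>'.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def window_problems(variants, lengths, padding):
    """Return (chrom, pos, reason) for padded windows the reference cannot cover.

    `variants` is an iterable of (chrom, 1-based pos) pairs.
    """
    problems = []
    for chrom, pos in variants:
        if chrom not in lengths:
            problems.append((chrom, pos, "contig missing from FASTA"))
        elif pos + padding > lengths[chrom]:
            problems.append((chrom, pos, "window runs past end of contig"))
    return problems
```

An empty result from `window_problems` would suggest looking elsewhere, for example at whether the reference coordinates start at position 1 of the chromosome.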

`ValueError: invalid literal for int() with base 10: '151=1'` in PlotCalc.py

Hi!

Your tool looks amazing and perfect for programmatically creating visualisations for a pipeline I am working on.
However, I haven't been able to get it to run on my data. This is my command:

DrukBam region \
    -s 1 -e 16569 -c chrM \
    -b hifi_reads.bam \
    --fasta ref.fasta \
    --outfmt png \
    -i test \
    --chunksize 16569 \
    --threads 4

Output:

9449it [01:37, 97.30it/s] 
/home/UNIXHOME/cnolan/miniconda3/envs/drukbam/lib/python3.9/site-packages/DrukBam/PlotCalc.py:149: UserWarning: FixedFormatter should only be used together with FixedLocator
  ax.set_xticklabels([str("{:,}".format(start)),str("{:,}".format(end))], rotation=40, ha='right')
Traceback (most recent call last):
  File "/home/UNIXHOME/cnolan/miniconda3/envs/drukbam/bin/DrukBam", line 11, in <module>
    sys.exit(main())
  File "/home/UNIXHOME/cnolan/miniconda3/envs/drukbam/lib/python3.9/site-packages/DrukBam/__main__.py", line 82, in main
    ploter.Plot()
  File "/home/UNIXHOME/cnolan/miniconda3/envs/drukbam/lib/python3.9/site-packages/DrukBam/MapPlot.py", line 77, in Plot
    self.PlotNuc()
  File "/home/UNIXHOME/cnolan/miniconda3/envs/drukbam/lib/python3.9/site-packages/DrukBam/MapPlot.py", line 297, in PlotNuc
    self.CalcPlot.PlotNucChunk(d,self.ax,chunk[1][1],chunk[1][2],flag=self.flag)
  File "/home/UNIXHOME/cnolan/miniconda3/envs/drukbam/lib/python3.9/site-packages/DrukBam/PlotCalc.py", line 218, in PlotNucChunk
    chunk_cigarstring=self.CigChunker(cig)
  File "/home/UNIXHOME/cnolan/miniconda3/envs/drukbam/lib/python3.9/site-packages/DrukBam/PlotCalc.py", line 180, in CigChunker
    cigL=cigL+parseCig(cig)[1]
  File "/home/UNIXHOME/cnolan/miniconda3/envs/drukbam/lib/python3.9/site-packages/DrukBam/PlotCalc.py", line 176, in parseCig
    liste=([cig[p]]*int(cig[:p]))
ValueError: invalid literal for int() with base 10: '151=1'

Looks like a problem with parsing a CIGAR string? Any help troubleshooting would be appreciated
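
For context, the SAM specification defines nine CIGAR operators (`M`, `I`, `D`, `N`, `S`, `H`, `P`, `=`, `X`), and HiFi alignments often use `=` and `X` instead of `M`. The failing value `'151=1'` would be consistent with `=` not being treated as an operator, although that is an inference from the traceback and not a confirmed cause. The sketch below shows one way to split and expand a CIGAR string that accepts all nine operators; `parse_cigar` and `expand_cigar` are illustrative names and this is not a patch for DrukBam's `parseCig`.

```python
import re

# One CIGAR element: a run length followed by one of the nine SAM operators.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs.

    Raises ValueError if any part of the string is not a valid element.
    '*' (CIGAR unavailable) yields an empty list.
    """
    if cigar == "*":
        return []
    pairs = []
    pos = 0
    for match in CIGAR_OP.finditer(cigar):
        if match.start() != pos:
            raise ValueError(f"unexpected text in CIGAR at offset {pos}: {cigar!r}")
        pairs.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    if pos != len(cigar):
        raise ValueError(f"unexpected text in CIGAR at offset {pos}: {cigar!r}")
    return pairs


def expand_cigar(cigar):
    """Return one operator character per element unit, e.g. '2=1X' -> ['=', '=', 'X']."""
    expanded = []
    for length, op in parse_cigar(cigar):
        expanded.extend(op * length)
    return expanded
```

With this approach a string such as `151=1X20=` is split into three elements, while a truncated fragment like `151=1` is rejected with an explicit error.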

Is --threads parameter working?

I ran this program with the "--threads 32" option, but the CPU utilization still shows only one process working.
Am I missing something, or is it a bug?
The drukbam version is 1.1.4, and my configuration is Ubuntu 22.04.4 LTS x86_64 with an AMD Ryzen 9 5950X.

Thanks in advance!
