
plastid's Introduction

Welcome to plastid!

For documentation, see our home page on ReadtheDocs.

To run the tests, download the test dataset and unpack it into plastid/test.

Introduction

plastid is a Python library for genomic analysis -- in particular, high-throughput sequencing data -- with an emphasis on simplicity for users. It was written by Joshua Dunn in Jonathan Weissman's lab at UCSF, initially for analysis of ribosome profiling and RNA-seq data. Versions of it have been used in several publications.

plastid's intended audience includes computational and traditional biologists, software developers, and even those who are new to sequencing analysis. It is released under the BSD 3-Clause license.

This package provides:

  1. A set of scripts that implement common sequencing analyses
  2. A set of classes for exploratory data analysis. These provide simple and consistent interfaces for manipulating genomic features, read alignments, and quantitative data, and they interface readily with existing scientific tools, like the SciPy stack (a brief example follows this list).
  3. Script writing tools that make it easy to use the objects implemented in plastid.
  4. Extensive documentation, both in source code and at our home page on ReadtheDocs.
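
As a brief taste of items 2 and 3, here is a minimal sketch of interactive use, loosely following plastid's documented interface: each read alignment is mapped to an estimated P-site 12 nt from its 5' end, and counts are tallied over each transcript. The file names and the 12 nt offset below are placeholders rather than shipped examples.

from plastid import BAMGenomeArray, FivePrimeMapFactory, BED_Reader, Transcript

# open alignments and map each read to a single estimated P-site position
alignments = BAMGenomeArray(["some_alignments.bam"])
alignments.set_mapping(FivePrimeMapFactory(offset=12))

# iterate over transcripts in a BED file and fetch per-nucleotide counts
for transcript in BED_Reader(open("some_transcripts.bed"), return_type=Transcript):
    counts = transcript.get_counts(alignments)   # vector of counts, 5' to 3'
    print(transcript.get_name(), sum(counts))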

Installation

Bioconda


Bioconda is a channel for the conda package manager with a focus on bioinformatics software. Once you have Bioconda installed, installing plastid is as easy as running:

$ conda create -n plastid plastid
$ source activate plastid

This will install all of the necessary dependencies for plastid in an isolated environment.

PyPI

plastid can be installed directly from PyPI:

$ pip install numpy pysam cython
$ pip install plastid

If you get runtime warnings about numpy versions having changed, about a missing module in Pysam, or about some object being the wrong size, try regenerating the included C source files from the original Cython code. To do this, type:

$ pip install --upgrade --install-option='--recythonize' plastid

Running the tests

  • NOTE: to run the entire test suite you'll first need to download our test dataset and unpack it into plastid/test/data.

We use nose as our test runner and test under different versions of Python using tox. To completely control the environment (e.g. compilers, etc.), we recommend running the tests inside the Docker container, which contains large data files needed for the tests that aren't packaged with plastid by default:

# build & run the Docker image from within the project folder
$ docker build --pull -t plastid .
$ docker run -it --rm plastid

# inside the container, run the tests over all default configurations
root@plastid $ tox

Our tox config lets developers run subsets of tests rather than the full suite. All positional arguments are passed through to nosetests:

# run all tests within the plastid.test.unit subpackage
root@plastid $ tox plastid.test.unit

# run tests in two files
root@plastid $ tox plastid.test.unit.genomics.readers.test_bed plastid.test.unit.util.io.test_binary

By default, tox recompiles all C extensions before running the tests. This can be slow. To avoid doing that, set the environment variable PLASTID_NOREBUILD to true:

# run unit tests without rebuilding the C extensions
root@plastid $ env PLASTID_NOREBUILD=true tox plastid.test.unit

Finally, if you only want to test in some, not all environments, you can do so with typical tox syntax:

# list available test environments
root@plastid $ tox -l
py36-pinned
py36-latest
py39-latest

# run only in 2 selected environments
root@plastid $ tox -e py36-pinned,py39-latest plastid.test.unit

Links & help

plastid's People

Contributors

jlncrnt, joshuagryphon, lparsons


plastid's Issues

Building wheel is failing in local install

I am trying to install this package locally from source using any of

pip install --user .
pip install --user -e .
pip install --user --global-option="--recythonize" .  # since --install-option was removed by pip

but I always get the following error

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting scipy>=0.15.1 (from plastid==0.6.1)
  Using cached scipy-1.11.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36.3 MB)
Collecting pandas>=0.17.0 (from plastid==0.6.1)
  Using cached pandas-2.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Requirement already satisfied: matplotlib>=1.4.0 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from plastid==0.6.1) (3.7.2)
Collecting biopython>=1.64 (from plastid==0.6.1)
  Using cached biopython-1.81-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Collecting twobitreader>=3.0.0 (from plastid==0.6.1)
  Using cached twobitreader-3.1.7-py3-none-any.whl
Collecting termcolor (from plastid==0.6.1)
  Using cached termcolor-2.3.0-py3-none-any.whl (6.9 kB)
Requirement already satisfied: numpy>=1.9.4 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from plastid==0.6.1) (1.23.5)
Requirement already satisfied: pysam>=0.8.4 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from plastid==0.6.1) (0.21.0)
Requirement already satisfied: cython>=0.22.0 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from plastid==0.6.1) (3.0.0)
Requirement already satisfied: contourpy>=1.0.1 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from matplotlib>=1.4.0->plastid==0.6.1) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from matplotlib>=1.4.0->plastid==0.6.1) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from matplotlib>=1.4.0->plastid==0.6.1) (4.41.1)
Requirement already satisfied: kiwisolver>=1.0.1 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from matplotlib>=1.4.0->plastid==0.6.1) (1.4.4)
Requirement already satisfied: packaging>=20.0 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from matplotlib>=1.4.0->plastid==0.6.1) (23.1)
Requirement already satisfied: pillow>=6.2.0 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from matplotlib>=1.4.0->plastid==0.6.1) (9.5.0)
Requirement already satisfied: pyparsing<3.1,>=2.3.1 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from matplotlib>=1.4.0->plastid==0.6.1) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from matplotlib>=1.4.0->plastid==0.6.1) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from pandas>=0.17.0->plastid==0.6.1) (2023.3)
Collecting tzdata>=2022.1 (from pandas>=0.17.0->plastid==0.6.1)
  Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Requirement already satisfied: six>=1.5 in /home/moritzloewer/anaconda3/lib/python3.10/site-packages (from python-dateutil>=2.7->matplotlib>=1.4.0->plastid==0.6.1) (1.16.0)
Building wheels for collected packages: plastid
  Building wheel for plastid (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for plastid (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [89 lines of output]
      running bdist_wheel
      running build
      running build_py
      copying plastid/genomics/roitools.pyx -> build/lib.linux-x86_64-cpython-310/plastid/genomics
      copying plastid/genomics/map_factories.c -> build/lib.linux-x86_64-cpython-310/plastid/genomics
      copying plastid/genomics/roitools.c -> build/lib.linux-x86_64-cpython-310/plastid/genomics
      copying plastid/genomics/c_common.c -> build/lib.linux-x86_64-cpython-310/plastid/genomics
      copying plastid/readers/bigwig.c -> build/lib.linux-x86_64-cpython-310/plastid/readers
      copying plastid/readers/bbifile.c -> build/lib.linux-x86_64-cpython-310/plastid/readers
      copying plastid/readers/bigbed.c -> build/lib.linux-x86_64-cpython-310/plastid/readers
      running build_ext
      building 'plastid.genomics.c_common' extension
      gcc -pthread -B /home/moritzloewer/anaconda3/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/moritzloewer/anaconda3/include -fPIC -O2 -isystem /home/moritzloewer/anaconda3/include -fPIC -I/home/moritzloewer/Bachelor/plastid/plastid/kent/src/htslib -I/home/moritzloewer/Bachelor/plastid/plastid/kent/src/inc -I/tmp/pip-build-env-i6xqx771/normal/lib/python3.10/site-packages/numpy/core/include -I/tmp/pip-build-env-i6xqx771/normal/lib/python3.10/site-packages/pysam -I/tmp/pip-build-env-i6xqx771/normal/lib/python3.10/site-packages/pysam/include/htslib -I/tmp/pip-build-env-i6xqx771/normal/lib/python3.10/site-packages/pysam/include/samtools -I/home/moritzloewer/anaconda3/include/python3.10 -c /home/moritzloewer/Bachelor/plastid/plastid/plastid/genomics/c_common.c -o build/temp.linux-x86_64-cpython-310/home/moritzloewer/Bachelor/plastid/plastid/plastid/genomics/c_common.o
      gcc -pthread -B /home/moritzloewer/anaconda3/compiler_compat -shared -Wl,--allow-shlib-undefined -Wl,-rpath,/home/moritzloewer/anaconda3/lib -Wl,-rpath-link,/home/moritzloewer/anaconda3/lib -L/home/moritzloewer/anaconda3/lib -Wl,--allow-shlib-undefined -Wl,-rpath,/home/moritzloewer/anaconda3/lib -Wl,-rpath-link,/home/moritzloewer/anaconda3/lib -L/home/moritzloewer/anaconda3/lib build/temp.linux-x86_64-cpython-310/home/moritzloewer/Bachelor/plastid/plastid/plastid/genomics/c_common.o -lssl -lcrypto -lz -o build/lib.linux-x86_64-cpython-310/plastid/genomics/c_common.cpython-310-x86_64-linux-gnu.so
      warning: plastid/genomics/map_factories.pyx:134:0: The 'IF' statement is deprecated and will be removed in a future Cython version. Consider using runtime conditions or C macros instead. See https://github.com/cython/cython/issues/4310
      warning: plastid/genomics/map_factories.pyx:145:0: The 'DEF' statement is deprecated and will be removed in a future Cython version. Consider using global variables, constants, and in-place literals instead. See https://github.com/cython/cython/issues/4310
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
      LONG   = np.long
      
      ctypedef np.int_t    INT_t
      ctypedef np.float_t  FLOAT_t
      ctypedef np.double_t DOUBLE_t
      ctypedef np.long_t   LONG_t
               ^
      ------------------------------------------------------------
      
      plastid/genomics/map_factories.pyx:154:9: 'long_t' is not a type identifier
      warning: plastid/genomics/map_factories.pyx:542:24: Index should be typed for more efficient access
      Compiling /home/moritzloewer/Bachelor/plastid/plastid/plastid/genomics/map_factories.pyx because it changed.
      [1/1] Cythonizing /home/moritzloewer/Bachelor/plastid/plastid/plastid/genomics/map_factories.pyx
      Traceback (most recent call last):
        File "/home/moritzloewer/anaconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/moritzloewer/anaconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/moritzloewer/anaconda3/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 416, in build_wheel
          return self._build_with_temp_dir(['bdist_wheel'], '.whl',
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 401, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 476, in <module>
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/__init__.py", line 107, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-i6xqx771/normal/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 346, in run
          self.run_command("build")
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "<string>", line 398, in run
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/Cython/Distutils/build_ext.py", line 122, in build_extension
          new_ext = cythonize(
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/Cython/Build/Dependencies.py", line 1134, in cythonize
          cythonize_one(*args)
        File "/tmp/pip-build-env-i6xqx771/overlay/lib/python3.10/site-packages/Cython/Build/Dependencies.py", line 1301, in cythonize_one
          raise CompileError(None, pyx_file)
      Cython.Compiler.Errors.CompileError: /home/moritzloewer/Bachelor/plastid/plastid/plastid/genomics/map_factories.pyx
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for plastid
Failed to build plastid
ERROR: Could not build wheels for plastid, which is required to install pyproject.toml-based projects

Alternatively, is there a way to import plastid in the Python interpreter running in the Anaconda environment (where installation succeeded)?

the p site offset problem

The psite offset text output shows an offset of 13 for 26 nt and 32 nt reads, but in the figure those positions are empty. Why is that?
thank you!

Conda / Cython incompatibility: GCC finding different versions of numpy; Cython also not called by pip (formerly: Can't install)

I have upgraded cython, pysam, and numpy.

jfreimer@Mason: ~/build-conda/plastid> pip install --upgrade numpy pysam cython
Requirement already up-to-date: numpy in /N/home/j/f/jfreimer/Mason/miniconda/lib/python2.7/site-packages
Requirement already up-to-date: pysam in /N/home/j/f/jfreimer/Mason/miniconda/lib/python2.7/site-packages
Requirement already up-to-date: cython in /N/home/j/f/jfreimer/Mason/miniconda/lib/python2.7/site-packages

However, I get the following error when I try to install plastid:

jfreimer@Mason: ~/build-conda/plastid> pip install plastid
Collecting plastid
  Using cached plastid-0.4.4.tar.gz
    Complete output from command python setup.py egg_info:


*** IMPORTANT INSTALLATION INFORMATION ***

plastid setup requires numpy>=1.9.0, pysam>=0.8.4, and cython>=0.22 to be preinstalled. Please
install these via pip, and retry:

    $ pip install --upgrade numpy pysam
    $ pip install plastid





----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2_hiEj/plastid

Generated .bed file has "0" in the score column - when generating a .bedgraph out of this .bed, no coverage of Ribo-seq data.

Dear Plastid Team,

I have been using plastid for about a week now. Everything runs fine, no errors. Metagene profile looks representative of regular riboseq data.

However, the .bed file that is returned (after the first 31 lines) when running the metagene generate function looks like this:

6 41079039 41079164 ENSG00000001167 0 + 41079089 41079090 0,0,0 1 125, 0,
6 46130172 46139783 ENSG00000001561 0 + 46139583 46139584 0,0,0 2 17,233, 0,9378,
4 11399805 11400055 ENSG00000002587 0 - 11400004 11400005 0,0,0 1 250, 0,
17 28355837 28357463 ENSG00000004142 0 - 28357447 28357448 0,0,0 2 39,176, 0,1450,
12 21491532 21499615 ENSG00000004700 0 - 21499569 21499570 0,0,0 2 184,61, 0,8022,
19 35755892 35757009 ENSG00000004776 0 - 35757007 35757008 0,0,0 2 2,199, 0,918,
X 11112018 11114934 ENSG00000004961 0 + 11112060 11112061 0,0,0 2 142,100, 0,2816,

From what I have checked of the BED file format, the column containing the "0" is the "score", which in this case I would assume should hold the ribosome-protected-fragment counts over the selected windows. Is that right?

I then make a .bedgraph file using this command:
$ cut -f1-3,5 my.bed > my.bedgraph

It looks like this (again, after the first 31 lines):

6 41079039 41079164 0
6 46130172 46139783 0
4 11399805 11400055 0
17 28355837 28357463 0
12 21491532 21499615 0
19 35755892 35757009 0
X 11112018 11114934 0

I would assume that the value column (the last one) should contain some non-zero numbers so as to allow visualization in genome browsers or other tools.

Is there something I am doing wrong?

Thank you in advance,
Alexandros

KeyError: 'zero_point' when running metagene count

Dear Plastid Team,

I am working through the tutorial for the metagene analysis and have run into an error that neither I nor our IT department has been able to solve. The metagene generate command runs successfully and I generated ROI files. However, the metagene count command generates the following error. I would appreciate any assistance or guidance in solving this problem.

-James

metagene [2018-07-09 18:29:33]: Opening ROI file merlin_cds_start_rois.txt ...
Traceback (most recent call last):
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3064, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'zero_point'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/apps/plastid/0.4.8/bin/metagene", line 11, in
load_entry_point('plastid==0.4.8', 'console_scripts', 'metagene')()
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/plastid/bin/metagene.py", line 1188, in main
do_count(args,al,pp,printer)
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/plastid/bin/metagene.py", line 821, in do_count
norm_start, norm_end = _get_norm_region(roi_table, args)
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/plastid/bin/metagene.py", line 744, in _get_norm_region
flank_upstream = roi_table["zero_point"][0]
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/pandas/core/frame.py", line 2688, in getitem
return self._getitem_column(key)
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
values = self._data.get(item)
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/usr/local/Anaconda/envs_app/plastid/0.4.8/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3066, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'zero_point'

`crossmap` output file contains very short reads?

Hi,

I am new to ribosome profiling, and plastid is very helpful. I followed your guides on the website, and in the genome-masking section I came across a problem. The command I used:
crossmap -k 26 --offset 12 --mismatches 2
I found that there were many short reads (even 1 base) in the output BED file:

chr10   12      513     chr10:12-513(+) 0       +       12      12      0,0,0   1       501,    0,
chr10   13      514     chr10:13-514(-) 0       -       13      13      0,0,0   1       501,    0,
chr10   539     540     chr10:539-540(+)        0       +       539     539     0,0,0   1       1,      0,
chr10   540     541     chr10:540-541(-)        0       -       540     540     0,0,0   1       1,      0,
chr10   590     593     chr10:590-593(+)        0       +       590     590     0,0,0   1       3,      0,
chr10   591     594     chr10:591-594(-)        0       -       591     591     0,0,0   1       3,      0,
chr10   603     631     chr10:603-631(+)        0       +       603     603     0,0,0   1       28,     0,
chr10   604     632     chr10:604-632(-)        0       -       604     604     0,0,0   1       28,     0,
chr10   694     695     chr10:694-695(+)        0       +       694     694     0,0,0   1       1,      0,
chr10   695     696     chr10:695-696(-)        0       -       695     695     0,0,0   1       1,      0,
chr10   697     698     chr10:697-698(+)        0       +       697     697     0,0,0   1       1,      0,
chr10   698     699     chr10:698-699(-)        0       -       698     698     0,0,0   1       1,      0,
chr10   730     733     chr10:730-733(+)        0       +       730     730     0,0,0   1       3,      0,
chr10   731     734     chr10:731-734(-)        0       -       731     731     0,0,0   1       3,      0,
chr10   764     1534    chr10:764-1534(+)       0       +       764     764     0,0,0   1       770,    0,
chr10   765     1535    chr10:765-1535(-)       0       -       765     765     0,0,0   1       770,    0,
chr10   1656    1819    chr10:1656-1819(+)      0       +       1656    1656    0,0,0   1       163,    0,
chr10   1657    1820    chr10:1657-1820(-)      0       -       1657    1657    0,0,0   1       163,    0,
chr10   1821    1873    chr10:1821-1873(+)      0       +       1821    1821    0,0,0   1       52,     0,
chr10   1822    1874    chr10:1822-1874(-)      0       -       1822    1822    0,0,0   1       52,     0,
chr10   1875    1877    chr10:1875-1877(+)      0       +       1875    1875    0,0,0   1       2,      0,
chr10   1876    1878    chr10:1876-1878(-)      0       -       1876    1876    0,0,0   1       2,      0,
chr10   1986    1987    chr10:1986-1987(+)      0       +       1986    1986    0,0,0   1       1,      0,
chr10   1987    1988    chr10:1987-1988(-)      0       -       1987    1987    0,0,0   1       1,      0,
chr10   2140    2142    chr10:2140-2142(+)      0       +       2140    2140    0,0,0   1       2,      0,
chr10   2141    2143    chr10:2141-2143(-)      0       -       2141    2141    0,0,0   1       2,      0,
chr10   2143    2147    chr10:2143-2147(+)      0       +       2143    2143    0,0,0   1       4,      0,
chr10   2144    2148    chr10:2144-2148(-)      0       -       2144    2144    0,0,0   1       4,      0,
chr10   2198    2200    chr10:2198-2200(+)      0       +       2198    2198    0,0,0   1       2,      0,
chr10   2199    2201    chr10:2199-2201(-)      0       -       2199    2199    0,0,0   1       2,      0,
chr10   2256    2258    chr10:2256-2258(+)      0       +       2256    2256    0,0,0   1       2,      0,
chr10   2257    2259    chr10:2257-2259(-)      0       -       2257    2257    0,0,0   1       2,      0,
chr10   2303    2336    chr10:2303-2336(+)      0       +       2303    2303    0,0,0   1       33,     0,
chr10   2304    2337    chr10:2304-2337(-)      0       -       2304    2304    0,0,0   1       33,     0,

Did I miss some key parameter settings? Is this the right output file? Thanks in advance!

Remove deprecated features

Plastid 0.4.9 contains many deprecated features (largely in plastid.util.scriptlib.argparsers) that are scheduled to be removed in Plastid 0.5.0. These need to be removed, and tests for the deprecated code migrated to the replacement code.

Conda / Cython incompatibility: GCC finding different version of Pysam from that in Conda environment (formerly: possible pysam 0.9.1 compatibility issue)

Hi Josh!

Just installed plastid in a Mac OS X conda environment and thought I'd post the issues I ran into. Even in the virtual environment, the pip build failed, I think because pysam 0.6 is the latest on the defaults conda channel. I updated to the latest pysam I could find (0.9.1, on the bioconda channel), and voila, pip install plastid ran fine.

However, importing plastid raised the following error:
...from plastid.genomics.map_factories import *
File "cfaidx.pxd", line 40, in init plastid.genomics.map_factories (.../plastid/genomics/map_factories.c:24002)
ValueError: pysam.cfaidx.FastaFile has the wrong size, try recompiling

I rolled back to plastid's minimum pysam requirement (0.8.4) and now it's all peachy keen. Thought you'd want to know.

Thanks,
Calvin

Support CRAM files

A number of aligners are moving from BAM to CRAM. Because Pysam includes CRAM support, plastid could support these fairly easily by (a rough pysam sketch follows this list):

  • adjusting argument parsers in plastid.util.scriptlib.argparsers

  • minor changes to file handling/opening in BAMGenomeArray
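
For reference, the heavy lifting is already done by pysam: its AlignmentFile can read CRAM directly when given the reference FASTA used for compression. The sketch below only illustrates that pysam-level behavior; the file and chromosome names are placeholders, and a .crai index is assumed for fetch().

import pysam

# "rc" = read CRAM; the reference used for compression must be supplied
cramfile = pysam.AlignmentFile("aligned.cram", "rc", reference_filename="genome.fa")

for read in cramfile.fetch("chr1", 0, 10000):   # requires a .crai index
    print(read.query_name, read.reference_start)

cramfile.close()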

Undefined symbol caused by compilation error only under Python 2.7.17 (3.x, 2.7.15 are ok)

Phenomenon

Build is silently failing on recently upgraded versions of Ubuntu 18.04, in the sense that build completes, but importing plastid.genomics.map_factories results in an undefined symbol:

>>> from plastid import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "plastid/__init__.py", line 45, in <module>
    from plastid.genomics.genome_array import (
  File "plastid/genomics/genome_array.py", line 215, in <module>
    from plastid.genomics.map_factories import *
ImportError: plastid/genomics/map_factories.so: undefined symbol: bgzf_read

Terms

  • computer A: Running Ubuntu 18.04, but has not experienced a sudo apt-get upgrade in a long time
  • computer B: Running Ubuntu 18.04, upgraded weekly

Notes

undefined symbol smells of an issue with one or more of: the C compiler, header files, or source code. However:

  • the plastid source code hasn't changed
  • the relevant header files, which do come from pysam, have changed across versions of pysam, but:
    • all versions of pysam (including latest) work on computer A
    • no version of pysam (including that in requirements.txt) works on computer B

This points to the C compiler or something close to it.

Tests

GCC

This can't be the whole picture:

  • Computer A is running GCC 7.4.0. Computer B is running 7.5.0
  • Downgrading to GCC 7.4.0 on Computer B did not solve the problem
  • Upgrading to GCC 8.0.0 on Computer B also did not solve the problem
  • Build succeeds & all tests pass in a Docker container of Ubuntu 16.04 running GCC 4.x.x or similar

Python packages

These tests have been done in virtualenvs as well as Docker containers:

  • the error is not a function of Python dependencies, whether imported or cimported (working versions of all dependencies from computer A, when installed on computer B, triggered same error)
  • this is not a function of the Python build chain (versions of setuptools, pip, and Cython were more up-to-date on computer A than on computer B; upgrading computer B to the versions on computer A did not solve the problem)
  • Install succeeds on computer A in an Ubuntu 16.04 Docker container with the latest versions of everything in PyPI and the Ubuntu 16.04 repositories
  • Install fails on computer A in an Ubuntu 18.04 Docker container with the latest versions of everything in PyPI and the Ubuntu 16.04 repositories

Procedure for P-site estimation using stop codon peaks

Hi,

First of all, thank you for developing such a great tool.
It is easy to use and very helpful for my analysis.

I'd like to discuss my procedure for P-site estimation using stop codon peaks, not initiation peaks.

I am analyzing a ribosome profiling data, but it shows diminished initiation peaks due to poor stress response to lysis. So the P-site estimation using the initiation peaks generated noisy results.

Thus, I tried the P-site estimation using stop codon peaks.
Your tutorial mentions it but suggests using another way instead of psite script.
(http://plastid.readthedocs.io/en/latest/examples/p_site.html#id13)

But I think it could be accomplished simply by:

  1. run metagene generate with --landmark stop_codon option (set R.O.I to the stop codons)
  2. run psite script with the R.O.I.
  3. find peaks around 16-21 nt instead of 10-15 nt from the 5' end (+6 nt relative to using initiation peaks, because the stop codon would be immediately downstream of the A site)

I think this is an easier way, and I actually found some peaks at 16-17 nt in my data.
N. Ingolia's Science paper (Science, 2009) also used this estimation (see Fig. 2B in the paper):
http://science.sciencemag.org/content/324/5924/218.figures-only

I look forward to hearing from you whether it makes sense or not.
Please correct me if I am wrong.

Thanks.

YY.

Roitools module error when running psite: "ValueError: too many values to unpack (expected 2)"

Hi,

I am getting this error when running psite :

psite -q my_rois.txt test --min_length 22 --max_length 34 --require_upstream --count_files harr.bam

psite [2020-10-30 10:19:10]: Opening count files harr.bam ...
psite [2020-10-30 10:19:10]: Counted 942136 total reads.
psite [2020-10-30 10:19:10]: Counted 1 ROIs ...
psite [2020-10-30 10:19:11]: Counted 1001 ROIs ...
psite [2020-10-30 10:19:11]: Counted 2001 ROIs ...
psite [2020-10-30 10:19:12]: Counted 3001 ROIs ...
psite [2020-10-30 10:19:13]: Counted 4001 ROIs ...
psite [2020-10-30 10:19:13]: Counted 5001 ROIs ...
psite [2020-10-30 10:19:14]: Counted 6001 ROIs ...
psite [2020-10-30 10:19:15]: Counted 7001 ROIs ...
psite [2020-10-30 10:19:15]: Counted 8001 ROIs ...
psite [2020-10-30 10:19:16]: Counted 9001 ROIs ...
psite [2020-10-30 10:19:17]: Counted 10001 ROIs ...
psite [2020-10-30 10:19:18]: Counted 11001 ROIs ...
psite [2020-10-30 10:19:18]: Counted 12001 ROIs ...
Traceback (most recent call last):
File "/home/user/anaconda2/envs/bin/psite", line 8, in
sys.exit(main())
File "/home/user/anaconda2/envs/lib/python2.7/site-packages/plastid/bin/psite.py", line 345, in main
printer=printer)
File "/home/user/anaconda2/envs/lib/python2.7/site-packages/plastid/bin/psite.py", line 157, in do_count
roi = SegmentChain.from_str(row["region"])
File "plastid/genomics/roitools.pyx", line 3311, in plastid.genomics.roitools.SegmentChain.from_str
ValueError: too many values to unpack (expected 2)

I found this error when running Plastid 0.5.1, went back to Plastid 0.4.7, and it still happened. I have installed plastid through conda, and this is my environment:

  • _libgcc_mutex 0.1 conda_forge conda-forge
  • _openmp_mutex 4.5 1_gnu conda-forge
  • biopython 1.77 pypi_0 pypi
  • bowtie 1.3.0 py38hed8969a_1 bioconda
  • bzip2 1.0.8 h516909a_3 conda-forge
  • c-ares 1.11.0 h470a237_1 bioconda
  • ca-certificates 2020.6.20 hecda079_0 conda-forge
  • certifi 2020.6.20 py38h924ce5b_2 conda-forge
  • cycler 0.10.0 py_2 conda-forge
  • cython 0.29.21 py38h348cfbe_1 conda-forge
  • fastx_toolkit 0.0.14 0 bioconda
  • freetype 2.10.4 he06d7ca_0 conda-forge
  • jpeg 9d h516909a_0 conda-forge
  • kiwisolver 1.2.0 py38hbf85e49_1 conda-forge
  • krb5 1.17.1 hfafb76e_3 conda-forge
  • lcms2 2.11 hbd6801e_0 conda-forge
  • ld_impl_linux-64 2.35 h769bd43_9 conda-forge
  • libblas 3.9.0 2_openblas conda-forge
  • libcblas 3.9.0 2_openblas conda-forge
  • libcurl 7.71.1 hcdd3856_8 conda-forge
  • libdeflate 1.6 h516909a_0 conda-forge
  • libedit 3.1.20191231 he28a2e2_2 conda-forge
  • libev 4.33 h516909a_1 conda-forge
  • libffi 3.2.1 he1b5a44_1007 conda-forge
  • libgcc-ng 9.3.0 h5dbcf3e_17 conda-forge
  • libgfortran-ng 9.3.0 he4bcb1c_17 conda-forge
  • libgfortran5 9.3.0 he4bcb1c_17 conda-forge
  • libgomp 9.3.0 h5dbcf3e_17 conda-forge
  • libgtextutils 0.7 he1b5a44_6 bioconda
  • liblapack 3.9.0 2_openblas conda-forge
  • libnghttp2 1.41.0 hab1572f_1 conda-forge
  • libopenblas 0.3.12 pthreads_h4812303_1 conda-forge
  • libpng 1.6.37 hed695b0_2 conda-forge
  • libssh2 1.9.0 hab1572f_5 conda-forge
  • libstdcxx-ng 9.3.0 h2ae2ef3_17 conda-forge
  • libtiff 4.1.0 hc7e4089_6 conda-forge
  • libwebp-base 1.1.0 h516909a_3 conda-forge
  • lz4-c 1.9.2 he1b5a44_3 conda-forge
  • matplotlib-base 3.3.2 py38h4d1ce4f_1 conda-forge
  • ncurses 6.2 he1b5a44_2 conda-forge
  • nose 1.3.7 py_1006 conda-forge
  • numpy 1.19.2 py38hf89b668_1 conda-forge
  • olefile 0.46 pyh9f0ad1d_1 conda-forge
  • openssl 1.1.1h h516909a_0 conda-forge
  • pandas 1.1.3 py38hddd6c8b_2 conda-forge
  • perl 5.30.3 h516909a_1 conda-forge
  • pillow 8.0.1 py38h9776b28_0 conda-forge
  • pip 20.2.4 py_0 conda-forge
  • plastid 0.5.1 py38h197edbe_1 bioconda
  • pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
  • pysam 0.16.0.1 py38hbdc2ae9_1 bioconda
  • python 3.8.6 h852b56e_0_cpython conda-forge
  • python-dateutil 2.8.1 py_0 conda-forge
  • python_abi 3.8 1_cp38 conda-forge
  • pytz 2020.1 pyh9f0ad1d_0 conda-forge
  • readline 8.0 he28a2e2_2 conda-forge
  • scipy 1.5.2 py38hd9480d8_2 conda-forge
  • setuptools 49.6.0 py38h924ce5b_2 conda-forge
  • six 1.15.0 pyh9f0ad1d_0 conda-forge
  • sqlite 3.33.0 h4cf870e_1 conda-forge
  • tbb 2020.2 hc9558a2_0 conda-forge
  • termcolor 1.1.0 py_2 conda-forge
  • tk 8.6.10 hed695b0_1 conda-forge
  • tornado 6.0.4 py38h1e0a361_2 conda-forge
  • twobitreader 3.1.7 pyh864c0ab_1 bioconda
  • wheel 0.35.1 pyh9f0ad1d_0 conda-forge
  • xz 5.2.5 h516909a_1 conda-forge
  • zlib 1.2.11 h516909a_1010 conda-forge
  • zstd 1.4.5 h6597ccf_2 conda-forge

Thank you!

installation problems

Dear all,

I am having problems with the installation, and the solutions proposed in the documentation haven't worked so far. Your help would be really appreciated.

1st attempt: installing plastid with pip install gave me the following error:
"ERROR: Could not build wheels for plastid, which is required to install pyproject.toml-based projects"
Python version: Python 3.7.10; pip version: 22.0.4; Linux distribution: "CentOS Linux 7 (Core)"

2nd attempt: installing plastid in a new conda environment gave me this error:
PackagesNotFoundError: The following packages are not available from current channels:

3rd attempt: installing plastid in a vanilla environment (virtualenv), so that I can be sure there are no conflicts with other packages: "ERROR: Could not build wheels for plastid, which is required to install pyproject.toml-based projects"

Is there anything I am missing? Any system library or something like that? A problem with the gcc compiler?

Thank you very much in advance!

Find below the log of one of the installation attempts.

###########################################################################################
Using pip 22.0.4 from /home/lmateo/miniconda3/envs/orfrater/lib/python3.7/site-packages/pip (python 3.7)
Collecting plastid
Downloading plastid-0.6.1.tar.gz (1.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 8.5 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Installing backend dependencies: started
Installing backend dependencies: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting pysam>=0.8.4
Downloading pysam-0.19.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.0/15.0 MB 58.4 MB/s eta 0:00:00
Collecting twobitreader>=3.0.0
Downloading twobitreader-3.1.7.tar.gz (9.2 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting biopython>=1.64
Downloading biopython-1.79-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (2.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 12.2 MB/s eta 0:00:00
Link requires a different Python (3.7.10 not in: '>=3.8,<3.11'): https://files.pythonhosted.org/packages/c0/ad/e3c052ed4e0027a8abef0a5e8441a044427d252d17d9aee06d56e62fc698/scipy-1.8.0rc1.tar.gz#sha256=54adf5c1197d6c3de2e131dc71660bb11d4e449aff79c8c231bd05dc6ad307eb (from https://pypi.org/simple/scipy/) (requires-python:>=3.8,<3.11)
Link requires a different Python (3.7.10 not in: '>=3.8,<3.11'): https://files.pythonhosted.org/packages/29/d2/151a54944b333e465f98804dced31dab1284f3c37b752b9cefa710b64681/scipy-1.8.0rc2.tar.gz#sha256=d73b13eb0452c178f946b4db60b27e400225df02e926609652ed67798054e77d (from https://pypi.org/simple/scipy/) (requires-python:>=3.8,<3.11)
Link requires a different Python (3.7.10 not in: '>=3.8,<3.11'): https://files.pythonhosted.org/packages/e4/26/83dd1c6378513a6241d984bda9f08c512b6e35fff13fba3acc1b3c195f02/scipy-1.8.0rc3.tar.gz#sha256=7e4f827b37275b1264102f2cc063ba17a40fb858af7a98f63b3b1be2670a34b1 (from https://pypi.org/simple/scipy/) (requires-python:>=3.8,<3.11)
Link requires a different Python (3.7.10 not in: '>=3.8,<3.11'): https://files.pythonhosted.org/packages/22/78/056cc43e7737811b6f50886788a940f852773dd9804f5365952805db9648/scipy-1.8.0rc4.tar.gz#sha256=bfa5b17b108203c31388d362fe666b96c2f0aa1a094d460aab85440c22e6f22f (from https://pypi.org/simple/scipy/) (requires-python:>=3.8,<3.11)
Link requires a different Python (3.7.10 not in: '>=3.8,<3.11'): https://files.pythonhosted.org/packages/b4/a2/4faa34bf0cdbefd5c706625f1234987795f368eb4e97bde9d6f46860843e/scipy-1.8.0.tar.gz#sha256=31d4f2d6b724bc9a98e527b5849b8a7e589bf1ea630c33aa563eda912c9ff0bd (from https://pypi.org/simple/scipy/) (requires-python:>=3.8,<3.11)
Link requires a different Python (3.7.10 not in: '>=3.8,<3.11'): https://files.pythonhosted.org/packages/26/b5/9330f004b9a3b2b6a31f59f46f1617ce9ca15c0e7fe64288c20385a05c9d/scipy-1.8.1.tar.gz#sha256=9e3fb1b0e896f14a85aa9a28d5f755daaeeb54c897b746df7a55ccb02b340f33 (from https://pypi.org/simple/scipy/) (requires-python:>=3.8,<3.11)
Collecting scipy>=0.15.1
Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.1/38.1 MB 40.1 MB/s eta 0:00:00
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/29/e3/6bd596d81eaf9f5b35398fdac0c535efadd9bbf8d0f859739badf9f90c63/pandas-1.4.0rc0.tar.gz#sha256=c0d453fda0a87d51f5fe65c16a89b64f13a736f4f17c0202cfcff67e6b341a57 (from https://pypi.org/simple/pandas/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/4d/aa/e7078569d20f45e8cf6512a24bf2945698f13a7975650773c01366ea96dc/pandas-1.4.0.tar.gz#sha256=cdd76254c7f0a1583bd4e4781fb450d0ebf392e10d3f12e92c95575942e37df5 (from https://pypi.org/simple/pandas/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/c4/eb/cfa96ba42695b3c28d4864a796d492f188471dd536df7e5e5e0c54b629a6/pandas-1.4.1.tar.gz#sha256=8db93ec98ac7cb5f8ac1420c10f5e3c43533153f253fe7fb6d891cf5aa2b80d2 (from https://pypi.org/simple/pandas/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/5a/ac/b3b9aa2318de52e40c26ae7b9ce6d4e9d1bcdaf5da0899a691642117cf60/pandas-1.4.2.tar.gz#sha256=92bc1fc585f1463ca827b45535957815b7deb218c549b7c18402c322c7549a12 (from https://pypi.org/simple/pandas/) (requires-python:>=3.8)
Collecting pandas>=0.17.0
Downloading pandas-1.3.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 37.2 MB/s eta 0:00:00
Collecting cython>=0.22.0
Downloading Cython-0.29.30-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 43.7 MB/s eta 0:00:00
Collecting termcolor
Downloading termcolor-1.1.0.tar.gz (3.9 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting matplotlib>=1.4.0
Downloading matplotlib-3.5.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.2/11.2 MB 39.1 MB/s eta 0:00:00
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/7a/7f/5384d4c85a2349bd89ff1c0253bff77c7b617e48af201f2d823fc619189a/numpy-1.22.0rc1.zip#sha256=bc991b3f8ea7c0f6703df2bc23c098cfe6f1a3a5e8a3a901eb6a5619275d53ff (from https://pypi.org/simple/numpy/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/a1/4a/8fd68d9a0a9fe5419cdbb697ffc96259fb716bd952c5c64225991add7551/numpy-1.22.0rc2.zip#sha256=01810dc32c5ac4c895b5c0d285497e1eb52038834919f3d2eaddfb9526b20dc9 (from https://pypi.org/simple/numpy/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/17/d4/d3ac79fca81154e8d3e11232967024f480cfdc87dfeef479803716fb20cb/numpy-1.22.0rc3.zip#sha256=0b5642efe2a36f2191102b44bb95ee1479f14c1adb2d7155303e50b2517e43bc (from https://pypi.org/simple/numpy/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/50/e1/9b0c184f04b8cf5f3c941ffa56fbcbe936888bdac9aa7ba6bae405ac752b/numpy-1.22.0.zip#sha256=a955e4128ac36797aaffd49ab44ec74a71c11d6938df83b1285492d277db5397 (from https://pypi.org/simple/numpy/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/0a/c8/a62767a6b374a0dfb02d2a0456e5f56a372cdd1689dbc6ffb6bf1ddedbc0/numpy-1.22.1.zip#sha256=e348ccf5bc5235fc405ab19d53bec215bb373300e5523c7b476cc0da8a5e9973 (from https://pypi.org/simple/numpy/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/e9/6c/c0a8130fe198f27bab92f1b28631e0cc2572295f6b7a31e87efe7448aa1c/numpy-1.22.2.zip#sha256=076aee5a3763d41da6bef9565fdf3cb987606f567cd8b104aded2b38b7b47abf (from https://pypi.org/simple/numpy/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/64/4a/b008d1f8a7b9f5206ecf70a53f84e654707e7616a771d84c05151a4713e9/numpy-1.22.3.zip#sha256=dbc7601a3b7472d559dc7b933b18b4b66f9aa7452c120e87dfb33d02008c8a18 (from https://pypi.org/simple/numpy/) (requires-python:>=3.8)
Link requires a different Python (3.7.10 not in: '>=3.8'): https://files.pythonhosted.org/packages/f6/d8/ab692a75f584d13c6542c3994f75def5bce52ded9399f52e230fe402819d/numpy-1.22.4.zip#sha256=425b390e4619f58d8526b3dcf656dde069133ae5c240229821f01b5f44ea07af (from https://pypi.org/simple/numpy/) (requires-python:>=3.8)
Collecting numpy>=1.9.4
Downloading numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 42.2 MB/s eta 0:00:00
Collecting python-dateutil>=2.7
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 KB 102.7 MB/s eta 0:00:00
Collecting packaging>=20.0
Downloading packaging-21.3-py3-none-any.whl (40 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.8/40.8 KB 53.7 MB/s eta 0:00:00
Collecting cycler>=0.10
Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting fonttools>=4.22.0
Downloading fonttools-4.33.3-py3-none-any.whl (930 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 930.9/930.9 KB 66.3 MB/s eta 0:00:00
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.4.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 50.2 MB/s eta 0:00:00
Collecting pillow>=6.2.0
Downloading Pillow-9.1.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 47.6 MB/s eta 0:00:00
Collecting pyparsing>=2.2.1
Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.3/98.3 KB 90.8 MB/s eta 0:00:00
Collecting pytz>=2017.3
Downloading pytz-2022.1-py2.py3-none-any.whl (503 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 503.5/503.5 KB 50.5 MB/s eta 0:00:00
Collecting typing-extensions
Downloading typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting six>=1.5
Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Building wheels for collected packages: plastid, twobitreader, termcolor
Building wheel for plastid (pyproject.toml): started
Building wheel for plastid (pyproject.toml): finished with status 'error'
Building wheel for twobitreader (setup.py): started
Building wheel for twobitreader (setup.py): finished with status 'done'
Created wheel for twobitreader: filename=twobitreader-3.1.7-py3-none-any.whl size=9602 sha256=48feed40c805dbbcffafd2fcbc7731b1b7b30f4b3ee9e461f75cd207f46827ae
Stored in directory: /tmp/pip-ephem-wheel-cache-_hfd3tv8/wheels/d7/4f/70/282382e10dc13fd946f157888d9652437b01e3ac4541ac4b4b
Building wheel for termcolor (setup.py): started
Building wheel for termcolor (setup.py): finished with status 'done'
Created wheel for termcolor: filename=termcolor-1.1.0-py3-none-any.whl size=4848 sha256=31dede504b8826f170c8c0b1e06eb37fc09f3a70ce4b543f7beae55ea0fb5050
Stored in directory: /tmp/pip-ephem-wheel-cache-_hfd3tv8/wheels/3f/e3/ec/8a8336ff196023622fbcb36de0c5a5c218cbb24111d1d4c7f2
Successfully built twobitreader termcolor
Failed to build plastid

Requesting help on how to exclude codons from RFP density measurement

Hi,
First - thank you so much for sharing this tool! I'm relatively new to the data analysis world, and your tool helped me understand how to deal with my ribosome profiling results.
There is one problem I don't think I understand yet how to approach - specifically, I would like to count ribosome footprints over CDSs, but effectively exclude the first and last several codons. You mention this briefly in your tutorial, but I'm not sure how to go about it - should I basically try creating a mask file that masks those codons for all transcripts, and then use it during "cs generate"? I saw the example on the masking page for creating a codon mask for a single transcript in the demo BED file, so I suppose there should be a way to apply this to the whole BED/GTF file? (A rough sketch of this idea appears after this message.)
Thank you,
PC
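
A rough sketch of the mask-file idea described above, not an official answer: it writes a BED mask covering the first and last N codons of each coding region, which could then be passed to the mask-file options of the count scripts. The file names and the value of N_CODONS are placeholders, and the snippet assumes transcripts come from a BED12 file.

from plastid import BED_Reader, Transcript

N_CODONS = 5   # number of codons to mask at each end (placeholder value)

with open("cds_edge_mask.bed", "w") as mask_out:
    for tx in BED_Reader(open("transcripts.bed"), return_type=Transcript):
        cds = tx.get_cds()   # coding region as a chain of segments, 5' to 3'
        cds_len = sum(seg.end - seg.start for seg in cds.segments)
        if cds_len < 2 * 3 * N_CODONS:
            continue   # no CDS, or CDS too short to mask both ends
        first = cds.get_subchain(0, 3 * N_CODONS)
        last = cds.get_subchain(cds_len - 3 * N_CODONS, cds_len)
        mask_out.write(first.as_bed().rstrip("\n") + "\n")
        mask_out.write(last.as_bed().rstrip("\n") + "\n")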

Plastid 0.4.7 completing installation but not importable on OSX

Two users have reported that Plastid passes installation, but fails during runtime on OSX. This appears to be conda-independent, and occurs even when installing inside virtualenv. After a successful install, import plastid yields:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/username/env/lib/python2.7/site-packages/plastid/__init__.py", line 40, in <module>
    from plastid.genomics.roitools import (GenomicSegment,
ImportError: dlopen(/Users/username/env/lib/python2.7/site-packages/plastid/genomics/roitools.so, 2): Library not loaded: @rpath/pysam/libctabixproxies.so
  Referenced from: /Users/username/env/lib/python2.7/site-packages/plastid/genomics/roitools.so
  Reason: image not found

Collecting more information

single gene for phase_by_size

Hi,

First, I would like to thank you for providing this useful and easy-to-use pipeline. It is a great tool for a biologist like me who has limited ability to write code. I have used the metagene and phase_by_size commands and both work well (maybe a bug with UTRs in metagene, but I fixed it with a pseudo-annotation).

I wonder if you could add a feature to phase_by_size to pull out the phasing data for each gene, instead of metagene info grouped by read size. That would be super helpful for looking at frameshifting at the single-gene level, since I want to check whether any gene is frameshifted in my yeast mutants, and that cannot be seen in the metagene data. (A rough sketch of one way to approximate this with the current API appears after this message.)

Thank you very much!
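
Until such a feature exists, here is a rough sketch of how per-gene phasing could be approximated with the Python API; the BAM and BED file names and the 12 nt offset are placeholders, not a recommendation.

import numpy
from plastid import BAMGenomeArray, FivePrimeMapFactory, BED_Reader, Transcript

alignments = BAMGenomeArray(["riboseq.bam"])
alignments.set_mapping(FivePrimeMapFactory(offset=12))   # P-site offset is a placeholder

for tx in BED_Reader(open("transcripts.bed"), return_type=Transcript):
    counts = numpy.asarray(tx.get_cds().get_counts(alignments))
    counts = counts[: 3 * (len(counts) // 3)]     # trim to a whole number of codons
    if counts.sum() == 0:
        continue
    frames = counts.reshape(-1, 3).sum(axis=0)    # reads landing in frames 0, 1, 2
    print(tx.get_name(), frames / counts.sum())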

Using 'centered' for read mapping - why and when?

Hi Joshua,

Thanks for helping with my previous post re: mapfactories not working. Your workaround was great. This is less of an 'issue' and hopefully more of a discussion.

My main question is why and when to use 'centered' based read mapping. Following along in your docs, it appears this normalization is used for RNA-seq data specifically, when manipulating mapping dynamics for downstream analyses.

My next question is under what circumstances would you want to 'nibble' reads as you mapped them? The only thing I can come up with is to increase your signal after reducing it substantially with fractional counting mandated by 'centered' mapping.
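
For concreteness, here is roughly what centered mapping with nibbling looks like in code; a minimal sketch, with the BAM file name and the nibble value as placeholders.

from plastid import BAMGenomeArray, CenterMapFactory

alignments = BAMGenomeArray(["riboseq.bam"])

# each read's single count is spread evenly over the positions it covers,
# after first trimming ("nibbling") 12 nt from each end of the alignment
alignments.set_mapping(CenterMapFactory(nibble=12))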

Lastly, and a bit of a switch in topics, what would your workflow look like to identify the proportion of various start sites used in two different conditions in riboseq data, especially if alt start sites are located in 5' UTRs? I know that plastid likely contains all of the necessary tools to ask this question, but I am having trouble conceptualizing where to begin.

My best guess is to use the SegmentChain and GenomicSegment functionality, as well as the get_counts method, combined with some kind of sub-setting to specifically target AUGs, NUGs, etc. (a rough sketch along these lines appears after this message). If you could give me a big-picture idea here, that would be super helpful!

Thanks for your time and attention. Really enjoying learning how to use Plastid and all of its functionality!

Respectfully,
Brad
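
Not an authoritative workflow, but a rough sketch of the kind of sub-setting described above: counting P-site-mapped reads in a small window around each annotated start codon, one value per transcript. The file name, the 12 nt offset, and the window size are placeholders; comparing conditions or alternative (near-cognate) starts would repeat this with other BAM files or other candidate positions.

from plastid import BAMGenomeArray, FivePrimeMapFactory, BED_Reader, Transcript

alignments = BAMGenomeArray(["riboseq_condition1.bam"])
alignments.set_mapping(FivePrimeMapFactory(offset=12))

WINDOW = 3   # nucleotides on either side of the start codon (placeholder)

for tx in BED_Reader(open("transcripts.bed"), return_type=Transcript):
    if len(tx.get_cds().segments) == 0:
        continue   # skip non-coding transcripts
    counts = tx.get_counts(alignments)   # per-nucleotide counts, 5' to 3'
    # transcript coordinate of the annotated start codon = length of the 5' UTR
    start = sum(seg.end - seg.start for seg in tx.get_utr5().segments)
    window_counts = sum(counts[max(start - WINDOW, 0) : start + WINDOW + 3])
    print(tx.get_name(), window_counts)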

Crossmap fails on hg38 unplaced and unlocalized scaffolds

Hello,

I'm having an issue running crossmap on the human genome; specifically, it seems to happen when it gets to the unlocalized or unplaced scaffolds. I have limited the dataset by not using the "alt" versions of the chromosomes, but these other scaffolds seem to be causing issues within the program, making it exit without a response. Have you encountered this problem?

Grant

Error when converting GFF3 to GTF2

reformat_transcripts --annotation_files Drosophila_melanogaster.BDGP6.28.100.sorted.gff3 --annotation_format GFF3 --sorted --output_format GTF2 Drosophila_melanogaster.BDGP6.28.100.sorted.gtf2
Traceback (most recent call last):
File "###/opt/anaconda3/envs/plastid/bin/reformat_transcripts", line 11, in
load_entry_point('plastid==0.4.7', 'console_scripts', 'reformat_transcripts')()
File "###/opt/anaconda3/envs/plastid/lib/python3.6/site-packages/pkg_resources/init.py", line 488, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "###/opt/anaconda3/envs/plastid/lib/python3.6/site-packages/pkg_resources/init.py", line 2861, in load_entry_point
return ep.load()
File "###/opt/anaconda3/envs/plastid/lib/python3.6/site-packages/pkg_resources/init.py", line 2461, in load
return self.resolve()
File "###/opt/anaconda3/envs/plastid/lib/python3.6/site-packages/pkg_resources/init.py", line 2467, in resolve
module = import(self.module_name, fromlist=['name'], level=0)
File "###/opt/anaconda3/envs/plastid/lib/python3.6/site-packages/plastid/init.py", line 44, in
from plastid.genomics.genome_array import (BAMGenomeArray,
File "###/opt/anaconda3/envs/plastid/lib/python3.6/site-packages/plastid/genomics/genome_array.py", line 215, in
from plastid.genomics.map_factories import *
File "libchtslib.pxd", line 2530, in init plastid.genomics.map_factories (/anaconda/conda-bld/plastid_1494346258083/work/plastid-0.4.7/plastid/genomics/map_factories.c:26281)
ValueError: pysam.libchtslib.HTSFile has the wrong size, try recompiling. Expected 88, got 80

fatal error: 'stdio.h' file not found

"sudo pip install numpy pysam cython" command fails with the following error,
....
......
copying htslib/htslib/regidx.h -> build/lib.macosx-10.10-intel-2.7/pysam/include/htslib/htslib
copying htslib/htslib/sam.h -> build/lib.macosx-10.10-intel-2.7/pysam/include/htslib/htslib
copying htslib/htslib/synced_bcf_reader.h -> build/lib.macosx-10.10-intel-2.7/pysam/include/htslib/htslib
copying htslib/htslib/tbx.h -> build/lib.macosx-10.10-intel-2.7/pysam/include/htslib/htslib
copying htslib/htslib/thread_pool.h -> build/lib.macosx-10.10-intel-2.7/pysam/include/htslib/htslib
copying htslib/htslib/vcf.h -> build/lib.macosx-10.10-intel-2.7/pysam/include/htslib/htslib
copying htslib/htslib/vcf_sweep.h -> build/lib.macosx-10.10-intel-2.7/pysam/include/htslib/htslib
copying htslib/htslib/vcfutils.h -> build/lib.macosx-10.10-intel-2.7/pysam/include/htslib/htslib
running build_ext
building 'pysam.libchtslib' extension
creating build/temp.macosx-10.10-intel-2.7
creating build/temp.macosx-10.10-intel-2.7/pysam
creating build/temp.macosx-10.10-intel-2.7/htslib
creating build/temp.macosx-10.10-intel-2.7/htslib/cram
cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -Ipysam -I. -Ihtslib -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c pysam/libchtslib.c -o build/temp.macosx-10.10-intel-2.7/pysam/libchtslib.o -Wno-unused -Wno-strict-prototypes -Wno-sign-compare -Wno-error=declaration-after-statement
In file included from pysam/libchtslib.c:4:
/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7/Python.h:33:10: fatal error: 'stdio.h' file not found
#include <stdio.h>
^
1 error generated.
error: command 'cc' failed with exit status 1

----------------------------------------

Command "/usr/bin/python -u -c "import setuptools, tokenize;file='/private/tmp/pip-build-qWGO4X/pysam/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /tmp/pip-tPjAoV-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-build-qWGO4X/pysam/

PLEASE SEE

Plastid now available through bioconda

I've created a bioconda recipe for Plastid 0.4.7.

Hopefully this helps users of conda to more easily install and use Plastid. If you'd like, I'd encourage you to include a bioconda badge with the following Markdown in the Readme to help point users to Bioconda.


[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat-square)](http://bioconda.github.io/recipes/plastid/README.html)

I'd be happy to submit a PR with that (and possibly other) documentation updates regarding bioconda.

‘phase_by_size’ command-line script problem with NumPy (1.18.1)

Hello All,

This script does not run on my new Linux box (plastid 0.4.8 py37h84994c4_4 bioconda; numpy 1.18.1) but runs on my old Linux box (plastid 0.4.8 py27_0 bioconda; numpy 1.14.1) when modified for the directory structure there. On the new box it gives the float-object error for phase_by_size, and also another error. Advice gratefully appreciated. Thank you.

Script:

for i in *Coord.out.bam
do samtools index $i
done

for i in *Coord.out.bam
do psite /media/mrbmk1000/data/ms-pool01/annotationdb/cneoformans/FungDB-38_Cneoformans_cds_start200_rois.txt ${i%.*} \
   --min_length 16 --max_length 34 --require_upstream --count_files $i
done

for i in *Coord.out.bam
do phase_by_size /media/mrbmk1000/data/ms-pool01/annotationdb/cneoformans/FungDB-38_Cneoformans_cds_start200_rois.txt ${i%.*} \
   --count_files $i \
   --fiveprime --offset 12 \
   --codon_buffer 5 \
   --min_length 16 --max_length 34
done

New linux box:

(plastid) mrbmk1000@rbmk1000-home-linux:/media/mrbmk1000/data/ms-pool01/C_neoformans/ribo-seq/phasing$ bash ms-pool01_lazy_psites_cneoformans-200min16max34.sh*


RuntimeWarning
pysam.libcalignedsegment.PileupColumn size changed, may indicate binary
incompatibility. Expected 72 from C header, got 88 from PyObject
in /home/mrbmk1000/miniconda3/envs/plastid/lib/python3.7/importlib/_bootstrap.py, line 219:

217 module code)
218 """
219 return f(*args, **kwds)


psite [2020-04-16 16:24:55]: Opening ROI file /media/mrbmk1000/data/ms-pool01/annotationdb/cneoformans/FungDB-38_Cneoformans_cds_start200_rois.txt ...
psite [2020-04-16 16:24:55]: Opening count files rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out.bam ...
psite [2020-04-16 16:24:55]: Counted 17318765 total reads.
psite [2020-04-16 16:24:55]: Counted 1 ROIs ...
psite [2020-04-16 16:24:58]: Counted 1001 ROIs ...
psite [2020-04-16 16:25:01]: Counted 2001 ROIs ...
psite [2020-04-16 16:25:03]: Counted 3001 ROIs ...
psite [2020-04-16 16:25:07]: Counted 4001 ROIs ...
psite [2020-04-16 16:25:10]: Counted 5001 ROIs ...
psite [2020-04-16 16:25:12]: Counted 6001 ROIs ...
psite [2020-04-16 16:25:14]: Counted 6841 ROIs total.
psite [2020-04-16 16:25:15]: Plotting and determining offsets ...
psite [2020-04-16 16:25:15]: Writing offset table to rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out_p_offsets.txt ...
psite [2020-04-16 16:25:15]: Saving plot to rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out_p_offsets.png ...
psite [2020-04-16 16:25:15]: Done.

RuntimeWarning
pysam.libcalignedsegment.PileupColumn size changed, may indicate binary
incompatibility. Expected 72 from C header, got 88 from PyObject
in /home/mrbmk1000/miniconda3/envs/plastid/lib/python3.7/importlib/_bootstrap.py, line 219:

217 module code)
218 """
219 return f(*args, **kwds)


phase_by_size [2020-04-16 16:25:16]: Counted 17318765 total reads.
phase_by_size [2020-04-16 16:25:16]: Counted 1 ROIs ...
Traceback (most recent call last):
File "/home/mrbmk1000/miniconda3/envs/plastid/bin/phase_by_size", line 11, in
load_entry_point('plastid==0.4.8', 'console_scripts', 'phase_by_size')()
File "/home/mrbmk1000/miniconda3/envs/plastid/lib/python3.7/site-packages/plastid/bin/phase_by_size.py", line 190, in main
counts = counts.reshape((len(counts)/3,3))
TypeError: 'float' object cannot be interpreted as an integer

Script running fine on old linux box with same data:

bp-data01@bp-data01:/bp-pool01/c_neoformans/ribo_seq/phasing$ ./lazy_psites_cneoformans-200min16max34.sh*
psite [2020-04-11 15:01:32]: Opening ROI file /bp-pool01/annotationdb/cneoformans/FungDB-38_Cneoformans_cds_start200_rois.txt ...
psite [2020-04-11 15:01:33]: Opening count files rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out.bam ...
psite [2020-04-11 15:01:33]: Counted 17318765 total reads.
psite [2020-04-11 15:01:33]: Counted 1 ROIs ...
psite [2020-04-11 15:01:40]: Counted 1001 ROIs ...
psite [2020-04-11 15:01:48]: Counted 2001 ROIs ...
psite [2020-04-11 15:01:54]: Counted 3001 ROIs ...
psite [2020-04-11 15:02:08]: Counted 4001 ROIs ...
psite [2020-04-11 15:02:14]: Counted 5001 ROIs ...
psite [2020-04-11 15:02:22]: Counted 6001 ROIs ...
psite [2020-04-11 15:02:28]: Counted 6841 ROIs total.
psite [2020-04-11 15:02:34]: Plotting and determining offsets ...
psite [2020-04-11 15:02:34]: Writing offset table to rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out_p_offsets.txt ...
psite [2020-04-11 15:02:34]: Saving plot to rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out_p_offsets.png ...
psite [2020-04-11 15:02:34]: Done.
phase_by_size [2020-04-11 15:02:35]: Counted 17318765 total reads.
phase_by_size [2020-04-11 15:02:35]: Counted 1 ROIs ...
phase_by_size [2020-04-11 15:02:39]: Counted 1001 ROIs ...
phase_by_size [2020-04-11 15:02:44]: Counted 2001 ROIs ...
phase_by_size [2020-04-11 15:02:48]: Counted 3001 ROIs ...
phase_by_size [2020-04-11 15:02:56]: Counted 4001 ROIs ...
phase_by_size [2020-04-11 15:03:00]: Counted 5001 ROIs ...
phase_by_size [2020-04-11 15:03:04]: Counted 6001 ROIs ...
phase_by_size [2020-04-11 15:03:08]: Counted 6841 ROIs total.
phase_by_size [2020-04-11 15:03:08]: Saving phasing table to rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out_phasing.txt ...
phase_by_size [2020-04-11 15:03:08]: Plotting to rbmk1000-4-59-1_S1_L00X_R1_001_trimmed_811_final_Aligned.sortedByCoord.out_phasing.png ...

No module named `plastid`

Hello,
I installed plastid and have updated numpy, pysam, and Cython.
When I type `import plastid` in Python, I always get this message:
Traceback (most recent call last):
File "< stdin >", line 1, in < module >
ImportError: No module named 'plastid'

What's wrong, and what can I do about it?
Thanks.
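
A generic sanity check (not from this thread) that often narrows this down: confirm that the interpreter you are running is the same one pip installed plastid into.

# Hedged sketch: print the running interpreter and ask its pip whether
# plastid is installed in that same environment.
import subprocess
import sys

print("running interpreter:", sys.executable)
# prints the install location, or nothing at all if plastid is absent here
subprocess.run([sys.executable, "-m", "pip", "show", "plastid"])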

Compatibility update for Py3.7: fix generator implementation

Running the unit tests under Python 3.7 elicits this bug, caused by Python 3.7's implementation of PEP 479: a StopIteration raised inside a generator, which through Python 3.6 was an accepted way to signal that the generator was exhausted, is now converted into a RuntimeError.

PEP 479 suggests the resolution: wrap the offending calls in try/except blocks and return from the generator instead of letting StopIteration propagate.
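
A minimal sketch of the PEP 479-safe pattern (a hypothetical generator, not plastid's actual code): catch the StopIteration raised by an inner next() call and return, so Python 3.7+ does not convert it into a RuntimeError.

def paired(iterator):
    # Yield items two at a time; return cleanly when the iterator is exhausted.
    while True:
        try:
            first = next(iterator)
            second = next(iterator)
        except StopIteration:
            return  # pre-3.7 code often let the StopIteration escape here
        yield first, second

print(list(paired(iter(range(5)))))  # [(0, 1), (2, 3)]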

Incompatibility with Pysam 0.10.0: Cythonize error while compiling map_factories.pyx

Hi Josh,

I am trying to install plastid in an Ubuntu 14.04.5 environment using Python 2.7. All dependencies are installed correctly, to my knowledge. During installation, setup fails because Cython compilation of map_factories fails.

kevinklann@pqc-Virtual-Machine:~/Downloads/plastid-0.4.6$ python setup.py install
running install
Could not find .c files. Regenerating via recythonize.
Cythonizing
running recythonize
running clean
clean_c_files: removing /home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/map_factories.c ...
clean_c_files: removing /home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/c_common.c ...
build_c_from_pyx: regenerating .c files from Cython
/home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/map_factories.pyx: cannot find cimported module 'pysam.calignmentfile'
Compiling /home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/map_factories.pyx because it changed.
Compiling /home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/roitools.pyx because it changed.
Compiling /home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/c_common.pyx because it changed.
Compiling plastid/readers/bbifile.pyx because it changed.
Compiling plastid/readers/bigwig.pyx because it changed.
Compiling plastid/readers/bigbed.pyx because it changed.
[1/6] Cythonizing /home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/c_common.pyx
[2/6] Cythonizing /home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/map_factories.pyx

Error compiling Cython file:
------------------------------------------------------------
...

import numpy as np
cimport numpy as np
cimport cython

from pysam.calignmentfile cimport AlignedSegment
^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:134:0: 'pysam/calignmentfile.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...

import numpy as np
cimport numpy as np
cimport cython

from pysam.calignmentfile cimport AlignedSegment
^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:134:0: 'pysam/calignmentfile/AlignedSegment.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
            raise ValueError("Alignment size filter: min read length must be >= 1. Got %s" % min)

        self.min_ = min
        self.max_ = max

    def __call__(self,AlignedSegment read not None):
                     ^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:812:22: 'AlignedSegment' is not a type identifier

Error compiling Cython file:
------------------------------------------------------------
...
            raise ValueError("Alignment size filter: min read length must be >= 1. Got %s" % min)

        self.min_ = min
        self.max_ = max

    def __call__(self,AlignedSegment read not None):
                     ^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:812:22: Only Python type arguments can have 'not None'

Error compiling Cython file:
------------------------------------------------------------
...

        cdef list reads_out = []
        cdef list read_positions

        cdef:
            AlignedSegment read
           ^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:233:12: 'AlignedSegment' is not a type identifier

Error compiling Cython file:
------------------------------------------------------------
...
            np.ndarray[LONG_t,ndim=1] count_array = np.zeros(seg_len,dtype=LONG)
            long [:] count_view = count_array

            list reads_out = []
            list read_positions
            AlignedSegment read
           ^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:328:12: 'AlignedSegment' is not a type identifier

Error compiling Cython file:
------------------------------------------------------------
...
            np.ndarray[LONG_t,ndim=1] count_array = np.zeros(seg_len,dtype=LONG)
            long [:] count_view = count_array

            list reads_out = []
            list read_positions
            AlignedSegment read
           ^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:422:12: 'AlignedSegment' is not a type identifier

Error compiling Cython file:
------------------------------------------------------------
...
            int no_offset_length
            int do_no_offset_warning = 0

            int            read_length, offset
            long           p_site
            AlignedSegment read
           ^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:596:12: 'AlignedSegment' is not a type identifier

Error compiling Cython file:
------------------------------------------------------------
...
            int min_length             = self.min_length

            list reads_out             = []
            int read_length, offset, p_site

            AlignedSegment read
           ^
------------------------------------------------------------

plastid/genomics/map_factories.pyx:734:12: 'AlignedSegment' is not a type identifier
warning: plastid/genomics/map_factories.pyx:523:35: Index should be typed for more efficient access
Traceback (most recent call last):
  File "setup.py", line 591, in <module>
    test_suite        = "nose.collector",
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "setup.py", line 437, in run
    self.run_command(CYTHONIZE_COMMAND)
  File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/usr/lib/python2.7/distutils/dist.py", line 971, in run_command
    cmd_obj.ensure_finalized()
  File "/usr/lib/python2.7/distutils/cmd.py", line 109, in ensure_finalized
    self.finalize_options()
  File "setup.py", line 485, in finalize_options
    extensions = cythonize(ext_modules,compiler_directives=CYTHON_ARGS)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 934, in cythonize
    cythonize_one(*args)
  File "/usr/local/lib/python2.7/dist-packages/Cython/Build/Dependencies.py", line 1056, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: /home/kevinklann/Downloads/plastid-0.4.6/plastid/genomics/map_factories.pyx

The same is true when I try to install it via pip.

Hope you can help me.

Cheers,

Kevin
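
For context, pysam 0.10.0 renamed its internal Cython modules (calignmentfile became libcalignmentfile, and so on), which matches the "cannot find cimported module 'pysam.calignmentfile'" message above. A small diagnostic sketch (not a fix) to see which layout your pysam provides:

# The old module name exists only in pysam < 0.10; the "lib"-prefixed name
# exists in pysam >= 0.10.
import importlib
import pysam

print("pysam version:", pysam.__version__)
for name in ("pysam.calignmentfile", "pysam.libcalignmentfile"):
    try:
        importlib.import_module(name)
        print(name, "-> importable")
    except ImportError:
        print(name, "-> not present")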

Supported opposite-strand protocols (for metagene)

Hi Plastid team,

I have found a lot of the functionality provided by plastid very useful, but I got frustratingly stuck today when I was trying to run metagene count on data originating from a stranded protocol that produces reads on the strand opposite the original transcript. I know your FAQ suggests reverse-complementing the reads and re-aligning them, but honestly this feels like a lot more work than necessary, especially for a big project with many samples.

This issue was almost a deal-breaker for me, but then I realized I could get the same effect by simply reversing the strand of all of my window annotations in my ROI file. I did it with sed, but it would seem to be a pretty straightforward feature to add as a parameter in the actual command, either at the stage of generating the ROI or on the fly during counting. Such a feature would be a relatively easy way to expand usage of plastid by supporting a large number of users with this type of data.

Thanks!
Sarah
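
A small sketch of the sed-style workaround described above: flip the strand of every window in the ROI file before running metagene count. The format assumed here (tab-delimited rows with a region column like "chr:189-239(+)") is inferred from an example elsewhere on this page, so treat it as an assumption rather than a documented contract.

# Swap (+) and (-) in each window of a metagene ROI file; header and comment
# lines are passed through untouched.
with open("cds_start_rois.txt") as src, open("cds_start_rois_flipped.txt", "w") as dst:
    for line in src:
        if line.startswith("#") or line.startswith("region_id"):
            dst.write(line)
            continue
        flipped = (line.replace("(+)", "(@)")   # temporary token avoids double-swapping
                       .replace("(-)", "(+)")
                       .replace("(@)", "(-)"))
        dst.write(flipped)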

numpy 1.24.1 and plastid 0.6.1 appear to have an issue because numpy dropped support for numpy.int.

Hello,

While installing a conda environment with plastid for a customer (Python 3.9.15, plastid 0.6.1, numpy 1.24.1), the customer found that their test "metagene generate -h" failed with "AttributeError: module 'numpy' has no attribute 'int'".

From searching online, copycode.org indicates that support for numpy.float and numpy.int was removed as of numpy 1.24. I've downgraded the conda numpy to 1.23.5 and am waiting to hear back from my customer whether their test succeeds with the downgraded version.

I'm not a developer, but I'm guessing it might warrant setting a requirement for plastid 0.6.1 that numpy be <= 1.23.5.

Thank you.
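
For reference, a minimal illustration of the numpy change involved (generic numpy behavior, not plastid code): the numpy.int and numpy.float aliases were removed in numpy 1.24, so affected code must switch to the builtin types or the numpy scalar types, or numpy must be pinned below 1.24 as suggested above.

import numpy as np

data = [1, 2, 3]
# arr = np.array(data, dtype=np.int)  # AttributeError under numpy >= 1.24
arr = np.array(data, dtype=np.int_)   # works on old and new numpy; plain int also works
print(arr.dtype)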

the system cannot find the file specified. Please help!

(C:\Users\sunny\Anaconda3) C:\Users\sunny\Documents>cd surprise

(C:\Users\sunny\Anaconda3) C:\Users\sunny\Documents\surprise>python setup.py install
C:\Users\sunny\Anaconda3\lib\site-packages\setuptools\dist.py:360: UserWarning: The version specified ('latest') is an invalid version, this may not work as expected with newer versions of setuptools, pip, and PyPI. Please see PEP 440 for more details.
"details." % self.metadata.version
running install
running bdist_egg
running egg_info
writing scikit_surprise.egg-info\PKG-INFO
writing dependency_links to scikit_surprise.egg-info\dependency_links.txt
writing entry points to scikit_surprise.egg-info\entry_points.txt
writing requirements to scikit_surprise.egg-info\requires.txt
writing top-level names to scikit_surprise.egg-info\top_level.txt
reading manifest file 'scikit_surprise.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'scikit_surprise.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_py
running build_ext
error: [WinError 2] The system cannot find the file specified

Problem installing plastid: no attribute 'PileupColumn'

Hello,

I tried to install plastid in a conda environment with pysam, cython, and numpy.

When I installed the plastid package (v0.5.1) with pip3 (Python 3.8), everything was fine.
But when I try to use the package, I get this error:
File "/usr/local/lib/python3.8/dist-packages/plastid/genomics/genome_array.py", line 215, in <module>
from plastid.genomics.map_factories import *
File "plastid/genomics/map_factories.pyx", line 1, in init plastid.genomics.map_factories
AttributeError: module 'pysam.libcalignmentfile' has no attribute 'PileupColumn'

Do you know where my issue comes from?

Thanks a lot in advance,

ModuleNotFoundError: No module named 'plastid.genomics.roitools'

Hey, I wanted to try out plastid but got the following error:

>>> import plastid
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pkerpedjiev/miniconda3/envs/cenv4/lib/python3.8/site-packages/plastid/__init__.py", line 39, in <module>
    from plastid.genomics.roitools import (
ModuleNotFoundError: No module named 'plastid.genomics.roitools'

Any suggestions? I'm in a conda environment with Python 3.8, and I installed plastid using pip install plastid.

‘phase_by_size’ command-line script problem with newer NumPy (1.13.1)

Hi,
I used the 'phase_by_size' script and it reports this error in '......../plastid/bin/phase_by_size.py', line 190, in main:

counts = counts.reshape(len(counts)/3,3)
TypeError: 'float' object cannot be interpreted as an integer

[screenshot of the error]

I found that it happens because numpy.reshape (numpy 1.13.1) no longer supports a float as a dimension, though it still works with numpy 1.11.1. I changed the code and it works.
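
A hedged sketch of the kind of change involved (the actual patch may differ): numpy's reshape requires integer dimensions, and len(counts)/3 is a float, so floor division fixes the call.

import numpy as np

counts = np.arange(9, dtype=float)               # stand-in for the real count vector
# counts = counts.reshape((len(counts) / 3, 3))  # TypeError on newer numpy
counts = counts.reshape((len(counts) // 3, 3))   # integer dimensions are required
print(counts.shape)                              # (3, 3)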

Stratified Variable Five Prime Map Factory Error when passed .txt file

I am attempting to recreate your R^2 first-half vs. second-half CDS plot from the BMC Genomics paper, but incorporating P-site offsets via a txt file that I determined from the plastid metagene analysis profiles.

The first error, when attempting to use StratifiedVariableFivePrimeMapFactory, is that it isn't defined in the package after running from plastid import *:

File "./first_v_second_half_RPF_analysis.py", line 12, in <module>
footprints.set_mapping(StratifiedVariableFivePrimeMapFactory.from_file("/tomato/dev/job/BradW/2C_ribo/data_analysis/plastid_5-24/X3_riboprofile_new.txt"))
NameError: name 'StratifiedVariableFivePrimeMapFactory' is not defined.

Next, trying to use VariableFivePrimeMapFactory instead, the following error is thrown:

Traceback (most recent call last):
File "./first_v_second_half_RPF_analysis.py", line 12, in
footprints.set_mapping(VariableFivePrimeMapFactory.from_file("~/tomato/dev/job/BradW/2C_ribo/data_analysis/plastid_5-24/X3_riboprofile_new.txt"))
File "plastid/genomics/map_factories.pyx", line 560, in plastid.genomics.map_factories.VariableFivePrimeMapFactory.from_file (/tmp/pip-build-z6XyxK/plastid/plastid/genomics/map_factories.c:7162)
TypeError: coercing to Unicode: need string or buffer, type found

In your documentation, you describe 'from_file' as a method of the VariableFivePrimeMapFactory class. Could you please give an example of how you would implement a variable five-prime map or stratified map when mapping footprints to P-sites?
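
A hedged guess at working usage, consistent with the TypeError above (which suggests from_file received something other than an open file handle): open the offsets table yourself and pass the handle. The file names are placeholders taken from the traceback, and whether newer releases also accept a plain path is not settled here.

import pysam
from plastid import BAMGenomeArray, VariableFivePrimeMapFactory

# Build the genome array from an open BAM, then map reads with the offsets table.
footprints = BAMGenomeArray([pysam.AlignmentFile("footprints.bam", "rb")])
with open("X3_riboprofile_new.txt") as fh:  # p-site offset table written by `psite`
    footprints.set_mapping(VariableFivePrimeMapFactory.from_file(fh))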

RuntimeWarning

Hi Plastid team,

I am a beginner in ribosome profiling and am trying to use plastid to analyze my data. It is a wonderful tool.

I ran the "metagene count" command with the demo data, like:
metagene count merlin_cds_start_rois.txt SRR609197_riboprofile \
    --count_files SRR609197_riboprofile_5hr_rep1.bam \
    --fiveprime --offset 14 --normalize_over 30 200 \
    --min_counts 50 --cmap Blues --title "Metagene demo"

And I got two RuntimeWarnings, like:

[screenshot: RuntimeWarning output]

But I can still get a result similar to the demo.

Could you help me with this issue?

Thank you

pysam ImportError

Hi Joshua,

EDIT: Just noticed this was reported in #13 , will see if that solves it. Sorry!

I've installed plastid in a virtualenv as explained in your guide but I am getting an ImportError for a pysam library it seems? Do you have an idea what could be going wrong? Cheers!

(plastid_env) 133-33-123-5:~ jmcarter$ python -c "from plastid import *"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/jmcarter/BioUnix/plastid_env/lib/python2.7/site-packages/plastid/__init__.py", line 40, in <module>
    from plastid.genomics.roitools import (GenomicSegment,
ImportError: dlopen(/Users/jmcarter/BioUnix/plastid_env/lib/python2.7/site-packages/plastid/genomics/roitools.so, 2): Library not loaded: @rpath/pysam/libctabixproxies.so
  Referenced from: /Users/jmcarter/BioUnix/plastid_env/lib/python2.7/site-packages/plastid/genomics/roitools.so
  Reason: image not found

Request new version release - 0.4.9

There were a few useful fixes made since 0.4.8 was released. I was hoping you could release an official version (0.4.9?) with the latest updates.

Question on --gff_transcript_types and --mask_gff_transcript_types

Hello,
In the doc, the parameter

--gff_transcript_types  GFF_TRANSCRIPT_TYPES [GFF_TRANSCRIPT_TYPES ...] 	GFF3 feature types to include as transcripts, even if no exons are present (for GFF3 only; default: use SO v2.5.3 specification)

and

--mask_gff_transcript_types  MASK_GFF_TRANSCRIPT_TYPES [MASK_GFF_TRANSCRIPT_TYPES ...] 	GFF3 feature types to include as transcripts, even if no exons are present (for GFF3 only; default: use SO v2.5.3 specification)

have exactly the same explanation. So what's the difference between them? --gff_exon_types and --gff_cds_types have the same problem.

Feature request: create / simulate UTRs in `metagene generate` when annotations do not include them. (formerly: Unable to generate proper cds_start ROIs using `metagene generate`)

I am running into issues attempting to generate ROIs for a P-site offset analysis on E. coli. The generated ROIs do not include any upstream region, and the "alignment_offset" always seems to be set to the value of the --upstream parameter. I've used various forms of annotation, but here is an example using the RefSeq GTF files from UCSC:

region_id   window_size region  masked  alignment_offset    zero_point
b0001   100 chr:189-239(+)  na  50  50
b0002   100 chr:336-386(+)  na  50  50

could not open alignment file - psite offset

Hi Joshua!

We've been working on the psite (subprogram) but have encountered an issue and would appreciate
some feedback on how to handle this, since it is not very intuitive:

Here's where we started:
(https://plastid.readthedocs.io/en/latest/examples/p_site.html)

The gtf file was obtained from Gramene. We did some filtering by removing chloroplast (cp), mitochondria (mt), tRNA, and rRNA reads.

Our bam file was obtained from STAR after multiple sequential alignments to isolate nuclear genomic reads, and the files were sorted and indexed with Samtools:

samtools sort --threads 27 ${in_file} -o sorted_${out_filename}
samtools index ${in_file} -b indexed_${out_filename}

After running the metagene generate on our maize reference and obtaining the *_rois.txt
file required for the next step (psite), we tested on our indexed bam file but encountered
the following warning messages:

"[E::hts_hopen] Failed to open file indexed_cp_genome_102617_RF-15Dark_Rep1_mergedAligned.out.bam
[E::hts_open_format] Failed to open file indexed_cp_genome_102617_RF-15Dark_Rep1_mergedAligned.out.bam
psite [2018-11-02 09:59:03]: Opening ROI file Zea_mays_orfs_rois.txt ...
psite [2018-11-02 09:59:04]: Opening count files indexed_cp_genome_102617_RF-15Dark_Rep1_mergedAligned.out.bam ...
Traceback (most recent call last):
File "/packages/anaconda3/5.1.0/envs/plastid/bin/psite", line 11, in
load_entry_point('plastid==0.4.8', 'console_scripts', 'psite')()
File "/packages/anaconda3/5.1.0/envs/plastid/lib/python3.5/site-packages/plastid/bin/psite.py", line 326, in main
ga = ap.get_genome_array_from_args(args,printer=printer)
File "/packages/anaconda3/5.1.0/envs/plastid/lib/python3.5/site-packages/plastid/util/scriptlib/argparsers.py", line 572, in get_genome_array_from_args
count_files = [pysam.Samfile(X,"rb") for X in args.count_files]
File "/packages/anaconda3/5.1.0/envs/plastid/lib/python3.5/site-packages/plastid/util/scriptlib/argparsers.py", line 572, in
count_files = [pysam.Samfile(X,"rb") for X in args.count_files]
File "pysam/libcalignmentfile.pyx", line 734, in pysam.libcalignmentfile.AlignmentFile.cinit
File "pysam/libcalignmentfile.pyx", line 933, in pysam.libcalignmentfile.AlignmentFile._open
OSError: [Errno 8] could not open alignment file indexed_cp_genome_102617_RF-15Dark_Rep1_mergedAligned.out.bam: Exec format error"

The indexed bam files seem to be OK, since we ran Subread (featureCounts) on them, but for some reason we cannot run plastid on them (the --count_files argument fails to open them).

We appreciate your feedback and look forward to hearing back from you.

Ana and Mandie
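
One possible reading of the error (an assumption, not confirmed in this thread): `samtools index ${in_file} -b indexed_${out_filename}` writes the BAM index itself to the file named indexed_..., so the file later passed to --count_files is a .bai/.csi index rather than a BAM, which htslib rejects with "Exec format error". A pysam-based sketch of the intended setup, with a placeholder file name:

import pysam

sorted_bam = "sorted_RF-15Dark_Rep1_mergedAligned.out.bam"  # hypothetical name
pysam.index(sorted_bam)  # writes the .bai next to the BAM instead of a new "BAM"
# then, on the command line, pass the BAM itself (not the index) to psite:
#   psite Zea_mays_orfs_rois.txt <outbase> --count_files sorted_RF-15Dark_Rep1_mergedAligned.out.bam ...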

Incompatible with BioPython 1.78+

The GFF3_TranscriptAssembler (and presumably other pieces of code) is not compatible with a change made in Biopython 1.78, namely the removal of the Bio.Alphabet module. This page (https://biopython.org/wiki/Alphabet) has more information, but TL;DR: just remove the Bio.Alphabet usage and most code will still work.

ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the ``molecule_type`` as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
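
A minimal sketch of one way to handle the removal described above (not plastid's actual code): treat the Bio.Alphabet import as optional so the same script runs before and after Biopython 1.78.

try:
    from Bio.Alphabet import generic_dna  # Biopython < 1.78
except ImportError:
    generic_dna = None                    # Biopython >= 1.78: module removed

from Bio.Seq import Seq

# Older Biopython accepted an alphabet argument; newer Biopython takes only the string.
seq = Seq("ATGGCC") if generic_dna is None else Seq("ATGGCC", generic_dna)
print(seq.reverse_complement())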

Plotting a metagene plot for two different samples at the same time

Hi,
I am trying to generate a metagene plot for my samples, treated and untreated. However, I can only generate a plot containing one of the samples (either of them), because in metagene count the input bam files get pooled together.
So the output looks as follows for the 'treated' sample:
[metagene plot for the treated sample]

How can I add the 'untreated' input to this plot to compare their metagene distributions?

Thank you.
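
One workaround (a sketch under stated assumptions, not a documented feature): run metagene count once per sample and overlay the resulting profile tables yourself. The file names and column names below ("x", "metagene_average") are assumptions about the profile output; adjust them to whatever your runs actually produce.

import pandas as pd
import matplotlib.pyplot as plt

profiles = {
    "treated":   "treated_metagene_profile.txt",    # hypothetical output names
    "untreated": "untreated_metagene_profile.txt",
}

fig, ax = plt.subplots()
for label, path in profiles.items():
    table = pd.read_csv(path, sep="\t", comment="#")
    ax.plot(table["x"], table["metagene_average"], label=label)

ax.set_xlabel("Position relative to landmark (nt)")
ax.set_ylabel("Normalized mean density")
ax.legend()
fig.savefig("metagene_treated_vs_untreated.png")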

Support Python 3.8

PLACEHOLDER: this ticket must be rewritten/edited before work begins.

Python 3.8 was released on October 14, 2019. plastid may require updates to function under Python 3.8.

At present, plastid cannot be tested under Python 3.8 due to a blocking issue with Pysam.

make_wiggle output format & get_count_vectors zero counts

Dear Joshua,
Thanks for this tool, it is a great addition to the software I have been using so far.

I would like to mention two issues I encountered when using plastid:

  1. All zero counts from get_count_vectors
    After successfully determining P-site offsets for my set of samples, as well as obtaining counts for different features (CDS, UTRs) via cs count, I have been trying to obtain codon-level counts for each gene via the get_count_vectors program. Unfortunately, although I use the same BAM files and offsets as for metagene count and cs count, get_count_vectors returns all-zero count files.
    I tried changing the annotation file format from BED12 to sorted GTF (the official GENCODE v24 release for hg38), but to no avail.
    Since I have PE data and only use the R1 reads, I checked via IGV whether read orientation and gene direction correspond, and this is indeed the case. To be sure, I flipped the orientation of all genes in the annotation file and re-ran the program, but still found no counts != 0.

To circumvent this issue, I used make_wiggle to generate coverage profiles, which I have supplied to get_count_vectors. However, I am also not sure whether get_count_vectors can handle these files properly, since its input formats comprise "BAM, bigwig, bowtie, wiggle" and my current test run only displays the following (after a runtime of >60 minutes) while the process keeps running with a core at 100%:

get_count_vectors [2017-02-08 11:34:48]: Opening wiggle files RPF_sample1 ...

This brings me to the second issue I have encountered.

  2. The output format of make_wiggle
    make_wiggle creates two .wig files, one for each strand, which are by default in bedGraph format. I find this extremely confusing, especially since there is no option to specify that the output should be in wiggle format.

Kind regards

the bam file choose

When I use STAR for alignment, I get both a bam file aligned to the genome (01Aligned.sortedByCoord.out.bam) and a bam file aligned to the transcriptome (01Aligned.toTranscriptome.out.bam). Which one should I choose for the psite script analysis?
Thank you!
