colomemaria / episcanpy

Episcanpy: Epigenomics Single Cell Analysis in Python

License: BSD 3-Clause "New" or "Revised" License

Python 99.70% R 0.30%

episcanpy's Introduction

EpiScanpy – Epigenomics single cell analysis in python

EpiScanpy is a toolkit for analysing single-cell open chromatin (scATAC-seq) and single-cell DNA methylation (for example scBS-seq) data. EpiScanpy is the epigenomic extension of the widely used scRNA-seq analysis tool Scanpy (Genome Biology, 2018) [Wolf18]. For more information on Scanpy, see its documentation.

The EpiScanpy paper is now available in Nature Communications. For more information on how to use the package, see the website.


Documentation

The documentation for epiScanpy is available here. If epiScanpy is useful to your research, consider citing epiScanpy.

episcanpy's Issues

Plans to include other QC parameters for scATAC-seq data?

Hi,

I notice that the episcanpy tutorial for scATAC-seq doesn't mention some QC parameters, such as the nucleosome banding pattern or fragment length. Are there any plans to incorporate these metrics?

Problems when generating matrix (episcanpy.ct.bld_mtx_fly())

Dear Anna,
Congratulations on this package! I am a big fan of the scanpy environment.

I was wondering whether you have a tutorial for scATAC-seq data from 10x Genomics (or combined scRNA-seq/scATAC-seq data).
Specifically, I have the following problem when reading 10x ATAC-seq data:

epi.ct.bld_mtx_fly(tsv_file="atac_fragments.tsv.gz", annotation="atac_peak_annotation.tsv", save="test.h5ad", )

ERROR:

loading barcodes

---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
<ipython-input-12-4093bfdf2ff8> in <module>
      4 filename = P + "test.h5ad"
      5 
----> 6 epi.ct.bld_mtx_fly(tsv_file="atac_fragments.tsv.gz",
      7                    annotation="atac_peak_annotation.tsv",
      8                    save="test.h5ad",

~/opt/anaconda3/lib/python3.8/site-packages/episcanpy/count_matrix/_bld_atac_mtx.py in bld_mtx_fly(tsv_file, annotation, csv_file, genome, save)
     39 
     40         print('loading barcodes')
---> 41         barcodes = sorted(pd.read_csv(tsv_file, sep='\t', header=None).loc[:, 3].unique().tolist())
     42 
     43         # barcodes

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    603     kwds.update(kwds_defaults)
    604 
--> 605     return _read(filepath_or_buffer, kwds)
    606 
    607 

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    461 
    462     with parser:
--> 463         return parser.read(nrows)
    464 
    465 

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read(self, nrows)
   1050     def read(self, nrows=None):
   1051         nrows = validate_integer("nrows", nrows)
-> 1052         index, columns, col_dict = self._engine.read(nrows)
   1053 
   1054         if index is None:

~/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py in read(self, nrows)
   2054     def read(self, nrows=None):
   2055         try:
-> 2056             data = self._reader.read(nrows)
   2057         except StopIteration:
   2058             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 53, saw 5

These are lines 52 and 53 in the tsv file:

# primary_contig=JH584295.1
chr1	3000087	3000282	GCCAATTAGCACTAAC-1	1
chr1	3001599	3001786	AAGGTATAGCAGGTGG-1	1

Many thanks!
Marcos
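
One possible workaround, sketched under the assumption that the fragments file starts with header lines prefixed by `#` (as the `# primary_contig=...` line above suggests): pandas can skip such lines through the `comment` argument of `read_csv`. The helper name `load_barcodes` is hypothetical and not part of episcanpy.

```python
import pandas as pd


def load_barcodes(fragments_path):
    """Return the sorted unique barcodes of a 10x-style fragments file.

    Assumes tab-separated columns (chrom, start, end, barcode, count) and
    optional header lines starting with '#'.
    """
    fragments = pd.read_csv(
        fragments_path,
        sep="\t",
        header=None,
        comment="#",   # skip '# ...' header lines instead of parsing them
        usecols=[3],   # only the barcode column is needed here
    )
    return sorted(fragments[3].unique().tolist())
```

Gzip compression is inferred from the `.gz` extension, so the compressed file can be passed directly.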

build_mtx_fly missing shutil import

Hi Anna,

There seems to be a very small bug in bld_mtx_fly():

I get:

loading the barcodes

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-1ca708327c27> in <module>
----> 1 epi.ct.bld_mtx_fly(paths['rawdir']+'fragments.tsv.gz', paths['rawdir']+'fragments.tsv.gz.tbi', paths['writedir']+'../../data_other/atac_geneid_features.bed')

~/.local/lib/python3.7/site-packages/episcanpy/count_matrix/_bld_atac_mtx.py in bld_mtx_fly(tsv_file, tbi_file, annotation, csv_file, genome, DATADIR, save)
    135         with gzip.open(tsv_file, 'rb') as f_in:
    136             with open(tsv_file.rstrip('.gz'), 'wb') as f_out:
--> 137                 shutil.copyfileobj(f_in, f_out)
    138         df = pd.read_csv(tsv_file.rstrip('.gz'), sep='\t', header=None)
    139         barcodes = list(sorted(set(df.loc[:,3].tolist())))

NameError: name 'shutil' is not defined

I guess there is just an import statement missing.
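
A minimal sketch of the decompression step with the import in place. The function name `gunzip` is hypothetical. As a side observation, `str.rstrip('.gz')` in the traceback strips trailing characters rather than the suffix, so the sketch removes the extension explicitly.

```python
import gzip
import shutil


def gunzip(gz_path):
    """Decompress gz_path next to itself and return the new file's path."""
    # str.rstrip('.gz') strips any trailing '.', 'g' or 'z' characters, not
    # the suffix, so remove the extension explicitly instead.
    out_path = gz_path[:-3] if gz_path.endswith(".gz") else gz_path + ".out"
    with gzip.open(gz_path, "rb") as f_in, open(out_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return out_path
```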

Can episcanpy transform scATAC-seq peak matrix into "Gene activity matrix"?

Sometimes we need to collapse the single-cell ATAC-seq peak matrix into a "gene activity matrix", as in Seurat. I wish episcanpy could also provide this function.

Error in Generating Methylation Count Matrices

Dear EpiScanpy team,

I am having trouble using episcanpy's ct.build_count_mtx() function. I strictly followed the tutorial and used the provided test datasets to build the count matrices, but no output files were produced even though the code chunk ran successfully. When I set output_file to "None" to keep the result as a loaded matrix, a NoneType object was returned.

Thanks!

Installation on Mac M2

Hello, I got the following error while trying to install episcanpy on mac with the following config:

MacOS Ventura 13.4
Model: M2 Max

Collecting episcanpy
Using cached episcanpy-0.4.0.tar.gz (50.5 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [52 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/private/var/folders/5f/wd12d4k54y92nf_37yzs57wc0000gn/T/pip-install-bos65vr8/episcanpy_34180352814c4e248866de19a15909cb/setup.py", line 9, in
from episcanpy import author, email
File "/private/var/folders/5f/wd12d4k54y92nf_37yzs57wc0000gn/T/pip-install-bos65vr8/episcanpy_34180352814c4e248866de19a15909cb/episcanpy/init.py", line 3, in
from .utils import check_versions, annotate_doc_types
File "/private/var/folders/5f/wd12d4k54y92nf_37yzs57wc0000gn/T/pip-install-bos65vr8/episcanpy_34180352814c4e248866de19a15909cb/episcanpy/utils.py", line 17, in
from . import settings
File "/private/var/folders/5f/wd12d4k54y92nf_37yzs57wc0000gn/T/pip-install-bos65vr8/episcanpy_34180352814c4e248866de19a15909cb/episcanpy/settings.py", line 77, in
import scanpy as sc
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/scanpy/init.py", line 6, in
from ._utils import check_versions
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/scanpy/_utils/init.py", line 21, in
from anndata import AnnData, version as anndata_version
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/anndata/init.py", line 7, in
from ._core.anndata import AnnData
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/anndata/_core/anndata.py", line 27, in
from .raw import Raw
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/anndata/_core/raw.py", line 10, in
from .index import _normalize_index, _subset, unpack_index, get_vector
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/anndata/_core/index.py", line 10, in
from ..compat import AwkArray, DaskArray, Index, Index1D
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/anndata/compat/init.py", line 69, in
from dask.array import Array as DaskArray
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/dask/array/init.py", line 2, in
from dask.array import backends, fft, lib, linalg, ma, overlap, random
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/dask/array/backends.py", line 6, in
from dask.array.core import Array
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/dask/array/core.py", line 63, in
from dask.sizeof import sizeof
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/dask/sizeof.py", line 264, in
_register_entry_point_plugins()
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/dask/sizeof.py", line 254, in _register_entry_point_plugins
for entry_point in importlib_metadata.entry_points(group="dask.sizeof"):
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/importlib_metadata/init.py", line 933, in entry_points
return EntryPoints(eps).select(**params)
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/importlib_metadata/init.py", line 930, in
eps = itertools.chain.from_iterable(
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/importlib_metadata/_itertools.py", line 16, in unique_everseen
k = key(element)
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/importlib_metadata/_py39compat.py", line 18, in normalized_name
return dist._normalized_name
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/setuptools/_vendor/importlib_metadata/init.py", line 778, in _normalized_name
or super()._normalized_name
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/setuptools/_vendor/importlib_metadata/init.py", line 445, in normalized_name
return Prepared.normalize(self.name)
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/site-packages/setuptools/vendor/importlib_metadata/init.py", line 700, in normalize
return re.sub(r"[-.]+", "-", name).lower().replace('-', '')
File "/Users/philippemartin/miniconda3/envs/scenicplus/lib/python3.8/re.py", line 210, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.

Any clue, please?

multimodality

Glad to be the first one here!
Looking forward to the final release version of epiScanpy, its full tutorial, and the download link for the test data.
By the way, could epiScanpy be used as a multi-modal analysis tool, as discussed in scverse/scanpy#479?

Potential dependency conflicts between episcanpy and scipy

Hi, as shown in the full dependency graph of episcanpy below, episcanpy requires scipy <=1.3 and seaborn * (seaborn 0.10.0 will be installed, i.e., the newest version satisfying the version constraint), and the direct dependency seaborn 0.10.0 transitively introduces scipy >=1.0.1.
So there are multiple version constraints set for scipy in this project. However, according to pip's “first found wins” installation strategy, scipy 1.3 (i.e., the newest version satisfying the constraint <=1.3) is the version actually installed.
Although the first-found version scipy 1.3 satisfies the later dependency constraint (scipy >=1.0.1), it leaves little room between episcanpy's upper bound (<=1.3) and the lower bound set by seaborn 0.10.0 (>=1.0.1).

Once seaborn upgrades, its newest version will be installed, because episcanpy does not specify an upper bound for seaborn. A dependency conflict (build failure) will therefore easily occur if the upgraded seaborn requires a higher version of scipy that violates the other constraint, <=1.3.
According to seaborn's release history, it regularly raises its scipy requirement. For instance, seaborn 0.9.0 changed its scipy constraint from * to >=0.14.0, seaborn 0.9.1.rc0 changed it from >=0.14.0 to >=0.17.1, and seaborn 0.10.0rc0 changed it from >=0.17.1 to >=1.0.1.
As such, this is a friendly warning of a potential dependency conflict for episcanpy.
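
The scenario can be illustrated with the `packaging` library. This is only a sketch: the `>=1.4` constraint is a hypothetical future seaborn requirement, not one taken from any release, and `satisfiable` is an illustrative helper.

```python
from packaging.specifiers import SpecifierSet

episcanpy_pin = SpecifierSet("<=1.3")    # constraint reported for episcanpy
seaborn_now = SpecifierSet(">=1.0.1")    # constraint reported for seaborn 0.10.0
seaborn_future = SpecifierSet(">=1.4")   # hypothetical future constraint


def satisfiable(candidates, *specs):
    """Return the candidate versions accepted by every specifier set."""
    combined = SpecifierSet()
    for spec in specs:
        combined &= spec
    return [v for v in candidates if v in combined]
```

With the candidates `["1.0.1", "1.3", "1.4"]`, the current pair of constraints still admits some versions, while the hypothetical future pair admits none, which is the conflict described above.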

Dependency tree

episcanpy - 0.1.10
| +- anndata(install version:0.6.22rc1 version range:==0.6.22rc1)
| | +- h5py(install version:2.10.0 version range:*)
| | +- natsort(install version:6.2.1 version range:*)
| | +- numpy(install version:1.18.4 version range:<2,>=1.14)
| | +- pandas(install version:1.0.3 version range:>=0.23.0)
| | +- scipy(install version:1.3 version range:<2,>=1.0)
| +- bamnostic(install version:1.1.4 version range:>=1.0.12)
| +- matplotlib(install version:3.2.1 version range:>=3.0.0)
| | +- cycler(install version:0.10.0 version range:>=0.10)
| | | +- six(install version:1.14.0 version range:*)
| | +- kiwisolver(install version:1.2.0 version range:>=1.0.1)
| | +- numpy(install version:1.18.4 version range:>=1.11)
| | +- pyparsing(install version:3.0.0a1 version range:>=2.0.1)
| | +- python-dateutil(install version:2.8.1 version range:>=2.1)
| +- pandas(install version:1.0.3 version range:>=0.21)
| +- scanpy(install version:1.4.4 version range:==1.4.4)
| | +- anndata(install version:0.6.22rc1 version range:>=0.6.22rc1)
| | | +- h5py(install version:2.10.0 version range:*)
| | | +- natsort(install version:6.2.1 version range:*)
| | | +- numpy(install version:1.18.4 version range:<2,>=1.14)
| | | +- pandas(install version:1.0.3 version range:>=0.23.0)
| | | +- scipy(install version:1.3 version range:<2,>=1.0)
| | +- h5py(install version:2.10.0 version range:*)
| | +- importlib-metadata(install version:1.6.0 version range:>=0.7)
| | +- joblib(install version:0.14.1 version range:*)
| | +- matplotlib(install version:3.0. version range:==3.0.)
| | +- natsort(install version:6.2.1 version range:*)
| | +- networkx(install version:2.4 version range:*)
| | | +- decorator(install version:4.4.2 version range:>=4.3.0)
| | +- numba(install version:0.48.0 version range:>=0.41.0)
| | +- pandas(install version:1.0.3 version range:>=0.21)
| | +- patsy(install version:0.5.1 version range:*)
| | | +- numpy(install version:1.18.4 version range:>=1.4)
| | | +- six(install version:1.14.0 version range:*)
| | +- scikit-learn(install version:0.22.2.post1 version range:>=0.19.1)
| | +- scipy(install version:1.3 version range:>=1.3)
| | +- seaborn(install version:0.10.0 version range:*)
| | | +- matplotlib(install version:3.2.1 version range:>=2.1.2)
| | | | +- cycler(install version:0.10.0 version range:>=0.10)
| | | | +- kiwisolver(install version:1.2.0 version range:>=1.0.1)
| | | | +- numpy(install version:1.18.4 version range:>=1.11)
| | | | +- pyparsing(install version:3.0.0a1 version range:>=2.0.1)
| | | | +- python-dateutil(install version:2.8.1 version range:>=2.1)
| | | +- numpy(install version:1.18.4 version range:>=1.13.3)
| | | +- pandas(install version:1.0.3 version range:>=0.22.0)
| | | +- scipy(install version:1.3 version range:>=1.0.1)
| | +- statsmodels(install version:0.11.1 version range:>=0.10.0rc2)
| | | +- numpy(install version:1.18.4 version range:>=1.14)
| | | +- pandas(install version:1.0.3 version range:>=0.21)
| | | +- patsy(install version:0.5.1 version range:>=0.5)
| | | | +- numpy(install version:1.18.4 version range:>=1.4)
| | | | +- six(install version:1.14.0 version range:*)
| | | +- scipy(install version:1.3 version range:>=1.0)
| | +- tables(install version:3.6.1 version range:*)
| | | +- numexpr(install version:2.7.1 version range:>=2.6.2)
| | | | +- numpy(install version:1.18.4 version range:>=1.7)
| | | +- numpy(install version:1.18.4 version range:>=1.9.3)
| | +- tqdm(install version:4.45.0 version range:*)
| | +- umap-learn(install version:0.4.1 version range:>=0.3.0)
| +- scipy(install version:1.3 version range:<=1.3)
| +- seaborn(install version:0.10.0 version range:*)
| | +- matplotlib(install version:3.2.1 version range:>=2.1.2)
| | | +- cycler(install version:0.10.0 version range:>=0.10)
| | | | +- six(install version:1.14.0 version range:*)
| | | +- kiwisolver(install version:1.2.0 version range:>=1.0.1)
| | | +- numpy(install version:1.18.4 version range:>=1.11)
| | | +- pyparsing(install version:3.0.0a1 version range:>=2.0.1)
| | | +- python-dateutil(install version:2.8.1 version range:>=2.1)
| | +- numpy(install version:1.18.4 version range:>=1.13.3)
| | +- pandas(install version:1.0.3 version range:>=0.22.0)
| | +- scipy(install version:1.3 version range:>=1.0.1)

Thanks for your help.
Best,
Neolith

Future plans

Dear episcanpy team,

First of all, thanks for taking the initiative to bring epigenomic analyses to the Python programming environment. Together with scanpy, it has the potential to support large-scale multiomics projects without having to switch back and forth to R, which makes dependencies easier to handle.

When reading the documentation, though, I was quite disappointed to see that episcanpy is currently not as complete as scanpy (hardly any documentation or notebooks, few processing and plotting functions), and that it does not offer basic downstream analyses after integration as, for example, ArchR does (TF footprinting, genome browser, differentially accessible peak analysis, motif enrichment, peak2gene links and so on).

Are there any plans to extend episcanpy in the future? Unfortunately, right now it does not have enough functionality to justify using it. It would be a pity if these features were omitted, since episcanpy has the potential to make the whole scanpy suite more attractive, especially now that multiome datasets are becoming more common.

binarize issues

I've encountered an issue when trying to binarize data. Specifically, if the data matrix is sparse, the instruction

admatrix = np.where(admatrix>threshold, upper, lower)

raises a sequence error. The following alternative preserves the type of the data matrix:

admatrix = (adata.X > threshold).astype(float)
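
A small sketch of how this suggestion could handle both dense and sparse input. It simplifies the original `upper`/`lower` arguments to 1 and 0 and assumes a non-negative threshold; `binarize` here is an illustrative helper, not the episcanpy function.

```python
import numpy as np
from scipy import sparse


def binarize(X, threshold=0.0):
    """Return a 0/1 float matrix that is 1 where X > threshold.

    Sparse input stays sparse; assumes threshold >= 0 so that zeros map to 0.
    """
    if sparse.issparse(X):
        return (X > threshold).astype(float)
    return (np.asarray(X) > threshold).astype(float)
```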

Typo in import of `tss_enrichment_score`

Currently the import statement in pl.py at line 20 imports tss_enrichment_score_plot under the misspelled alias tss_enrichment_socre instead of tss_enrichment_score. I fixed this in my fork and plan to make a pull request.

File episcanpy/api/pl.py:20
18 from ..preprocessing._quality_control import cal_var, variability_features
19 from ..preprocessing._tss_enrichment import tss_enrichment_plot as tss_enrichment
---> 20 from ..preprocessing._tss_enrichment import tss_enrichment_score_plot as tss_enrichment_socre

installation should say python 3.7 else SyntaxError: future feature annotations is not defined

from __future__ import annotations
SyntaxError: future feature annotations is not defined

Running Python 3.6 causes an error related to the future feature annotations.
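
`from __future__ import annotations` is only available from Python 3.7 onwards. A hedged sketch of an explicit interpreter check follows; `check_python` is a hypothetical helper and would have to live in a module that does not itself use the future import. Declaring `python_requires=">=3.7"` in the setuptools configuration would be another way to surface the requirement at install time.

```python
import sys

MIN_PYTHON = (3, 7)  # `from __future__ import annotations` was added in Python 3.7


def check_python(version_info=sys.version_info):
    """Raise a readable error when the interpreter is too old."""
    if tuple(version_info[:2]) < MIN_PYTHON:
        raise RuntimeError(
            "Python %d.%d or newer is required, found %d.%d"
            % (MIN_PYTHON + tuple(version_info[:2]))
        )
```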

Add function to load gtf file as features

Potential edit required in building count matrix (methylation)

Dear madam/sir,

A few weeks ago I received my first single-cell methylation data. While exploring the possibilities for analyzing these samples, I came across your EpiScanpy paper. I was quite impressed by your user-friendly and extensively described method, which is nicely supported by the associated GitHub repository.

While performing some quality control on cell/feature coverage, I realized something was wrong with the numbers (methylation values) in the count matrix. To be sure, I went through the code that builds the count matrix based on genomic windows of 100,000 base pairs: ‘_bld_met_mtx.py’. I think I found two mistakes:

1. The methylation values are assigned to the right cells but to the wrong genomic windows, because the order of ‘chromosome’ and ‘sorted(features.keys())’ differs. As a result, the methylation values end up in the wrong chromosome.

2. The code within the function ‘def methylation_level(reduced_cyt, feature, chromosome, threshold=1):’ should loop through every cytosine (‘CG’ in my case) within a given genomic window. Instead, although the loop itself runs correctly, an iterator mistake makes it add the coverage values of the first cytosine on every iteration, so the same methylation values are recorded repeatedly. This results in an over-representation of zeros in the count matrix when the first cytosine in a window has meth_reads=0.

I will use the mouse brain data from your tutorial to show how this happens:

‘chromosome’ is specified in numerical order (1-19, X and Y) and is used in the ‘def methylation_level(reduced_cyt, feature, chromosome, threshold=1):’ function, which creates a list containing all methylation values (meth_reads/tot_reads); see point 2.

On the other hand, ‘sorted(feature.keys())’ yields a lexicographically sorted order (1, 10-19, 2-9, X and Y). It is used in the ‘def extract_feature_names(feature):’ function, which creates the genomic bins.

Later, in the function ‘def build_count_mtx(cells, annotation, path="", output_file=None, writing_option="w", meth_context="CG", chromosome=None, feature_names=None, threshold=1, ct_mtx=None, sparse=False, copy=False):’, the methylation values from the first function are added to the genomic bins from the second function, which means they are assigned in the wrong order.

Removing the ‘sorted’ from features solves this problem and gives the same order as ‘chromosome’.
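
The ordering mismatch described here can be reproduced with plain Python. This is an illustrative sketch with chromosome names as strings; `natural_key` is a hypothetical helper showing one way to keep a numeric order when sorting is still wanted.

```python
chromosome = [str(i) for i in range(1, 20)] + ["X", "Y"]

# Plain string sorting is lexicographic: '10' sorts before '2'.
lexicographic = sorted(chromosome)


def natural_key(name):
    """Sort numeric chromosome names by value and the others after them."""
    return (0, int(name), "") if name.isdigit() else (1, 0, name)


natural = sorted(chromosome, key=natural_key)
```

Here `lexicographic` starts with '1', '10', '11', whereas `natural` matches the original `chromosome` order.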

Solution:

Here we are looking at cell 1 ('../methylation_play_data/cell1.tsv'), more specifically at the first three loops of the ‘def methylation_level(reduced_cyt, feature, chromosome, threshold=1):’ function, which iterate over the first three cytosines of the first genomic window (chr1_3000001_3100000) containing a ‘CG’. In the first loop it correctly takes meth_reads (=5) and tot_reads (=5) and adds them to the starting values, meth_reads (=0) and tot_reads (=1), resulting in meth_reads (=5) and tot_reads (=6). However, in the second loop it does not take meth_reads (=1) and tot_reads (=1), but adds meth_reads (=5) and tot_reads (=5) from the first cytosine again. This results in meth_reads (=10) and tot_reads (=11) instead of meth_reads (=6) and tot_reads (=7). The same happens for every other cytosine in a given genomic window.
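
The arithmetic above can be checked with a short sketch. `window_methylation` is a hypothetical stand-in, not the code from ‘_bld_met_mtx.py’; it only reproduces the two cytosines quoted in this report.

```python
def window_methylation(cytosines, buggy=False):
    """Sum (meth_reads, tot_reads) over the cytosines of one window.

    Starts from meth_reads=0 and tot_reads=1, as in the example above.
    `buggy=True` mimics the reported behaviour of re-adding the first entry.
    """
    meth_reads, tot_reads = 0, 1
    for index in range(len(cytosines)):
        meth, tot = cytosines[0] if buggy else cytosines[index]
        meth_reads += meth
        tot_reads += tot
    return meth_reads, tot_reads
```

For the cytosines `[(5, 5), (1, 1)]` the correct loop gives (6, 7), while the buggy variant gives (10, 11), matching the numbers in the description.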

Solution:

Let me know if you need some more information regarding this potential edit.

I am curious about your opinion and hope to hear from you soon!

Kind regards,
Tim Sakkers

Error when loading 10X Cellranger output with read_ATAC_10x()

Hi Anna!
I've been trying to load the output from 10X's CellRanger scATACseq aggregated pipeline into EpiScanpy:

mtx_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/matrix.mtx"
tsv_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/barcodes.tsv"
bed_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/peaks.bed"
adata2 = epi.pp.read_ATAC_10x(mtx_file, cell_names=tsv_file, var_names=bed_file)

However, it seems that I'm encountering an error with the read_ATAC_10x() function.
The error log:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-028937a6930d> in <module>
      2 tsv_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/barcodes.tsv"
      3 bed_file = "/home/pab/ESPACE_08/outs_aggregate/filtered_peak_bc_matrix/peaks.bed"
----> 4 adata2 = epi.pp.read_ATAC_10x(mtx_file, cell_names=tsv_file, var_names=bed_file)

~/miniconda3/envs/csg.p/lib/python3.9/site-packages/episcanpy/preprocessing/_load_matrix.py in read_ATAC_10x(matrix, cell_names, var_names, path_file)
     36         var_names = ["_".join(x[:-1].split('\t')) for x in var_names]
     37 
---> 38     adata = ad.AnnData(mat, obs=pd.DataFrame(index=barcodes), var=pd.DataFrame(index=var_names))
     39     adata.uns['omic'] = 'ATAC'
     40 

~/miniconda3/envs/csg.p/lib/python3.9/site-packages/anndata/_core/anndata.py in __init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    305                 raise ValueError("`X` has to be an AnnData object.")
    306             self._init_as_view(X, oidx, vidx)
--> 307         else:
    308             self._init_as_actual(
    309                 X=X,

~/miniconda3/envs/csg.p/lib/python3.9/site-packages/anndata/_core/anndata.py in _init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    470             elif isinstance(X, ZarrArray):
    471                 X = X.astype(dtype)
--> 472             else:  # is np.ndarray or a subclass, convert to true np.ndarray
    473                 X = np.array(X, dtype, copy=False)
    474             # data matrix and shape

TypeError: float() argument must be a string or a number, not 'coo_matrix'

I believe I'm following the right procedure, as described in your beta_tutorial_10x_pbmc.html tutorial.
Any hint on what might be causing this?
I'm using episcanpy==0.3.1 and anndata==0.7.5.

PS: Is there a way to load my filtered_peak_bc_matrix.h5 directly into EpiScanpy, in a similar manner to
some R packages like Seurat?

Thanks!
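
The traceback suggests that AnnData received a `coo_matrix` it could not convert. A possible workaround, sketched here as an assumption rather than a confirmed fix, is to convert the matrix to CSR format before building the AnnData object. `load_10x_matrix` is a hypothetical helper, and the peaks-by-barcodes layout of `matrix.mtx` is assumed.

```python
import numpy as np
from scipy import io, sparse


def load_10x_matrix(mtx_path):
    """Read a Matrix Market file and return a cells x peaks CSR matrix."""
    mat = io.mmread(mtx_path)  # sparse files are returned in COO format
    # Assumption: the file stores peaks as rows and barcodes as columns,
    # so transpose to get one row per cell.
    return sparse.csr_matrix(mat.T, dtype=np.float32)
```

The result could then be passed as `X` to `AnnData`, together with the barcodes as `obs` and the peak names as `var`.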

About snap2anndata

Hello. I attempted to run snap2anndata following the tutorial (SnapATAC_to_anndata_March26th2020.ipynb) using sample data (atac_v1_adult_brain_fresh_5k.snap.rds). However, I received the error message below. How should I convert a snap object to an anndata object? I'm using Python 3.7.1.

Thank you for your help.

During startup - Warning messages:
1: Setting LC_COLLATE failed, using "C"
2: Setting LC_TIME failed, using "C"
3: Setting LC_MESSAGES failed, using "C"
4: Setting LC_MONETARY failed, using "C"
5: Setting LC_PAPER failed, using "C"
6: Setting LC_MEASUREMENT failed, using "C"
UserWarning:convert_to_anndata_2.py:111:
featurepartially excluded from the conversion.
GRanges are not currently fully transfered into the Anndata.

UserWarning:convert_to_anndata_2.py:111:
peakpartially excluded from the conversion.
GRanges are not currently fully transfered into the Anndata.

Traceback (most recent call last):
File "convert_to_anndata_2.py", line 332, in
save='/home/ubuntu/result_dir/scATAC_Seq/snapATAC/200514_project/test/atac_v1_adult_brain_fresh_5k.snap.h5ad')
File "convert_to_anndata_2.py", line 309, in snap2anndata
copy=True))
File "convert_to_anndata_2.py", line 185, in make_Anndata
if (extra in input_data.keys()) and (extra not in adata.obs.columns) and (len(input_data[extra][1].tolist()) == len(adata.obs_names.tolist())):
AttributeError: 'StrVector' object has no attribute 'tolist'

Remind: error in function load_features

Regarding the tutorial:
https://nbviewer.jupyter.org/github/colomemaria/epiScanpy/blob/master/docs/tutorials/ATAC_bld_ct_mtx_tutorial.html

The BED file in the tutorial contains 4 columns:

chr1	3120019	3122019	enhancer_3120019_3122019
chr1	3209819	3211819	enhancer_3209819_3211819
chr1	3292869	3294869	enhancer_3292869_3294869
chr1	3298619	3300619	enhancer_3298619_3300619
chr1	3309519	3311519	enhancer_3309519_3311519
chr1	3359619	3361619	enhancer_3359619_3361619
chr1	3398119	3400119	enhancer_3398119_3400119
chr1	3410669	3412669	enhancer_3410669_3412669
chr1	4138319	4140319	enhancer_4138319_4140319

For a BED file that contains only 3 columns, such as

chr1	10413	10625
chr1	13380	13624
chr1	16145	16354
chr1	96388	96812
chr1	115650	115812
chr1	237625	237888
chr1	240909	241193
chr1	521446	521676
chr1	540558	541085
chr1	713705	714611

the function load_features reports an error.
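
A minimal sketch of a loader that tolerates both layouts by deriving a feature name from the coordinates when the fourth column is missing. `load_bed_features` is a hypothetical helper and does not reproduce the signature or return type of episcanpy's load_features.

```python
def load_bed_features(lines):
    """Parse BED lines into (chrom, start, end, name) tuples.

    A missing 4th column is replaced by a name built from the coordinates.
    """
    features = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or line.startswith("#"):
            continue  # skip comments and malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}_{start}_{end}"
        features.append((chrom, start, end, name))
    return features
```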

bld_atac_mtx doesn't work with some bam files

bld_atac_mtx doesn't work with these bam files

https://www.dropbox.com/sh/8o8f0xu6cvr46sm/AAB6FMIDvHqnG6h7athgcm5-a/Buenrostro_2018.tar.gz?dl=0

'IndexError: list index out of range' when running epi.tl.geneactivity()

Hi,

I am getting the following error when I run epi.tl.geneactivity():

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_105708/3365853572.py in <module>
----> 1 epi.tl.geneactivity(episcanpy_atac, gtf_file, key_added="gene_scores")

~/miniconda3/envs/csg.p/lib/python3.9/site-packages/episcanpy/tools/_geneactivity.py in geneactivity(adata, gtf_file, key_added, upstream, feature_type, annotation, layer_name, raw, copy)
     73         line = line.split('_')
     74         if line[0] not in raw_adata_features.keys():
---> 75             raw_adata_features[line[0]] = [[int(line[1]),int(line[2]), feature_index]]
     76         else:
     77             raw_adata_features[line[0]].append([int(line[1]),int(line[2]), feature_index])

IndexError: list index out of range

This has to do with the way _geneactivity.py iterates over the feature names from the anndata object. By looking at the code, I saw that line = line.split('_') tries to separate diverse string fields, but each line variable in my case is:

chr1:629499-630394
chr1:633580-634634
chr1:778282-779198
chr1:816872-817778
chr1:827063-827952
chr1:844145-844994
chr1:869467-870372
chr1:904350-905199
chr1:920760-921655

So there is nothing to split by '_' character.

I tried to tweak the code and split each feature name into a start and end position myself to create the raw_adata_features dictionary, but when I do this I get an empty gene_activity_X matrix later on. I am using the same GTF file as in the example, gencode.v36.annotation.gtf.

Can you help me with this? Thanks!
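
One way to make the name parsing robust to both conventions is a regular expression that accepts either separator. This is a sketch only; `parse_feature` is a hypothetical helper, and it does not address the empty gene activity matrix mentioned above.

```python
import re

# Accepts 'chr1_629499_630394' as well as 'chr1:629499-630394'.
FEATURE_RE = re.compile(r"^(?P<chrom>.+?)[:_](?P<start>\d+)[-_](?P<end>\d+)$")


def parse_feature(name):
    """Split a feature name into (chrom, start, end)."""
    match = FEATURE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised feature name: {name!r}")
    return match["chrom"], int(match["start"]), int(match["end"])
```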

Suggestion: add unit testing

Add unit tests to the package.

episcanpy does not install / work through any channel of download (conda, pip, git pull)

/home/mvinyard/.local/lib/python3.6/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
/home/mvinyard/.local/lib/python3.6/site-packages/numba/errors.py:137: UserWarning: Insufficiently recent colorama version found. Numba requires colorama >= 0.3.9
  warnings.warn(msg)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-18b9f5d5285b> in <module>
----> 1 import episcanpy.api as epi

~/.local/lib/python3.6/site-packages/episcanpy/__init__.py in <module>
     16 __version__ = get_versions()['version']
     17 
---> 18 check_versions()
     19 annotate_doc_types(sys.modules[__name__], 'episcanpy')
     20 del get_versions, sys, check_versions, annotate_doc_types

~/.local/lib/python3.6/site-packages/episcanpy/utils.py in check_versions()
     32     #       use the following hack...
     33     if anndata.__version__ != '0+unknown':
---> 34         if anndata.__version__ < LooseVersion('0.6.10'):
     35             raise ImportError('Scanpy {} needs anndata version >=0.6.10, not {}.\n'
     36                               'Run `pip install anndata -U --no-deps`.'

/usr/lib/python3.6/distutils/version.py in __gt__(self, other)
     62 
     63     def __gt__(self, other):
---> 64         c = self._cmp(other)
     65         if c is NotImplemented:
     66             return c

/usr/lib/python3.6/distutils/version.py in _cmp(self, other)
    333             other = LooseVersion(other)
    334 
--> 335         if self.version == other.version:
    336             return 0
    337         if self.version < other.version:

AttributeError: 'Version' object has no attribute 'version'

This is the most informative error I can get. I also tried installing anndata separately first, with no luck so far.
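The traceback suggests that anndata.__version__ is no longer a plain string here (the error mentions a 'Version' object), so comparing it against a LooseVersion fails. A minimal standard-library sketch of a comparison that tolerates both cases is below. The helper names release_tuple and is_older_than are hypothetical; packaging.version.parse is the more common tool for this and is avoided only to keep the sketch dependency-free:

```python
import re


def release_tuple(version):
    """Turn '0.7.1', '0.6.22.post1' or a Version-like object into a tuple of ints.

    Only the leading numeric release part is used; suffixes such as
    '.post1' are ignored in this simplified sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", str(version))
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_older_than(installed, minimum):
    """True if `installed` is strictly older than `minimum`."""
    return release_tuple(installed) < release_tuple(minimum)


# Numeric comparison: 0.6.9 is older than 0.6.10, unlike a plain string compare.
check = is_older_than("0.6.9", "0.6.10")
```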

UnicodeDecodeError when using a GTF file to produce a gene activity matrix

Hello!
When I use the function 'episcanpy.tl.geneactivity' to collapse the peak matrix into a "gene activity matrix",
I get "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte" at
with open(gtf_file) as f:
    for line in f:
The GTF file I use is the gzip-compressed 'hg38' file from 'genome.ucsc.edu'.
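Byte 0x8b at position 1 is the second byte of the gzip magic number, which fits a compressed file being read as plain text. A minimal sketch of a reader that handles both cases follows. The helper names open_text and iter_gtf_records are hypothetical and not part of episcanpy; decompressing the file before passing it to the function would be the simpler workaround:

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading as UTF-8."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_gtf_records(path):
    """Yield the tab-separated fields of each non-comment, non-empty GTF line."""
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            yield line.rstrip("\n").split("\t")
```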

Episcanpy is incompatible with anndata 0.7.1

Hi Anna,

I just stumbled over another problem with episcanpy. With the latest anndata version installed (0.7.1), I cannot import episcanpy 0.1.8 anymore.

I get the following error:

Python 3.7.5 (default, Dec 22 2019, 13:37:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import episcanpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/leander.dony/.local/lib/python3.7/site-packages/episcanpy/__init__.py", line 18, in <module>
    check_versions()
  File "/home/leander.dony/.local/lib/python3.7/site-packages/episcanpy/utils.py", line 34, in check_versions
    if anndata.__version__ < LooseVersion('0.6.10'):
  File "/app/python37/lib/python3.7/distutils/version.py", line 64, in __gt__
    c = self._cmp(other)
  File "/app/python37/lib/python3.7/distutils/version.py", line 335, in _cmp
    if self.version == other.version:
AttributeError: 'Version' object has no attribute 'version'

This does not occur with anndata 0.6.22.post1

Would be great if episcanpy would soon be compatible with anndata again :)

Extremely long runtime bld_mtx_fly()

Hey,

I am having some problems with the extremely long runtime of the bld_mtx_fly() function on my scATAC dataset. I have a 10x dataset with 2.5k cells and am trying to build a feature matrix from the fragments.tsv file using a geneid annotation BED file with 33k features.

I have added a tqdm progress bar to get an idea of the performance.

After 10 minutes, this is what I see:

0%|          | 11/32774 [09:37<617:20:22, 67.83s/it]

This indicates an expected runtime of more than 3 weeks, which seems far too long considering this statement in the function docstring: "Expected running time for 10k cells X 100k features on a personal computer ~65min".

When I use 10kb windows from the make_windows() function, I get over 2 million features and an expected runtime of several years:

0%|          | 111/2399976 [55:19<46456:43:07, 69.69s/it]

Any suggestions on how I could still use this function and get started with my ATAC analysis?

Thanks a lot and best regards,
Leander
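For the fixed-width window case, one alternative is to avoid per-feature lookups and assign each fragment to its windows in a single pass over the file. The sketch below is not how bld_mtx_fly works; it assumes BED-like, half-open fragment coordinates with the columns chrom, start, end, barcode, count, and the function name count_fragments_in_windows is hypothetical:

```python
from collections import Counter


def count_fragments_in_windows(fragment_lines, window_size=10_000):
    """Count fragments per (barcode, chrom, window index) in one pass.

    Each line is assumed to be tab-separated:
    chrom, start, end, barcode, duplicate_count
    """
    counts = Counter()
    for line in fragment_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        first = int(start) // window_size
        # End is treated as exclusive, so the last covered base is end - 1.
        last = max(int(end) - 1, int(start)) // window_size
        # A fragment is counted once in every window it overlaps.
        for window in range(first, last + 1):
            counts[(barcode, chrom, window)] += 1
    return counts


rows = [
    "chr1\t100\t250\tAAAC-1\t1\n",
    "chr1\t9990\t10020\tAAAC-1\t2\n",
    "chr1\t20500\t20700\tTTTG-1\t1\n",
]
counts = count_fragments_in_windows(rows)
```

The resulting counter can be converted to a sparse matrix afterwards. The work grows with the number of fragments rather than with the number of windows, which is the point of the design when there are millions of windows.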

Import Error when Importing on Python 3.9

The current pl.py file imports tss_enrichment_plot as tss_enrichment and tss_enrichment_score_plot as tss_enrichment_score, but tss_enrichment_score_plot cannot be imported. I imagine this slipped through the cracks, as my previous contribution has gone unnoticed for a year.

File episcanpy/api/pl.py:20
     18 from ..preprocessing._quality_control import cal_var, variability_features
     19 from ..preprocessing._tss_enrichment import tss_enrichment_plot as tss_enrichment
---> 20 from ..preprocessing._tss_enrichment import tss_enrichment_score_plot as tss_enrichment_score

ImportError: cannot import name 'tss_enrichment_score_plot' from 'episcanpy.preprocessing._tss_enrichment' (./episcanpy/preprocessing/_tss_enrichment.

Is it safe to simply change these lines like this?

-from ..preprocessing._tss_enrichment import tss_enrichment_plot as tss_enrichment
-from ..preprocessing._tss_enrichment import tss_enrichment_score_plot as tss_enrichment_score
+from ..preprocessing._tss_enrichment import tss_enrichment_plot 
+from ..preprocessing._tss_enrichment import tss_enrichment

With this change the error no longer appears and the tests pass, but I am still uncertain, as I do not know the code base well and the prior code apparently also works.

Could someone fill me in on how it was supposed to work?

'nb_features' and 'n_features', what is the difference?

To follow up after the hackathon, some participants found that after epi.pp.coverage_cells and epi.pp.filter_cells, there were 'nb_features' and 'n_features' added to AnnData. What is the difference?

Incompatibility between the old version of Anndata (0.6.22.post1) and the new one

EpiScanpy should be updated to use the new version of Anndata. The requirements of the package should be updated as well.

Bug in silhouette score calculation

The silhouette score calculation does not work. There is a bug in the code: the keyword should be labels, not label.

epi.tl.silhouette(adata,'louvain')

TypeError Traceback (most recent call last)
in
----> 1 epi.tl.silhouette(adata,'louvain')

/usr/local/lib/python3.9/dist-packages/episcanpy/tools/_silhouette.py in silhouette(adata_name, cluster_annot, value, metric, key_added, copy)
46
47 ## also, return sample_silhouette_values as adata.obs['silhouette_samples']
---> 48 silhouette_avg = silhouette_score(X=X, label=cluster_labels, metric=metric)
49 sample_silhouette_values = silhouette_samples(X=X, label=cluster_labels, metric=metric)
50

TypeError: silhouette_score() missing 1 required positional argument: 'labels'

It should be labels, not label, in both silhouette_score and silhouette_samples.

Problem building scATAC-seq count matrix from BAM files

When trying to follow the tutorial on creating scATAC-seq count matrices from bam files I encountered this issue in the second to last step:
epi.ct.bld_atac_mtx(list_bam_files=list_cells, loaded_feat=peaks, output_file_name='test_ATAC_mtx.txt', path=path_to_play_data, writing_option='w', header=peaks_names)

This yielded the following error log:

AttributeError Traceback (most recent call last)
in
----> 1 epi.ct.bld_atac_mtx(list_bam_files=list_cells,
2 loaded_feat=peaks,
3 output_file_name='test_ATAC_mtx.txt',
4 path=path_to_play_data,
5 writing_option='w',

~/anaconda3/envs/bsh/lib/python3.8/site-packages/episcanpy/count_matrix/_atac_mtx.py in bld_atac_mtx(list_bam_files, loaded_feat, output_file_name, path, writing_option, header, mode, check_sq, chromosomes)
94 #for read in samfile.fetch(until_eof=True):
95 for read in samfile:
---> 96 line = str(read).split('\t')
97 if line[2][3:] in chromosomes:
98 keep_lines.append(line[2:4])

~/anaconda3/envs/bsh/lib/python3.8/site-packages/bamnostic/core.py in __str__(self)
462
463 def __str__(self):
--> 464 return self.__repr__()
465
466 def _range_popper(self, interval_start, interval_stop=None, front=True):

~/anaconda3/envs/bsh/lib/python3.8/site-packages/bamnostic/core.py in __repr__(self)
444 self.read_name,
445 "{}".format(self.flag),
--> 446 "{}".format(self.reference_name, self.tid),
447 "{}".format(self.pos + 1),
448 "{}".format(self.mapq),

AttributeError: 'AlignedSegment' object has no attribute 'reference_name'

At first I suspected this error to come from the missing bam index files, however this is not the cause. Is this a familiar problem? Is there a quick fix to the problem?

Thanks for advice

Episcanpy is incompatible with scanpy 1.4.5post3

Hi Anna,

I just wanted to try out the shiny new episcanpy version. Unfortunately, I found that it is incompatible with the current scanpy release because of conflicting requirements.

Episcanpy 0.1.8:

scipy<=1.3
h5py!=2.10.0

Scanpy 1.4.5.post3:

scipy>=1.3
h5py>=2.10.0

I wonder if the version pinning of episcanpy could just be adapted to the scanpy one or whether this actually requires updates to the code.

Would be great if episcanpy would soon be compatible with scanpy again :)

Episcanpy 0.1.8 should require pysam

Hi Anna,

when importing episcanpy 0.1.8, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/app/python37/lib/python3.7/site-packages/episcanpy/__init__.py", line 25, in <module>
    from . import count_matrix as ct
  File "/app/python37/lib/python3.7/site-packages/episcanpy/count_matrix/__init__.py", line 6, in <module>
    from ..count_matrix._bld_atac_mtx import bld_mtx_fly
  File "/app/python37/lib/python3.7/site-packages/episcanpy/count_matrix/_bld_atac_mtx.py", line 5, in <module>
    import pysam
ModuleNotFoundError: No module named 'pysam'

Once I manually install pysam, this error disappears.
I guess pysam should be added to the requirements file of episcanpy.
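Besides listing pysam in the requirements, a complementary option is to defer the import so that the rest of the package still loads when pysam is absent. This is a sketch of that technique, not something episcanpy is known to do. The names require_module and open_fragments are hypothetical:

```python
import importlib


def require_module(name, install_hint=None):
    """Import and return module `name`, or raise an ImportError with an install hint."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        hint = install_hint or f"pip install {name}"
        raise ImportError(
            f"The optional dependency '{name}' is missing; try `{hint}`."
        ) from err


def open_fragments(path):
    # Hypothetical function: pysam is imported only when it is actually needed.
    pysam = require_module("pysam")
    return pysam.TabixFile(path)
```

With this pattern, `import episcanpy` would not fail on a missing pysam; only the functions that need it would raise, with a message saying what to install.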

How to generate count matrix in Episcanpy

Hi, I am asking for advice on how to load data into Episcanpy. I have access to the following four types of data for each ATAC sample:
atac-assembly-filtered-fragments-tsv-gz
atac-assembly-read-counts-gene-bodies-h5
atac-assembly-read-counts-per-windows-h5
atac-assembly-read-counts-per-region-h5

Which one should I use for generating the count matrix? What other data do I need to load before the actual analysis?

Thanks,
Tao

Remove python 2.7 tests

Some dependencies seem to require python >= 3.6/7, therefore it may make sense to remove the python 2.7 tests (which seem to fail), as that version is anyway no longer recommended.

To be updated: Building ATAC-seq count matrices from BAM files

https://nbviewer.jupyter.org/github/colomemaria/epiScanpy/blob/master/docs/tutorials/ATAC_bld_ct_mtx_tutorial.html

To update in the tutorial

epi.ct.bld_atac_mtx(list_bam_files=list_cells,
                    loaded_feat=enhancers,
                    output_file='test_ATAC_mtx.txt',
                    path=path_to_play_data,
                    writing_option='w',
                    header=enhancer_names)

output_file has been renamed to output_file_name.

Error running "epi.ct.save_sparse_mtx"

Hello, I get an error when I run your sample code.

sequence depth issues in algorithms of LSI, LSA, LDA, PCA for dimension reduction

Hi all,

The sequencing depth of single cells is an important factor that may hinder genuine discoveries such as cell type identification and pseudo-time path calculation. As far as I know, many scATAC tools (cisTopic with LDA, Signac with LSI, cellranger-atac with LSA, episcanpy with PCA) have difficulty deriving a faithful dimension-reduced clustering space without pre-filtering low-depth cells (correct me if I am missing something).
However, in some cases cells may indeed show fewer ATAC fragments (or low UMI counts) for biological reasons. Precisely distinguishing those cells from broken cells is therefore a real challenge. There is very little information about this issue (one discussion is stuart-lab/signac#106), and I think it is an important question that many researchers will be interested in.
In my case, I compared the UMAP plots before and after removing the first four dimensions (the first dimension is indeed correlated with sequencing depth; I excluded the first four to be safe). The shape of the scatterplot looks similar, and the positions of the low-depth cell clusters (not too low: at least 3k fragments per cell after prefiltering) remain largely unchanged.
To summarize my question: how should cells with low depth be handled to avoid false positive results while keeping real cells? Any suggestions are welcome.
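The check described above (is a component correlated with depth?) can be sketched as follows. This is a minimal illustration, not an episcanpy function: the embedding is a plain list of per-cell coordinates standing in for something like a PCA or LSI matrix, the 0.5 threshold is an arbitrary assumption, and the name depth_correlated_components is hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def depth_correlated_components(embedding, depths, threshold=0.5):
    """Return indices of components whose |r| with log10(depth) exceeds threshold.

    embedding: list of per-cell coordinate lists (cells x components)
    depths:    fragments per cell, in the same order as embedding
    """
    log_depth = [math.log10(d) for d in depths]
    flagged = []
    for j in range(len(embedding[0])):
        column = [cell[j] for cell in embedding]
        if abs(pearson(column, log_depth)) > threshold:
            flagged.append(j)
    return flagged


embedding = [[3.5, 0.2], [3.9, -0.1], [4.2, -0.1], [4.6, 0.2]]
depths = [3000, 8000, 15000, 40000]
flagged = depth_correlated_components(embedding, depths)
```

Dropping only the flagged components, rather than a fixed number of leading ones, keeps dimensions that may carry biological signal. It does not by itself separate genuinely low-depth cells from broken ones.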

Help needed in preprocessing scATAC-seq

Hi,

I want to preprocess some scATAC-seq datasets properly. I have gone through the episcanpy tutorials. I do binarization and variable peak selection and use the regress_out method, but still get poor results. Could you suggest what to do in order to get better results? I have uploaded two of my preprocessing notebooks here and here.

@DaneseAnna

bld_mtx_fly tabix value error

I'm hitting another issue with bld_mtx_fly() after overcoming the import issue.
I have built windows using the following function:
windows = epi.ct.make_windows(10000)

Building the feature matrix based on these windows unfortunately fails:

loading the barcodes
building the count matrix

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-4ae2376dfb31> in <module>
----> 1 epi.ct.bld_mtx_fly(paths['rawdir']+'fragments.tsv.gz', paths['rawdir']+'fragments.tsv.gz.tbi', windows)

~/.local/lib/python3.7/site-packages/episcanpy/count_matrix/_bld_atac_mtx.py in bld_mtx_fly(tsv_file, tbi_file, annotation, csv_file, genome, DATADIR, save)
    156     for tmp_feat in window_list:
    157         vector = [0]*nb_barcodes
--> 158         for row in tbx.fetch(tmp_feat[0], tmp_feat[1], tmp_feat[2], parser=pysam.asTuple()):
    159         #for row in tbx.fetch(start=tmp_feat[0], end=tmp_feat[1], region=tmp_feat[2], parser=pysam.asTuple()):
    160             line = str(row).split('\t')[-2]

pysam/libctabix.pyx in pysam.libctabix.TabixFile.fetch()

ValueError: could not create iterator for region '1:1-10000'
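The failing region is '1:1-10000', so the windows name the chromosome "1". One possible cause, which is an assumption and not confirmed in this thread, is that the fragments file indexes its contigs as "chr1". pysam's TabixFile exposes the indexed names through its contigs attribute, and the windows can be matched against them with a small helper. The name match_contig and the example contig list are hypothetical:

```python
def match_contig(name, contigs):
    """Return the spelling of `name` used in `contigs`, or None if absent.

    Tries the name as given, then with the 'chr' prefix added or removed.
    """
    if name in contigs:
        return name
    alternative = name[3:] if name.startswith("chr") else "chr" + name
    return alternative if alternative in contigs else None


# Illustrative values; in practice the list could come from tbx.contigs.
index_contigs = ["chr1", "chr2", "chrX"]
windows = [["1", 1, 10000], ["2", 1, 10000], ["MT", 1, 10000]]

usable = []
for chrom, start, end in windows:
    contig = match_contig(chrom, index_contigs)
    if contig is not None:  # skip windows on contigs absent from the index
        usable.append([contig, start, end])
```

Windows on contigs that are absent from the index are dropped here rather than passed to fetch, since a region on an unknown contig is one way to trigger this kind of ValueError.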

Error of version 0.2.0+7.g709bba8

episcanpy version 0.2.0+7.g709bba8
There is an error when importing the library:

>>> import episcanpy
/opt/miniconda3/envs/episcanpy/lib/python3.6/site-packages/scanpy/api/__init__.py:7: FutureWarning: 

In a future version of Scanpy, `scanpy.api` will be removed.
Simply use `import scanpy as sc` and `import scanpy.external as sce` instead.

  FutureWarning,
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/miniconda3/envs/episcanpy/lib/python3.6/site-packages/episcanpy/__init__.py", line 25, in <module>
    from . import tools as tl
  File "/opt/miniconda3/envs/episcanpy/lib/python3.6/site-packages/episcanpy/tools/__init__.py", line 12, in <module>
    from scanpy.api.tl import dpt
  File "/opt/miniconda3/envs/episcanpy/lib/python3.6/site-packages/scanpy/api/__init__.py", line 27, in <module>
    from . import pl
  File "/opt/miniconda3/envs/episcanpy/lib/python3.6/site-packages/scanpy/api/pl.py", line 1, in <module>
    from ..plotting._anndata import scatter, violin, ranking, clustermap, stacked_violin, heatmap, dotplot, matrixplot, tracksplot
ImportError: cannot import name 'stacked_violin'

Some configurations might be missing?

import Error

Hi! problem with importing episcanpy.api:

import episcanpy.api as epi

ModuleNotFoundError Traceback (most recent call last)
in
----> 1 import episcanpy.api as epi

c:\users\49176\anaconda3\envs\atac\lib\site-packages\episcanpy\api\__init__.py in
8 from . import pp
9 from . import tl
---> 10 from . import ct
11 from . import pl
12

c:\users\49176\anaconda3\envs\atac\lib\site-packages\episcanpy\api\ct.py in
4 from ..count_matrix._atac_mtx import bld_atac_mtx, save_sparse_mtx
5 from ..count_matrix._load_met_ct_mtx import load_met_noimput
----> 6 from ..count_matrix._bld_atac_mtx2 import bld_mtx_fly

ModuleNotFoundError: No module named 'episcanpy.count_matrix._bld_atac_mtx2'

There is no _mtx2 file, and when I removed that import manually I got another error when importing bld_mtx_fly!
Would you please check? I am using the latest version.

numpy>=1.21.2 requires python >= 3.7 but install docs show python 3.6

The installation docs here:

https://colomemaria.github.io/episcanpy_doc/installation.html

state that Python 3.5 or 3.6 is needed. The example conda command explicitly installs python=3.6.

However, the requirements.txt file has numpy>=1.21.2, which requires python 3.7 minimum.

How are feature groups grouped together (rank_feat_groups_matrixplot)?

Hi @DaneseAnna ,

rank_feature_groups_matrixplot is now working; however, I don't quite understand how the feature groups are formed.
Most likely this:
epi.pl.rank_feat_groups(adata, feature_symbols='transcript_annotation')
groups features by their variability score, right? I really can't make sense of the '3 vs rest' or '10 vs rest' figures this command generates.
And then does it just create 14 groups of a certain size with similarly high variability scores?

And then these feature groups are written on top of the rank_feature_matrixplot, right? With the corresponding genes of the features on the lower x axis.
To make more sense of this plot I specified groupby='cell_type', which at least made the y axis interpretable.

Thanks for advice,

Valentin

Error with normalize_total()

I'm running through the Buenrostro tutorial to test out Episcanpy, but am hitting the following error when trying to normalize. I'd appreciate any pointers on what to do, please.

epi.pp.normalize_total(adata)

returns the following error:

FutureWarning:../lib/python3.9/site-packages/scanpy/preprocessing/_normalization.py:141: The `layer_norm` argument is deprecated. Specify the target size factor directly with `target_sum`.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_378952/1227047011.py in <module>
----> 1 epi.pp.normalize_total(adata)

~/lib/python3.9/site-packages/episcanpy/preprocessing/_scanpy_fct.py in normalize_total(adata, target_sum, exclude_highly_expressed, max_fraction, key_added, layers, layer_norm, inplace)
    426     """
    427 
--> 428     sc.pp.normalize_total(adata, target_sum, exclude_highly_expressed, 
    429         max_fraction, key_added, layers, layer_norm, inplace)
    430 

~/lib/python3.9/site-packages/scanpy/preprocessing/_normalization.py in normalize_total(adata, target_sum, exclude_highly_expressed, max_fraction, key_added, layer, layers, layer_norm, inplace, copy)
    203         after = None
    204     else:
--> 205         raise ValueError('layer_norm should be "after", "X" or None')
    206 
    207     for layer_to_norm in layers if layers is not None else ():

ValueError: layer_norm should be "after", "X" or None
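Reading the two signatures in the traceback, the wrapper passes eight arguments positionally while the scanpy function has an extra layer parameter before layers. Each trailing argument therefore shifts by one slot, and inplace ends up in layer_norm. The stand-in functions below reproduce that shift without scanpy. Their names and default values are assumptions for illustration; only the parameter order is taken from the traceback:

```python
def new_normalize_total(adata, target_sum=None, exclude_highly_expressed=False,
                        max_fraction=0.05, key_added=None, layer=None,
                        layers=None, layer_norm=None, inplace=True, copy=False):
    """Stand-in with the parameter order shown in the scanpy traceback."""
    if layer_norm not in ("after", "X", None):
        raise ValueError('layer_norm should be "after", "X" or None')
    return {"layer": layer, "layers": layers,
            "layer_norm": layer_norm, "inplace": inplace}


def wrapper_positional(adata, target_sum=None, exclude_highly_expressed=False,
                       max_fraction=0.05, key_added=None, layers=None,
                       layer_norm=None, inplace=True):
    # Mirrors the failing call: inplace (True) lands in the layer_norm slot.
    return new_normalize_total(adata, target_sum, exclude_highly_expressed,
                               max_fraction, key_added, layers, layer_norm, inplace)


def wrapper_keyword(adata, target_sum=None, exclude_highly_expressed=False,
                    max_fraction=0.05, key_added=None, layers=None,
                    layer_norm=None, inplace=True):
    # Keyword arguments stay correct even if new parameters are inserted.
    return new_normalize_total(
        adata, target_sum=target_sum,
        exclude_highly_expressed=exclude_highly_expressed,
        max_fraction=max_fraction, key_added=key_added,
        layers=layers, layer_norm=layer_norm, inplace=inplace)


result = wrapper_keyword(object())
```

Until the wrapper is changed, calling sc.pp.normalize_total(adata) directly from scanpy may be a workable alternative, assuming the default arguments are what the tutorial intends.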

Problems in importing the api after PyPI installation

Hello epiScanpy team,

First of all thanks for the great library!
I've just installed episcanpy locally from PyPI

pip install episcanpy

and tried to import it in a Jupyter notebook

import episcanpy.api as epi

But I get the error:

ImportError                               Traceback (most recent call last)
Untitled-1.ipynb Cell 4 in <cell line: 5>()
      [3] import numpy as np
      [4] import pandas as pd
----> [5] import episcanpy.api as epi

File ~/my/path/to/python3.8/episcanpy/api/__init__.py:11, in <module>
      9 from . import tl
     10 from . import ct
---> 11 from . import pl
     13 from typing import Any, Union, Optional, Iterable, TextIO
     14 from typing import Tuple, List, ContextManager

File ~/my/path/to/python3.8/episcanpy/api/pl.py:20, in <module>
     18 from ..preprocessing._quality_control import cal_var, variability_features
     19 from ..preprocessing._tss_enrichment import tss_enrichment_plot as tss_enrichment
---> 20 from ..preprocessing._tss_enrichment import tss_enrichment_score_plot as tss_enrichment_socre

ImportError: cannot import name 'tss_enrichment_score_plot' from 'episcanpy.preprocessing._tss_enrichment' (/home/my/path/to/python3.8/episcanpy/preprocessing/_tss_enrichment.py)

As you can see, line 20 of pl.py contains a typo, "tss_enrichment_SOCRE" instead of "tss_enrichment_score", I guess.
I thought correcting this would fix the ImportError, but apparently there is more to it.

I then tried the fix suggested in #130, and this apparently solves the ImportError.
But it looks like the user made a pull request that was also merged, so I am wondering why the fix was not part of the library downloaded from PyPI.

Thanks for your help!

Vittorio

In the DNA methylation tutorial, the data for matrix construction and secondary processing are not consistent.

Hi EpiScanpy team,

Thanks for such a powerful tool 👍

I have recently been using epiScanpy on single-cell DNA methylation data, following your tutorials. I found one notebook for matrix preparation (link) and another for secondary processing (link).

However, as the screenshots below show, the data you provided for the first tutorial is based on promoters, while the data for the second tutorial is based on enhancers.

------------------------------------------------ Matrices preparation -------------------------------------------------------

------------------------------------------------ Secondary processing ------------------------------------------------------

So, do you know where I can find consistent tutorials for single-cell DNA methylation data? Alternatively, I would appreciate it if you could share the BED files you used to get the enhancer mCG anndata in the second notebook (link).

Thanks a lot,

Qian :)

Episcanpy still using scanpy.api

Hi Anna,
one more thing I noticed:

Episcanpy still seems to use scanpy.api. This will soon break episcanpy when scanpy removes the api module as announced.

This FutureWarning appears when importing episcanpy:

>>> import episcanpy
/app/python37/lib/python3.7/site-packages/scanpy/api/__init__.py:6: FutureWarning:

In a future version of Scanpy, `scanpy.api` will be removed.
Simply use `import scanpy as sc` and `import scanpy.external as sce` instead.

  FutureWarning,
